A two-stage neural network approach for ARMA model identification with ESACF

Decision Support Systems 11 (1994) 461-479, North-Holland

Jae Kyu Lee
Korea Advanced Institute of Science and Technology, Seoul, Korea

Won Chul Jhee
Hong Ik University, Seoul, Korea

Abstract

We attempt to design artificial neural networks that can help in the automatic identification of the Autoregressive Moving Average (ARMA) model. For this purpose, we adopt the Extended Sample Autocorrelation Function (ESACF) as a feature extractor, and the Multi-Layered Perceptron as a Pattern Classification Network. Since the performance of the network is sensitive to the noise in the input ESACF patterns, we suggest a preprocessing Noise Filtering Network. It turns out that the Noise Filtering Network significantly improves the performance. To reduce the computational burden of training the full Pattern Classification Network, we suggest a Reduced Network that can still perform as well as the full network. The two-stage filtering and classifying networks performed very well (about 90% accuracy) not only with the artificially generated data sets but also with real-world time series. We have also reconfirmed that the performance of ESACF is superior to that of ACF and PACF.

Keywords: Artificial Neural Network (ANN); Time series modeling; ARMA model identification; Extended sample autocorrelation function (ESACF); Pattern classification; Noise filtering; Backpropagation algorithm.

Jae Kyu Lee is an associate professor in the Department of Management Information Systems at the Korea Advanced Institute of Science and Technology, Seoul. He received his Ph.D. from the Wharton School, University of Pennsylvania. He has written several books on expert systems and numerous papers in Expert Systems with Applications: An International Journal, Expert Systems, Decision Support Systems, Decision Sciences, Fuzzy Sets and Systems, International Journal of Man-Machine Studies, etc. Currently, he serves on the editorial boards of Expert Systems with Applications: An International Journal and International Journal of Intelligent Systems in Accounting, Finance and Management.

Won Chul Jhee is an assistant professor in the Department of Industrial Engineering at Hong Ik University. He received a B.B.A. from Seoul National University, and an M.S. in industrial engineering and a Ph.D. in management information systems from the Korea Advanced Institute of Science and Technology. His research interests are business applications of neural networks, intelligent decision support systems, applied AI, and knowledge-based simulation.

Correspondence to: Jae Kyu Lee, Department of Management Information Systems, Korea Advanced Institute of Science and Technology, P.O. Box 210, Seoul, Korea. Phone: 82-2-962-3723, 82-2-958-3612. Fax: 82-2-958-3604. E-mail: jklee@msd.kaist.ac.kr

1. Introduction

Among various time series analysis techniques, ARMA modeling is one of the most widely used methods for managerial forecasting. A typical ARMA model building procedure consists of three steps: identification, parameter estimation, and diagnostic checking, as depicted in Figure 1(a) [2]. Among these three steps, the identification step, which determines the order p of the AR process and the order q of the MA process to construct an ARMA(p, q) model, is very important in building a good model. However, this step requires expert judgment in interpreting statistical information such as the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) [2], the ESACF [24], and so forth [1,7,12,22,29]. This is why the ARMA modeling procedure cannot be automated fully, and thus cannot be used widely in practical situations despite its theoretical comprehensiveness [17,27].


The purpose of our research is to automate the human judgment step of ARMA model identification by adopting the Artificial Neural Networks (ANN) approach based on selected statistical features, as depicted in Figure 1(b) [3,14,15]. Thus, the key research issues are: (1) Which statistical feature is most effective? (2) How should the ANN be designed? (3) How does the ANN approach perform? (4) Can the ANN replace not only the identification step, but also the parameter estimation step?

To proceed, we have decomposed the research into three projects. (1) When we use ACF and PACF as statistical features, how does the ANN approach for ARMA model identification perform? [9]. (2) When we use ESACF as a statistical feature, how does the ANN approach perform in identification, and how does its performance compare with that of the first approach? (3) When a set of time series data is fed directly into the ANN, as depicted in Figure 1(c), what is the forecasting performance in comparison with the two approaches above? [10]. We have dedicated one paper to each project, and this paper is the result of the second project.

The remainder of this paper is organized as follows. In section 2, the ESACF approach to ARMA model identification is briefly reviewed. In section 3, an ANN - specifically a Multi-Layered Perceptron with two hidden layers

(MLP2H) - is designed for ESACF pattern classification; data sets for the experiments are generated; the network is trained by the backpropagation algorithm; and the performance is evaluated. In section 4, we introduce a preprocessing ANN to reduce the noise in the original ESACF patterns, and briefly discuss the effect of the features by comparing the performance of ESACF with that of ACF and PACF. In section 5, we suggest a Reduced Pattern Classification Network (Reduced PCN) with a simpler architecture than the MLP2H, in an attempt to enhance computational efficiency. The performance of the Reduced PCN is tested using noise-filtered ESACF patterns as inputs. In section 6, we show the usefulness of the two-stage neural network approach by testing it on three real-world economic time series.

[Figure 1. ARMA model building procedure and suggested approaches: (a) the traditional ARMA model building procedure; (b) automating the identification step by adopting the ANN; (c) modeling the whole procedure by an ANN.]


2. Extended Sample Autocorrelation Function

Since we use ESACF as the statistical feature in this project, let us review the definition and properties of ESACF. The ESACF table looks like the example given in Figure 2(a). Since Figure 2(a) is not easy to read, we convert the table into a format resembling that of Figure 2(b), which shows the theoretical prototype pattern of ARMA(1, 1) without any noise. With this figure, all a human expert has to do is to find the vertex of the triangle of 1's. However, this task may not be straightforward if noises exist, as in Figure 2(c), which is converted from Figure 2(a). Furthermore, a time series with some seasonality can often yield multiple overlapping triangular patterns which make it difficult for the human analyzer to select the right vertex. Thus, it is essential to automate the judgment process. Therefore, our objective is to build an ANN which receives the converted ESACF pattern as input and classifies it into the ARMA(p, q) model, as depicted in Figure 3. Before moving to the issue of the ANN design, we briefly summarize the ESACF approach to explain how to get such a pattern from a set of time series data. Readers may skip the remainder of this section if they are not particularly interested in statistics.

An ARMA(p, q) model for a time series {Z_t, t = 0, ±1, ±2, ...} is expressed as

Z_t = φ_1 Z_{t-1} + ... + φ_p Z_{t-p} + a_t − θ_1 a_{t-1} − ... − θ_q a_{t-q},    (1)

where {a_t} consists of normally distributed independent random variables from N(0, σ²), and the φ's and θ's are the autoregressive (AR) and moving average (MA) parameters to be estimated, respectively. For the selection of appropriate orders (p, q), Tsay and Tiao [24] extend the Box-Jenkins method using the property that the Sample Autocorrelation Function (SACF) of a pure MA(q) model abruptly drops to zero after lag q, which is called "cut-off behavior" [2]. Since the moving average portion in (1) is the residual from the autoregression of order p, i.e. the AR(p) regression, they first developed iterated regressions to obtain consistent estimates of the AR parameters. Since the estimates from the j-th iterated AR(k) regression can be recursively computed


using the ordinary least squares estimates of the AR(k), AR(k + 1), ..., AR(k + j) fittings, the ESACF approach can be computationally efficient. Furthermore, there is no need to worry about the order of differencing, which is indispensable for the Box-Jenkins method, because the j-th iterated regression procedure yields consistent estimates of the true AR parameters even for the nonstationary ARMA(k, q) model if j > q. Therefore, for each assumed value k for p, the k-th ESACF is defined as follows:

Definition

The value of the k-th ESACF at lag j is defined as the sample autocorrelation of an estimate of the moving average portion, which is the residuals of the j-th iterated AR(k) regression.

[Figure 2. Example of the ESACF pattern for the ARMA(1, 1) model: (a) an illustrative ESACF table; (b) the prototype pattern of the converted ESACF table for ARMA(1, 1); (c) the converted ESACF table obtained from (a). The example is extracted from Series A in Box & Jenkins (1976).]

The k-th ESACF for the ARMA(p, q) model has the following properties:
(1) If k = 0, ESACF is just the ordinary ACF in the Box-Jenkins method.

(2) If 0 < k < p, ESACF does not show the cut-off behavior.
(3) If k = p, ESACF shows the cut-off behavior after lag q, because the estimated residuals from the j-th iterated AR(k) regressions follow a pure MA(q) process when j > q.
(4) If k > p, ESACF can display the cut-off behavior only after a lag greater than q, so that over the rows k ≥ p the cut-off positions form a triangle whose vertex indicates the orders (p, q). The variance of the ESACF values can be approximated by 1/(n − k − j).

[Figure 3. ESACF pattern classification with a Multi-Layered Perceptron of two hidden layers: the converted ESACF pattern from the feature extractor feeds the input layer, the forward pass propagates it through the first and second hidden layers to the output layer, and the errors against the target output vector (e.g., for ARMA(1, 1)) drive the backward pass.]
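As an illustration of how a raw ESACF table might be turned into the binary pattern of Figure 2(c), the following sketch applies a two-standard-error rule based on the variance approximation above; the paper does not give its exact conversion code, so the threshold rule and the stand-in data here are assumptions.

```python
import numpy as np

def esacf_to_binary(esacf, n_obs):
    """Convert a table of ESACF values into a 0/1 significance pattern.

    A cell is marked 1 when it exceeds roughly two standard errors, using
    Var ~ 1/(n - k - j) as the variance approximation; this is one plausible
    conversion rule, not necessarily the authors' exact procedure.
    """
    rows, cols = esacf.shape
    pattern = np.zeros((rows, cols), dtype=int)
    for k in range(rows):
        for j in range(cols):
            se = np.sqrt(1.0 / max(n_obs - k - j, 1))
            pattern[k, j] = int(abs(esacf[k, j]) > 2.0 * se)
    return pattern

# Stand-in table of ESACF values for a series of length 100.
rng = np.random.default_rng(0)
print(esacf_to_binary(rng.normal(scale=0.2, size=(10, 10)), n_obs=100))
```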
3. The Multi-Layered Perceptron approach for ARMA model identification

For the identification of ARMA models with ESACF patterns, we adopted the Multi-Layered Perceptron (MLP), which is one of the most widely used ANN models, as the pattern classification network (PCN) [15,20]. The MLP is a feedforward network in which a Processing Element (PE) in one layer is fully connected with all PE's in the adjacent layer. For the training of the MLP, we used the backpropagation algorithm [21,26], which consists of three steps: forward pass, backward pass, and weight update, as depicted in Figure 3. Since each connection between PE's has a corresponding weight, learning in the backpropagation algorithm is achieved by adjusting the weights based


on the given training data set of input/output pairs.

3.1. Design of the network

To configure the structure of the MLP, we need to decide the number of hidden layers, the number of PE's in each layer, and the types of activation functions for the PE's. These issues are handled in the following way:
(1) We adopt two hidden layers in order to exploit the full classification power of the MLP [15].
(2) Since we use a 10 × 10 ESACF table, we arrange 100 corresponding input PE's. By the same token, since we consider the maximum orders of p = 5 and q = 5, we arrange 35 output PE's, which correspond to (p + 1)(q + 1) − 1 PE's.
(3) The number of hidden PE's affects the convergence rate in learning and the performance of the MLP. In general, an increased number of hidden PE's will improve the recognition performance. However, excessive use may cause the MLP to try to learn unnecessary noise in the training patterns, which may deteriorate the generalization power. On the other hand, too small a number of hidden PE's may lead to slow convergence and bad performance. Therefore, many researchers have tried to find an appropriate size by simulating various numbers of hidden PE's [4,13]. In the MLP2H, we adopt 70 and 50 PE's in the first and second hidden layer, respectively, as a starting point.
(4) The sigmoid function is adopted as the activation function for all PE's in the hidden and output layers.
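To make the 100-70-50-35 configuration concrete, a minimal sketch of the forward pass with sigmoid activations is given below (in Python with NumPy); the initialization and the random input pattern are illustrative only and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [100, 70, 50, 35]          # input, first hidden, second hidden, output PE's
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(pattern):
    """Forward pass of an MLP2H-like network on a flattened 10 x 10 ESACF pattern."""
    a = pattern
    for w, b in zip(weights, biases):
        a = sigmoid(a @ w + b)
    return a                        # 35 activations, one per ARMA(p, q) class

x = rng.integers(0, 2, size=100).astype(float)   # stand-in converted ESACF pattern
print(forward(x).shape)                          # (35,)
```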

3.2. Data sets for experiments

To make the MLP2H robust to input noises, we added noises to our training data sets and generated four groups, each of which included ESACF patterns with a different noise level: 0%, 10%, 20%, and 30%. By using these four data groups, we can observe the impact of noises in the training data set. The training data sets are prepared as follows:
(1) To add noises to a pure ESACF prototype pattern, we randomly inverted 10, 20 and 30 binary values, respectively. Prototypes were considered to have a 0% noise level.
(2) Since noises in the triangle easily break the original shape of the ESACF pattern, we restricted the number of noises in the triangle to less than 12 − p − q.
(3) In the target output vector, the ARMA(p, q) model had "1" in the (6p + q)-th element and zeros in the remaining elements.
(4) After generating five noisy patterns for each noise level of each ARMA(p, q) model with their corresponding target output vectors, we organized these input/output pairs into the following four training data sets:
1. the first set had 35 prototype patterns;
2. the second set had 210 patterns (the first set plus 175 patterns of 10% noise, i.e., 5 noisy patterns for each of the 35 ARMA models);
3. the third set had 385 patterns (the second set plus 175 patterns of 20% noise); and
4. the fourth set had 560 patterns (the third set plus 175 patterns of 30% noise).

To test the performance of the learned ANN's, we generated test patterns with different noise levels in the same way we prepared the training data sets. We prepared 15 test sets, each of which consisted of 210 test patterns: 2 noisy ESACF patterns for each ARMA model at each of three noise levels (10%, 20%, 30%). Therefore, each test set can be divided into 3 subsets of different noise levels, each of which has 70 patterns.

[Figure 4. Test results from the MLP with two hidden layers: (a) classification performance of MLP2H for each noise level in the test sets; (b) effects of the training sets on the classification performance of MLP2H. Legend: noise levels in the test set (average, 10%, 20%, 30%).]
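A small sketch of this data-generation step is given below; the prototype, the triangular mask, and the exact resampling scheme are stand-ins, since the paper does not specify how patterns violating the triangle constraint were handled. The target encoding reads the "(6p + q)-th element" as a 1-based index over the 35 classes.

```python
import numpy as np

rng = np.random.default_rng(2)

def add_noise(prototype, n_flips, triangle_mask, max_in_triangle):
    """Return a noisy copy of a binary ESACF prototype with n_flips inverted cells,
    resampling until fewer than max_in_triangle flips fall inside the triangle."""
    while True:
        noisy = prototype.copy()
        idx = rng.choice(noisy.size, size=n_flips, replace=False)
        noisy.reshape(-1)[idx] = 1 - noisy.reshape(-1)[idx]
        if (noisy != prototype)[triangle_mask].sum() < max_in_triangle:
            return noisy

def target_vector(p, q):
    """Target output for the MLP2H: '1' in the (6p + q)-th element, zeros elsewhere."""
    t = np.zeros(35)
    t[6 * p + q - 1] = 1.0
    return t

proto = np.zeros((10, 10), dtype=int)            # stand-in prototype pattern
mask = np.tril(np.ones((10, 10), dtype=bool))    # stand-in triangular region
noisy = add_noise(proto, n_flips=10, triangle_mask=mask, max_in_triangle=12 - 1 - 1)
print(int(noisy.sum()), int(target_vector(1, 1).argmax()))
```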

3.3. Training

Before we start the training, the binary values of 0 or 1 in the training sets are transformed into 0.1 and 0.9, respectively, to facilitate learning. Since the convergence behavior of the backpropagation algorithm can be regulated by the learning rate η, which can be fixed globally or locally [8], and by the constant α of the momentum term [21], we adopted η = 0.1 and α = 0.9. We stop the training when the average squared error becomes less than the predetermined error bound of 0.005. We have applied a step-by-step training method to the four training sets. First, the training starts with the first training set. Once the MLP2H has reached the predetermined error bound, the learned MLP2H is trained again with the second training set. This iterative procedure continues with the third and fourth training sets. By adopting this approach, we can compare the performances of MLP2H's which have learned from training sets of different noise levels. In addition, we have confirmed that the incremental increase of the noise level during the learning phase reduced the training time and led to stable learning over the noisy training sets. Similar experimental results were reported by Waibel et al. [25]. Under this training setting, the MLP2H for each training set required 791, 882, 917 and 969 epochs, respectively, to converge to the predetermined error bound. The trained MLP2H correctly recognized all the patterns in the training sets.

[Figure 5. The structure of the Noise Filtering Network: ESACF patterns with noises enter an input layer of 100 PE's, are compressed by a hidden layer of 55-70 PE's, and the noise-filtered ESACF patterns are reproduced by an output layer of 100 PE's.]

Table 1. Average classification accuracy of MLP2H for each test subset as a function of training sets (standard deviations in parentheses).

Noise level in test subset | Training set: 1st | 2nd | 3rd | 4th | One-way ANOVA: F(3, 56)
10%                        | 94.95 (2.80) | 96.86 (2.40) | 95.81 (2.24) | 96.10 (2.11) | 1.50, p = 0.224
20%                        | 75.05 (3.25) | 78.86 (4.43) | 84.10 (2.75) | 83.04 (2.90) | 20.85, p < 0.001
30%                        | 55.33 (4.15) | 58.29 (5.88) | 65.05 (4.72) | 65.72 (5.11) | 15.01, p < 0.001
Average over all subsets   | 75.11 (2.01) | 78.00 (3.06) | 81.65 (2.10) | 81.71 (2.52) | 23.50, p < 0.001
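The step-by-step schedule can be imitated with an off-the-shelf MLP implementation; the sketch below uses scikit-learn's MLPClassifier with warm_start=True as a stand-in for the authors' own backpropagation code, so the correspondence is loose (for example, the library's stopping rule is not the 0.005 average-squared-error bound, and its loss is the log-loss rather than the squared error).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)

# Stand-in training sets of growing size and noise level; each contains all 35 classes.
sizes = (35, 210, 385, 560)
train_sets = [(rng.random((n, 100)), np.tile(np.arange(35), n // 35)) for n in sizes]

clf = MLPClassifier(hidden_layer_sizes=(70, 50), activation="logistic",
                    solver="sgd", learning_rate_init=0.1, momentum=0.9,
                    max_iter=300, warm_start=True, random_state=0)

for stage, (X, y) in enumerate(train_sets, start=1):
    clf.fit(X, y)          # warm_start keeps the weights learned in the previous stage
    print(f"stage {stage}: {clf.n_iter_} epochs, training accuracy {clf.score(X, y):.2f}")
```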

3.4. Test and discussions

For each of the three noise levels in the test sets, the classification accuracy of the MLP2H is tested. The results are displayed in Figure 4(a). To determine the effect of noises in the training sets, the averaged performance of the MLP2H learned from each training set is also tested. The results are summarized in Figure 4(b) and Table 1.

From the above test results, we can first observe that the designed MLP2H suffers in classifying the test patterns with the higher noise levels, although the MLP2H can learn from noisy training patterns. As shown in Figure 4(a) and the fifth column in Table 1, the averaged classification accuracy drops from 96.1% to 83.0% and 65.7% as the noise level in the test patterns increases from 10% to 20% and 30%, respectively. The result of a one-way analysis of variance, F(2, 42) = 244.93, p < 0.001, supports this hypothesis. The observation implies that the higher the noise level of the test pattern, the higher the risk of wrong classification. This general principle is a fact of life which we cannot overcome, even though we may be able to ameliorate the issue to some extent.

Our second observation is that the performance of the MLP2H trained by the fourth training set had not improved at all in comparison with the results from the third training set. For instance, the performance on the 20% noisy test patterns had even fallen slightly. The F-values in Table 1 tell us the effect of the noise levels in the training sets. When the test subset has a 10% noise level, we cannot accept a significant effect of the training sets because F(3, 56) = 1.50, p = 0.224. However, test subsets with 20% and 30% noise levels were significantly affected by the noise levels of the training sets, with F(3, 56) = 20.85, p < 0.001 and F(3, 56) = 15.01, p < 0.001, respectively. In spite of these effects of the training set, Scheffe's procedure in the one-way ANOVA analysis [18] indicated that there is no difference between the results from the third and fourth training sets. Therefore, we confirm that training sets with too high a noise level do not contribute to improvements in the performance of the MLP2H.

From these observations, we conclude that it is necessary to devise a step that can reduce the noises in the original ESACF patterns. This requires us to introduce a preprocessing Noise Filtering Network (NFN) before the ESACF Pattern Classification Network (PCN). The following section deals with the NFN.
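The F-tests reported here and in Table 1 are ordinary one-way ANOVAs over the 15 per-test-set accuracies; given those accuracy lists they can be reproduced as in the sketch below, where the numbers are placeholders rather than the paper's raw results.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)

# Hypothetical per-test-set accuracies (15 values each) of the MLP2H trained on
# the four training sets; in the paper these underlie the last row of Table 1.
accuracies = [rng.normal(loc=m, scale=2.5, size=15) for m in (75.1, 78.0, 81.7, 81.7)]

f_stat, p_value = f_oneway(*accuracies)
print(f"F(3, 56) = {f_stat:.2f}, p = {p_value:.4f}")
```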

4. Noise filtering network

4.1. Design of the network

As we have observed in the previous section, if the noises in the ESACF patterns can be eliminated or reduced below a 10% noise level, the PCN's performance can be about 95%. To implement this idea, we suggest a two-stage neural network approach by adopting a Noise Filtering Network as a predecessor to the PCN. The role of the NFN is to filter the noises in the ESACF patterns and recover patterns as close as possible to the triangular prototype patterns seen in Figure 2(b). By the nature of noise filtering, the NFN should be able to respond to highly noisy ESACF patterns.

As a candidate model for the NFN, we can consider content addressable memories or associative memories, which are useful when only a cue for an input pattern is available and the recall of the complete pattern of the class exemplar or prototype is necessary [15,20]. Since theoretical prototypes of the ESACF patterns exist, it seems that associative memories can be easily applied to our problem and can give us the benefit of greatly reduced training effort, because associative memories require only class exemplars for training. However, associative memories such as the Hopfield Network [6] may not be appropriate for identifying ARMA models with ESACF for the following two reasons:
(1) The number of classes in associative memories should be kept well below 15% of the number of input PE's, the so-called limitation of memory capacity [16]. Therefore, 35 ARMA model classes are excessively large.
(2) Since similar ESACF prototypes have much in common with each other, the recall of the exemplar pattern may be very unstable [15].

Thus, we use a multi-layered perceptron as the ANN model for the NFN, but make the MLP behave like an associative memory by making the input and output layers of the NFN have the same number of PE's, as depicted in Figure 5. In the NFN, we use only one hidden layer whose number of PE's is smaller than those in the input/output layers. The reduction in the hidden PE number will clear out the noises by abstracting the essential information about the triangular region in the input pattern and by restoring it in the output layer. This process will recover the ESACF pattern as close as possible to its prototype. To trace the effect of the number of hidden PE's, we examine 4 cases: 55, 60, 65 and 70 hidden PE's.
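Functionally, the NFN is a bottleneck network trained to map noisy patterns onto their prototypes. A rough stand-in using scikit-learn's MLPRegressor is sketched below; the random training pairs and the hyperparameters are illustrative, not the authors' setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)

# Stand-in data: clean prototypes as targets, versions with ~20% flipped cells as inputs.
prototypes = rng.integers(0, 2, size=(560, 100)).astype(float)
noisy = np.abs(prototypes - (rng.random((560, 100)) < 0.2))

# One hidden layer of 65 PE's between 100-PE input and output layers forces the
# network to compress the pattern and restore it, filtering out the noise.
nfn = MLPRegressor(hidden_layer_sizes=(65,), activation="logistic",
                   solver="sgd", learning_rate_init=0.1, momentum=0.9,
                   max_iter=500, random_state=0)
nfn.fit(noisy, prototypes)

filtered = nfn.predict(noisy[:1])
print((filtered > 0.7).astype(int).reshape(10, 10))   # thresholding as in Section 4.3
```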

4.2. Training

For the training of the NFN, we prepared four training sets of different noise levels by mapping the noisy ESACF patterns to the theoretical ESACF prototypes. We adopted the backpropagation algorithm and the step-by-step training method as we did for the MLP2H. However, we used 0.01 as the predetermined error bound, and to prevent oscillations of the error, the learning rate η was lowered from 0.1 to 0.05 when the NFN learns from the fourth training set. Table 2 shows the number of cumulative epochs for each hidden PE number for each training set. The learned NFN correctly recognized all training patterns.

[Figure 6. Average noise level after filtering by the Noise Filtering Network: the 10%, 20%, and 30% noise levels in the test subsets are reduced to about 0.67%, 2.11%, and 5.62%, respectively.]

Table 2. Cumulative epochs for the convergence of the Noise Filtering Network as a function of the hidden PE number.

Number of hidden PE's in NFN | Training set: 1st | 2nd | 3rd | 4th
55 PE's                      | 214 | 317 | 493 | 1104
60 PE's                      | 186 | 301 | 475 | 718
65 PE's                      | 159 | 253 | 456 | 676
70 PE's                      | 137 | 216 | 434 | 648

4.3. Tests and discussions

We use the 15 test sets used in section 3, each of which consists of 210 patterns that can be further divided into 3 subsets of 10%, 20% and 30% noise levels. Before we discuss the effect of the hidden PE number, let us analyze the performance of the NFN with 65 hidden PE's. The results are graphically shown in Figures 6 and 7. The input noise levels of 10%, 20%, and 30% are significantly reduced to 0.67%, 2.11%, and 5.62%, respectively. Owing to the noise reduction, we anticipate that the PCN can classify the ESACF patterns better than the performance on the test set with 10% noise in Table 1.

To validate the idea, we have tested the performance of the PCN with the patterns filtered by the NFN. Figure 7 shows the average performance of the MLP2H for the total test patterns (solid lines) and for the 30% noise level test patterns (dotted lines) for each test set. We can observe that the performance of the MLP2H with a noise filtering stage is superior to that of the MLP2H without noise filtering for all test patterns. This performance gap becomes even larger at the 30% noise level. The t-test supports the significant difference between them with the value of t = 6.98, p < 0.001 in Table 3. For the total test patterns, the average classification accuracy of the MLP2H with the NFN is increased to 89.05% from the 81.71% of the MLP2H without noise filtering. However, the MLP2H with the NFN failed to perform to our expectation. This is caused by wrong filtering in the NFN.

[Figure 7. Average classification accuracy of MLP2H with the Noise Filtering Network of 65 hidden PE's, by test set; curves for MLPNFN, MLP2H, MLPNFN (30% noise), and MLP2H (30% noise).]

Table 3. Average classification accuracy of PCN's with the NFN of 65 hidden PE's (standard deviations in parentheses; t probabilities for the t-values).

Noise level in test subsets | MLP2H w/o NFN | MLP2H with NFN  | Reduced PCN† with NFN
10%                         | 96.10 (2.11)  | 96.67 (2.74)    | 97.05 (2.78)
20%                         | 83.04 (2.90)  | 89.71 (3.50)    | 89.62 (3.10)
30%                         | 65.72 (5.11)  | 80.76 (5.30)    | 79.43 (4.36)
Average over test subsets   | 81.71 (2.52)  | 89.05 (2.97)    | 88.70 (2.69)
t-values                    | -             | 6.98, p < 0.001 | -1.54, p = 0.129
† The Reduced PCN model is discussed in section 5.

Table 4. Average classification accuracy of the NFN on the total test sets as a function of the hidden PE number (standard deviations in parentheses; t probabilities for the t-values).

Number of hidden PE's in NFN | Training set: 1st | 2nd | 3rd | 4th | t-values
55 PE's                      | 74.16 (3.17) | 78.89 (2.68) | 82.35 (2.36) | 84.22 (2.88) |
60 PE's                      | 76.16 (2.17) | 80.98 (2.32) | 85.30 (1.96) | 88.03 (2.61) | 3.62, p = 0.001
65 PE's                      | 75.49 (3.79) | 83.08 (2.92) | 86.67 (2.27) | 90.32 (2.55) | 2.17, p = 0.034
70 PE's                      | 74.32 (3.23) | 79.14 (3.34) | 83.97 (3.06) | 88.89 (3.08) | -1.36, p = 0.193

The analysis of the noise-filtered patterns revealed that two types of noises are left:
(1) The noise-filtered pattern still has some spurious values. The shaded areas in Figure 8(a) represent the noises which the NFN does not eliminate. This type of noise occurs when the input pattern has overlapping triangles.
(2) The noise-filtered pattern is a clean triangular pattern, but it does not match its corresponding prototype. That means the pattern is wrongly recovered. In Figure 8(b), the solid-line triangle is the desired prototype while the dotted-line triangle is one that is wrongly recovered by the NFN.

We found that as the noise level in the ESACF patterns goes up, the probability of triangles overlapping goes up. This is why the performance at the 30% noise level is inferior to the one for the total patterns in Figure 7.

To determine the structure of the NFN, we compare the performance of the NFN with different numbers of hidden PE's: 55, 60, 65, and 70. The values in the noise-filtered patterns are changed into binary values using the threshold of 0.7. For each hidden PE number, the changes in the performance of the NFN are shown in Table 4. According to the three-way ANOVA test, the number of hidden PE's significantly affected the performance of the NFN. The F-values of the three main effects (the noise level in the training set, the noise level in the test set, and the hidden PE number) were F(3, 852) = 426.71, p < 0.001, F(2, 852) = 4747.44, p < 0.001, and F(3, 852) = 37.21, p < 0.001, respectively.

Our second observation with respect to hidden PE numbers is that performance improves as the number of hidden PE's increases to 65. However, increasing the hidden PE number to 70 did not bring further improvement, although it required fewer training epochs as shown in Table 2. This observation was supported by the one-way ANOVA test on the performance results obtained from the NFN; that is, F(3, 56) = 12.24, p < 0.001.

[Figure 8. Two types of noise distributions in the noise-filtered test patterns: (a) a recovered pattern with some spurious values; (b) a wrongly recovered pattern.]

[Figure 9. Architecture of the Reduced Pattern Classification Network: the preprocessed noise-filtered ESACF pattern (39 PE's) feeds two sub-networks, an AR Network (39-20-6 PE's) giving the order p of the AR process and an MA Network (39-28-6 PE's) giving the order q of the MA process.]

Although Scheffe's procedure did not indicate differences among the performances of 60, 65 and 70 hidden PE's, the pairwise t-tests support the difference between 60 and 65 hidden PE's with t = 2.17, p = 0.034, although the difference between 65 and 70 hidden PE's cannot be strongly supported because t = -1.36, p = 0.180.

We have also confirmed that the performance of the NFN with 65 hidden PE's on the test subsets of a 30% noise level significantly increases as the noise level in the training sets increases. The ANOVA test confirms that F(3, 56) = 70.15, p < 0.001. Specifically, performance is improved by 17% when we compared the result (82.76%) from the NFN with that without the noise filtering stage (65.72% in Table 1). In short, our NFN correctly recovered 90.32% of the 210 test patterns on average. In other words, about 20 patterns in each test set were wrongly recovered. However, as shown in Table 5, even in the case of wrongly recovered patterns, most of them were only near-misses: AR1 and MA1 in Table 5 indicate that the identified p or q deviates from the original p or q by 1. Although the NFN has performed relatively poorly in identifying MA orders, the performance of the NFN is in general quite acceptable.

Table 5. The average number of wrongly recovered noise-filtered patterns that indicated incorrect ARMA models (standard deviations in parentheses; * in percentage).

Noise level in test subsets | AR1 | MA1 | AR1/MA1 | Others | Total
10%   | 0.00 (0.00)* | 1.43 (2.04)  | 0.29 (0.41) | 0.29 (0.41) | 2.01 (2.87)
20%   | 0.29 (0.41)  | 3.86 (5.51)  | 1.26 (1.80) | 0.86 (1.23) | 6.27 (8.96)
30%   | 0.71 (1.01)  | 5.86 (8.37)  | 2.57 (3.67) | 2.92 (4.17) | 12.06 (17.22)
Total | 1.00 (0.47)  | 11.15 (5.31) | 4.12 (1.96) | 4.07 (1.94) | 20.34 (9.69)

To experiment with the effect of the feature extractors, we apply the MLP2H with the NFN to the data set used in our first project, which adopted ACF and PACF as feature extractors [9]. According to our experiment, it is confirmed that ESACF performs much better than ACF and PACF.²

² In the first project, 202 patterns were used for training and 54 patterns for testing. The ACF and PACF patterns for both training and testing were obtained from simulated time series data sets of size n = 100. When we generated the time series, the parameter sets for each ARMA model were designed to be located in the admissible region. Since we consider the maximum orders of p = 2 and q = 2, there are 8 ARMA models. The averaged noise level in the ESACF test patterns was 24.5%. The following table summarizes the results of the comparative experiments between the two feature extractors.


5. Reduced Pattern Classification Network


5.1. Design of the network

Since the effort of training the MLP2H is a great burden, we attempt to design a reduced network which can perform as well as the MLP2H while reducing the training effort significantly. To implement the idea of a reduced network (let us call it the Reduced PCN), we consider only one hidden layer with fewer input and output PE's. Figure 9 shows the structure of the Reduced PCN, including the number of PE's in each layer. The sigmoid activation function is also used in all hidden and output PE's.

5.1.1. Preprocessing the noise-filtered ESACF pattern

We reduce the number of input PE's by utilizing the boundary information of the triangular region. For this purpose, we process the m × n noise-filtered ESACF patterns to prepare the input vectors for the Reduced PCN in the following way (see Figure 10):
(1) ESACF values are averaged row-wise and column-wise, respectively.
(2) ESACF values in two adjacent left-to-right diagonal lines are summed and divided by 2m.
The above preprocessing method reduces the number of input PE's from mn to 2(m + n) − 1. Therefore, the number of input PE's for the Reduced PCN is reduced from 100 to 39.
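The following sketch implements one plausible reading of these two rules for a 10 × 10 pattern: 10 row averages, 10 column averages, and 19 values from pairs of adjacent left-to-right diagonals. Rule (2) is ambiguous in the text, so the exact pairing and normalization below are an assumption.

```python
import numpy as np

def reduce_pattern(pattern):
    """Map an m x n noise-filtered ESACF pattern to 2(m + n) - 1 input values."""
    m, n = pattern.shape
    row_avg = pattern.mean(axis=1)                      # features 1-10
    col_avg = pattern.mean(axis=0)                      # features 11-20
    diag_feats = []                                     # features 21-39
    for offset in range(-(m - 1), n):                   # the m + n - 1 diagonals
        d = np.diagonal(pattern, offset=offset).sum()
        d_next = np.diagonal(pattern, offset=offset + 1).sum() if offset + 1 < n else 0.0
        diag_feats.append((d + d_next) / (2 * m))       # assumed reading of rule (2)
    return np.concatenate([row_avg, col_avg, diag_feats])

demo = np.random.default_rng(6).integers(0, 2, size=(10, 10)).astype(float)
print(reduce_pattern(demo).shape)                       # (39,), i.e. 39 input PE's
```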

[Figure 10. Preprocessing method for the input of the Reduced PCN: features 1-10 are the row-wise averages, features 11-20 the column-wise averages, and features 21-39 the values obtained from adjacent left-to-right diagonals.]

5.1.2. Partition of the Pattern Classification Network

We also reduce the number of output PE's by dividing the PCN into two sub-networks: one for the AR process and one for the MA process. Owing to the partitioning, we can reduce the (p + 1)(q + 1) − 1 output PE's of the MLP2H to (p + 1) output PE's for the AR network and (q + 1) output PE's for the MA network. When p = q = 5, the number of output PE's falls from 35 to 6 in each sub-network.
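A small sketch of the corresponding target encoding and of decoding the two 6-element output vectors back into (p, q) is given below; index 0 stands for order 0 and index 5 for order 5, which is our reading of the partitioned output layers.

```python
import numpy as np

def ar_ma_targets(p, q, max_order=5):
    """One-hot target vectors for the AR and MA sub-networks of the Reduced PCN."""
    t_ar = np.zeros(max_order + 1)
    t_ma = np.zeros(max_order + 1)
    t_ar[p], t_ma[q] = 1.0, 1.0
    return t_ar, t_ma

def decode_orders(ar_output, ma_output):
    """The largest activation in each sub-network gives the identified order."""
    return int(np.argmax(ar_output)), int(np.argmax(ma_output))

t_ar, t_ma = ar_ma_targets(2, 0)
print(decode_orders(t_ar, t_ma))   # (2, 0)
```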

5.2. Training

To eliminate the effect of wrongly filtered patterns, as in Figure 8(b), we collected 126 patterns excluding such wrongly filtered patterns. The input vectors are prepared as described in section 5.1.1. Then the two reduced networks (AR and MA) are trained by the backpropagation algorithm. For both networks, we used η = 0.1, α = 0.9 and a predetermined error bound of 0.001. The learned AR and MA networks correctly recognized all their training patterns, as shown in Table 6. The Reduced PCN required 63% of the training time required for the MLP2H.

5.3. Test and discussions

We tested the performance of the AR and MA networks using the noise-filtered test patterns from the NFN with 65 hidden PE's. To eliminate the bias that might be caused by wrongly recovered patterns, we first remove the wrongly recovered test patterns from the NFN. The average number of patterns with this qualification in the 15 test sets is 189.66. As shown in Table 6, the percentage of correct classifications by the MLP2H is 97.33%, while the performances of the AR and MA networks are 99.79% and 97.05%, respectively. The percentage that correctly classified the orders of both the AR and MA models is 96.6%. According to the F-test, the performances of the MLP2H and the Reduced PCN do not reveal any significant differences because F(1, 28) = 1.33, p = 0.259. One thing of note is that, as is the case with the NFN, the MA network did not perform as well as the AR network. Even when we include the wrongly recovered patterns from the NFN, the performance of the Reduced PCN is as good as that of the MLP2H, as shown in Table 3, although the performances of both networks fell to 89.05% and 88.70%. Therefore, we can use the Reduced PCN in place of the MLP2H without damaging the classification performance, and reduce the training effort for the PCN. The two-stage neural network approach thus consists of three MLP's - namely, the NFN, the AR Network, and the MA Network. Figure 11 summarizes the training and classification procedures in our two-stage approach.

Table 6. Training and test results of the PCN's on correctly noise-filtered patterns (standard deviations in parentheses).

Models of PCN                     | Epochs | Training time | Recognition of training set | Classification accuracy of test set
MLP2H                             | 467    | 100%          | 100%                        | 97.33% (1.39)
Reduced PCN - AR                  | 889    | 12%           | 100%                        | 99.79% (1.39)
Reduced PCN - MA                  | 2122   | 51%           | 100%                        | 97.05% (1.94)
Reduced PCN - both orders correct |        |               |                             | 96.60% (2.03)

[Figure 11. Flow diagram of the experiment with the two-stage neural network approach: ESACF prototype patterns and a noise model generate the training sets for the Noise Filtering Network (step-by-step training); the outputs of the NFN are preprocessed to form the training sets for the Pattern Classification Networks (AR Network and MA Network); for classification, a real-world time series passes through the feature extractor (ESACF), the Noise Filtering Network, the preprocessing step, and the Pattern Classification Networks, which output the ARMA model.]
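The classification half of Figure 11 amounts to a simple function composition. The sketch below chains the three networks; nfn, ar_net and ma_net are hypothetical trained models exposing a predict method, reduce_pattern is the preprocessing step of Section 5.1.1, and the stub objects exist only to make the example run.

```python
import numpy as np

def identify_arma(esacf_pattern, nfn, ar_net, ma_net, reduce_pattern, threshold=0.7):
    """Two-stage identification: filter, binarize, preprocess, then read off (p, q)."""
    filtered = nfn.predict(esacf_pattern.reshape(1, -1)).reshape(10, 10)
    binary = (filtered > threshold).astype(float)       # threshold used in Section 4.3
    features = reduce_pattern(binary).reshape(1, -1)
    p = int(np.argmax(ar_net.predict(features)))
    q = int(np.argmax(ma_net.predict(features)))
    return p, q

class _Stub:                                            # stand-in for a trained network
    def __init__(self, out):
        self.out = np.asarray(out, dtype=float)
    def predict(self, x):
        return np.tile(self.out, (x.shape[0], 1))

p, q = identify_arma(np.random.default_rng(7).random((10, 10)),
                     nfn=_Stub(np.zeros(100)),
                     ar_net=_Stub([0, 0, 1, 0, 0, 0]),
                     ma_net=_Stub([1, 0, 0, 0, 0, 0]),
                     reduce_pattern=lambda b: b.mean(axis=1))   # trivial stand-in reducer
print(p, q)                                             # (2, 0)
```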

[Figure 12. Test result for the U.S. GNP data, ARMA(2, 0): (a) ESACF pattern; (b) noise-filtered pattern; (c) preprocessed values; (d) outputs of the Reduced PCN - AR Network: 0.1063 0.2140 0.8036 0.0952 0.0943 0.1023; MA Network: 0.9433 0.1583 0.0937 0.1127 0.1532 0.0831.]

6. Tests with real world data

In order to test the applicability of the two-stage approach in practical situations, we used three data sets of economic time series found in the real world.

CASE I: Quarterly Gross National Product of the U.S.

Figure 12 shows the test result obtained from the U.S. quarterly GNP data between 1946 and 1970. The 10 × 10 converted ESACF pattern in Figure 12(a) has 24 values that differ from the prototype pattern of ARMA(2, 0). Although the pattern has a 24% noise level, the NFN completely removes the noise. The underlined numbers in Figure 12(b) clearly represent a triangle whose vertex is found on the third row and the first column. Figure 12(c) shows the input to the Reduced PCN, which consists of 39 values obtained by preprocessing the noise-filtered pattern in Figure 12(b). Figure 12(d) shows the outputs of the AR and MA Networks. The output vector of the AR Network has the largest value in the third element, which means an AR order of 2. The output vector of the MA Network indicates an MA order of 0. Thus, the two-stage approach correctly classifies the U.S. GNP data as ARMA(2, 0) without any difficulty. Nelson [17] analyzed this data as AR(1) after applying the first order of integration filter, that is, differencing the data once. However, note that the AR(1) process with a differencing is equivalent to the ARMA(2, 0) process, and the adoption of ESACF does not require any differencing before the specification of the ARMA(2, 0) model.
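As a quick check on the decoding rule, the Figure 12(d) output vectors can be pushed through the same argmax step; the numbers below are copied from the figure.

```python
import numpy as np

ar_out = [0.1063, 0.2140, 0.8036, 0.0952, 0.0943, 0.1023]   # Figure 12(d), AR Network
ma_out = [0.9433, 0.1583, 0.0937, 0.1127, 0.1532, 0.0831]   # Figure 12(d), MA Network
print(int(np.argmax(ar_out)), int(np.argmax(ma_out)))       # 2 0 -> ARMA(2, 0)
```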

[Figure 13. Test result for the U.S. Consumer Price Index data, ARMA(4, 3): (a) ESACF pattern; (b) noise-filtered pattern; (c) preprocessed values; (d) outputs of the Reduced PCN - AR Network: 0.1117 0.1034 0.4943 0.1014 0.9589 0.1069; MA Network: 0.1196 0.2586 0.1218 0.7230 0.1432 0.1068.]

[Figure 14. Test result for the caffeine data, ARMA(2, 5): (a) ESACF pattern; (b) noise-filtered pattern; (c) preprocessed values; (d) outputs of the Reduced PCN - AR Network: 0.0956 0.2824 0.5405 0.1114 0.0954 0.1016; MA Network: 0.0318 0.3816 0.0000 0.0286 0.1053 0.6980.]

CASE II: Monthly Consumer Price Index of the U.S.

It is known that the monthly U.S. CPI data between 1953 and 1970 has a strong exponential trend, and that ARMA(4, 3) is an appropriate model for this time series [19]. If we use the Box-Jenkins method, we must determine the order of differencing to eliminate the trend in the time series. However, we can obtain the ESACF pattern in Figure 13(a) from the raw time series without differencing. The ESACF pattern has a 27% noise level and indicates two possible models: ARMA(2, 1) or ARMA(4, 3). The noise-filtered pattern has some spurious values as in Figure 8(a) - i.e., the underlined numbers in the third row of Figure 13(b). The output vectors of the AR and MA networks have the largest values in the fifth and fourth elements, respectively, and have the second largest values in the third and second elements, respectively. Therefore, the appropriate model for the CPI data is ARMA(4, 3). However, ARMA(2, 1) can be considered as an alternative model to ARMA(4, 3), as discussed in [19].

CASE III: Caffeine Data

Figure 14 shows a case in which seasonality is involved in the time series. The data consists of 178 observations of caffeine levels in instant coffee taken every weekday, which is known to follow the ARMA(2, 5) model [5,24]. The ESACF pattern has a 41% noise level and does not show any obvious triangular shape. However, the NFN reduces the noise, although it leaves the other type of spurious values, as depicted in Figure 14(b) - namely, the underlined numbers in the diagonal lines. According to the output vectors, the noise-filtered pattern indicates that the orders of AR and MA are 2 and 5, respectively. Thus, the two-stage approach correctly classifies the caffeine data as ARMA(2, 5).

7. Concluding remarks

We have shown that neural networks can identify the ARMA model with about 90% accuracy using ESACF inputs. To accommodate the noisy environment, a Noise Filtering Network is adopted, which improved the performance significantly. To reduce the training effort without losing performance, we have also devised a Reduced Model for the Pattern Classification Network. A follow-on question is the comparative performance of the neural network for forecasting itself, as mentioned in the introduction. For this issue, readers may refer to [10].

Acknowledgement

We would like to thank Kar Yan Tam, two anonymous reviewers, and Peter Silhan for their helpful comments on this paper and suggestions for our ongoing research.

References

[1] H. Akaike, A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control AC-19, (1974) 716-723.
[2] G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and Control (Holden-Day Inc., San Francisco, 1976).
[3] R.P. Gorman and T.J. Sejnowski, Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets, Neural Networks 1, Nr. 1 (1988) 75-89.
[4] M. Gutierrez, J. Wang and R. Grondin, Estimating Hidden Unit Number for Two-layer Perceptrons, Proceedings of the IEEE 3rd International Conference on Neural Networks, (1989) I677-I681.
[5] D.C. Hamilton and D.G. Watts, Interpreting Partial Autocorrelation Functions of Seasonal Time Series Models, Biometrika, 65, (1978) 135-140.
[6] J.J. Hopfield, Neural Networks and Physical Systems with Emergent Collective Computational Abilities, Proceedings of the National Academy of Sciences, USA, 79, (1982) 2554-2558.
[7] G.W. Hill and D. Woodworth, Automatic Box-Jenkins Forecasting, Journal of the Operational Research Society, 31, (1980) 413-422.
[8] R.A. Jacobs, Increased Rates of Convergence through Learning Rate Adaptation, Neural Networks, 1, (1988) 285-307.
[9] W.C. Jhee, J.K. Lee and K.C. Lee, A Neural Network Approach for the Identification of the Box-Jenkins Model, forthcoming in Network: Computation in Neural Systems (1992).
[10] W.C. Jhee and J.K. Lee, Performance of Neural Networks in Managerial Forecasting, International Journal of Intelligent Systems in Accounting, Finance and Management, 2.1, (1992) 55-72.
[11] D.C. Montgomery and L.A. Johnson, Forecasting and Time Series Analysis (McGraw-Hill, New York, 1976).
[12] C.A. Kang, D. Bedworth and D. Rollier, Automatic Identification of ARIMA Time Series, IIE Transactions, 14, (1982) 156-166.
[13] S.Y. Kung and J.N. Hwang, An Algebraic Projection Analysis for Optimal Hidden Units Size and Learning Rates in Back-Propagation Learning, IEEE 2nd International Conference on Neural Networks, (1988) I363-I370.
[14] A. Lapedes and R. Farber, Nonlinear Signal Processing Using Neural Networks: Prediction and System Modeling, Los Alamos National Laboratory Report LA-UR-87-2662 (1987).
[15] R.P. Lippmann, An Introduction to Computing with Neural Nets, IEEE ASSP Magazine, 4, (1987) 4-22.
[16] R.J. McEliece, E.C. Posner, E.R. Rodemich and S.S. Venkatesh, The Capacity of the Hopfield Associative Memory, IEEE Transactions on Information Theory 33, Nr. 4 (1987) 461-482.
[17] C.R. Nelson, Applied Time Series Analysis for Managerial Forecasting (Holden-Day Inc., San Francisco, 1973).
[18] M.J. Norusis, SPSS/PC+ for the IBM PC/XT/AT (SPSS Inc., 1986).
[19] S.M. Pandit and S.M. Wu, Time Series and System Analysis with Applications (John Wiley and Sons, New York, 1983).
[20] Y.H. Pao, Adaptive Pattern Recognition and Neural Networks (Addison-Wesley, MA, 1988).
[21] D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning Internal Representations by Error Propagation, in: D.E. Rumelhart and J.L. McClelland, Eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. I: Foundations (MIT Press, 1986).
[22] G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics 6, Nr. 2 (1978) 461-464.
[23] T.J. Sejnowski and P.K. Kienker, Learning Symmetry Groups with Hidden Units: Beyond the Perceptron, Physica, 22D, (1986) 260-275.
[24] R.S. Tsay and G.C. Tiao, Consistent Estimates of Autoregressive Parameters and Extended Sample Autocorrelation Function for Stationary and Nonstationary ARMA Models, Journal of the American Statistical Association, 79, (1984) 84-96.
[25] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. Lang, Phoneme Recognition Using Time-Delay Neural Networks, IEEE Transactions on ASSP 37, Nr. 3 (1989) 328-339.
[26] P.J. Werbos, Generalization of Backpropagation with Application to a Recurrent Gas Market Model, Neural Networks 1, Nr. 4 (1988) 339-356.
[27] S.C. Wheelwright and S. Makridakis, Forecasting Methods for Management (John Wiley & Sons, New York, 1985).
[28] H. White, Economic Prediction Using Neural Networks: The Case of IBM Daily Stock Returns, IEEE 2nd International Joint Conference on Neural Networks, (1988) II451-II458.
[29] W.A. Woodward and H.L. Gray, On the Relationship between the S-Array and the Box-Jenkins Method of ARMA Model Identification, Journal of the American Statistical Association 76, (1981) 579-587.