On objective autoregressive model testing

On objective autoregressive model testing

Signal Processing 10 (1986) 185-191 North-Holland SHORT 185 COMMUNICATION ON OBJECTIVE AUTOREGRESSIVE MODEL TESTING* Torild VAN E C K (Member ...

375KB Sizes 2 Downloads 68 Views

Signal Processing 10 (1986) 185-191 North-Holland

SHORT

185

COMMUNICATION

ON OBJECTIVE

AUTOREGRESSIVE

MODEL

TESTING*

Torild VAN E C K (Member EURASIP) Seismological Department, Uppsala University, S-750 12 Uppsala, Sweden Received 12 May 1981 Revised 24 May 1985

Abstract. A number of objective autoregressive (AR) model order-decision functions are reviewed. Simulations and synthesis from available studies show a superior behaviour of the Bayes Information Criterion (BIC) and test statistics with variable levels of significance, like the generalized Akaike's Information Criterion (AICa).

Zusammenfassung. Eine Anzahl yon objektiven Entscheidungsfunktionen fiir die GrSsse von autoregressiven (AR) Modellen wird untersucht. Simulation und Synthese yon existierenden Studien zeigen ein besseres Verhalten des Bayes Informations Kriterions sowie der statistischen Teste mit variablen SignifikanzhiJhen, wie z.b. das generalisierte Akaike Informations Kriterion (AICo). Rrsum6. On compare plusieurs fonctions de drcisions objectives portant sur l'estimation de l'ordre d'un module autorrgressif (AR). Des simulations et des syntheses g partir d'&udes disponibles actuellement montrent que le crit~re d'information d'Akaike (AIC), la prrdiction de l'erreur finale (FPE) et le crit~re de fonction de transfert autorrgressive (CAT) donnent des rrsultats h peu prrs 6quivalents. Au contraire, le crit~re d'information de Bayes (BIC) leur est suprrieur quand les processus mesurrs durent assez Iongtemps. Keywords. Autoregressive modelling, order decision.

where

1. Some order-decision functions

P

Given a time series {x,}, where x, = x ( n A t ) , At is a fixed time interval, n = 0, 1 , . . . , N - 1, assume it can be modelled as an autoregressive (AR)" process,

X n -- Xn :

Xn

~ aixn-i i=l

and the estimated prediction error variance

~ 2 ( . ) = ~1 N2 1 eA2 n.

P

x. = ~, a i x . _ i + e .

en :

(1)

n=0

i=1

where {e.} is white noise. The AR- or prediction coefficients {a~}, i = 1, 2 , . . . , p, are estimated by mmlm~zmg N

1

^2

en, n=o

* This research was partly sponsored by the Swedish Natural Science Research Council under Contract No. G. 3164-104.

There exists a large variety of methods to estimate the AR coefficients; however, in practical applications, where p is often unknown, one special problem emerges, namely the choice of an optimal order of p. In this short communication, several special test statistics, which have recently been proposed in the literature, are reviewed and compared.

0165-1684/86/$3.50 O 1986, Elsevier Science Publishers B.V. (North-Holland)

T. van Eek / Objective A R model testing

186

Most test statistics are functions, f(p), which reach a minimum for a certain optimal order/3. That is, when f ( p + 1) becomes larger than f(p), additional AR coefficients are not significant any more. It can thus be shown that the optimal order /3 is reached when v(p)=

& 2 ( p ) - &2(p + 1) ~2(p+l )

ameters y and a. These relations are derived, assuming that all functions f(p) have only one minimum. More decision functions exist (see, e.g., [20]) and the restriction to the above functions is motivated by the efficient program implementation and intensive applications (see, e.g., [8, 9, 21]). Several features can be seen in Table 1: Firstly, all functions f ( p ) can be described as test statistics, which compare v(p) with a level y, where y ~ 1 / N for large N. Secondly, for large N, the FPE2, AIC2, and CAT rules are identical and correspond to an F-test with a = 2 (see also [19,20]), FPE~ and AICt3 criteria are identical to an F-test with a =/3, for large N. Consequently,/3 can be any positive value and not necessarily only an integer as proposed in [3, 4, 18]. Thirdly, only the functions BIC

(2)

is no longer significant and falls below a threshold y (see Fig. 1 and Table 1). ( N - p - 1 ) v ( p ) obeys a Fisher distribution F(1, N - p - 1 ) , with 1 and N-p-1 degrees of freedom. For each decision function we can find such a threshold y, alternatively a. In Table 1, we present several of the most common functions, f(p), and their relation to par-

1.0

\~

c~

• F-test

p:4

5Z

O F-test p:8 570

Ion

+ FPE2

p;4

x FPE 2 [3 CAT

p:8 p:4



BIC

AIC2 • HAQ

r~

0.1

x [3

-0.01 10 I

log N

10o I

~

",,,~

\

1000 I

Fig. 1. The threshold, y, vs. time sequence length, N, for a number of decision functions mentioned in Table 1. If v ( p ) = (6.2(/3) - 6.2(/3 + 1))/6-2(p + 1) falls in the upper right half of the plane, the hypothesis of dp+~(p + 1) ¢ 0 is true. I f v(p) falls in the lower left half of the plane, the hypothesis of ap+l(P + 1)= 0 is true. Signal Processing

T. van Eck / Objective AR model testing

187

Table I

Decision functions, f(p), which reach a minimum for the optimal order/3. The minimum off(p) occurs for/~ when f(/3+ 1)>f(/3) or, alternatively, when v(p) = (~2(p) _ ~2(/3 + 1))/~2(/3 + 1) falls below a threshold y. Comparison with the Fisher's test give different levels a f(p)

Y l+y

y t2

Fisher's test [14]

-

Generalized Final Prediction Error (FPEt~) [18,4]

N + (/3 + 1)p~z(p) a

Criterion Autogressive Transfer function (CAT) [5]

N-p

a ot

N-p-1

N-p-l+a

/3N (N+(/3-1)p)(N-p-1)

/3 (N+(fl-1)p)(N-p-1)+flN

fl S+(/3-1)p (for N>>p)

~, N - j

(N-p)N

j=l ~2(p)

t~2(p)

2N-p- 1 (N-1)(N-p-1)

2N-p- 1 N(N-p)

2N-pN-1

1 2

(forS>>p)

Generalized Akaike's Information Criterion (AIC~) [3]

ln~2(p)+flp a

eB/N_l~fl/N

~/3/N

~fl

Hannan and Quinn

In ~2(p) + p In In N

e(2/N) In In N _ 1

~ ( 2 / N ) In In N

~2 In In N

~(ln N)/N

~ln N

Criterion (HAQ) [7] Bayes Information Criterion (B1C) [16] a

~ (2/N) In In N In d'2(p) +p In N

eOnNI/N _ 1 ~ (In N)/N

fl /> 1, FPE 2 and AIC2 correspond to the original FPE [1] and AIC [2] definition.

and H A Q are consistent, that is, they estimate the correct order p with probability one, as N ~ oo (see also 112]). The estimated partial correlation or reflection coefficients, ap+l(P + 1), are related to the prediction error variance as [16]

8 ( p + l ) = d ' 2 ( p ) [ 1 - l ~ p + l ( p + l ) l 2]

Box and Jenkins [5] p r o p o s e d a rather crude statistic based on this assumption and K a s h y a p [12] apl~lied this assumption in proving the A I C rule to be consistent. The magnitude of the reflection coefficient is directly related to the distances of the poles from the unit circle, namely:

(3) p+l 1

ap+l(P+l) = I] --,

and, from (2),

v(p) ]~P+I(P + 1)]2 l+v(p)"

i= 1 ri

(4)

Thus, the decision functions summarized in Table 1 can alternatively be interpreted as statistics which test if 2 is less than y ( l + y ) -1. For large N, this is similar to j u d g i n g whether or not ap+l(P + 1) is significantly different from zero. We assume than that d p + ~ ( p + l ) is normally distributed with mean zero, when p + 1 is larger than the true model order (see [5]).

(5)

where ri are the radial distances for the zeros in the polynomial p+l

aizp+2 i+ 1, i--I

i.e., the poles in the A R spectrum. Consequently, if poles move away from the unit circle, ap+l(p + 1) becomes less significant and therefore more difficult to detect with the decision functions summarized in Table 1. Vol. 10, No, 2, March 1986

188

T. van Eck

/

Objective A R model testing

2. Simulations In order to examine and compare the behaviour of the above order-decision functions, for relatively small N, three different kinds of AR processes have been simulated and corresponding optimal orders,/J, have been estimated. We also compared the least-squares procedure as proposed by Marple [13] and the constrained technique proposed by Burg [6]. The innovation error has been generated with subroutine G G N O R from the IMSL library [10] and the first 200 values of the generated AR process have been neglected to avoid transient effects. For all experiments, 100 realizations with lengths N have been examined. The FPE2, AIC2, and C A T rules did not show significant differences in any of the performed experiments, indicating that their asymptotic equivalence, shown in Table 1, holds even for relatively small N (see also Fig. 1). Therefore, the lO0

o 0

~



o

a

4

+

50

* ÷

/

r

ooOOO°o

0

o

"

o• Q

MARPLE'S ALGORITHM B,c .t_

100

oOOOO~ • a Q •

discussed in [21], {a~} = {2.7607, -3.8106, 2.6535, -0.9238} for different lengths from N = 20 to N = 100 (see Fig. 2). The FPE2 rule does not improve its estimates of p for large N and is ineffective in combination with the Marple algorithm. The BIC rule or FPE~ and AIC~ rules with /3/>4 are superior. The above-mentioned process has poles near the unit circle, i.e., contains two pronounced frequency peaks at angular frequencies, 0.87 and 0.70 radians. By moving the poles outward and keeping the arguments constant, we decrease a4(4) (see equation (5)). The behaviour of the BIC rule for N = 100 and for increasing r~ = r2 = r is plotted in Fig. 3. Already for r approximately equal to 1.55, the preferred order clearly converges to a value of # = 3. In other words, the two frequency peaks become hardly distinguishable.

b

++ +÷

5O

+

BURGS ALGORITHM 50 N 100





MARPLE'S ALGORITHM 50 N 100

Fig. 2. Frequency of #, from four order-decision functions for each group of 100 realizations with different lengths N of a fourth-order AR process, {ai}={2.7607, -3.8106, 2.6535, -0.9238}. (a) © indicates the number offf = 4 found with FPE4, • with BIC, and + with FPE2, using Burg's algorithm. (b) O indicates the number of/~ = 4 found with AIC6 and • with BIC, using MarNe's algorithm. behaviour description of one of them implicitly includes the other two. In all experiments, we use the FPE~ definition as described by Shibata [18]. The approximation of Bhansali and D o w n h a m [4] showed significant unfavourable differences, when used for relatively small values of N as in the experiments below. Experiment 2.1. All decision functions from Table 1 have been applied to a fourth-order AR process Signal Processing

0

5

~

10

Fig. 3. Frequency of # from BIC for eight different groups of 100 realizations of a fourth-order AR process with length N = 100. For each group, the poles {zI = r e 0"7i, zl* = r e -0"7i, z 2 = r e ° ' s 7 i and z* = r e - ° 8 7 i } have different radii r. Experiment 2.2. The decision functions from Table 1 were applied to a tenth-order AR process for three different lengths, N = 100, N = 600, and N = 1800. The radius ri and angular frequencies toi of the poles are respectively: {(r, to~)} = {(1.63, 0), (1.14, 0.21), (1.24, 0.87), (1.33, 1.28), (1.16, 1.87), (1.48, -tr), (1.16,-1.87), (1.33,-1.28), (1.24,-0.87), (1.14,-0.21)}.

189

T. van Eck / Objective AR model testing

In Fig. 4, the results are shown for the CAT and BIC rules using the Burg and Marple algorithms. The FPE2 rule did not detect any minimum using the second algorithm. Also, here the FPEo and AIC0 rules with/3/> 4 were required for acceptable results.

100

a

BURGS ALGORITHM CAT

and all other AR coefficients equal to zero. This model has been discussed in several previous publications (e.g., [11, 3]). The results (see Fig. 5) are comparable with our second experiment, that is, FPE~ and AIC~ with/3 I> 4 are effective when using Marple's algorithm. As in the other experiments, the HAQ rule provides rather unstable estimates of /~. Convergence for the BIC rule is relatively fast for both algorithms.

100

BURGS ALGORITHM CAT

50 Ik N = 6 0 0 f

\,

/

100 z 0

b

,

,

~5o

tjN_60_0

50

BURGS ALGORITHM BIC

0

a

N =1800

/

IO0 BURGS ALGORITHM

z

It.

0

O O z

b

t

iB I C ,

,", N=600

rl

I;

0 ,~,

100

MARPLES ALGORITHM BIC t



C

u_ © © z

',

I/

,

I?1 1oo

tl

jl

F

"i N =18o0

100

1 / -

A. 6oo/!

50 .,

N:~O_Q /

\~.~

,

MARPLES

1[

ALGORITHM BIC

I/ I IjN =600

C

50 v

~

5

,

10

.',~ o

I,b..lOO

15 >15

Fig. 4. Frequency of/~ from two order-decision functions for three different groups of 100 realizations of a tenth-order AR process (see Experiment 2.2). Values N = 100, 600, and 1800 have been used. In (a) C A T and (b) BIC calculated with Burg's algorithm, and in (c) BIC calculated with Marple's algorithm.

Experiment 2.3. The same decision functions as in the other experiments were applied to a fifteenthorder AR process with al = 0.5, a2 = 0.06, a15 = 0.45

v v - v

5

v

10

I

15

O

20

725

Fig. 5. The frequency of/~ from two order-decision functions for two groups of 100 realizations, with respective lengths of N = 100 and N = 600, for a special fifteen-order AR process (see Experiment 2.3). In (a) C A T and (b) BIC calculated with Burg's algorithm, and in (c) BIC calculated with Marple's algorithm. Vol. 10, No. 2, March 1986

190

T. van Eck / Objective A R model testing

3. Discussion

Acknowledgment

Although the Monte Carlo approach used above, does not justify general conclusions, several trends can be pointed out, especially by comparing the decision functions as in Table 1 and available studies on the subject. The FPE2, AIC2, and CAT rules are ineffective when applying Marple's algorithm (see also [13]); however, FPE0, AIC0 with higher significance levels are applicable. The BIC rule satisfies this requirement (/3 = In N > 2 for N > 8) and is also optimal [16]. Kashyap [12] proves that the AIC~ rule is not optimal, that is, for larger N no better ,5 can be found for a fixed /3. See especially Fig. 2, where FPE2 (and therewith AIC2) stabilizes around 50% for increasing N. Shibata [ 17] showed that, for large N, the ,5 from AIC2 is X 2 distributed; because of their asymptotic equivalence, the same can be said for the FPE2 and CAT rules. The HAQ rule, though consistent according to Kashyap's formulation [12], proved to be inferior to the BIC rule in all experiments. The comparison made in the previous sections is based on the assumption that the decision functions only have one minimum. In the experiments, therefore, the minimum of the decision functions was chosen among all possible orders of p, 1 ~

100. However, the decision functions do not always show only one minimum and some authors [21] proposed, for example, to use the first minimum in the FPE2 function as a criterion. This may prevent the imminent risk of overfitting. Applying higher levels, c~ or /3, will have the same effect. A variety of order-decision functions exists, most of them (mentioned in Table 1) are easy to implement and useful aids in AR and even ARMA modelling and spectral analysis as used in many applications [8, 9]. From the above considerations and experiments, the BIC rule and rules with variable significance levels (for example, FPE~ and AIC~) are to be preferred.

The author wishes to thank T. S6derstr6m and O. Kulh~inek for their critical reading of the manuscript.

Signal Processing

References [1] H. Akaike, "Fitting autoregressive models for prediction", Ann. Inst. Statist. Math., Vol. 21, 1969, pp. 243-247. [2] H. Akaike, "A new look at the statistical model identification", IEEE Trans. on Automatic Control, Vol. AC-19, No. 6, December 1974, pp. 716-723. [3] H. Akaike, "A Bayesian extension of the minimum AIC procedure of autoregressive model fitting", Biometrika, Vol. 66, No. 2, 1979, pp. 237-242. [4] R.J. Bhansali and D.Y. Downham, "Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion", Biometrika, Vol. 64, No. 3, 1977, pp. 547-551. [5] G.E.P. Box and G.M. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA, 1976. [6] J.P. Burg, "'A new analysing technique for time series data", Presented at: Advanced Studies Inst. Signal Processing, NATO, Enschede, The Netherlands, 1968. [7] E.J. Hannan and B.G. Quinn, "The determination of the order of an autoregression", J.R. Statist. Soc. B, Vol. 41, No. 2, 1979, pp. 190-195. [8] S. Haykin, ed., Nonlinear Methods of Spectral Analysis, Springer, Berlin, 1979. [9] K.W. Hippel, "Geophysical model determination using the Akaike Information Criterion", IEEE Trans. on Automatic Control, Vol. AC-26, No. 2, April 1981, pp. 358-378. [10] IMSL, International Mathematical and Statistical Library, edition 8, IMSL Houston, 1980. [11] R.H. Jones, "Fitting autoregressions", J. Amer. Statist. Assoc., Vol. 70, No. 352, September 1975, pp. 590-592. [12] R.L. Kashyap, "Inconsistency of the AIC rule for estimating the order of autoregressive models", IEEE Trans. on Automatic Control, Vol. AC-25, No. 5, October 1980, pp. 996-998. [13] L. Marple, "A new autoregressive spectrum analysis algorithm", IEEE Trans. on Acoust. Speech & Signal Processing, VoL ASSP-28, No. 4, August 1980, pp. 441-454. [14] B.R. Martin, Statistics for Physicists, Academic Press, London, 1971. [15] E. Parzen, "Multiple time series: Determinating the order of approximating autoregressive schemes", in: P.R. Krishnaiah, ed., Multivariate Analysis IV, North-Holland, Amsterdam, 1977, pp. 283-295. [16] G. Schwartz, "Estimating the dimension of a model", The Ann. Statist., Vol. 6, No. 2, 1978, pp. 461-464.

T. van Eck / Objective A R model testing [17] R. Shibata, "Selection of the order of an autoregressive model by Akaike's Information Criterion", Biometrika, Vol. 63, No. 1, 1976, pp. 117-126. [18] R. Shibata, "Asymptotic mean efficiency of a selection of regression variables", Ann. Inst. Statist. Math., Vol. 35, Part A, 1983, pp. 415-423. [19] P. Stoica, "'Comments on 'A Bayesian comparison of different classes of dynamic model using empirical data' ", IEEE Trans. on Automatic Control, Vol. AC-24, No. 3, June 1979, pp. 516-517.

191

[20] T. Srderstrrm, "On testing of model structure in system identification", Internat. J. o f Control, Vol. 26, No. 1, 1977, pp. 1-18. [21] T.J. Ulrych and T.N. Bishop, "Maximum entropy spectral analysis and autoregressive decomposition", Reviews Geophysics Space Physics, Vol. 13, No. 1, February 1975, pp. 183-200.

Vol. 10, No. 2, March 1986