Copyright © IFAC Identification and System Parameter Estimation, Budapest, Hungary 1991
RELATIONS OF INFORMATION CRITERIA FOR STRUCTURE SELECTION OF DYNAMIC SYSTEMS
S. M. Veres
School of Electronic and Electrical Engineering, University of Birmingham, Edgbaston, B15 2TT, UK
Abstract. The properties of the AIC, BIC, φ and stochastic complexity criteria are compared for a finite number of samples by repeated simulations on ARMA processes. It turns out that there are several influential factors: which parameter estimation method and criterion are used, what the distribution of the source white noise process is, and the number of samples. The measurement of performance of information criteria is not unique either: it can be based on the probabilities of selecting a model order, on the accuracy of the attached spectral estimates, on power functions from a test-theoretical point of view, etc. The present study is limited to the probabilities of selecting different model orders. These comparisons may help to get a better picture of the usefulness and limitations of information criteria and to choose a criterion for a particular application.
There are many points where the asymptotic analytic study cannot reflect the behaviour of a numerical procedure based on a finite number of samples. The asymptotic analysis gives an idealized view of the methods because it concentrates on consistency. In a certain sense, for a method to be a candidate for applications it should satisfy the minimal requirement of consistency, at least under fairly idealized conditions where the noise is nicely distributed and no outliers are considered.
INTRODUCTION

Today information criteria are used in many identification problems. Having a more accurate picture of their usefulness is important for further applications. The asymptotic properties have been widely studied. The first proposals of Akaike (1972) on the final prediction error criterion (FPE) and on the A-information criterion (AIC) were followed by suggestions of the consistent BIC (Akaike, 1977; Schwarz, 1978; Hannan, 1978), and asymptotic properties have been studied by Shibata (1976) and Hannan (1980), with results summarized in Hannan & Deistler (1988) and Veres (1991). Most of the results deal with consistency as the number of samples increases to infinity. In this paper we initiate the study of the properties of information criteria for a small number of samples and the associated parameter estimators. Asymptotic relations between information criteria have been treated elsewhere, e.g. Veres (1990a, 1990b, 1990c).

To describe the problem consider scalar autoregressive moving average (ARMA) processes with output sequence $\{y_t\}$, satisfying the equation
To visualize the variety of available methods for finite samples, a simple table can be constructed. One of the variables is the information criterion, the other variable is the parameter estimation method applied. The estimation procedures have been dealt with in the literature (Davis & Vinter, 1985; Norton, 1986; Ljung, 1987; Hannan & Deistler, 1988; Söderström & Stoica, 1989). Table 1 gives an overview of some of the possible pairings of estimation methods and criteria for model structure estimation. In Table 1 three signs are used to indicate whether a pairing approaches the best possible performance (+) (from among all the considered ones), is inferior to some other method (-), or can give quite misleading results for small samples (?).

In Section 2 the "heavy tailed" infinite order AR processes are first studied by simulation. Section 3 is about the efficiency of order selection for finite order AR processes: whether the AIC or BIC should be used for order selection. Section 4 studies the performance of estimation procedures and information criteria in the case of MA and ARMA processes. The simulations were usually repeated 50 times independently and the relative frequencies of order estimates are put in tabular form. Table 2 recalls the accuracy of the relative-frequency estimation of probabilities. Values $c$ are shown satisfying $P(|\text{rel. freq.} - p| \le c) = \text{conf. probab.}$ Table 2 is an important reference in judging the results of the simulations presented in Sections 2-4.

Whenever it was required in a simulation, the initial covariance estimate was taken as $P = \mathrm{Diag}\{10\}$, the initial parameter vector was 0, and a forgetting factor of 1 was used. For MA operators stability checks were carried out at every time instant, and if instability occurred the new parameter estimate $\hat\theta_t$ was replaced by the earliest stable $\hat\theta_t^{new} = \hat\theta_{t-1} + 0.5^n(\hat\theta_t - \hat\theta_{t-1})$, $n = 1, 2, \ldots$ The source noise was generated by $e_t = \eta_t^1 + \ldots$ [...], where all $\eta_t^i$ were generated by a standard NAG pseudo-random number generator producing i.i.d. variables in [0,1] (using the recursive formula $r_n = 16807\, r_{n-1} \pmod{2^{31}-1}$).
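The multiplicative congruential recursion quoted above can be sketched in a few lines. This is only an illustration of the formula $r_n = 16807\, r_{n-1} \pmod{2^{31}-1}$, not the NAG routine itself; the function names `lehmer_integers` and `lehmer_uniforms` and the choice of seed are assumptions made for the example.

```python
# Minimal sketch of the recursion r_n = 16807 * r_{n-1} mod (2^31 - 1).
# Function names and the seed are hypothetical; the NAG implementation
# used in the paper is not reproduced here.

MODULUS = 2**31 - 1
MULTIPLIER = 16807


def lehmer_integers(seed, count):
    """Return `count` successive integer states of the recursion."""
    state = seed
    states = []
    for _ in range(count):
        state = (MULTIPLIER * state) % MODULUS
        states.append(state)
    return states


def lehmer_uniforms(seed, count):
    """Scale the integer states to pseudo-random numbers in (0, 1)."""
    return [r / MODULUS for r in lehmer_integers(seed, count)]


if __name__ == "__main__":
    print(lehmer_uniforms(1, 5))
```

Scaling by the modulus keeps every value strictly between 0 and 1, since the state never reaches 0 or the modulus for a seed in that range.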
where $\{e_t\}$ is a noise sequence. In this paper let us define the information criterion with penalty term by

$$\mathrm{IC}(y^T, \hat\theta, k) = \log V_T(y^T, \hat\theta_T, k) + \frac{\alpha(T)}{T}\,k, \qquad (2)$$

where $V_T(y^T, \hat\theta_T, k) = \frac{1}{T}\sum_{t=1}^{T} e_t^2(\hat\theta_T)$, and $e_t(\hat\theta_T)$, $t = 1, \ldots, T$, are estimates of the innovations based on a parameter estimate $\hat\theta_T$ dependent on the estimation method applied. Since an estimate $\hat\theta_T$ is used which was obtained for the whole sequence of samples, the information criteria with penalty factor might be called off-line criteria. For AIC put $\alpha(T) = 2$, for BIC put $\alpha(T) = \log T$, for φ put $\alpha(T) = \log\log T$. The predictive least squares (PLS) criterion is calculated on the basis of on-line estimates:

$$\mathrm{PLS}(y^T, \hat\theta, k) = \sum_{t=1}^{T} e_t^2(\hat\theta_{t-1}),$$

where at time instant $t$ the calculation of the prediction error $e_t(\hat\theta_{t-1})$ is based on samples up to time $t-1$. PLS was probably first introduced by Rissanen & Wertz (1985), and stochastic complexity was introduced and analysed in Rissanen (1986) and Rissanen (1989). PLS can be called an on-line criterion because of its way of calculation.
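The distinction between the off-line criteria and the on-line PLS criterion can be made concrete with a small sketch for pure AR models. It assumes the log-residual-variance form of the criterion with penalty $\alpha(T)k/T$, recursive least squares started from $P = 10\,I$ and a zero parameter vector, and a common starting index for all orders; the function names (`rls_ar`, `offline_criteria`, `select_orders`) are hypothetical and the code is not taken from the paper.

```python
import math
import random


def rls_ar(y, k, start, p0=10.0):
    """Recursive least squares for y_t + a_1 y_{t-1} + ... + a_k y_{t-k} = e_t.

    Returns the final estimate and the PLS value, i.e. the sum of squared
    one-step prediction errors computed with the estimate from time t-1.
    """
    theta = [0.0] * k
    P = [[p0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    pls = 0.0
    for t in range(start, len(y)):
        phi = [-y[t - 1 - i] for i in range(k)]            # regressor
        err = y[t] - sum(p * th for p, th in zip(phi, theta))
        pls += err * err                                    # uses theta_{t-1}
        Pphi = [sum(P[i][j] * phi[j] for j in range(k)) for i in range(k)]
        denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(k))
        theta = [theta[i] + Pphi[i] * err / denom for i in range(k)]
        P = [[P[i][j] - Pphi[i] * Pphi[j] / denom for j in range(k)]
             for i in range(k)]
    return theta, pls


def offline_criteria(y, k, theta, start):
    """AIC, BIC and phi values computed from residuals of the final estimate."""
    T = len(y) - start
    resid = [y[t] + sum(theta[i] * y[t - 1 - i] for i in range(k))
             for t in range(start, len(y))]
    V = sum(e * e for e in resid) / T
    penalties = {"AIC": 2.0, "BIC": math.log(T), "phi": math.log(math.log(T))}
    return {name: math.log(V) + alpha * k / T for name, alpha in penalties.items()}


def select_orders(y, max_order):
    """Order minimizing each criterion over k = 0..max_order."""
    table = {}
    for k in range(max_order + 1):
        theta, pls = rls_ar(y, k, start=max_order)
        values = offline_criteria(y, k, theta, start=max_order)
        values["PLS"] = pls
        table[k] = values
    return {name: min(table, key=lambda k: table[k][name])
            for name in ("AIC", "BIC", "phi", "PLS")}


if __name__ == "__main__":
    rng = random.Random(0)
    y = [0.0]
    for _ in range(400):
        y.append(0.9 * y[-1] + rng.gauss(0.0, 1.0))   # AR(1) with a_1 = -0.9
    print(select_orders(y[100:], max_order=4))
```

The off-line criteria reuse one final estimate for all residuals, whereas the PLS sum is accumulated inside the recursion before each update, which is what makes it computable on-line.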
TABLE 1. Possible pairings of information criteria and estimation methods. Notation: + = good, - = inefficient, ? = wrong performance. Rows (estimation methods): recursive LS for AR; recursive ELS; on-line two-stage; recursive ML; on-line three-stage; off-line ML; off-line two-stage; off-line three-stage (Hannan); off-line three-stage (Mayne); off-line three-stage (combined). Columns (information criteria): off-line BIC, off-line AIC, on-line PLS. [Table entries are not recoverable from the source.]

It can be clearly seen in Table 1.1 that only the AIC and φ criteria hint at a longer autoregression; the PLS and BIC are quite misleading because they select a small number of parameters.

TABLE 1.1. Relative frequencies of AR-order estimates for (3) using different criteria and the RLS method: a) using AIC, b) using PLS criterion, c) using BIC, d) using φ criterion. [Table entries are not recoverable from the source.]
For brevity we shall use the following notations: RLS = recursive least squares, RELS = recursive extended least squares, ONTWS = on-line two-stage, ONTS = on-line three-stage, RML = recursive maximum likelihood, OFML = off-line maximum likelihood, OFTWS = off-line two-stage, OFTS = off-line three-stage, PLS = predictive least squares or, in other terms, stochastic complexity criterion.

In Table 1.2 we can observe that, in spite of the fact that the RELS method is not efficient, the "off-line computed" BIC gives the best results. The AIC and PLS seem to give initially more ch[...]
The simulations were carried out using the demo disk attached to the monograph Veres (1991), which provides an easy-to-use and quick environment for studying information criteria.
TABLE 1.2. Relative frequencies of order estimates for (3) using different criteria and the RELS method. [Table entries are not recoverable from the source.]

TABLE 2. Standard deviations and confidence radii for probability estimators by relative frequency (rounded values).

Num. Rep.   p     Std. dev.   0.5 Conf.   0.95 Conf.   0.99 Conf.
50          0.1   0.04        0.03        0.07         0.11
50          0.5   0.07        0.06        0.12         0.18
200         0.1   0.02        0.02        0.04         0.05
200         0.5   0.04        0.03        0.06         0.09
1000        0.1   0.01        0.01        0.02         0.02
1000        0.5   0.02        0.01        0.03         0.04
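The accuracy figures of Table 2 can be approximated with a short calculation. The sketch below assumes the usual binomial standard deviation $\sqrt{p(1-p)/N}$ of a relative frequency and a normal approximation for the confidence radius $c$ in $P(|\text{rel. freq.} - p| \le c) = \text{conf.}$; how the paper's rounded values were actually computed is not stated, so small differences from the tabulated entries are to be expected. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def std_dev(p, repetitions):
    """Standard deviation of a relative frequency from independent repetitions."""
    return math.sqrt(p * (1.0 - p) / repetitions)


def confidence_radius(p, repetitions, confidence):
    """Normal-approximation radius c with P(|rel. freq. - p| <= c) ~ confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * std_dev(p, repetitions)


if __name__ == "__main__":
    for n in (50, 200, 1000):
        for p in (0.1, 0.5):
            radii = [confidence_radius(p, n, c) for c in (0.5, 0.95, 0.99)]
            print(n, p, round(std_dev(p, n), 3), [round(r, 3) for r in radii])
```

With 50 repetitions a tabulated relative frequency near 0.5 is therefore only accurate to roughly ±0.14 at the 0.95 level under this approximation, which is worth keeping in mind when comparing neighbouring table entries.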
Using the on-line two-stage method in Table 1.3, both the AIC and BIC overparametrize in the sense of an "inclusive" overparametrization, i.e. the "true" model is contained in the estimated structure. The worst result was obtained with the PLS, where the relative frequency of the completely wrong (2,0) order was .42 for 100 samples and .16 for 200 samples.
STRUCTURE SELECTION FOR HEAVY TAILED INFINITE ORDER AR PROCESSES

The simulations in this section have the main feature that the generated process has an infinite AR representation with large coefficients up to a high order.

Simulation 1. Initial values: $y_{-100} = 0$. Recursions: $y_t + 0.8\,y_{t-1} = [\ldots]\ 0.9\,\varepsilon_{t-1}$, $t = -99, \ldots, 500$ (3), with $\{\varepsilon_t,\ t = -100, \ldots, 200\}$ generated by the standard NAG generator; for estimation $y_1, \ldots, y_{100}$ is used.
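To see what an "infinite AR representation" of an ARMA(1,1) process looks like, the coefficients $\pi_j$ in $\sum_{j \ge 0} \pi_j y_{t-j} = \varepsilon_t$ can be computed by long division of the AR polynomial by the MA polynomial. The sketch below assumes the general form $y_t + a_1 y_{t-1} = \varepsilon_t + b_1 \varepsilon_{t-1}$; since the sign of the moving-average term of the simulated process is not legible in the source, both signs are shown in the usage example. The function name is hypothetical.

```python
def ar_infinity_coefficients(a1, b1, n_terms):
    """Coefficients pi_0..pi_{n_terms} of (1 + a1 z) / (1 + b1 z).

    They satisfy pi_j + b1 * pi_{j-1} = a_j, with a_1 = a1 and a_j = 0 for j > 1,
    so that sum_j pi_j * y_{t-j} = e_t for the ARMA(1,1) process
    y_t + a1 * y_{t-1} = e_t + b1 * e_{t-1} (invertible when |b1| < 1).
    """
    pi = [1.0]
    for j in range(1, n_terms + 1):
        a_j = a1 if j == 1 else 0.0
        pi.append(a_j - b1 * pi[j - 1])
    return pi


if __name__ == "__main__":
    # Assumed sign conventions for illustration only.
    print(ar_infinity_coefficients(0.8, -0.9, 10))  # zero at +0.9
    print(ar_infinity_coefficients(0.8, 0.9, 10))   # zero at -0.9
```

In both cases the coefficients decay only like $0.9^j$, so any finite AR order truncates a slowly vanishing tail; with $b_1 = -0.9$ the leading coefficients are also large in magnitude, while with $b_1 = +0.9$ they are small because the pole and zero nearly cancel.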
TABLE 1.3. Relative frequencies of order estimates for (3) using different criteria and the ONTWS method. [Table entries are not recoverable from the source.]

TABLE 1.4. Relative frequencies of order estimates for (3) using different criteria and the RML method. [Table entries are not recoverable from the source.]

TABLE 1.6. Relative frequencies of order estimates for (3) using different criteria and the OFTWS method. [Table entries are not recoverable from the source.]

TABLE 1.7. Relative frequencies of order estimates for (3) using different criteria and the OFTS method. [Table entries are not recoverable from the source.]
Much better performance is shown in Table 1.4 in connection with the RML method. Although RML is an on-line method, the off-line computed BIC resulted in the highest relative frequency of correct order estimation.
The comparison of the information criteria for the off-line three-stage (OFTS) method in Table 1.7 showed that, in the above repeated simulation, the most successful criterion was not the PLS computed from the third stage but the BIC. For judging the results of the simulations an important characteristic is the relative frequency of estimating an unfeasible model structure which does not "include" the generated model. In Table 1.7 a) the PLS gave a relatively high chance to the (2,0) order, which is inconsistent with the generated (1,1) order. The results of the combined three-stage method with PLS criterion were better in Table 1.8 than in the case of the simple three-stage method. The AIC and BIC have shown similar results to those obtained for the pure off-line three-stage method.
TABLE 1.5. Relative frequencies of order estimates for (3) using different criteria and the ONTS method. [Table entries are not recoverable from the source.]

The results for the on-line three-stage method of Mayne, Åström & Clark (1984) paired with the PLS criterion in Table 1.5 are slightly better than those of the on-line two-stage method, but much worse than the results obtained by the RML & PLS pairing.

TABLE 1.8. Relative frequencies of order estimates for (3) using different criteria and the OFTS method with combined second and third stages. [Table entries are not recoverable from the source.]

Comparing the different information criteria for the off-line two-stage method in Table 1.6, it can clearly be seen that the BIC had by far the best results, and it is worth pointing out that the PLS did not give better results than the AIC.
TABLE 1.9. Relative frequencies of order estimates for (3) using different criteria and the OFML method. [Table entries are not recoverable from the source.]

STRUCTURE SELECTION OF FINITE ORDER AR PROCESSES

In this section we study what happens if the observed process is actually a finite order AR process or close to it and the models are searched for among ARMA models. We are also interested in the effect of the pole location of the generated process.

Simulation 2. Initial values: $y_{-100} = 0$, $y_{-99} = 0$. Recursions: $y_t + a_1 y_{t-1} + a_2 y_{t-2} = \varepsilon_t$, $t = -99, \ldots, 500$ (4), with $\{\varepsilon_t,\ t = -100, \ldots, 500\}$ generated by the standard NAG generator; the estimation is based on $y_1, \ldots, y_T$.

TABLE 2.1. Relative frequencies of AR-order estimates for (4) with $a_1 = 0.7$ and $a_2 = 0.5$. [Table entries are not recoverable from the source.]
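A repeated-simulation experiment of the kind tabulated in this section can be sketched as follows. The example simulates the AR(2) process (4) with $a_1 = 0.7$, $a_2 = 0.5$, selects the AR order by AIC and BIC, and reports relative frequencies over independent repetitions. Several choices here are assumptions rather than the paper's set-up: Gaussian noise from Python's `random` module instead of the NAG generator, Yule-Walker estimates via the Levinson-Durbin recursion instead of recursive least squares, and the log-variance form of the criteria. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def simulate_ar2(a1, a2, n, rng, burn_in=100):
    """Simulate y_t + a1*y_{t-1} + a2*y_{t-2} = e_t and return the last n samples."""
    y = [0.0, 0.0]  # zero initial values
    for _ in range(burn_in + n):
        y.append(-a1 * y[-1] - a2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[-n:]


def innovation_variances(y, max_order):
    """Levinson-Durbin recursion: innovation variance of AR(k) for k = 0..max_order."""
    n = len(y)
    mean = sum(y) / n
    c = [sum((y[t] - mean) * (y[t - h] - mean) for t in range(h, n)) / n
         for h in range(max_order + 1)]
    variances, coeffs = [c[0]], []
    for k in range(1, max_order + 1):
        kappa = (c[k] - sum(coeffs[j] * c[k - 1 - j] for j in range(k - 1))) / variances[-1]
        coeffs = [coeffs[j] - kappa * coeffs[k - 2 - j] for j in range(k - 1)] + [kappa]
        variances.append(variances[-1] * (1.0 - kappa * kappa))
    return variances


def select_orders(y, max_order=5):
    """AR order minimizing log(variance) + alpha*k/n for AIC and BIC."""
    n = len(y)
    variances = innovation_variances(y, max_order)
    penalties = {"AIC": 2.0, "BIC": math.log(n)}
    return {name: min(range(max_order + 1),
                      key=lambda k: math.log(variances[k]) + alpha * k / n)
            for name, alpha in penalties.items()}


def relative_frequencies(a1, a2, n, repetitions=50, seed=0):
    """Relative frequency of each selected order over independent repetitions."""
    rng = random.Random(seed)
    counts = {"AIC": Counter(), "BIC": Counter()}
    for _ in range(repetitions):
        for name, k in select_orders(simulate_ar2(a1, a2, n, rng)).items():
            counts[name][k] += 1
    return {name: {k: v / repetitions for k, v in sorted(c.items())}
            for name, c in counts.items()}


if __name__ == "__main__":
    for n in (25, 100, 500):
        print(n, relative_frequencies(0.7, 0.5, n))
```

Because the BIC penalty exceeds the AIC penalty whenever $\log T > 2$, the order chosen by BIC on a given realization can never exceed the one chosen by AIC, which is the mechanism behind the over- and underestimation patterns discussed in the text.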
Tables 2.1-2.3 show the results of AR order estimation using the PLS, AIC, BIC and φ criteria. It is seen that the PLS and AIC were likely to overestimate the correct AR order while the BIC had the highest relative frequencies. This seems to contradict the suggestion made for AR order estimation. Both criteria are strongly consistent; therefore the correct answer is that, for a small number of samples, the BIC should be used instead of φ only if (i) for some reason it is very likely that the correct model is a finite order autoregression (e.g. because of lack of long memory of the system, or prior information), and (ii) the small number of samples is important in the modelling (for technical reasons of implementation, speed requirements on the computations attached to the model, etc.).

Finally, in Table 1.9 the results for the off-line maximum likelihood (OFML) method are not significantly better than those obtained by RML.
The above results show that, disregarding the possible inaccuracies of relative frequencies, which can be evaluated on the basis of Table 2, the performance of the methods for the generated process (3) gives the following picture.

The highest relative frequencies for the correct order were obtained by pairing the OFML, RML, OFTS and OFTWS methods with the off-line computed BIC criterion (Tables 1.9, 1.4, 1.7, 1.8, 1.6). Concerning the purely on-line methods, the on-line three-stage method with PLS (Table 1.5) was better than the on-line two-stage method with PLS (Table 1.3), but both were much worse than the pairing of RML and PLS (Table 1.4).
The difference between the OFML (Table 1.9) and RML (Table 1.4) was that the OFML was slightly better for small samples, while there was no significant difference between OFML and RML over 75 samples. This is a clear advantage for RML, because of the much faster on-line computations involved. The RML worked well both when paired with PLS and with the off-line criteria BIC and φ.
Finally, it can be seen from Table 1.1 that for AR order estimation the φ criterion (which is asymptotically strongly consistent) is preferable to BIC, since the BIC is more likely to underestimate the order for small samples. It is interesting to see in Table 1.1 that the dominant estimated model orders are gradually increasing with the number of samples, which is not an obvious consequence of available asymptotic results.
Note that the above conclusions refer to the specific process studied; to make a safer judgement about the finite sample behaviour of the methods, further simulation study is needed.

In Table 2.2 the relative frequencies of order estimates are shown for another generated AR(2) process which was "closer" to AR(1) models. Since in general underestimation of the model might be more inefficient than overparameterization in a practical implementation, a criterion of judgement may be how frequently the criterion does not underestimate. It is interesting to compare AIC and BIC in Table 2.2: up to 200 samples the AIC underestimated the structure less frequently than BIC; over 500 samples, however, both had little underestimation, and while the BIC hardly had any overestimation, the AIC had a significant chance to overestimate. This is due to the fact that, with positive probability, the AIC does not choose the correct model order asymptotically, as was proved by Shibata (1976). Note that, in consistency with Shibata's theorem, for 3000 samples the relative frequencies of model selection by AIC were closely equal in Table 2.1 and in Table 2.2 in spite of the fact that the generated processes were much different.
Table 2.4 contains the results for the same process as Table 2.3 but with the off-line maximum likelihood (OFML) estimator. The important observation is that although the nonlinear optimization of the conditional loglikelihood was carried out with high precision, there is no significant improvement in Table 2.4 with respect to the results in Table 2.3. Taking into account also the statistical errors of probability estimators by relative frequencies, Tables 2.3 and 2.4 are very close to each other.
TABLE 2.2. Relative frequencies of AR-order estimates for (4) with $a_1 = 1.05$ and $a_2 = 0.2$. [Table entries are not recoverable from the source.]

TABLE 2.4. Relative frequencies of order estimates for (4) with $a_1 = 1.05$ and $a_2 = 0.2$ by the OFML method. [Table entries are not recoverable from the source.]
STRUCTURE SELECTION OF MA AND ARMA PROCESSES

In this section the finite sample performance of the methods is shown for different variations of pole-zero locations of scalar ARMA and MA processes.

Simulation 3. Initial values: $y_{-100} = 0$. Recursions: $y_t + a_1 y_{t-1} = \varepsilon_t + b_1 \varepsilon_{t-1}$, $t = -99, \ldots, 500$, with $\{\varepsilon_t,\ t = -100, \ldots, 500\}$ generated by the standard NAG generator.

TABLE 3.1. Relative frequencies of order estimates for Simulation 3 with $a_1 = 0$ and $b_1 = 0.1$ using the ELS method. [Table entries are not recoverable from the source.]

In Tables 3.1-3.3 the ELS, RML and ONTWS methods are compared for the detection of a small moving average part in the generated process. The results do not differ significantly. The BIC tends to underestimate while the AIC and φ tend to overestimate. It is worth noticing that the φ detected the small moving average part with 0.5 relative frequency for 200 samples, while the BIC and PLS had only half that chance to detect it.
For several other simulations where the estimated process is a pure autoregression see Veres (1991), Chapter 5, Section 3.
TABLE 2.3. Relative frequencies of AR-order estimates for (4) with $a_1 = 1.05$ and $a_2 = 0.2$ by the RML method. [Table entries are not recoverable from the source.]

In Table 2.3 the model estimation was carried out for the same process as in Table 2.2 using the RML method, but the range of models over which the information criterion was computed was a set of ARMA models including also the AR models up to the third order. In the case of the AIC and φ the competitive models were (1,0), (1,1) and (3,0), while for the BIC and PLS the only significant competitive models were (1,0) and (1,1). The fact that the model (1,1) had a high relative frequency for all criteria up to 200 samples seems surprising. Another important observation about Table 2.3 is that for small samples (up to 200) underestimation was in general more likely than order overestimation for all information criteria.
CONCLUSIONS

Several examples have been shown of the performance of information criteria. In these examples the application of OFML was not better than the application of RML. Information criteria associated with ELS were slightly worse than those with RML. Two- and three-stage methods did not result in any improvement with respect to RML, although both the on-line two- and three-stage methods performed well, but not so the off-line two- and three-stage methods. It was observed that the estimated AR order increases with the number of samples if the sampled process is an infinite order autoregression.
TABLE 3.2. Relative frequencies of order estimates for Simulation 3 with $a_1 = 0$ and $b_1 = 0.1$ using the RML method. [Table entries are not recoverable from the source.]

TABLE 3.3. Relative frequencies of order estimates for Simulation 3 with $a_1 = 0$ and $b_1 = 0.1$ using the ONTWS method. [Table entries are not recoverable from the source.]
To give a correct picture of the results, we next show a "less successful" order estimation experiment. In the transfer function of the generated model the pole and zero were relatively close to each other, as $a_1 = 0.5$, $b_1 = 0.7$ were chosen. The experiment was carried out for the RML, but no better results were obtained using any of the other estimation methods (see Veres, 1991, Chapter 5, Section 2).

TABLE 3.4. Relative frequencies of order estimates for Simulation 3 with $a_1 = 0.5$ and $b_1 = 0.7$ using the RML method. [Table entries are not recoverable from the source.]

REFERENCES

Akaike, H. (1972). Information theory and an extension of the maximum likelihood principle. In Proc. Second International Symp. on Information Theory, 267-281, eds. Petrov & Csaki, Akademia Kiadó, Budapest.
Akaike, H. (1977). On entropy maximization principle. In Proc. Symp. on Applications of Statistics, ed. P. R. Krishnaiah. North Holland, Amsterdam.
Davis, M.H.A. and R.B. Vinter (1985). Stochastic Modelling and Control. Chapman and Hall, London and New York.
Hannan, E.J. (1980). The estimation of the order of an ARMA process. Ann. Statist., 8, 1071-1081.
Hannan, E.J. and M. Deistler (1988). The Statistical Theory of Linear Systems. John Wiley, New York.
Ljung, L. (1987). System Identification: Theory for the User. Prentice Hall, Englewood Cliffs.
Mayne, D.Q., Åström, K.J. and J.M.C. Clark (1984). A new algorithm for recursive estimation of parameters in controlled ARMA processes. Automatica, 20, 751-760.
Norton, J.P. (1986). Introduction to Identification. Academic Press, London.
Rissanen, J. (1986). Universal coding, information, prediction, and estimation. IEEE Trans. Information Theory, IT-30, 629-636.
Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore.
Söderström, T. and Stoica, P. (1989). System Identification. Prentice-Hall, New York.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.
Veres, S.M. (1990). Relations between information criteria for model structure selection. a) Part 1. The role of Bayesian order estimation, Intern. J. Control, 52, 389. b) Part 2. Modelling by shortest data description, Intern. J. Control, 52, 409. c) Part 3. Strong consistency of the predictive least squares criterion, Intern. J. Control, 52, 737.
Veres, S.M. (1991). Struc[...]