An Improved Modelling Method for Threshold Autoregression and its Application in Forecasting the Railway Passenger Flowrate

Copyright © IFAC Identification and System Parameter Estimation 1985, York, UK, 1985

Q. H. Zhong*, X. H. Li** and Z. F. Zhang*

*Department of Automatic Control, Beijing Institute of Technology, Beijing, The People's Republic of China
**Tianjin Petrochemical Industrial Corporation, Tianjin, The People's Republic of China

Abstract. In this paper a variety of nonlinear time series models are mentioned, and the idea of threshold autoregression and an improved modelling method for it are discussed in detail. Then, using threshold autoregression, the railway passenger flowrate data are analysed and some forecasts are made. The monthly forecasting accuracies for 1982 and 1983 are over 91.9% and 91.4% respectively. As a result, the long-term forecasting of railway passenger flowrate, which was impossible in China in the past, has been made feasible, and this will enable us to run our railway system more methodically and efficiently. This method can also be applied to those fields in which periodic data often exist.

Keywords. Threshold autoregression; cyclical data; limit cycle; linear least squares method; Akaike's information criterion; golden section method; railway passenger flowrate.

INTRODUCTION

Since Box and Jenkins (1970) presented the time series analysis method, time series analysis has been widely used in different fields. From the point of view of systems, Pandit (1973) and Wu (1983) have put forward the dynamic data system method, which can easily be applied. As we know, these methods studied linear models only. But in reality, real systems are often nonlinear. When linear models are used to approximate nonlinear systems, the results are often unsatisfactory. Therefore some authors have used nonlinear models in time series analysis. These include the threshold autoregressive model introduced by Tong (1980), the exponential autoregressive model by Ozaki and Haggan (1981), the nonlinear autoregressive model by Jones (1978), the bilinear model by Granger and Andersen (1978), and the state-dependent model by Priestley (1980). Among these nonlinear models we considered the threshold autoregressive model more attractive and more effective for engineering use.

THE IDEA AND IMPROVED MODELLING METHOD OF THRESHOLD AUTOREGRESSION

The Idea of Modelling Threshold Autoregression

A time series can be considered as an output of a system. When the sample data are cyclical, the series can be considered as a limit cycle of a nonlinear system. The simplest cycle of a nonlinear system has a switching line on the state plane. On the two sides of the switching line two different linear models are used. Similarly, the threshold concept is adopted in time series analysis. When a sample value is greater than the threshold, the first linear model AR(P1) is used. Otherwise, the second linear model AR(P2) is used. Nonlinear characteristics, such as an insensitive zone, backlash, time lag and saturation, often exist in real systems. In time series analysis all the nonlinear characteristics mentioned above can be dealt with in the form of a time delay. According to whether a d-step delayed sample $X_{t-d}$ is greater or less than the threshold, different threshold autoregressive models are used. This is the basic idea used in the threshold autoregression method to fit nonlinear systems.

The general form of the threshold autoregressive model is

$$X_t = \varphi_0^{(j)} + \sum_{i=1}^{K_j} \varphi_i^{(j)} X_{t-i} + a_t^{(j)} \quad \text{as } r_{j-1} < X_{t-d} \le r_j, \qquad j = 1, 2, \ldots, J \tag{1}$$

where $r_j$ and $d$ are called the threshold and the delay parameter respectively. When $X_{t-d}$ is between $r_{j-1}$ and $r_j$, the $j$-th autoregressive model (of order $K_j$) is used. In formula (1), $\{a_t^{(j)}\}$ is white noise with variance $\sigma_j^2$, and $\{a_t^{(j)}\}$ is independent of $\{a_t^{(j')}\}$ for $j \ne j'$. This threshold autoregressive model is represented by SETAR$(J; K_1, \ldots, K_J)$.
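The switching behaviour of model (1) can be illustrated by a minimal simulation sketch for the two-regime case. The function name `simulate_setar`, its arguments and the use of Gaussian noise are assumptions made for illustration only; they are not taken from the paper.

```python
import random


def simulate_setar(n, phi_low, phi_high, r, d, sigma_low=1.0, sigma_high=1.0, seed=0):
    """Simulate a two-regime threshold autoregression (hypothetical helper).

    phi_low and phi_high hold [phi_0, phi_1, ..., phi_K] for each regime.
    The lower regime is used when X_{t-d} <= r, the upper regime otherwise.
    """
    rng = random.Random(seed)
    # Enough zero start values so that every lag X_{t-i} and X_{t-d} exists.
    burn = max(len(phi_low), len(phi_high), d)
    x = [0.0] * burn
    for _ in range(n):
        t = len(x)
        if x[t - d] <= r:
            phi, sigma = phi_low, sigma_low
        else:
            phi, sigma = phi_high, sigma_high
        value = phi[0] + sum(phi[i] * x[t - i] for i in range(1, len(phi)))
        x.append(value + rng.gauss(0.0, sigma))  # assumed Gaussian white noise
    return x[burn:]


if __name__ == "__main__":
    path = simulate_setar(200, [1.0, 0.5], [-1.0, 0.4], r=0.5, d=1, sigma_low=0.3, sigma_high=0.3)
    print(path[:5])
```

Setting both noise levels to zero makes the regime switching easy to follow by hand, since each value is then determined by the previous one and the threshold alone.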

The Improved Modelling Method of Threshold Autoregression

The modelling steps of the threshold autoregressive model method are as follows (taking $J = 2$ as an example):

1) Transform the original data, for example by taking the logarithm, and remove the mean value or the tendency term.

2) Give a delay $d$ and $L$, where $L$ is the maximum order to be entertained for both linear AR models. Take $n_0 = \max(d, L)$.

3) Let $T_q = \{t_{q1}, t_{q2}, \ldots, t_{ql}\}$ be a set of potential candidates for the estimation of $r$, the threshold value. According to whether $X_{t-d}$ ($t > n_0$) is greater or less than $t_{qi}$, the series $\{X_t,\ t = 1, 2, \ldots, N\}$ is divided into two subseries. If, for example, the samples $X_{n_0+1}, X_{n_0+3}, X_{n_0+6}, \ldots$ fall into the first subseries and $X_{n_0+2}, X_{n_0+4}, X_{n_0+5}, \ldots$ into the second, the models of the two segments will be represented as follows:

$$\begin{bmatrix} X_{n_0+1} \\ X_{n_0+3} \\ X_{n_0+6} \\ \vdots \end{bmatrix} = \begin{bmatrix} 1 & X_{n_0} & X_{n_0-1} & \cdots & X_{n_0+1-K_1} \\ 1 & X_{n_0+2} & X_{n_0+1} & \cdots & X_{n_0+3-K_1} \\ 1 & X_{n_0+5} & X_{n_0+4} & \cdots & X_{n_0+6-K_1} \\ \vdots & \vdots & \vdots & & \vdots \end{bmatrix} \begin{bmatrix} \varphi_0^{(1)} \\ \varphi_1^{(1)} \\ \vdots \\ \varphi_{K_1}^{(1)} \end{bmatrix} + \begin{bmatrix} a_{n_0+1}^{(1)} \\ a_{n_0+3}^{(1)} \\ a_{n_0+6}^{(1)} \\ \vdots \end{bmatrix} \tag{2}$$

$$\begin{bmatrix} X_{n_0+2} \\ X_{n_0+4} \\ X_{n_0+5} \\ \vdots \end{bmatrix} = \begin{bmatrix} 1 & X_{n_0+1} & X_{n_0} & \cdots & X_{n_0+2-K_2} \\ 1 & X_{n_0+3} & X_{n_0+2} & \cdots & X_{n_0+4-K_2} \\ 1 & X_{n_0+4} & X_{n_0+3} & \cdots & X_{n_0+5-K_2} \\ \vdots & \vdots & \vdots & & \vdots \end{bmatrix} \begin{bmatrix} \varphi_0^{(2)} \\ \varphi_1^{(2)} \\ \vdots \\ \varphi_{K_2}^{(2)} \end{bmatrix} + \begin{bmatrix} a_{n_0+2}^{(2)} \\ a_{n_0+4}^{(2)} \\ a_{n_0+5}^{(2)} \\ \vdots \end{bmatrix} \tag{3}$$

By using the linear least squares method, the model parameters can be estimated.

4) For a given threshold $t_{qi}$, calculate the optimal orders $\hat K_1$ and $\hat K_2$ that minimize the AIC of the models, i.e.

$$\mathrm{AIC}(\hat K_j) = \min_{0 \le K_j \le L} \left\{ N_j \ln\left[\mathrm{RSS}_j(K_j) / N_j\right] + 2 (K_j + 1) \right\}, \qquad j = 1, 2 \tag{4}$$

where $N_j$ is the number of samples in the $j$-th subseries and $\mathrm{RSS}_j(K_j)$ is the residual sum of squares of the $K_j$-th order autoregression. Let

$$\mathrm{AIC}(t_{qi}) = \mathrm{AIC}(\hat K_1) + \mathrm{AIC}(\hat K_2) \tag{5}$$

5) Taking different threshold values $t_{qi}$ ($i = 1, 2, \ldots, l$), repeat the processes 3) and 4). The optimal threshold $\hat t_q$ for a given delay $d$ will be found, i.e.

$$\mathrm{AIC}(\hat t_q) = \min_{1 \le i \le l} \mathrm{AIC}(t_{qi}) \tag{6}$$

6) Taking different delay values $d$, repeat the processes 2) to 5). The optimal delay $\hat d$ will be found based on the minimization of the AIC.

So far the optimal delay $\hat d$, the optimal threshold $\hat t_q$, the corresponding optimal orders $\hat K_1$ and $\hat K_2$, and the parameters $\varphi_i^{(1)}$, $i = 0, 1, 2, \ldots, \hat K_1$, and $\varphi_j^{(2)}$, $j = 0, 1, 2, \ldots, \hat K_2$, have been determined. By using these parameters, forecasts can be made.

It should be pointed out that for different threshold values the two subsample numbers sometimes show great disparity. So when the subsample number is small, the maximum order to be entertained, $L$, should be reduced. Otherwise the residuals could mathematically be very small or even zero, which is misleading and has no real engineering meaning.

It should also be pointed out that in step 3) Tong suggested $\{t_{0.30}, t_{0.40}, t_{0.50}, t_{0.60}, t_{0.70}\}$ as a set of potential candidates for the estimation of $r$, which may be changed if necessary. Very often this is not accurate enough. By complex calculation it can be shown that $\mathrm{AIC}(t_q)$ is a convex function of $t_q$ (Li, 1981). Therefore, the golden section method can be used to optimize the threshold. The search can be stopped when the distance between the two section points is smaller than a given value. In this way better results were obtained for the Wolf sunspot series (covering the years 1700 to 1920). The modelling results of reference (Tong, 1980) and the modelling results of our modified method are given in Table 1.
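The order selection of equation (4), the pooled criterion for one threshold, and the golden section search over the threshold can be sketched as follows for the two-regime case. This is a minimal illustration under stated assumptions, not the authors' program: all function names are hypothetical, NumPy's least-squares solver stands in for whatever routine was actually used, indices are 0-based, and the `1e-12` floor on the residual sum of squares is an added guard against the zero-residual case the text warns about. The search relies on the paper's claim that the criterion is convex in the threshold.

```python
import math

import numpy as np


def fit_ar(x, rows, order):
    """Least-squares AR(order) fit with a constant, using only the time indices in rows."""
    design = np.array([[1.0] + [x[t - i] for i in range(1, order + 1)] for t in rows])
    target = np.array([x[t] for t in rows])
    phi, *_ = np.linalg.lstsq(design, target, rcond=None)
    rss = float(np.sum((target - design @ phi) ** 2))
    return phi, rss


def best_order(x, rows, max_order):
    """Return (aic, order, phi) minimising N*ln(RSS/N) + 2*(K+1) over K = 0..max_order."""
    n = len(rows)
    best = None
    for k in range(max_order + 1):
        if n <= k + 1:  # assumption: skip orders with too few samples
            break
        phi, rss = fit_ar(x, rows, k)
        aic = n * math.log(max(rss, 1e-12) / n) + 2 * (k + 1)  # floor guards log(0)
        if best is None or aic < best[0]:
            best = (aic, k, phi)
    return best


def pooled_aic(x, threshold, d, max_order):
    """Sum of the two regime AIC values for a split on x[t-d] <= threshold."""
    n0 = max(d, max_order)
    low = [t for t in range(n0, len(x)) if x[t - d] <= threshold]
    high = [t for t in range(n0, len(x)) if x[t - d] > threshold]
    fits = [best_order(x, rows, max_order) for rows in (low, high)]
    if any(f is None for f in fits):
        return math.inf, fits  # one regime is (almost) empty
    return fits[0][0] + fits[1][0], fits


def golden_section_min(f, a, b, tol=1e-3):
    """Golden section search; stops when the two section points are closer than tol."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - ratio * (b - a)
    e = a + ratio * (b - a)
    fc, fe = f(c), f(e)
    while e - c > tol:
        if fc < fe:
            b, e, fe = e, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, e, fe
            e = a + ratio * (b - a)
            fe = f(e)
    return (a + b) / 2.0


def search_threshold(x, d, max_order, lower, upper, tol=1e-3):
    """Threshold estimate for a fixed delay d, searched on the interval [lower, upper]."""
    return golden_section_min(lambda r: pooled_aic(x, r, d, max_order)[0], lower, upper, tol)
```

Step 6) would then repeat `search_threshold` for each candidate delay and keep the delay with the smallest criterion value.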

APPLICATION IN FORECASTING RAILWAY PASSENGER FLOWRATE

Under normal conditions, the Railway Ministry as well as its subordinate departments only makes a short-term (a half to one month) passenger flowrate forecast according to former statistical data or, at the end of the third quarter, estimates the flowrate for the next year. But it is very difficult to predict the flowrate monthly.

Based on the improved modelling method of threshold autoregression, the national railway passenger monthly data from 1971 to 1981 are used for forecasting the passenger flowrate. The passenger flowrate diagram is shown in Fig. 1. It shows that the flowrate increases at different speeds in different periods with the development of the national economy, and that every year there is a peak and two valleys.

132 monthly statistical data (covering the years 1971 to 1981) are used for modelling and forecasting. After logarithmic transformation, the definite tendency term is given by

$$6.?339? + 0.?0?23\,t, \qquad t = 1, 2, 3, \ldots \tag{7}$$

If the definite tendency term is removed, the model will be of the form SETAR(2; 11, 1), with the optimal delay $\hat d$ equal to 1? and the optimal threshold value $\hat t_q$ equal to 0.053667. All related model parameters and the corresponding 95% confidence limits are shown in Table 2.

Using this model, the national railway passenger flowrates from 1982 to 1986 were forecast monthly at the beginning of 1982. During the two-year test period, the forecast values proved to be very close to the real data. The maximum relative error of the monthly forecast values is 8.1% in 1982 and 8.6% in 1983. In other words, the lowest monthly accuracies in 1982 and 1983 are 91.9% and 91.4% respectively. Detailed information is shown in Table 3 and Fig. 2.
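The preprocessing described here, a logarithmic transformation followed by removal of a linear tendency term of the form $a + bt$, can be sketched as below. The function names and the synthetic series are assumptions for illustration; the coefficients 6.0 and 0.004 in the usage lines are made-up values, not the paper's estimates.

```python
import math


def fit_log_linear_trend(series):
    """Fit ln(series[t-1]) ~ a + b*t for t = 1..N by ordinary least squares.

    Returns (a, b, residuals), where residuals is the detrended log series
    on which a threshold model would then be fitted.
    """
    n = len(series)
    ts = range(1, n + 1)
    logs = [math.log(v) for v in series]
    t_mean = sum(ts) / n
    y_mean = sum(logs) / n
    slope_num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, logs))
    slope_den = sum((t - t_mean) ** 2 for t in ts)
    b = slope_num / slope_den
    a = y_mean - b * t_mean
    residuals = [y - (a + b * t) for t, y in zip(ts, logs)]
    return a, b, residuals


def restore(a, b, t, z):
    """Map a forecast z of the detrended log series at time t back to original units."""
    return math.exp(a + b * t + z)


if __name__ == "__main__":
    # Synthetic 132-point series with made-up trend coefficients.
    demo = [math.exp(6.0 + 0.004 * t) for t in range(1, 133)]
    a, b, z = fit_log_linear_trend(demo)
    print(round(a, 4), round(b, 5))
```

A forecast produced on the detrended scale is converted back with `restore`, which adds the tendency term at the forecast time and inverts the logarithm.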

A good way to test whether the model is suitable is to see whether its forecasts agree with the real values. Table 3 shows that the forecast values are in agreement with the real values. The maximum relative monthly forecasting error is 8.1% in 1982 and 8.6% in 1983. In other words, the lowest monthly accuracies for 1982 and 1983 are 91.9% and 91.4% respectively, while the yearly forecasting accuracies are 99.1% for 1982 and 98.8% for 1983. This is quite a satisfactory result. In the past only short-term forecasts could be obtained, from experience or by means of the traditional linear regression method, and their accuracies were lower. For example, the yearly forecasting accuracies of linear regression are 98.1% for 1982 and 96.1% for 1983. Fig. 2 shows that there is a period of one year in the eventual forecasting function and that every year there are two valleys, in June and December, and a peak in August. In the Chinese railway transportation system there is a limit cycle. Fig. 2 also shows that the passenger flowrate will progressively increase at the rate of sixty million per year in the next few years.
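The accuracy figures quoted above follow from the relative error as accuracy = 100% minus the absolute relative error. A short sketch, using two rows of Table 3 (January 1982 and October 1983) as input; the function names are hypothetical:

```python
def relative_error_percent(forecast, real):
    """Relative forecasting error in percent, signed (positive = over-forecast)."""
    return 100.0 * (forecast - real) / real


def accuracy_percent(forecast, real):
    """Accuracy in percent, defined as 100 minus the absolute relative error."""
    return 100.0 - abs(relative_error_percent(forecast, real))


if __name__ == "__main__":
    # (forecast, real) pairs taken from Table 3, in units of 10^4 passengers.
    for label, forecast, real in [("1982.1", 8304.39, 9037.0), ("1983.10", 9038.39, 8321.0)]:
        err = relative_error_percent(forecast, real)
        acc = accuracy_percent(forecast, real)
        print(label, round(err, 2), round(acc, 1))
```

These two months are the worst cases of each year, which is where the 91.9% and 91.4% lower bounds come from.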
CONCLUSION

1) The improved modelling method of threshold autoregression is better than that in reference (Tong, 1980). Using this method, the railway passenger flowrate data are analysed. The accuracies of monthly forecasting for 1982 and 1983 are at least 91.9% and 91.4% respectively. The yearly forecasting accuracies are 99.1% for 1982 and 98.8% for 1983, while the accuracies of traditional linear regression are 98.15% and 96.1% for 1982 and 1983 respectively. It can be seen that the application of this method has been successful.

2) This model can not only forecast the national passenger flowrate monthly but also shows that the flowrate will progressively increase at the rate of sixty million per year in the next few years. Fig. 2 shows that the eventual forecasting function has a period of one year and that every year there are two valleys, in June and December, and a peak in August. It also shows that in the Chinese railway transportation system there is a limit cycle. This information supplies a reliable basis for government planning.

3) It is worth pointing out that the threshold autoregressive model can also be used to forecast the national goods flowrate and the flowrate of each railway line and station. Accurate forecasting is very important for directing the work of the dispatcher's office, especially for each railway line. In this way, railway dispatch can be organised in a planned and scientific way.

4) In addition, this method is also suitable for other situations, such as the electric power department, the running water company and so on, in which cyclical statistical data often exist. It is clear that the threshold autoregressive model will be very useful for the planned development of the national economy.

REFERENCES

Andersen, A.P. & C.W.J. Granger (1978). An Introduction to Bilinear Time Series Models. Vandenhoeck & Ruprecht.
Box, G.E.P. & G.M. Jenkins (1970). Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.
Jones, D.A. (1978). Nonlinear autoregressive processes. Proc. Roy. Soc. A, 360, 71-95.
Li Xinhua (1981). The successive approaching modelling method and its applications. Graduate student treatise, Beijing Institute of Technology.
Ozaki, T. & V. Haggan (1981). Modelling nonlinear random vibrations using an amplitude-dependent autoregressive model. Biometrika, 68, 1, 189-196.
Pandit, S.M. (1973). Data dependent systems: modelling analysis, and optimal control via time series. Ph.D. Thesis, University of Wisconsin-Madison.
Pandit, S.M. & S.M. Wu (1983). Time Series and System Analysis, with Applications. John Wiley and Sons.
Priestley, M.B. (1980). State dependent models: a general approach to nonlinear time series analysis. J. of Time Series Analysis, 1, 1, 47-72.
Tong, H. & K.S. Lim (1980). Threshold autoregression, limit cycles and cyclical data. J. of Royal Statistical Society B, 42, 245-292.

TABLE 2 Model Parameters and 95% Confidence Limits

MVar* = 0.001759

* MVar is the pooled mean sum of squares of residuals.

Fig. 1. The national railway passenger flowrate (1971 to 1981).

TABLE 1 Results of Modelling Sunspot Series

              Modelling result of this paper    Reference (Tong, 1980)
Parameter     lower region    upper region      lower region    upper region
φ0            12.03789        7.80657           10.5440         7.8041
φ1            1.?9179         0.73685           1.6920          0.7432
φ2            -1.2?620        -0.00720          -1.1592         -0.0409
φ3            0.41818         -0.25344          0.2367          -0.2020
φ4                            0.17534           0.1503          0.1730
φ5                            -0.18764                          -0.2266
φ6                            0.04178                           0.0189
φ7                            0.20399                           0.1612
φ8                            -0.26424                          -0.2564
φ9                            0.31045                           0.3195
φ10                           -0.37434                          -0.3891
φ11                           0.39351                           0.4306
φ12                                                             -0.0397
Var           2?2.6978        63.4906           254.64          66.80
MVar          144.882                           153.7
d             3                                 3
t_q           36.6                              36.6

TABLE 3 Results of Forecasting and its Accuracy

year.month    forecast value    real value    relative error    accuracy
              (10^4)            (10^4)        (%)               (%)
1982.1        8304.39           9037          -8.11             91.9
1982.2        8365.31           8058          3.81              96.2
1982.3        8265.47           8329          -0.76             99.2
1982.4        8361.44           8220          1.72              98.3
1982.5        8406.27           8183          2.73              97.3
1982.6        7814.27           7348          6.34              93.7
1982.7        8329.24           8120          2.58              97.4
1982.8        8885.20           8844          0.47              99.5
1982.9        8383.30           7995          3.86              96.1
1982.10       8465.93           8118          4.29              95.7
1982.11       8306.78           8076          2.86              97.1
1982.12       8089.36           8560          -5.50             94.5
Total         99732.14          98888         0.85              99.1

1983.1        8730.06           8902          -1.93             98.1
1983.2        8850.85           8978          -1.42             98.6
1983.3        8741.69           9134          -4.30             95.7
1983.4        8914.76           8864          0.57              99.4
1983.5        8944.61           8916          0.32              99.7
1983.6        8487.32           7875          7.78              92.2
1983.7        8944.99           8612          3.87              96.1
1983.8        9304.46           9445.5        -1.49             98.5
1983.9        8850.63           8411          5.23              94.8
1983.10       9038.39           8321          8.62              91.4
1983.11       8825.41           8500          3.83              96.2
1983.12       8606.49           8988          -4.24             95.8
Total         106239.66         104946        1.23              98.8

Fig. 2. Eventual forecasting curve (1982 to 1986).