Parsimonious modelling and forecasting of seasonal time series


S.A. ROBERTS *, Department of Statistics, University of Birmingham, United Kingdom

P.J. HARRISON, Department of Statistics, University of Warwick, United Kingdom. Received December 1980. Revised February 1983.

Some seasonal time series models are considered which are appropriate for the univariate modelling and forecasting of many time series. The equivalent ARIMA forms of these models provide the basis for a critical examination of the Box-Jenkins approach to seasonal model-building. It is concluded that this approach is unsatisfactory and in particular can often result in over-differencing and the adoption of an inappropriate model. Two main reasons for this are discussed: (a) the inadequate class of models considered, which rests on a restricted view of parsimony, and (b) the shortcomings of the basic approach to model identification.

1. Introduction

1.1. Two notions of parsimony

It is widely accepted that parsimony is important in model-building. However, its meaning is not often made clear and it is frequently misunderstood in a way that we shall highlight. In discussing models and model-building we shall invoke two key ideas of parsimony. We use the term 'conceptual parsimony' to emphasise the importance of simplicity and clarity in concept and in the structure of models. This is particularly vital in commercial modelling and forecasting applications where interpretation of, and communication with, the model by the end user is often crucial to successful use. So, belief in this notion gives value to such concepts as exponentially weighted regression (EWR), to the use of local Taylor series approximations for forecast functions, and to structured models expressed in terms of meaningful quantities such as level, growth, seasonal factors, cyclic component, etc. Despite its importance in motivating many of the models we would use in practice, we will not develop this concept in detail in this paper. Rather we wish to concentrate on another, and more familiar, aspect of parsimony, namely 'parametric parsimony'. This refers to the desirability of using as few model parameters as possible whilst still achieving a satisfactory representation of the data. We shall call the number of free parameters the order of parsimony. Thus, if a model has n free parameters which must be estimated from data, then we say that it possesses parametric parsimony of order n.

* Now with Imperial Chemical Industries PLC, The Heath, Runcorn, Cheshire, United Kingdom.

European Journal of Operational Research 16 (1984) 365-377. North-Holland.

1.2. Parametric parsimony in Box-Jenkins modelling

The Box-Jenkins approach to time series analysis and forecasting is presented in detail in the now famous book by Box and Jenkins [2] and descriptions are given by Box and Jenkins [1], Chatfield and Prothero [4] and Newbold [16]. Box and Jenkins (henceforth BJ) base their modelling procedure on the ARIMA class of models

φ(B)(1 - B)^d y_t = θ(B) a_t,   (1.1)

where B is the Backward Shift Operator defined such that B^k y_t = y_{t-k}, φ(B) = 1 - φ_1 B - ... - φ_p B^p and θ(B) = 1 - θ_1 B - ... - θ_q B^q are the autoregressive and moving average operators of order p, q respectively, and the model is denoted as ARIMA(p, d, q). These authors appeal to parsimony as an important element in model-building. However, it is clear from the work of BJ and exponents of their methodology that parsimony tends to be considered as the number of terms or coefficients in the model, i.e. p + q in (1.1) (assuming, without loss of

0377-2217/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)


generality, that the variance of a_t, σ_a², is known). That is, it is assumed that the minimal set of sufficient parameters in a parsimonious model comprises the full set of AR and MA coefficients. However, it should not be the number of terms which matters but the number of free parameters: a multitude of terms may be functions of but a few genuine free parameters. Indeed, there are within the BJ class many models not considered by BJ modellers for which the p + q coefficients are functions of a smaller number of genuine parameters. In general then, it is clear that given two models ARIMA(p, d, q) and ARIMA(p', d', q') with p + q < p' + q' it does not necessarily follow that the former is more parsimonious (has lower order of parsimony). For example, consider the class of ARIMA(0, n, n) polynomial projecting models of the form

(1 - B)^n y_t = (1 - θ_1 B - ... - θ_n B^n) a_t.   (1.2)

In BJ modelling the θ's would always be estimated freely and the order of parsimony is taken to be n. However, a parsimonious subset of this, not used by BJ modellers, is the class

{(1 - B)^n y_t = (1 - βB)^n a_t;  n > 0,  0 < β < 1}.   (1.3)

This has parametric parsimony of order 1 in that only one parameter is to be determined and the θ's in (1.2) are functions of this. There is a great deal to be said for this subset of models since it is not only more parsimonious in parametrisation but also in concept. This is because the forecast function is precisely that deriving from an EWR polynomial predictor with discount factor β, see [5,15]. Hence, if the subset (1.3) provides an adequate model it may be claimed as preferable on grounds of parametric parsimony and conceptual parsimony.

1.3. Box-Jenkins seasonal models

BJ in [2, Chapt. 9] extend the ARIMA model in order to be able to model seasonal time series. Their most general seasonal model can be written

φ(B) ∇^d ∇_L^D y_t = θ(B) a_t,   (1.4)

where L is the length of the seasonal period (L = 12 for monthly data), ∇ = 1 - B, ∇_L = 1 - B^L, and where φ(B), θ(B) will often need to be of rather high orders. If the full operators are considered then the model would have many parameters. So parametric parsimony enters in two ways into the BJ model formulation: (a) through the use of multiplicative operators, so that the models normally considered are of the form

φ(B) Φ(B^L) ∇^d ∇_L^D y_t = θ(B) Θ(B^L) a_t;   (1.5)

(b) through the prior setting of many coefficients to zero, i.e. the assumption of sparse operators. There is no rationale for allowing the number of non-zero coefficients to exceed the number of parameters. So, for example, in the widely employed (linear growth seasonal) models

∇ ∇_L y_t = (1 - θ_1 B - θ_L B^L - θ_{L+1} B^{L+1}) a_t,   (1.6)
∇ ∇_L y_t = (1 - θ_1 B)(1 - θ_L B^L) a_t,   (1.7)

and the (steady seasonal) models

∇_L y_t = (1 - θ_1 B - θ_L B^L) a_t,   (1.8)
∇_L y_t = (1 - θ_L B^L) a_t,   (1.9)

or common variations of these such as

(1 - φB) ∇_L y_t = (1 - θ_1 B)(1 - θ_L B^L) a_t,   (1.10)
(1 - ΦB^L) y_t = (1 - θ_L B^L) a_t,   (1.11)

θ_2 = ... = θ_{L-1} = 0 is always assumed. Although it is perfectly possible to set these equal to some other function of θ_1, θ_L, θ_{L+1}, there is no basis for deciding how this could be done or indeed whether it needs to be done, i.e. whether the setting to zero is sensible or adequate. Clearly, it is unreasonable to argue that models are selected from the full seasonal ARIMA class defined by (1.4). Rather, the BJ modeller implicitly restricts attention to a much smaller class of sparse models as candidates for selection. This will be seen to be important as this sub-class does not contain the models we introduce in later sections.
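The contrast between terms and free parameters in (1.2) and (1.3) is easy to check numerically. The sketch below is our own (the function name is an illustrative choice, not from the paper): it expands (1 - βB)^n for n = 2 and shows that both MA coefficients are functions of the single discount factor β.

```python
from math import comb

import numpy as np

def ewr_ma_coeffs(beta: float, n: int) -> np.ndarray:
    """Coefficients of (1 - beta*B)^n in powers of B, i.e. the implied
    theta's of the constrained ARIMA(0, n, n) model (1.3)."""
    return np.array([comb(n, k) * (-beta) ** k for k in range(n + 1)])

# n = 2 with one free parameter beta: theta_1 = 2*beta, theta_2 = -beta**2,
# so two MA terms but parametric parsimony of order 1.
beta = 0.9
coeffs = ewr_ma_coeffs(beta, 2)
assert np.allclose(coeffs, [1.0, -2 * beta, beta ** 2])
```

The same expansion for any n gives n MA terms that are all deterministic functions of the one parameter β, which is the point of the subset (1.3).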

1.4. Mis-identification in Box-Jenkins modelling

In the remainder of this paper we develop a number of our misgivings about the BJ modelling


approach; the particular emphasis being on seasonal modelling. Using the wide view of parsimony we have discussed, one can define classes of parsimonious seasonal models not considered in the BJ approach, as we did in the non-seasonal case with the EWR ARIMA(0, n, n) model mentioned earlier. However, in the seasonal case the models we define are expressible as ARIMA models which are distinct from, and definitely not a subset of, the BJ model class. These models are parsimonious both in parametrisation and underlying concept, in contrast to the BJ models which are based on a restricted view of parametric parsimony and involve no appeal to conceptual parsimony. The principal models involved are particular Dynamic Linear Models (DLMs). These occupy an important position in Bayesian forecasting as expounded by Harrison and Stevens [11]. Also considered, though regarded by us as somewhat less useful than the DLMs in practice, are 'Holt-Winters (HW) type' models, see [18], which reinforce our main points. Both types of model possess a ready interpretation, being defined in terms of the evolution of such underlying time series components as level, growth, and seasonality. In view of this natural formulation they form a sound basis from which to study seasonal modelling, and in particular a methodology such as that of BJ in which such meaning is hidden and indeed is considered relatively unimportant. In this way we are led to the conclusion that the BJ approach as standardly applied is highly unsatisfactory in the seasonal case, and can often result in over-differencing and the adoption of an inappropriate model. As we shall see this is dictated by two main factors: (a) the restricted class of models considered, (b) the inadequate approach to model identification/selection. Further criticism of the BJ models is developed in [19], where the arguments are based on the HW type models.

2. Purely seasonal models

2.1. Outline and notation

In order to illustrate most of our main points, in this section we introduce some purely seasonal models which are often appropriate for time series which oscillate, with non-stationary seasonal behaviour, about a constant level. In essence we shall suppose these models to appropriately describe some series, and consider the implications of this for the BJ modeller, i.e. we examine the effect of applying the usual BJ methodology in analysing such a series. Throughout the following, L denotes the length of the seasonal period (e.g. L = 12 for monthly data), h = 1/(L-1) and H = (1, -h, ..., -h)^T is an L × 1 column vector. Later we shall examine some trend-seasonal models.
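For concreteness, the notation can be written down directly. The following sketch is our own (with L = 4 for brevity); it builds H and the shift matrix G used in the state-space form of Section 2.2, where we read G as the cyclic permutation (i.e. including G_{L,1} = 1) so that the factor vector rotates through the seasonal cycle.

```python
import numpy as np

L = 4                                   # seasonal period (L = 12 for monthly data)
h = 1.0 / (L - 1)

H = np.array([1.0] + [-h] * (L - 1))    # H = (1, -h, ..., -h)^T
F = np.eye(L)[0]                        # F = (1, 0, ..., 0) picks the current factor
G = np.roll(np.eye(L), -1, axis=0)      # G @ eta shifts the factors one period, cyclically

# The disturbance H*r_t sums to zero, which keeps the seasonal factors
# summing to zero through time.
assert abs(H.sum()) < 1e-12
```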

2.2. A purely seasonal DLM

An often appropriate model for non-stationary seasonal variation about a zero (or constant) level is

y_t = η_{0,t} + ε_t,
η_{0,t} = η_{1,t-1} + r_t,
η_{i,t} = η_{i+1,t-1} - h r_t,  1 ≤ i ≤ L-1.   (2.1)

Here η_{i,t} represents the underlying (unknown) seasonal factor at time t for the ith month ahead, i = 0(L), 1, ..., L-1, and the observation noise term ε_t ~ (0, V) and the seasonal disturbance term r_t ~ (0, S) are independent random variables. With η_t = (η_{0,t}, η_{1,t}, ..., η_{L-1,t})^T, δη_t = H r_t, F = (1, 0, ..., 0) and G = (G_{ij}) with G_{ij} = 0 except G_{i,i+1} = 1, 1 ≤ i ≤ L-1, and G_{L,1} = 1, (2.1) can be re-written as

y_t = F η_t + ε_t,
η_t = G η_{t-1} + δη_t,   (2.2)

i.e. as a (Constant) Dynamic Linear Model (CDLM) of Harrison and Stevens [11], such that seasonal factor estimates can be obtained for a given series using the Kalman filter recurrence relations. The model is called constant because as defined F, G, var(ε_t), var(δη_t) are time invariant. We restrict attention to this case here for the analysis of ARIMA models, although there is no requirement for constancy in the general Bayesian forecasting framework. Note that other choices of the covariance matrix of δη_t are possible which also ensure that the factor estimates sum to zero, and here we choose a simply understood one. See [14] for a slight variation.

As defined this model is time invariant and can be expressed in a linear form recognisable as a particular ARIMA model. The ARIMA coefficients are functions of the variance ratio V/S and these can be explicitly derived; though this is not so for the trend-seasonal models. The equivalent ARIMA form is

x_{1,t} = (1 + Σ_{i=1}^{L-1} B^i) y_t = (1 - Σ_{i=1}^{L-1} θ_i B^i) a_t,   (2.3)

with

θ_i = -(1 - iha),  1 ≤ i ≤ L-1,  σ_a² = V/(1 - a),   (2.4)

where

a = {(4V/S + 1)^{1/2} - 1}/{2V/S}.

Note that the θ's are all non-zero. A BJ analyst looking at the form of model (2.3) would not class it as being parsimonious since it apparently involves many parameters. However, it is parsimonious in the true sense since all the coefficients (hyper-parameters) θ_1, ..., θ_{L-1} are functions of only one unknown parameter V/S (a), i.e. it has parametric parsimony of order 1 whilst comprising (in the monthly case) 11 non-zero terms. Moreover, in practice for most commercial applications V is much larger than S and the θ's (and indeed the autocorrelations of x_{1,t}, see Section 2.4) are large. Even for quite small values of V/S the θ's are still large, as shown in Table 1, and it is only for V << S that the later θ's approach zero. Nonetheless, we always have 0 < a < 1 and so -1 ≤ θ_1 ≤ -(1-h), with an exact linear decrease in the magnitude of the θ's from 1 to L-1.

Table 1

V/S    a      θ_1 (L=4)    θ_1 (L=12)    θ_{L-1} = -(1-a)
2      0.5    -5/6         -21/22        -0.5
20     0.2    -14/15       -54/55        -0.8

2.3. A purely seasonal Holt-Winters type model

With ŷ_t(l) denoting an l step ahead forecast from origin t, e_t the one step ahead forecast error, and s_{i,t} the estimate at time t of the seasonal factor for the ith period ahead, i = 0(L), 1, ..., L-1, we may define a purely seasonal predictor by

ŷ_t(l) = s_{l,t},  l = 1, ..., L(0),
e_t = y_t - ŷ_{t-1}(1) = y_t - s_{1,t-1},
s_{0,t} = s_{1,t-1} + A_0 e_t,
s_{i,t} = s_{i+1,t-1} - A_0 h e_t,  1 ≤ i ≤ L-1,   (2.5)

where of course s_{L,t-1} = s_{0,t-1}. Writing s_t = (s_{0,t}, s_{1,t}, ..., s_{L-1,t})^T we can write s_t = G s_{t-1} + A e_t where A = A_0 H. Again A may be chosen in many other ways, see for example [18]. Again we find that this predictor is equivalent to an ARIMA model of the form (2.3), with a_t = e_t, and such that

θ_i = -(1 - ihA_0),  1 ≤ i ≤ L-1,   (2.6)

and so is precisely equivalent to the CDLM with a = A_0.

2.4. Model identification using the autocorrelation function

For this model the autocovariance function γ_k versus lag k for x_{1,t} = (1 + Σ_{i=1}^{L-1} B^i) y_t is

γ_k = (L-k)V + (L-k)(L-k-1)(2L-k-1)S/{6(L-1)²},  0 ≤ k ≤ L-1,   (2.7)

i.e. exhibits a cut-off after lag L-1, as does the autocorrelation function ρ_k = γ_k/γ_0. Also, it is easily seen that as V/S ↗ ∞, then a ↘ 0, θ_i ↘ -1, ρ_k ↗ (L-k)/L, 1 ≤ i, k ≤ L-1, and as V/S ↘ 0, then a ↗ 1, θ_i ↗ -(1-ih), ρ_k ↘ (L-k)(L-k-1)(2L-k-1)/{L(L-1)(2L-1)}. So in particular we have the inequalities

-1 < θ_i < -(1-ih),  1 ≤ i ≤ L-1,

(L-k)(L-k-1)(2L-k-1)/{L(L-1)(2L-1)} < ρ_k < (L-k)/L,  1 ≤ k ≤ L-1.   (2.8)

It is clear that the θ's and ρ's are both linear in their pattern and generally large in value, especially so for the typically large values of V/S that occur in commercial applications. As V/S decreases the θ's, ρ's fall off to zero more rapidly. An idea of these boundary values is provided by Table 2.

Table 2

                  L = 4                    L = 12
Lag i, k          1      2      3          1        2        11
θ_i:  V >> S     -1     -1     -1         -1       -1       -1
θ_i:  V << S     -2/3   -1/3    0         -10/11   -9/11     0
ρ_k:  V >> S      3/4    1/2    1/4        11/12    5/6      1/12
ρ_k:  V << S      3/7    5/42   0          55/69    315/506  0

Consideration of these facts can provide the basis for identification of the model. Note that if a series exhibits locally constant, non-stationary, seasonality (and a full set of harmonic terms is necessary to describe it) then the natural operator to capture this is 1 + B + ... + B^{L-1}, which is apparent from the equivalences. Henceforth, we shall denote this operator by S(B). If other series components, e.g. trend, cycle, are stationary then an ARIMA model for x_{1,t} = S(B)y_t is appropriate. Therefore, to facilitate the identification of (2.3) and many other sensible models, it is clear that this operator should feature in the model identification process and the sample autocorrelation function (acf) of x_{1,t} examined. It is rather strange then that it is the operator 1 - B^L = (1 - B)S(B) which is usually used to account for seasonal non-stationarity. BJ [2, p. 303] say that for seasonal series "one might expect ... the simplifying operation ∇_L y_t = (1 - B^L)y_t might be useful", and [2, p. 328] that the operator (1 - φB)(1 - B^L) is "non-stationary in the seasonal pattern but stationary with respect to the basic interval". Yet this is equal to (1 - φB)(1 - B)S(B), which is appropriate only if non-stationarity in trend also requires capturing by the term 1 - B, see Section 3. Jenkins [13, p. 33] describes a group of series which required "seasonal differencing ∇_{12} to induce stationarity in the seasonal behaviour". Indeed the operator is useful, but only if non-stationarity in the level of the series also occurs! A further point is that, partly because of this popular misconception, a constant term is frequently fitted to ∇_L y_t, and it is clear that this implies not just a stochastic level but also a fixed, deterministic growth term.
2.5. Problems in Box-Jenkins identification

2.5.1. The identification method

In BJ modelling the identification process involves study of the sample acf to determine firstly the appropriate simplifying non-stationary operator, and subsequently its use with related tools (sample partial acf, spectrum) to suggest the stationary model form. The rationale behind its use with regard to differencing rests on the fact, [2, p. 174], that if an autoregressive root "approaches unity ... the autocorrelation function will not die out quickly and will fall off slowly and very nearly linearly". Thus, [2, p. 175], "failure of the estimated acf to die out rapidly might logically suggest that we should treat the underlying stochastic process as non-stationary", and "it is failure ... to die out rapidly that suggests non-stationarity. It need not happen that the estimated correlations are extremely high even at low lags". For example, they show, [2, p. 200], for a realisation of 100 observations from ∇y_t = (1 - 0.8B)a_t, that the expected value of the first sample autocorrelation is E(r_1) = 0.429.

2.5.2. BJ identification when (2.3) is appropriate

Since S(B) is not a routine operator for them, BJ analysts would generally try ∇_L = 1 - B^L first in attempting to transform a seasonal series to stationarity. However, even if they were to follow our suggestion and consider S(B) they would not identify the correct model (2.3) anyway. For this we have seen that the acf of x_{1,t} is not only slow to die out but has an exactly linear pattern with large positive values, and θ_1, ..., θ_{L-1} are very different from zero. This is behaviour not possessed by any of the sparse stationary models they envisage and so, in view of Section 2.5.1, the BJ modeller would undoubtedly interpret the acf of x_{1,t} as evidence of non-stationarity in trend and the need to difference the series.


Thus, whether he considers S(B) or not, the BJ analyst will find himself examining the series x_{2,t} = (1 - B)x_{1,t}. This has an autocovariance function γ'_k at lag k such that

γ'_0 = 2γ_0 - 2γ_1 = 2V + LS/(L-1),
γ'_k = 2γ_k - γ_{k-1} - γ_{k+1} = -kS/(L-1)²,  1 ≤ k ≤ L-1,
γ'_L = -γ_{L-1} = -V,
γ'_k = 0,  k > L,   (2.9)

and the ARIMA form is

x_{2,t} = ∇_L y_t = {1 - ha Σ_{i=1}^{L-1} B^i - (1-a)B^L} a_t.   (2.10)

Now for this over-differenced series we can see that as V/S ↗ ∞ then ρ'_k ↗ 0 for 1 ≤ k ≤ L-1, ρ'_L ↘ -1/2, ha ↘ 0, and as V/S ↘ 0 then ρ'_k ↘ -k/{L(L-1)}, ρ'_L ↗ 0, ha ↗ h = 1/(L-1). Thus we have the inequalities

-k/{L(L-1)} < ρ'_k < 0,  1 ≤ k ≤ L-1,
-1/2 < ρ'_L < 0,
0 < θ'_L = 1 - a < 1.   (2.11)

Clearly, even for very small values of V/S, ρ'_k and θ'_i for 1 ≤ k, i ≤ L-1 will be very small. Indeed for most commercial applications V >> S and hence they will be very close to zero indeed. In practice then, these ρ's obviously provide a very simple picture to test. ρ'_L is usually large and highly significant whereas the intermediate values' pattern will not be identified in the presence of sampling variation. Hence, the sample acf will invariably look like that of the model (1.7). For example, suppose V/S is as small as 20; then a = A_0 = 0.2 and so ha = 0.2/(L-1) = 1/55 ≈ 0.02 for L = 12 (0.07 for L = 4). In this case the over-differenced form gives a model very close to

x_{2,t} = ∇_{12} y_t = a_t - 0.8a_{t-12},

a typical BJ sparse model. In the BJ methodology this is a parsimonious model whereas the correct model

x_{1,t} = S(B)y_t = a_t + (54/55)a_{t-1} + (53/55)a_{t-2} + ... + (44/55)a_{t-11}

appears not to be. Hence the great attraction of over-differencing and the choice of a model implying (incorrectly) a stochastic level. Note that sampling variation will obscure the fact that the ρ's, θ's of the model on x_{2,t} lie on the boundary of the invertibility region (i.e. the 'true' moving-average operator has a root on the unit circle). Moreover, the usual diagnostic checks, which are well known to be rather weak, frequently fail to highlight such problems.

It is usual and proper to examine the sample acf of all possible differences in BJ modelling. Thus, although the acf of x_{2,t} is clearly stationary, a model on x_{3,t} = ∇x_{2,t} might be considered. Examining this we see that

x_{3,t} = ∇ ∇_L y_t = [1 - (1 + ha)B - (1 - a - ha)B^L + (1 - a)B^{L+1}] a_t,

with autocovariances γ''_k such that γ''_k = 0 for 2 ≤ k ≤ L-2 and k > L+1, and such that γ''_1 < 0, γ''_{L+1} > 0, and also γ''_{L-1} > 0 (so long as V/S > h(1+h)) and γ''_L < 0 (so long as V/S > h/2). The exact expressions may be obtained by substituting W_1 = 0 in (3.11). So double differencing has resulted exactly in the sparse operators and acf of BJ modellers, and in fact in the 'popular' model (1.6). Continuing with the previous example (V/S = 20), then a model close to

x_{3,t} = ∇ ∇_{12} y_t = a_t - 1.02a_{t-1} - 0.78a_{t-12} + 0.80a_{t-13}

would be suggested. Due to the 'nice' acf pattern of x_{3,t} and the popularity of (1.6) we would suggest that a model of this form is quite likely to be chosen. So a double degree of over-differencing may be applied to give a model with the 'linear growth seasonal operator' ∇ ∇_L when we do not even have a stochastic or wandering level, let alone growth.

In support of these arguments some simulated series were generated. Figure 1 shows the moving average coefficients, theoretical acf, and sample acf of a particular realisation of length 120, for the case V/S = 100.
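The double-differencing algebra behind this example is a one-line polynomial multiplication. The sketch below is our own check (for V/S = 20, so a = 0.2): it multiplies the MA operator of (2.10) by (1 - B) and recovers the four-term 'linear growth seasonal' shape with vanishing intermediate coefficients.

```python
import numpy as np

L, a = 12, 0.2
h = 1.0 / (L - 1)

# MA operator of x2 in (2.10): 1 - h*a*(B + ... + B^{L-1}) - (1-a)*B^L.
ma2 = np.zeros(L + 1)
ma2[0] = 1.0
ma2[1:L] = -h * a
ma2[L] = -(1.0 - a)

# One further difference: multiply by (1 - B).
ma3 = np.convolve([1.0, -1.0], ma2)

# Only lags 1, L and L+1 survive: -(1+ha), -(1-a-ha), +(1-a),
# i.e. roughly 1 - 1.02B - 0.78B^12 + 0.80B^13.
```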

[Figure 1 here: moving average coefficients θ_i, theoretical autocorrelations ρ_k, and sample autocorrelations ρ̂_k.]

Fig. 1. Purely seasonal DLM (with V/S = 100).

3. Trend-seasonal models

3.1. The models

We now extend the previous models to situations requiring a stochastic trend component, and consider model identification in that context. The models we define are as follows.

A steady seasonal DLM. When a constant level is inappropriate but growth is absent then the following DLM is useful:

y_t = μ_t + η_{0,t} + ε_t,
μ_t = μ_{t-1} + δμ_t,
η_t = G η_{t-1} + H r_t.   (3.1)

Here μ_t represents the underlying (local) level of the series at time t and δμ_t ~ (0, W_1). G and H are as defined in Section 2.

A steady seasonal HW type model. With m_t as the estimate of the local level at time t, the corresponding HW type predictor is

ŷ_t(l) = m_t + s_{l,t},
m_t = m_{t-1} + A_1 e_t,
s_t = G s_{t-1} + A_0 H e_t.   (3.2)

A linear growth seasonal DLM. Extending (3.1) to include a local linear growth term β_t with random disturbance δβ_t ~ (0, W_2) we have

y_t = μ_t + η_{0,t} + ε_t,
μ_t = μ_{t-1} + β_t + δμ_t,
β_t = β_{t-1} + δβ_t,
η_t = G η_{t-1} + H r_t.   (3.3)

A linear growth seasonal HW type model. With b_t as the incremental growth estimate, we extend (3.2) to

ŷ_t(l) = m_t + b_t l + s_{l,t},
m_t = m_{t-1} + b_{t-1} + A_1 e_t,
b_t = b_{t-1} + A_2 e_t,
s_t = G s_{t-1} + A_0 H e_t.   (3.4)

Again these models as defined are time invariant and can be expressed in ARIMA form, with the coefficients being functions of the variance ratios (V/W_1, V/W_2, V/S) or the smoothing constants (A_1, A_2, A_0). However, for the DLMs these are the solutions of sets of simultaneous quadratic equations for which we do not have an analytic solution. Moreover, the DLMs are not now equivalent to the corresponding HW types, whose coefficients may be derived explicitly as linear functions of the smoothing constants. Details of these results are provided in [18,20], with tables giving some typical values of the ARIMA coefficients and boundary regions for the θ's and ρ's.

3.2. Identification and over-differencing of the steady seasonal models

The DLM (3.1) may be written as

x_{2,t} = ∇_L y_t = (1 - Σ_{i=1}^{L} θ_i B^i) a_t,   (3.5)

with the autocovariances of x_{2,t} being

γ_0 = 2V + LW_1 + (1+h)S,
γ_k = (L-k)W_1 - kh²S,  1 ≤ k ≤ L-1,
γ_L = -V,
γ_k = 0,  k > L.   (3.6)

Given a set of variance ratios these can be solved numerically to find the moving average operator, for which γ_0 = (1 + Σ_{i=1}^{L} θ_i²)σ_a² and γ_k = (-θ_k + Σ_i θ_i θ_{i+k})σ_a². For the HW type we have

x_{2,t} = ∇_L y_t = (1 - Σ_{i=1}^{L} λ_i B^i) e_t,   (3.7)

with

λ_i = λ = -A_1 + hA_0,  1 ≤ i ≤ L-1,  λ_L = 1 - A_1 - A_0,   (3.8)

and the autocovariances

γ_0 = [1 + (L-1)λ² + λ_L²] σ_e²,
γ_k = [-λ(1 - λ_L) + (L-k-1)λ²] σ_e²,  1 ≤ k ≤ L-1,
γ_L = -λ_L σ_e²,
γ_k = 0,  k > L.   (3.9)

For the DLM θ_1, ..., θ_L are not equal and there is not an exact equivalence with the HW type unless W_1 = 0 (reducing to the purely seasonal case) or S = 0 (reducing to the steady non-seasonal DLM and exponentially weighted moving average, or ARIMA(0, 1, 1), see [8]). The implied moving average operator is far from sparse for typical parameter values, as is the acf, which again displays a linear pattern, see Fig. 2. Thus, the sample acf of the correct derived series x_{2,t} will be seen as slow to damp (though not always large in value), the basic indication in BJ modelling of the need for further differencing (see Section 2.5.1). Applying the operator ∇ to the model (3.5) yields

x_{3,t} = ∇ ∇_L y_t = [1 - (1 + θ_1)B - Σ_{i=2}^{L} (θ_i - θ_{i-1})B^i + θ_L B^{L+1}] a_t.   (3.10)

For this model θ_i ≠ θ_{i-1}, 2 ≤ i ≤ L-1, and so θ'_i ≠ 0, 2 ≤ i ≤ L-1, though these are generally very small indeed. The autocovariances are

γ''_0 = 4V + 2W_1 + 2(1 + h + h²)S,
γ''_1 = -2V - (1+h)S,
γ''_k = 0,  2 ≤ k ≤ L-2,
γ''_{L-1} = V - h(1+h)S,
γ''_L = -(2V + W_1 - hS),
γ''_{L+1} = V,
γ''_k = 0,  k > L+1,   (3.11)

whilst for the HW type

x_{3,t} = ∇ ∇_L y_t = [1 - (1 - A_1 + hA_0)B - (1 - A_0 - hA_0)B^L + (1 - A_1 - A_0)B^{L+1}] e_t.

Hence, over-differencing results in a model which is exactly sparse in the HW type case and almost so for the DLM. The acf is sparse (i.e.
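As a quick illustration of (3.6), the sketch below (our own, with values chosen to match the Fig. 2 case V/W_1 = 10, V/S = 100) computes the acf of x_2 for the steady seasonal DLM; ρ_1 comes out near 0.34 and the ρ_k decline exactly linearly.

```python
import numpy as np

L, V = 12, 1.0
W1, S = V / 10.0, V / 100.0    # V/W1 = 10, V/S = 100
h = 1.0 / (L - 1)

# Autocovariances of x2 = (1 - B^L) y from (3.6), lags 0 and 1..L-1.
gamma0 = 2 * V + L * W1 + (1 + h) * S
k = np.arange(1, L)
gammak = (L - k) * W1 - k * h * h * S
rho = gammak / gamma0

# rho declines linearly: each successive difference is the constant
# -(W1 + h*h*S) / gamma0, the slow-to-damp pattern discussed in the text.
```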

[Figure 2 here: moving average coefficients θ_i, theoretical autocorrelations ρ_k, and sample autocorrelations ρ̂_k.]

Fig. 2. Steady seasonal DLM (with V/W_1 = 10, V/S = 100).

p~, = 0, 2 < k ~< L - 2) in both cases, with dominant autocorrelations at lags 1, L - 1 , L, L + 1, which would seem to suggest one of the BJ models (1.6) or (1.7). This is a p h e n o m e n o n which we feel sure occurs quite often in practice. For example, consider the D L M of Fig. 2 with V / W 1 = 10, V / S = 100. For the correct model on x2. ,, Oh. . . . . 0tl are all close to - 0 . 2 5 and the P~ decline linearly from a value of Pt = 0.34. However, for x3. , we find the intermediate 0's, 02 to 0it are between 0.0013 and 0.0040 and so the model is very close to V' X7~2}; = (1 - 0.741B - 0.901B t2 + 0.664BI3)a, = (1 - 0.84B)(1 - 0.90B12)a,,

Moving averagecoefficients 8

II

..........

T

and indeed the acf shown displays the distinctive pattern of this very c o m m o n , but often inappropriate, model. 3.3. The linear growth seasonal models For the D L M we have xs., = ~ 7 ~ L y , =

with the autocovariances Yo = 4 V + 2 W t + L W 2 + 2(1 + h + h 2 ) S ,

Fig. 3. L i n e a r g r o w t h s e a s o n a l D L M ( w i t h V/I,Vt = 1 0 ,

1 ) W2 - ( 1 + h ) S ,

Sample autocorrelations Ok

Theoretical autocorrelations Pk

1

O,B' a,, i-I

7, = - 2 V + ( L -

. ..... ,,

1-

IITTTTTT=,

T t T.***,.,.o

T

T

~

1

V / W 2 = 400, V / S = 100).

T

T

l

,t

. _

_

t

,t

1'

S.A. Roberts, P.J. Harrison / Parsimonious modelling and forecasting

374

"yk=(L-k)W2,

2<~k<~L-2,

Yr.-, = V + W 2 - h ( 1

+ h)S,

-(2v+ w, - h S ) ,

At = tt, + ~o.t + Co.t + ~t,

YL+I = V, ~k----0,

(so c0., = y,),

c,.,='y, c o s ~ p . i + ~ , s i n ~ p . i

k>L+l,

displaying a linear pattern as with the earlier models (see Fig. 3). For the HW type L+I

1-

x3. , = ~ ' V L y , =

~ X,B' a,,

(~,

k-sin4'

cos~k

~, 1 +

8~, ]"

i.e.

c , , = he, + 1.,- i + (cos Lk" a~,t + sin ~k" 8~, ).

and then the equivalent ARIMA model is

with X~

More generally still we might define a trendseasonal-cyclic model by extension of this such that

= 1 -

A I -

h r . = - A 2,

hA o,

A 2-

(1-0B)(1-B)

1+

Xt=I-A2-(1

+ h ) A 0,

~0'B' i--I

2 <~ k <~ L - 1 ,

L+3

×(1-0~B-02B2)y,

=

l - ,~.l O, B ) a ,

At.+l = - ( 1 - A 1 - A 0 ) , and with the autocovariances being easily derived. There is no equivalence between the two unless 14:1= W 2 = 0 or S = 0 . Here the models and acf's are not sparse but the intermediate values are generally quite small. It is for this case alone then that the usual ARIMA models can often provide a fairly close approximation. This is not always the case though and even here the intermediate 0's, p's can be far from zero. 3.4. Other models

For those who feel that our arguments are based on a limited range of models we would point out that the results apply equally well to many other similar models. We may consider different parametrisations of our models e.g. different structures for A, H or the inclusion of stationary autoregressive terms. Taking the latter we might in (3.3) write fl, = Off,_ ~ + aft,, ~, = o:G~,_ 1 + aT, with 0 < 0 , ~ < 1, thereby modelling a shrinking trend component and damped seasonality. The equivalent ARIMA model is then

(1 - o B ) ( 1 - B )

where 01 = 2X cos if, 02 = -X2. Clearly this contains a wide number of useful models as special cases, and the arguments developed in this paper apply equally well to the identification of these. We have confined attention to the previous models for ease of presentation and also because they are in fact the most important cases in commercial applications. We may note in passing that only when 0 = 00 does a stationary autoregressive term of the BJ form 1 - ~B t- (as in (1.11)) with cb = 00L appear. We can see no rational basis for the identification of models with such terms. Indeed ther use seems to stem from the same misconception that underlies the previously mentioned belief that 1 - B t- is the natural non-stationary seasonal operator. See [19] for an examination of such BJ models with stationary autoregressive terms.

4. Discussion

We detect two main reasons for the problems prevalent in BJ modelling. These concern
(a) the BJ model class,
(b) the BJ identification method.

4.1. Parsimony and BJ model formulation

We have seen that the sub-class of ARIMA models forming the basis of the BJ approach does not include the DLMs or HW types of this paper. Nonetheless, these latter models are all parsimonious. Their ARIMA coefficients are functions of a small number of variance ratios (or smoothing constants), but their operators are certainly not sparse. It must be recognised, therefore, that parsimony does not in any sense lead naturally to the class of BJ seasonal models. Indeed, these rest on a restricted view of parsimony and are far less natural than the models we propose. The latter possess (a) conceptual parsimony, through their reasoned structure and interpretation, and (b) parametric parsimony, in the wide sense that we have indicated.

Concerning the identification and estimation of seasonal models, one might question just how many genuine free parameters (what order of parametric parsimony) they should possess. Examination of our models indicates the presence of any or all of stochastic level, stochastic growth, and stochastic seasonality. Therefore, the moving average coefficients should typically be functions of at least one parameter (V/S) for a purely seasonal model, of two (V/W_1, V/S) for a steady seasonal, and of three (V/W_1, V/W_2, V/S) for a growth seasonal. These numbers are invariably sufficient, i.e. one parameter in place of L for the seasonal component is all that is necessary. However, to constrain a model at the identification stage to fewer parameters than this is dangerous. For example, consider the steady seasonal model (1.9), which we have seen a number of times in the literature. This is precisely equivalent to the HW type model (3.2) with A_0 = (L-1)A_1, and also happens to be the ARIMA form of an EWR steady seasonal model with α = β^L, where β is the discount factor.
Clearly, at the identification stage of modelling, the trend and seasonal components should not have their speeds of evolution constrained together in this way, especially not in such an extreme manner, where trend updating is markedly slower than seasonal updating when all experience indicates that the reverse is invariably more appropriate. In fact there is frequently a case for reducing the parameters to two in the growth seasonal case by constraining the level and growth parameters to yield an EWR trend model (Brown's linear growth model). However, such considerations (of conceptual parsimony allied to parametric parsimony) appear to have no place within the BJ approach, and the constraint of the BJ two-parameter multiplicative model (1.7) has no such rational basis.
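The EWR trend idea referred to here can be sketched as discounted least squares with a single discount factor β, so that level and growth adapt together. The code below is our own minimal illustration of that device (the function name and the test data are ours, not the paper's):

```python
import numpy as np

# Minimal sketch of exponentially weighted regression (EWR) for a linear
# trend: the current level m and growth b minimise the discounted sum of
# squares  sum_j beta^j * (y_{t-j} - (m - j*b))^2,  so one discount factor
# beta controls the adaptivity of both components (Brown's linear growth
# model), reducing the trend to a single free parameter.
def ewr_linear_trend(y, beta=0.9):
    y = np.asarray(y, dtype=float)
    n = len(y)
    age = np.arange(n - 1, -1, -1)           # age j of each observation; y[-1] has age 0
    w = beta ** age                          # discount weights
    X = np.column_stack([np.ones(n), -age])  # fitted value at age j is m - j*b
    WX = X * w[:, None]
    m, b = np.linalg.solve(X.T @ WX, X.T @ (w * y))
    return m, b

# On an exact straight line the current level and growth are recovered exactly.
y = 2.0 + 0.5 * np.arange(20)                # level 2 at t = 0, growth 0.5
m, b = ewr_linear_trend(y)
print(m, b)                                  # recovers level ~11.5 at t = 19 and growth ~0.5
```

The point of the constraint is visible in the signature: a single β replaces separate level and growth smoothing parameters.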

4.2. BJ model identification

BJ model identification is based largely on study of the sample autocorrelation function (acf). Although other supporting tools are used in stationary model identification, it is almost exclusively the acf pattern that is the basis for the choice of non-stationary operator. However, confining attention to autocorrelation structures is an inadequate approach to identifying a forecasting model. After all, the ultimate reason for the inclusion of, say, a second-order difference operator ∇^2 is to obtain a model with stochastic level and growth components which can be updated, i.e. to obtain an appropriate forecast function (the full nature of which is dictated by the general autoregressive operator φ(B)∇^2). The point is that the required forecast function should influence the choice of model. BJ [2, p. 168] argue that this "cannot be safely chosen by visual inspection of the time series itself", and with this we agree. However, we also feel strongly that neither can it be decided solely by inspection of the sample acf.

A similar observation can be made about the moving average operator, where the tendency seems to be to include terms at those lags having large sample autocorrelations. So, for example, if for ∇_L y_t it is found that the sample autocorrelation at lag L is dominant, then (1.9) will usually be identified. Yet, as we have indicated, such a model should never be identified, because of its implications for the various model components. Such considerations about the updating of trend and seasonal components do not presently play a part in ARIMA model selection. Clearly they should do. The present approach merely compounds the problems of using an inadequate model class in the first place, and the recommended diagnostic checks will not rescue the situation. This tendency also influences the choice of non-stationary operator. For example, Granger and Newbold [6, p. 102], in deciding upon the operator ∇∇_4 for a particular series (and ultimately the model (1.7)), comment that "differencing has finally produced an autocorrelation pattern that is easy to interpret". Of course, the acf's of the various possible 'differences' should be examined, as all are informative, but that with fewest non-zero autocorrelations is not necessarily the appropriate one, as we have demonstrated.

In general, a number of considerations should influence the choice of model:
(a) The evidence of the identification tools: sample acf, etc.
(b) Study of the data. This can indicate the likely nature of the forecast function required, and hence of the autoregressive structure required. Failure to consider this point frequently leads, for instance, to the choice of models implying growth when a moment's glance at the data would indicate that (on the basis of the available data alone!) this is not needed.
(c) Thought about the implied updating of the trend and seasonal components.
(d) The context and purpose of the analysis. The usual view of a priori ignorance concerning the model form is often unreasonable. The modeller frequently has access to information besides that contained in his data. For example, if the model is to track through and forecast future data, then beliefs about the possible future evolution of the series will be relevant to model choice, particularly when little historical data is available. In a sales forecasting situation, then, the available data might appear perfectly stationary in trend, but one would often have a high personal probability associated with future wandering of the level of sales. On the other hand, in process control a model with a static mean is more likely to prove satisfactory.

It was, of course, to cater for situations characterised by model uncertainty, shortages of data, and the prospect of future changes that multi-process modelling was developed by Harrison and Stevens [10,11]. The general importance of meaningful, structured models with which the user can communicate is also emphasised in Bayesian forecasting.
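The pitfall of reading non-stationary operators off the sample acf can be illustrated numerically. The sketch below is our own construction, not the authors': it first differences a series whose level is genuinely static, showing that the lag-1 autocorrelation of the differences sits near -0.5, the boundary of the invertibility region (a moving average root on the unit circle), and then verifies the factorisation 1 - B^L = (1 - B)(1 + B + ... + B^{L-1}) that makes 1 - B^L an over-difference of the seasonal operator:

```python
import numpy as np

# Sketch (ours, not the paper's). Differencing a series whose mean is in
# fact static yields an MA(1) with theta = 1 -- a root on the unit circle --
# whose lag-1 autocorrelation is exactly -0.5, the boundary of the
# invertibility region. In a sample acf this boundary value is blurred by
# sampling variation, so the over-differencing goes unnoticed.
rng = np.random.default_rng(42)
y = 10.0 + rng.standard_normal(50_000)   # static level plus noise
d = np.diff(y)                           # unnecessary differencing

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

print(acf(d, 1))                         # close to the theoretical -0.5

# The seasonal analogue: 1 - B^L factorises as (1 - B) * S_L(B) with
# S_L(B) = 1 + B + ... + B^(L-1), so applying 1 - B^L always sneaks in an
# extra (1 - B) on top of the genuinely seasonal operator S_L(B).
L = 4
print(np.convolve([1.0, -1.0], np.ones(L)))   # coefficients of 1 - B^4
```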

5. Conclusion

We have examined univariate seasonal modelling in the light of results concerning some important state-space models. In regarding these models as appropriate for many time series we have been led to criticism of current modelling practice. The main points are that:

(a) The models we have defined are all parsimonious. Moreover, they are distinct from the usual BJ seasonal ARIMA models, which are founded on a restricted conception of parametric parsimony, involve the use of sparse operators, and thereby imply sparse autocorrelation functions.

(b) Our models will not be identified by the BJ modeller. This is because he is anticipating models with sparse operators and acf's. Hence the distinctive autocorrelation patterns of our models will be obscured by sampling variation, and any attenuation in the sample acf misinterpreted as evidence of non-stationarity.

(c) Over-differencing often occurs in BJ modelling. The aforementioned misinterpretation frequently results in the application of over-differencing. In practice sampling variation will obscure the fact that the 'true' autocorrelations of the over-differenced series lie on the boundary of the invertibility region, i.e. correspond to a moving average operator with a root on the unit circle. This over-differencing effect may help to explain some of the occurrences of near-unit moving-average roots for seasonal models which have been reported, e.g. Prothero and Wallis [17, p. 479].

(d) The BJ model identification process is unsatisfactory. We suggest that if a sparse model (and acf) appears appropriate for ∇∇_L y_t, then it is most likely that over-differencing has occurred. Although an almost sparse model for ∇∇_L y_t can occur without over-differencing, even this cannot be decided by the acf alone. Study of the acf alone to determine the appropriate non-stationary operator, on the basis that a linear pattern suggests non-stationarity, is unreliable.

(e) S_L(B) = 1 + B + ... + B^{L-1} is the appropriate operator to account for seasonal non-stationarity. Compounding the over-differencing effect, the mistaken idea has arisen that 1 - B^L is the correct operator for this purpose. In fact the general seasonal ARIMA model would more properly be written (assuming a full set of harmonics is necessary to describe the seasonal pattern) as

φ(B)(1 - B)^r (1 + B + ... + B^{L-1}) y_t = θ(B) a_t,

and denoted ARIMA(p, r, s, q). In fact, more generally still, we should encompass non-stationary harmonic seasonal and cyclic representations by denoting the model as ARIMA(p, r, s, c, q), where


s = the number of complex roots on the unit circle at seasonal frequencies, and c = the number of such roots at non-seasonal frequencies.

References

[1] G.E.P. Box and G.M. Jenkins, Some recent advances in forecasting and control, Appl. Statist. 17 (1968) 91-109.
[2] G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and Control (Holden-Day, San Francisco, 1970).
[3] R.G. Brown, Smoothing, Forecasting and Prediction of Discrete Time Series (Prentice-Hall, Englewood Cliffs, 1963).
[4] C. Chatfield and D.L. Prothero, Box-Jenkins seasonal forecasting: Problems in a case study, J. Roy. Statist. Soc. Ser. A 136 (1973) 295-315.
[5] E.J. Godolphin and P.J. Harrison, Equivalence theorems for polynomial-projecting predictors, J. Roy. Statist. Soc. Ser. B 37 (1975) 205-215.
[6] C.W.J. Granger and P. Newbold, Forecasting Economic Time Series (Academic Press, New York, 1977).
[7] P.J. Harrison, Short term sales forecasting, Appl. Statist. 14 (1965) 102-139.
[8] P.J. Harrison, Exponential smoothing and short term sales forecasting, Management Sci. 13 (1967) 821-842.
[9] P.J. Harrison, Contribution to the discussion of "Box-Jenkins seasonal forecasting: Problems in a case study", by C. Chatfield and D.L. Prothero, J. Roy. Statist. Soc. Ser. A 136 (1973) 319-324.


[10] P.J. Harrison and C.F. Stevens, A Bayesian approach to short term forecasting, Operational Res. Quart. 22 (1971) 341-362.
[11] P.J. Harrison and C.F. Stevens, Bayesian forecasting, J. Roy. Statist. Soc. Ser. B 38 (1976) 205-228.
[12] C.C. Holt, Forecasting seasonals and trends by exponentially weighted moving averages, Carnegie Institute of Technology, Pittsburgh (1957).
[13] G.M. Jenkins, Practical experiences with modelling and forecasting time series, Gwilym Jenkins and Partners (1979).
[14] F.R. Johnston and P.J. Harrison, An application of forecasting in the alcoholic drinks industry, J. Operational Res. Soc. 31 (1980) 699-709.
[15] E. McKenzie, An analysis of general exponential smoothing, Operations Res. 24 (1976) 131-140.
[16] P. Newbold, The principles of the Box-Jenkins approach, Operational Res. Quart. 26 (1975) 397-412.
[17] D.L. Prothero and K.F. Wallis, Modelling macro-economic time series, J. Roy. Statist. Soc. Ser. A 139 (1976) 468-486.
[18] S.A. Roberts, A general class of Holt-Winters type forecasting models, Management Sci. 28 (8) (1982) 808-820.
[19] S.A. Roberts, The formulation of statistical forecasting models and Box-Jenkins modelling, submitted (1982).
[20] S.A. Roberts, Equivalences and validity regions for some important dynamic linear models, submitted (1982).
[21] P.R. Winters, Forecasting sales by exponentially weighted moving averages, Management Sci. 6 (1960) 324-342.