International Journal of Forecasting 22 (2006) 181 – 194 www.elsevier.com/locate/ijforecast
Modulated cycles, an approach to modelling periodic components from rapidly sampled data

Diego J. Pedregal a,*, Peter C. Young b,1

a Escuela Superior de Ingenieros Industriales, Universidad de Castilla-La Mancha, Avda. Camilo José Cela, s/n, 13071 Ciudad Real, Spain
b Centre of Research on Environmental Systems and Statistics (CRES), Environmental Science Division, Lancaster University, Lancaster LA1 4YQ, UK
Abstract

Unobserved components models provide a natural framework for the estimation and forecasting of periodic components embedded in time series, such as business cycles or seasonality. However, periodic behaviour can be complicated to analyse when dealing with rapidly sampled data of the kind encountered in electricity demand forecasting problems. Data of this nature tend to show a multiplicity of superimposed periodic patterns, including annual, weekly and daily cycles. In this paper, we present a new seasonal component model based on modulated periodic components, which is capable of replicating multiplicative periodic components in an efficient manner, in the sense that the number of parameters in the model is much lower than in a standard unobserved components model without modulation. The model performance compares favourably with standard techniques in a rolling forecasting exercise based on actual hourly electricity load demand data at a certain transformer in the UK.
© 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Unobserved components models; State space models; Periodicity; Seasonality; Kalman Filter; Fixed Interval Smoothing; Noise variance ratio; Modulation
* Corresponding author. Tel.: +34 926 295430; fax: +34 926 295361.
E-mail addresses: [email protected] (D.J. Pedregal), [email protected] (P.C. Young).
1 Tel.: +44 1524 593966; fax: +44 1524 593985.

1. Introduction

For many years, the problem of extracting periodic signals from noisy annual, monthly or quarterly time series has occupied the minds of mathematicians and
statisticians from a variety of disciplines. However, much of this research has concentrated on fairly straightforward problems where the periodicity is composed of only a few components, such as annual cycles in monthly or quarterly time series. Over the last few years, this situation has tended to change and there are more and more areas in business, the economy and the natural sciences where rapidly sampled data (e.g., 15 min, 30 min, hourly and 4-hourly) are available over long periods of time. Such data reveal diurnal, weekly, monthly and even annual
0169-2070/$ - see front matter © 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.ijforecast.2005.03.001
periodicities that give rise to complex frequency spectra, with many peaks at the fundamental and harmonic periods associated with these cycles. Such data are not easy to handle using the existing methods of signal extraction and seasonal adjustment, so there is a need for more comprehensive, well-formalised modelling strategies that are designed to cope with the complexities of these rapidly sampled time series.

There appear to be two main problems with rapidly sampled data. First, the length of the series necessary to achieve a sufficient representation of the data has to be very large compared with more standard data. Secondly, the number and complexity of the modes of behaviour present in the data are large and, consequently, the number of parameters necessary to achieve a reasonable representation of the series is too large to be handled by methods developed for more standard series (e.g., monthly). In order to proceed, therefore, some simplifying assumptions need to be made to make the analysis computationally feasible.

One typical example of such data are the time series associated with electricity demand, as discussed in later examples, where several years of half-hourly data are often available. These contain diurnal periodic components that interact with weekly and annual cycles, together with long run trends and influences from input variables, often in a nonlinear manner. A large number of different approaches to forecasting such demand series have already been suggested, ranging from earlier methods based on simple regression to optimal estimation techniques (Kalman filtering) and Artificial Neural Networks. Some typical examples are Abu-El-Magd and Sinha (1982); Bunn and Farmer (1985); Engle, Mustafa, and Rice (1992); Harvey and Koopman (1993); Irisarri, Windergren, and Yehsakul (1982); Bunn, Larsen, and Vlahos (1996); Darbellay and Slama (2000); Henley and Peirson (1998); Ramanathan, Engle, Granger, Vahid-Araghi, and Brace (1997); and Zhang and Dong (2001). Of all these approaches, the unobserved components (UC) methodology is one that seems very well suited to the purpose but has not yet been fully exploited. One complication is that several particular methodologies have been proposed within the UC framework. The best known in the economics literature are perhaps the following: X-12 ARIMA and its predecessors (Findley, Monsell, Bell,
Otto, & Chen, 1996 and references therein); deterministic optimisation methods proposed for signal extraction, of which one of the best known among economists is the 'regularisation' approach (e.g., Akaike, 1980; Hodrick & Prescott, 1980; Young & Pedregal, 1999); 'reduced form' approaches based on the decomposition of ARIMA models (see, e.g., Maravall & Gómez, 1998 and references therein); and the state space (SS) approach or 'structural approach', in which the models for the components are directly specified as a SS system (e.g., Harvey, 1989; West & Harrison, 1989; Young, 1984; Young, Pedregal, & Tych, 1999). The latter approach to UC modelling seems one of the most promising for the problem of modelling and forecasting complex periodic data, and some attempts have already been made in this regard, such as Harvey and Koopman's (1993) SHELF computer package.

In the present paper, we describe a state space-based UC approach that includes a new seasonal component model that is particularly capable of replicating multiplicative 'seasonal' components and handling rapidly sampled time series. The new seasonal component model and a new method of estimation take advantage of the spectral properties of the Dynamic Harmonic Regression (DHR) model (Young et al., 1999). The technique is illustrated with examples based on the modelling and forecasting of 4-hourly electricity demand data.
2. The unobserved components time series model

It is well known in the UC literature that there are many ways of decomposing a time series, given the inherent subjectivity in the definition of the components. Nevertheless, a very successful and widely accepted univariate version of the UC model is

$$y_t = T_t + C_t + S_t + e_t \tag{1}$$

where $y_t$ is the observed time series; $T_t$ is a trend or low-frequency component; $C_t$ is a damped cyclical component; $S_t$ is a seasonal component; and $e_t$ is a zero mean, serially uncorrelated white noise series with constant variance $\sigma^2$ (also known as the 'irregular' component). Eq. (1) is considered here as the observation equation in a discrete-time SS model which describes the stochastic evolution of state
variables associated with each of the UCs. The overall UC model is constructed by assembling together all of these individual SS forms of the components. It is appropriate, therefore, to consider first the specific form of the SS models for each of the stochastic components in Eq. (1).

2.1. Long-term trend models

There are a number of possibilities available in the literature for modelling stochastic trends. One general formulation is the Generalised Random Walk (GRW), which takes the form:

$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \eta_{1t} \\ \eta_{2t} \end{bmatrix}, \qquad T_t = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}. \tag{2}$$

Here, $a$, $b$ and $c$ are constant parameters; $T_t$ is the trend component (i.e., the first state, $x_{1t}$); $x_{2t}$ is a second state variable (generally known as the 'slope'); $\eta_{1t}$ and $\eta_{2t}$ are zero mean, serially uncorrelated white noise series with constant block diagonal covariance matrix $\mathbf{Q}$. This model includes, as special cases, the Random Walk (RW: $a = 1$; $b = c = 0$; $\eta_{2t} = 0$); the Smoothed Random Walk (SRW: $0 < a < 1$; $b = c = 1$; $\eta_{1t} = 0$); the Integrated Random Walk or Second-Order Random Walk (IRW: $a = b = c = 1$; and either $\eta_{1t} = 0$ or $\eta_{1t} = \eta_{2t}$); the Local Linear Trend (LLT: $a = b = c = 1$); and the Damped Trend (DT: $a = b = 1$; $0 < c < 1$). In the case of the IRW, $x_{1t}$ and $x_{2t}$ can be interpreted as level and slope variables associated with the variations of the trend, with the random disturbance entering only via the $x_{2t}$ equation.

2.2. Cyclical and seasonal component models

Although these two types of component are often considered separately, both can be treated in the same way from a modelling standpoint, since they both reflect periodic behaviour. The difference lies only in the period considered: longer than 1 year for cyclical components, with 'seasonal' normally reserved for the annual cycle. Many possibilities are available for modelling periodic components, but the primary ones that have proven particularly useful in practice are based either on the use of dummy
variables or the introduction of trigonometric terms (see, e.g., Harvey, 1989; Pedregal & Young, 2002 for further details). Previously, our preferred option has been to use a Dynamic Harmonic Regression (DHR) model, where the trigonometric terms occur directly in the UC observation equation (see Young et al., 1999). Here, however, we combine this with the alternative approach, where the trigonometric terms are included in the SS equations, to allow for 'modulated cycles'. By including the periodic behaviour in both the observation and the state equations, it is possible to generate a UC that accounts for different periods simultaneously. Consider, for example, combining the following trigonometrically defined state equation (see, e.g., Harvey, 1989; Pedregal & Young, 2002):
$$\begin{bmatrix} a_{it} \\ a^{*}_{it} \end{bmatrix} = \rho_i \begin{bmatrix} \cos\omega_i & \sin\omega_i \\ -\sin\omega_i & \cos\omega_i \end{bmatrix} \begin{bmatrix} a_{i,t-1} \\ a^{*}_{i,t-1} \end{bmatrix} + \begin{bmatrix} \eta_{it} \\ \eta^{*}_{it} \end{bmatrix} \tag{3a}$$
where $\rho_i$ is a damping factor and $\omega_i$ is the periodic frequency, with an observation equation that contains a periodic component $S_t$ of the DHR model form,

$$S_t = \sum_{i=1}^{R} \left[ a_{it}\cos(\omega_j t) + b_{it}\sin(\omega_j t) \right] \tag{3b}$$
where $\omega_j \neq \omega_i$ ($i = 1, 2, \ldots, R$) and the parameters $b_{it}$ ($i = 1, 2, \ldots, R$) are defined exactly as in Eq. (3a), with innovation processes independent of those on which the $a_{it}$ depend. We will refer to this model as the DHR Modulated (DHRM) model. This model introduces multiplicative cycles, since periodic functions of a given frequency $\omega_j$ in the observation Eq. (3b) are multiplied by parameters that are themselves periodic functions of a different set of frequencies $\omega_i$ ($i = 1, 2, \ldots, R$), defined by the state Eq. (3a). These are known as 'modulated signals' in the signal processing literature. The peculiarity and advantage of this new model is that the amplitude of the shortest period cycle is modulated by the longer one, so giving a different pattern for each realisation of the short period cycle. The importance of this is that the mixture of cycles made possible by the combination of Eqs. (3a) and (3b) reduces the dimension of the state vector. This, in turn, reduces the number of stochastic parameters associated with
UCs that need to be estimated, when compared with more standard formulations of the UC model. It is for this reason that the incorporation of modulated
cycles proves valuable in modelling rapidly sampled time series that have many interacting periodic components.
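As a concrete illustration of Section 2.1, the following minimal sketch (Python/NumPy; the function name and the numerical values used for the special cases are our own, and the noise structure would be specified separately through the covariance matrix Q) assembles the state-space matrices of the GRW trend in Eq. (2):

```python
import numpy as np

def grw_matrices(a=1.0, b=1.0, c=1.0):
    """State transition matrix and observation vector of the GRW trend, Eq. (2)."""
    F = np.array([[a,   b],
                  [0.0, c]])
    H = np.array([1.0, 0.0])        # the trend T_t is the first state, x_1t
    return F, H

# Special cases listed in Section 2.1 (noise variances are set through Q, not here):
F_rw,  _ = grw_matrices(a=1.0, b=0.0, c=0.0)    # Random Walk
F_irw, _ = grw_matrices(a=1.0, b=1.0, c=1.0)    # Integrated Random Walk / Local Linear Trend
F_srw, _ = grw_matrices(a=0.7, b=1.0, c=1.0)    # Smoothed Random Walk (0 < a < 1)
F_dt,  _ = grw_matrices(a=1.0, b=1.0, c=0.8)    # Damped Trend (0 < c < 1)
```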
3. Identification and estimation of modulated cycle models

Having defined SS structures for all the components of the UC model, it is straightforward to assemble them in an aggregate SS model described by the following state and observation equations (see, e.g., Harvey, 1989; Young et al., 1999):

State equation: $\mathbf{x}_t = \mathbf{F}\mathbf{x}_{t-1} + \boldsymbol{\eta}_t$

Observation equation: $y_t = \mathbf{H}_t \mathbf{x}_t + e_t$

where $\mathbf{x}_t$ is an $n$-dimensional state vector; $y_t$ is the scalar observed variable; $\boldsymbol{\eta}_t$ is an $n$-dimensional vector of zero mean, white noise inputs (system disturbances) with diagonal covariance matrix $\mathbf{Q}$; and $e_t$ is the white noise variable in Eq. (1), which is assumed to be independent of $\boldsymbol{\eta}_t$. $\mathbf{F}$ and $\mathbf{H}_t$ are, respectively, the $n \times n$ state transition matrix and the $1 \times n$ observation vector, which relates the state vector $\mathbf{x}_t$ to the scalar observation $y_t$.

In order to illustrate how the SS form is used to implement the DHRM model proposed in the previous section, a simple example is presented here. It consists of a signal of period 6 samples per cycle (s/c) that is modulated by a RW process and one cycle of period 42 s/c. The SS form for such a component is given in Eq. (4):

$$\begin{bmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \\ x_{4t} \\ x_{5t} \\ x_{6t} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & \cos\frac{2\pi}{42} & \sin\frac{2\pi}{42} & 0 & 0 \\ 0 & 0 & -\sin\frac{2\pi}{42} & \cos\frac{2\pi}{42} & 0 & 0 \\ 0 & 0 & 0 & 0 & \cos\frac{2\pi}{42} & \sin\frac{2\pi}{42} \\ 0 & 0 & 0 & 0 & -\sin\frac{2\pi}{42} & \cos\frac{2\pi}{42} \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \\ x_{5,t-1} \\ x_{6,t-1} \end{bmatrix} + \begin{bmatrix} \eta_{1t} \\ \eta_{2t} \\ \eta_{3t} \\ \eta_{4t} \\ \eta_{5t} \\ \eta_{6t} \end{bmatrix}$$

$$y_t = \begin{pmatrix} \cos\frac{2\pi t}{6} & \sin\frac{2\pi t}{6} & \cos\frac{2\pi t}{6} & 0 & \sin\frac{2\pi t}{6} & 0 \end{pmatrix} \mathbf{x}_t + e_t. \tag{4}$$
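The state-space form of Eq. (4) is straightforward to assemble and filter in code. The sketch below (Python/NumPy; the function names, prior settings and noise variances are ours and purely illustrative, not the authors' implementation) builds F and the time-varying observation vector H_t, and runs a minimal Kalman filter that also accumulates the 'prediction error decomposition' log-likelihood referred to in Section 3.1:

```python
import numpy as np

def rotation(period):
    """2x2 transition block of a trigonometric cycle with the given period (in samples)."""
    w = 2.0 * np.pi / period
    return np.array([[np.cos(w),  np.sin(w)],
                     [-np.sin(w), np.cos(w)]])

# Eq. (4): two RW states plus two period-42 cycles modulating a period-6 carrier.
F = np.zeros((6, 6))
F[0, 0] = F[1, 1] = 1.0            # RW modulating parameters
F[2:4, 2:4] = rotation(42.0)       # 42 s/c cycle on the cosine term
F[4:6, 4:6] = rotation(42.0)       # 42 s/c cycle on the sine term

def H_row(t, carrier=6.0):
    """Time-varying observation vector H_t of Eq. (4)."""
    w = 2.0 * np.pi / carrier
    c, s = np.cos(w * t), np.sin(w * t)
    return np.array([c, s, c, 0.0, s, 0.0])

def kalman_filter(y, F, Q, sigma2, P0=1e4):
    """Forward Kalman filter for the model above; returns filtered states and the
    prediction-error-decomposition log-likelihood (additive constants dropped)."""
    n = F.shape[0]
    x, P = np.zeros(n), np.eye(n) * P0       # crude, diffuse-like prior
    xf, loglik = np.zeros((len(y), n)), 0.0
    for t, yt in enumerate(y):
        x, P = F @ x, F @ P @ F.T + Q        # prediction step
        h = H_row(t)
        f = h @ P @ h + sigma2               # innovation variance
        v = yt - h @ x                       # innovation (one-step-ahead error)
        k = P @ h / f
        x, P = x + k * v, P - np.outer(k, h @ P)
        xf[t] = x
        loglik += -0.5 * (np.log(f) + v**2 / f)
    return xf, loglik
```

In practice, Q and the irregular variance would be fixed from the estimated NVR hyper-parameters described in Section 3.4 rather than chosen by hand.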
Given this form for the overall model, the Kalman Filter (Kalman, 1960) and the associated recursive Fixed Interval Smoothing algorithm (Bryson & Ho, 1969; Harvey, 1989; Young, 1984) provide the basis for estimating the state of the SS model and, therefore, the components in the UC model.

3.1. Estimation of hyper-parameters

Before the recursive algorithms can be utilised, the variances of all the stochastic noise inputs present in the model (often called hyper-parameters) must be known or estimated in some way. It seems clear that the problems of identification and subsequent parameter estimation for the complete state space form are non-trivial. The usual way to deal with the problem is to formulate it in maximum likelihood (ML) terms (e.g., Harvey, 1989). Assuming that all the disturbances in the state space form are normally distributed, the ML function can be computed using the Kalman Filter via 'prediction error decomposition' and then used in the numerical optimisation of the hyper-parameters. This is the generally accepted method, because of its well-known theoretical basis. However, the optimisation can be very complex, even for relatively simple models, due to the flatness of the likelihood surface around the optimum (Young et al., 1999). An alternative approach that does not suffer from these
problems is to use an estimation procedure based on the spectral properties of the model. A full description of this algorithm can be found in Young et al. (1999). The parameters are estimated so that the logarithm of the model spectrum fits the logarithm of the empirical pseudo-spectrum (either an AR-spectrum or periodogram) in a least squares sense. In order to consider this in more detail, let us consider a full DHRM model given by

$$y_t = \sum_{j=1}^{m} \left[ \sum_{i=1}^{R} a_{ijt}\cos(\omega_j t) + b_{ijt}\sin(\omega_j t) \right] + e_t$$
where some of the parameters $a_{ijt}$, $b_{ijt}$ are modelled as trigonometric cycles (of an associated frequency $\omega_i$) and the rest are modelled as any process within the GRW class. Note that the difference between this equation and Eq. (3b) is that a second sum operator now runs over the frequencies in the sine and cosine terms. The model spectrum can then be synthesised in two steps: (a) derivation of the spectrum of the TVP models and (b) derivation of the spectrum of the sinusoidal components modulated by the TVPs.

3.2. Spectra of TVP models

In order to derive the spectrum of the TVP models, it is necessary first to obtain the Transfer Function (TF) form (or 'reduced form') of the SS description. For example, in the case of the trigonometric cycle for a frequency $\omega_i$, the TF form is

$$y_t = \frac{1 + aB}{1 + bB + cB^2}\,\eta_{it} + e_t$$

with $a = \rho\{\sin(\omega_i) - \cos(\omega_i)\}$, $b = -2\rho\cos(\omega_i)$ and $c = \rho^2$. The pseudo-spectrum of this process is

$$f_y(\omega) = \frac{1}{2\pi}\left[\frac{1 + a^2 + 2a\cos(\omega)}{1 + b^2 + c^2 + 2b(1+c)\cos(\omega) + 2c\cos(2\omega)}\,\sigma^2_{\eta_i} + \sigma^2\right], \qquad \omega \in [0, \pi].$$
This is a function in x with a peak at x i . The spectrum of any model within the GRW family can be derived in a similar manner (see Young et al., 1999 for further details). 3.3. Spectra of the DHR model with periodic modulated parameters Taking advantage of some basic Fourier transform properties, it is possible to show that the pseudo-spectrum of a single DHR component of the form s t = a it cos(x j t)+ b it sin(x j t), in which the parameters’ variations are governed by a trigonometric equation cycle for a frequency x i is (
1 þ a2 þ 2acosðx xj Þ r2 1 þ þ þ 2bð1 þ cÞcosðx xj Þ þ 2ccosð2x 2xj Þ gi ) 1 þ a2 þ 2acosðx þ xj Þ r2 þ 1 þ b2 þ c2 þ 2bð1 þ cÞcosðx þ xj Þ þ 2ccosð2x þ 2xj Þ gj
1 Sij ðxÞ ¼ 2p
b2
c2
where a, b and c have the same definitions as before. Beware that the frequency x i is implicit in the definition of constants a, b and c. S ij (x) is then a function of x with two peaks at frequencies x j x i and x j +x i , and with a trough between them centred at frequency x j . Similar expressions can be found for any of the trend and seasonal/ cyclical components of the previous sections (see, e.g., Young et al., 1999).
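These pseudo-spectra are simple to evaluate numerically. The following sketch (Python/NumPy; the function names are ours, and the constants a, b and c follow the reconstruction given above) computes S_ij(ω) for a daily carrier (6 s/c) modulated by the weekly fundamental (42 s/c) and locates its two peaks:

```python
import numpy as np

def abc(rho, w_i):
    """Constants of the reduced (TF) form of a damped trigonometric cycle."""
    return rho * (np.sin(w_i) - np.cos(w_i)), -2.0 * rho * np.cos(w_i), rho ** 2

def arma21_spectrum(w, a, b, c, var_eta):
    """Pseudo-spectrum of (1 + aB)/(1 + bB + cB^2) driven by noise of variance var_eta."""
    num = 1 + a**2 + 2*a*np.cos(w)
    den = 1 + b**2 + c**2 + 2*b*(1 + c)*np.cos(w) + 2*c*np.cos(2*w)
    return num / den * var_eta / (2*np.pi)

def S_ij(w, w_i, w_j, rho, var_eta):
    """Spectrum of a DHR term at carrier frequency w_j whose TVP follows a
    trigonometric cycle at frequency w_i: twin peaks at w_j - w_i and w_j + w_i."""
    a, b, c = abc(rho, w_i)
    return (arma21_spectrum(w - w_j, a, b, c, var_eta)
            + arma21_spectrum(w + w_j, a, b, c, var_eta))

# Example: daily carrier (6 s/c) modulated by the weekly fundamental (42 s/c).
w = np.linspace(1e-3, np.pi, 2000)
spec = S_ij(w, w_i=2*np.pi/42, w_j=2*np.pi/6, rho=0.99, var_eta=1e-3)
interior_max = (spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:])
peaks = w[np.r_[False, interior_max, False]]
print(2*np.pi / peaks)   # dominant peaks close to periods 42*6/48 = 5.25 and 42*6/36 = 7 s/c
```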
Fig. 1 shows three of these functions, namely the fundamental period of a daily cycle in 4-hourly data, i.e., 6 samples per cycle (s/c), modulated by three harmonics of a weekly cycle (42, 21 and 14 s/c). In other words, the SS form of such a model would be composed of an observation equation that is a DHR term (one sine and one cosine) of period 6 s/c, modulated by three trigonometric cycles of periods 42, 21 and 14, respectively. Each term produces two peaks in the spectrum, symmetrical with respect to the daily period. The longer the period, the closer it is to the daily peak. The distance between the peaks is constant in period, in this case 42 s/c. Note that in order to incorporate the central daily peak at 6 s/c in the model, an additional, standard DHR term for that period modulated by a GRW parameter would need to be included.

3.4. Full estimation algorithm in the frequency domain

The spectrum of the DHRM model is formed by the addition of the spectra of the individual terms (fundamental frequency and harmonics) for all the components, and it can also be expressed in terms of the noise variance ratio (NVR) parameters $\mathrm{NVR}_{ij} = \sigma^2_{ij}/\sigma^2$, i.e.,

$$f_y(\omega; \sigma^2) = \sum_{j=1}^{m}\sum_{i=1}^{R} \sigma^2_{ij}\, S_{ij}(\omega) + \sigma^2, \qquad \text{or} \qquad f_y(\omega; \mathrm{NVR}) = \sum_{j=1}^{m}\sum_{i=1}^{R} \mathrm{NVR}_{ij}\, S_{ij}(\omega) + 1.$$
The problem then is to find the set of NVR parameters (and any other hyper-parameters present in the model, such as the $a$ parameters in the SRW model) which yield the optimal least squares fit to the empirically estimated spectrum. Although a linear least-squares fit is the most obvious, Young et al. (1999) show that substantial advantages can be found when the following nonlinear objective function is used:

$$I\left(f_y, \hat{f}^{*}_y\right) = \sum_{k=0}^{T-1}\left[\log f_y(\omega_k) - \log \hat{f}^{*}_y(\omega_k; \mathrm{NVR})\right]^2.$$
Here, $f_y(\omega_k)$ is either the sample periodogram or the AR spectrum of the time series, with the AR order in the latter case identified by the Akaike Information Criterion (AIC). Using the log transformed spectra yields much better defined NVR estimates, since it concentrates attention on the shape of the 'shoulders' associated with the harmonic peaks in the AR spectrum. The linear solution is used as an initialisation for this nonlinear optimisation, which is computationally very fast.
Fig. 1. Typical spectra of a daily DHR term of 4-hourly data (6 samples/cycle), modulated by a weekly trigonometric cycle plus two of its harmonics (42, 21 and 14 samples/cycle).
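A sketch of the NVR fitting step of Section 3.4 is given below (Python/SciPy; the helper names, the use of scipy.optimize.least_squares and the log(NVR) parameterisation that keeps the NVRs positive are our own choices, not necessarily those of the original algorithm). It takes the empirical log pseudo-spectrum and the unit-NVR component spectra S_ij(ω_k), obtains a linear solution as an initial value, and then solves the nonlinear log-spectral least squares problem:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_nvr(log_f_emp, S, w_k):
    """Fit NVRs so that log(sum_i NVR_i * S_i(w_k) + 1) matches the empirical log-spectrum.

    log_f_emp : empirical log pseudo-spectrum at frequencies w_k (periodogram or AR spectrum)
    S         : list of functions, S[i](w) = spectrum of the i-th component for unit NVR
    """
    Smat = np.column_stack([Si(w_k) for Si in S])          # T x n_components

    # Linear solution (fit on the raw scale), used only as an initial value.
    nvr0, *_ = np.linalg.lstsq(Smat, np.exp(log_f_emp) - 1.0, rcond=None)
    nvr0 = np.clip(nvr0, 1e-10, None)

    def residuals(log_nvr):
        model = Smat @ np.exp(log_nvr) + 1.0               # f_y(w; NVR), normalised form
        return np.log(model) - log_f_emp

    sol = least_squares(residuals, x0=np.log(nvr0))        # nonlinear fit in log(NVR) space
    return np.exp(sol.x)
```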
4. Empirical examples
In this section, the capabilities and flexibility of the methodology described in the previous sections are illustrated by applying it to electricity demand data from a U.K. electricity supplier. Fig. 2 shows 4 weeks of 4-hourly electricity demand for two seasons of the year. The data reveal all the typical features commonly found in most energy demand series: by eye, it is easy to recognise strong daily and weekly patterns. The daily pattern is different for some days (especially weekends) and it also changes according to the season of the year. Other features, not shown in the figure, are also present in the series, such as an annual cycle (due mainly to climate effects), as well as outliers, bank and public holiday effects and missing data.

The time series in Fig. 2 have some peculiarities. The daily patterns only differ clearly between two long seasons rather than four (autumn-winter and spring-summer), and the typical evening peak in winter is not very clear (except at weekends) because the series is only sampled every 4 h. However, it can be seen that the demand is sustained in the winter evenings, contrasting with the rapid decline in summer evenings. A second peculiarity is that the nonlinear 'air-conditioning' effect, common to many countries, is not present, due to the relatively moderate temperatures during the U.K. summer.

4.1. Specification of the model
The most obvious univariate UC model of the data in Fig. 2 is of the form

$$y_t = T_t + A_t + W_t + D_t + e_t \tag{5}$$

where $y_t$ is the observed 4-hourly electricity demand; $T_t$ is a trend or low-frequency component; $A_t$ is an annual cycle; $W_t$ is a weekly component; $D_t$ is a daily component; and $e_t$ is the (irregular) white noise component. Depending on the objectives of the study and the availability of data, some a priori simplifications might be sensible. For example, the estimation of a separate annual cycle is essential when long-term forecasts (at least several months ahead) are required and several years of data are available. However, if only a few years of data are available, as here, then the estimation of the annual cycle may be inaccurate and forecasts based on such a model would be dangerous. Even in the case where there are many years of data but the objective is to forecast in the short term (up to 1 week), the approximation of the trend plus the annual cycle by a simple trend may be quite sensible, producing good practical results. In the present study, only 2 years of data were made available to us and the objective is, therefore, to forecast the series up to 1 week ahead.
Fig. 2. Typical 4 weeks of 4-hourly electricity demand data (MW) for a U.K. electricity supply company in summer (top plot) and winter (bottom plot).
Given the possible inaccuracy in the estimation of the annual component in these conditions, it was eliminated from the UC model (5), its effects being absorbed within the trend component.

Fig. 3 illustrates the identification stage in the frequency domain. The AIC-identified AR(48) spectrum of the demand series (solid line) is clearly complex, but it reveals peaks of similar height disposed symmetrically around all the daily periods (6, 3 and 2 s/c), although with some distortion due to the interference of the long spectral effect of the trend. This is precisely the type of spectral behaviour typical of modulated components, as discussed in the previous section. The DHRM model was based initially on the characteristics of this spectrum, but was finally selected on the basis of the usual statistical criteria (error diagnostics, forecasting accuracy, etc.), as well as ensuring that the model could be interpreted well in 'structural' or 'physical' terms. In relation to the latter requirement, it is generally accepted that models with a clear physical meaning are preferable to 'black-box' models, and physical meaning (i.e., interpretability of the estimated components) is a primary objective of our approach. Due to the inherent subjectivity in UC modelling, however, many specifications are possible. Since the distance between all the peaks in the spectrum and their neighbours is always 42 s/c in period (see Fig. 3), many different DHRM parameterisations are possible, but not all of them are equally meaningful. The option selected may thus depend on the objectives of the research.
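For readers who wish to reproduce the identification stage, an AIC-identified AR spectrum of the kind shown in Fig. 3 can be computed along the following lines (Python/NumPy sketch using Yule-Walker estimation; the function names are ours, and details such as the exact AIC definition may differ from the software actually used by the authors):

```python
import numpy as np

def yule_walker(y, p):
    """Yule-Walker AR(p) coefficients and innovation variance."""
    y = np.asarray(y, float) - np.mean(y)
    N = len(y)
    r = np.array([y[:N-k] @ y[k:] for k in range(p + 1)]) / N      # autocovariances
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:p+1])
    sigma2 = r[0] - phi @ r[1:p+1]
    return phi, sigma2

def ar_spectrum_aic(y, max_order=60, n_freq=512):
    """Select the AR order by AIC and return (frequencies, AR spectrum)."""
    N, best = len(y), None
    for p in range(1, max_order + 1):
        phi, s2 = yule_walker(y, p)
        aic = N * np.log(s2) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, phi, s2)
    _, phi, s2 = best
    w = np.linspace(1e-3, np.pi, n_freq)
    # f(w) = s2 / (2*pi) / |1 - sum_k phi_k exp(-i k w)|^2
    e = np.exp(-1j * np.outer(w, np.arange(1, len(phi) + 1)))
    return w, s2 / (2 * np.pi) / np.abs(1.0 - e @ phi) ** 2
```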
A reasonable DHRM model for this application is composed of the following unobserved components:

- Trend component, considered as an IRW model.
- Standard weekly component, based on the fundamental frequency (42 s/c) and two of its harmonics (21 and 14 s/c), modulated by RW TVPs.
- Daily component, composed of the addition of two subcomponents:
  - a standard daily subcomponent, based on the fundamental frequency (6 s/c) and its harmonics (3 and 2 s/c), modulated by RW TVPs;
  - a modulated daily subcomponent, based on the daily frequencies (observation equations) modulated by the weekly frequencies (state equations).
In the above model, all the empirically identified spectral peaks appearing in Fig. 3 are recovered by some peak of the model spectrum.

4.2. Simulation example

A simulation example is considered in this subsection, before we proceed to the analysis of the real data. This demonstrates how a model such as that proposed above can provide a good representation of demand data and can be estimated well by the approach discussed in the previous sections.
Fig. 3. Empirical AR(48) spectrum of the electricity demand series (solid) and fit of the DHRM model that includes modulated cycles for the electricity load demand data (dashed).
The simulation model consists of 12 full weeks of 4-hourly data (504 samples) generated by the following UC model:

$$y_t = T_t + W_t + D_t + e_t$$
$$\nabla^2 T_t = \eta_{1t}, \qquad \eta_{1t} \sim N(0, 10^{-6}), \qquad e_t \sim N(0, 0.64)$$
$$W_t = a_{2t} S_{42,t} + a_{3t} S_{21,t} + a_{4t} S_{14,t}$$
$$M_t = C_{42,t} + C_{21,t} + C_{14,t}$$
$$D_t = M_t S_{6,t} + S_{3,t} + S_{2,t} + a_{5t} S_{6,t} + a_{6t} S_{3,t} + a_{7t} S_{2,t}$$

where all the $a_{jt}$ are time-varying parameters simulated as RWs, i.e., $\nabla a_{jt} = \eta_{jt}$ with $\eta_{jt} \sim N(0, 10^{-6})$ for $j = 2, 3, \ldots, 7$, and

$$S_{k,t} = \cos\left(\frac{2\pi t}{k}\right) + \sin\left(\frac{2\pi t}{k}\right), \qquad k = 2, 3, 6, 14, 21, 42$$

$$\begin{bmatrix} C_{l,t} \\ C^{*}_{l,t} \end{bmatrix} = \begin{bmatrix} \cos\frac{2\pi}{l} & \sin\frac{2\pi}{l} \\ -\sin\frac{2\pi}{l} & \cos\frac{2\pi}{l} \end{bmatrix} \begin{bmatrix} C_{l,t-1} \\ C^{*}_{l,t-1} \end{bmatrix} + \begin{bmatrix} \eta_{lt} \\ \eta^{*}_{lt} \end{bmatrix}, \qquad l = 42, 21, 14$$

with $\eta_{42,t} \sim N(0, 2\times10^{-5})$, $\eta_{21,t} \sim N(0, 5\times10^{-6})$, $\eta_{14,t} \sim N(0, 5\times10^{-6})$, $\eta^{*}_{42,t} \sim N(0, 2\times10^{-5})$, $\eta^{*}_{21,t} \sim N(0, 5\times10^{-6})$ and $\eta^{*}_{14,t} \sim N(0, 5\times10^{-6})$. All the noises involved in the simulation are independent of each other.
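A sketch of how such a series can be generated is given below (Python/NumPy; the code is ours and simply transcribes the equations above with the stated variances; the base demand level and the non-zero initial values of the TVPs and cycles are our own assumptions, since the paper does not state the initial conditions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 504                                      # 12 weeks of 4-hourly data
t = np.arange(T)

def S(k):                                    # harmonic terms S_{k,t}
    return np.cos(2*np.pi*t/k) + np.sin(2*np.pi*t/k)

def rw(var, a0=1.0):                         # random walk TVP a_{jt} (a0 is assumed)
    return a0 + np.cumsum(rng.normal(0.0, np.sqrt(var), T))

def trig_cycle(l, var, c0=1.0):              # stochastic cycle C_{l,t} (c0 is assumed)
    w = 2*np.pi/l
    R = np.array([[np.cos(w), np.sin(w)], [-np.sin(w), np.cos(w)]])
    x, out = np.array([c0, 0.0]), np.zeros(T)
    for i in range(T):
        x = R @ x + rng.normal(0.0, np.sqrt(var), 2)
        out[i] = x[0]
    return out

trend = np.cumsum(np.cumsum(rng.normal(0.0, 1e-3, T)))      # IRW trend, variance 1e-6
weekly = rw(1e-6)*S(42) + rw(1e-6)*S(21) + rw(1e-6)*S(14)
M = trig_cycle(42, 2e-5) + trig_cycle(21, 5e-6) + trig_cycle(14, 5e-6)
daily = M*S(6) + S(3) + S(2) + rw(1e-6)*S(6) + rw(1e-6)*S(3) + rw(1e-6)*S(2)
y = 100.0 + trend + weekly + daily + rng.normal(0.0, 0.8, T)  # e_t ~ N(0, 0.64); base level assumed
```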
In other words, the simulation includes all the elements discussed previously: an IRW trend; an additive weekly component; a daily component composed of the additive part usual in DHR models plus a weekly modulated part; and a white noise irregular component. The simulated series is shown in Fig. 4. It is clear that the series closely resembles the properties of the real data shown in Fig. 2, and the modulation effect is clear in the daily component. Fig. 5 shows the differences between the true and estimated components, with confidence bands added to the plots in order to facilitate a statistical assessment. It is clear that the estimation procedure has done a good job, with all the estimated components, without exception, lying inside the estimated confidence bands.

4.3. Estimation of the model for the electricity load demand data
When we consider the real demand data, the estimation stage in the frequency domain presents no problems at all, despite the number of parameters involved, mainly because the peaks in the spectrum are so well defined. The results of this estimation procedure are presented in Table 1, and Fig. 3 shows the fit of the model to the empirical AR(48) spectrum. The estimated components for 4 weeks of data are shown in Fig. 6.
Fig. 4. Twelve full weeks of 4-hourly data from a simulation of a DHR model including modulated daily cycles (DHRM).
Fig. 5. DHRM modelling errors: differences between each of the true and estimated components (trend, weekly and daily), with the 95% confidence band, and the estimated irregular component shown in the lower right panel.
All of them possess the properties assumed: the trend evolves smoothly in time; the daily component recovers the differences between days; and the weekly component is what would be expected. Finally, the irregular component is small compared with the other components, as required for good modelling.

In order to compare these results with those obtained by another method, an extension of the Box-Jenkins (ARIMA) 'airline model' for 4-hourly data, with three multiplicative polynomials, was also estimated and evaluated. This ARIMA model was one option used by the electricity company in order to produce hourly forecasts of the demand. For 4-hourly data, the model proved unsuitable in many respects, with the autocorrelation coefficients of the residuals significantly different from zero for up to four lags (16 h). In order to ensure that the best ARIMA model was used for the comparative study, considerable effort was made to evaluate various other ARIMA model forms. The best fitting and forecasting results were finally obtained with the following model:

$$(1 - B)(1 - B^{6})(1 - B^{42})\,\phi(B)\,y_t = (1 + \theta_6 B^{6})(1 + \theta_{42} B^{42})\,a_t$$

with $\phi(B) = 1 + \phi_1 B + \phi_2 B^2 + \phi_3 B^3 + \phi_4 B^4$. Here the periodic differencing is aimed at explaining the
diurnal (6 × 4 = 24 h period) and weekly (42 × 4 = 168 h) cycles.

Compared with the results obtained for the above ARIMA model, the DHRM model is superior with respect to all measures (see Tables 1 and 2): the innovations (one-step-ahead prediction error) variance and the likelihood measures are all much better. One possible reason for this superiority is that the number of parameters in the DHRM model is much higher, so it is able to describe the data better. In this respect, readers may question the DHRM model because of the possibility of overfitting, non-parsimony and similar considerations. However, these considerations are strictly applicable to nested models: when applied to non-nested models (as in our case, DHRM vs. ARIMA), the evaluation of models should be based on much more general considerations. The most important of these are how well the parameters are estimated, the complexity of the estimation procedure used in the modelling, and the eventual forecasting performance. In our case, all the hyper-parameters are estimated clearly and well, while the whole procedure of identification and estimation of DHRM models in the frequency domain is transparent and clear.
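For reference, the periodic differencing in this ARIMA specification can be applied directly, as in the sketch below (Python; variable names are ours, `y` is a placeholder for the demand series, and the final fit is only an approximation because it uses additive MA terms at lags 6 and 42 in place of the multiplicative MA polynomial of the model above):

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

def seasonal_diff(x, lag):
    """Apply (1 - B^lag) to a series."""
    return x[lag:] - x[:-lag]

y = np.random.default_rng(1).normal(size=2184)   # placeholder: roughly one year of 4-hourly data

# (1 - B)(1 - B^6)(1 - B^42) y_t : removes the trend and the diurnal (6 x 4 = 24 h)
# and weekly (42 x 4 = 168 h) cycles before the ARMA structure is fitted.
z = seasonal_diff(seasonal_diff(seasonal_diff(y, 1), 6), 42)

# Approximation of the remaining ARMA part: AR(4) with MA terms at lags 6 and 42
# (additive, not the multiplicative (1 + theta_6 B^6)(1 + theta_42 B^42) polynomial).
fit = SARIMAX(z, order=(4, 0, [6, 42]), trend='n').fit(disp=False)
print(fit.params)
```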
Table 1
DHRM model output

Period (observation Eq.)   Period (state Eq.) or state model   NVR           Score   S.E. of score
Trend                      IRW                                 1.3843e-004   3.858   0.200
42.00                      RW                                  6.2301e-002   1.205   0.087
21.00                      RW                                  1.7310e-002   1.761   0.113
14.00                      RW                                  6.7668e-003   2.169   0.122
6.00                       IRW                                 1.3456e-007   6.871   0.197
6.00                       42.0                                4.6470e-003   2.332   0.091
6.00                       21.0                                3.9814e-003   2.400   0.100
6.00                       14.0                                2.7754e-003   2.556   0.111
3.00                       IRW                                 1.5411e-007   6.812   0.260
3.00                       42.0                                1.1837e-003   2.926   0.125
3.00                       21.0                                1.4407e-004   3.841   0.186
3.00                       14.0                                1.2193e-004   3.913   0.229
2.00                       IRW                                 5.5833e-008   7.253   0.410
2.00                       42.0                                3.1158e-004   3.506   0.203
2.00                       21.0                                6.8477e-005   4.164   0.265
2.00                       14.0                                1.9207e-005   4.716   0.526

Innovations variance: 0.148    Log-likelihood: 230.349    Frequency domain function: 40.552
For example, Fig. 3 shows that all the components modelled are present in the series and that the objective function is very well defined (despite the number of parameters). This is because of the clear definition of the peaks in the empirical AR spectrum and the fact that all the harmonics are orthogonal functions concentrated on a particular frequency band. As a result, the nonlinear optimisation routine converges very rapidly, in a time comparable with that taken to estimate the ARIMA model. Finally, as we shall see, the forecasting performance of the model is very good indeed. All this evidence suggests that the DHRM model in Table 1 is an adequate representation of the data and is not affected by overfitting and related problems. Actually, the opposite appears to be the case: the ARIMA formulation seems too parsimonious, without sufficient flexibility to explain the data.

Before considering the forecasting results, it is worth emphasising why we have utilised our frequency domain optimisation approach rather than
the more common ML method. As pointed out previously, it is well known that ML optimisation can be problematic when the model is characterised by a large number of parameters. For example, this is probably why, in some UC models that use ML as a method of estimation, only a single parameter is estimated for the seasonal component (i.e., all the harmonics have the same value of the NVR hyper-parameter; see, e.g., Harvey, 1989 or West & Harrison, 1989). Clearly, this might not be a very good strategy when the significance and character of the separate harmonics are different, as shown by the different height and width of the spectral peaks in Fig. 3. Certainly, our experience with ML optimisation in UC models is that ML estimation becomes rather cumbersome if more than one hyper-parameter is used for the harmonics, and often convergence is not possible (see Young et al., 1999). Similarly, the selection of different NVR values for individual harmonics using a Bayesian approach (e.g., West & Harrison, 1989) could be rather complicated because of the number of parameters involved.

To conclude, the DHRM model is apparently much more complex and flexible than its ARIMA counterpart. The estimation procedure proposed in the frequency domain is comparable (in computation time) with the ARIMA model. And finally, there are advantages in likelihood terms, even when the objective function is not ML, because our frequency domain method of optimisation does not suffer from the problems of ML optimisation in this complex UC model situation. These conclusions have been systematically confirmed with a large number of simulation and practical examples in a wide range of different contexts over the past few years.

4.4. Forecasting results

In order to evaluate the forecasting performance of the model, a lengthy and comprehensive experiment was carried out. The model was estimated using a rectangular window of 20 weeks, moving forward 1 day at each iteration and producing forecasts up to 1 week ahead based on these estimates. This procedure was repeated for 1 year of data. Note that restricting the length of the
Fig. 6. DHRM estimated components (series and trend, daily component, weekly component and irregular component, in MW) for a sample of 4 weeks of electricity demand.
window to 20 weeks is only for the purpose of reducing the computational requirements of this thorough evaluation. However, not much is lost in this regard, because the longest period of the components in the model is 1 week, so each component is estimated on the basis of twenty full cycles of this length.

Table 3 reports some results for the 1-day-ahead forecast errors across the whole sample considered. It shows the minimum, mean and maximum values of the MAPE for each day of the week.
Table 2
ARIMA estimation results (time domain, ML)

           Parameter   S.E.
φ1         0.079       0.029
φ2         0.299       0.029
φ3         0.149       0.029
φ4         0.163       0.030
θ6         -0.871      0.052
θ42        -0.878      0.051

Innovations variance: 0.256    Log-likelihood: 123.011    Frequency domain function: 1026.696
The behaviour of the errors across days is quite uniform, although the errors are somewhat higher on Saturdays and lower on Wednesdays.

Fig. 7 shows a forecast comparison along the sample, based on the MAPE of three models: a naïve model (dashed line: the last observed week is taken as the forecast for the next week); the ARIMA model considered in the previous subsection (dotted line); and the DHRM model (solid line). The upper plot shows the mean of the MAPE for forecast horizons ranging from 4 h up to 1 week ahead, while the lower plot shows the standard deviation counterparts. It is clear that the DHRM model produces better aggregate results than the naïve and ARIMA models. Both the ARIMA and DHRM models are much better than the naïve model for short forecast horizons, implying that both models are doing much more than simply using the previous week's observation as a forecast. However, only the DHRM is better for longer horizons of up to 1 week. Similar comments apply to the standard deviation of the MAPE.

The good performance of the ARIMA model for shorter periods ahead can be explained on several grounds.
Fig. 7. Mean (top plot) and standard deviation (bottom plot) of MAPE for true 1-week-ahead forecasts, evaluated on repeated forecasts along one year of the data, for three models: DHRM (solid line); ARIMA (dotted line); and naïve (dashed line).
Firstly, because of the differencing operations inherent in ARIMA modelling, the model tends to concentrate on the short-period behaviour of the series (and, at the same time, induces considerable amplification of the high-frequency noise). Secondly, the ML optimisation of the ARIMA model concentrates on the minimisation of a function of the one-step-ahead forecast errors, naturally producing slightly better performance at such short horizons. Since the DHRM model uses spectral optimisation, it places more emphasis on the structural properties of the series and so concentrates on the optimisation of both the long- and short-term behaviour.
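The rolling evaluation described above can be summarised schematically as follows (Python/NumPy sketch; `fit_and_forecast` is a hypothetical placeholder for whichever model is being assessed, and the helper names are ours):

```python
import numpy as np

SAMPLES_PER_DAY, SAMPLES_PER_WEEK = 6, 42
WINDOW = 20 * SAMPLES_PER_WEEK                 # 20-week rectangular estimation window
HORIZON = SAMPLES_PER_WEEK                     # forecasts up to one week ahead

def mape(actual, forecast):
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def naive_forecast(history, horizon):
    """Naive benchmark: repeat the last observed week."""
    last_week = history[-SAMPLES_PER_WEEK:]
    reps = int(np.ceil(horizon / SAMPLES_PER_WEEK))
    return np.tile(last_week, reps)[:horizon]

def rolling_mape(y, fit_and_forecast, window=WINDOW, horizon=HORIZON,
                 step=SAMPLES_PER_DAY):
    """Re-estimate on a rolling window, moving forward one day at a time, and
    collect the MAPE at each origin (one row per origin, one column per horizon)."""
    scores = []
    for start in range(0, len(y) - window - horizon, step):
        history = y[start:start + window]
        actual = y[start + window:start + window + horizon]
        forecast = fit_and_forecast(history, horizon)
        scores.append([mape(actual[:h], forecast[:h]) for h in range(1, horizon + 1)])
    return np.array(scores)

# e.g. rolling_mape(y, naive_forecast).mean(axis=0) gives the mean MAPE per horizon,
# the kind of quantity plotted in the upper panel of Fig. 7.
```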
5. Conclusions

Unobserved components (UC) models constitute a natural and well-formalised framework for the analysis of time series characterised by trends, seasonality and other periodic cycles.
Table 3
Mean, minimum and maximum values of the MAPE of the DHRM modulated model for true 1-day-ahead forecasts

          Mon.    Tue.    Wed.    Thu.    Fri.    Sat.    Sun.    Overall
Mean      2.998   2.988   2.559   2.792   2.891   3.041   2.893   2.880
Minimum   0.283   1.204   0.932   0.754   0.757   1.219   1.066   0.283
Maximum   4.873   4.874   4.738   4.773   4.867   4.803   4.576   4.874
In this paper, we have extended the practically well-proven DHR model approach to include a new periodic component that has a multiplicative (modulated) type of seasonality. This is particularly useful for series that exhibit several superimposed periodic components, as arise when the series is sampled at a rate more rapid than monthly. This new Dynamic Harmonic Regression Modulated (DHRM) model consists of a DHR model in the observation equation, with each term modulated by stochastic time-varying parameters that can oscillate periodically at different prescribed frequencies. The paper has shown how the frequency domain estimation procedure proposed in Young et al. (1999) for the standard DHR model can be extended to this new model. In this approach, different NVR parameters (or hyper-parameters in general) are estimated for each harmonic subcomponent, in contrast to the normal practice with UC and structural models, where a single parameter is specified and estimated for the whole component. This approach is possible because the objective function is well defined around the optimum, making the estimation of relatively large models of this type surprisingly simple.

The new DHRM model has been applied successfully to both a simulation example and a real example concerned with modelling and forecasting electricity demand data from the U.K. Both examples include a daily periodic component modulated by a weekly one, for which the DHRM model is ideally suited. The
estimated components obtained in this manner are meaningful, and the forecast performance of the model is very good when compared with the ARIMA and naïve models, especially for forecast horizons longer than 1 day ahead.
Acknowledgements

The authors are most grateful to the UK Engineering and Physical Sciences Research Council (EPSRC: Grant ESA7088) and the two British companies that provided the data used in this project. We would also like to thank the editor and the two anonymous referees who reviewed a previous version of this paper.
References

Abu-El-Magd, M. A., & Sinha, N. K. (1982). Short term load demand modelling and forecasting: A review. IEEE Transactions on Systems, Man, and Cybernetics, SMC-12, 3.
Akaike, H. (1980). Seasonal adjustment by a Bayesian modelling. Journal of Time Series Analysis, 1, 1-13.
Bryson, A. E., & Ho, Y. C. (1969). Applied optimal control, optimization, estimation and control. Waltham, MA: Blaisdell Publishing Company.
Bunn, D., & Farmer, E. D. (1985). Comparative models for electrical load forecasting. New York: John Wiley.
Bunn, D., Larsen, E., & Vlahos, K. (Eds.) (1996). Special Issue. Journal of Forecasting, vol. 15, no. 6.
Darbellay, G. A., & Slama, M. (2000). Forecasting the short-term demand for electricity. Do neural networks stand a better chance? International Journal of Forecasting, 16, 71-83.
Engle, R. F., Mustafa, C., & Rice, J. (1992). Modelling peak electricity demand. Journal of Forecasting, 11, 2241-2251.
Findley, D. F., Monsell, B. C., Bell, W. R., Otto, M. C., & Chen, B. C. (1996). New capabilities and methods of the X-12 ARIMA seasonal adjustment program. U.S. Bureau of the Census, mimeo.
Harvey, A. C. (1989). Forecasting structural time series models and the Kalman Filter. Cambridge: Cambridge University Press.
Harvey, A. C., & Koopman, S. J. (1993). Forecasting hourly electricity demand using time-varying splines. Journal of the American Statistical Association, 88, 1228-1242.
Henley, A., & Peirson, J. (1998). Residential energy demand and the interaction of price and temperature: British experimental evidence. Energy Economics, 20, 157-171.
Hodrick, T., & Prescott, E. (1980). Post-war US business cycles: An empirical investigation. Carnegie Mellon University, manuscript. Published (1997) in Journal of Money, Credit and Banking, 29, 1-16.
Irisarri, G. D., Windergren, S. E., & Yehsakul, P. D. (1982). On line load forecasting for energy control center application. IEEE Transactions on Power Apparatus and Systems, PAS-101, 1.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. ASME Transactions, Journal of Basic Engineering, 83-D, 95-108.
Maravall, A., & Gómez, V. (1998). Programs TRAMO and SEATS, instructions for the user (Beta Version: June 1998). Madrid: Bank of Spain.
Pedregal, D. J., & Young, P. C. (2002). Statistical approaches to modelling and forecasting time series. In M. P. Clements, & D. F. Hendry (Eds.), A companion to economic forecasting (pp. 69-104). Oxford: Blackwell.
Ramanathan, R., Engle, R., Granger, C. W. J., Vahid-Araghi, F., & Brace, C. (1997). Short-run forecasts of electricity loads and peaks. International Journal of Forecasting, 13, 161-174.
West, M., & Harrison, J. (1989). Bayesian forecasting and dynamic models. New York: Springer-Verlag.
Young, P. C. (1984). Recursive estimation and time-series analysis. Berlin: Springer-Verlag.
Young, P. C., & Pedregal, D. J. (1999). Recursive and en-bloc approaches to signal extraction. Journal of Applied Statistics, 26, 103-128.
Young, P. C., Pedregal, D. J., & Tych, W. (1999). Dynamic harmonic regression. Journal of Forecasting, 18, 369-394.
Zhang, B., & Dong, Z. (2001). An adaptive neural-wavelet model for short term load forecasting. Electric Power Systems Research, 59, 121-129.

Peter C. Young is Professor of Environmental Systems in the Environmental Sciences Department, and Director, Centre for Research on Environmental Systems and Statistics, at Lancaster University, UK. He is also an Adjunct Professor in the Centre for Resource and Environmental Studies at the Australian National University (ANU), Canberra. Prior to this, he was Head of the Environmental Science Department at Lancaster; Professorial Fellow at the ANU; University Lecturer in the Department of Engineering, Cambridge; and Fellow of Clare Hall, Cambridge. His research interests cover diverse areas of application, including nonstationary/nonlinear time series analysis and forecasting; data-based mechanistic modelling of stochastic systems; and the design of multivariable control systems.

Diego J. Pedregal has been a lecturer in the Industrial Engineering department of Universidad de Castilla-La Mancha (Spain) since December 1999. He obtained his first degree in Economics (Econometrics and time series analysis) in June 1991 from the Universidad Autónoma de Madrid (UAM, Spain); an MA in Public Finance in June 1992 from the Institute for Fiscal Studies (Spain); and a PhD in March 1995 from the UAM. He spent 3 years in the Environmental Sciences Department at Lancaster University (UK) from September 1994 to September 1997, involved in a number of projects, mainly on the development of adaptive forecasting models with applications to non-stationary and nonlinear, rapidly sampled time series.