Economic Modelling 22 (2005) 551 – 569 www.elsevier.com/locate/econbase
Modelling the hourly Spanish electricity demand Gloria Martı´n-Rodrı´guez, Jose´ Juan Ca´ceres-Herna´ndez* Department of Economı´a de las Instituciones, Estadı´stica Econo´mica y Econometrı´a, University of La Laguna. Campus de Guajara; Camino La Hornera, s/n. 38071 La Laguna. Santa Cruz de Tenerife, Spain Accepted 4 September 2004
Abstract In the electricity generating sector of an economy, supply hardly responds to unexpected demand increases. Furthermore, the seasonal pattern accounts for a major part of short-term variations in demand. In this sense, identifying the salient features of this pattern could contribute in avoiding supply problems. This paper is concerned with the analysis of the seasonal behaviour of the hourly Spanish electricity demand. Structural models are used to handle these hourly observations and splines are embedded within such models to create parsimonious formulations of periodic effects that are able to cope leap years. D 2004 Elsevier B.V. All rights reserved. JEL classification: C22; Q41 Keywords: Spanish electricity demand; Hourly data; Seasonality; Structural models; Splines
1. Introduction The estimation of the future demand for electric energy is central to the planning of a regional or national system for generating electricity. Thus, it has become imperative to use modelling techniques which capture the effects of factors such as prices of electricity
* Corresponding author. Tel.: +34 922 317035; fax: +34 922 317042. E-mail address:
[email protected] (J.J. Ca´ceres-Herna´ndez). 0264-9993/$ - see front matter D 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.econmod.2004.09.003
552
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
and substitute goods, income, population, technology and other variables.1 Short-term prediction is equally important for matching the generation capabilities with the expected requirements. Unlike other commodities, electricity cannot be economically stored and must be provided upon demand. To meet irregular and uncertain demand at any moment, an electricity generation system must have additional capacity available at very short notice. Depending on how fast it is available, this reserve bears the names of spinning reserve or cold reserve. Overestimating future demands would result in unused spinning reserves. Underestimating future demands is equally detrimental, because high starting costs are incurred if the cold reserve has to be drawn upon. In this sense, suppliers can use short-term predictions to control the number of generators in operation, to shut down some of them when forecasted demand is low or to start them up when forecasted demand is high.2 Therefore, it is not enough to just predict the total energy use on an annual basis. Since the whole management objective is to alter the system load shape, an hour-by-hour load shape forecast is an essential prerequisite3. Short-term load forecasting is, however, a difficult task because the load series exhibits several levels of seasonality: the load at a given hour is dependent not only on the load at the previous hour, but also on the load at the same hour on the previous day, and on the load at the same hour in the previous week or year4. A large variety of short-term forecasting techniques have been investigated, but the most popular are the time series models.5 There is no consensus as to the best time series approach to electricity demand modelling. The range of different approaches includes ARIMA models and structural time series models. Ramanathan et al. (1997) produces hourly forecasts by using a time series model for each hour of the day; however, this solution is certainly not a parsimonious way of modelling. Harvey and Koopman (1993) describes an approach which involves fitting a time-varying spline to a number of knot points.6 In any case, the first step in forecasting electricity demand is formulating an adequate model. This is the aim of this paper, in which hourly data is used to analyse the evolution
1
Long-term forecasting models are usually based on causal models. See, among others, Sutherland (1983), Hsio et al. (1987), Griffin (1993), Chan and Lee (1996) and Fatai et al. (2003). 2 See Darbellay and Slama (2000). See, also, Gouni and Torrion (1988), Bao (2002), Fay et al. (2003) and Kher and Joshi (2003). 3 Mehra and Bharadwaj (2000). 4 Furthermore, there are many important exogenous variables that could be considered, specially weatherrelated variables (Harvey and Koopman, 1993; Bao, 2002; Kher and Joshi, 2003; Taylor and Buizza, 2003). 5 The collection of papers in Bunn and Farmer (1985) discusses various time series approaches to short-term load forecasting. Neural networks are a black-box model useful when the relationship between variables is not clear. Current research in load forecasting is still somewhat divided on the issue, whether sophisticated nonlinear models represent progress, or whether more effort should be spent in improving linear models. Artificial neural networks can model nonlinear relationships, but they are not necessarily superior to linear time series models (Darbellay and Slama, 2000). Hippert et al. (2001) examines a collection of papers (published between 1991 and 1999) that report the application of neural networks to short-term load forecasting. See, also, Kher and Joshi (2003). 6 Taylor and Majithia (2000) proposes forecasting the demand at some fixed points (cardinal points); then, forecast for periods between cardinal points are obtained by a procedure known as profiling which involves fitting a curve to the cardinal points.
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
553
of the Spanish electricity demand in the period from 1998 to 2001. The demand levels have not kept stable over the last few years. To model this behaviour, structural models appear to be a potentially appropriate tool for handling these instabilities. And the spline functions, used in this paper to capture the seasonal component, can be embedded into this approach. In this sense, our proposal is similar to the approach taken by Harvey and Koopman (1993), but the splines are not forced to cross given points and the annual cycle is modelled in a different way. The use of the structural approach, the existence of simultaneous periodic fluctuations which must be taken into account when hourly data is being studied and the specific adaptions of splines which must be incorporated when the length of the period of the yearly variation is not always the same, are factors why this study can be interesting from an econometric point of view. The plan of this paper is as follows. In the next section, a method of dealing with seasonal pattern in high frequency data by means of spline functions is proposed. In Section 3, this procedure is applied to an hourly series of Spanish electricity demand. The data used is described and the different periodic variations are identified. Given that some properties of the series do not appear to remain the same over time, structural models are used as an appropriate class of models to cope with this kind of situation. Section 4 presents the conclusions.
2. High-frequency data and spline functions When the scope of the analysis goes beyond the description of long-term movements, the frequency of observation is an important choice. This being the case, weekly, daily or hourly data could be used. However, the study of high-frequency data is a more complicated task and new methodological approaches could be necessary. A model where seasonal cycles at different frequencies can be simultaneously estimated must be used, but estimating such a model cannot usually be feasible. Thus, the only way could be to use a mixture of far from parsimonious conventional modelling and computationally more efficient spline functions. For hourly data, the seasonal cycle in which the period is a year can hardly be estimated by a simple seasonal dummy model, although stochastic behaviour is not present. However, when the annual seasonal cycle is characterised by smooth changes from one season (hour) to the following, this seasonal variation at time t can be defined as a function of the season corresponding to this observation by means of a spline function; that is to say, such a seasonal pattern is fitted by piecing together different polynomials defined as functions of the season, one to each of the separate segments through the year, and connected satisfying continuity conditions.7 Another problem in hourly data is handling leap years, since the length of the period of the yearly variation does not remain the same and, therefore, the seasonal pattern is 7
As Harvey and Koopman (1993) mentions, with high-frequency data, it becomes difficult computationally to include the full complement of seasonal dummies or trigonometric terms in the seasonal component. In the context of electricity demand, the formulation of the seasonal variation must be parsimonious but flexible enough to capture marked variations in the pattern while retaining a reasonable degree of continuity.
554
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
heterogeneous. However, the spline modelling is able to handle these particular situations as a more flexible tool than other formulations. Next, a method of dealing with this heterogeneity is proposed when hourly data is used. The yearly seasonal variation is approximated by a periodic cubic spline where a function related to adjustment error is optimised, but crossing given points is not required.8 Note that the period in which this fluctuation is completed takes 8760 hours for non-leap years and 8784 for leap years. Then, in the first step, a spline approximating the seasonal pattern in a non-leap year can be formulated. In a second step, by demanding adequate continuity conditions, an approximation of the seasonal pattern in a leap year can be obtained. 2.1. Procedure for estimating the yearly variation in non-leap years Suppose the c t values of the observed series are filtered in order to eliminate the trend and other seasonal variations than annual. Let c t be the yearly seasonal variation at time t. In order to capture the yearly variation by means of a periodic cubic spline, seasonality can be assumed to be fixed as a preliminary hypothesis9; that is, c t =c h if the observation at time t corresponds to hour h of the year, h=1,. . ., 8760. Now, once the hourly observations corresponding to February 29th in each leap year are eliminated, a preliminary estimate of the seasonal pattern can be obtained by calculating hourly averages at each one of the 8760 hours10. Let {yˆ h1}h=1,. . .,8760 be the set of hourly averages.11 Then, cˆ 1t ¼ cˆ 1h ð1Þ h¼1;...;8760
if the observation at time t corresponds to the season h. In order to capture the component yˆ h1, h=1,. . .,8760, a periodic cubic spline g nly(h) can be used, defined as g nly ðhÞ ¼ gi ðhÞ ¼ gi;0 þ gi;1 h þ gi;2 h2 þ gi;3 h3 ;
hi1 VhVhi ;
i ¼ 1; . . . ; k;
ð2Þ
with h 0=1 and h k =8760. This spline must satisfy the traditional continuity conditions. However, the periodic character is assured by imposing additional conditions.12 The final set of restrictions is: (a)
continuity of the function, g nly, gi;0 þ gi;1 hi þ gi;2 h2i þ gi;3 h3i ¼ giþ1;0 þ giþ1;1 hi þ giþ1;2 h2i þ giþ1;3 h3i ;
ð3aÞ
8 In other procedures, the spline function is forced to cross the so-called knot points (Koopman, 1992; Harvey and Koopman, 1993; Harvey et al., 1997). 9 The deterministic assumption is only taken into account in order to explain the procedure in a simple way. This approach can be generalised to allow the seasonal pattern to evolve over time by letting the spline be stochastic. 10 These averages are corrected in order to have the seasonal effects summing to zero over the 8760 hours. 11 The superscript used in the seasonal component notation lets us distinguish the different estimates of this component. 12 Although splines are a traditional interpolation technique (Poirier, 1973, 1976; De Boor, 1978; Marsh, 1983, 1986; Marsh et al., 1990; Monahan, 2001), many theoretical developments must be adapted when the approximate magnitude is a complex seasonal variation.
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
555
for i=1,. . .k1, and 2 3 gk;0 þ gk;1 hk þ 1 þ gk;2 hk þ 1 þ gk;3 hk þ 1 ¼ g1;0 þ g1;1 h0 þ g1;2 h20 þ g1;3 h30 ; (b)
ð3bÞ
continuity of its first derivative, Dg nly, gi;1 þ 2gi;2 hi þ 3gi;3 h2i ¼ giþ1;1 þ 2giþ1;2 hi þ 3giþ1;3 h2i ;
ð3cÞ
for i=1,. . .k1, and 2 gk;1 þ 2gk;2 hk þ 1 þ 3gk;3 hk þ 1 ¼ g1;1 þ 2g1;2 h0 þ g1;3 h20 ; (c)
ð3dÞ
continuity of its second derivative, D2g nly, 2gi;2 þ 6gi;3 hi ¼ 2giþ1;2 þ 6giþ1;3 hi ;
ð3eÞ
for i=1,. . .k1, and 2gk;2 þ 6gk;3 hk þ 1Þ ¼ 2g1;2 þ 6g1;3 h0 :
ð3f Þ
Once the spline is specified in each interval and it is assumed that break points h i , i=1,. . .,k1, are known, the next step consists of determining parameters g i,0, g i,1, g i,2 and g i,3, i=1,. . .,k. Given that the number of unknown parameters, 4k, is larger than the number of demanded conditions, 3k, the vector G 3k 1=( g 1,1, g 1,2, g 1,3,. . .,g k,1, g k,2, g k,3)Vis expressed as a function of the parameter vector G k*1=( g 1,0, g 2,0,. . .g k,0)V. That is, gr ¼ g1;0 ar;1 þ g2;0 ar;2 þ . . . þ gk1;0 ar;k1 þ gk;0 ar;k ;
ð4Þ
for r=1,. . .,3k, being g 1=g 1,1, g 2=g 1,2, g 3=g 1,3,. . ., g 3k2=g k,1, g 3k1=g k,2 and g 3k =g k,3, and where the a parameters are appropriate functions of break points h i , i=0,. . .,k. So, gnly ðhÞ ¼ g1;0 X1;h þ g2;0 X2;h þ . . . þ gk1;0 Xk1;h þ gk;0 Xk;h ;
ð5Þ
where regressors X i,h , i=1,. . ., k, are appropriately defined as different functions of the break points and the hour h in the year. Then, another estimate of the seasonal pattern, cˆhnly, can be obtained by estimating the model cˆh1=g nly(h)+e h , where e h is a disturbance term related to the adjustment; that is, cˆhnly=gˆ nly(h), h=1,. . .,8760. Break points h i , i=1,. . .,k1, are chosen in order to minimise the sum of the squared residuals.
556
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
2.2. Procedure for estimating the yearly variation in a leap year The previous spline captures the yearly seasonal cycle for non-leap years, but, in order to model this seasonal variation for leap years, other spline, g ly, could be defined as 8 ;if h ¼ 1; . . . ; 1416 < gnly ðhÞ ly g ðhÞ ¼ gF29 ðhÞ ð6Þ ;if h ¼ 1417; . . . ; 1440 : nly g ðh 24Þ ;if h ¼ 1441; . . . ; 8784 where g nly is the spline defined for a non-leap year and g F29 is a polynomial function, which captures the yearly seasonal cycle at the hours corresponding to February 29th, defined as ly g F29 ðhÞ ¼ g0F29 þ g1F29 h þ g2F29 h2 þ g3F29 h3 ; hly 1 VhVh2 ;
ð7Þ
where h 1ly=1417 and h 2ly=1440. The continuity of spline g ly and its first derivative, Dg ly, can be assured by demanding the restrictions 2 3 F29 ly g0F29 þ g1F29 hly ¼ gnly hly ð8aÞ h1 1 þ g3F29 hly 1 1 þ g2 1 1 1 1 ; 2 3 F29 ly h2 þ 1 þ g3F29 hly ¼ gnly hly g0F29 þ g1F29 hly 2 þ 1 þ g2 2 þ1 1 ;
ð8bÞ
2 F29 ly g1F29 þ 2g2F29 hly h1 1 ¼ Dg nly hly 1 1 þ 3g3 1 1
ð8cÞ
2 F29 ly g1F29 þ 2g2F29 hly h2 þ 1 ¼ Dg nly hly 2 þ 1 þ 3g3 1 :
ð8dÞ
and
If the observations corresponding to February 29th are located at the l th segment, the previous restrictions can be written in a matrix form as H F29G F29=R F29, where F29 F29 F29 F29 V GF29 g1 g2 g3 ; 41 ¼ g0 2
H F29
1 6 61 ¼6 60 4 0
ly ly 2 h1 1 h1 1 ly ly 2 h2 þ 1 h2 þ 1 ly 1 2 h1 1 1 2 hly 2 þ1
3 3 ly h1 1 ly 3 7 h þ1 7 2ly 2 7 3 h1 1 7 5 2 3 hly þ 1 2
ð9Þ
and 2
RF29
3 2 k1;1 g1;0 þ r1F29 6 rF29 7 6 k2;1 g1;0 þ 2 7 6 ¼6 4 rF29 5 ¼ 4 k3;1 g1;0 þ 3 k4;1 g1;0 þ r4F29
... ... ... ...
þ k1;l gl;0 þ þ k2;l gl;0 þ þ k3;l gl;0 þ þ k4;l gl;0 þ
... ... ... ...
3 þ k1;k gk;0 þ k2;k gk;0 7 7; þ k3;k gk;0 5 þ k4;k gk;0
ð10Þ
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
557
where parameters k m,1,. . .,k m,k , m=1,. . .,4, are appropriately defined as functions of the parameters a 3l2,l ,. . .,a 3l-2,k , a 3l-1,l ,. . .,a 3l-1,k , a 3l,l ,. . .,a 3l,k . Then, 1 GF29 ¼ H F29 RF29 ;
ð11Þ
where 2
H F29 Þ1
d1;1 6 d2;1 ¼6 4 d3;1 d4;1
d1;2 d2;2 d3;2 d4;2
d1;3 d2;3 d3;3 d4;3
3 d1;4 d2;4 7 7 d3;4 5 d4;4
ð12Þ
in such a way that F29 F29 F29 ly gF29 ðhÞ ¼ g1;0 X1;h þ . . . þ gl;0 Xl;h þ . . . þ gk;0 Xk;h ; h1 VhVhly 2;
ð13Þ
where regressors X i,hF29, i=1,. . .,k, are appropriately defined as functions of parameters k in matrix R F29, parameters d in matrix (H F29)1 and the hour h in the year. Then, a second approximation of the yearly seasonal cycle, c2t , can be defined as 2 c t =g nly(h), if the observation at time t corresponds to the hour h in a non-leap year, or c2t =g ly(h), if the observation at time t corresponds to the hour h in a leap year. That is, c2t ¼ g1;0 X1;t þ . . . þ gk;0 Xk;t ;
ð14Þ
where variables X i,t , i=1,. . .,k, are defined as follows. If the observation at time t corresponds to the hour h in a non-leap year, h=1,. . .,8760, X i,t =X i,h . If the observation at time t corresponds to the hour h in a leap year, h=1,. . .,8784, 8 ; if h ¼ 1; . . . ; 1416 < Xi;h F29 ; if h ¼ 1417; . . . ; 1440 Xi;t ¼ Xi;h ð15Þ : Xi;h24 ; if h ¼ 1441; . . . ; 8784 Now, an estimate of c2t can be obtained by estimating the regression model cˆ 1t ¼ g1;0 X1;t þ . . . þ gk;0 Xk;t þ et ;
ð16Þ
where cˆ1t is defined as follows. If the observation at time t corresponds to the hour h in a non-leap year, h=1,. . .,8760, cˆ 1t =cˆ1h. If the observation at time t corresponds to the hour h in a leap year, h=1,. . .,8784, 8 1 ; if h ¼ 1; . . . ; 1416 < cˆ h ð17Þ cˆ 1t ¼ cˆ F29;1 ; if h ¼ 1417; . . . ; 1440 : h1 cˆ h24 ; if h ¼ 1441; . . . ; 8784 where {cˆh1}h=1,. . .,8760 is the series defined in Section 2.1 and {cˆhF29,l }h=1417,. . .,1440 are the hourly averages at each one of the 24 hours corresponding to February 29th for
558
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
leap years.13 Finally, a third approximation of the yearly seasonal cycle, c3t , can be obtained by introducing regressors X i,t in a time series model together the remainder of components.
3. Hourly Spanish electricity demand The Spanish electricity network, which is made up of firms in the electricity generating sector, determines the energy level that any plant must generate at any hour in order to satisfy the demands of the consumers connected to the system. This process requires energy interchanges between plants in which the supply is higher than the demand in its influence area and plants in which the supply is not sufficient. Obviously, the objective is matching demand and supply at any moment. In this section, the theoretical results developed in the previous section will be applied to model seasonal effects for the series of hourly Spanish electricity demand (measured in megawatts) from the 1st hour of Thursday 1st January 1998 to the 24th hour of Monday 31st December 2001.14 As a consequence of the yearly standard time change, the original source of data records 23 hourly observations corresponding to March 29th 1998, March 28th 1999, March 26th 2000 and March 25th 2001, and 25 hourly observations corresponding to October 25th 1998, October 31st 1999, October 29th 2000 and October 28th 2001. In order to avoid the heterogeneity caused by this time change, such a change is not taken into account when available data is assigned to the corresponding hours in a day. When the change occurs in March, the 23 observations recorded on the day of the change and the first observation of the following day have been considered as the 24 observations corresponding to the day of the change. When the change occurs in October, the first observation recorded on the day of the change is considered to be the last observation corresponding to the previous day and then, only the last 24 observations recorded on the day of the change are assigned to this day. In this way, a series is obtained with 24 hourly observations on each day of the period under study. The resulting series will be referred to as { y t }t=1,. . .,35064, hereafter. Fig. 1 shows instabilities in the behaviour of this hourly series and a weak upward trend is observed, although this long-term movement could be obscured by an important seasonal component. Seasonal cycles at different frequencies should be taken into account in order to model the seasonal pattern in an appropriate way. To analyse seasonal effects, three entities are considered: the year, the week and the day. According to this, the hourly demand for electricity could be expressed as y t =f(l t , cday t , day week cweek , cyear and cyear refer to t t , e t ), where l t denotes the trend component, c t , c t t daily, weekly and yearly seasonal variations, respectively, and e t reflects the nonsystematic movements. 13
Note that in a leap year, the estimates of the yearly seasonal variation must be corrected in order to have the seasonal effects summing to zero over the 8784 hours. 14 Demand statistics are obtained from hourly data published by the Operador del Mercado Ele´ctrico (OMEL) and they can be consulted on web page http://www.omel.com.
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
559
Fig. 1. Hourly Spanish electricity demand, { y t }.
To get some preliminary insights of the yearly, weekly and daily variations, the longterm component, estimated by means of the model yt ¼ a þ bt þ mt ;
ð18Þ
is removed. From the residuals of the previous model, {mˆt }t=1,. . .,35064, moving averages with period 168, {ma(168)t }t=85,. . .,34980, are calculated in order to smooth fluctuations in which periods are 24 and 168 (daily and weekly variations, respectively). Fig. 2 shows these moving averages, and a regular seasonal pattern is observed year by year. The yearly seasonal variation is clearer in Fig. 3, which shows the hourly averages in each one of the 8760 hours in a year, {cˆhyear,1 y }hy =1,. . .,8760, calculated from the series {ma(168)t } and appropriately corrected. Once an approximation to the yearly variation is obtained (to be discussed further on), series {mˆt } is filtered in order to eliminate this seasonal fluctuation and, from the
Fig. 2. Moving averages with period 168, {ma(168)t }.
560
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
Fig. 3. Yearly seasonal component; – – – cˆ hyear,1 y , — cˆ hyear,3 y .
resulting series, moving averages with period 24 are calculated, {ma(24)t }t=13,. . .,35052. An approximation of the weekly variation can be obtained from these moving averages by calculating the corrected averages according to each hour of the week, {cˆhweek,1 w }h w =1,. . .,168 (Fig. 4). Finally, the daily fluctuation can be identified, from series {mˆt } filtered in order to eliminate yearly and weekly fluctuations (to be discussed further on), by calculating the corrected averages according to each hour of the day, {cˆhday,1 d }h d =1,. . .,24 (Fig. 5). From these remarks, and as a preliminary hypothesis, the main features of the time series shown in Fig. 1 are assumed to be a stochastic trend and a seasonal component that can be appropriately described by means of three simultaneous periodic variations. Furthermore, the time change has an impact on demand. In fact, this is the hypothesis that justifies the change. Such an impact will be taken into account in the demand model defined as yt ¼ lt þ cday þ cweek þ cyear þ dCt þ et ; t t t
Fig. 4. Weekly seasonal component; – – – cˆ hweek,1 w , — cˆ hweek,3 w .
ð19Þ
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
561
day day day ˆt(h ˆt(h ˆt(h Fig. 5. Daily seasonal component; – – – cˆday,1 d )(averages), —– c d ) (minimum), —–c d ) (maximum). h d , +++c
where
Ct ¼
8 > > > > <
1; if > > > > : 0;
f
2091 V t V 7130 10827 V t V 16034 19563 V t V 24770 28299 V t V 33506 in other cases
ð20Þ
if the time change is assumed to be located at 2:00 a.m. or 3:00 a.m. From a classical point of view, trend and seasonal component are formulated as deterministic functions of time about which the series is constrained to move forever more. However, the stability of the trend appears to be an unrealistic assumption in our case. Furthermore, although averages shown in Figs. 3–5 are good indicators of the average seasonal behaviour, it is very difficult asserting as true the assumption of a fixed behaviour pattern over time. Testing slow changes in the yearly variation is not possible, since only 4 years of data are available. If series {ma(24)t } is treated as a collection of 168 time series, one corresponding to each hour in a week, the evolution of these series provides some evidence of stochastic behaviour of the seasonal effects within a week. In the same sense, if trend and yearly and weekly seasonal variations are removed and the resulting series is treated as a vector of 24 time series, one corresponding to each hour in a day, the stochastic nature of the daily seasonal effects is quite clear from the observed evolution of these series. Furthermore, there are reasons why this conclusion could be obtained. For example, the peak expected within a day occurs an hour earlier in winter. In this paper, structural models are used as a tool capable of capturing the instabilities of the different components of the series. The aim is to present the stylised facts of a series which is described in terms of its components of interest. Each component has an explicit stochastic formulation and a direct interpretation (Harvey, 1989; Koopman, 1992; Durbin and Koopman, 2001). On the other hand, a parsimonious way of modelling the seasonal pattern of electricity demand is highly desirable for hourly data. Structural models can be
562
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
adopted to handle observations recorded every hour and splines can be embedded within such models to create parsimonious formulations of periodic effects. Thus, the model given by Eq. (19) could be formulated as a structural model where the daily seasonal variation is modelled by a set of trigonometric terms as follows cday ¼ t
12 X
ð21aÞ
cj;t ;
j¼1
cj;t cj;t 4
¼
coskj sinkj
sinkj coskj
cj;t1 cj;t1 4
þ
xj;t ; xj;t 4
j ¼ 1; . . . ; 12;
ð21bÞ
* are zero mean white noise processes which are uncorrelated with each where x j,t and x j,t other with a common variance r x2 , j=1,. . .,12, and k j =2pj/24, j=1,. . .,12, are the frequencies corresponding to the cyclical components of the daily seasonal variation. Weekly and yearly seasonal effects are appropriately described by periodic cubic splines.15 The disturbances in all stochastic components are assumed to be mutually uncorrelated. Estimating such a model is not an easy task. In fact, handling simultaneous stochastic seasonal variations could be nonfeasible. Therefore, weekly and yearly seasonal variations are, a priori, assumed to be fixed. Standard dummy modelling of these fluctuations is far from parsimonious. However, periodic cubic splines appear to be a good way of modelling the seasonal pattern within the week or the year with a relatively small number of parameters. To formulate such splines, approximations of each one of the two seasonal fluctuations have been obtained. With regards to the yearly variations, the procedure developed in Section 2 leads to specify a 13-segment periodic cubic spline. This spline is introduced into the structural model as a function of regressors X i,ty defined in Eq. (14) taking into account that the locations of the break points are as follows: h 1y =323, h 2y =468, h 3y =3292, y y h 4y =5354, h 5y =5403, h 6y =5536, h 7y =7476, h 8y =8135, h 9y =8145, h 10=8255, h 11=8525 and 16 y h 12=8535. Thus, the yearly seasonal variation, defined as y y ¼ g1;0 X1;t þ . . . þ g13;0 X13;t ; cyear;3 t
ð22Þ
can be estimated by introducing regressors X i,ty , i=2,. . .,13, in the structural model (Eq. (19)) together with the remainder of components.17
15 An alternative approach is taken by Harvey and Koopman (1993). The objective is modelling the changing electricity load pattern within the week. However, the annual cycle is modelled by means of a deterministic cycle of period 1, that is W t =a cos(kt)+b sin (kt), where k=2p/8760. The seasonal movement W t is not primarily due to temperature, the effect of which is modelled directly, but rather arises from other seasonal changes, such as the change in the number of hours of daylight. 16 The spline model must be the result of a trade off between the degree of adjustment and smoothness which are pursued. Thus, the number of knots is chosen in such a way that the main turning points are captured (De Boor, 1978, pp. 303–304). In general, sections of the periodic pattern displaying sharp peaks require relatively more knots than do less variable sections (Harvey and Koopman, 1993). Hence, a certain amount of experimentation was needed to determine the knot positions that minimised the sum of squared residuals. 17 One of the spline regressors must be eliminated in order to avoid multicollinearity problems.
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
563
Before estimating such a model, the weekly seasonal spline must be formulated. To do that, the yearly seasonal variation must be removed. In this sense, cyear,2 can be t approximated, as in Eq. (16), by estimating the model y y ¼ g1;0 X1;t þ . . . þ g13;0 X13;t þ et ; cˆ year;1 t
ð23Þ
is given by Eq. (17). If the observation at time t corresponds to the hour h y in where cˆ year,1 t a non-leap year, h y =1,. . .,8760, cˆ year,1 =cˆhyear,1 y . If the observation at time t corresponds to t y y the hour h in a leap year, h =1,. . .,8784, 8 year;1 > ; if hy ¼ 1; . . . ; 1416 < cˆ hy year;1 F29;1 ð24Þ ¼ cˆ hy cˆ t ; if hy ¼ 1417; . . . ; 1440 > : cˆ year;1 ; if hy ¼ 1441; . . . ; 8784 hy 24 where {cˆhyear,1 y }h y =1,. . .,8760 is the series defined in Eq. (1) and {cˆhF29,1 y }h y =1417,. . .,1440 are the values of the {mˆt } series (Eq. (18)) at each one of the 24 hours of February 29th 2000. From {cˆtyear,2}t=1,. . .,35064 series, a first approximation of cweek can be obtained by t calculating: (1) {mˆt cˆ tyear,2}t=1,. . .,35064 series; (2) moving averages with period 24 calculated from the series defined in the step one; (3) averages according to each hour in a week calculated from the previous series; (4) averages according to each hour in a week corrected in order to satisfy the zero sum constraint over the whole week. The resulting average series is referred to as {cˆ hweek,1 w }, and so {cˆ week,1 }={yˆhweek,1 w }h w =1,. . .,168, if the t w observation at time t corresponds to the hour h in a week. Following the procedure described in Section 2.1, the component cˆ hweek,1 w is modelled as w w w w cweek;2 ¼ g1;0 X1;t þ . . . þ g6;0 X6;t ; t
ð25Þ
where X i,tw, i=1,2,. . .,6, are the regressors defining a six-segment periodic cubic spline with break points h 1w =2, h 2w =31, h 3w =131, h 4w =133 and h 5w =134. The final approximation of cweek , cweek,3 , can be estimated by introducing regressors X i,tw, i=2,. . .,6, in the structural t t model (Eq. (19)). Once yearly and weekly seasonal splines are formulated, the appropriate structural time series model is y y y y w w w w yt ¼ lt þ cday þ g 2;0 X2;t þ . . . þ g13;0 X13;t þ g2;0 X2;t þ . . . þ g6;0 X6;t þ dCt þ et : t
ð26Þ The results of estimating this model by maximum likelihood indicate that the daily seasonal pattern is stochastic, whereas the slope term is fixed.18 Then, it was decided to include a deterministic slope component, but the significance test of the slope in the final
18
STAMP 6.0 statistical package was used (Koopman et al., 2000).
564
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
Table 1 Variances of the disturbances Variance q-ratiosa a
Irregular (rˆ 2e )
Level (rˆ g2)
Daily seasonality (rˆx2)
0.0000 (0.0000)
159,200 (1.0000)
9.5941 (0.0001)
The q-ratios are the ratios of each variance of the disturbance terms to the largest.
state vector suggests that the trend could be reduced to a random walk without drift.19 The model fit is reasonably good but the data is subject to a number of outliers. To capture these anomalous observations, intervention variables are introduced into the level component which is formulated as lt ¼ lt1 þ
X
kdhd Idhd þ gt ;
gt fN ID 0; r2g ;
ð27Þ
dhd
where I dhd is equal to one if the observation corresponds to the hour h d of the day d in the sample and zero in other cases.20 The estimating results are shown in Tables 1–5 and Figs. 3–6.21 The estimates of the hyperparameters (Table 1) indicate the extent to which the level of the series and daily seasonal pattern are allowed to change over time. Note that the
Fig. 6. Level component, {l t }. 19 This conclusion requires a comment. Note that the plot of the original series shows a weak upward trend. Moreover, although the significance test leads to the elimination of the slope term, such an elimination may cause the Kalman filter to diverge and, therefore, the distance between observations and predictions may increase, perhaps, according to a cyclical movement (Vargas, 1999, p. 34). However, this behaviour is not observed in the residuals of the model without slope and the estimates of the level component capture the observed upward trend (Fig. 6). 20 Intervention variables are only introduced when the magnitude of outliers is very large. This criterion is useful for high-frequency data because the number of outliers is usually high. 21 The smoothed option of the STAMP 6.0 statistical package was used.
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
565
Table 2 Final state vector Component
Estimate
Component
Estimate
Component
Estimate
lˆ T cˆ 1,T cˆ*1,T cˆ 2,T cˆ*2,T cˆ 3,T cˆ 3,T * cˆ 4,T
19,585.126 84.111a 2404.200 1138.900 1110.600 596.040 90.157 134.000
cˆ 4,T * cˆ 5,T cˆ 5,T * cˆ 6,T cˆ 6,T * cˆ 7,T cˆ 7,T * cˆ 8,T
276.830 211.530 187.660 185.910 24.242a 30.199a 44.600a 48.652a
cˆ 8,T * cˆ 9,T cˆ 9,T * cˆ 10,T cˆ 10,T * cˆ 11,T cˆ 11,T * cˆ 12,T
84.182 93.347 120.950 104.970 32.215a 67.823 10.405a 12.092a
The value of the joint test of the daily seasonal effects indicates that they are statistically significant [v 2(23)=1965.26]. a Nonsignificant at 10% level.
variance of the irregular component is approximately null; however, a nonsystematic behaviour is clearly observed. Indeed, these movements are being captured by other stochastic components, above all by the level component whose disturbance has the largest variance. Despite the statistical nonsignificance of some of the parameters defining the yearly seasonal spline, the corresponding regressors remain in the model since such a spline captures a very relevant seasonal behaviour from an economic point of view.22 Note that the nature of the yearly seasonal effects depends primarily on weather-related factors, such as temperature. Electricity demand increases with cold temperatures because heating is switched on. A period with extreme hot temperatures shows an increase in demand as well because air-conditioning is switched on. Of course, many other features also influence the demand for electricity, for example, other weather conditions. In this sense, the weak statistical relevance of the yearly seasonal pattern may be explained by the rather mild Spanish climate. On the other hand, the negative estimate of the d parameter (Table 3) indicates energy saving as a consequence of the time change. However, this conclusion needs to be considered cautiously since this effect could be related to the yearly seasonal pattern. Note that between March and October, where the time change is located, more moderate temperatures may cause a decrease in demand. Table 4 shows the estimates of the k dhd parameters. Two of them (366-1 and 7311) correspond to the first hour of years 1999 (Friday) and 2000 (Saturday), when New Year celebrations may explain the demand increase. At nine o’clock of the first day of 2001 (Monday), a decrease was registered with regards to the levels corresponding to the beginning of the week and of the working day. The remainder of anomalous 22 The joint F test of the yearly seasonal effects, F 12,35036=0.64969, suggests the statistical nonrelevance of the y y set of regressors X 2,t ,. . .,X 13,t . However, in a structural time series model, this F test is biased towards the nonrejection of the null hypothesis due to the stochastic nature of some components. When the yearly seasonal spline is not included, this seasonal variation may be approximately captured by means of the stochastic level. In any case, including statistically irrelevant regressors may cause a loss of efficiency. However, when the model is estimated without this spline, the effects on the estimates of the remainder of parameters appear to be nonsignificant.
566
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
Table 3 Free parameters in the spline specifications Parameter y gˆ 2,0 y gˆ 3,0 y gˆ 4,0 y gˆ 5,0 y gˆ 6,0 y gˆ 7,0 y gˆ 8,0 y gˆ 9,0 y gˆ 10,0
Estimate 6945.3 2952.6 65946 101190103 42906103 1010.9103 14768103 11517106 1228.2106
t value
Parameter
0.78282 0.51667 0.6749 0.98737 1.0063 1.0023 1.4852 1.9079 1.6301
y gˆ 11,0 y gˆ 12,0 y gˆ 13,0 gˆ w2,0 gˆ w3,0 gˆ w4,0 gˆ w5,0 gˆ w6,0
dˆ
Estimate
t value 3
7269310 1888.6106 16133103 107.58 6267.6 678.14103 47335103 1159.5103 306.89
0.30925 0.28566 0.070768 39.291 30.249 14.906 15.962 25.567 2.0798
observations were located at nine o’clock on Sunday 14th February 1999 and at the first hour of Monday 15th January 2001 and of Monday 22nd January 2001. In these three cases, the demand level was less than predicted. The estimates of the yearly and weekly seasonal effects, cˆhyear,3 y and cˆhweek,3 w , are obtained year year from the estimated regression coefficients for X 2,t ,. . .,X 13,t , in the first case, and week week X 2,t ,. . .,X 6,t , in the weekly case.23 The yearly seasonal effect represents less than 15.6250% of the value of the original series at any time. On average, over the sample, this percentage is 5.1922%. In the weekly case, these two percentages are 29.8470% and 7.3694%, respectively. According to this, weekly seasonal effects appear to be more relevant than yearly seasonal effects. In Fig. 3, the yearly seasonal pattern for a non-leap year is shown. A rising movement is observed that finishes in mid-January, followed by another weak downward movement that continues until the end of April. Then, a moderate growth takes place until the end of July and a sharp decrease leads to minimum levels in midAugust. After the summer, a period characterised by noticeable fluctuations begins: growth in September, decrease until mid-October, local maximum levels at the end of November and mid-December, and local minimum levels at the beginning and end of December, coinciding with Christmas. Fig. 4 presents a typical weekly pattern of hourly electricity demand. Minimum levels are registered at the weekend. On Monday morning and Friday evening, the demand is also less than in other weekdays. The estimate of the daily seasonal component on December 31st 2001 (Table 5) shows a similar pattern to the averages cˆhday,1 d observed in Fig. 5. The differences can be explained by the stochastic nature of this component in the specified model. The stochastic nature of this component is observed more clearly if the evolution of this component is considered for each one of the 24 hours in a day over the sample. Fig. 5
23
The intercept of each spline is calculated in such a way that the average estimated yearly seasonal effects are zero and the weekly seasonal effects sum up to zero over a week. The estimates of the level component (Table 2 and Fig. 6) are also properly corrected so that these intercepts are not captured simultaneously in the trend and seasonal components.
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
567
Table 4 Intervention variables Parameter kˆ 366-1 kˆ 410-9
Estimate 1954.5 2217.0
Parameter kˆ 731-1 kˆ 1097-9
Estimate 2378.9 1885.2
Parameter kˆ 1111-1 kˆ 1118-1
Estimate 2444.9 2695.6
also presents the average, minimum and maximum values of the daily seasonal day component corresponding to each one of the hours within a day, cˆ t(h d ) . The maximum demands are located around midday and, to a higher extent, in the midevening at nightfall around eight o’clock. The daily seasonal effect represents, at the most, 31.8603% of the level of the series at any time. On average, over the sample, this percentage is 9.3845%. If the seasonal effect is considered to be an aggregation of yearly, weekly and daily variations, the last percentage amounts to 12.5665%.
4. Conclusions The study of the hourly Spanish electricity demand series shows some key features. First, there are very relevant high-frequency movements and capturing these variations is very complicated. This circumstance represents an obstacle for short-term forecasting. In this sense, the analysis of each seasonal fluctuation, yearly, weekly and daily, is an important requirement in order to plan supply, since this analysis allows the identification of the main peak demand locations, within a day, a week or a year, and also the times at which demand is much less. The shape of the yearly seasonal pattern, possibly related to weather factors, such as temperature, indicates that there are differences between winter and summer. On the other hand, there is no doubt that Saturdays and Sundays are different to weekdays. Within any day, the seasonal pattern is as expected. The demand levels are the lowest early in the mornings. A rising movement begins at daybreak and finishes around midday. Next, the demand remains approximately at the same level until early evening and a new upward movement leads to a peak in the midevening when domestic and industrial demands are coincident.
Table 5 Daily seasonal component (estimates at the Monday December 31st 2001) Hour
Estimate
Hour
Estimate
Hour
Estimate
1 2 3 4 5 6 7 8
250.4685 1945.8747 2887.2017 3173.2476 3370.9385 3396.2862 2800.8991 2202.6243
9 10 11 12 13 14 15 16
1153.5518 35.0769 1490.3795 1780.0914 1682.0284 1173.9980 308.9819 10.0428
17 18 19 20 21 22 23 24
29.5138 683.0407 3033.6346 3422.2234 3049.6872 2261.2590 1073.3487 1225.7441
568
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
Taking into account that the consumption of electricity depends, among other factors, on the weather—which influences the need for heating or air-conditioning—, on the different standard times between mainland Spain and the Canary Islands—which means different needs for lighting—and on the main economic activities, different seasonal patterns over geographical areas in Spain are expected. Therefore, if spatially disaggregated data is available, the previously described methodology may be used to facilitate the management for the Spanish electricity network in such a way that a gain of efficiency in energy exchanges between plants could be obtained. From a methodological view point, it is shown that structural time series models can be adopted to handle observations recorded every hour. A key feature in such models is the use of splines as an appropriate parsimonious way of modelling simultaneous periodic effects. In addition, a procedure of formulating a spline able to handle leap years is proposed.
Acknowledgements We are very grateful to an anonymous referee for critical comments and suggestions. All errors are the authors’ responsibility.
References Bao, J., 2002. Short-term load forecasting based on neural network and moving average, Iowa State University Report, 2002-05-08. Bunn, D., Farmer, E.D., 1985. Comparative Models for Electrical Load Forecasting. John Wiley, New York. Chan, H.L., Lee, S.K., 1996. Forecasting the demand for energy in China. The Energy Journal 17 (1), 19 – 30. Darbellay, G.A., Slama, M., 2000. Forecasting the short-term demand for electricity. Do neural networks stand a better chance? International Journal of Forecasting 16 (1), 71 – 83. De Boor, C., 1978. A Practical Guide to Splines. Springer-Verlag. Durbin, J., Koopman, S.J., 2001. Time Series Analysis by State Space Models. Oxford University Press. Fatai, K., Oxley, L., Scrimgeour, F.G., 2003. Modelling and forecasting the demand for electricity in New Zealand: a comparison of alternative approaches. The Energy Journal 24 (1), 75 – 102. Fay, D., Ringwood, J.V., Condon, M., Kelly, M., 2003. 24-h electrical load data—a sequential time series? Neurocomputing 55, 469 – 498. Gouni, G.L., Torrion, P., 1988. Risk and costs of failure in the French electricity system. The Energy Journal 9, 33 – 37. Griffin, J.M., 1993. Methodological advances in energy modelling: 1970–1990. The Energy Journal 14 (1), 111 – 124. Harvey, A.C., 1989. Forecasting, Structural Time Series and the Kalman Filter. Cambridge University Press. Harvey, A.C., Koopman, S.J., 1993. Forecasting hourly electricity demand using time-varying splines. Journal of the American Statistical Association 88 (424), 1228 – 1236. Harvey, A.C., Koopman, S.J., Riani, M., 1997. The modelling and seasonal adjustment of weekly observations. Journal of Business & Economic Statistics 15 (3), 354 – 368. Hippert, H.S., Pedreira, C.E., Castro, R., 2001. Neural networks for short-term load forecasting: a review and evaluation. IEEE Transactions on Power Systems 16 (1), 44 – 55. Hsio, C., Chan, M.W.L., Mountain, D.C., Tsui, K.Y., 1987. An integrated monthly and hourly regional electricity model for Ontario, Canada. Resources and Energy 9 (3), 275 – 299.
G. Martı´n-Rodrı´guez, J.J. Ca´ceres-Herna´ndez / Economic Modelling 22 (2005) 551–569
569
Kher, V.H., Joshi, S.K., 2003. Short-term electrical load forecasting using artificial neural network. Australasian Universities Power Engineering Conference, New Zealand. Koopman, S.J., 1992. Diagnostic checking and intra-daily effects in time series models. Tinbergen Institute Research Series 27, 163 – 165. Koopman, S.J., Harvey, A.C., Doornik, J.A., Shephard, N., 2000. STAMP: Structural Time Series Analyser, Modeller and Predictor. Timberlake Consultants. Marsh, L., 1983. On estimating spline regression. Proceedings of SAS User’s Group International 8, 723 – 728. Marsh, L., 1986. Estimating the number and location of knots in spline regression. Journal of Applied Business Research 3, 60 – 70. Marsh, L., Maudgal, M., Raman, J., 1990. Alternative methods of estimating piecewise linear and higher order regression models. Proceedings of SAS User’s Group International 15, 523 – 527. Mehra, M., Bharadwaj, A., 2000. Demand forecasting for electricity, Proceedings of the National Conference on Regulation in Infrastructure Services, New Delhi, organised by the Energy and Resource Institute. Monahan, J., 2001. Numerical Methods of Statistics. Cambridge University Press. Poirier, D.J., 1973. Piecewise regression using cubic splines. Journal of the American Statistical Association 68, 515 – 524. Poirier, D.J., 1976. The Econometrics of Structural Change with Special Emphasis on Spline Functions. NorthHolland. Ramanathan, R., Engle, R., Granger, C.W.J., Vahid-Araghi, F., Brace, C., 1997. Short-run forecasts of electricity loads and peaks. International Journal of Forecasting 13 (2), 161 – 174. Sutherland, R.J., 1983. Distributed lags and the demand for electricity. The Energy Journal 4, 141 – 151. Taylor, J.W., Buizza, R., 2003. Using weather ensemble predictions in electricity demand forecasting. International Journal of Forecasting 19 (1), 57 – 70. Taylor, J.W., Majithia, S., 2000. Using combined forecasts with changing weights for electricity demand profiling. Journal of the Operational Research Society 51, 72 – 82. Vargas, M., 1999. Modelizacio´n de series temporales estacionarias en espacio de estados, Documento de Trabajo de la Facultad de CC.EE. de la Universidad de Castilla la Mancha 2/1999/4.