Mathematics
and Computers
in Simulation
32 (1990)
209
209-214
North-Holland
RECURSIVE SMOOTHING
OF ENVIRONMENTAL
C.N. NG and P.C. YOUNG
TIME-SERIES
*
Department of Applied Science, Ciry Polyrechnic of Hong Kong, Kowloon, * Centre for Research on Environmental Systems, University of Luncaster,
Hong Kong United Kingdom
1. INTRODUCTION Recursive
analysis developed in recent years [ 11 provide
methods of time-series
estimation
of models
smoothing
can be used to estimate the structural
utility
with time-variable
of this approach
is illustrated
parameters.
This paper describes
components
by applying
a natural approach to the
how “fixed interval”
of an environmental
it to the analysis
time-series.
of the Mauna
recursive
The practical
Loa atmospheric
carbon
dioxide series. The proposed
method is based on a simple additive “component”
level (or trend) and sinusoidal previous
research
variable,
by assuming
on recursive
cesses. Recursive
the quasi-orthogonal
smoothing
and smoothing and effective
components
component
mal generalisation
evolve as a class of Generalised are employed,
sequential
spectral decomposition
in estimating
of a mean
Following
on from
to be smoothly
time-
Random Walk (GRW) pro-
but they are utilised in a manner that al-
model. The approach
of the observed is statistically
time-series
suboptimal,
into
in that
However, it is justified for practical use by its interpretation the quasi-orthogonal
of the method is straightforward
and multivariate
model, consisting
are permitted
algorithms
is estimated separately.
details of the method and its extensions, continuities,
[2] [3], these components
of the component
in spectral terms and effectiveness
time-series
at the main seasonal period and its harmonics.
that the model parameters
filtering
lows for the straightforward
each harmonic
components
structural components.
Moreover,
opti-
([4] and [5]). Only the basic approach is discussed here;
for example, to handle missing data, extreme observations,
sharp dis-
systems can be found in [5], [6], [7], [B] and [9]
2. RECURSIVE ESTIMATION In this paper, we are concerned series regression
primarily
with the estimation
of time-variable
parameters
y(k) = XT(k) a(k) + e(k)
where the superscript variable parameters;
T denotes vector transpose; x(k) is the vector of regression
(1)
y(k) is the time-series variables;
in question;
mation [l] is applied to this equation,
a(k) is a vector of time
and e(k) is the residual stochastic variation,
zero mean and variance cr2. As we shall see, if a suitably modified form of recursive
0378-4754/90/$3.50
in a linear time-
model of the form:
least squares (RLS) esti-
then it allows us to study any temporal change in the parameters
0 1990, IMACS/Elsevier
Science
Publishers
B.V. (North-Holland)
with
of the
C.N. Ng, P.C.
210
Young / Recursive
smoothing
of environmental
time-series
modelled relationship. There are various methods in which the basic RLS algorithm may be modified so that the influence of old data upon the estimates declines as new data are introduced. These allow the resulting algorithms to respond to any changes in the statistical character of a time-series. The main method employed for this purpose in the present paper is the “fixed interval” recursive smoothing. Here, the restrictive assumption of parametric invariance, inherent in the derivation of the ordinary RLS algorithm, is replaced by the assumption that the parameters evolve as a Gauss-Markov stochastic process of the form, a(k) = F(k-1) a(k-1) + G(k-1) v(k-1)
(2)
Here a(k) is the n dimensional parameter vector at sample time k; v(k) is a vector of zero mean, serially uncorrelated white noise inputs which is assumed to be statistically independent of e(k); and F(k), G(k) are n x n matrices which can, if required, be characterised by time-variable coefficients. The least-squares error estimate, &k), of the time-variable parameter vector a is obtained by the following recursive prediction-correction equation [ 11, Prediction: &k-l)
= F(k-1) &k-l)
P(k/k-1) = F(k-1) P(k-1) FT (k-l) + G(k-1) Q GT (k-l)
I(i) I( ii)
Correction: &k) = &k/k-l) + g(k) [ y(k) - XT(k) &k-l)
1
P(k) = P&/k-l) - g(k) XT(k)P&/k-l) g(k) = W/k-l)
x(k) [ 1+ XT(k)P&/k-l) x(k) 1
I( iii) I(iv)
I(v)
where @/k-l) denotes the a priori prediction of a(k). The “noise variance ratio (NVR)” matrix Q, and the P(k) matrix are both defined in relation to the white measurement noise variance, cr2 , i.e. Q, =Q/ o2 ; P(k)=P*(k)/ 02. Here P*(k) is the error covariance matrix associated with the parameters estimates and Q is the covariance matrix of the parameter perturbation vector v(k), which is diagonal in form if the elements of v(k) are assumed to be uncorrelated. It is important to note that the NVR (diagonal) matrix elements are normally the only unknown parameters in the model to be specified by the analyst. When a time-series is recursively analysed off-line, better use of the available information can be made by using “fixed interval” smoothing, i.e. by modifying the estimates obtained from the above forward recursion using the following backward recursion (see e.g. [ 11 and [2]): gk-l/N) = F-‘[ &k/N) + G Q GT h(k-1) ] h(k-1) = [I,, -P(k)x(k)xT(k)] (FT(k)h(k)-x(k>[y(k)-xT(k)F(k-1) &k-1)1)
W
II
Here, &k/N) is the “smoothed” estimate at k based on all N samples; I,, is an n dimensional identity matrix; while the “Lagrange multiplier” vector h(k) is initiated with zero elements i.e., h(N)=O, where N is the total number of observations.
C.N. Ng, P. C. Young / Recursive
3. THE COMPONENT
TIME-SERIES
smoothing
of environmental
211
time-series
MODEL
In the present method, the observed time-series; Y(k), is decomposed into a number of quasi-orthogonal
components, viz., Y(k) = m(k) + s(k) + e(k)
(3)
where m(k) is the time-variable mean level (or trend); s(k) is nonstationary seasonality ; and e(k) is the residual stochastic part of the signal. By applying the Fourier representation of a complex wave pattern to the time-variable seasonal pattern s(k), it can be decomposed into harmonic components and described by the following Dynamic Regression
Harmonic
(DHR) model, i=h s(k)
= C [eli(k) i=l
cos(2xik/p) + 0,,(k) sin(2niklp)
(4)
for the complete set of h harmonic frequencies, where p is the fundamental seasonal period. Following [I] and [61, each regression coefficient e,,(k), j=1,2; i=1.2, ... , h, of equation (4) may be represented by a member of a family of stochastic, second order Generalised
Random Walk (GRW) models of the following form:
B(k) = a 8 (k-l) + p d(k-1) + v(k-1) (5) d(k) = d(k-1) + w(k-1) where v(k) and w(k) represent zero mean, serially uncorrelated, discrete white noise inputs with variance, q, andq,, respectively. The GRW model subsumes, as special cases, the well known Random Walk (RW: a=l; p-0; the Smoothed Random Walk (SRW: p=pl; Integrated
qw =O);
O
Random Walk (IRW: cx=/3=-y=l;q, =O). In the case of the IRW, B(k) and d(k) can be interpreted as
level and slope variables associated with the variations of the parameter estimates, with the random disturbance entering only through the d(k) equation. It is readily seen that the moving mean level m(k) of the structural model (3) can also be appropriately represented by the GRW model (5), with 8(k) replaced by m(k). 4. ESTIMATING
THE COMPONENTS
OF THE STRUCTURAL
MODEL
The time variable parameters in the component model can all be estimated simultaneously. However, an alternative and computationally less demanding approach is simply to apply the recursive filtering and smoothing algorithms, (I) and (II), sequentially to the various component models described in Section 3, with the stochastic component being obtained by subtraction. In this situation, the parameters for the time-varying mean and those parameters associated with each of the harmonic components are estimated separately and sequentially by exploiting the excellent spectral properties of the smoothing algorithms [4] [5] [61 . As a result, the model estimated at each stage is considerably smaller than that which would be needed if all the components were estimated simultaneously. Figs. 1 and 2, show how the amplitude spectra for the useful IRW trend and DHR seasonal smoothing algo-
212
C. N. Ng, P. C. Young / Recursive smoothing of environmental time-series
rithms are controlled by the selected NVR values. In particular, Fig.1 reveals that the IRW trend algorithm (termed IRWSMOOTH) is a very effective “low-pass” filter, with particularly sharp “cut-off’ properties for low values of NVR. Similar “cut-off’ properties also exist on both sides of the “bandpass” associated with the DHR smoothing algorithm (DHRSMOOTH), based on an IRW model for the harmonic regression parameters (Fig.2). It is clear that, in all cases, the scalar NVR defines the “bandwidth” of the smoothing algorithm. The phase characteristics are not shown, since the algorithms are all the “two-pass” smoothing type and so exhibit zero phase shift for all frequencies. The NVR values for trend and seasonality estimation may be chosen by reference to the spectral characteristics of the estimated components and the original series. Alternatively, it is possible to investigate the frequency domain response of the estimator applied to trend and harmonic models in order to determine the relationship between the NVR smoothness parameter and the range of frequencies (i.e. the band-pass) associated with the smoothing algorithms and, therefore, with the resulting estimated component [4]-[7]. Finally, note that, at each step in this sequential spectral decomposition (SSD) approach, the theoretical requirement that the residuals of the least-squares model should be serially uncorrelated is violated and the approach may, therefore be considered sub-optimal in this narrow one-step-ahead prediction error sense. However, the method is easily justified in the alternative filtering terms preferred here, provided the frequencies of interest at each stage in the decomposition are orthogonal to the remainder. Moreover, state space forecasting procedures based on models obtained in this manner often out-perform equivalent models based on maximum likelihood estimation [5]-[7], [8],[9]. 5. A PRACTICAL
EXAMPLE
Figure 3 shows the monthly average atmospheric concentration of carbon dioxide recorded at Mauna Loa in Hawaii between May 1974 and September 1987. The data clearly reveal a very low frequency trend and strong 12 monthly seasonal cycles. This series is the basis for the conjecture that the general increase of atmospheric carbon dioxide is caused by the burning of fossil fuels and the destruction of forests.The rising “mean” level of the series is represented by the smooth trend curve (dashed line) estimated by the IRWSMOOTH algorithm using an NVR of 0.0001. In this manner, we allow the estimated trend to account for any very long term trend behaviour but ensure that the detrended data (not shown) contain all the information on the 12 monthly seasonal cycles. Using the proposed SSD approach, each of the six 12 monthly harmonic components (i.e. at frequency 12, 6, 4, 3, 2.4 and 2 months per cycle) is estimated separately and sequentially using the DHRSMOOTH algorithm based on an IRW model, with NVR=O.OOl in each case. The total seasonal component, as obtained by summing all of the separately estimated harmonic components, is presented in Fig. 4. When viewed in isolation, it may be seen that the seasonal cycle has fluctuated in amplitude and, in fact, grown slightly over the full length of the record, There are several explanations for this phenomenons and details can be found in, e.g. [lo] (see also [4]). The non-seasonal component (Fig. 5), as obtained by removing the trend and seasonal components from the observed series, reveals some interesting features, which are difficult to observe in the original series. For example, the series appears to have quasi-periodic fluctuations with period of about 40 months. The related seasonally adjusted series (i.e. trend + non-seasonal), as shown in Fig.6 (full line), reveals the changes in the
C.N. Ng, P.C.
Young / Recursive
smoothing
of environmenial
time-series
213
underlying growth rate of carbon dioxide concentration more clearly. Two significant decelerations seem to have occurred 1974-1976 and 1980-1982 which seem, at first, to correlate with declines in the global consumption of fossil fuels, especially petroleum, during and shortly after the “oil crises” of 1973 and 1979 [41. However, these perturbations, which are associated with the 40 month period oscillations in the non-seasonal series, seem also to be related to sea surface temperature anomalies in the Pacific Ocean, which also exhibit quasi-periodic disturbances of a similar frequency [8][9]. 6. CONCLUSIONS This paper has outlined a new approach to estimating the structural components of a nonstationary environmental time-series model based on recursive “fixed interval” smoothing, where the model parameter variations are assumed to follow a generalised random walk process. The flexibility of this stochastic formulation allows for a suitable degree of variability in the estimated trend and seasonality. For instance, it is possible to allow easily for sudden changes, discontinuities, missing data points and outliers. The approach is conceptually simple and very easy to use since it has very few parameters that need to be specified by the analyst. Also, it is felt that the stochastic state-space formulation is inherently more flexible than other alternatives, based on regularisation methods, to which it can be related (see [3]). References
ill Young, P.C., Recursive Estimation and Time-series Analysis, Berlin: Springer-Verlag, 1984. [2] Norton, J.P., Optimal Smoothing in the Identification of Linear Time-variable Systems, Proc. IEE, 122, 663-668,1975. [3] Jakeman, A.J., and Young, P.C., Recursive filtering and the inversion of ill-posed causal problems. Utiiitas Mathematics, 35,351-376, 1984 [41 Young, T.J., The Application of Recursive Estimation and Other Time-series Analysis Techniques to Climatological Records, Ph.D Thesis, Univ. of Lancaster, U.K., 1987. [5] Young, P.C., Recursive extrapolation, interpolation and smoothing of non-stationary time-series, in H.F. Chen (ed.) Identification and System Parameter Estimation, 1988, Oxford: Pergamon, 1988. [61 Ng, C.N., Recursive Identification, Estimation and Forecasting of Non-stationary Time-series, Ph.D. Thesis, Univ. of Lancaster, U.K., 1987. [71Young, P.C. and Ng, C.N., Variance intervention, to appear, J.Forecasting, 1989. [8] Pearmann, G.I. and Hyson, P., The annual variation of atmospheric carbon dioxide concentration observed in the Northern Hemisphere, J. Geophys. Res., 86,9839-9843, 1981. [9] Young, P.C., Ng, C.N., Lane, K., and Parker, D. Recursive forecasting, smoothing and seasonal adjustment of nonstationary environmental data, to appear special issue of the J. Forecasting, 1989. [lo] Young, P.C., and Young, T.J. Envirometric methods of nonstationary time-series analysis, to appear as Chapter 3 in C.N. Hewitt (ed.) Methods of Environmental Data Analysis , to appear 1989/90.
C. N. Ng, P. C. Young / Recursive smoothing of environmental time-series
214
1
:, (‘i :t
l-
I
T\ j:\I I
0:
i’
4
0.8
:I
c-5. E 4
I:
3
0.20
C\ 0
-E E (
HYR:
\,
i ; ’
I I
\
-
.00001
_____.
.ow,
8:
/;
OS-
;i
jI
I! t:
:,
c
01
0.4 -:\ I
I:,
:,
I
)
3
1
i, ;i ,:t Ij
o.s-
;,
0.6-
-g
I:
;
:I
i;
I:
NVR :
0.4-
L00001 0001 I_____.
0.2 -. -)L___ I 0.1
\. 0.0
I 0.2
I 0.3
001 !_.._._..._._
.oo, .Ol
I 0.4
c 0.5
-0:o
o:,
FIGURE 1. Frequency responses of the IRWSMOOTH
0:2
0:3
Cycle per
Cycle per interval
0.i
0:s
Interval
FIGURE 2. Frequency responses of the DHR filter based on the IRW model.
filter.
E g 340 IRW: NVR = .OOO,
333
330 -4 325
f
0 I May 74
25
50
75
I 125
100
r 175
150 5ept
Months
HR-IRW:
-5 0
25
50
75
100
125
NVR=.OOl 150
175
87
FIGURE 3. IRWSMOOTH trend of the Mauna Loa carbon dioxide series.
FIGURE 4. Seasonal component based on the HR-IRW model.
1
1
0.5
IRW:
NVR=
HR-IRW:
-1
I 0
I May 74
1
25
50
I
!
75
100
Months
,
125
,
150
175 I s*pt
.OOOl
NRW=.OOl
325
0
1 25
50
I 75
I 100
125
87
FIGURE 5. Non-seasonal component of the Mauna Loa carbon dioxide series based on the IRW + HR-IRW model.
FIGURE 6. Seasonal adjusted series.
I 150
I 175