Recursive smoothing of environmental time-series

Recursive smoothing of environmental time-series

Mathematics and Computers in Simulation 32 (1990) 209 209-214 North-Holland RECURSIVE SMOOTHING OF ENVIRONMENTAL C.N. NG and P.C. YOUNG TIME...

468KB Sizes 1 Downloads 59 Views

Mathematics

and Computers

in Simulation

32 (1990)

209

209-214

North-Holland

RECURSIVE SMOOTHING

OF ENVIRONMENTAL

C.N. NG and P.C. YOUNG

TIME-SERIES

*

Department of Applied Science, Ciry Polyrechnic of Hong Kong, Kowloon, * Centre for Research on Environmental Systems, University of Luncaster,

Hong Kong United Kingdom

1. INTRODUCTION Recursive

analysis developed in recent years [ 11 provide

methods of time-series

estimation

of models

smoothing

can be used to estimate the structural

utility

with time-variable

of this approach

is illustrated

parameters.

This paper describes

components

by applying

a natural approach to the

how “fixed interval”

of an environmental

it to the analysis

time-series.

of the Mauna

recursive

The practical

Loa atmospheric

carbon

dioxide series. The proposed

method is based on a simple additive “component”

level (or trend) and sinusoidal previous

research

variable,

by assuming

on recursive

cesses. Recursive

the quasi-orthogonal

smoothing

and smoothing and effective

components

component

mal generalisation

evolve as a class of Generalised are employed,

sequential

spectral decomposition

in estimating

of a mean

Following

on from

to be smoothly

time-

Random Walk (GRW) pro-

but they are utilised in a manner that al-

model. The approach

of the observed is statistically

time-series

suboptimal,

into

in that

However, it is justified for practical use by its interpretation the quasi-orthogonal

of the method is straightforward

and multivariate

model, consisting

are permitted

algorithms

is estimated separately.

details of the method and its extensions, continuities,

[2] [3], these components

of the component

in spectral terms and effectiveness

time-series

at the main seasonal period and its harmonics.

that the model parameters

filtering

lows for the straightforward

each harmonic

components

structural components.

Moreover,

opti-

([4] and [5]). Only the basic approach is discussed here;

for example, to handle missing data, extreme observations,

sharp dis-

systems can be found in [5], [6], [7], [B] and [9]

2. RECURSIVE ESTIMATION In this paper, we are concerned series regression

primarily

with the estimation

of time-variable

parameters

y(k) = XT(k) a(k) + e(k)

where the superscript variable parameters;

T denotes vector transpose; x(k) is the vector of regression

(1)

y(k) is the time-series variables;

in question;

mation [l] is applied to this equation,

a(k) is a vector of time

and e(k) is the residual stochastic variation,

zero mean and variance cr2. As we shall see, if a suitably modified form of recursive

0378-4754/90/$3.50

in a linear time-

model of the form:

least squares (RLS) esti-

then it allows us to study any temporal change in the parameters

0 1990, IMACS/Elsevier

Science

Publishers

B.V. (North-Holland)

with

of the

C.N. Ng, P.C.

210

Young / Recursive

smoothing

of environmental

time-series

modelled relationship. There are various methods in which the basic RLS algorithm may be modified so that the influence of old data upon the estimates declines as new data are introduced. These allow the resulting algorithms to respond to any changes in the statistical character of a time-series. The main method employed for this purpose in the present paper is the “fixed interval” recursive smoothing. Here, the restrictive assumption of parametric invariance, inherent in the derivation of the ordinary RLS algorithm, is replaced by the assumption that the parameters evolve as a Gauss-Markov stochastic process of the form, a(k) = F(k-1) a(k-1) + G(k-1) v(k-1)

(2)

Here a(k) is the n dimensional parameter vector at sample time k; v(k) is a vector of zero mean, serially uncorrelated white noise inputs which is assumed to be statistically independent of e(k); and F(k), G(k) are n x n matrices which can, if required, be characterised by time-variable coefficients. The least-squares error estimate, &k), of the time-variable parameter vector a is obtained by the following recursive prediction-correction equation [ 11, Prediction: &k-l)

= F(k-1) &k-l)

P(k/k-1) = F(k-1) P(k-1) FT (k-l) + G(k-1) Q GT (k-l)

I(i) I( ii)

Correction: &k) = &k/k-l) + g(k) [ y(k) - XT(k) &k-l)

1

P(k) = P&/k-l) - g(k) XT(k)P&/k-l) g(k) = W/k-l)

x(k) [ 1+ XT(k)P&/k-l) x(k) 1

I( iii) I(iv)

I(v)

where @/k-l) denotes the a priori prediction of a(k). The “noise variance ratio (NVR)” matrix Q, and the P(k) matrix are both defined in relation to the white measurement noise variance, cr2 , i.e. Q, =Q/ o2 ; P(k)=P*(k)/ 02. Here P*(k) is the error covariance matrix associated with the parameters estimates and Q is the covariance matrix of the parameter perturbation vector v(k), which is diagonal in form if the elements of v(k) are assumed to be uncorrelated. It is important to note that the NVR (diagonal) matrix elements are normally the only unknown parameters in the model to be specified by the analyst. When a time-series is recursively analysed off-line, better use of the available information can be made by using “fixed interval” smoothing, i.e. by modifying the estimates obtained from the above forward recursion using the following backward recursion (see e.g. [ 11 and [2]): gk-l/N) = F-‘[ &k/N) + G Q GT h(k-1) ] h(k-1) = [I,, -P(k)x(k)xT(k)] (FT(k)h(k)-x(k>[y(k)-xT(k)F(k-1) &k-1)1)

W

II

Here, &k/N) is the “smoothed” estimate at k based on all N samples; I,, is an n dimensional identity matrix; while the “Lagrange multiplier” vector h(k) is initiated with zero elements i.e., h(N)=O, where N is the total number of observations.

C.N. Ng, P. C. Young / Recursive

3. THE COMPONENT

TIME-SERIES

smoothing

of environmental

211

time-series

MODEL

In the present method, the observed time-series; Y(k), is decomposed into a number of quasi-orthogonal

components, viz., Y(k) = m(k) + s(k) + e(k)

(3)

where m(k) is the time-variable mean level (or trend); s(k) is nonstationary seasonality ; and e(k) is the residual stochastic part of the signal. By applying the Fourier representation of a complex wave pattern to the time-variable seasonal pattern s(k), it can be decomposed into harmonic components and described by the following Dynamic Regression

Harmonic

(DHR) model, i=h s(k)

= C [eli(k) i=l

cos(2xik/p) + 0,,(k) sin(2niklp)

(4)

for the complete set of h harmonic frequencies, where p is the fundamental seasonal period. Following [I] and [61, each regression coefficient e,,(k), j=1,2; i=1.2, ... , h, of equation (4) may be represented by a member of a family of stochastic, second order Generalised

Random Walk (GRW) models of the following form:

B(k) = a 8 (k-l) + p d(k-1) + v(k-1) (5) d(k) = d(k-1) + w(k-1) where v(k) and w(k) represent zero mean, serially uncorrelated, discrete white noise inputs with variance, q, andq,, respectively. The GRW model subsumes, as special cases, the well known Random Walk (RW: a=l; p-0; the Smoothed Random Walk (SRW: p=pl; Integrated

qw =O);

O
Random Walk (IRW: cx=/3=-y=l;q, =O). In the case of the IRW, B(k) and d(k) can be interpreted as

level and slope variables associated with the variations of the parameter estimates, with the random disturbance entering only through the d(k) equation. It is readily seen that the moving mean level m(k) of the structural model (3) can also be appropriately represented by the GRW model (5), with 8(k) replaced by m(k). 4. ESTIMATING

THE COMPONENTS

OF THE STRUCTURAL

MODEL

The time variable parameters in the component model can all be estimated simultaneously. However, an alternative and computationally less demanding approach is simply to apply the recursive filtering and smoothing algorithms, (I) and (II), sequentially to the various component models described in Section 3, with the stochastic component being obtained by subtraction. In this situation, the parameters for the time-varying mean and those parameters associated with each of the harmonic components are estimated separately and sequentially by exploiting the excellent spectral properties of the smoothing algorithms [4] [5] [61 . As a result, the model estimated at each stage is considerably smaller than that which would be needed if all the components were estimated simultaneously. Figs. 1 and 2, show how the amplitude spectra for the useful IRW trend and DHR seasonal smoothing algo-

212

C. N. Ng, P. C. Young / Recursive smoothing of environmental time-series

rithms are controlled by the selected NVR values. In particular, Fig.1 reveals that the IRW trend algorithm (termed IRWSMOOTH) is a very effective “low-pass” filter, with particularly sharp “cut-off’ properties for low values of NVR. Similar “cut-off’ properties also exist on both sides of the “bandpass” associated with the DHR smoothing algorithm (DHRSMOOTH), based on an IRW model for the harmonic regression parameters (Fig.2). It is clear that, in all cases, the scalar NVR defines the “bandwidth” of the smoothing algorithm. The phase characteristics are not shown, since the algorithms are all the “two-pass” smoothing type and so exhibit zero phase shift for all frequencies. The NVR values for trend and seasonality estimation may be chosen by reference to the spectral characteristics of the estimated components and the original series. Alternatively, it is possible to investigate the frequency domain response of the estimator applied to trend and harmonic models in order to determine the relationship between the NVR smoothness parameter and the range of frequencies (i.e. the band-pass) associated with the smoothing algorithms and, therefore, with the resulting estimated component [4]-[7]. Finally, note that, at each step in this sequential spectral decomposition (SSD) approach, the theoretical requirement that the residuals of the least-squares model should be serially uncorrelated is violated and the approach may, therefore be considered sub-optimal in this narrow one-step-ahead prediction error sense. However, the method is easily justified in the alternative filtering terms preferred here, provided the frequencies of interest at each stage in the decomposition are orthogonal to the remainder. Moreover, state space forecasting procedures based on models obtained in this manner often out-perform equivalent models based on maximum likelihood estimation [5]-[7], [8],[9]. 5. A PRACTICAL

EXAMPLE

Figure 3 shows the monthly average atmospheric concentration of carbon dioxide recorded at Mauna Loa in Hawaii between May 1974 and September 1987. The data clearly reveal a very low frequency trend and strong 12 monthly seasonal cycles. This series is the basis for the conjecture that the general increase of atmospheric carbon dioxide is caused by the burning of fossil fuels and the destruction of forests.The rising “mean” level of the series is represented by the smooth trend curve (dashed line) estimated by the IRWSMOOTH algorithm using an NVR of 0.0001. In this manner, we allow the estimated trend to account for any very long term trend behaviour but ensure that the detrended data (not shown) contain all the information on the 12 monthly seasonal cycles. Using the proposed SSD approach, each of the six 12 monthly harmonic components (i.e. at frequency 12, 6, 4, 3, 2.4 and 2 months per cycle) is estimated separately and sequentially using the DHRSMOOTH algorithm based on an IRW model, with NVR=O.OOl in each case. The total seasonal component, as obtained by summing all of the separately estimated harmonic components, is presented in Fig. 4. When viewed in isolation, it may be seen that the seasonal cycle has fluctuated in amplitude and, in fact, grown slightly over the full length of the record, There are several explanations for this phenomenons and details can be found in, e.g. [lo] (see also [4]). The non-seasonal component (Fig. 5), as obtained by removing the trend and seasonal components from the observed series, reveals some interesting features, which are difficult to observe in the original series. For example, the series appears to have quasi-periodic fluctuations with period of about 40 months. The related seasonally adjusted series (i.e. trend + non-seasonal), as shown in Fig.6 (full line), reveals the changes in the

C.N. Ng, P.C.

Young / Recursive

smoothing

of environmenial

time-series

213

underlying growth rate of carbon dioxide concentration more clearly. Two significant decelerations seem to have occurred 1974-1976 and 1980-1982 which seem, at first, to correlate with declines in the global consumption of fossil fuels, especially petroleum, during and shortly after the “oil crises” of 1973 and 1979 [41. However, these perturbations, which are associated with the 40 month period oscillations in the non-seasonal series, seem also to be related to sea surface temperature anomalies in the Pacific Ocean, which also exhibit quasi-periodic disturbances of a similar frequency [8][9]. 6. CONCLUSIONS This paper has outlined a new approach to estimating the structural components of a nonstationary environmental time-series model based on recursive “fixed interval” smoothing, where the model parameter variations are assumed to follow a generalised random walk process. The flexibility of this stochastic formulation allows for a suitable degree of variability in the estimated trend and seasonality. For instance, it is possible to allow easily for sudden changes, discontinuities, missing data points and outliers. The approach is conceptually simple and very easy to use since it has very few parameters that need to be specified by the analyst. Also, it is felt that the stochastic state-space formulation is inherently more flexible than other alternatives, based on regularisation methods, to which it can be related (see [3]). References

ill Young, P.C., Recursive Estimation and Time-series Analysis, Berlin: Springer-Verlag, 1984. [2] Norton, J.P., Optimal Smoothing in the Identification of Linear Time-variable Systems, Proc. IEE, 122, 663-668,1975. [3] Jakeman, A.J., and Young, P.C., Recursive filtering and the inversion of ill-posed causal problems. Utiiitas Mathematics, 35,351-376, 1984 [41 Young, T.J., The Application of Recursive Estimation and Other Time-series Analysis Techniques to Climatological Records, Ph.D Thesis, Univ. of Lancaster, U.K., 1987. [5] Young, P.C., Recursive extrapolation, interpolation and smoothing of non-stationary time-series, in H.F. Chen (ed.) Identification and System Parameter Estimation, 1988, Oxford: Pergamon, 1988. [61 Ng, C.N., Recursive Identification, Estimation and Forecasting of Non-stationary Time-series, Ph.D. Thesis, Univ. of Lancaster, U.K., 1987. [71Young, P.C. and Ng, C.N., Variance intervention, to appear, J.Forecasting, 1989. [8] Pearmann, G.I. and Hyson, P., The annual variation of atmospheric carbon dioxide concentration observed in the Northern Hemisphere, J. Geophys. Res., 86,9839-9843, 1981. [9] Young, P.C., Ng, C.N., Lane, K., and Parker, D. Recursive forecasting, smoothing and seasonal adjustment of nonstationary environmental data, to appear special issue of the J. Forecasting, 1989. [lo] Young, P.C., and Young, T.J. Envirometric methods of nonstationary time-series analysis, to appear as Chapter 3 in C.N. Hewitt (ed.) Methods of Environmental Data Analysis , to appear 1989/90.

C. N. Ng, P. C. Young / Recursive smoothing of environmental time-series

214

1

:, (‘i :t

l-

I

T\ j:\I I

0:

i’

4

0.8

:I

c-5. E 4

I:

3

0.20

C\ 0

-E E (

HYR:

\,

i ; ’

I I

\

-

.00001

_____.

.ow,

8:

/;

OS-

;i

jI

I! t:

:,

c

01

0.4 -:\ I

I:,

:,

I

)

3

1

i, ;i ,:t Ij

o.s-

;,

0.6-

-g

I:

;

:I

i;

I:

NVR :

0.4-

L00001 0001 I_____.

0.2 -. -)L___ I 0.1

\. 0.0

I 0.2

I 0.3

001 !_.._._..._._

.oo, .Ol

I 0.4

c 0.5

-0:o

o:,

FIGURE 1. Frequency responses of the IRWSMOOTH

0:2

0:3

Cycle per

Cycle per interval

0.i

0:s

Interval

FIGURE 2. Frequency responses of the DHR filter based on the IRW model.

filter.

E g 340 IRW: NVR = .OOO,

333

330 -4 325

f

0 I May 74

25

50

75

I 125

100

r 175

150 5ept

Months

HR-IRW:

-5 0

25

50

75

100

125

NVR=.OOl 150

175

87

FIGURE 3. IRWSMOOTH trend of the Mauna Loa carbon dioxide series.

FIGURE 4. Seasonal component based on the HR-IRW model.

1

1

0.5

IRW:

NVR=

HR-IRW:

-1

I 0

I May 74

1

25

50

I

!

75

100

Months

,

125

,

150

175 I s*pt

.OOOl

NRW=.OOl

325

0

1 25

50

I 75

I 100

125

87

FIGURE 5. Non-seasonal component of the Mauna Loa carbon dioxide series based on the IRW + HR-IRW model.

FIGURE 6. Seasonal adjusted series.

I 150

I 175