Multivariate correlation analysis - a method for the analysis of multidimensional time series in environmental studies

Multivariate correlation analysis - a method for the analysis of multidimensional time series in environmental studies

Chemometrics and intelligent laboratory systems ELSEVIER Chemometrics and Intelligent Laboratory Systems 32 (1996) 57-6.5 Multivariate correlation a...

571KB Sizes 0 Downloads 49 Views

Chemometrics and intelligent laboratory systems ELSEVIER

Chemometrics and Intelligent Laboratory Systems 32 (1996) 57-6.5

Multivariate correlation analysis - a method for the analysis of multidimensional time series in environmental studies Sabine GeiB a9*, Jiirgen Einax b a Thiiringer Landesanstalt fir Urnwelt, Zentrallabor, Priissingstr. 25, 07745 Jena-Gkchwitz, Germany b Institut fir Anorganische und Analytische Chemie, Friedrich-Schiller-Uniuersitiit Jena, Steiger 3, 07743 Jena, Germany Received 1 October 1992; accepted 16 July 1995

Abstract After a short demonstration of the theoretical base of univariate, partial and multiple correlation analysis (auto- and crosscorrelation), the multivariate auto- and crosscorrelation function was deduced. One advantage of multivariate correlation is the possibility for simultaneous handling of all variables of the time or local series, which allows considering all interactions within the variables in the series and between the series which are dependent upon lag. Applications in environmental analy-

sis are provided in this text for the partial crosscorrelation and the multivariate auto- and crosscorrelation for three selected practical examples. Keywords: Correlation techniques; Multivariate analysis; Signal/time series analysis

1. Introduction Many analytical investigations deal with the determination of variation of concentration - in time or space [l]. But these concentration variations not only concern themselves with one dimensional cases, e.g., the time series of one parameter [2-71 in one dimension or in areas like in geostatistics [8], but also with many parameters which are changing simultaneously. Especially in environmental analysis such time or local changes of environmental contaminants are very relevant [9]. Multivariate time series models,

* Corresponding author. 0169-7439/96/$15.00

0 1996 Elsevier Science B.V. AI1 rights reserved

SSDI 0169-7439(95)00067-4

e.g., for the autoregressive moving average [ 10,111 or multiway decomposition [12], are available. In environmental analysis there are two problems which need to be solved: 1. Strong variation of the trace concentrations in environmental compartments like air, water and soil due to the natural inhomogeneities. 2. Complex problems of interactions and the common appearance of different parameters. For that purpose the well-known univariate correlation analysis [13] was generalized into the multivariate case. The multivariate correlation analysis permits including all interactions within the variables and excluding the share of the variance due to the variable noise.

58

S. Geip, J. Einax/Chemometrics

and Intelligent Laboratory Systems 32 (1996) 57-65

2. Theory of multivariate auto- and crosscorrelation analysis The aim of correlation analysis is to compare more functions and to calculate their relationship with respect to a change of T (lag) in time or distance. Assumed is a normal distribution of the functions x(t) and y(t) with the means

2.1. Multiple correlation between x and y Multiple correlation between x and y means more variables in y, variables 1, 2, 3,. . . ,m, interact with one variable in x.

%2(T) ... %mtT)

and the variances

s22(r) sx=

sy=

$1:

[i-x(t)]“dt fg

i/-T

.

[y-y(t)12dz g fg

1 = ,br+nm2,jtg x(t)Y(t+ 8 57 - fg

Partial variables only this the other formula)

%yW =

i

correlation between x and y means more in y interact with one variable in x, but one interaction is relevant for our study. So interactions (relations to 2, 3,4,. . . ,m in the have to be excluded. C,*(r)

~+1(2,3,4,...,m)W

=

lEimGx3

g=S-’

S(T) /

=

function

smm-w

r)dt

In practice, correlation analysis can also used for the discrete measuring values of x(t) and y(t). In the simple univariate case, x(t) and y(t) contain one variable. The autocorrelation function is defined as

and the crosscorrelation

.

2.2. Partial correlation between x and y

where t, is the total time or distance. Correlation analysis involves the comparison of functions in relation to local or temporal change. Therefore folding functions can be used: sJ7)

...

SAT)

s&)

%2(T)

...

SIX(~)

%1(T)

%2(T)

...

s2*(7)

s12b)

s22(7)

,..

s3J7)

.

.

.

.

hn(r> %mW %n(4

as

;I

gX(t)y(t+r) -fg

where n is the number of the items and r is the lag. In the following cases more variables in x(t) and y(t) are available.

,%x(3

sir)

2.3. Multivariate correlation between x and y Multivariate correlation between x and y means all relations between the various variables within -X

S. Gei;6,J. Einax/ Chemometrics and Intelligent Laboratory Systems 32 (19%) 57-65

(e.g., x1, x2,. . ., x,,), within Y (e.g., yl, y2,. . . ,y,) and furthermore the relations between X and Y, which are dependent upon lag, are considered. NC% the total variance can be expressed as a correlation matrix by using centered values. The multivariate auto- and crosscorrelation functions have to contain the total variance in dependence on r. Principal component analysis is one possibility for the extraction of the total variance from the correlation matrix. The to tal variance is equal to the sum of positive eigenvalues of the correlation matrices. For II variables, at least 3n values must be available for the calculation, according to the rule of HORST [14]. The multivariate autocorrelation function SX~ (7) _ (for centered values) is defined as sx,W

= [xWx(t

+ r,‘]

Now this multivariate function is reduced by the following instruction into a univariate function [L(r) S,,(r)

- u] =

= 0

i *i A>0

The multivariate crosscorrelation treated in the same way S&> [L(r)

= [x(M+ -ti]

function

was

r,‘]

/ -0.2

-.

-04: '0

1

2

3

4

6

6

7

=

C

8

9 10 11 12 13 14 15 16 17 16 19 20 21 22 la9 (months)

Fig. 1. Partial crosscorrelation function between the nitrate load at the storage reservoir intake and nitrate concentration in the storage reservoir itself.

significancy of the multivariate correlation value, it is necessary to take into consideration two influences. 2.3.1. Multiplication of probabilities A level of 0.9523 = 0.307 is the result of the multivariate case with 23 variables of a statistical significancy of 0.95 for each variable. 2.3.2. Multivariate testing by the spur criterion In the multivariate case the significant correlation coefficients for each variable add to the multivariate critical correlation value.

=0 2.4. The calculation

S*y(7)

59

of degrees of freedom

hi

A>0

These reduced multivariate auto- and crosscorrelation functions are now simply called multivariate auto- and crosscorrelation functions. For testing the

Table 1 Scheme of crosscorrelated

time series of application

f=n.n-n.m=n.(n-m) where n is the number ber of objects.

Time series 2

storage reservoir intake

storage reservoir

monthly 1976-1991(180

objects)

and m the num-

3.1

Time series 1

date

of variables

nitrate load (x variable)

nitrate concentration (var. 1 in y)

COD (var. 2 in y)

PO: (var. 3 in y)

x(O) x(l)

l(O) 10)

2(O)

n(2)

l(2)

2(2)

3(O) 30) 3(3)

x(180)

;(180)

2(180)

j(180)

20)

60

S. Geib, J. Einax/Chemometrics

and Intelligent Laboratory Systems 32 (1996) 57-65

For the determination of IZ variables, II objects are necessary. The rest are degrees of freedom. For example: Dust deposition was analyzed in 23 elements for 60 weeks (objects). In order to find a multivariate probability P =

mo

7,oo

a00

a00

no0

tt.00

3200

#roe

0.977, the following quired.

univariate

is re

Puni" = 2&377 = 0.999

0 a.00







r.00





a00





aw

flmo LyJZn twu

probability

-





n

1100

*



lmo



am

Tim

1

,Iaca



TM)



aw

Fig. 2. Time series of the elements on the first bridge.



t3.w



nm

mw



“‘I

law

l2w

1

v3.m

S. GeijI, J. Einax/Chemometrics

and intelligent Laboratory Systems 32 (1996) 57-65

61

100.

0.“’ a00

n



a00

7.00

wo



no0 mm



nca





(ew

I

80.

40

30

30

20

20

10 -

10 ’

’ 7.00



’ aw



’ 900



s mw

a00

~~00

1l.w

aw

13.00



’ 1t.00



WWfU

,

80

/

/

80

40

0 a.00

a00

nma

w Zn (uonl

-

01” ““““7.00 “” aw

taw

’ lzoo



l3.00

0 a00

i



’ 7.00

m

’ a00



’ e.00



nm

TIIIN Fig. 3. Time series of the elements on the second bridge.

’ tQ.00

m

’ la.00



’ l2.w



13.00

62

S. GeijI, J. Einox/Chemometrics and Intelligent Laboratory Systems 32 (19%) 57-65

Fig. 4. Univariate

crosscorrelation

The number of degrees of freedom depends on the lag r. f=

23(60 - 23) = 851 for T= 0

f=

23(_50 - 23) = 621 for T= 10

rcrir = 0.115 for f= (table for univariate

851 critical

correlation

coefficients

[71) Rcrit = 2.645 for

f = 851

function for each element.

areas is strong in winter, but minimal in summer due to the assimilation of the plants. Due to the storage time of the water another influence on the nitrate concentration in the storage reservoir is the biochemical interaction between the parameters nitrate and chemical oxygen demand and phosphate, respectively. Therefore it was necessary to use the partial crosscorrelation function to analyze the time-lag between the nitrate load upon intake of the storage reservoir and the nitrate concentration in the storage reservoir itself and now it was possible to exclude the

(resulting from multivariate critical correlation value) These critical values depend on r because of the resulting degrees of freedom. 2

Multivariate

cro88correlaiion

function

3. Applications

mean

flow

time

f,6

3.1. Partial crosscorrelation for the determination of the time-lag of nitrate pollution in a storage reservoir In Thuringia there are many storage reservoirs for winning drinking water. These storage reservoirs may have nitrate problems. The nitrate concentration time series upon intake and the reservoir water itself are periodic. The main source of nitrate is agricultural fertilization, The wash-out effect from agricultural

fl

0

1

2

3 lag (30

Fig. 5. Multivariate crosscorrelation transport rate in a river.

4

6

6

7

mid

function for calculation

of the

S. Geij?, J. Einax/ Table 2 Scheme of crosscorrelated Time

6.00 6.30 7.00

Chemometrics

time series for application

63

and Intelligent Laboratory Systems 32 (1996) 57-65

3.2

Time series 1

Time series 2

bridge 1 (x variables)

bridge 2 ( y variables)

Cd

Cr

CU

Fe

Zn

Cd

Cr

CU

Fe

Zn

x JOI x,(l) x,(2)

x,(O) x,(l) x,(2)

x,(O) x&l) x,(2)

x,(O) x,(l) x,(2)

x,(O) x&l) x,(2)

YI(O) Y,(l) Y,(2)

Y,(O) Y*(l) Y*(2)

Y&O) Y,(l) Y,(2)

Y,(O) Y,(l) Y,(2)

Y,(O) Y,(l) Y,(2)

13.00

influences of chemical oxygen demand and phosphate on nitrate concentration in the storage reservoir (Table 1). A lag of 1.5 months was found (Fig. 1). This means maximum nitrate concentrations in the intake of the reservoir cause maximum mitrate concentrations in the storage reservoir 1.5 months later. Such predictions may help to determine if drinking water should be taken from other storage reservoirs, if nitrate concentrations are too high. 3.2. Multivariate crosscorrelation

for computation of

transport rates of dissolved metals in river water

The transport rates of substances in river water are of economic importance, e.g., in potentially hazardous or pollutive situations. Usually these transport rates have been measured by expensive tracer experiments with salts, colored compounds or radioactive isotopes which can cause pollution themselves when the measurements are repeated frequently. Single 1.2 1-t 0.8 0.6-

time series of trace concentrations of metals in the river are evidently of scattered character (Figs. 2 and 3). The univariate crosscorrelation function is a possible means of measuring transport rates in flowing systems. For this purpose, the time series of heavy metal concentrations (Cd, Cr, Cu, Fe and Zn) at two sampling points from the river with a distance of 4.5 km at exactly the same time (Figs. 2 and 3) were crosscorrelated. The 10 time series of the 5 elements were crosscorrelated univariately, e.g., for each element the time series at the first sampling point from the river was crosscorrelated with the time series at the second sampling point. The univariate crosscorrelation functions for the 5 elements are very noisy and offer no useful information (Fig. 4). With the help of the multivariate crosscorrelation analysis has been possible to include the interaction between the metals due to the relationships in emission and transformation (Table 2). The assumption is that the metals are transported at the same rate. The multivariate crosscorrelation function sXY(r) shows a wide maximum at T = 3, e.g., 1.5 h (Fig. 5). This means that the mean transport rate of the metals is 3 km/h. Hydrological data of this river on the same day makes this result plausible (Fig. 5): maximum

flow rate = 4.28 km/h

mean flow rate = 2.64 km/h

3.3. Multivariate-statistical

analysis of dust deposition data on the North Frisian island Pellworm

Fig. 6. Univariate fraction.

autocorrelation

function of Sulfur. Sulfur in 3rd

Aerosol samples [15] were taken once a week over a period of 14 months on the North Frisian island

64

S. GeijI, J. Einax / Chemometrics and Intelligent Laboratory Systems 32 (I 9%) 57-65

Table 3 Scheme of autocorrelated Date

time series in application

3.3 a

1. fraction time series (x variables)

weekly (52 objects)

x,(O), x,(O), x,Ko,

‘.

>

x*3@)

x,(l), x,(l), x,(l), . .’9x,,(l) x,(2), , 42) x,(51), x*(51),

. , x&1)

2. fraction time series Lx variables) weekly (52 objects)

+

x,(O),x,(O),x,(O), >40) x,(l), x,(l), x,(l), ‘. .T x,,(l) x,(2), ., x&9 x,(51), x,(51), . . . , x*,(51) 3. fraction time series (x

Fig. 8. Multivariate

limit

*Ignlficancy

autocorrelation

function of the 2nd fraction.

26,

I

variables)

weekly (52 objects)

x,(O),x,(O),x,(O), >x,,(O) x,(l), x,(l), x,(l), ‘. . >41) x,(2), , x,,(2) x,(51), x,(51), . . ., -%,(51)

10

c +

a 23 elements: As, Ba, Ca, Cd, Cr, Cu, Fe, K, Mn, MO, Ni, Pb, Rb, S, Sb, Se, Sn, Sr, Ti, V, Y, Zn, Zr. +

Pellworm. A high volume sampler was used according to the impactor principle which allows classification for the aerodynamic equivalent diameter for a 50% separation. In six steps dust particles with the diameters demonstrated in Table 4 were precipitated. The sampling period was 14 months, from May 1984 to June 1985. Determination of metal concentrations in the dried digestion solutions was carried out by total reflection X-ray fluorescence (TXRF) analysis.

Fig. 9. Multivariate

+ Fig. 7. Multivariate

8Ignitlcancy

autocorrelation

limit

function of the 1st fraction.

function of the 5th fraction.

Fig. 6 demonstrates the low evidence of the univariate autocorrelation function for the element sulfur, shown here as an example. The multivariate autocorrelation functions (Table 3) of the data set of 23 elements in these dustlike emission samples were demonstrated in Figs 7-9. In the first fraction (30-7.2 pm) (Fig. 7) significant autocorrelation cases are to Table 4 Lags of significant the particle size

i_:_

*Ignllicancy llmlt

autocorrelation

multivariate

autocorrelation

in dependence

Fraction No.

Particle size in pm

Lags of significant multivariate autocorrelation in weeks

1 2 3 4 5 6 7 (total amount)

30-7.2 7.2-3.0 3.0-1.5 1.5-0.7 0.7-0.3 < 0.3 _X

1 5 5-6 7 7-10 6 5

on

S. Gei@,J. Einax/ Chemometrics and Intelligent Laboratory Systems 32 (1996) 57-65

be seen only accidentally. Conclusions for the relationship between the emission situations per week can only be poorly drawn. In the second fraction (7.2-3.0 pm) (Fig. 81, the autocorrelation graph is not so noisy as compared with the first fraction. Significant relationships up to five weeks are obtained between the deposition situations. This tendency of stronger relations between the deposition events with smaller particles continues up to the fraction number 5 (Fig. 9, Table 4). This means, that changes in the emission of smaller particles are more dependent on the preceding emission state than those of bigger particles. So it is better to characterize the long time emission level by smaller particles than by bigger particles of airborne dust.

Acknowledgements The authors thank Professor Michaelis from the research center in Geesthacht for the cooperation in relation to the dust deposition correlation studies.

References [l] K. Doerffel and A. Wundrack, in R. Bock, W. Fresenius, H. Giinzler, H. Huber and G. Tiilg (Eds.), Correlation Functions

65

in Analytics, Analytiker Taschenbuch Bd. 6, Akademieverlag, Berlin, 1986, pp. 37-63 (in German). Dl E. Fiirster and B. R&z, Methods of Correlation and Regression Analysis, Die Wirtschaft, Berlin, 1979 (in German). [31 P. Metzler and B. Nickel, Methodic in Medicine and Psychology, Time Series and Progress Analysis, Hirzel, Leipzig, 1986 (in German). [41 P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, Springer, Berlin, 1987. 151 T.B. Fomby, R.C. Hill and S.R. Johnson, Advanced Econometric Methods, Springer, Berlin, 1984. [61 R. Schlittgen and B.H.J. Streitherg, Time Series Analysis, Oldenbourg, Mlnchen, 1989 (in German). 171 SM. Pandit and S. Wu, Time Series and System Analysis with Applications, Krieger, Malabar, FL, 1990. Springer, 181 H. Akin and H. Siemes, Practical Geostatistics, Berlin, 1988 (in German). Valuation of Analytical Investiga[91 J. Einax, Chemometrical tions of Metals in Environment, Dissertation B, Jena, Friedrich Schiller University, 1990 (in German). [lOI D.R. Cox, Stand. J. Statist., 8 (1981) 93. Ml M.B. Priestley, Spectral Analysis and Time Series, Academic Press, London, 1981. WI L. Stable, J. Pharm. Biomed. Anal., 9 (1991) 671. 1131 G. Kateman, Chemometrics and Species Identification, Akademieverlag, Berlin, 1987. [141 E. Weher, Introduction to Factor Analysis, Gustav Fischer, Jena, 1974 (in German). WI R.-P. StoOel, Examinations on the Wet and Dry Deposition of Heavy Metals on the Island Pellworm, Dissertation, GKSS Research Center Geesthacht GmbH, Geesthacht, 1987 (in German).