Meteorological networks optimization from a statistical point of view


Computational Statistics & Data Analysis 9 (1990) 57-75
North-Holland

Guy der MEGREDITCHIAN
Etablissement d'Etudes et de Recherches Météorologiques, 75007 Paris, France

Received April 18, 1988
Revised October 3, 1988

Abstract: In this paper, the problem of network optimization is studied from the point of view of minimizing the correlational redundancy of the network. To this end, redundancy is defined by means of a new notion: the number of independent observations “equivalent”, in the sense of field anomalicity, to a given number of correlated observations. Such a criterion allows us to solve both the problem of optimal ranking of a set of stations and the problem of optimal location of a new station for an already given set. Various backward and forward iterative algorithms are used for this purpose, in connection with the definition or computation of the spatial correlation function of the field concerned. Various examples of concrete applications are given for different meteorological fields.

Keywords: Networks optimization, Correlational redundancy, Equivalent number, Backward (forward) iterative algorithms, Global (local) extremum.

1. Introduction

The problem of rational planning of a network of meteorological stations is extremely complex and involves numerous aspects, including the very reason for the network's existence. What is the purpose of such a network? What kind of information must it give? This is why the optimization problem may be approached from many points of view: economical, aeronautical, administrative, forecasting, and so on. In order to reconcile these various points of view, we will present a procedure for optimizing the “informativeness” of the network about the field concerned. The basic principle for an adequate choice of the network is that it should give the most faithful image of the field. We will consider three types of problems associated with the optimization of meteorological networks, and present adequate methods of resolution and the concrete applications we have made to date.

0167-9473/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)

G. der Megreditchian / Meteorological networks optimization

58

The first problem is the preferential ranking of the existing stations in the network. In case of budget restrictions, for example, it is necessary to know which station (or stations) must be closed in order to minimize the loss of information regarding the whole field.

The second problem concerns the optimal location of new stations, which must be placed in the most efficient way in order to get the maximal increase of information about the field.

The third problem is the choice of the optimum mesh of the regular grid used for the description of the field. The main question is: how should the best compromise between completeness and economy be realized?

2. Economical effect of network optimization

Insufficient meteorological information leads to losses P for the country's economy (human lives, harvest destruction, transport difficulties and so on), which decrease with a better design of the meteorological observation systems. On the other hand, the expenses C associated with the maintenance of the meteorological network grow as the number of stations increases. A reasonable economical criterion may be defined as the sum S of the maintenance cost C and the losses P due to the ignorance of the meteorological conditions:

$$S = P + C.$$

Thus, the network optimization corresponds to the search for the network for which the value of the quantity S is minimum:

$$\min_{\forall\ \text{networks}} S.$$

Nevertheless, it is easy to conceive the difficulties linked with concrete practical applications of such optimization.

3. The statistical structure of meteorological fields

The meteorologist is concerned not only with the statistical behaviour of individual meteorological elements, but also with their reciprocal influences, their statistical interdependencies. Generally, they are described by the cross moments of the second order, the definition of which is based on the operator of mathematical expectation:

$$E[G(X)] = \int_{-\infty}^{+\infty} G(z)\, dF(z),$$

which is defined for every scalar function G(X) of the random vector X.


4. Couple of random variables

To characterize the degree of statistical dependence between the random variables x and y, the following parameters are introduced.

The non-central coefficients of covariance s(x, y) and structure δ(x, y):

$$s[x, y] = E[x \cdot y], \qquad \delta[x, y] = E[(x - y)^2].$$

The central coefficients of covariance v(x, y) and structure γ(x, y):

$$v[x, y] = E[\mathring{x} \cdot \mathring{y}], \qquad \gamma[x, y] = E[(\mathring{x} - \mathring{y})^2], \qquad \mathring{x} = x - E(x).$$

The normalized and central coefficients of covariance r(x, y) (correlation coefficient) and structure g(x, y):

$$r[x, y] = E[\tilde{x} \cdot \tilde{y}], \qquad g[x, y] = E[(\tilde{x} - \tilde{y})^2], \qquad \tilde{x} = \frac{x - E(x)}{\sigma(x)}.$$

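When we have a sample of T observations

$$x(1), \ldots, x(t), \ldots, x(T); \qquad y(1), \ldots, y(t), \ldots, y(T),$$

it is possible to compute the corresponding empirical coefficients by means of similar formulas, introducing the corresponding empirical operator of mathematical expectation:

$$\hat{E}[G(x)] = \frac{1}{T} \sum_{t=1}^{T} G(x(t)).$$

Then, we obtain:

$$\hat{s}[x, y] = \hat{E}[x \cdot y], \qquad \hat{\delta}[x, y] = \hat{E}[(x - y)^2],$$

$$\hat{v}[x, y] = \hat{E}[\mathring{x} \cdot \mathring{y}], \qquad \hat{\gamma}[x, y] = \hat{E}[(\mathring{x} - \mathring{y})^2],$$

$$\hat{r}[x, y] = \hat{E}[\tilde{x} \cdot \tilde{y}], \qquad \hat{g}[x, y] = \hat{E}[(\tilde{x} - \tilde{y})^2].$$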

We can remark here that covariance coefficients are similarity indexes, while structure coefficients are dissimilarity indexes. Let us reiterate that the coefficients r(x, y) are usually called correlation coefficients and γ(x, y) variograms. Some obvious relations link these covariance and structure coefficients in the case of homogeneous and isotropic fields:

$$v[x, y] = s[x, y] - E(x) \cdot E(y);$$

$$\gamma[x, y] = \delta[x, y] - [E(x) - E(y)]^2;$$

$$\gamma[x, y] = v[x, x] + v[y, y] - 2 v[x, y];$$

$$g[x, y] = 2[1 - r(x, y)].$$

Each of these coefficients expresses in its own way the intensity of the statistical dependence between the x and y variables.
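The six coefficients and the relations linking them can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming that the empirical expectation is the plain sample mean over the T observations; the function name `empirical_coefficients` and the synthetic sample are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def empirical_coefficients(x, y):
    """Empirical covariance and structure coefficients of two samples.

    Assumption: the empirical expectation is the sample mean (divisor T).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()      # centred samples
    xn, yn = xc / x.std(), yc / y.std()      # centred and normalized samples
    return {
        "s": np.mean(x * y),                 # non-central covariance
        "delta": np.mean((x - y) ** 2),      # non-central structure
        "v": np.mean(xc * yc),               # central covariance
        "gamma": np.mean((xc - yc) ** 2),    # central structure (variogram)
        "r": np.mean(xn * yn),               # correlation coefficient
        "g": np.mean((xn - yn) ** 2),        # normalized structure
    }

# Synthetic, purely illustrative sample of T = 500 correlated observations.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=500)
y = 0.6 * x + rng.normal(-1.0, 1.0, size=500)
c = empirical_coefficients(x, y)

# The relations between covariance and structure coefficients hold exactly
# for the empirical versions as well.
print(np.isclose(c["v"], c["s"] - x.mean() * y.mean()))
print(np.isclose(c["gamma"], c["delta"] - (x.mean() - y.mean()) ** 2))
print(np.isclose(c["g"], 2.0 * (1.0 - c["r"])))
```

With the sample mean as expectation operator, the identities are algebraic, so they hold up to rounding error for any sample.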

Random vector

Let $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_m)$ be two random vectors. The statistical interdependence between their components will be described by means of the matrices formed by the corresponding scalar coefficients for each pair of $x_i$ and $y_j$:

$$S_{xy} = (s[x_i, y_j]), \qquad V_{xy} = (v[x_i, y_j]), \qquad R_{xy} = (r[x_i, y_j]),$$

$$\Delta_{xy} = (\delta[x_i, y_j]), \qquad \Gamma_{xy} = (\gamma[x_i, y_j]), \qquad G_{xy} = (g[x_i, y_j]).$$

In particular, if X = Y, the statistical dependence between the vector components may be described by means of the following matrices:

$$S_{xx}, \quad V_{xx}, \quad R_{xx}, \quad \Delta_{xx}, \quad \Gamma_{xx}, \quad G_{xx}.$$

The definition of the corresponding empirical matrices $\hat{S}_{xy}$, $\hat{V}_{xy}$, $\hat{R}_{xy}$, $\hat{\Delta}_{xy}$, $\hat{\Gamma}_{xy}$, $\hat{G}_{xy}$ is based on the same principle, and the corresponding computations may be carried out if samples of T observations are available:

$$X(1), \ldots, X(t), \ldots, X(T); \qquad Y(1), \ldots, Y(t), \ldots, Y(T).$$

Thus, we have many adequate parameters for the description of statistical interdependence between the components of random vectors.

Random fields

It is difficult to characterize in an exhaustive way the statistical dependence between the values of random fields observed at different points, because this requires joint distribution functions of high order. That is why the study is generally limited to the covariance and the structure functions of those fields. Let us introduce the following notations: $x(t, P_i)$ is the value of the meteorological field x observed at the time t at the point $P_i$, and $y(\tau, P_j)$ is the value of the meteorological field y observed at the time τ at the point $P_j$. The spatio-temporal statistical structure of the fields x and y is then described by the spatio-temporal covariance $v_{xy}(t, \tau, P_i, P_j)$ and structure $\gamma_{xy}(t, \tau, P_i, P_j)$ functions, defined according to the corresponding definitions of the same coefficients for a pair of random variables by means of the following relations:

$$v_{xy}[t, \tau, P_i, P_j] = v[x(t, P_i), y(\tau, P_j)],$$

$$\gamma_{xy}[t, \tau, P_i, P_j] = \gamma[x(t, P_i), y(\tau, P_j)].$$

From such general definitions, we obtain in particular the spatial covariance $v_{xy}(t, t, P_i, P_j)$ and structure $\gamma_{xy}(t, t, P_i, P_j)$ functions by setting t = τ, and also the temporal covariance $v_{xy}(t, \tau, P, P)$ and structure $\gamma_{xy}(t, \tau, P, P)$ functions by setting $P_i = P_j = P$. In a similar way, we can introduce normalized spatio-temporal covariance $r_{xy}(t, \tau, P_i, P_j)$ and structure $g_{xy}(t, \tau, P_i, P_j)$ functions, and their purely spatial $r_{xy}(t, t, P_i, P_j)$, $g_{xy}(t, t, P_i, P_j)$ or purely temporal $r_{xy}(t, \tau, P, P)$, $g_{xy}(t, \tau, P, P)$ expressions.


We will also distinguish the cross-covariance or cross-structure functions if x ≠ y, and the auto-covariance or auto-structure functions if x = y. The definition of the statistical structure of random fields is greatly simplified if the fields are homogeneous and isotropic, that is, if their characteristics associated with a point P of the field are constant for each point P, and their characteristics associated with a pair of points $P_i$ and $P_j$ depend only upon the distance $d_{ij} = d(P_i, P_j)$ between these points. We will have, for homogeneous and isotropic fields, the following relations:

$$v_x[P_i, P_j] = v_x[d_{ij}], \qquad r_x[P_i, P_j] = r_x[d_{ij}],$$

$$\gamma_x[P_i, P_j] = \gamma_x[d_{ij}], \qquad g_x[P_i, P_j] = g_x[d_{ij}].$$

In the case of temporal structure, the notions of homogeneity and isotropy are replaced by the notion of stationarity, imposing the following relations:

$$\tau_{ij} = t_i - t_j, \qquad E[x(t)] = E[x], \qquad \sigma[x(t)] = \sigma[x].$$

In applications, the patterns of points corresponding to the values of these functions for given $d_{ij}$ or $\tau_{ij}$ are fitted by means of adequate theoretical functions depending on one or several parameters.

5. About the notion of informativity of meteorological observation network

Let us consider an adequate network of observation stations for the study of some meteorological element. Let $\Gamma_i$ be the station number “i” of the network, $(\Gamma_i)_{i=1}^{N}$ the network of N stations, and $\xi_i(t)$ the value of the anomaly of this meteorological element observed at the station $\Gamma_i$ at the time t. The field of anomalies defined by the N values $\xi_i(t)$ observed at these N stations at the time t will be described by the N-dimensional vector:

$$\xi(t) = (\xi_1(t), \ldots, \xi_i(t), \ldots, \xi_N(t))'.$$

Let us choose a stochastic model of the natural mechanism generating the observations, according to which the anomalies observed at the time t are the results of a random sampling from a statistical population defined by some probabilistic measure P. Then, the statistical structure of the system of random variables $\xi_i(t)$ is defined by the joint distribution function

$$F_{\xi(1), \ldots, \xi(T)}[x(1), \ldots, x(T)] = P\{\xi(1) < x(1), \ldots, \xi(T) < x(T)\},$$

where

$$x(t) = \{x_1(t), \ldots, x_i(t), \ldots, x_N(t)\}'.$$

Let us study in particular the correlational redundancy of the network. The statistical interdependences between the random variables $\xi_i(t)$ and $\xi_j(\tau)$ are characterized by the covariance coefficients

$$v_{ij}(t, \tau) = E[\xi_i(t) \cdot \xi_j(\tau)],$$

or, for the normalized variables $\tilde{\xi}_i$, by the correlation coefficients

$$r_{ij}(t, \tau) = E[\tilde{\xi}_i(t) \cdot \tilde{\xi}_j(\tau)].$$

Our knowledge of the spatio-temporal interdependences between the random variables $\xi_i(t)$ is characterized by the matrices

$$V(t, \tau) = (v_{ij}(t, \tau)), \qquad R(t, \tau) = (r_{ij}(t, \tau)).$$

Now let us impose two important restrictions. First, we will consider a Gaussian model and, next, a stationary model. In other words, the probability density function of the vector of anomalies can be expressed by the following formula:

$$f_\xi(x) = \frac{1}{(2\pi)^{N/2}\, |V|^{1/2}}\; e^{-\frac{1}{2} x' V^{-1} x},$$

where V designates the spatial covariance matrix of the field and E(ξ) = 0, as we consider the vector of anomalies.

Now let us introduce a very important notion of “equivalence” (in “anomalicity”) between two observation networks $(\Gamma_i)_{i=1}^{N}$ and $(\gamma_j)_{j=1}^{n}$ measuring the values of the same meteorological field. For this purpose, we are going to use an important characteristic of the observed field, the “anomalicity” coefficients $K$ and $\tilde{K}$:

which describe the field variability with respect to its mean behaviour. We will point out that the coefficients K are associated with the network $\Gamma_N$ or $\gamma_n$ generating the observations x(t) by introducing the following notations:

$$K(\Gamma_N), \quad \tilde{K}(\Gamma_N), \quad K(\gamma_n), \quad \tilde{K}(\gamma_n).$$

So we can now define the notion of equivalence in “anomalicity” between two networks $(\Gamma_i)_{i=1}^{N}$ and $(\gamma_j)_{j=1}^{n}$.

Definition 1. We say that the networks $(\Gamma_i)_{i=1}^{N}$ and $(\gamma_j)_{j=1}^{n}$ are “equivalent in anomalicity” if the distribution functions of their “anomalicity” coefficients are equal:

$$F_{K(\Gamma_N)}(x) = F_{K(\gamma_n)}(x).$$

In other words, we will say that two observation networks are “equivalent” if the statistical characterizations of their variability with respect to their mean behaviour are stochastically identical. We could say that such a definition of “equivalence” is given in the “strong” sense. It is naturally too restrictive, and it is necessary to give a less restrictive definition that we will call an “equivalence” in the “weak” sense.

Definition 2. We will say that the networks $(\Gamma_i)_{i=1}^{N}$ and $(\gamma_j)_{j=1}^{n}$ are “equivalent” up to the order “s” in the “weak” sense if, for the first “s” moments $m_j$ or cumulants $\kappa_j$ of the random variables $K(\Gamma_N)$ and $K(\gamma_n)$, we have the equality

$$m_j[K(\Gamma_N)] = m_j[K(\gamma_n)], \qquad j = 1, \ldots, s,$$

or

$$\kappa_j[K(\Gamma_N)] = \kappa_j[K(\gamma_n)], \qquad j = 1, \ldots, s.$$

The theory of the distribution of quadratic forms in normal random variables allows us to get the characteristic function of the anomalicity coefficients and therefore the explicit form of their cumulants:

It is not possible to compute directly the corresponding probability density function by direct application of the inversion formula, but we can get an approximate formula (when N is odd):

[Equation: approximate expression of $F_K(x)$ as a sum over r = 1, ..., N/2 of exponential terms in the eigenvalues $\lambda_j$.]

where $\lambda_j$ designates the eigenvalues of the matrix V. Then, we can define the return period T(x) of an anomalicity of given intensity:

$$T(x) = \frac{1}{1 - F_K(x)}.$$

We can now use the notion of “equivalence” of two networks to get a new definition: the concept of the number of statistically independent observations $n_{equ}(N)$ “equivalent”, in the defined sense, to the N correlated observations. These are, in fact, fictitious independent observations; the most important fact is that this number gives an objective measure of the quality of the network from which the observations are obtained. This quality concerns the location (or the choice of the time sampling) of the stations where the observations are made.

The given definition allows us to characterize the network correlational redundancy. For that, we suppose that the network $(\gamma_j)_{j=1}^{n}$ is characterized by the covariance matrix $V = \sigma^2 I_n$, where $I_n$ is the unit matrix of order n.

In such a way, we get a compact formula obtained from the definition of the equivalence of order 2 in the “weak” sense:

$$n_{equ} = \frac{[\mathrm{tr}\, V]^2}{\mathrm{tr}\, V^2},$$

where tr(V) signifies the trace of the matrix V. When the normalized anomalicity coefficient $\tilde{K}$ is used for the definition of the network's equivalence, we obtain the following formula:

$$n^{*}_{equ} = \frac{[\mathrm{tr}\, R]^2}{\mathrm{tr}\, R^2}.$$

We can easily verify that in trivial cases such a definition is reasonable. Thus, we have:

$$\text{if } R = I_N, \text{ then } n_{equ} = N,$$

$$\text{if } R = U_{NN}, \text{ then } n_{equ} = 1, \qquad U_{NN} = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix}.$$

Furthermore, if the last “k” eigenvalues of the matrix V are 0, which is the case when “k” linear relations exist between the corresponding components of the vector X, we have the relation:

$$n_{equ} \le N - k.$$

It is interesting to point out that the notion of equivalence of order 3 leads to a similar formula:

$$n_{equ} = \frac{(\mathrm{tr}\, V^2)^3}{(\mathrm{tr}\, V^3)^2}.$$
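The trace formulas and their trivial cases are easy to verify numerically. The sketch below, in Python with NumPy, is only an illustration of the two formulas; the function names `n_equ` and `n_equ_order3` and the test matrices are assumptions introduced here.

```python
import numpy as np

def n_equ(V):
    """Equivalent number of independent observations (order 2):
    (tr V)^2 / tr(V^2)."""
    V = np.asarray(V, dtype=float)
    return np.trace(V) ** 2 / np.trace(V @ V)

def n_equ_order3(V):
    """Order-3 variant: (tr V^2)^3 / (tr V^3)^2."""
    V = np.asarray(V, dtype=float)
    V2 = V @ V
    return np.trace(V2) ** 3 / np.trace(V2 @ V) ** 2

N = 6
identity = np.eye(N)          # uncorrelated stations
ones = np.ones((N, N))        # fully correlated stations
# Covariance matrix with k = 3 zero eigenvalues (illustrative).
degenerate = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

print(n_equ(identity), n_equ(ones), n_equ(degenerate))
print(n_equ_order3(identity), n_equ_order3(ones))
```

For the identity matrix both formulas return N, for the matrix of ones they return 1, and the degenerate example stays within the bound N - k.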

We will use the notion of the equivalent number of independent observations to solve problems related to various aspects of network optimization. Here are some examples. Fig. 1 shows the temporal autocorrelation function of daily minimum temperature for the French station Bordeaux.

Fig. 1. Temporal autocorrelation function r(Δt) of daily minimum temperature. Station Bordeaux, 1949-1976.

Fig. 2. Temporal autocorrelation coefficients of daily extreme temperatures (minimum and maximum) for 7 French stations (Bordeaux, Lyon, Le Bourget, Rennes, Strasbourg, Marignane, Orly), for time lags Δt from 0 to 30 days. The “equivalent” numbers $n_{equ}$ given in the last row lie between about 5.8 and 7.4.

In Fig. 2 we give the values of the temporal autocorrelation function for daily minimum and maximum temperatures for a time lag from 0 to 30 days, and the corresponding “equivalent” number of independent observations (the last row of the table).
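As an illustration of how an “equivalent” number can be derived from a temporal autocorrelation function, the following Python sketch builds a Toeplitz correlation matrix from the autocorrelation values at lags 0 to 30 and applies the formula (tr R)^2 / tr R^2. It assumes that the 31 lags define a 31 × 31 correlation matrix, which the text does not spell out, and it uses a hypothetical geometric autocorrelation sequence rather than the tabulated station values.

```python
import numpy as np

def n_equ_from_autocorrelation(r):
    """Equivalent number for a stationary series with autocorrelation r[0..T-1].

    Assumption: R[i, j] = r[|i - j|] (Toeplitz correlation matrix).
    """
    r = np.asarray(r, dtype=float)
    idx = np.arange(r.size)
    lags = np.abs(np.subtract.outer(idx, idx))   # matrix of |i - j|
    R = r[lags]
    return np.trace(R) ** 2 / np.trace(R @ R)

# Hypothetical autocorrelation, decaying geometrically with the lag in days.
r_demo = 0.6 ** np.arange(31)
print(n_equ_from_autocorrelation(r_demo))

# Limiting cases: white noise and a perfectly persistent series.
white = np.r_[1.0, np.zeros(30)]
persistent = np.ones(31)
print(n_equ_from_autocorrelation(white), n_equ_from_autocorrelation(persistent))
```

The slower the autocorrelation decays, the smaller the equivalent number of independent days in the 31-day window.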

6. Rational location of the stations based on the minimization of the network correlational redundancy

Two problems arise, according to whether we want to eliminate the most redundant stations or to add to the already existing network the station which is the most interesting for the global knowledge of the meteorological field.

It is possible to solve the first problem by means of a backward stepwise procedure, based on the minimization of the network correlational redundancy. For that, we use the empirical correlation matrix $\hat{R}$ calculated from the observed sample. At the first step, we search for the least interesting station in the following way. We compute for each subset of N − 1 stations the “equivalent number of independent stations” $n_{equ}(i)$, where “i” is the number of the station we eliminate from the initial set to obtain the subset of N − 1 stations. We search for the value “$i_1$” which satisfies the following condition:

$$i_1 = \arg\max_i n_{equ}(i).$$
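The elimination step defined by this condition, repeated on the remaining stations until one is left, can be sketched as follows in Python with NumPy. The function names and the small 4 × 4 correlation matrix are hypothetical; the sketch only illustrates the criterion, under the assumption that the same trace formula is applied to each sub-matrix.

```python
import numpy as np

def n_equ(R):
    """Equivalent number of independent stations: (tr R)^2 / tr(R^2)."""
    return np.trace(R) ** 2 / np.trace(R @ R)

def backward_ranking(R):
    """Return station indices in order of increasing interest.

    At each step the station whose removal leaves the largest n_equ
    is considered the least interesting and is eliminated.
    """
    R = np.asarray(R, dtype=float)
    remaining = list(range(R.shape[0]))
    order = []
    while len(remaining) > 1:
        scores = []
        for i in remaining:
            keep = [k for k in remaining if k != i]
            scores.append(n_equ(R[np.ix_(keep, keep)]))
        eliminated = remaining[int(np.argmax(scores))]
        order.append(eliminated)
        remaining.remove(eliminated)
    order.append(remaining[0])   # the last station left is the most interesting
    return order

# Hypothetical correlation matrix: stations 0 and 1 are nearly duplicates.
R_demo = np.array([
    [1.00, 0.95, 0.10, 0.10],
    [0.95, 1.00, 0.10, 0.10],
    [0.10, 0.10, 1.00, 0.20],
    [0.10, 0.10, 0.20, 1.00],
])
ranking = backward_ranking(R_demo)
print(ranking)
```

In this example one of the two nearly duplicated stations is eliminated first, since removing it costs the least in terms of the equivalent number.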

The station number $i_1$ is really the least interesting one, because the loss of information is minimized when it is eliminated. At the second (and each further) step, we proceed in an analogous way with the stations remaining after the elimination of the station $\Gamma_{i_1}$, and we restart this procedure of preferential ranking. Thus, we get a preferential ranking of the N stations in the order of increasing interest: $i_1, i_2, \ldots, i_k, \ldots, i_N$.

It is also possible to apply a forward stepwise procedure based on the successive choice of the most interesting station, associated with the minimization of the equivalent number:

$$j_1 = \arg\min_j n_{equ}(j).$$

In Fig. 3 we see an example of the use of such a procedure for the “optimal” ranking of 48 French stations describing the field of atmospheric ground pressure, realized by means of a backward stepwise selection. The result seems to be logical enough. The 12 best stations are located on the outskirts of France and in the center. We found more stations in the plains. The 12 following stations fill the empty regions and the last 12 stations are situated in the flat regions.

Fig. 3. Pressure network optimization. Geographical representation of preferential ranking of 48 stations; the legend distinguishes successive classes of ranks.

The second problem, the “optimal” location of a new station ([5], [7], [8]), requires the knowledge of the statistical structure of the corresponding meteorological field. Given the hypothesis of a homogeneous and isotropic field, the network optimization depends mainly on its spatial density, characterized by the matrix of interdistances between observation stations:

$$D = (d_{ij}), \qquad d_{ij} = d[\Gamma_i, \Gamma_j].$$

In this case, the spatial autocorrelation function of the field depends only on these interdistances: $r[\xi_i, \xi_j] = \psi(d_{ij})$. The forms of such an autocorrelation function most frequently used in meteorology are given by the following formulas:

$$\psi(d) = e^{-ad}, \qquad \psi(d) = [1 + ad]\, e^{-ad}, \qquad \psi(d) = (ad)^{2/3} K_{2/3}(ad), \qquad \psi(d) = e^{-ad} J_0(bd),$$

$$\psi(d) = e^{-a^2 d^2}, \qquad \psi(d) = ad\, K_1(ad), \qquad \psi(d) = a^3 / [a^2 + d^2]^{3/2}.$$

Thus, the “equivalent” number of independent observations takes the following form:

$$n_{equ} = \frac{N^2}{\sum_{i,j=1}^{N} \psi^2(d_{ij})},$$

which shows clearly the explicit form of its dependence upon the network spatial density.

For the optimal location of a new station, we consider a fictitious station $\Gamma_{N+1}$ at the point defined by the coordinates (x, y) and by the distances $d[\Gamma_i, \Gamma_{N+1}] = \delta_i(x, y)$ from the other stations. The quality of this new system of N + 1 stations is then determined by an “equivalent” number of independent stations, which now becomes a function of the coordinates x and y:

$$n_{equ}(x, y) = \frac{(N + 1)^2}{\sum_{i,j=1}^{N} \psi^2(d_{ij}) + 2 \sum_{i=1}^{N} \psi^2[\delta_i(x, y)] + 1}.$$

Thus, the optimal location of a new station leads to the search for a global extremum of this function over all the points (x, y) belonging to the domain D. In other words, we search for the point $(x_0, y_0)$ which verifies the relation

$$n_{equ}[x_0, y_0] = \max_{(x, y) \in D} n_{equ}[x, y].$$

Various numerical algorithms are available for obtaining the extremum, for example the gradient or conjugate gradient methods. In such cases, the most delicate problem is the danger of finding a local extremum instead of a global one for the given domain D.
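A minimal sketch of this search, in Python with NumPy, is given below. It deliberately swaps the gradient methods for an exhaustive search on a regular grid, which is slower but cannot be trapped in a local extremum on the grid. The exponential correlation function, its parameter `a`, the station coordinates and the function names are all assumptions made for the illustration.

```python
import numpy as np

def psi(d, a=1.0 / 500.0):
    """Exponential autocorrelation psi(d) = exp(-a d); 'a' is an assumed value."""
    return np.exp(-a * d)

def n_equ_with_new_station(stations, x, y):
    """n_equ(x, y) for the N existing stations plus a fictitious one at (x, y)."""
    n = stations.shape[0]
    diff = stations[:, None, :] - stations[None, :, :]
    d_ij = np.hypot(diff[..., 0], diff[..., 1])          # interdistances d_ij
    delta = np.hypot(stations[:, 0] - x, stations[:, 1] - y)
    denom = np.sum(psi(d_ij) ** 2) + 2.0 * np.sum(psi(delta) ** 2) + 1.0
    return (n + 1) ** 2 / denom

def best_location(stations, xs, ys):
    """Exhaustive grid search: returns (n_equ, x, y) at the grid maximum."""
    return max((n_equ_with_new_station(stations, x, y), x, y)
               for x in xs for y in ys)

# Hypothetical network: three stations at corners of a 1000 km square domain.
stations = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0]])
xs = ys = np.linspace(0.0, 1000.0, 11)
best = best_location(stations, xs, ys)
print(best)
```

In this toy configuration the search selects the empty corner of the square, the grid point farthest from the existing stations. A gradient method could then refine the grid optimum locally.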

7. Network optimization based on the maximization of multiple correlation coefficient between the observed value at one point and the value at the same point reconstituted by means of the set of existing stations

It seems natural to consider that, for the preferential ranking of the stations, the least informative station is the one which is best explained by the other stations. In the frame of a multiple linear forecast, the corresponding quality criterion may be the multiple correlation coefficient between the value $x_i$ really observed at this station and the value $\hat{x}_i$ forecasted by means of the set of the N − 1 other stations. We will assume for this multiple correlation coefficient the notation

$$r_{N-1}[x_i, \hat{x}_i].$$

Thus, the aim of the optimization procedure is to search for an index $i_1$ such that

$$i_1 = \arg\max_i r^2_{N-1}[x_i, \hat{x}_i].$$

This maximum is easy to find by means of the inversion of the spatial correlation matrix $R_{xx}$. In fact, the following lemma defines the fundamental property of the inverse correlation matrix.

Lemma. Let

$$R_{xx} = (r_{ij}), \qquad R_{xx}^{-1} = (r^{ij}), \qquad X[i] = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_N).$$

Then, we have the following result:

$$r^{ii} = \frac{1}{1 - r^2_{N-1}[x_i, \hat{x}_i(X[i])]}.$$


Then, the iterative procedure of preferential ranking of the existing set of stations may be formulated as follows: first, we compute $R_{xx}$, then the diagonal elements of the inverse matrix $R_{xx}^{-1}$. After that, we search for the maximum element. Let the index associated with this maximum be $i_1$; the station $i_1$ will be the most redundant one, i.e. the best explained by the others. So, we eliminate in the matrix R the row and the column number $i_1$; thus, we get a new matrix R(1) of order N − 1. After that, we repeat successively the described procedure and we obtain a set of ranks $i_1, i_2, i_3, \ldots, i_N$, which define the preferential ranking of the stations in increasing order of importance. When the correlation matrix is ill-conditioned, the inversion procedure is replaced by a pseudoinversion.
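The procedure above can be sketched in Python with NumPy as follows. The sketch uses `numpy.linalg.pinv`, so that the same code covers the ill-conditioned case; the function names and the 4 × 4 correlation matrix are hypothetical, and the squared multiple correlation is recovered from the diagonal of the inverse matrix as 1 - 1/r^{ii}.

```python
import numpy as np

def multiple_correlation_sq(R):
    """Squared multiple correlation of each station with all the others,
    obtained from the diagonal of the (pseudo)inverse correlation matrix."""
    return 1.0 - 1.0 / np.diag(np.linalg.pinv(R))

def ranking_by_multiple_correlation(R):
    """Station indices in increasing order of importance."""
    R = np.asarray(R, dtype=float)
    remaining = list(range(R.shape[0]))
    order = []
    while len(remaining) > 1:
        sub = R[np.ix_(remaining, remaining)]
        r2 = multiple_correlation_sq(sub)
        redundant = remaining[int(np.argmax(r2))]   # best explained by the others
        order.append(redundant)
        remaining.remove(redundant)
    order.append(remaining[0])
    return order

# Hypothetical correlation matrix: stations 0 and 1 are nearly duplicates.
R_demo = np.array([
    [1.00, 0.95, 0.10, 0.10],
    [0.95, 1.00, 0.10, 0.10],
    [0.10, 0.10, 1.00, 0.20],
    [0.10, 0.10, 0.20, 1.00],
])

# Check of the lemma for station 0 against an explicit regression on the others.
r0 = R_demo[0, 1:]
r2_direct = r0 @ np.linalg.solve(R_demo[1:, 1:], r0)
print(np.isclose(multiple_correlation_sq(R_demo)[0], r2_direct))
print(ranking_by_multiple_correlation(R_demo))
```

Only one matrix inversion is needed per elimination step, which is the practical advantage of the lemma over fitting N separate regressions.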

8. Network optimization by means of canonical correlation

The canonical correlation coefficient ρ between two sets of variables $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$ can be used for our optimization problem. For that, let us introduce the notation

$$\rho = \rho[x_1, \ldots, x_n / y_1, \ldots, y_m],$$

$$X(i_1, \ldots, i_K) = (x_{i_1}, \ldots, x_{i_K}),$$

$$Z(i_1, \ldots, i_K) = (x_1, \ldots, x_i, \ldots, x_N) - X(i_1, \ldots, i_K).$$

We apply a stepwise procedure which, at each iteration, consists in forming two groups of stations: $X(i_1, \ldots, i_K)$, the group of K stations to be eliminated as the best explained by the others, and the group of N − K remaining stations $Z(i_1, \ldots, i_K) = \{x_{j_1}, \ldots, x_{j_l}, \ldots, x_{j_{N-K}}\}$, where $j_l \ne i_k$.

Iteration 1: We perform the operations:

1. $\rho[i] = \rho[X(i)/Z(i)]$, $i = 1, \ldots, N$.
2. $\max_i \rho[i] = \rho[i_1]$.

The station number $i_1$ is the one which is the best explained by the N − 1 other stations, so it is the least interesting station for the whole set and the first one to be eliminated, as the most redundant.

Iteration “K + 1”: After the iteration “K”, we get two groups of stations, the K stations to be eliminated $X[i_1, \ldots, i_K]$ and the N − K remaining stations $Z[j_1, \ldots, j_{N-K}]$. Thus, we perform the following operations:

3. $\rho_{K+1}[i] = \rho[X(i_1, \ldots, i_K, i)/Z(i_1, \ldots, i_K, i)]$.
4. $\max_i \rho_{K+1}[i] = \rho_{K+1}[i_{K+1}]$.

Carrying on the iterations from 1 to N − 1, we obtain a preferential ranking of the set of stations in increasing order. The basic information is given by the correlation matrix $R_{xx}$ (N × N) for the whole system of stations. The iterative algorithm implies the use of a canonical correlation program, which computes the canonical correlation coefficient ρ(E/R) between the group E of the eliminated stations and the group R of the remaining stations.
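A possible form of such a canonical correlation routine, together with the stepwise loop, is sketched below in Python with NumPy. It assumes that ρ denotes the largest canonical correlation, computed as the square root of the largest eigenvalue of $R_{EE}^{-1} R_{ER} R_{RR}^{-1} R_{RE}$, and that the sub-matrices are non-singular; the function names and the example matrix are hypothetical.

```python
import numpy as np

def first_canonical_correlation(R, group_e, group_r):
    """Largest canonical correlation between station groups E and R,
    computed from the blocks of the full correlation matrix."""
    R = np.asarray(R, dtype=float)
    Ree = R[np.ix_(group_e, group_e)]
    Rrr = R[np.ix_(group_r, group_r)]
    Rer = R[np.ix_(group_e, group_r)]
    M = np.linalg.solve(Ree, Rer) @ np.linalg.solve(Rrr, Rer.T)
    rho2 = np.max(np.linalg.eigvals(M).real)   # largest squared correlation
    return float(np.sqrt(min(max(rho2, 0.0), 1.0)))

def canonical_ranking(R):
    """Stepwise elimination: at each iteration, add to the eliminated group
    the station giving the largest canonical correlation with the rest."""
    R = np.asarray(R, dtype=float)
    eliminated, remaining = [], list(range(R.shape[0]))
    while len(remaining) > 1:
        scores = []
        for i in remaining:
            rest = [k for k in remaining if k != i]
            scores.append(first_canonical_correlation(R, eliminated + [i], rest))
        chosen = remaining[int(np.argmax(scores))]
        eliminated.append(chosen)
        remaining.remove(chosen)
    return eliminated + remaining

# Hypothetical correlation matrix: stations 0 and 1 are nearly duplicates.
R_demo = np.array([
    [1.00, 0.95, 0.10, 0.10],
    [0.95, 1.00, 0.10, 0.10],
    [0.10, 0.10, 1.00, 0.20],
    [0.10, 0.10, 0.20, 1.00],
])
rho_first = first_canonical_correlation(R_demo, [0], [1, 2, 3])
print(rho_first, canonical_ranking(R_demo))
```

When the eliminated group contains a single station, the canonical correlation reduces to the multiple correlation coefficient of that station with the others, so the first iteration coincides with the criterion of the previous section.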

9. Network optimization by minimization of the optimal interpolation error

One of the important aims of network implementation for gaining knowledge about some meteorological field is to perform an objective analysis of this field, i.e., to interpolate its values at the points of a regular grid by means of the values observed at the stations. For such an objective analysis, the classical variant of Gandin gives an explicit formula for the interpolation error at each point of the field. It is then possible to draw the isolines of the optimal interpolation error. It seems logical to use the extreme values of this error for the rational location of new stations, as has been done by Machkovic [5] for example.
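The text does not reproduce the interpolation error formula. As a hedged illustration, the sketch below uses the standard expression of the normalized error variance of optimal interpolation for a homogeneous field, $\varepsilon^2(P) = 1 - r_P' (R + \eta I)^{-1} r_P$, where $r_P$ holds the correlations between the point P and the stations and η is a relative observation error variance. The exponential correlation function, its parameter `a`, the station coordinates and the function name are assumptions introduced here.

```python
import numpy as np

def interpolation_error(stations, points, a=1.0 / 500.0, eta=0.0):
    """Normalized optimal interpolation error variance at each point.

    Assumptions: exponential correlation exp(-a d) and a relative
    observation error variance eta (0 means error-free observations).
    """
    d_ss = np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=-1)
    d_ps = np.linalg.norm(points[:, None, :] - stations[None, :, :], axis=-1)
    R = np.exp(-a * d_ss) + eta * np.eye(len(stations))
    r0 = np.exp(-a * d_ps)                    # shape (n_points, n_stations)
    weights = np.linalg.solve(R, r0.T)        # interpolation weights per point
    return 1.0 - np.sum(r0 * weights.T, axis=1)

# Hypothetical network and a grid of candidate points (coordinates in km).
stations = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0]])
gx, gy = np.meshgrid(np.linspace(0.0, 1000.0, 11), np.linspace(0.0, 1000.0, 11))
grid = np.column_stack([gx.ravel(), gy.ravel()])

errors = interpolation_error(stations, grid)
candidate = grid[np.argmax(errors)]           # point where the error is largest
print(candidate, errors.max())
```

The grid point with the largest error is a natural candidate for a new station; contouring `errors` over the grid gives the isolines mentioned in the text.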

10. The principle of network optimization by means of the method of universal Kriging

The basic ideas of meteorologists about the objective analysis of a field were at the origin of the development of the so-called method of Kriging elaborated by Matheron [12]. The methodology of Kriging solves the problem of objective analysis under less restrictive conditions, that is, in a more realistic frame, nearer to real problems [12,13]. Consequently, it can be applied in the same way as optimal interpolation to the problem of network optimization.

11. Optimization of the network mesh

We are going to consider this problem via an example of application concerning the choice of the optimum interval (spatial or temporal) between the observations. Let us consider an autocorrelation function of the following form:

$$\psi(d) = e^{-\alpha d},$$

where $\alpha = 1/d_0$ and $d_0$ is the correlation radius. Let us suppose that it characterizes the spatial correlation on a linear domain of length L, with a step equal to l. We then have $N = (L/l) + 1$ and $r = e^{-l/d_0}$. The formula of the “equivalent” number becomes

$$n_{equ} = \frac{N^2 [1 - r^2]^2}{N [1 - r^4] - 2 r^2 [1 - r^{2N}]},$$

and when l → 0, we obtain

$$\lim_{l \to 0} n_{equ} = \frac{2 N_0^2}{2 N_0 - (1 - e^{-2 N_0})}, \qquad N_0 = L/d_0.$$

This limit value is a finite number, which shows that the amount of information furnished by the network does not increase indefinitely with the multiplication of the measuring points. Consequently, it allows us to solve the problem of network mesh optimization. For this purpose, we can study the behaviour of the “equivalent number of stations” corresponding to the network mesh. The mesh l, the dimension L of the domain, and the number of stations N are related as follows: l = L/(N − 1). Now we can study the behaviour of $n_{equ}(N)$, which expresses the network “informativity”, when N increases, that is, when l decreases. The shape of the curve is characteristic of the correlation functions that we have studied. With the reduction of the mesh of the network, its “informativity” expressed by $n_{equ}(N)$ first increases, and then becomes stationary; further reduction of the mesh brings no more gain in “informativity”. The optimal mesh $l_{opt}$ can be defined as some reasonable value of l for which the maximum value of informativity has almost been reached. It can be done more precisely by introducing $l_{opt}(1 - \varepsilon)$, the value of the mesh for which we have the relation:

$$n_{equ}[l_{opt}(1 - \varepsilon)] = (1 - \varepsilon) \lim_{l \to 0} n_{equ}.$$
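The saturation of the “equivalent” number and the resulting choice of mesh can be illustrated with the short Python sketch below. It implements the closed-form expression for the exponential correlation function and its limit, then searches for the smallest N reaching the fraction (1 - ε) of the limit. The domain length L = 5000 km is an assumption chosen for the illustration, as are the function names; $d_0$ = 500 km is the value quoted for the exponential example.

```python
import numpy as np

def n_equ_mesh(N, L, d0):
    """Closed-form n_equ for N equally spaced points on a segment of length L
    with correlation exp(-d / d0); q stands for r^2."""
    l = L / (N - 1)
    q = np.exp(-2.0 * l / d0)
    return N ** 2 * (1 - q) ** 2 / (N * (1 - q ** 2) - 2 * q * (1 - q ** N))

def n_equ_limit(L, d0):
    """Limit of n_equ when the mesh tends to zero."""
    n0 = L / d0
    return 2 * n0 ** 2 / (2 * n0 - (1 - np.exp(-2 * n0)))

def optimal_mesh(L, d0, eps=0.05, n_max=100000):
    """Largest mesh (smallest N) whose n_equ reaches (1 - eps) of the limit."""
    target = (1 - eps) * n_equ_limit(L, d0)
    for N in range(2, n_max):
        if n_equ_mesh(N, L, d0) >= target:
            return L / (N - 1), N
    raise ValueError("target not reached within n_max stations")

L_demo, d0_demo = 5000.0, 500.0      # km; L_demo is an assumed domain length
print(n_equ_limit(L_demo, d0_demo))
print(optimal_mesh(L_demo, d0_demo, eps=0.05))
print(optimal_mesh(L_demo, d0_demo, eps=0.01))
```

Tightening ε from 0.05 to 0.01 can only increase the number of stations required, which quantifies the cost of the last few percent of “informativity”.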

By analogy with the classical thresholds of mathematical statistics, we could use the mesh “optimal” at the level 95% or 99%: $l_{opt}(0.95)$; $l_{opt}(0.99)$. Such a study has been performed for different autocorrelation functions of the field. Here are four examples. In Fig. 4, the correlation function is exponential:

$$r[d] = e^{-d/d_0}, \qquad d_0 = 500 \text{ km}.$$

The “optimal” mesh could be defined through a reasonable compromise between two contradictory tendencies: an increase in the number $N$ of stations has a positive effect (increase of informativity), but also a negative effect (increase of the cost of network upkeep). Here the cost of network upkeep is a function of the total number $N$ of stations, $C = C(N)$, and the losses $P$ due to the insufficiency of the network are a decreasing function of $n = n_{eq}(N)$: $P = P(n_{eq})$.

In Figs. 5 and 6 we have taken the autocorrelation function of the sea-level pressure field. When the hypotheses of homogeneity, spatial isotropy and temporal stationarity of this pressure field hold, its spatio-temporal autocorrelation function has the form:

$$\gamma(d, \tau) = (1 + A)\, e^{-A},$$

where $A^2 = (d/1005)^2 + (\tau/30)^2$, with $d$ in km and $\tau$ in hours. We have separately studied the case of spatial correlation (Fig. 5) by letting $\tau = 0$ and $D = 9000$ km. In Fig. 6 we have separately studied the case of temporal correlation by letting $d = 0$ and $T = 700$ hours:

$$r[\tau] = \left[1 + \frac{\tau}{30}\right] e^{-\tau/30}.$$
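The spatio-temporal correlation function of the pressure field, its spatial and temporal sections, and the compromise between upkeep cost and losses can be put together in a short sketch. Several elements are assumptions and not taken from the paper: the definition $n_{eq} = N^2 / \sum_{ij} r_{ij}^2$, the linear cost `upkeep_cost`, the inverse loss `loss` with its constant 200, and the exhaustive search over $N \le 100$. Only the form of $\gamma(d, \tau)$ and the values 1005 km, 30 h, 9000 km and 700 h come from the text.

```python
import math


def gamma(d, tau, d_scale=1005.0, t_scale=30.0):
    """gamma(d, tau) = (1 + A) exp(-A), A^2 = (d/1005)^2 + (tau/30)^2."""
    a = math.hypot(d / d_scale, tau / t_scale)
    return (1.0 + a) * math.exp(-a)


def spatial_corr(d):
    """Spatial section (tau = 0), d in km."""
    return gamma(d, 0.0)


def temporal_corr(tau):
    """Temporal section (d = 0), tau in hours."""
    return gamma(0.0, tau)


def equivalent_number(n, extent, corr):
    """ASSUMED definition: n_eq = N^2 / sum_ij r_ij^2, regular spacing."""
    step = extent / (n - 1)
    total = float(n)                      # diagonal terms: r(0)^2 = 1
    for k in range(1, n):
        total += 2 * (n - k) * corr(k * step) ** 2
    return n * n / total


def upkeep_cost(n):
    """Hypothetical cost C(N): proportional to the number of stations."""
    return 1.0 * n


def loss(n_eq):
    """Hypothetical loss P(n_eq): decreasing in the equivalent number."""
    return 200.0 / n_eq


def total_cost(n, extent, corr):
    """C(N) + P(n_eq(N)) for a regular network of n points."""
    return upkeep_cost(n) + loss(equivalent_number(n, extent, corr))


def best_station_count(extent, corr, n_max=100):
    """N in 2..n_max minimising C(N) + P(n_eq(N)) by exhaustive search."""
    return min(range(2, n_max + 1), key=lambda n: total_cost(n, extent, corr))


n_space = best_station_count(9000.0, spatial_corr)   # D = 9000 km, tau = 0
n_time = best_station_count(700.0, temporal_corr)    # T = 700 h, d = 0
print("spatial: N =", n_space, "mesh (km) =", 9000.0 / (n_space - 1))
print("temporal: N =", n_time, "interval (h) =", 700.0 / (n_time - 1))
```

The resulting $N$ depends entirely on the chosen cost and loss models; the sketch only shows how the two tendencies can be balanced once those models are fixed.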

Fig. 4. Network mesh optimization. ($r[d] = e^{-d/d_0}$; $d_0 = 500$ km; $n_{eq}[0.95] = 10.06$; $n_{eq}[0.99] = 10.48$ ($d$ in km).) Axes: number of stations $N$, mesh (km).

Fig. 5. Network mesh optimization. ($r[d] = [1 + (d/1005)]\, e^{-d/1005}$.) Axes: number of stations $N$, mesh (km).

Fig. 6. Optimization of temporal interval. ($r[\tau] = [1 + (\tau/30)]\, e^{-\tau/30}$.)

In Fig. 7 we have studied the well-known spatial autocorrelation function:

$$r[d] = \frac{\sin(0.0015\, d)}{0.0015\, d}\, e^{-d/4000},$$

where $d$ is in km. In Fig. 8 we show the values of the “equivalent number” for the radiance field measured by satellite over a domain in the south-western part of the South Atlantic. The difference between the correlational structures in the two directions appears in the distinct behaviour of the “equivalent number”, which reaches the state of saturation more quickly for the horizontal direction. That is why the optimal grid could be rectangular, the choice of the “optimal” mesh $l_{opt}(1 - \varepsilon)$ being defined from the same principle: for example, we can take the value $\varepsilon = 0.05$ for both directions. In this example the signal-noise ratio was 0.1. For all correlation functions, the shape is characteristic: a fast increase, followed by a stagnation corresponding to the predominance of the redundancy over the informational contribution.
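The idea of a rectangular grid, with one mesh per direction chosen at the same level $1 - \varepsilon$, can be sketched as follows. Everything numerical here is hypothetical: the correlation functions of the radiance field are not given in this section, so two exponential functions with different radii stand in for the two directions, the domain length is arbitrary, observation noise is ignored, and the equivalent number is again assumed to be $n_{eq} = N^2 / \sum_{ij} r_{ij}^2$.

```python
import math


def equivalent_number(n, extent, corr):
    """ASSUMED definition: n_eq = N^2 / sum_ij r_ij^2, regular spacing."""
    step = extent / (n - 1)
    total = float(n)                      # diagonal terms: r(0)^2 = 1
    for k in range(1, n):
        total += 2 * (n - k) * corr(k * step) ** 2
    return n * n / total


def optimal_mesh(extent, corr, eps=0.05, n_max=100):
    """Mesh of the smallest N whose n_eq reaches (1 - eps) times the
    largest n_eq found for N = 2..n_max."""
    values = [equivalent_number(n, extent, corr) for n in range(2, n_max + 1)]
    target = (1.0 - eps) * max(values)
    for n, value in enumerate(values, start=2):
        if value >= target:
            return extent / (n - 1)


def make_exp_corr(radius):
    """Exponential correlation with the given radius (same unit as d)."""
    return lambda d: math.exp(-d / radius)


# Hypothetical anisotropy: the field is correlated over longer
# distances along the horizontal direction than along the vertical one.
corr_h = make_exp_corr(1000.0)
corr_v = make_exp_corr(500.0)

mesh_h = optimal_mesh(5000.0, corr_h)     # same eps = 0.05 in both directions
mesh_v = optimal_mesh(5000.0, corr_v)
print(f"rectangular cell: {mesh_h:.0f} km x {mesh_v:.0f} km")
```

With these stand-in functions the direction with the longer correlation radius saturates with fewer points, so its mesh is coarser and the grid cell is rectangular.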

Fig. 7. Study of the network’s optimal mesh. ($r[d] = [\sin(0.0015\, d)/(0.0015\, d)]\, e^{-d/4000}$.)

Fig. 8. Equivalent number for the radiances field. Curve (-): horizontal direction. Curve (.): vertical direction. Signal-noise ratio: 0.1.


Conclusion

The design of a meteorological observation network is a rather complicated problem, the solution of which depends on the choice of an adequate optimization criterion defined for the purpose of the design of such a network. The general methodology is based on a very simple idea: to use the knowledge of the statistical structure of the concerned meteorological fields to improve the network “informativity”. The solutions that we have proposed are not unique and may differ depending on the adopted criterion or strategy. In any case, they must be considered as reasonable recommendations and, of course, not as imperative directives.
