
Tectonophysics, 138 (1987) 115-119
Elsevier Science Publishers B.V., Amsterdam - Printed in The Netherlands

On the magnitude entropy of earthquakes

L. MANSINHA and P.Y. SHEN

Department of Geophysics, University of Western Ontario, London, Ont. N6A 5B7 (Canada)

(Received August 10, 1985; accepted October 18, 1985)

Abstract

Mansinha, L. and Shen, P.Y., 1987. On the magnitude entropy of earthquakes. In: K. Mogi and K.N. Khattri (Editors), Earthquake Prediction. Tectonophysics, 138: 115-119.

The magnitude probability expresses the uncertainty regarding the size of a yet-to-happen earthquake. The magnitude entropy is a function of the magnitude probability and is a measure of the information content of the discrete or continuous probability distribution. Of all the possible probability distributions, the one that maximises the entropy is the solution of choice. However, to obtain useful solutions, available information, in the form of equations of constraint, must be imposed. Moreover, because the principle is a heuristic one, the solution should be verified. For earthquake data sets the source of information can be the worldwide empirical frequency-magnitude relation.

Introduction

Consider an earthquake as the outcome of a random selection process. There are four important uncertainties associated with the occurrence of an earthquake: location, time, size and surface intensity. The four parameters are not independent, but for the purposes of this note it will be assumed that they can be studied in isolation. Assume that somewhere, at some time, an earthquake takes place. This, then, is the "experiment" or the "trial". The outcome of this process is the "size" of the earthquake, and the magnitude is a measure of the size. We denote the probability by p_i (discrete) and the probability density by p(m) (continuous), where m is the magnitude. For this discussion it is not necessary to distinguish between surface wave magnitude and body wave magnitude.

There have been many attempts in the past to define a quantitative measure of knowledge (or ignorance). For a discrete variable, Shannon (1948) defined:

H = -\sum_i p_i \ln p_i    (1)

as the "informational entropy", or entropy. It is an empirical function, chosen because it satisfies all the requirements of a measure of information. However, it is not necessarily unique; there may be other functions that satisfy the same conditions. The entropy H is zero if there is no uncertainty associated with the outcome of a trial: if we know the outcome in advance, the performance of the experiment adds no new information. It can be shown that H is a maximum if all the p_i are equal. The reader is referred to Goldman (1953), Jaynes (1957a, b) and Tribus (1969) for extended discussions.

In general the applications of the concept of informational entropy involve the inverse problem. Consider the case in which not only is the outcome of a specific trial uncertain, but all or some of the probabilities p_i are also unknown. Several sets of p_i may be acceptable as appropriate to the specific experiment. To make a choice among the sets of p_i one uses the function H: of all the sets, the one that makes H a maximum conveys the most information and is also the most probable.
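As a purely illustrative numerical sketch (the probability sets below are arbitrary assumed values), the behaviour of definition (1) can be checked directly:

import numpy as np

def shannon_entropy(p):
    """Discrete entropy H = -sum_i p_i ln p_i, eq. (1); zero-probability outcomes contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# A certain outcome carries no information: H = 0.
print(shannon_entropy([1.0, 0.0, 0.0]))        # 0.0

# For a fixed number of outcomes, H is largest when all p_i are equal.
print(shannon_entropy([0.7, 0.2, 0.1]))        # ~ 0.802
print(shannon_entropy([1/3, 1/3, 1/3]))        # ln 3 ~ 1.099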


This principle has been used quite extensively in spectral analysis (for a review see Kanasewich, 1981), and also in gravity modelling of the earth (Reitsch, 1977; Rubincam, 1979) and seismic magnitude statistics (Purcaru, 1973; Berrill and Davis, 1980).

Although the magnitude m is a continuous variable, the limited resolution in measuring m provides a natural discretisation of the variable. If the measurement error is ±½Δm_i at a magnitude m_i, then we may consider the magnitude of any earthquake in the range m_i ± ½Δm_i indistinguishable from m_i. All earthquakes in this range will be assigned the magnitude m_i. With this discretisation, definition (1) of the entropy can be used. However, in common with many other workers, Berrill and Davis (1980) used a definition of entropy H for a continuous variable due to Shannon (1948):

H = -\int p(m) \ln p(m) \, dm    (2)

This definition of the entropy is coordinate dependent. The invariant expression is (Jaynes, 1957a, b; Reitsch, 1977; Shen and Mansinha, 1983):

H = -\int p(m) \ln [p(m)/q(m)] \, dm    (3)

where q(m) is the "prior probability". The presence of q(m) may be shown to be a consequence of the fact that the measurement error can be a function of the variable m. Only when the error is constant over the entire range of the variable can one consider q(m) a constant, as in (2).

The purpose of this note is to examine the concept of informational entropy and its applicability to seismic magnitude studies. We will use the terms magnitude probability and magnitude entropy to refer to the specific quantities.

The magnitude probability

The earthquake generation process is too complex for an "a priori" or mathematical probability to be calculated. The relative frequency values, based on accrued seismic data, are used as estimates of the probability distribution. As discussed earlier, since measurement errors exist, the magnitude m can be considered to assume a finite set of discrete values, m_i (i = 1, ..., I).

The m_i's are not necessarily spaced equally, depending on the distribution of the measurement error ½Δm_i. When we extend this definition to a continuous variable, this leads to the introduction of the prior probability.

Let n_i be the number of earthquakes with magnitude m_i during a time period T. Then the relative frequency f_i is given by:

f_i = n_i / N    (4)

where N is the total number of earthquakes occurring within the same period. For now we can approximate the probabilities p_i with:

p_i = f_i    (5)

The Gutenberg-Richter relation provides a good approximation to the set f_i, given by (Richter, 1958):

f_i = a \cdot 10^{-b m_i}    (6)

where a and b are constants. It is implicit in the relative frequency definition of probability, (5), that both n_i and N are large. For the large magnitude earthquakes the rarity of the events ensures that n_i is small. Hence the approximation (5) breaks down, and the f_i should be considered simply as indicators of past activity, not as good estimates of the magnitude probability for large magnitudes. For prediction purposes Rikitake (1969, 1976) uses the f_i as the "preliminary probabilities". He then uses the equation of Bayes to modify the preliminary probabilities, after taking into account information from observed premonitory phenomena. The difficulty of defining the probability of occurrence from the relative frequency in seismology is shared with other sub-disciplines of geophysics (Court, 1952).
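As an illustrative sketch of eqs. (4)-(6), with invented bin counts standing in for a real catalogue, the relative frequencies and a least-squares estimate of the constants a and b might be computed as follows:

import numpy as np

# Hypothetical counts n_i of earthquakes in magnitude bins m_i over some period T (assumed values).
m = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5])
n = np.array([3200, 1050, 310, 95, 32, 9])

N = n.sum()
f = n / N                                   # relative frequencies, eqs. (4)-(5)

# Gutenberg-Richter: f_i = a 10^(-b m_i), eq. (6), so log10 f_i = log10 a - b m_i.
slope, intercept = np.polyfit(m, np.log10(f), 1)
b_value = -slope
a_value = 10.0 ** intercept
print(f"b = {b_value:.2f}, a = {a_value:.3g}")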

The constraints

An obvious and necessary constraint on any particular set of p_i is:

\sum_i p_i = 1    (7)

Other constraints may also be applied. For example, it has been customary to impose the existence of a mean or higher moments on the probability distribution. Of all the possible sets that satisfy


the given constraints, the set which makes H a maximum has the greatest probability. If our knowledge of the actual set p_i is incomplete, then we may be able to use this criterion of maximising the entropy to make a choice of one particular set. Several simple cases of imposed constraints are given in the literature (Goldman, 1953; Tribus, 1969). Purcaru (1973) and Berrill and Davis (1980) give examples specific to seismology. Since all these authors have used the continuous probability density function, rather than the discrete function used in (1) above, and the extension of definition (1) to a continuous variable is not trivial, it is worthwhile discussing the definition of entropy for continuous probability distributions.

The entropy of continuous distributions

The definition of entropy for a continuous probability distribution is somewhat more complicated than for a discrete distribution. The intuitive approach is to extend definition (1) and write p_i(m_i) as p(m_i)Δm_i, where p(m), without a subscript, is the probability density. The entropy then becomes:

H = -\sum_i p(m_i) \Delta m_i \ln [p(m_i) \Delta m_i]    (8)

and the constraint (7) is:

\sum_i p(m_i) \Delta m_i = 1    (8a)

The customary procedure is to let Δm_i go to zero and replace the summation by integration. But this gives rise to two problems. The first is the presence of a term that approaches infinity in the limit. The second problem arises because the Δm_i's are unequal and do not approach zero uniformly. This necessitates the introduction of a "prior probability density". Recall that Δm_i may be a function of m. We can write:

\Delta m_i = k(m_i) \, \delta m    (9)

where δm is a constant. Substituting (9) in (8) we have:

H = -\sum_i p(m_i) k(m_i) \delta m \ln [p(m_i) k(m_i) \delta m]    (10)

which can be rewritten as:

H = -\sum_i p(m_i) \ln [p(m_i) k(m_i)] \, k(m_i) \delta m - \ln \delta m    (11)

As δm approaches zero, (11) becomes:

H = -\int p(m) \ln [p(m)/q(m)] \, dm - \lim_{\delta m \to 0} \ln \delta m    (12)

where q(m) = 1/k(m). The second term on the right goes to infinity in the limit. Since one is usually interested in the change of entropy, this term is usually dropped arbitrarily, and we obtain (3). Defined in this form, the entropy is invariant under coordinate transformations. Comparing (2) with (3), we see that the former is valid for those cases in which q(m) is a constant.
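A small numerical sketch can illustrate the divergent term in (12). Assuming, only for this illustration, a truncated exponential density on an assumed magnitude range and a uniform prior (equal bins, so k and q are constant), the discrete entropy (8) grows like -ln δm as the bin width shrinks, while H + ln δm converges to the finite, invariant part of (12):

import numpy as np

# Assumed truncated exponential density on [m0, m1] with lam = b ln 10 and b = 1 (assumptions).
m0, m1, lam = 4.0, 9.0, np.log(10.0)

def p(m):
    norm = 1.0 - np.exp(-lam * (m1 - m0))
    return lam * np.exp(-lam * (m - m0)) / norm

for dm in [0.5, 0.1, 0.01, 0.001]:
    mi = np.arange(m0 + dm / 2.0, m1, dm)   # bin centres; equal widths, so q(m) is constant
    pi = p(mi) * dm                         # discrete probabilities p(m_i) * delta m
    H = -np.sum(pi * np.log(pi))            # eq. (8)
    print(f"dm = {dm:5.3f}   H = {H:7.3f}   H + ln(dm) = {H + np.log(dm):6.3f}")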

Application of the maximum entropy principle

The maximum entropy principle is applied to select, from among the various sets of p_i or p(m) that satisfy the constraints, the one that contains the maximum amount of information. This is done by maximising H. As pointed out earlier, the correct expressions for H are (1) for a discrete variable and (3) for a continuous one. However, the prior probability q(m) for a continuous variable is usually unknown. Therefore it is customary to use definition (2), which implies a constant prior probability. Consequently, the solution obtained must be tested against available data for the validity of the assumption. With this in mind, we use (2) in the following discussion.

If the constraints are:

\int p(m) \, dm = 1    (13)

and:

\int g_j(m) p(m) \, dm = \bar{g}_j    (14)

where g_j(m) is a given function of m and \bar{g}_j is its expectation, then the method of Lagrange multipliers gives the solution as:

p(m) = \exp[-\lambda_0 - \lambda_1 g_1(m) - \lambda_2 g_2(m) - \ldots]    (15)

where the Lagrange multipliers λ_j are to be determined. The equations giving the constraints on the solution are an indication of the state of our knowledge: the larger the number of constraints, the greater our knowledge about the distribution p(m); the fewer the constraints, the greater our ignorance.
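As an illustrative sketch of eqs. (13)-(15) for the single constraint g_1(m) = m (the magnitude range, the assumed mean magnitude and the starting guess below are arbitrary), the multipliers λ_0 and λ_1 can be obtained by solving the two constraint equations numerically:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

m_lo, m_hi = 4.0, 9.0          # assumed magnitude range
m_bar = 4.43                   # assumed mean magnitude, the only imposed expectation

def p(m, lam0, lam1):
    # Maximum-entropy form, eq. (15), with the single constraint function g1(m) = m
    return np.exp(-lam0 - lam1 * m)

def residuals(lams):
    lam0, lam1 = lams
    total = quad(lambda m: p(m, lam0, lam1), m_lo, m_hi)[0]        # eq. (13)
    mean = quad(lambda m: m * p(m, lam0, lam1), m_lo, m_hi)[0]     # eq. (14)
    return [total - 1.0, mean - m_bar]

lam0, lam1 = fsolve(residuals, x0=[-8.0, 2.0])                     # arbitrary starting guess
print(f"lambda_0 = {lam0:.2f}, lambda_1 = {lam1:.2f}, implied b = {lam1 / np.log(10):.2f}")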


Case I: No information on p(m) is known. From (15) the solution is:

p(m) = \exp(-\lambda_0)    (16)

This agrees with our intuitive idea that, without any information on the variable m, we have no reason to suspect that p(m) is not uniform.

Case II: g_1(m) = m. A mean is assumed to exist, and we have:

p(m) = \exp(-\lambda_0 - \lambda_1 m)    (17)

This is similar to the Gutenberg-Richter relation (6).

Case III: Additional constraints are known. With additional constraints, the determined solution becomes more complicated.
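To make the comparison with (6) explicit, solution (17) may be rewritten in base-10 form:

p(m) = \exp(-\lambda_0 - \lambda_1 m) = 10^{-\lambda_0/\ln 10} \cdot 10^{-(\lambda_1/\ln 10) m}

so that the maximum entropy solution has the Gutenberg-Richter form (6) with a = 10^{-\lambda_0/\ln 10} and b = \lambda_1/\ln 10.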

The agreement in Case II with the Gutenberg-Richter relationship needs some clarification. It has been considered as an independent confirmation of the empirical relation. However, definition (2) of the entropy, which assumes a uniform prior probability, has been used in the derivation. Therefore the agreement should be construed simply as a confirmation of the validity of that assumption. We note that the Gutenberg-Richter relation overestimates the number of earthquakes at the high magnitudes (Shlien and Toksöz, 1970; Bloom and Erdmann, 1980). Thus solution (17) does not really agree with the empirical distribution in the high magnitude range, implying that the existence of a mean by itself is not adequate. Over the mid- and low-magnitude range the measurement error of m is uniform, but it changes towards the high magnitude range.

Discussion

Consider the application of the maximum entropy method to the seismic data from a specific geographic region. Case I, in which we have absolutely no information about p(m), is trivial. For Case II, implicit in the assumption of the existence of a mean magnitude is the presence of a lower bound. This is in conformity with the inability of seismic instruments to detect earthquakes below a certain magnitude, although theoretically there can be arbitrarily small earthquakes. The existence of the mean does not require an upper bound; the nature of the exponential distribution obtained from the maximum entropy principle makes this requirement redundant.

We have remarked earlier that the agreement between the exponential distribution and the empirical distribution is poor in the high magnitude range. This points to the fact that insufficient constraints have been imposed. Another example is the lateral density distribution model for the earth determined by Rubincam (1982) with the global gravity field as a constraint. However, Rubincam's solution does not agree with that of Dziewonski et al. (1977) obtained from seismic data. This is because the former did not include the seismic information as a constraint.

Acknowledgement

The Natural Sciences and Engineering Research Council of Canada supported this project through the award of an operating grant to L.M. A separate operating grant awarded to A.E. Beck supported P.Y.S.

References

Berrill, J.B. and Davis, R.O., 1980. Maximum entropy and the magnitude distribution. Bull. Seismol. Soc. Am., 70: 1823-1831.

Bloom, E.D. and Erdmann, R.C., 1979. Frequency-magnitude-time relationship in the NGSDC earthquake data file. Bull. Seismol. Soc. Am., 69: 2085-2099.

Bloom, E.D. and Erdmann, R.C., 1980. The observation of a universal shape regularity in earthquake frequency-magnitude distributions. Bull. Seismol. Soc. Am., 70: 349-362.

Court, A., 1952. Some new statistical techniques in geophysics. Adv. Geophys., 1: 45-85.


Dziewonski, A.M., Hager, B.H. and O'Connell, R.J., 1977. Large scale heterogeneities in the lower mantle. J. Geophys. Res., 82: 239-255.

Goldman, S., 1953. Information Theory. Prentice-Hall, New York.

Jaynes, E.T., 1957a. Information theory and statistical mechanics. Phys. Rev., 106: 620-630.

Jaynes, E.T., 1957b. Information theory and statistical mechanics, II. Phys. Rev., 108: 171-190.

Jaynes, E.T., 1978. Where do we stand on maximum entropy? In: R.D. Levine and M. Tribus (Editors), The Maximum Entropy Formalism. MIT Press, Cambridge, Mass., pp. 15-118.


Kanasewich, E., 1981. Time Sequence Analysis in Geophysics. Univ. of Alberta Press, Edmonton, Alta.

Purcaru, G., 1973. The informational energy and entropy in statistics and prediction of earthquakes. Riv. Ital. Geofis., 22: 323-335.

Reitsch, E., 1977. The maximum entropy approach to inverse problems. J. Geophys., 42: 489-506.

Richter, C.F., 1958. Elementary Seismology. Freeman, San Francisco, Calif.

Rikitake, T., 1969. An approach to prediction of magnitude and occurrence time of earthquakes. Tectonophysics, 8: 81-95.

Rikitake, T., 1976. Earthquake Prediction. Elsevier, Amsterdam.

Rubincam, D.P., 1979. Information theory and the earth's density distribution. NASA/GSFC Tech. Memo. 80586, Goddard Space Flight Center, Greenbelt, Md.

Rubincam, D.P., 1982. Information theory lateral density distribution of the earth inferred from the global gravity field. J. Geophys. Res., 87: 5541-5552.

Shannon, C.E., 1948. A mathematical theory of communication. Bell Syst. Tech. J., 27: 379-423, 623-656.

Shen, P.Y. and Mansinha, L., 1983. On the principle of maximum entropy and the earthquake frequency-magnitude relation. Geophys. J. R. Astron. Soc., 74: 777-785.

Shlien, S. and Toksöz, M.N., 1970. Frequency-magnitude statistics of earthquake occurrences. Earthquake Notes, 41: 5-18.

Tribus, M., 1969. Rational Descriptions, Decisions and Designs. Pergamon, New York.