Robust earthquake location using M-estimates

Physics of the Earth and Planetary Interiors, 30 (1982) 119—130
Elsevier Scientific Publishing Company, Amsterdam — Printed in The Netherlands

Kenneth R. Anderson *
Applied Seismology Group, 42 Carleton Street, Cambridge, MA 02142 (U.S.A.)
(Received October 29, 1981; revision accepted April 9, 1982)

Anderson, K.R., 1982. Robust earthquake location using M-estimates. Phys. Earth Planet. Inter., 30: 119—130. In the 1930’s, Jeffreys recognized that large errors among the arrival times used to locate an earthquake can move the least-squares estimate of the epicenter far from the true epicenter. This has often been remedied by using M-estimates of epicentral location that iteratively weight each equation of condition by a function of its residual size. Six estimators, least absolute value, least squares, bisquare, Huber, Jeffreys, and sine, are compared for robustness when applied to the earthquake location problem. Although M-estimates are more robust than least squares, they are sensitive to the initial location estimate, the scale of the residuals, and the geometry of the recording network. A detailed study of one earthquake located by both least squares and the bisquare method is reported. This shows that when using least squares, error at one station can inflate the residuals at other stations so that a true outlier may be hidden. Also, error at influential stations can go completely unnoticed by either method.

1. Introduction

While developing his travel-time tables in the 1930's, Jeffreys recognized that large arrival-time errors strongly influence earthquake locations determined by least squares (Jeffreys, 1962). He developed a location method that is robust in the sense that the location produced is insensitive to large arrival errors. This method is still used today (Lee and Lahr, 1975; Bolt, 1960; ISC, 1976, p. ii). Another common method is to start with the least-squares location and refine the location iteratively, either by discarding the arrival having the largest travel-time residual or by discarding all arrivals having residuals above a certain size, in each case recomputing the least-squares location on the basis of the resulting subset of arrivals (Slunga, 1980).

* This work was sponsored in part by the United States Geological Survey and in part by the Defense Advanced Research Projects Agency. The U.S. Government assumes no responsibility for the information presented.

Jeffreys' location method is a member of a class of robust estimates, termed M-estimates (Huber, 1972), that is being investigated in the statistical community. Most of this investigation has dealt with one-dimensional robust estimation (Andrews et al., 1972). Less work has been done on the more difficult problems of robust linear regression (Andrews, 1974; Beaton and Tukey, 1974; Hill, 1977) and robust nonlinear regression (Dennis and Welsch, 1976). Although M-estimates usually produce good earthquake locations, there are several problems that must be anticipated. M-estimators try to produce a location while simultaneously identifying a set of outliers (data points having large residuals). In a sense this is circular reasoning, since a good location cannot be determined until outliers have been identified, and outliers cannot be identified until a good location has been determined.

This paper exposes several important issues in robust earthquake location. The following Section provides an introduction to robust estimation. Six estimators are defined (least absolute value (L1), least squares (L2), bisquare, Huber, Jeffreys, and sine), which are compared in Sections 4 and 5. It is shown that a robust minimization problem can be posed as an iteratively reweighted least-squares problem. Section 3 describes the earthquake location algorithm used to compare the estimators. Section 4 compares several location algorithms of varying degrees of robustness. Section 5 presents a detailed comparison of a robust location method with that of least squares for one earthquake. The final Section summarizes the major issues.

Fig. 1. The effect of an outlier on L2 (mean), L1 (median), and bisquare estimates of the center of the set of points indicated by plus signs. The median and bisquare estimates are robust, while the mean is not.

2. Introduction to robust estimation

A common problem in data analysis is to fit a set of observations y_i, i = 1, ..., N, to a theoretical function

y_i ≈ f(x_i, θ)

where x_i is the vector of independent variables corresponding to observation y_i and θ is a vector of parameters to be determined. In typical earthquake location problems the number of observations, N, is small (half of the more than 95000 seismic events in the Preliminary Determination of Epicenters published by the National Earthquake Information Service since 1964 were located using data from less than 16 stations). Huber (1972) considered the following class of data-fitting problems: find the solution vector θ that minimizes the function

P(θ) = Σ_{i=1}^{N} ρ(r_i/s)   (1)

where r_i is the i-th residual given by

r_i = y_i − f(x_i, θ)

The scale factor s is an estimate of the spread or variance of the residuals, and ρ(e) is termed a "loss function". For example, the loss function corresponding to the least-squares (or L2 norm) estimate is

ρ_L2(e) = e²/2   (2)

Estimates that are not affected by a few larger errors in the observations and that can be used for a large family of underlying error distributions are termed "robust". The least-squares approach is not robust because a single large error can move θ far from the true solution. Consider estimating the center of the set of points shown as plus signs in Fig. 1. These points come from a population having a Gaussian distribution with a mean of zero and a standard deviation of 1. The solid line shows how the mean is affected when a single outlier is added to the point set, as a function of the outlier size. As the outlier gets larger, so does the mean.

Use of the least-absolute-value (or L1) norm as a robust alternative to least squares has been popularized by Claerbout and Muir (1973). The corresponding loss function is

ρ_L1(e) = |e|   (3)

and the one-dimensional solution of eq. (3) is the median. The small dashed line in Fig. 1 shows that the outlier has a bounded effect on the median of the data sample. The long dashed line shows the effect of an outlier on the bisquare estimate (Beaton and Tukey, 1974) that has the loss function

ρ_B(e) = (c²/6)[1 − (1 − (e/c)²)³]   for |e| ≤ c
ρ_B(e) = c²/6                        for |e| > c   (4)

When the outlier is small, the bisquare estimate is


similar to the mean. However, as the size of the outlier increases, its effect on the estimate decreases until eventually it has no effect on the estimate.

Two other estimators that have been studied in the statistics literature are the Huber estimator (Huber, 1972) and the sine estimator (Andrews et al., 1972; Andrews, 1974). The loss function that Huber considered is

ρ_H(e) = e²/2           for |e| ≤ c
ρ_H(e) = c|e| − c²/2    for |e| > c   (5)

This is equivalent to using an L2 loss function for errors smaller than c and an L1 loss function for errors greater than c. The loss function corresponding to the sine estimator is

ρ_S(e) = c²(1 − cos(e/c))   for |e| ≤ cπ
ρ_S(e) = 2c²                for |e| > cπ   (6)

Figure 2 compares the six loss functions. The following values of c for the bisquare, Huber, and sine estimators give a 95% asymptotic efficiency for the Gaussian distribution (Hill, 1977): eq. (4), c = 4.685; eq. (5), c = 1.345; eq. (6), c = 1.339.

Fig. 2. Loss functions corresponding to the estimators given by eqs. (2)—(6).

The estimator developed by Jeffreys has not been studied extensively in either the statistical or seismological literature. Jeffreys recognized the need for robust estimation during his work on seismic travel times in the 1930's (Jeffreys, 1932, 1948, 1962, 1973). He showed that teleseismic travel-time residuals r seem to come from a Gaussian distribution, with zero mean and standard deviation σ, contaminated by a small amount of a second Gaussian distribution with a much larger standard deviation σ_c. Thus the probability distribution of the residuals can be written as

d(r) = [(1 − α)/σ(2π)^{1/2}] exp(−r²/2σ²) + [α/σ_c(2π)^{1/2}] exp(−r²/2σ_c²)   (7)

where α is the amount of contamination. The loss function corresponding to this distribution is

ρ_J(r) = −log(d(r))   (8)

If s is a relatively constant function of θ, minimizing (1) is equivalent to solving

Σ_{i=1}^{N} q(r_i/s) [∂f(x_i, θ)/∂θ_j] = 0,   j = 1, ..., M   (9)

where q(e) = ρ′(e). In matrix notation, this is

A^T q = 0

where A is an N × M matrix with elements a_ij = ∂f(x_i, θ)/∂θ_j and q is a column vector with elements q_i = q(r_i/s). Writing q as Wr, where W is the N × N diagonal matrix with elements w_ii = q_i/r_i, eq. (9) becomes

A^T W r = 0   (10)

Expansion of r about a trial solution θ_0, ignoring terms higher than the first, yields

A^T W r = A^T W A dθ   (11)

where dθ is an adjustment to the trial solution and s, A^T, W, and r are evaluated at θ_0. This is a weighted least-squares problem for dθ. Equation (11) can be written as a normal least-squares problem by letting

B = W^{1/2} A

where W^{1/2} is an N × N diagonal matrix with elements that are the square roots of the corresponding diagonal elements of W. Then eq. (11) becomes

B^T r = B^T B dθ

Solving this equation is equivalent to solving the least-squares problem

B dθ = r   (12)

Thus a robust minimization problem can be posed as an iteratively reweighted least-squares problem. Figure 3 shows the weighting functions corresponding to the loss functions of Fig. 2.

Fig. 3. Weighting functions corresponding to the loss functions shown in Fig. 2.

Andrews et al. (1972) studied 68 robust estimators of center and it is worth considering a few of their conclusions. Since the solution found by an M-estimator is affected by the scale estimate s, a robust scale estimate should be used. One such estimate is the SPREAD, defined as

SPREAD = A median{|r_i|}

where A is chosen to make the SPREAD a consistent estimate of the standard deviation if the observations come from a Gaussian distribution (A = 1/0.6745 (Hill, 1977)). Occasionally, if a hypocenter fits a subset of arrivals well and there are only a few arrivals (say seven), the SPREAD will be almost zero. This can cause perfectly good arrivals to be down-weighted during further iterations. To avoid this, the scale estimate can be kept from becoming too small by using

s = max(SPREAD, s_min)   (13)

where s_min is chosen appropriately for the application.

Figure 4 shows the effect of a single outlier on the standard deviation (solid line) and the SPREAD (dotted line) for the same set of points used in Fig. 1. A large outlier increases the standard deviation, thereby reducing the effectiveness of a robust estimator. On the other hand, the SPREAD remains near the true standard deviation.

Fig. 4. The effect of an outlier on SPREAD and standard deviation estimates of the variation of the set of points indicated by plus signs. The SPREAD is a robust estimate of variation, while the standard deviation is not.

M-estimators do well for points from a variety of distributions. The best M-estimators are those that weight large residuals to zero, such as the bisquare and sine estimators. Unfortunately, these estimators do not necessarily have unique solutions, and thus the final θ may depend on θ_0. The L2 estimate should not be used as the initial estimate, since large errors may bias it far from the global minimum and a distant local minimum may be found (Andrews, 1974). A better alternative is to use the L1 solution as the initial estimate. For one-dimensional problems, it has been shown that M-estimates that take only one iteration step past the L1 solution are usually good.

Little work has been done on how to determine confidence intervals for M-estimates. For least squares, the covariance matrix is given by

C = s²(A^T A)^{-1}

where s² is the least-squares estimate of the variance in the observations

s² = Σ_i r_i² / (N − M)

and M is the dimensionality of θ. A diagonal element of C, C_ii, is the standard error of the i-th component of θ (Flinn, 1965). A weighted least-squares program solving eq. (11) would compute

C = [Σ_i w_i r_i² / (N − M)] (A^T W A)^{-1}   (14)

If the sum of the weights is viewed as a measure of "degrees of freedom", then another alternative would be

C = [Σ_i w_i r_i² / (Σ_i w_i − M)] (A^T W A)^{-1}   (15)

Welsch (1975) studied these three alternative estimates of the covariance matrix as well as some others for a particular linear regression using the sine estimator. Although it is unwise to generalize from his results, eqs. (14) or (15) should provide reasonable estimates of the covariance matrix. The confidence region is an M-dimensional ellipsoid of the form

(X − θ)^T C^{-1} (X − θ) ≤ D²   (16)

where X is an M-dimensional vector in the solution space. Flinn (1965) used

D² = M s² F_{M,N−M}   (17)

where F_{M,N−M} is the F statistic with M and N − M degrees of freedom at an appropriate confidence level α. Flinn pointed out that α must be chosen so that D² is sufficiently small that the ellipsoid remains in the linear region about the solution. Evernden (1969) showed that eq. (17) produces confidence regions that are pessimistically large. He showed that a more realistic confidence region can be determined using

D² = σ² χ²_2   (18)

where σ² is an a priori estimate of the variance of the observations, and χ²_2 is the chi-square statistic with 2 degrees of freedom at an appropriate α. Both Flinn and Evernden used the least-squares covariance matrix. Although the confidence regions quoted in this paper are determined using eqs. (15)—(18), it is not clear what probabilistic interpretation can be given to them. Further research is required to determine how the size of the confidence region should be estimated. A procedure similar to that of Evernden's, using a robust location method and a realistic non-Gaussian error distribution, could lead to appropriate confidence interval estimates.

3. Robust earthquake location

Figure 5 shows the location algorithm, SEEK, used in this study. It is a damped Gauss—Newton algorithm similar to that of Buland (1976). Although there are other attractive methods for solving this problem, such as Marquardt's method (Marquardt, 1963, 1970, 1974), or more general optimization algorithms (Welsch and Becker, 1975; Dennis and Welsch, 1976), SEEK can be used with both the L1 (Claerbout and Muir, 1973) and robust weighted norms and it has been analyzed in detail for the L2 case (Buland, 1976). The loop in Fig. 5 reduces the size of dθ until a location that reduces P(θ) is found or until dθ becomes so small that it is effectively zero (Jones (1970) presents an interesting variation on this approach).
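Equations (14) and (15) differ only in the degrees-of-freedom term. The following minimal Python sketch (the design matrix, residuals, and weights are illustrative, not data from the paper) shows that with unit weights both reduce to the classical least-squares covariance:

```python
import numpy as np

def robust_covariance(A, r, w, weight_dof=True):
    """Covariance estimate for a weighted least-squares solution.

    weight_dof=False gives eq. (14): [sum w_i r_i^2 / (N - M)] (A^T W A)^{-1};
    weight_dof=True gives eq. (15), with sum(w_i) playing the role of N."""
    N, M = A.shape
    AtWA = A.T @ (w[:, None] * A)
    dof = (w.sum() if weight_dof else N) - M
    return (np.sum(w * r**2) / dof) * np.linalg.inv(AtWA)

# illustrative fit: 4 observations of 2 parameters, unit weights
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
r = np.array([0.1, -0.1, 0.2, -0.2])
w = np.ones(4)
C14 = robust_covariance(A, r, w, weight_dof=False)
C15 = robust_covariance(A, r, w, weight_dof=True)
```

With down-weighted outliers, eq. (15) shrinks the effective number of observations, so it yields a larger (more honest) covariance than eq. (14) when several weights are small.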

ALGORITHM SEEK: FIND MINIMUM OF P(θ) GIVEN θ0

    θ = θ0
    repeat {
        θprev = θ
        solve B dθ = r for dθ
        d = 1
        while (P(θ + dθ*d) > P(θ)) {
            d = d/2
            if (d < SMALL) return θ
        }
        θ = θ + dθ*d
    } until ((P(θprev) − P(θ))/P(θ) < EPS)
    return θ
    end

SYMBOL        MEANING
θ0            initial solution
θ             final solution
θprev         previous solution
dθ            adjustment to solution (see equation 12)
P(θ)          robust loss function
r             vector of residuals (see equation 12)
d             damping factor
SMALL, EPS    appropriately small numbers

Fig. 5. Algorithm SEEK to find the minimum of a loss function (see text). The algorithm is presented in a pseudo-language similar to RATFOR (Kernighan and Plauger, 1976).
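SEEK itself is not reproduced here, but the same damped, iteratively reweighted Gauss–Newton idea can be sketched in Python for a flat, constant-velocity medium. The velocity, station layout, bisquare tuning constant, and s_min floor below are illustrative assumptions, not values from the paper:

```python
import numpy as np

C = 4.685  # bisquare tuning constant (95% Gaussian efficiency, eq. (4))

def bisquare_weight(u):
    """Weight function w(u) = (1 - (u/C)^2)^2 for |u| <= C, else 0."""
    w = np.zeros_like(u)
    inside = np.abs(u) <= C
    w[inside] = (1.0 - (u[inside] / C) ** 2) ** 2
    return w

def bisquare_loss(u):
    """Bisquare loss function, eq. (4)."""
    rho = np.full_like(u, C**2 / 6.0)
    inside = np.abs(u) <= C
    rho[inside] = (C**2 / 6.0) * (1.0 - (1.0 - (u[inside] / C) ** 2) ** 3)
    return rho

def spread(r, s_min=0.05):
    """Robust scale: SPREAD = median(|r|)/0.6745, floored at s_min (eq. (13))."""
    return max(np.median(np.abs(r)) / 0.6745, s_min)

def locate(stations, t_obs, theta0, v=6.0, n_iter=25):
    """Epicenter and origin time by damped, iteratively reweighted
    Gauss-Newton; theta = (x, y, t0), predicted time = t0 + dist/v."""
    theta = np.asarray(theta0, dtype=float)

    def residuals(th):
        dist = np.linalg.norm(th[:2] - stations, axis=1)
        return t_obs - (th[2] + dist / v), dist

    for _ in range(n_iter):
        r, dist = residuals(theta)
        s = spread(r)
        w = bisquare_weight(r / s)
        # Jacobian of the predicted time with respect to (x, y, t0)
        A = np.column_stack([(theta[:2] - stations) / (v * dist[:, None]),
                             np.ones_like(t_obs)])
        # weighted normal equations A^T W A dtheta = A^T W r (eq. (11))
        AtW = A.T * w
        dtheta = np.linalg.solve(AtW @ A + 1e-9 * np.eye(3), AtW @ r)
        # step-halving damping, as in the SEEK loop
        P = lambda th: bisquare_loss(residuals(th)[0] / s).sum()
        d = 1.0
        while P(theta + d * dtheta) > P(theta) and d > 1e-6:
            d *= 0.5
        theta = theta + d * dtheta
    return theta

# synthetic example: 8 stations on a 30 km ring, one 4 s gross pick error
ang = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
stations = np.column_stack([30.0 * np.cos(ang), 30.0 * np.sin(ang)])
t_obs = 0.5 + np.linalg.norm(np.array([3.0, -2.0]) - stations, axis=1) / 6.0
t_obs[0] += 4.0
est = locate(stations, t_obs, np.array([0.0, 0.0, 0.0]))
print(est)  # close to (3.0, -2.0, 0.5) despite the outlier
```

Note how the scale is re-estimated each iteration: as the fit tightens, the outlier's scaled residual grows and its bisquare weight falls to zero, which is exactly the circularity between locating and outlier identification discussed in the Introduction.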

Many complications are hidden in solving for dθ in eq. (12). Buland (1976) showed that the matrix A^T A is often poorly conditioned, and recommended using the QR algorithm when solving for dθ. However, Marquardt (1963) and Smith (1976) showed that proper scaling of the matrix A increases its numerical stability. It has been observed that the numerical properties of A^T W A are usually better than those of A^T A, although this need not always be true. Even if A^T A is well conditioned, A^T W A may be ill-conditioned and the undamped adjustment vector may be poor. This can happen, as is shown in Section 4, when there are low weights on observations containing information about some region of parameter space. Even if A^T W A is well behaved at a local minimum, a bad starting location can lead to bad results (Welsch and Becker, 1975). A further complication is that the most robust loss functions need not have unique solutions. Thus, although robust methods provide better locations than least squares, they come with their own set of problems.

The robust solution is dependent on the starting location used. Commonly, the first station to report an arrival is used as the trial location. Although Buland has shown that this is a good trial location for Gaussianly distributed arrival-time errors, it is not particularly robust. Although in linear robust regression the L1 solution provides a reasonable starting solution, the L1 earthquake location is not always better than the L2 location (see Section 5). Because of the significance of the initial location, the arrival-order location (AOL) method of estimating an initial location will be used (Anderson, 1981). For dense local networks, it gives an initial epicenter that is within several kilometers of the final one.

The next Section describes a test program used to make a comparison of several robust location methods using hand-picked arrival times from the Caltech Earthquake Detection and Recording System CEDAR (Johnson, 1979). A detailed comparison of the effects of error on the robust and least-squares locations for a particular earthquake is made in Section 5.

4. Comparison of location methods

A program to test several alternatives for earthquake location has been developed. Using this program, combinations of initial location and weighting scheme can be tried by specifying an ordered list of operations required to produce a location. The list of operations is as follows:

Weighting schemes
L2: least squares (eq. (2))
L1: least absolute value (eq. (3))
BISQ: bisquare weighting (eq. (4))
HUBER: Huber weighting (eq. (5))
SIN: Andrews' sine weighting (eq. (6))
JEFF: Jeffreys weighting (eq. (8))

Location operations
AOL: perform consistency check (see below) and use the arrival-order location (Anderson, 1981) as the current location; fix depth to 5 km
STA1: use the station with the earliest arrival as the current location; fix depth to 5 km
LOC3: three-parameter location; from the current location, SEEK a new location using the specified weighting scheme while allowing origin time, latitude and longitude to vary, with depth fixed at its current value
LOC4: four-parameter location; from the current location, SEEK a new location using the specified weighting scheme while allowing origin time, latitude, longitude, and depth to vary

The consistency check identifies large arrival-time errors. Every pair of arrivals (i, j) from the same event, from different stations, must satisfy

|T_i − T_j| ≤ t(P_i, Δ_ij) + 2T_ε

and

|T_i − T_j| ≤ t(P_j, Δ_ij) + 2T_ε

where T_i is the arrival time of phase P_i at station i, Δ_ij is the distance between the two stations, and t(P, Δ) is the surface-focus travel time for the indicated phase and distance. The T_ε term is the maximum allowable error an arrival time can have. This check has been used to identify large arrival-time errors in an automatic arrival-picking and event-location scheme for the CEDAR network (Anderson, 1978), and to identify groups of possibly related arrivals during global network monitoring (Anderson and Snell, 1980).

A data set of nearly 1300 hand-picked arrival times from more than 60 events recorded by CEDAR was used to compare the results of four different location methods (Anderson, 1978):
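A sketch of this check in Python, with the surface-focus travel time t(P, Δ) approximated by a constant crustal velocity (the 6 km/s figure and the t_eps value are illustrative assumptions, not from the paper):

```python
import itertools
import math
from collections import Counter

def consistency_violations(arrivals, positions, v=6.0, t_eps=1.0):
    """Count, per station, the pairwise consistency violations
    |T_i - T_j| > t(P, delta_ij) + 2*t_eps, with t(P, delta) ~ delta / v.
    A pick that violates the bound against many partners is the
    likely gross timing error."""
    viol = Counter()
    for i, j in itertools.combinations(sorted(arrivals), 2):
        delta = math.dist(positions[i], positions[j])
        if abs(arrivals[i] - arrivals[j]) > delta / v + 2.0 * t_eps:
            viol[i] += 1
            viol[j] += 1
    return viol

# hypothetical network: station D carries a gross timing error
positions = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0), "D": (5.0, 5.0)}
arrivals = {"A": 1.0, "B": 2.0, "C": 2.0, "D": 20.0}
print(consistency_violations(arrivals, positions))
```

Because a violating pair implicates both of its members, counting violations per station (rather than flagging pairs) is what isolates the single bad pick.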

Method   Operations
BI       AOL, BISQ, LOC3, LOC4
HUBER    AOL, HUBER, LOC3, LOC4
L1       AOL, L1, LOC3, LOC4
L2       STA1, L2, LOC3, LOC4

Since it was expected that JEFF and SIN would produce locations similar to BI, they were not studied extensively. The major differences in location were usually in the final depth, as might be expected. Generally, the differences between the BI and the L2 locations were larger than between the HUBER and the L1 locations, the HUBER and the L1 locations usually being close to either the BI or the L2 location. BI gave more reasonable depths and epicenters for events that are normally hard to locate, such as surface events or events outside the recording network.

Since the accuracy of the arrivals was usually good, the BI and L2 epicenters were often in relative agreement with each other. However, in 25% of the events, the epicenters differed by more than 4 km and the depths differed by more than 6.5 km, well beyond their 95% confidence limits (~0.5 km in latitude or longitude and 2.0 km in depth).

The performance of BI is crucially dependent on the starting location used. For example, for one quarry blast L2 gave a surface location, while BI gave a depth of 6 km. Arrivals from stations close to the epicenter were down-weighted since the initial depth was 5 km, and thus these arrivals were earlier than expected. Further iterations with fixed depth increased the residuals for these stations more, and thus they were further down-weighted. Eventually, all the arrivals containing depth information were removed by the robust weighting! In this case, allowing the depth to vary starting from the trial location led to a proper depth. However, Buland (1976) recommended fixing the depth for the first few iterations, since this increases the region of convergence. Convergence is usually a problem only if the event is outside the recording network. Thus the following algorithm is recommended. Determine a trial location using AOL, BISQ, LOC3. If this location is outside the network, return this location; if not, return the location determined by AOL, BISQ, LOC4.

5. Robust versus least-squares location: a detailed comparison

This Section gives the results from a study of the effect of arrival-time errors on the BI and L2 locations for one event. Table I and Fig. 6 summarize the residuals and geometry of the recording network. This event was chosen for several reasons. Both BI and L2 give the same location and the residuals

Fig. 6. Map showing the station distribution for the study event. The star indicates the epicenter.

TABLE I
L2 location parameters for study event

Origin time (s): 7.22 ± 0.10
Latitude: 33.006 ± 0.006
Longitude: −115.587 ± 0.006
Depth (km): 3.63 ± 2.2

P-arrivals
Station  Time (s)  Residual (s)  Quality a  Distance (km)  Azimuth (deg.)
WML      8.03       0.01         100         3.50          286
WLK      8.90      −0.06          75        10.28           60
NWR      9.64       0.12          50        13.95          319
CLI      9.78      −0.05          75        15.95           20
SUP     10.78      −0.15          25        22.87          255
ING     11.48       0.08          75        25.91           94
SGL     13.90      −0.01         100        41.64          198

a The quality (scaled 0—100) is an analyst's estimate of the relative accuracy of the arrival time.

are small. Also, the event is typical of the smaller events recorded by CEDAR (approximately one-third of the events are reported by less than 10 stations, one-third by 10—19 stations, and one-third by 20 or more stations). Since there are only seven stations reporting arrivals, the effect of arrival-time errors should be pronounced; both BI and L2 should perform worse than usual.

The station geometry also has typical problems. The diameter of the reporting network, 50 km, is small, and the distribution of the stations is not ideal. Station WML is 3.5 km from the epicenter and an error there should strongly affect the depth of the event. Station SGL is the most distant station and it is the only station that is significantly south of the epicenter. Thus the location should also be strongly affected by error at this station.

To investigate the effect of arrival-time errors on the location, errors of −8, −4, −2, −1, 0, 1, 2, 4, and 8 s were added to one station and the event was relocated using the L2, L1, and BI location methods described in the previous Section. This was repeated three times using the stations WML, SGL, and NWR as the contaminated station.

Figure 7(a) shows the path of the L2 epicenters as a function of error at each of these stations. Negative error pushes the epicenter toward the corresponding station and positive error pushes it away from the station. The location diverges (has extremely large latitude or depth) for larger positive errors at the stations WML and SGL. Figure 7(b) shows the paths for the corresponding BI epicenters. Usually, the epicenters remain within a few kilometers of the actual epicenter. The effect of error at NWR appears to be almost completely removed, whereas only the +2 s error at WML provides a bad location. The effect of positive error at SGL is removed but the effect of negative error appears to be about as bad as that for L2. For comparison, Fig. 7(c) shows the effect of error on the L1 locations. Although they are better than the L2 locations, they are not as good as the BI locations.

Figures 8—10 show how errors at NWR, WML, and SGL affect the residuals at the other stations. Figure 8(a) shows that for L2 locations, as the magnitude of the error at NWR increases, the residuals at the other stations are also increased. This inflation of the residuals is a general property of L2 and it is also apparent in Figs. 9(a) and 10(a). Figure 8(b) shows that BI removes the inflation of the residuals and the arrival time at NWR is always identified as an outlier. In the process of

identifying the +1 and +2 s errors at NWR, the weighting at WML is reduced somewhat, thus slightly increasing its residual. Figure 9(b) shows that once again BI generally correctly identifies WML as the outlier. It does make a mistake when WML has an error of +2 s.

Fig. 7. The change in the epicenter due to error at one of three stations (NWR, WML, and SGL) for (a) the L2 algorithm, (b) the BI algorithm, and (c) the L1 algorithm. The curves show how the epicenter varies as the amount of error at each station changes. Generally, negative error pushes the epicenter toward, and positive error pushes it away from, a station (the plots are scaled in kilometers).

Fig. 8. Residuals at other stations as a function of error at NWR for (a) the L2 location and (b) the BI location.

Fig. 9. Residuals at other stations as a function of error at WML for (a) the L2 location and (b) the BI location.

Fig. 10. Residuals at other stations as a function of error at SGL for (a) the L2 location and (b) the BI location.

Here, WLK, the second nearest station, is considered to be the outlier. Figure 10(b) shows that SGL is correctly identified as an outlier only when its errors are positive. For negative errors, SUP is considered to be the outlier, although the other residuals remain somewhat inflated, indicating that the location is poor. The reason that BI does poorly here is that errors at SGL have a strong influence on the location. Since SGL is separated from the other stations, arrival-time errors at SGL have a stronger leverage on the final solution. Also, errors at SGL bias the arrival-order location to the south.
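The masking and inflation effects described above have a one-dimensional analogue (cf. Fig. 1): with one gross outlier, the mean is dragged away while the median and an iteratively reweighted bisquare center stay put. A sketch with illustrative sample values:

```python
import numpy as np

def bisquare_center(x, c=4.685, n_iter=20):
    """One-dimensional bisquare M-estimate of center via IRLS,
    started from the median with a SPREAD-like scale."""
    mu = np.median(x)
    s = np.median(np.abs(x - mu)) / 0.6745
    for _ in range(n_iter):
        u = (x - mu) / (c * s)
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        mu = np.sum(w * x) / np.sum(w)
    return mu

x = np.array([-1.2, -0.8, -0.3, 0.1, 0.2, 0.6, 0.9, 1.1])
x_out = np.append(x, 50.0)        # one gross outlier
print(np.mean(x_out))             # dragged far from zero by the outlier
print(np.median(x_out))           # stays near zero
print(bisquare_center(x_out))     # stays near zero; outlier weight is 0
```

Starting the iteration from the median rather than the mean mirrors the paper's advice against using the L2 solution as the initial estimate.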

6. Discussion and conclusion

The event studied in the previous Section provides much insight into the location problem. Since it is an extreme case, it brings out the worst faults of both BI and L2. In summary, the following points can be made about earthquake location procedures.

A location method such as BI provides more robust locations than does least squares when non-Gaussian arrival-time errors occur. Such errors are common, since arrivals are often weak and ambiguous (Anderson, 1978). Not only are L2 locations distorted by arrivals having large residuals, but the residuals of good arrivals can become inflated to the point where the true outliers are obscured.

Although M-estimates are superior to least squares, they are sensitive to the initial location and the estimate of residual variance (scale) used. The L2 location is not a robust starting location, and the practice of starting with the L2 location and then removing large residuals before refining the location is not recommended. The arrival-order location provides a good initial location estimate in networks that have a reasonably uniform distribution of stations (Anderson, 1981). A poor starting depth can cause arrivals that contain depth information to be wrongly down-weighted. This is most likely if the depth is held fixed for a few iterations. A robust estimate of variation such as the SPREAD (eq. (13)) should be used rather than the standard deviation.

Owing to poor network geometry, large errors at influential stations can go completely undetected by either least squares or BI. The practice of azimuthally weighting arrivals (Lee and Lahr, 1975) can make this problem even worse.

No matter what location method is used, a handful of arrivals can never be expected to provide a particularly robust location. As a rule of thumb, at least three or four observations per degree of freedom would be desirable. Unfortunately, this is often not possible. In such cases, extra care should be used to inspect the location for possible problems. A location can be checked in several ways for signs of a problem, as follows.

(a) If the value of the loss function at the final location is abnormally large, the location may be biased by errors at influential stations.

(b) A poor starting depth can cause arrivals that contain depth information to be wrongly down-weighted. The weights of arrivals from stations near the source and of depth phases should be checked to be sure that down-weighting has not occurred.
(c) Identify influential arrivals. The diagonal elements of the information density matrix (Klein, 1978), termed the data importance by Jordan and Sverdrup (1981), can provide an indication of which arrivals strongly influence the location. Belsley et al. (1980) have presented several additional estimates of influence. Unfortunately, they showed that no single method is satisfactory in general. Research should be directed toward developing influence measures that are appropriate for the earthquake location problem.
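For least squares, these data importances are the diagonal elements of the hat matrix H = A(A^T A)^{-1} A^T. A sketch with an arbitrary illustrative design matrix, in which the last row plays the role of an isolated station like SGL:

```python
import numpy as np

def data_importance(A):
    """Diagonal of the hat (information density) matrix
    H = A (A^T A)^{-1} A^T; values near 1 mark observations
    that strongly constrain the least-squares solution."""
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    return np.diag(H)

# toy design matrix: 6 observations of 2 parameters; the last row is
# an isolated, highly influential observation (cf. station SGL)
A = np.array([[1.0, 0.10], [1.0, 0.20], [1.0, 0.15],
              [1.0, 0.05], [1.0, 0.12], [1.0, 3.00]])
imp = data_importance(A)
print(imp)  # the last importance is close to 1
```

The importances sum to the number of parameters, so an importance near 1 means that one arrival effectively determines a parameter by itself, and an error there is invisible in the residuals.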

Acknowledgments

This work was carried out partly at the Department of Earth and Planetary Sciences, MIT, supported by the United States Geological Survey under Contracts 14-08-001-G-339 and 14-08-0001-16761, and partly at Lincoln Laboratory, Applied Seismology Group, supported by the Defense Advanced Research Projects Agency.

References

Anderson, K.R., 1978. Automatic processing of local earthquake data. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, 173 pp.
Anderson, K.R., 1981. Epicentral location using arrival time order. Bull. Seismol. Soc. Am., 71: 541—546.
Anderson, K.R. and Snell, N.S., 1980. Automatic association. In: Seismic Discrimination: SATS. DTIC AD-A091107, Lincoln Laboratory, Massachusetts Institute of Technology, Cambridge, MA.
Andrews, D.F., 1974. A robust method for multiple linear regression. Technometrics, 16: 523—531.
Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H. and Tukey, J.W., 1972. Robust Estimates of Location: Survey and Advances. Princeton University Press, Princeton, NJ, 373 pp.
Beaton, A.E. and Tukey, J.W., 1974. The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics, 16: 147—185.
Belsley, D.A., Kuh, E. and Welsch, R.E., 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York, 292 pp.
Bolt, B.A., 1960. The revision of earthquake epicenters, focal depths and origin-times using a high-speed computer. Geophys. J. R. Astron. Soc., 3: 433—440.
Buland, R., 1976. The mechanics of locating earthquakes. Bull. Seismol. Soc. Am., 66: 173—187.
Claerbout, J.F. and Muir, F., 1973. Robust modeling with erratic data. Geophysics, 38: 826—844.
Dennis, J.E. and Welsch, R.E., 1976. Techniques for nonlinear least squares and robust regression. In: Proceedings of the Statistical Computing Section. American Statistical Association, pp. 83—87.
Evernden, J.F., 1969. Precision of epicenters obtained by small numbers of world-wide stations. Bull. Seismol. Soc. Am., 59: 1365—1398.
Flinn, E.A., 1965. Confidence regions and error determinations for seismic event location. Rev. Geophys., 3: 157—185.
Hill, R.W., 1977. Robust regression when there are outliers in the carriers. Ph.D. Thesis, Harvard University, 289 pp.
Huber, P.J., 1972. Robust statistics: a review. Ann. Math. Stat., 43: 1041—1067.
ISC, 1976. Bull. Int. Seismol. Center, 13: 1.
Jeffreys, H., 1932. An alternative to the rejection of observations. Proc. R. Soc. (London), Ser. A, 137: 78—87.
Jeffreys, H., 1948. Theory of Probability. Clarendon Press, Oxford, 2nd ed., pp. 189—192.
Jeffreys, H., 1962. The Earth. Cambridge University Press, Cambridge, 4th ed., pp. 93—95.
Jeffreys, H., 1973. On travel times in seismology. In: Collected Papers of Sir Harold Jeffreys on Geophysics and Other Sciences. Gordon and Breach, New York.
Johnson, C.E., 1979. CEDAR—an approach to the computer automation of short-period local seismic networks. Ph.D. Thesis, California Institute of Technology, 332 pp.
Jones, A., 1970. Spiral—a new algorithm for nonlinear parameter estimation using least squares. Comput. J., 13: 301—308.
Jordan, T.H. and Sverdrup, K.A., 1981. Teleseismic location techniques and their application to earthquake clusters in the south-central Pacific. Bull. Seismol. Soc. Am., 71: 1105—1130.

Kernighan, B.W. and Plauger, P.J., 1976. Software Tools. Addison—Wesley, Reading, MA, 338 pp. Klein, F., 1978. Hypocenter Location Program Hypoinverse. U.S. Geol. Survey, Open-File Rep. 78-694. Lee, W.H.K. and Lahr, J.C., 1975. HYPO7I (revised): A Computer Program for Determining Hypocenter Magnitude and First Motion Pattern of Local Earthquakes. U.S. Geol. Survey, Open-File Rep. 75-311. Marquardt, D.W., 1963. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. md. Appi. Math., II: 431—441. Marquardt, D.W., 1970. Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics, 12: 591—611. Marquardt, D.W., 1974. Discussion no. 2. Technometrics, 16: 189—192. Slunga, R., 1980. International Seismological Datacenter: An Algorithm for Associating Reported Arrivals to a Global Seismic Network into Groups Defining Seismic Events. Forsvarets Forskningsanstalt Rep. C 20386-Tl. Smith, E.G.C., 1976. Scaling the equations of condition to improve conditioning. Bull. Seismol. Soc. Am., 66: 2075— 2081. Welsch, RE., 1975. Confidence Regions for Robust Regression. Natl. Working Pap. No. Ill, Bur. Econ. Res. Inc., Massachusetts Institute of Technology, Cambridge, MA, 7 pp. Welsch, R.E. and Becker, R.A., 1975. Robust Nonlinear Regression using the Dogleg Algorithm. Working Pap. No. 76, Center for Comput. Res. in Econ. Manage. Sci., Massachusetts Institute of Technology, Cambridge, MA.