Trimming and procrastination as inversion techniques


Physics of the Earth and Planetary Interiors 98 (1996) 101-142

George E. Backus

Institute of Geophysics and Planetary Physics, University of California, San Diego, La Jolla, CA 92093-0225, USA

Received 25 September 1995; revised 14 December 1995; accepted 16 April 1996

Abstract

By examining the processes of truncating and approximating the model space (trimming it), and by committing to neither the objectivist nor the subjectivist interpretation of probability (procrastinating), we construct a formal scheme for solving linear and non-linear geophysical inverse problems. The necessary prior information about the correct model x_E can be either a collection of inequalities or a probability measure describing where x_E was likely to be in the model space X before the data vector y_0 was measured. The results of the inversion are (1) a vector z_0 that estimates some numerical properties z_E of x_E; (2) an estimate of the error δz = z_0 − z_E. As y_0 is finite-dimensional, so is z_0, and hence in principle inversion cannot describe all of x_E. The error δz is studied under successively more specialized assumptions about the inverse problem, culminating in a complete analysis of the linear inverse problem with a prior quadratic bound on x_E. Our formalism appears to encompass and provide error estimates for many of the inversion schemes current in geomagnetism, and would be equally applicable in geodesy and seismology if adequate prior information were available there. As an idealized example we study the magnetic field at the core-mantle boundary, using satellite measurements of field elements at sites assumed to be almost uniformly distributed on a single spherical surface. Magnetospheric currents are neglected and the crustal field is idealized as a random process with rotationally invariant statistics. We find that an appropriate data compression diagonalizes the variance matrix of the crustal signal and permits an analytic trimming of the idealized problem.

1. Introduction

As part of a project for using data from the magnetic satellites MAGSAT and OERSTED to study the core-mantle boundary (CMB), the author has developed a formalism for estimating errors in geophysical inverse problems. This formalism applies to non-linear as well as linear problems, to regularization (Tikhonov and Arsenin, 1977), to hard and soft prior information, and to both Bayesian and frequentist treatments of that prior information. The formalism appears to provide a framework from which to view many of the techniques used in geomagnetic inversion (Franklin, 1970; Backus, 1970a, Backus, 1970b, Backus, 1970c, Backus, 1988a,

Backus, 1988b, Backus, 1989; Jackson, 1979; Langel and Estes, 1982; Gubbins, 1983, Gubbins, 1984; Gubbins and Bloxham, 1985; Tarantola, 1987; Cain et al., 1989; Stark, 1992a, Stark, 1992b; Donoho, 1994). In the present paper we describe and illustrate this formalism. The formalism shows how to find error bounds for the solutions to inverse problems, but does not by itself assure that the bounds will be small enough to be interesting. That question must be answered on a case-by-case basis. Therefore we emphasize at the outset that we have not yet applied the formalism to any non-linear problems. Non-linear inversions are usually problem-specific and often involve deep issues of 'hard' analysis. We have examined none of

0031-9201/96/$15.00 Copyright © 1996 Elsevier Science B.V. All rights reserved. PII S0031-9201(96)03183-4


G.E. Backus / Physics of the Earth and Planetary Interiors 98 (1996) 101-142

these issues. Although the example treated in this paper is linear, we describe the formalism in enough generality to cover non-linear problems because, as often happens, generality enforces simplicity. Our illustration of the formalism will be an idealized linear MAGSAT inversion which generalizes that in Backus (1989). We will seek to estimate properties of the core-generated geomagnetic field B_K at the CMB from measurements of the magnetic field B at satellite altitudes. The idealized measurement sites are assumed to be almost uniformly distributed on a single spherical surface about 420 km above the Earth's surface. As even this idealized linear example will illustrate, the investigator has a wide choice of ways to implement the formalism. There are, however, some aspects of the formalism that can be developed in considerable detail for general linear problems.

This paper is intended for both Bayesians and frequentists. Frequentists (objectivists) hold with Neyman (1937) that probability has no experimental meaning except as an estimate of frequencies of various outcomes in a repeatable series of random trials. On the other hand, Bayesians (subjectivists) hold that probability distributions can serve as quantitative descriptions of their personal beliefs. This paper also accepts Jackson's distinction (Jackson, 1979) between hard bounds (inequalities) and soft bounds (probability distributions). Data errors for which we know only an inequality are usually called systematic errors, and the inequality is a 'hard' bound on the systematic errors. Data errors subject to a known probability distribution are called random errors. The probability distribution is a 'soft' bound on the random errors. Jackson originally introduced his distinction between hard and soft bounds to deal not with data errors but with prior information about the true Earth model x_E, information available before the data are measured.
Without prior information, most inverse problems are insoluble (Backus and Gilbert, 1967; Backus, 1970a). In the geomagnetic main field problem, x_E is the geomagnetic field B_K produced by the core. An example of prior information about x_E is that the total rate of ohmic heat production in the core, a quadratic form in B_K, must be less than the heat flow out of the Earth's surface (Parker, 1972; Gubbins, 1983; Backus, 1988a) unless the mantle

heat flow is unsteady or some of the ohmic heat in the core is reused to regenerate B_K there (Backus, 1975). A less restrictive but more certain example of prior information is that the energy of B_K must be less than the negative of the self-gravitational energy of the Earth (see Appendix A for a proof from the scalar virial theorem). Both the heat-flow and the virial bounds are inequalities on quadratic forms in B_K, so they are hard prior bounds on x_E.

How might one obtain a soft prior bound on the true B_K, i.e. a probability distribution for it in the space X of physically possible B_K's? Constable and Parker (1988) noted that some of the Gauss coefficients, the spherical harmonic expansion coefficients of the magnetic scalar potential, appear to be independent normal random variables whose variance depends only on degree l. Constable and Parker applied the Kolmogorov-Smirnov test (Kendall and Stuart, 1979) to degrees 2 ≤ l ≤ 12 because the axial dipole is well known to be anomalously large, whereas above l = 12 the Gauss coefficients of the whole Earth are believed to include an appreciable crustal contribution and so do not directly describe the core (Langel and Estes, 1982; Cain et al., 1989). Extrapolating Constable and Parker's probability distribution beyond l = 12 for the core is evidently speculative, but a bold observer, whether Bayesian or frequentist, might be willing to test the consequences of such an extrapolation. The resulting probability distribution for the B_K produced by the core, i.e. for x_E, would be a soft prior bound on x_E. Another soft bound might be a Bayesian observer's description of prior personal beliefs about B_K by means of a prior personal probability measure for x_E in X. This soft prior bound would be rejected as meaningless by frequentists.

One class of soft prior bounds is of some historical interest.
Backus (1970a) suggested that a hard quadratic bound might be roughly imitated by a suitably chosen probability distribution. Jackson (1979) investigated this 'bound-softening' proposal in some detail, and Gubbins (1983) used it to estimate Gauss coefficients and their errors at the CMB. It is now known that Backus's suggestion of 1970 was wrong, and that softening a hard quadratic bound necessarily introduces spurious prior 'information' about x_E when dim X = ∞ (Backus, 1988b, Backus, 1989). In Appendix B we extend this result to finite-dimensional spaces. We show that replacing an upper bound on a quadratic form by a normal probability measure introduces a spurious lower bound on the form. That is, the softening generates a spurious lower bound that is one-twentieth of the upper bound at the 90% confidence level, and one-third of the upper bound at the 50% confidence level. In spaces of dimension three or more, replacing hard by soft bounds generates so much spurious new information that any conclusions based on the bound softening will be unconvincing to most workers. It is a general phenomenon that soft bounds contain more information than hard bounds. For example, estimates of rounding errors in numerical analysis shrink when calculated statistically rather than absolutely.

Although replacing hard by soft quadratic bounds is dubious in spaces of more than two dimensions, the converse process is standard practice. It amounts simply to the computation of confidence sets from given probability distributions. The hard bounds so obtained are probabilistic rather than absolute. For example, conventional statistical inference based on repeated laboratory measurements of a real-valued physical quantity x_E might justify the conclusion that either a < x_E < b or an event has occurred whose probability is ε. In conventional language, at a significance level of ε, the true value of x_E lies in the confidence interval a < x_E < b. The statistical ellipsis proposed in Appendix C allows us to say, alternatively, that x_E is subject to the hard bounds a < x_E < b at confidence level 1 − ε.

In using soft prior bounds, Bayesians (Tarantola, 1987) and frequentists (Franklin, 1970) have sometimes appeared to underestimate the difficulty and importance of finding convincing justifications for those bounds, and have focused on the mechanics of the probabilistic calculus that such bounds permit. Perhaps this attitude arises from the analogy with one-dimensional model spaces.
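The bound-softening pathology can be exhibited numerically. In the toy Monte Carlo below (not from the paper; all numerical choices are illustrative), the hard bound q(x) = ‖x‖² ≤ 1 on R^n is softened to the isotropic normal measure with E q = 1, and we read off the empirical lower quantile of q, i.e. the spurious lower bound that the softened prior asserts at 90% confidence. The exact constants depend on how the normal variance is matched to the hard bound, which is the business of Appendix B; the sketch shows only that the spurious bound exists and grows with dimension.

```python
import random

def spurious_lower_bound(n, conf=0.90, trials=20000, seed=1):
    """Soften the hard bound q(x) = |x|^2 <= 1 on R^n to x ~ N(0, I/n),
    so that E q = 1, and return the empirical (1 - conf)-quantile of q.
    At confidence conf the softened prior asserts that q exceeds this
    value -- a lower bound absent from the original hard bound."""
    rng = random.Random(seed)
    qs = sorted(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
                for _ in range(trials))
    return qs[int((1.0 - conf) * trials)]

for n in (1, 3, 10, 100):
    print(n, round(spurious_lower_bound(n), 3))
```

As n grows, q concentrates near its mean, so the spurious lower bound approaches the upper bound itself; this is why softening is nearly harmless in one dimension but damaging in three or more.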
In one dimension, hard bounds can be softened without adding spurious information. Indeed, Jeffreys (1961) tried to develop a Bayesian calculus in which the prior soft bound was based on no information at all, a procedure that approximates logically justifiable Bayesian calculations in one dimension but not in three or more. Their point of view about probability usually leads objectivists to convert soft to hard bounds whenever possible, so that the numerical process of


inversion with error estimation requires that inequalities be propagated from the data space through the model space to the prediction space (Backus, 1989; Donoho, 1994). Subjectivists, on the other hand, prefer to work directly with the Bayesian calculus of probability distributions, and so in the past have often preferred to replace hard by soft bounds. Our proposal is that both camps should wait till the very end of the inversion, carrying both soft and hard bounds from their places of origin to the final prediction, which may be a single number or a finite vector of numbers. Only then is it necessary to invoke either camp's view of probability. For linear problems, the formalism presented in the next section expresses the error in the prediction vector as the sum of a systematic and a random error. If an appreciable systematic error or a hard prior bound is present, objectivists will not usually be willing to regard the prediction as a vector random variable, so they will convert the soft bound on the random error to a hard bound by calculating a confidence set. This can be combined with the hard bound on the systematic error to give a confidence set for the prediction vector, as described in the next section. Subjectivists will comfortably regard the random error in the prediction as a vector random variable. If the dimension of the prediction vector is one or two, subjectivists can soften the hard bound on the systematic error, convolute it with the random error, and view the whole error in the prediction as a random variable. In light of Appendix B, if the prediction vector has dimension three or more, subjectivists will probably prefer to regard its error as the irreducible sum of a random variable and a systematic error for which there is only a hard bound.
To do otherwise would add spurious information.

To achieve our goal of procrastinating the choice between the subjective and objective interpretations of probability, we will find it essential on purely theoretical grounds to work with a finite-dimensional approximation to the model space. Thus one advantage of our formalism is that it automatically includes an analysis of the truncation errors that are forced on us by the practical necessities of computation.

We will need some notation used in the theories of sets, functions, measures and Hilbert spaces. Most of this has been described by Halmos (1950, Halmos, 1951, Halmos, 1958) and MacLane and Birkhoff (1967). For completeness, we list it in Appendix C.

2. The general inverse problem

Using the notation summarized in Appendix C, we give here a concise description of a typical geophysical inverse problem. It provides us with D real numbers, y^1, ..., y^D (the data). From the data we try to estimate P other real numbers z^1, ..., z^P (the predictions). Both the data and the predictions could be calculated exactly from the correct Earth model, x_E, if we knew it. However, it is an unknown member of a known and usually infinite-dimensional model space X. We introduce two finite-dimensional real vector spaces, Y and Z, the data space and the prediction space, consisting respectively of all ordered D-tuples and all ordered P-tuples of real numbers. We are given a data function F: X → Y and a prediction function G: X → Z, whose coordinate functions we denote by F^i: X → R and G^j: X → R, where 1 ≤ i ≤ D and 1 ≤ j ≤ P. The correct data vector is y_E = F(x_E); the measured data vector y_0 differs from y_E by a random error δ_R y and a systematic error δ_S y:

y_0 = y_E + δ_R y + δ_S y    (2.1)

By definition, an error is not known exactly, but if we know nothing about it then the data are useless. Systematic and random errors are distinguished by the kinds of partial information we have about them. For the systematic error δ_S y we know only that

δ_S y ∈ V_S    (2.2)

where V_S is a known subset of the data space Y. We will call V_S the confinement set for δ_S y. Often Y has a norm, ‖·‖, and V_S is the solid ball B_Y(a) with radius a centered on 0 in Y. Then Eq. (2.2) becomes simply ‖δ_S y‖ ≤ a. Extending the terminology of Jackson (1979), we will describe Eq. (2.2) as a 'hard bound' on δ_S y.

A random error δ_R y is a realization of a vector random variable taking values in Y. This random variable is completely described by its probability distribution, a probability measure η_R on Y. If V is any Borel subset of Y, η_R(V) is the probability of the event δ_R y ∈ V. Following Jackson (1979) we call η_R a 'soft bound' on δ_R y. We will assume that η_R is known except for a few parameters that can be determined by fitting the data. Backus (1989) discussed some aspects of estimating those parameters. We will ignore that question here. We will assume that y^i and y^i y^j are integrable with respect to η_R, and as a mnemonic device we will write the expected values E(η_R, y^i) and E(η_R, y^i y^j) in the forms E(δ_R y^i) and E(δ_R y^i δ_R y^j). We will assume that

E(δ_R y^i) = 0    (2.3)

because a non-zero E(δ_R y^i) belongs in y_E^i if known and in δ_S y^i if unknown. Because of Eq. (2.3), the D × D variance matrix V of δ_R y has the entries

V^{ij} = E(δ_R y^i δ_R y^j)    (2.4)
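The error model Eq. (2.1)-(2.4) can be illustrated with a toy simulation (all numbers invented). Averaging repeated realizations recovers Eq. (2.3) and an empirical version of the variance matrix Eq. (2.4) for the random error, while the systematic error δ_S y shifts the mean and cannot be detected by averaging alone:

```python
import random

rng = random.Random(0)
D = 3                                   # data-space dimension (invented)
y_E = [1.0, 2.0, 3.0]                   # correct data vector
d_S = [0.05, -0.05, 0.0]                # systematic error, known only to lie in V_S

N = 5000
samples = []
for _ in range(N):
    d_R = [rng.gauss(0.0, 0.1) for _ in range(D)]                # random error
    samples.append([y_E[i] + d_R[i] + d_S[i] for i in range(D)])  # Eq. (2.1)

mean = [sum(s[i] for s in samples) / N for i in range(D)]
# Empirical variance matrix, Eq. (2.4): V^{ij} = E(d_R y^i d_R y^j)
V = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / N
      for j in range(D)] for i in range(D)]
print([round(m, 3) for m in mean])      # biased by d_S, not by d_R
```

The sample mean tends to y_E + δ_S y, which is why a non-zero mean random error "belongs in δ_S y^i if unknown".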

Given y_0, given the confinement set V_S for δ_S y and given the probability distribution η_R for δ_R y, we might try to predict the value of z_E = G(x_E). Such an attempt is doomed unless G = H∘F for some function H: Y → Z (Backus and Gilbert, 1967). Usually there is no such H, and doom is avoided by including in the inversion some prior information about x_E, information available to us beyond what is conveyed by y_0, V_S and η_R. This information usually takes the form of a hard bound or a soft bound on x_E in X. Some workers also admit 'firm' bounds, hard bounds that are imprecisely known. We will not do so, because our prior information is often imprecise. Any inversion must include a verification that the results are insensitive to geophysically acceptable changes in the prior information.

In the problem of geomagnetic inference at the CMB the model space X consists of all magnetic fields B_K outside the CMB whose sources lie inside the CMB. The true Earth model x_E is simply the true B_K. The data vector y consists of D elements of the geomagnetic field B measured at satellite


altitudes or on the Earth's surface, and includes measurement errors as well as fields produced by currents and magnetization in the mantle and currents in the magnetosphere. The prediction vector z_E might consist, for example, of P coefficients in the spherical harmonic expansion of the magnetic scalar potential for B_K. It could also include some fluxes through null-flux curves on the CMB. Henceforth we will use (X, F, G) as the label for the inverse problem we have just described. This label is convenient but incomplete. It makes no mention of the error bounds or the prior information, both of which are essential parts of the problem. For example, the prior information about x_E often determines what properties of x_E can be estimated from the data (Backus, 1989). We will call the inverse problem (X, F, G) linear if X is a real vector space and F and G are linear functions. Whether (X, F, G) is linear or not, our goal is to find an estimate z_0 for z_E and to describe the error in that estimate.

3. Trimming inverse problems

Making heavy use of the notation listed in Appendix C, the present section describes a formal solution to the inverse problem (X, F, G) set forth in the preceding section. Some of the steps in this formalism involve delicate questions of functional and numerical analysis. The later parts of the present paper discuss those questions for linear problems, but for non-linear problems we offer only a description of what to compute, not how to compute it. Our formalism focuses on truncation and approximation, as we believe these are the crucial issues in modeling. If treated properly, they produce an inversion scheme that might satisfy both Bayesians and frequentists, and that can supply error estimates based on either hard or soft prior bounds.

The formalism depends on finding what we will call a trimming of the inverse problem (X, F, G). A trimming of (X, F, G) is an ordered quintuple (X_T, P_T, F_T, Q_T, G_T). The first entry, X_T, is a subset of a finite-dimensional topological space. It is often, but need not be, a finite-dimensional subspace of the model space X. The last four entries of the quintuple


are functions, whose domains and codomains are as follows:

P_T: X → X_T    (3.1a)

F_T: X_T → Y    (3.1b)

Q_T: Y → Y_T, where Y_T = F_T(X_T)    (3.1c)

G_T: X_T → Z    (3.1d)

We put no restrictions on these functions except to demand that they be continuous, that

Q_T|Y_T = I_{Y_T}    (3.2a)

and that F_T have a continuous inverse,

F_T^{−1}: Y_T → X_T    (3.2b)

It should be noted that Eq. (3.2a) implies

Q_T∘Q_T = Q_T    (3.2c)

The formalism does not require it, but we lose no generality in assuming that

X_T = P_T(X)    (3.2d)
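As an entirely illustrative check of these requirements, the sketch below builds a small linear trimming, with X = R³ standing in for the (normally infinite-dimensional) model space, X_T = R², Y = R³ and Y_T the plane y³ = 0, and verifies Eq. (3.2a)-(3.2c). All maps are invented for the example:

```python
# A toy linear trimming (X_T, P_T, F_T, Q_T, G_T); every map is invented.

def P_T(x):      # model-space trimmer P_T: X -> X_T, Eq. (3.1a)
    return (x[0], x[1])

def F_T(u):      # trimmed data function F_T: X_T -> Y, Eq. (3.1b)
    return (u[0], u[1], 0.0)

def Q_T(y):      # data-space trimmer Q_T: Y -> Y_T = F_T(X_T), Eq. (3.1c)
    return (y[0], y[1], 0.0)

def G_T(u):      # trimmed prediction function G_T: X_T -> Z, P = 1, Eq. (3.1d)
    return (u[0] + u[1],)

def F_T_inv(y):  # continuous inverse F_T^{-1}: Y_T -> X_T, Eq. (3.2b)
    return (y[0], y[1])

x = (1.5, -2.0, 7.0)            # a model in X
y = F_T(P_T(x))                 # lies in Y_T
assert Q_T(y) == y              # Eq. (3.2a): Q_T restricted to Y_T is the identity
assert F_T_inv(y) == P_T(x)     # F_T^{-1} o F_T is the identity on X_T
assert Q_T(Q_T((3.0, 4.0, 5.0))) == Q_T((3.0, 4.0, 5.0))   # Eq. (3.2c)
print(G_T(P_T(x)))              # a trimmed prediction
```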

We will call X_T the trimmed model space, F_T the trimmed data function and G_T the trimmed prediction function. We will call P_T the model-space trimmer and Q_T the data-space trimmer. We call (X_T, P_T, F_T, Q_T, G_T) linear if (X, F, G) is a linear problem, X_T is a linear space, and P_T, F_T, Q_T, G_T are linear functions. Although the prior information about x_E is not explicitly listed in labeling the inverse problem as (X, F, G), that information usually plays a crucial role in constructing useful trimmings. For example, if X is the space of magnetic fields B with sources only in the core and the data are intensities |B|, then any field B that fits the data will have a twin, −B, that also fits the data. Unless we have prior information about the direction of B (e.g. that the dipole moment is positive) we cannot construct an invertible F_T or a trimming. (This particular inverse problem has another, more serious, non-uniqueness whose remedy is at present unknown (Backus, 1970d).) A trimming is useless without an error estimate δz for the candidate z_0 with which the trimming replaces the true prediction vector z_E. In addition to the random and systematic errors produced by the data, the prediction error δz includes a part induced by the finite dimensionality of X_T and the resulting


consequences, X_T ≠ X, F_T ≠ F and G_T ≠ G. To describe the part of the prediction error produced by these approximations, we define

δ_T y = f_T(x_E),  δ_T z = g_T(x_E)    (3.3a)

where

f_T = F_T∘P_T − F,  g_T = G_T∘P_T − G    (3.3b)

These definitions are equivalent to the statements

δ_T y = F_T(P_T(x_E)) − F(x_E),  δ_T z = G_T(P_T(x_E)) − G(x_E)

If our prior information is a hard bound on x_E, say

x_E ∈ U_E    (3.4a)

then from Eq. (3.3a) we have hard bounds for δ_T y and δ_T z, namely

δ_T y ∈ V_T = f_T(U_E),  δ_T z ∈ W_T = g_T(U_E)    (3.4b)

If our prior information is a soft bound on x_E, a probability measure (μ_E, Σ_X) for x_E in X, then μ_E: Σ_X → R and we have soft bounds on δ_T y and δ_T z. These are probability measures (η_T, Σ_Y) and (ζ_T, Σ_Z) for δ_T y in Y and δ_T z in Z. They are defined as

η_T = μ_E∘f_T^{−1},  ζ_T = μ_E∘g_T^{−1}    (3.5)

As δ_T y and δ_T z are both functions of the random variable x_E, they are themselves random variables but will not usually be independent.

From any trimming of the inverse problem (X, F, G) one can construct a formal solution to that inverse problem as follows. As y_E = F(x_E), Eq. (2.1), Eq. (3.3a) and Eq. (3.3b) imply

F_T∘P_T(x_E) = y_0 − δ_R y − δ_S y + δ_T y

By Eq. (3.2a), Q_T∘F_T = F_T, so

F_T∘P_T(x_E) = Q_T(y_0 − δ_R y − δ_S y + δ_T y)

But F_T^{−1}: Y_T → X_T exists, so the foregoing equation implies

P_T(x_E) = F_T^{−1}∘Q_T(y_0 − δ_R y − δ_S y + δ_T y)

Therefore

G_T∘P_T(x_E) = G_T∘F_T^{−1}∘Q_T(y_0 − δ_R y − δ_S y + δ_T y)

Now z_E = G(x_E), so Eq. (3.3a), Eq. (3.3b) and the foregoing equation imply that

z_E = H_T(y_0 − δ_R y − δ_S y + δ_T y) − δ_T z    (3.6a)

where

H_T = G_T∘F_T^{−1}∘Q_T    (3.6b)

We can rewrite these results as

z_E = z_0 − δz    (3.7a)

if we define

z_0 = H_T(y_0)    (3.7b)

and define

δz = H_T(y_0) − H_T(y_0 − δ_R y − δ_S y + δ_T y) + δ_T z    (3.7c)

There is an intimate connection between trimming and modeling. If

Z = Z_T = X_T and G = G_T = I_{X_T}    (3.8)

then the prediction z_E is a member of the finite-dimensional substitute X_T for the model space X. Once this trimming and its errors have been calculated, its estimate z_0 and the hard and soft bounds on its error δz can be used to make any other prediction G′(x_E) for which X_T provides enough relevant parameters. Thus, modeling is the special case of trimming in which Z, Z_T, G and G_T are given by Eq. (3.8).

Eq. (3.6a) and Eq. (3.6b), or Eq. (3.7a), Eq. (3.7b) and Eq. (3.7c), constitute a formal solution to the inverse problem (X, F, G). Under certain conditions that formal solution offers an opportunity to use the data to test the hypothesis that η_R is the correct probability measure for δ_R y. Those conditions are that Q_T be linear and not surjective and that δ_T y and δ_S y be negligible in comparison with δ_R y. To test the hypothesis, it should be noted that, as y_E + δ_T y = F_T∘P_T(x_E) = Q_T∘F_T∘P_T(x_E), it follows from Eq. (3.2c) that

(I_Y − Q_T)(y_E + δ_T y) = 0    (3.9a)

Invoking Eq. (2.1) and the assumed linearity of Q_T, we conclude that

(I_Y − Q_T)(y_0) = (I_Y − Q_T)(δ_R y + δ_S y + δ_T y)    (3.9b)


If δ_S y and δ_T y can be neglected in Eq. (3.9b), then the right-hand side of that equation reduces to (I_Y − Q_T)(δ_R y). This latter vector is drawn at random from the subspace (I_Y − Q_T)(Y) of Y, and its probability measure on that subspace should be

η_R^Q = η_R∘(I_Y − Q_T)^{−1}    (3.10)

if η_R really describes δ_R y. Given η_R, we can calculate η_R^Q from Eq. (3.10) and then use various tests (Kendall and Stuart, 1979, Ch. 30; see also Backus, 1989) to compute the probability that (I_Y − Q_T)(y_0) could have been drawn at random from the distribution Eq. (3.10).

It remains to understand and control the size of the error δz in Eq. (3.7c). For non-linear problems, such estimates can often be based on the modulus of continuity of H_T (Graves (1946), p. 116, extended to metric spaces) and on how closely F_T∘P_T and G_T∘P_T resemble Q_T∘F and G, respectively. For linearizable problems we will discuss these estimates in some detail in later sections. The rest of this section examines in general how objectivists and subjectivists might use Eq. (3.6a) and Eq. (3.6b), and Eq. (3.7a), Eq. (3.7b) and Eq. (3.7c).

Objectivists have only two cases to consider: is the prior bound on x_E hard or soft? First, let us suppose it hard, as in Eq. (3.4a), so that δ_T y and δ_T z are systematic errors with confinement sets Eq. (3.4b). Objectivists would convert the soft bound on δ_R y to a hard bound on z_E as follows: they would choose a small positive number ε_R and a Borel set V_R of Y as small as possible in some sense (perhaps in diameter if Y has a norm) but large enough to assure that η_R(V_R) > 1 − ε_R. If δ_R y ∈ V_R then Eq. (3.6a) implies that

z_E ∈ H_T(y_0 − V_R − V_S + V_T) − W_T    (3.11)

If Eq. (3.11) is false, then an event (δ_R y ∉ V_R) has occurred whose probability is less than ε_R. That is, Eq. (3.11) is true at confidence level at least 1 − ε_R. If the data arise from aspects of x_E that are relevant to the predictions, then an adroit choice of a trimming will make the confinement set Eq. (3.11) for z_E small enough to be interesting. In passing, we note that the confinement set Eq. (3.11) can be replaced by the smaller set consisting of all those z ∈ Z that can be written H_T(y_0 − δ_R y − δ_S y + f_T(x)) − g_T(x) with δ_R y ∈ V_R, δ_S y ∈ V_S, and x ∈ U_E.
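The objectivist conversion of a soft bound to a hard bound is easy to sketch when D = P = 1 (everything below is invented for illustration): V_R is a symmetric interval carrying probability at least 1 − ε_R under η_R, obtained here as an empirical quantile, and Eq. (3.11) then delivers an interval for z_E.

```python
import random

rng = random.Random(2)
eps_R = 0.10
sigma = 0.1                       # eta_R taken to be N(0, sigma^2), invented

# V_R = [-r, r] chosen so that eta_R(V_R) >= 1 - eps_R, via an empirical quantile:
draws = sorted(abs(rng.gauss(0.0, sigma)) for _ in range(20000))
r = draws[int((1.0 - eps_R) * len(draws))]

y_0 = 1.37                        # measured datum
a, t, w = 0.05, 0.02, 0.01        # V_S = [-a, a], V_T = [-t, t], W_T = [-w, w]

def H_T(y):                       # an invented linear H_T with D = P = 1
    return 2.0 * y

# Eq. (3.11): z_E in H_T(y_0 - V_R - V_S + V_T) - W_T, here an interval:
half = r + a + t
lo, hi = H_T(y_0 - half) - w, H_T(y_0 + half) + w
print("z_E in [%.3f, %.3f] at confidence >= %.2f" % (lo, hi, 1.0 - eps_R))
```

For symmetric sets the Minkowski sums collapse to a single half-width r + a + t, so in one dimension the confinement set is the interval whose length the objectivist would then try to minimize over choices of V_R.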


For an objectivist the second case is that the prior bound on x_E is soft, a probability measure (μ_E, Σ_X) locating x_E in X. Now the objectivist chooses two small positive numbers, ε_R and ε_E, and two subsets V_R ⊂ Y and U_E ⊂ X, as small as possible in some sense but large enough that η_R(V_R) > 1 − ε_R and μ_E(U_E) > 1 − ε_E. The objectivist would use this U_E in Eq. (3.4b) to define V_T and W_T. If δ_R y ∈ V_R and x_E ∈ U_E, then Eq. (3.11) holds. Otherwise, either δ_R y ∉ V_R or x_E ∉ U_E or both. The probability of this union of two events is at most the sum of their separate probabilities, ε_R + ε_E. Thus Eq. (3.11) is true at a confidence level of at least 1 − ε_R − ε_E. Again, as in the preceding paragraph, the confinement set Eq. (3.11) can be slightly improved.

For objectivists it remains to say what 'small' means when applied to V_R or U_E. If dim Z = 1, the answer is simple: for a given ε_R (or ε_R and ε_E) we choose V_R (or V_R and U_E) so as to minimize the diameter of the set Eq. (3.11). Usually that set will be an interval, and then its length should be minimized. If dim Z > 1, the situation is less simple, and Stark (1992a) and Donoho (1994) have discussed various optimizations in some detail.

Subjectivists will approach Eq. (3.6a) and Eq. (3.6b) and Eq. (3.7a), Eq. (3.7b) and Eq. (3.7c) differently, to exploit their view that they can describe their beliefs about z_E by means of a personal probability measure (ζ_E, Σ_Z) on Z. A naive subjectivist approach to Eq. (3.6a) and Eq. (3.6b) is to convert all the hard bounds to probability distributions and then to find ζ_E from standard probability calculus. This naive method fails because δ_S y and x_E belong to spaces of high dimension, so softening their hard bounds adds spurious 'information' (Backus, 1988b; see also Appendix B). The naive approach does succeed when the systematic errors are negligible in comparison with the random errors. We give two examples.
First, let us suppose δ_S y, δ_T y, and δ_T z are all systematic errors, but are negligible in Eq. (3.6a). Then that equation can be written approximately as

z_E = H_T(y_E) where y_E = y_0 − δ_R y    (3.12)

Subjectivists, unlike objectivists, would be willing to regard y_E as a random variable. In principle, it can take only its true value, but a subjectivist would describe his personal opinion about it by means of a


personal probability measure (η_E, Σ_Y) on Y. Most subjectivists would obtain this η_E from the error distribution η_R by requiring for each Borel subset V of Y that

η_E(V) = η_R(y_0 − V)    (3.13a)

The argument for Eq. (3.13a) is that because of Eq. (3.12) the two events y_E ∈ V and δ_R y ∈ y_0 − V are the same, and so must have the same probability. Applying this argument to z_E in Eq. (3.12) would lead subjectivists to adopt

ζ_E = η_E∘H_T^{−1}    (3.13b)

as the probabilistic description of their belief about where z_E lies in Z.

In the second subjectivist example without hard bounds, let us suppose that δ_S y is negligible in Eq. (3.6a) and that the prior information is a soft bound, a probability measure (μ_E, Σ_X) on X. Using Eq. (3.3a) and Eq. (3.3b), we write Eq. (3.6a) as

z_E = H_T(y_0 − δ_R y + f_T(x_E)) − g_T(x_E)    (3.14)

The right side of Eq. (3.14) is a function of y_0, δ_R y and x_E, so we write it h_T(y_0, δ_R y, x_E). Then h_T(y_0, ·, ·): Y × X → Z. Subjectivists would probably regard δ_R y and x_E as independent random variables, so their joint probability measure on Y × X would be the product measure η_R μ_E of η_R and μ_E (Halmos, 1950). Then subjectivists would choose

ζ_E = (η_R μ_E)∘h_T(y_0, ·, ·)^{−1}    (3.15)

If neither random nor systematic errors are negligible and the inverse problem is non-linear, we have no advice for subjectivists except to linearize (vide infra) or to become temporary objectivists.

If H_T in Eq. (3.6a) and Eq. (3.6b) is linear or linearizable, error estimation from Eq. (3.7a), Eq. (3.7b) and Eq. (3.7c) is much simplified for both objectivists and subjectivists. If δ_R y, δ_S y and δ_T y are small enough and DH_T is the gradient function of H_T at y_0, then to first order in the errors we can write Eq. (3.7c) as

δz = Δ_T z + δ_R z + δ_S z    (3.16a)

where

δ_R z = DH_T(δ_R y)    (3.16b)

δ_S z = DH_T(δ_S y)    (3.16c)

Δ_T z = (g_T − DH_T∘f_T)(x_E)    (3.16d)

Here f_T and g_T are given by Eq. (3.3b), and if H_T is linear then DH_T = H_T. Clearly, δ_S z is a systematic error with confinement set DH_T(V_S), V_S being as in Eq. (2.2), and δ_R z is a random error with probability measure

ζ_R = η_R∘(DH_T)^{−1}    (3.17)

η_R being the probability measure for δ_R y. Finally, if the prior information about x_E is the hard bound Eq. (3.4a), then Δ_T z is a systematic error with confinement set (g_T − DH_T∘f_T)(U_E). If the prior information about x_E is a probability measure (μ_E, Σ_X), then Δ_T z is a random error with probability measure ζ_T = μ_E∘(g_T − DH_T∘f_T)^{−1}.

When the linearization Eq. (3.16a)-(3.16d) is possible, if we estimate z_E as H_T(y_0), we commit an error δz that is the sum of one random and one systematic error. If the prior bound on x_E is hard, the random part of δz is δ_R z and the systematic part is Δ_T z + δ_S z. The confinement set of the latter is (g_T − DH_T∘f_T)(U_E) + DH_T(V_S). If the prior bound on x_E is soft, the systematic part of δz is δ_S z and the random part is Δ_T z + δ_R z. It seems reasonable to suppose that Δ_T z and δ_R z are independent, so the probability measure of their sum is the convolution of their probability measures (Kendall and Stuart, 1977, p. 200ff.).

When H_T is linearizable, the fact that δz is the sum of a random and a systematic error somewhat simplifies the objectivist's calculation of a confidence set. Linearizability has a more profound effect for the subjectivist. If dim Z ≥ 3, Appendix B suggests that subjectivists seeking a wide audience will want to report the error δz as the sum of two errors, a random error with known probability measure and a systematic error with known confinement set. Imitating the systematic error with a subjective probability measure when dim Z ≥ 3 will introduce spurious information. Only if dim Z ≤ 2 will subjectivists want to soften the confinement set for the systematic error in z to a probability distribution and obtain a probability measure for δz as the convolution of two random parts. Only when H_T is linearizable and subjectivists have softened the systematic error in z can they speak sensibly of the correlation between the errors in two predictions. Objectivists would

G.E. Backus / Physics of the Earth and Planetary Interiors 98 (1996) 101-142

never do so except when all systematic errors were negligible.

Trimming tries to treat subjectivists and objectivists even-handedly, but in one sense it does favor the latter. The estimate z_0 given by Eq. (3.7b) is determined entirely by the data and not at all by the prior information. Prior information is used only to confine errors. In a successful measurement this policy sacrifices little accuracy (Jackson, 1979) because Q_T(y_0) will lie well within the confinement set or a not very large multiple of the variance ellipsoid of F_T ∘ P_T(x_E) (see, for example, Section 9). In the contrary case, if the prior information is taken seriously, then the probability of observing the actual data vector y_0 is very low, whether calculated by subjectivist or objectivist techniques. Some subjectivists would claim to be unconcerned with this problem and would use the Bayesian calculus (Jackson, 1979; Backus, 1988a) to combine the two probability measures η_R and μ_R ∘ F⁻¹, where μ_R is either an a priori probability measure for x_E or a measure obtained by softening a hard bound on x_E. If μ_R is an objective prior probability measure based on prior data about x_E, or if μ_R genuinely describes the subjectivist's prior personal opinion, then Appendix B will not dissuade the subjectivist from this Bayesian course. What might do so is the compromise policy proposed in the present paper, a policy that may be acceptable to both subjectivists and objectivists: if the prior information renders y_0 very improbable, then either the prior information or the data vector is suspect, and no good can come of an inversion that tries to combine them.

4. Natural dot products on X, X_T and Y

In every inverse problem the probability measure η_R for the random error in the data vector generates a dot product on the data space Y. In certain inverse problems the prior information about the correct Earth model x_E generates a dot product on the model space X or the trimmed model space X_T. When available, these dot products are powerful tools for estimating the prediction error Eq. (3.17). Therefore, we briefly examine all three dot products, following Backus (1989), who discussed two of them


at length and derived the machinery for obtaining the third.

We begin with the dot product on X. This is available when X is a real vector space and the prior information is a quadratic bound for the correct Earth model x_E:

    Q(x_E, x_E) ≤ q    (4.1)

Here q is a known positive real number, and Q is a known symmetric, positive-definite bilinear form on X. That is, for any x₁, x₂ ∈ X, Q(x₁, x₂) is a real number that depends linearly on each of x₁, x₂ when the other is fixed; also Q(x₁, x₂) = Q(x₂, x₁) and, if x ≠ 0, then Q(x, x) > 0. Given a quadratic bound Eq. (4.1), we define the dot product of any x₁, x₂ ∈ X to be

    x₁ · x₂ = q⁻¹ Q(x₁, x₂)    (4.2)

We write the norm for this dot product as ‖x‖, so

    ‖x‖ = (x · x)^{1/2}    (4.3)

In terms of this norm, the prior quadratic bound on x_E is simply

    ‖x_E‖ ≤ 1    (4.4)

If X is not complete in the norm Eq. (4.3), we complete it, so that it becomes a real Hilbert space with dot product Eq. (4.2) (Halmos, 1951). If dim X were finite, this step would be unnecessary; finite-dimensional spaces are always complete, i.e. Hilbert spaces, and all their subspaces are closed. Every closed subspace U of X has an orthogonal projector Π_U : X → U.

When the prior information about x_E is a soft bound, a probability measure μ_E on X, we might hope to use μ_E to generate a dot product on X by asking that, under this dot product, the variance tensor for μ_E be the identity tensor. This amounts to asking that for any fixed w₁ and w₂ in X

    E[(w₁ · w)(w₂ · w)] − E[w₁ · w] E[w₂ · w] = w₁ · w₂    (4.5)

Backus (1989) has shown that if dim X were finite, then for each μ_E there would be a unique dot product on X that satisfied Eq. (4.5). Unfortunately, dim X = ∞, so various seemingly reasonable additional conditions on μ_E (e.g. normality) imply that no dot product on X exists that satisfies Eq. (4.5) (Backus, 1988b).


As we cannot always use a soft bound on x_E to construct a natural dot product on the full model space X, we do what we can: we use that soft bound to construct a natural dot product on any linear trimmed model space X_T. We let (X_T, P_T, F_T, Q_T, G_T) be any trimming of the inverse problem (X,F,G), and we let the soft prior bound on x_E be the probability measure μ_E on X. From μ_E we define a probability measure μ_T on X_T as

    μ_T = μ_E ∘ P_T⁻¹    (4.6)

We can use μ_T to calculate expected values of functions on X_T. As X_T is finite-dimensional, if it is a vector space then the result of Backus (1989) shows that on X_T there is a unique dot product satisfying Eq. (4.5).

Finally we turn to the dot product on Y. As already remarked (Backus, 1989), there is exactly one dot product on Y relative to which the probability measure η_R for the random error δ_R y satisfies Eq. (4.5). It will be useful to construct this dot product explicitly in terms of the error variance matrix V defined by Eq. (2.4). We adopt the Einstein summation convention: in a term or product of terms, if an index appears once as a subscript and once as a superscript, then a sum over all possible values of the index is understood. We will assume that no linear combination of y¹, …, y^D can be measured with perfect accuracy. This implies that if a₁, …, a_D are any real numbers not all zero, then a_i a_j V^{ij} = a_i a_j E[δ_R y^i δ_R y^j] = E[(a_i δ_R y^i)(a_j δ_R y^j)] = E[(a_i δ_R y^i)²] > 0. Therefore the matrix V of Eq. (2.4) is positive definite. It is obviously symmetric, so it has a positive-definite symmetric inverse, V⁻¹. If η_R, the probability measure for δ_R y, were gaussian, its density function would be a constant times exp(−y^i V⁻¹_{ij} y^j / 2). This suggests introducing the following dot product on Y even when η_R is not gaussian: if u = (u¹, …, u^D) and v = (v¹, …, v^D) then

    u · v = u^i V⁻¹_{ij} v^j    (4.7)

The fact that V⁻¹ is real, symmetric and positive definite assures that Eq. (4.7) does define a dot product. We define the norm ‖y‖ for this dot product as in Eq. (4.3). As dim Y < ∞, Y is complete in this norm, and every subspace U of Y has an orthogonal projector Π_U : Y → U.

It remains to show that the dot product Eq. (4.7) satisfies Eq. (4.5). As E[u · δ_R y] = 0, to prove Eq. (4.5) we want to establish that for any u, v ∈ Y

    E[(u · δ_R y)(v · δ_R y)] = u · v    (4.8)

To prove Eq. (4.8), we note that (u · δ_R y)(v · δ_R y) = (u^i V⁻¹_{ij} δ_R y^j)(δ_R y^k V⁻¹_{kl} v^l), so E[(u · δ_R y)(v · δ_R y)] = u^i V⁻¹_{ij} E[δ_R y^j δ_R y^k] V⁻¹_{kl} v^l = u^i V⁻¹_{ij} V^{jk} V⁻¹_{kl} v^l = u^i V⁻¹_{il} v^l = u · v. In the language of Cartesian tensors (Jeffreys, 1969; Backus, 1986), Eq. (4.8) says that under the dot product Eq. (4.7) the expected value of the dyad δ_R y δ_R y is the identity tensor on Y.

As the dot product on Y measures the length of the data vector y in units of the random error δ_R y, it seems intuitively obvious that a larger random error will produce a smaller length (norm) for the data vector. Later we will need this 'obvious' fact, so here we give a proof. Let us suppose that δ_R y = δ_R y₁ + δ_R y₂, where δ_R y₁ and δ_R y₂ are independent random variables with mean 0 and variance matrices V₁ and V₂. If V is the variance matrix of δ_R y, then, of course, V = V₁ + V₂. If ‖y‖ and ‖y‖₁ are the norms on Y produced by V and V₁, we claim that

    ‖y‖ < ‖y‖₁ for all nonzero y ∈ Y    (4.9)

We need to show that if y ∈ Y and y ≠ 0 then y V⁻¹ y^T < y V₁⁻¹ y^T, or

    y (V₁⁻¹ − V⁻¹) y^T > 0    (4.10)

Here y^T is the column vector obtained by transposing the row vector y. Every positive definite symmetric (pds) matrix has a pds inverse and a unique pds square root, so we can define the D × D pds matrix A = V₁^{−1/2} V V₁^{−1/2}. For any nonzero vector y ∈ Y, we set u = y V₁^{−1/2} A^{−1/2} V₁^{−1/2}. Then a simple calculation shows that y(V₁⁻¹ − V⁻¹)y^T = u(V − V₁)u^T. However, V − V₁ = V₂. As V₂ is positive definite and u ≠ 0, we have Eq. (4.10). That inequality was stated by , but their proof is incorrect. The infinite series in their Eq. (A5) need not converge.

As an application of the dot products defined above, we show that when the model space X has a dot product and the data function F : X → Y is linear, the quadratic regularization of the inverse problem (Tikhonov and Arsenin, 1977) can be viewed as a trimming that satisfies Eq. (3.8); i.e. as a model.
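Both facts are easy to check numerically. The sketch below (numpy; the randomly generated variance matrices stand in for V₁ and V₂ and are purely illustrative, not quantities from the paper) verifies Eq. (4.8) by Monte Carlo with a gaussian δ_R y, and verifies the norm inequality Eq. (4.10) directly:

```python
# Check Eqs. (4.7)-(4.10): the V^{-1} dot product makes the variance tensor of
# the random error the identity, and adding an independent error (V = V1 + V2)
# can only shrink the norm induced on Y. Illustrative matrices only.
import numpy as np

rng = np.random.default_rng(0)
D = 4
A1 = rng.standard_normal((D, D))
A2 = rng.standard_normal((D, D))
V1 = A1 @ A1.T + D * np.eye(D)   # positive definite variance matrix of dRy1
V2 = A2 @ A2.T + D * np.eye(D)   # positive definite variance matrix of dRy2
V = V1 + V2                      # variance matrix of dRy = dRy1 + dRy2

def dot(u, v, Vmat):
    """Dot product u.v = u_i (Vmat^{-1})_{ij} v_j of Eq. (4.7)."""
    return u @ np.linalg.solve(Vmat, v)

# Eq. (4.8): E[(u.dRy)(v.dRy)] = u.v, checked by Monte Carlo.
L = np.linalg.cholesky(V)
samples = (L @ rng.standard_normal((D, 100_000))).T   # rows ~ N(0, V)
u = rng.standard_normal(D)
v = rng.standard_normal(D)
lhs = np.mean((samples @ np.linalg.solve(V, u)) * (samples @ np.linalg.solve(V, v)))
print(lhs, dot(u, v, V))   # the two numbers should nearly agree

# Eq. (4.10): y (V1^{-1} - V^{-1}) y^T > 0 for every nonzero y, i.e. Eq. (4.9).
for _ in range(100):
    y = rng.standard_normal(D)
    assert dot(y, y, V) < dot(y, y, V1)
```

The Monte Carlo agreement improves like the square root of the sample count; the positivity check is exact up to rounding.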


Quadratic regularization seeks the x₀ ∈ X that minimizes the penalty function

    p(x) = θ²‖x‖² + ‖F(x) − y₀‖²    (4.11)

The real constant θ is the regularization parameter. Setting θ = 0 reduces the regularization to a least-squares fit. There seems to be no general agreement about how to choose θ, so for the moment we simply take it as given. Later we shall see that trimming suggests how to choose it. We permit but do not require that X be finite dimensional. Regularization is often used with finite dimensional model spaces, indeed with dim X < dim Y.

To see that the minimizer x₀ of Eq. (4.11) can be obtained by trimming an inverse problem, we define

    X_T = F⁻¹({0})^⊥,  Y_T = F(X_T),  Q_T = Π_{Y_T} : Y → Y_T    (4.12)

It is clear from the Pythagorean theorem that the minimizing x₀ belongs to X_T. Moreover, as ‖F(x) − y₀‖² = ‖F(x) − Q_T(y₀)‖² + ‖Q_T(y₀) − y₀‖², therefore x₀ is the vector x in X_T that minimizes

    q(x) = θ²‖x‖² + ‖F(x) − Q_T(y₀)‖²    (4.13)

It is well known that this minimizing x₀ makes q(x₀ + δx) − q(x₀) vanish to first order in the variable δx ∈ X_T, and hence x₀ is the unique solution in X_T of the linear equation

    [θ² I_{X_T} + (F|X_T)^T ∘ (F|X_T)](x₀) = (F|X_T)^T ∘ Q_T(y₀)    (4.14)

The regularizing Eq. (4.14) is not in the form of a trimming. To make it so requires a careful study of the function (F|X_T) : X_T → Y_T. This function is an injection by the definition of X_T and a surjection by the definition of Y_T. Therefore, the inverse function (F|X_T)⁻¹ : Y_T → X_T exists. Moreover, Y_T is a subspace of the finite dimensional space Y, and so is finite dimensional. As (F|X_T) : X_T → Y_T is a linear bijection (an isomorphism), X_T is also finite dimensional, and dim X_T = dim Y_T. Therefore (Appendix C), (F|X_T)^{−T} exists, and we can apply it to Eq. (4.14). The result is

    F_T(x₀) = Q_T(y₀)    (4.15a)

where

    F_T = θ²(F|X_T)^{−T} + (F|X_T)    (4.15b)

From Eq. (4.15b) a simple calculation shows that (F_T)^T ∘ F_T is positive definite, so that F_T⁻¹ : Y_T → X_T exists. Applying this inverse to Eq. (4.15a) gives

    x₀ = F_T⁻¹ ∘ Q_T(y₀)    (4.15c)

Unlike Eq. (4.14), Eq. (4.15c) does come directly from a trimming (X_T,P_T,F_T,Q_T,G_T) of an inverse problem (X,F,G). To produce the inverse problem and its trimming, we define

    Z = X_T,  G = Π_{X_T}    (4.16a)

and

    P_T = Π_{X_T} : X → X_T,  G_T = I_{X_T}    (4.16b)

Then Eq. (4.15c) can be written

    x₀ = G_T ∘ F_T⁻¹ ∘ Q_T(y₀)    (4.17)

Thus x₀ is the prediction z₀ that the trimming generates via Eq. (3.6b) and Eq. (3.7b). As quadratic regularization is a trimming, we can use the error analysis for trimmings to describe the errors committed by quadratic regularization. The regularization parameter θ will presumably be chosen to minimize those errors. However, regularization by itself does not supply enough information to find the errors. Usually x₀ is a finite dimensional approximation to part of a correct infinite dimensional model x_E, and without prior information about x_E we have no way to estimate how far x₀ may be from what we really want to know. We defer the complete discussion of this issue to a later section.
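As a concrete finite-dimensional illustration, the minimizer of the penalty function Eq. (4.11) can be computed from the normal equations Eq. (4.14) and checked against direct perturbation and the θ = 0 least-squares limit. A random matrix stands in for F|X_T here; it is not the paper's data function:

```python
# Sketch of quadratic (Tikhonov) regularization, Eqs. (4.11)-(4.14).
import numpy as np

rng = np.random.default_rng(1)
D, N = 6, 4                      # dim Y and dim X_T (illustrative)
F = rng.standard_normal((D, N))  # stand-in for F|X_T
y0 = rng.standard_normal(D)
theta = 0.3

# Eq. (4.14): (theta^2 I + F^T F) x0 = F^T y0.  (F^T already annihilates the
# part of y0 orthogonal to the range of F, so Q_T can be left implicit.)
x0 = np.linalg.solve(theta**2 * np.eye(N) + F.T @ F, F.T @ y0)

def p(x):
    # penalty function of Eq. (4.11)
    return theta**2 * x @ x + np.sum((F @ x - y0) ** 2)

# x0 should beat random perturbations of itself:
for _ in range(200):
    assert p(x0) <= p(x0 + 0.1 * rng.standard_normal(N)) + 1e-12

# theta = 0 reduces Eq. (4.11) to an ordinary least-squares fit:
x0_unreg = np.linalg.solve(F.T @ F, F.T @ y0)
xls, *_ = np.linalg.lstsq(F, y0, rcond=None)
assert np.allclose(xls, x0_unreg)
```

The same computation, read through Eqs. (4.15)-(4.17), is exactly the trimming estimate x₀ = G_T ∘ F_T⁻¹ ∘ Q_T(y₀).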

5. Data compression

When the data are very numerous, it is common practice to combine them in various ways so as to generate a data vector ȳ of lower dimension with smaller variance. The new data will be ȳ¹ = K̄¹(y), …, ȳ^M = K̄^M(y), where K̄ʲ : Y → R for 1 ≤ j ≤ M. In this paper we consider only linear functionals K̄ʲ. It is pointless to include among them one that is a linear combination of the others, so we shall assume that {K̄¹, …, K̄^M} is linearly independent. Therefore M ≤ D. We call the process of computing ȳ = (ȳ¹, …, ȳ^M) from y = (y¹, …, y^D) a 'compression' of the data. Unless M = D, the compression is irreversible; y cannot be recovered from ȳ. We let Ȳ be the vector space of real M-tuples, and we define the compression function K̄ : Y → Ȳ by

    K̄(y) = (K̄¹(y), …, K̄^M(y))    (5.1)

From the original inverse problem (X,F,G) the compression K̄ produces a new, compressed inverse problem (X,F̄,G) whose data space is Ȳ. In this new problem the original data function F, the original probability measure η_R on Y for the random error δ_R y, and the original confinement set V_S for the systematic error δ_S y are replaced by the new, 'compressed' versions

    F̄ = K̄ ∘ F,  η̄_R = η_R ∘ K̄⁻¹,  V̄_S = K̄(V_S)    (5.2a)

In this equation K̄⁻¹ is not the inverse of Eq. (5.1). Rather it is the set function defined in Appendix C just before Eq. (C.2). If (X_T,P_T,F_T,Q_T,G_T) is a trimming of (X,F,G) and

    F̄_T = K̄ ∘ F_T,  Q̄_T = K̄ ∘ Q_T    (5.2b)

then (X_T,P_T,F̄_T,Q̄_T,G_T) is a trimming of (X,F̄,G). The random errors in the compressed data are δ_R ȳ^i = K̄^i(δ_R y), and we can imitate Eq. (2.4) by defining

    V̄^{ij} = E(δ_R ȳ^i δ_R ȳ^j)    (5.3)

where now V̄ is an M × M positive definite symmetric matrix. Then, imitating Eq. (4.7), we can define a dot product on Ȳ. If ū = (ū¹, …, ū^M) and v̄ = (v̄¹, …, v̄^M) are any vectors in Ȳ, then their dot product is

    ū · v̄ = ū^i V̄⁻¹_{ij} v̄^j    (5.4)

To use K̄ we must understand how the dot product Eq. (5.4) on Ȳ is related to the dot product Eq. (4.7) on Y. The answer hinges on dual sequences. As Y is a finite-dimensional dot product space and K̄ʲ : Y → R is a linear functional on Y, there is a unique Kʲ ∈ Y such that

    K̄ʲ(y) = Kʲ · y for all y ∈ Y    (5.5)

(Halmos, 1958). We define

    U = span{K¹, …, K^M}    (5.6a)
    U_j = span[{K¹, …, K^M} \ {Kʲ}]    (5.6b)

As {K¹, …, K^M} is linearly independent, dim U = M and dim U_j = M − 1. Therefore dim(U_j^⊥ ∩ U) = 1, so U_j^⊥ ∩ U contains exactly one vector whose dot product with Kʲ is unity. We call this vector K_j. The ordered sequence (K₁, …, K_M) is the 'dual sequence' of (K¹, …, K^M). It satisfies and is uniquely determined by two conditions:

    K_j ∈ U for 1 ≤ j ≤ M    (5.7a)
    K^i · K_j = δ^i_j    (5.7b)

(Hereafter δ^i_j, δ_i^j, δ^{ij} and δ_{ij} all denote the Kronecker delta, unity if i = j and zero if i ≠ j.) The fact that Eq. (5.7a) and Eq. (5.7b) uniquely determine the dual sequence shows that the dual sequence of (K₁, …, K_M) is (K¹, …, K^M). If (K¹, …, K^M) is orthonormal, then K^i · Kʲ = δ^{ij}, so uniqueness in Eq. (5.7a) and Eq. (5.7b) shows that an orthonormal sequence is self-dual. Both {K¹, …, K^M} and {K₁, …, K_M} are bases for U. Therefore, from Eq. (5.7a) and Eq. (5.7b) it follows that if u ∈ U then

    u = (u · K_j)Kʲ = (u · Kʲ)K_j    (5.8a)

If v ∈ Y, then dotting v into Eq. (5.8a) gives

    u · v = (u · K_j)(Kʲ · v) = (u · Kʲ)(K_j · v)    (5.8b)

By symmetry, Eq. (5.8b) is also true if u ∈ Y and v ∈ U. If we set u = K_i and v = K^k in Eq. (5.8b), then Eq. (5.7b) implies

    (K_i · K_j)(Kʲ · K^k) = δ_i^k    (5.9a)

If we set u = K^i and v = K_k in Eq. (5.8b), then Eq. (5.7b) implies

    (K^i · Kʲ)(K_j · K_k) = δ^i_k    (5.9b)

Therefore the two M × M matrices whose entries are K_i · K_j and K^i · Kʲ are inverse to one another. It follows from Eq. (5.8a) that Π_U : Y → U, the orthogonal projector of Y onto U, can be calculated in several ways:

    Π_U(y) = (y · K_j)Kʲ = (y · Kʲ)K_j = K̄ʲ(y)K_j    (5.10)

We can use this fact to describe the effect of data


compression on the dot product. This will involve showing that

    K̄(U^⊥) = {0}    (5.11)

and that K̄|U is an isometry between U with dot product Eq. (4.7) and Ȳ with dot product Eq. (5.4). That is, K̄|U : U → Ȳ is a bijection for which, if u, v ∈ U, then

    K̄(u) · K̄(v) = u · v    (5.12)

To prove Eq. (5.11), we observe from Eq. (5.10) that for 1 ≤ j ≤ M we have K̄ʲ(Π_U(y)) = Kʲ · Π_U(y) = K̄^i(y)(Kʲ · K_i) = K̄^i(y) δʲ_i = K̄ʲ(y), so K̄(Π_U(y)) = K̄(y). As this is true for all y ∈ Y,

    K̄ ∘ Π_U = K̄    (5.13)

from which Eq. (5.11) follows immediately. To prove Eq. (5.12), we note that K̄(u) = (K̄¹(u), …, K̄^M(u)) = (K¹ · u, …, K^M · u) and similarly for K̄(v). Then by Eq. (5.4)

    K̄(u) · K̄(v) = (K^i · u) V̄⁻¹_{ij} (Kʲ · v)    (5.14)

From Eq. (5.3) and Eq. (5.5), V̄^{ij} = E[(K^i · δ_R y)(Kʲ · δ_R y)]. Therefore, by Eq. (4.8),

    V̄^{ij} = K^i · Kʲ    (5.15a)

It follows from Eqs. (5.9a), (5.9b) and Eq. (5.15a) that

    V̄⁻¹_{ij} = K_i · K_j    (5.15b)

Thus

    K̄(u) · K̄(v) = (u · K^i)(K_i · K_j)(Kʲ · v)

As u, v ∈ U, Eq. (5.8a) shows that this last term is u · v. Now we have proved Eq. (5.12). However, if K̄(u) = 0 and u ∈ U then Eq. (5.12) implies that u = 0. Therefore, K̄|U : U → Ȳ, being linear, is an injection. Then it is a surjection because dim U = dim Ȳ.

One way to interpret Eq. (5.11) and Eq. (5.12) is as follows: when we compress a data vector y = (y¹, …, y^D) to a vector ȳ = (ȳ¹, …, ȳ^M), where ȳ^i = K̄^i(y) = K^i · y, we lose all the information in Π_{U^⊥}(y). The compressed data vector ȳ can be thought of as a vector in U, namely ȳ^i K_i. In this way, every compressed data vector 'is' a vector in U and vice versa. Because of Eq. (5.12) and Eq. (5.13), we note finally that

    ‖K̄(y)‖ ≤ ‖y‖ if y ∈ Y    (5.16)

6. Trimming linear inverse problems that have dot products

In this section we suppose that the inverse problem is linear, that the model space X is a Hilbert space, that the prior information is Eq. (4.4) and that the data space Y has been provided with a dot product as in Section 4. We also suppose that the data function F and the prediction function G are continuous. A discontinuous linear function on X is unbounded (Halmos, 1951), so arbitrarily small changes in its argument can produce arbitrarily large changes in its value. If F or G is discontinuous, then a stronger prior quadratic bound for x_E is needed (Backus, 1989) for a sensible inversion.

In the class of problems just described, there is a tool ready to hand for constructing a large class of trimmings. We call this tool an invertible truncation of (X,F). It is a function Ω whose domain is the set of pairs (α,β) of real numbers such that

    0 < α < β    (6.1a)

For any such pair, the function value Ω(α,β) is a finite dimensional subspace of X with these properties:

    ‖F(x)‖ ≥ α‖x‖ if x ∈ Ω(α,β)    (6.1b)

    ‖F(x)‖ ≤ β‖x‖ if x ∈ Ω(α,β)^⊥    (6.1c)
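The dual-sequence relations Eqs. (5.7)-(5.9) and the isometry Eq. (5.12) of Section 5 can be checked numerically. In the sketch below the representers K¹, …, K^M are random vectors and V is a random positive definite matrix; both are illustrative stand-ins, not quantities from the paper:

```python
# Check dual sequences and the compression isometry of Section 5 under the
# V^{-1} dot product of Eq. (4.7). All matrices are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(2)
D, M = 5, 3
A = rng.standard_normal((D, D))
V = A @ A.T + D * np.eye(D)                  # error variance matrix on Y
Vinv = np.linalg.inv(V)
dot = lambda a, b: a @ Vinv @ b              # Eq. (4.7)

Ksup = rng.standard_normal((M, D))           # representers K^1..K^M (rows)
G = Ksup @ Vinv @ Ksup.T                     # Gram matrix K^i.K^j = Vbar, Eq. (5.15a)
Ksub = np.linalg.solve(G, Ksup)              # dual sequence K_1..K_M, rows lie in U

# Eq. (5.7b): K^i . K_j = delta^i_j
assert np.allclose(Ksup @ Vinv @ Ksub.T, np.eye(M))

# Eq. (5.9): the matrices (K_i . K_j) and (K^i . K^j) are mutual inverses
assert np.allclose((Ksub @ Vinv @ Ksub.T) @ G, np.eye(M))

# Eq. (5.12): compression is an isometry on U = span{K^1..K^M}
u = Ksub.T @ rng.standard_normal(M)          # a vector in U
v = Ksub.T @ rng.standard_normal(M)
Ku, Kv = Ksup @ Vinv @ u, Ksup @ Vinv @ v    # compressed data: K^i . u, K^i . v
Vbar_inv = np.linalg.inv(G)
assert np.isclose(Ku @ Vbar_inv @ Kv, dot(u, v))
```

Solving the Gram system is exactly the construction of the dual sequence: each K_j is the unique combination of the K^i with K^i · K_j = δ^i_j.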

We show first how to construct an invertible truncation and then how to obtain trimmings from it. We begin by noting that an invertible truncation of (X,F) always exists for any linear inverse problem (X,F,G) in which X is a Hilbert space and Y is a finite dimensional dot product space. As F : X → Y is continuous, it is bounded (Halmos, 1951). That is, there is a number M such that ‖F(x)‖ ≤ M‖x‖ for all x ∈ X. The smallest such M is written ‖F‖ and called the bound of F. By definition, ‖F‖ is the smallest real number such that

    ‖F(x)‖ ≤ ‖F‖ ‖x‖ for all x ∈ X    (6.2)

As F is bounded and Y is finite dimensional, F has a singular value decomposition (Backus, 1989). That is, there are orthonormal bases (x̂₁, x̂₂, …) and (ŷ₁, …, ŷ_D) for X and Y and non-negative real numbers λ₁, …, λ_D such that

    F(x̂_n) = λ_n ŷ_n if 1 ≤ n ≤ D    (6.3a)
    F(x̂_n) = 0 if n > D    (6.3b)

(Note that on the right in Eq. (6.3a) there is no implied sum on n because in both its appearances n is a subscript.) If α > λ_n for all n, we define Ω(α,β) to be the space containing only 0. Otherwise, we define Ω(α,β) to be the subspace of X spanned by those x̂_n for which λ_n ≥ α. To verify Eqs. (6.1a)-(6.1c) it should be noted that if x ∈ X then

    x = Σ_{n=1}^{∞} (x · x̂_n) x̂_n    (6.4a)

so

    F(x) = Σ_{n=1}^{D} (x · x̂_n) λ_n ŷ_n    (6.4b)

and

    ‖F(x)‖² = Σ_{n=1}^{D} λ_n² (x · x̂_n)²    (6.4c)

If x ∈ Ω(α,β), then x · x̂_n = 0 if either n > D or λ_n < α, so

    ‖F(x)‖² ≥ α² Σ_{n=1}^{D} (x · x̂_n)² = α² ‖x‖²    (6.4d)

Hence Ω(α,β) satisfies Eq. (6.1b). For Eq. (6.1c), let us suppose x ∈ Ω(α,β)^⊥. Then x · x̂_n = 0 if n ≤ D and λ_n ≥ α, so

    ‖F(x)‖² ≤ α² Σ_{n=1}^{D} (x · x̂_n)² ≤ α² ‖x‖²    (6.4e)

As α < β, this establishes Eq. (6.1c).

The invertible truncation Ω constructed in the preceding paragraph is an inauspicious beginning for this enterprise, as constructing it requires computing the singular value decomposition of an ∞ × D matrix. In practice, of course, such an infinite process requires some sort of analytic justification of a finite approximation. In many inverse problems our knowledge of the data function F provides such analytic information in the form of a 'simple truncation', a tool for constructing invertible truncations. A simple truncation of (X,F) is a function Ξ whose domain is the set of all positive real numbers. For any real γ > 0, Ξ(γ) is a finite dimensional subspace of X such that

    ‖F(x)‖ ≤ γ‖x‖ if x ∈ Ξ(γ)^⊥    (6.5)

That simple truncations of (X,F) exist is trivial: if Ω is an invertible truncation of (X,F) and Ξ(γ) = Ω(γ/2, γ), then Ξ is a simple truncation. However, the idea is to go the other way. We hope to use our knowledge of F to find a simple truncation analytically. We cannot say more about this process here, as it depends on the details of the F being studied, and success is not guaranteed. Once we have a simple truncation, constructing an invertible truncation from it is a finite computational task.

To see this, let us suppose that analysis of F has given us a simple truncation Ξ. To construct an invertible truncation from it, we note that for any γ > 0, dim Ξ(γ) < ∞, so there are efficient algorithms that provide a singular value decomposition of F|Ξ(γ) (Golub and Van Loan, 1983; Press et al., 1992). From such a computation we obtain orthonormal bases {x̂₁, …, x̂_{N(γ)}} and {ŷ₁, …, ŷ_D} for Ξ(γ) and Y and non-negative real numbers λ₁, …, λ_{D∧N(γ)} such that

    F(x̂_n) = λ_n ŷ_n if 1 ≤ n ≤ D ∧ N(γ)    (6.6a)
    F(x̂_n) = 0 if D ∧ N(γ) < n ≤ N(γ)    (6.6b)

Here D ∧ N(γ) is the lesser of D and N(γ). The x̂_n and ŷ_n in Eq. (6.6a) and Eq. (6.6b) will usually differ from those in Eq. (6.3a) and Eq. (6.3b). We now construct from Ξ an invertible truncation Ω as follows: for any α and β with 0 < α < β, we choose γ so that

    β = (α² + γ²)^{1/2}    (6.7)

We compute the singular value decomposition Eq. (6.6a) and Eq. (6.6b) of the analytically constructed space Ξ(γ). If α > λ_n for all n, we let Ω(α,β) = {0}. Otherwise, we let Ω(α,β) be the subspace of Ξ(γ) spanned by those x̂_n in Eq. (6.6a) and Eq. (6.6b) for which λ_n ≥ α. To prove that the Ω so defined is an invertible truncation, we appeal to Eqs. (6.4a)-(6.4e). Those equations apply to the present situation if in Eq. (6.4a) we replace ∞ by N(γ) and in Eqs. (6.4b)-(6.4e) we replace D by D ∧ N(γ). Therefore Eq. (6.1b) follows from Eq. (6.4d). It remains to prove Eq. (6.1c). To that end, let us suppose x ∈ Ω(α,β)^⊥. Then we can write x = x_γ + x_ξ, where x_γ ∈ Ω(α,β)^⊥ ∩ Ξ(γ) and x_ξ ∈ Ξ(γ)^⊥. It follows from Eq. (6.4e) that ‖F(x_γ)‖ ≤ α‖x_γ‖ and from Eq. (6.5) that ‖F(x_ξ)‖ ≤ γ‖x_ξ‖. However, F(x) = F(x_γ) + F(x_ξ), so, by the triangle inequality, ‖F(x)‖ ≤ ‖F(x_γ)‖ + ‖F(x_ξ)‖. Therefore

    ‖F(x)‖ ≤ α‖x_γ‖ + γ‖x_ξ‖ if x = x_γ + x_ξ    (6.8a)

As x_γ · x_ξ = 0, we have

    ‖x‖² = ‖x_γ‖² + ‖x_ξ‖²    (6.8b)

Under the constraint Eq. (6.8b), the right side of Eq. (6.8a) has maximum value ‖x‖(α² + γ²)^{1/2}. Then an appeal to Eq. (6.7) proves Eq. (6.1c).

It remains to show how to obtain trimmings from an invertible truncation. Let us suppose that Ω is an invertible truncation for the linear inverse problem (X,F,G). For every nonnegative real θ and every real α, β satisfying 0 < α < β, we define

    X_T = Ω(α,β),  P_T = Π_{X_T} : X → X_T,
    F_T = θ²(F|X_T)^{−T} + (F|X_T),  Q_T = Π_{Y_T} : Y → Y_T,
    G_T = G|X_T    (6.9)

where Y_T = F(X_T) and Π_{X_T} and Π_{Y_T} are the orthogonal projectors of X onto X_T and of Y onto Y_T. We claim that (X_T,P_T,F_T,Q_T,G_T) is a trimming of (X,F,G). To show this we must show that F_T exists, that P_T, Q_T, F_T and G_T are continuous, and that Eq. (3.2a) is true. One consequence of Eq. (6.1b) is that (F|X_T)⁻¹({0}) = {0}, which proves the existence of the inverse function (F|X_T)⁻¹ : Y_T → X_T (Halmos, 1958). Hence (F|X_T)^{−T} : X_T → Y_T does exist, and so does the F_T : X_T → Y_T defined in Eq. (6.9). Then, as noted in Section 4, the inverse function (F_T)⁻¹ : Y_T → X_T also exists. That function and F_T and G_T are continuous because they are linear and their domains are finite dimensional. The functions P_T and Q_T are continuous because they are bounded; indeed, ‖P_T‖ = ‖Q_T‖ = 1. Finally, Eq. (3.2a) is true simply because Q_T is an orthogonal projector.

In the geomagnetic inverse problem, and perhaps in others, it happens that δ_R y = δ_R y₁ + δ_R y₂, where δ_R y₁ and δ_R y₂ are independent random errors, and the variance matrix V₁ of δ_R y₁ is easy to invert whereas the inverse of V, the full variance matrix of δ_R y, is obtainable only through heavy computation. In this case it is useful to construct a simple truncation Ξ₁ based on V₁. Then Eq. (4.9) assures us that Ξ₁ is also a simple truncation for the inverse problem whose error variance matrix is V. Thus Ξ₁ provides a finite-dimensional space on which to carry out numerically the singular value decomposition Eq. (6.6a) and Eq. (6.6b) required to find an invertible truncation for the inverse problem in which the error variance matrix is V.
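A finite-dimensional sketch of this construction: a random matrix stands in for F|Ξ(γ), and α is set, arbitrarily for illustration, to the median singular value. The kept singular directions span Ω(α,β), and the defining inequalities Eqs. (6.1b) and (6.1c) can be verified directly:

```python
# Build an invertible truncation Omega(alpha, beta) from the SVD of a
# finite-dimensional stand-in for F, as in Section 6. Illustrative only.
import numpy as np

rng = np.random.default_rng(3)
N, D = 8, 5                        # dim of the stand-in for Xi(gamma), dim Y
F = rng.standard_normal((D, N))    # stand-in for F restricted to Xi(gamma)

U, s, Vt = np.linalg.svd(F)        # rows of Vt are the x-hat_n of Eq. (6.6a)
alpha = np.median(s)               # arbitrary truncation level for this sketch
beta = s.max()                     # any beta >= alpha serves for Eq. (6.1c)
keep = s >= alpha                  # directions spanning Omega(alpha, beta)
B = Vt[: len(s)][keep]             # orthonormal basis of Omega (rows)

for _ in range(100):
    x = B.T @ rng.standard_normal(int(keep.sum()))   # x in Omega(alpha, beta)
    assert np.linalg.norm(F @ x) >= alpha * np.linalg.norm(x) - 1e-9  # (6.1b)

    d = rng.standard_normal(N)
    xperp = d - B.T @ (B @ d)                        # x in Omega(alpha, beta)^perp
    assert np.linalg.norm(F @ xperp) <= beta * np.linalg.norm(xperp) + 1e-9  # (6.1c)
```

In the paper's setting the analytic simple truncation Ξ supplies the finite space on which this SVD is computed; the sketch only exercises the inequalities.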

7. Explicit computation of prediction errors in linear modeling with a prior quadratic bound

This section is devoted to a common type of inverse problem that we call a 'quadratic-linear modeling problem' or QLMP. In a QLMP the prediction errors can be computed explicitly so as to illuminate the structure of the inversion, and the use of those errors by frequentists and Bayesians can be explored in some detail. In particular, a QLMP offers a natural way to choose regularization parameters. An inverse problem (X,F,G) is a QLMP if it satisfies five conditions: (1) it is linear; (2) the prior information about the correct Earth model x_E is a quadratic bound Eq. (4.1) that has been used to define a dot product Eq. (4.2) on the model space X; (3) X has been completed in the metric defined by this dot product; (4) the prediction space Z is a finite dimensional subspace of X; (5) the prediction function G is Π_Z : X → Z, the orthogonal projector of X onto Z. We start the inversion of a QLMP by furnishing the data space Y with the dot product Eq. (4.7), so that the variance tensor of the random error δ_R y in the data vector is the identity tensor on Y.

Evidently, in a QLMP the goal is to use the data to estimate the orthogonal projection of x_E onto the given finite dimensional subspace Z of X, i.e. to 'model' x_E as a member of Z. Of course it is desirable to choose the subspace Z to be apropos; that is, Π_Z(x_E) should explain most of the data and include no parameters whose effects on the data are undetectable. One way to find such apropos Zs is to construct a simple truncation Ξ of (X,F). As noted in Section 6, this usually requires theoretical rather than numerical analysis. We shall suppose that we have found Ξ. We choose a γ > 0 with the understanding that we may have to decrease it later in order for Eq. (6.5) to provide usefully small bounds on the prediction errors. Another way to write Eq. (6.5) is

    ‖F ∘ Π_{Ξ(γ)^⊥}‖ ≤ γ    (7.1)

From Ξ we proceed as in Section 6 to construct candidates eligible to be the prediction space Z. Let N = dim Ξ(γ), D = dim Y, N ∧ D = min(N,D) and N ∨ D = max(N,D). Computing the singular value decomposition of F|Ξ(γ) : Ξ(γ) → Y provides an orthonormal basis {x̂₁, …, x̂_N} for Ξ(γ), an orthonormal subset {ŷ₁, …, ŷ_{N∧D}} of Y and a sequence of real numbers λ₁ ≥ λ₂ ≥ … ≥ λ_N ≥ 0 such that we have Eq. (6.6a) and Eq. (6.6b). Now we choose an integer M such that

    1 ≤ M ≤ N ∧ D    (7.2a)

and

    λ_M > 0    (7.2b)

and we define

    Z = span{x̂₁, …, x̂_M}    (7.2c)

and

    G = Π_Z : X → Z    (7.2d)

Then (X,F,G) is a QLMP. To make it apropos we need only choose γ small enough to be interesting, and M large enough to include the model parameters we want to estimate. Because our prediction function G vanishes on Z^⊥ there is no need to vary M so as to minimize the prediction error. For non-QLMPs such minimizing over M is necessary, and has been discussed by Backus (1989).

To invert this QLMP we introduce a trimming that generates a weighted, regularized least-squares fit to the data. We shall use θ² to denote the regularization parameter. The trimming, (X_T,P_T,F_T,Q_T,G_T), is defined as follows (recall that Y_T = F_T(X_T)):

    X_T = Z = span{x̂₁, …, x̂_M}    (7.3a)
    P_T = Π_{X_T} : X → X_T    (7.3b)
    F_T = θ²(F|X_T)^{−T} + (F|X_T)    (7.3c)
    Q_T = Π_{Y_T} : Y → Y_T    (7.3d)
    G_T = I_{X_T}    (7.3e)

In the notation of Section 3, Eqs. (7.2a)-(7.2d) and Eqs. (7.3a)-(7.3e) imply the following:

    g_T = 0    (7.4a)
    D H_T = H_T    (7.4b)
    Δ_T z = (F_T⁻¹ ∘ Q_T ∘ F − P_T)(x_E)    (7.4c)

It is convenient to define

    Δ_T^⊥ z = (F_T⁻¹ ∘ Q_T ∘ F − P_T) ∘ Π_{Ξ(γ)^⊥}(x_E)    (7.5a)
    Δ_T^‖ z = (F_T⁻¹ ∘ Q_T ∘ F − P_T) ∘ Π_{Ξ(γ)}(x_E)    (7.5b)

As I_X = Π_{Ξ(γ)} + Π_{Ξ(γ)^⊥}, it follows that

    Δ_T z = Δ_T^⊥ z + Δ_T^‖ z    (7.5c)

Our information about A¢ z comes only from Eq. (7.5b) and the prior bound Eq. (4.4), i.e, [Ixfll _< 1. As X r c ~ ( y ) and Pr = IIxT, therefore PrOFI=(v)~ = 0. Also, //_=(v)~ O//~(v)~ =//=-(v)'- ' so Eq. (7.5a) can be written

A ~ z = ( F r ' O Q r O r O l l = ( ~ ) ~ ( II_-,~)1( xe) ) (7.6a) Condition Eq. (4.4) is equivalent to the inclusion

x e ~ By,, where BvT is the solid unit ball in Yr. As

117

G.E. Backus / Physics of the Earth and Planetary Interiors 98 (1996) 101-142

all functions in Eq. (7.6a) are linear, that equation implies

Δ⊥_T z ∈ ||Q_T ∘ F ∘ Π⊥_{X(γ)}|| ||Π⊥_{X(γ)}(x_E)|| F_T^{-1}(B_{Y_T})

where B_{Y_T} is the solid unit ball in Y_T. We abbreviate by introducing

ζ ≡ ||Π⊥_{X(γ)}(x_E)||   (7.6b)

From Eq. (7.3d), ||Q_T|| ≤ 1, so ||Q_T ∘ F ∘ Π⊥_{X(γ)}|| ≤ ||Q_T|| ||F ∘ Π⊥_{X(γ)}|| ≤ γ because of Eq. (7.1). Thus, finally,

Δ⊥_T z ∈ ζ E′_⊥   (7.6c)

where

E′_⊥ ≡ γ F_T^{-1}(B_{Y_T})   (7.6d)

According to Appendix D the confinement set E′_⊥ is a solid ellipsoid centered on 0 in X_T. To compute it explicitly, we note that Eqs. (7.2a)-(7.2d) entail the following chain of implications:

(F|X_T)(x̂_n) = λ_n ŷ_n,   1 ≤ n ≤ M   (7.7a)

(F|X_T)^T(ŷ_n) = λ_n x̂_n,   1 ≤ n ≤ M   (7.7b)

F_T(x̂_n) = (θ² + λ_n²) λ_n^{-1} ŷ_n,   1 ≤ n ≤ M   (7.7c)

F_T^{-1}(ŷ_n) = λ_n (θ² + λ_n²)^{-1} x̂_n,   1 ≤ n ≤ M   (7.7d)

From Eq. (7.7d) it is clear that the confinement ellipsoid E′_⊥ specified by Eq. (7.6d) has as its principal semi-axes the M vectors

ê⊥_n ≡ γ λ_n (θ² + λ_n²)^{-1} x̂_n,   1 ≤ n ≤ M   (7.8)

To find a confinement set for Δ∥_T z, we begin by noting that

Π_{X(γ)}(x_E) = Σ_{n=1}^{N} ξ_n x̂_n   (7.9a)

for unique scalars ξ_n. Then by Eqs. (7.2a)-(7.2d)

F ∘ Π_{X(γ)}(x_E) = Σ_{n=1}^{N∧D} λ_n ξ_n ŷ_n   (7.9b)

Moreover, Y_T = F_T(X_T) = F_T(span{x̂_1, …, x̂_M}) = span{ŷ_1, …, ŷ_M}, and Q_T = Π_{Y_T}. Therefore Q_T(ŷ_n) = 0 if M < n ≤ D and, from Eq. (7.9b),

Q_T ∘ F ∘ Π_{X(γ)}(x_E) = Σ_{n=1}^{M} λ_n ξ_n ŷ_n   (7.9c)

Then Eq. (7.7d) implies

F_T^{-1} ∘ Q_T ∘ F ∘ Π_{X(γ)}(x_E) = Σ_{n=1}^{M} λ_n² (θ² + λ_n²)^{-1} ξ_n x̂_n   (7.9d)

Moreover, {x̂_{M+1}, …, x̂_N} ⊂ X_T^⊥, so applying P_T to Eq. (7.9a) gives

P_T ∘ Π_{X(γ)}(x_E) = Σ_{n=1}^{M} ξ_n x̂_n   (7.9e)

Finally, from Eq. (7.5b), Eq. (7.9d) and Eq. (7.9e), we infer that

Δ∥_T z = −Σ_{n=1}^{M} θ² (θ² + λ_n²)^{-1} ξ_n x̂_n   (7.9f)

If we choose not to regularize the inversion, then θ² = 0 and Δ∥_T z = 0. If θ² > 0 then Eq. (7.9f) enables us to calculate Δ∥_T z from the prediction vector z_E = (ξ_1, …, ξ_M) that we are trying to estimate. Thus the error in Δ∥_T z is really determined by the errors in (ξ_1, …, ξ_M) rather than by the bound Eq. (4.4). This problem can be dealt with by iteration, or by moving Δ∥_T z to the left side of Eq. (3.7a) and solving for z_E. The present section will show that in realistic inversions neither maneuver appreciably improves the estimate of the prediction error obtained from Eq. (7.9f) by appealing only to Eq. (4.4). Therefore we simply observe that Eq. (4.4) and Eq. (7.6b) imply

ζ² + Σ_{n=1}^{N} ξ_n² ≤ 1   (7.10)

This inequality and Eq. (7.9f) enable us to assert that

Δ∥_T z ∈ E′_∥   (7.11a)

where the confinement set E′_∥ is a solid ellipsoid centered on 0 in X_T whose principal semi-axes are the M vectors

ê∥_n ≡ θ² (θ² + λ_n²)^{-1} (1 − ζ²)^{1/2} x̂_n,   1 ≤ n ≤ M   (7.11b)


From Eq. (7.5c), Eq. (7.6c) and Eq. (7.11a) we infer that

Δ_T z ∈ ζ E′_⊥ + E′_∥   (7.12)

The containment set ζ E′_⊥ + E′_∥ need not be an ellipsoid. For example, in three dimensions if A is a very long thin ellipsoid and B is a sphere with diameter much less than the length of A but much greater than the thickness of A, then A + B is a sausage. The result Eq. (7.12) contains the parameter ζ, about which we know only that

0 ≤ ζ ≤ 1   (7.13a)

Thus we can conclude only that Eq. (7.12) is true for some ζ satisfying Eq. (7.13a). It follows that

Δ_T z ∈ ∪_ζ (ζ E′_⊥ + E′_∥)   (7.13b)

where the union is over all ζ satisfying Eq. (7.13a). As ζ E′_⊥ ⊆ E′_⊥ for every such ζ, and the ellipsoid Eq. (7.11b) is largest when ζ = 0, Eq. (7.13b) also implies

Δ_T z ∈ E′_⊥ + E′_∥   (7.13c)

where E′_∥ is now understood with ζ = 0. The result Eq. (7.13b) is harder to use than Eq. (7.13c), but it is the sharper result and so is preferable when the union can be computed conveniently.

Returning to the prediction error Eq. (3.16a), we must still deal with the random error δ_R z and the systematic error δ_S z. For simplicity we assume that δ_S y = 0, so δ_S z = 0. The mean vector and variance tensor of δ_R y are E[δ_R y] = 0 and E[δ_R y δ_R y] = Σ_{n=1}^{D} ŷ_n ŷ_n, the identity tensor on Y. As Q_T = Π_{Y_T}, the mean vector and variance tensor of Q_T(δ_R y) are 0 and Σ_{n=1}^{M} ŷ_n ŷ_n. Setting δH_T = H_T in Eq. (3.16b) gives the mean vector and variance tensor of δ_R z = H_T(δ_R y):

E[δ_R z] = 0   (7.14a)

and

E[δ_R z δ_R z] = Σ_{n=1}^{M} H_T(ŷ_n) H_T(ŷ_n)

In our case, H_T = F_T^{-1} ∘ Q_T, so Eq. (7.7d) yields

E[δ_R z δ_R z] = Σ_{n=1}^{M} λ_n² (θ² + λ_n²)^{-2} x̂_n x̂_n   (7.14b)

If the probability measure η_R for δ_R y is gaussian, so is the probability measure π_R for δ_R z, and then Eq. (7.14b) determines π_R completely. Otherwise we must compute π_R from the fact that it is η_R ∘ H_T^{-1}.

At this point the trimming of the QLMP is finished. The prediction error δz is expressed as the sum of a random error δ_R z with known probability measure π_R in Z and two systematic trimming errors, Δ⊥_T z and Δ∥_T z, with known confinement ellipsoids in Z. To complete the inversion, it remains only to see how a frequentist and a Bayesian might use the results of the trimming. In discussing this question, we consider the simplest case: both inverters want to estimate only a single model coefficient, z_n = ξ_n, and to estimate its error δz_n. To motivate the decisions we are about to ascribe to frequentists and Bayesians, it will be useful to recall that, at least in geomagnetism, convincing prior quadratic bounds are likely to be very conservative. That is, in Eq. (4.1) Q(x_E, x_E) << q, so that in Eq. (4.4)

||x_E|| << 1   (7.15a)

In geomagnetism, this inequality certainly holds for the virial energy bound and very probably holds for the heat-flow bound. If x̂_n is one of the basis vectors in Eqs. (6.3a) and (6.3b), then Eq. (7.15a) implies

|z_n| = |x̂_n · x_E| << 1   (7.15b)

We present first the frequentist's and then the Bayesian's use of our trimming of the QLMP. A frequentist will combine the hard bounds on Δ⊥_T z_n and Δ∥_T z_n to conclude from Eq. (7.5c), Eq. (7.8) and Eq. (7.11b) that

|Δ_T z_n| ≤ [ζ λ_n γ + (1 − ζ²)^{1/2} θ²](θ² + λ_n²)^{-1}   (7.16a)

All that is known about ζ in this inequality is Eq. (7.13a), but ζ can be eliminated from Eq. (7.16a) by appealing to the Schwarz inequality to obtain

|Δ_T z_n| ≤ (λ_n² γ² + θ⁴)^{1/2} (θ² + λ_n²)^{-1}   (7.16b)

Using Eq. (7.13c) instead of Eq. (7.13b) would be another way to eliminate ζ, but the result would be weaker than Eq. (7.16b). The frequentist will want to construct a confidence interval for δz_n, and will begin with δ_R y_n. We let κ be any positive real number and p(κ) be the

probability that −κ ≤ δ_R y_n ≤ κ. Then from Eq. (7.14a) and Eq. (7.14b),

|δ_R z_n| ≤ κ λ_n (θ² + λ_n²)^{-1}   (7.17)

with probability p(κ). If the probability distribution of δ_R y is gaussian then p(κ) = erf(κ/2^{1/2}) and, for example, p(3) = 0.997. Combining Eq. (7.16b) with Eq. (7.17) and appealing to the statistical ellipsis proposed in Appendix C allows us to conclude from Eq. (3.16a) that at confidence level p(κ)

|δz_n| ≤ f_n(θ²)   (7.18a)

where

f_n(θ²) ≡ [(λ_n² γ² + θ⁴)^{1/2} + κ λ_n](λ_n² + θ²)^{-1}   (7.18b)

Conclusion Eqs. (7.18a), (7.18b) is valid for every regularization parameter θ². Thus far θ² is unspecified and, indeed, how to choose θ² continues to be a subject of interest to inverse theorists. Clearly, Eqs. (7.18a) and (7.18b) suggest that the frequentist choose θ² to minimize f_n(θ²). Some definitions will simplify the calculation: we let

A = κ/λ_n,  B = γ/λ_n,  x = θ²/λ_n²   (7.19a)

g_n(x) ≡ (1 + x)² ∂_x f_n(x)   (7.19b)

Then

f_n(x) = [A + (x² + B²)^{1/2}](1 + x)^{-1}   (7.20a)

g_n(x) = (x − B²)(x² + B²)^{-1/2} − A   (7.20b)

∂_x g_n(x) = B²(1 + x)(x² + B²)^{-3/2}   (7.20c)

From Eq. (7.19b) and Eq. (7.20b), ∂_x f_n(0) = −A − B < 0, so a little regularization is always better than none. To find the optimal value of the regularization parameter θ² we begin by noting that ∂_x g_n > 0 for all x, so g_n(x) increases steadily with x. However, g_n(∞) = 1 − A, so if A ≥ 1 then, for all positive x, g_n(x) < 0 and hence ∂_x f_n(x) ≤ 0. Then min f_n = f_n(∞) = 1. However, x = ∞ corresponds to θ² = ∞. When the regularization parameter is infinite, the estimate Eq. (3.7b) for z_n is zero and its error bound is unity. The error bound comes entirely from the prior information Eq. (4.4) and makes no use of the data. To hold the frequentist's interest, henceforth we assume that A < 1. Then g_n(0) = −A − B < 0 and g_n(∞) = 1 − A > 0, so g_n(x) increases steadily from a negative value at x = 0 to a positive value at x = ∞. Thus g_n(x), and therefore ∂_x f_n(x), have exactly one positive zero, and it is the unique positive x that minimizes f_n(x). Let us call it x_n(A,B). From Eq. (7.20b) it is

x_n(A,B) = B(B + AR)(1 − A²)^{-1}   (7.21a)

where

R = (1 − A² + B²)^{1/2}   (7.21b)

The minimum value of f_n, achieved at x_n(A,B), is

min f_n = f_n(x_n(A,B)) = (B + AR)(AB + R)^{-1}   (7.21c)

Next we ask, by what fraction of itself is the error bound decreased if we increase x or θ² from zero to its optimal value? The answer is

φ_n(A,B) ≡ [f_n(0) − f_n(x_n(A,B))]/f_n(0) = B(B + AR)/[(R + 1)(AB + R)]   (7.22)

where R is given by Eq. (7.21b). Finally, we ask what is the range of x or θ² in which regularization improves accuracy? We note that, as x increases, f_n(x) decreases from f_n(0) = A + B to the minimum Eq. (7.21c) and then increases to f_n(∞) = 1. Thus, if A + B ≥ 1, any regularization is better than none. If A + B < 1, then a regularization is better than none if its regularization parameter θ² corresponds to an x for which

0 < x < x̃_n(A,B)   (7.23a)

where x̃_n(A,B) is the positive solution x̃ of

f_n(x̃) = f_n(0)   (7.23b)

It is easy to verify that

x̃_n(A,B) = 2B(A + B)/[1 − (A + B)²]   (7.23c)
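The closed forms Eqs. (7.21a), (7.21c) and (7.23c) are easy to confirm numerically. In this sketch the values of A and B are illustrative only:

```python
import numpy as np

# Check of the frequentist bound f_n(x) = [A + sqrt(x^2 + B^2)]/(1 + x),
# Eq. (7.20a), against the closed-form minimizer and break-even point.
A, B = 0.2, 0.3                                    # illustrative values
R = np.sqrt(1.0 - A**2 + B**2)                     # Eq. (7.21b)

f = lambda x: (A + np.sqrt(x**2 + B**2)) / (1.0 + x)

x_opt = B * (B + A * R) / (1.0 - A**2)             # Eq. (7.21a)
f_min = (B + A * R) / (A * B + R)                  # Eq. (7.21c)

xs = np.linspace(0.0, 5.0, 200001)
assert abs(xs[np.argmin(f(xs))] - x_opt) < 1e-3    # grid minimizer matches
assert abs(f(x_opt) - f_min) < 1e-12               # minimum matches (7.21c)

# Eq. (7.23c): regularization helps, f(x) < f(0), exactly for 0 < x < x_tilde.
x_tilde = 2 * B * (A + B) / (1.0 - (A + B)**2)
assert abs(f(x_tilde) - f(0.0)) < 1e-12
print(x_opt, f_min, x_tilde)
```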


Most of the foregoing discussion can be simplified. The reason is that Eq. (7.15b) makes the error bound Eq. (7.18a) uninteresting unless f_n << 1. From Eq. (7.21c) this means

ε << 1   (7.24a)

where

ε ≡ (B + AR)(AB + R)^{-1}   (7.24b)

Eq. (7.24b) is equivalent to

B(1 − Aε) = R(ε − A)   (7.24c)

As A < 1 and ε << 1, Eq. (7.24c) implies

A < ε   (7.25a)

Squaring Eq. (7.24c) and simplifying gives

B²(1 − ε²) = (ε − A)²   (7.25b)

whence

B ≤ ε(1 − ε²)^{-1/2}   (7.25c)

Thus the results Eqs. (7.18a)-(7.23c) will generally be interesting only if

A << 1,  B << 1   (7.26)

When Eq. (7.26) holds, to lowest order in A and B, R = 1 and

x_n(A,B) = B(A + B)   (7.27a)

min f_n = A + B   (7.27b)

φ_n(A,B) = B(A + B)/2   (7.27c)

x̃_n(A,B) = 2B(A + B)   (7.27d)

According to Eq. (7.26) and Eq. (7.27c), optimal regularization achieves only a very minor improvement in accuracy compared with no regularization at all. Indeed, min f_n = f_n(0) to first order in A and B. Perhaps the most useful results of Eqs. (7.27a)-(7.27d) are this fact and Eq. (7.27d). The latter says that we lose no accuracy by regularizing as long as 0 < x < 2B(A + B). In terms of the regularization parameter θ², this condition is

0 < θ² < 2γ(κ + γ)   (7.28a)

As long as the regularizing is no stronger than Eq. (7.28a), to a very good approximation the error bound is

|δz_n| ≤ (γ + κ)λ_n^{-1}   (7.28b)

at confidence level p(κ). Conclusion Eq. (7.28b) can be obtained simply by setting θ² = 0 in Eq. (7.18b). Therefore, to first order in A and B, Δ∥_T z contributes negligibly to δz_n, so the frequentist's acceptance of the crude estimate Eq. (7.11a) loses no appreciable accuracy. We conclude that in realistic inversions, where the error is to be at least as small as the estimate of z_n, the frequentist can achieve accuracy Eq. (7.28b) but no better. This accuracy can be attained with or without regularization, as long as the regularization parameter does not exceed Eq. (7.28a).

Now we turn to the Bayesian, who will convert the hard bounds to personal probability measures and combine them according to what he or she believes about independence of the various errors. Because of Eq. (7.14b), the Bayesian will adopt for the variance of δ_R z_n the value

V(δ_R z_n) = λ_n²(θ² + λ_n²)^{-2}   (7.29a)

Because of Eq. (7.16b), he or she will adopt for the variance of Δ_T z_n the value

V(Δ_T z_n) = (λ_n² γ² + θ⁴)(θ² + λ_n²)^{-2}   (7.29b)

(The Bayesian could also arrive at Eq. (7.29b) by regarding Δ⊥_T z_n and Δ∥_T z_n as independent and using Eq. (7.8) and Eq. (7.11b) to obtain their variances.) The Bayesian will surely hold that δ_R z_n and Δ_T z_n are independent. This will yield for their sum, δz_n, the variance

V(δz_n) = (θ⁴ + λ_n² γ² + λ_n²)(θ² + λ_n²)^{-2}   (7.29c)

We let x be as in Eq. (7.19a) and define

C = (γ² + 1)λ_n^{-2}   (7.30a)
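In the variables just defined, the variance Eq. (7.29c) reduces to V_n(x) = (x² + C)(x + 1)^{-2}, the curve minimized in what follows. A quick numerical check, with C an illustrative value:

```python
import numpy as np

# Check of the Bayesian variance curve V_n(x) = (x^2 + C)/(x + 1)^2:
# minimized at x = C with minimum C/(1+C), and V_n(x) <= V_n(0) exactly
# for 0 <= x <= 2C/(1 - C). C is illustrative, not from the paper.
C = 0.1
V = lambda x: (x**2 + C) / (x + 1.0)**2

xs = np.linspace(0.0, 2.0, 200001)
assert abs(xs[np.argmin(V(xs))] - C) < 1e-3        # minimizer x = C
assert abs(V(C) - C / (1.0 + C)) < 1e-12           # minimum C/(1+C)

x_even = 2 * C / (1.0 - C)                          # break-even point
assert abs(V(x_even) - V(0.0)) < 1e-12
print(V(C), x_even)
```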


Then the variance Eq. (7.29c) is

V_n(x) = (x² + C)(x + 1)^{-2}   (7.30b)

and

∂_x V_n(x) = 2(x − C)(x + 1)^{-3}   (7.30c)

Like the frequentist, the Bayesian can choose x or the regularization parameter θ² so as to minimize the error represented by Eq. (7.30b). Evidently the result is

x = C   (7.31a)

The minimum achievable variance is then

min V_n = C(1 + C)^{-1}   (7.31b)

Compared with no regularization, optimal regularization decreases the error by the relative amount

φ_n(C) = [V_n(0) − V_n(C)]/V_n(0) = C(C + 1)^{-1}   (7.31c)

Regularization does not increase the error as long as the regularization parameter satisfies

0 < x < 2C(1 − C)^{-1}   (7.31d)

If the Bayesian, like the frequentist, wants an error at least as small as the quantity to be estimated, then Eq. (7.15b) and Eq. (7.31b) lead him or her to be interested only in the case

C << 1   (7.32)

In that case, the foregoing formulae can be simplified, and the end result, correct to lowest order in C, gives the smallest achievable error as

V(δz_n) ≈ (γ² + 1)λ_n^{-2}   (7.33a)

This error is obtainable without regularization or with a regularization whose parameter θ² satisfies

0 < θ² < 2(γ² + 1)   (7.33b)

The Bayesian arrives at error bounds quantitatively close to those of the frequentist, but interprets them differently. Like the frequentist, the Bayesian loses no appreciable accuracy by accepting the crude confinement set Eq. (7.11a).

The foregoing discussions lead to a useful conclusion about 'guard parameters'. We can decrease the error Eq. (7.27b) by decreasing B, and the error Eq. (7.31b) by decreasing C. This can be accomplished by decreasing γ (see Eq. (7.19a) and Eq. (7.30a)). Usually, decreasing γ requires increasing the dimension of X(γ) in Eq. (7.1). We shall see this explicitly in the discussion of geomagnetic inversion in Section 9. Thus, in order to extract from the satellite magnetic data Gauss coefficients up to degree 15, we obtain better accuracy by fitting the data with models whose cutoff degree is considerably higher (J. Cain, personal communication, 1995).

8. Data compression in an idealized model for data errors in core geomagnetism

To illustrate our inversion scheme we consider an idealized form of the geomagnetic problem mentioned in Section 1. We try to infer some of the Gauss coefficients of the core field from field components measured at satellite altitudes. We will model the measurement errors, the crustal signal and the magnetospheric signal as uncorrelated random variables, and with one minor exception we will ignore the magnetospheric signal. Jackson (1990) has pointed out that if the crustal magnetization is a random field, even with correlation length zero, the geometry of the Laplace equation produces a significant correlation length in the crustal signal at satellite altitudes. This makes the error variance matrix Eq. (2.4) non-sparse and makes its inverse inaccessible except to numerical computation. Thus the dot product on the data space Y is not analytically available. To see in detail how trimming works, we would like to have an analytic test case that can be followed without extensive computer calculations. Therefore, we will idealize the MAGSAT data enough to make them amenable to analytic treatment but not so much that they lose all relevance to MAGSAT. Specifically, we will use data compression to achieve a diagonal error variance matrix Eq. (5.4) based on the correlations pointed out by Jackson. The idealization of the data begins with the assumption that D/3 is an integer and that the D data y^i are the Cartesian components of the vector magnetic field B measured at D/3 locations cr̂_1, …, cr̂_{D/3} on a single spherical surface S(c) of radius c concentric with the Earth. We follow Cain


et al. (1989) and take c = 6791 km as a reasonable estimate for MAGSAT. As we model the crustal and magnetospheric fields as random errors we write them δ_C B and δ_P B (P is for plasma). The measurement error is δ_M B, so the total measured field is

B = B_K + δ_M B + δ_C B + δ_P B   (8.1)

where B_K is the field produced by the core. Thus B_K is the Earth model x_E that we seek. The Cartesian components of the error fields will be written as δ_M y^i, δ_C y^i, δ_P y^i with 1 ≤ i ≤ D.

The second idealization is that the observation sites r̂_i are distributed almost uniformly over the unit sphere S(1), so that for a sufficiently smooth function f

(D/3)^{-1} Σ_{i=1}^{D/3} f(r̂_i) ≈ ⟨f⟩_{S(1)}   (8.2a)

where

⟨f⟩_{S(1)} = (4π)^{-1} ∫_{S(1)} dA(r̂) f(r̂)   (8.2b)

It is difficult to make this idealization precise, but roughly we ask that Eq. (8.2a) should be a good approximation if the horizontal length scale for variations in f is appreciably larger than the distance between adjacent points r̂_i.

The third idealization is that the D random errors δ_M y^i owing to the measuring instruments and satellite positioning and orientation are uncorrelated random variables with the same variance, σ_M². As Holme and Bloxham (1995) pointed out, this model oversimplifies the effects of errors in satellite orientation. From the measured properties of the magnetometers and navigation instruments, Langel et al. (1982) estimated that on MAGSAT σ_M = 6 nT.

The fourth idealization is based on a suggestion of Jackson (1990, Jackson, 1994). We assume that the crustal field is produced by a random magnetization M in the crust and that the statistics of the vector field M are invariant under all rigid rotations about the center of the Earth. We do not assume that at each r the statistics of M(r) are invariant under rotations about r.

As δ_P y^i, δ_C y^i and δ_M y^i are random errors, E[δ_P y^i] = E[δ_C y^i] = E[δ_M y^i] = 0. We define

V_P^{ij} = E[δ_P y^i δ_P y^j],  V_C^{ij} = E[δ_C y^i δ_C y^j],  V_M^{ij} = E[δ_M y^i δ_M y^j]   (8.3)

The total random error of measurement is

δ_R y^i = δ_P y^i + δ_C y^i + δ_M y^i   (8.4a)

so the error variance matrix Eq. (2.4) is

V^{ij} = V_P^{ij} + V_C^{ij} + V_M^{ij}   (8.4b)

Our assumption about δ_M y^i implies that

V_M^{ij} = σ_M² δ^{ij}   (8.5)

If we endow the data space Y with the dot product Eq. (4.7), the analysis leading to Eq. (4.9) tells us that ||y||² ≤ y_i (V_M^{-1})_{ij} y_j. Thus, by Eq. (8.5),

||y||² ≤ σ_M^{-2} Σ_{i=1}^{D} (y^i)²   (8.6a)

As the data are all the Cartesian components of B, Eq. (8.6a) is equivalent to

||y||² ≤ σ_M^{-2} Σ_{i=1}^{D/3} |B(c r̂_i)|²   (8.6b)

The rest of our discussion of the dot product on Y is an analysis of the statistics of the crustal 'error' δ_C y and the measurement error δ_M y. Henceforth we simply ignore the magnetospheric signal δ_P y, and take V_P^{ij} = 0. It should be noted, however, that Eq. (8.6b) remains true without this simplification.

To use our idealization of δ_C y we need some notation. For l = 0, 1, 2, … and −l ≤ m ≤ l, we let H_l^m(r) be a real harmonic polynomial homogeneous of degree l in the position vector r. We make no assumptions about the dependence of H_l^m on m except that the polynomials are orthogonal and Schmidt-normalized. That is,

⟨H̄_l^m H_{l'}^{m'}⟩_{S(1)} = δ_{ll'} δ_{mm'} (2l + 1)^{-1}   (8.7)

where the overbar denotes complex conjugation. (The notation is designed for complex spherical harmonics, even though ours are real.) Thus the H_l^m constitute a complete orthogonal (but not orthonormal) basis for the Hilbert space of scalar functions square-integrable on S(1). We define the following vector fields in all of three-space:

U_l^m(r) = −r^{2l+3} ∇[r^{-2l-1} H_l^m(r)] = [(l + 1)r − r∇₁] H_l^m(r)   (8.8a)

V_l^m(r) = ∇H_l^m(r) = r^{-2}[l r + r∇₁] H_l^m(r)   (8.8b)

W_l^m(r) = Λ H_l^m(r) = r̂ × ∇₁ H_l^m(r)   (8.8c)

Here ∇₁ = r∇ − r r̂ ∂_r, the angular part of ∇, and Λ = r × ∇. The fields Eqs. (8.8a)-(8.8c) are vector spherical harmonics and their restrictions to S(1) constitute a complete orthogonal basis for the Hilbert space of vector fields square-integrable on S(1) (see Backus (1986) for an exposition and references). It should be noted that V_0^0 = W_0^0 = 0. The orthogonality relations for these fields are

⟨U_l^m · V̄_{l'}^{m'}⟩_{S(1)} = ⟨U_l^m · W̄_{l'}^{m'}⟩_{S(1)} = ⟨V_l^m · W̄_{l'}^{m'}⟩_{S(1)} = 0   (8.9a)

⟨U_l^m · Ū_{l'}^{m'}⟩_{S(1)} = (l + 1) δ_{ll'} δ_{mm'}   (8.9b)

⟨V_l^m · V̄_{l'}^{m'}⟩_{S(1)} = l δ_{ll'} δ_{mm'}   (8.9c)

⟨W_l^m · W̄_{l'}^{m'}⟩_{S(1)} = l(l + 1)(2l + 1)^{-1} δ_{ll'} δ_{mm'}   (8.9d)

If β(cr̂) is any square-integrable vector field on S(c), there are unique coefficients u_l^m(c), v_l^m(c), w_l^m(c) such that

β(cr̂) = Σ_{l=0}^{∞} Σ_{m=-l}^{l} [u_l^m(c) U_l^m(r̂) + v_l^m(c) V_l^m(r̂) + w_l^m(c) W_l^m(r̂)]   (8.10)

where, from Eqs. (8.9a)-(8.9d),

(l + 1) u_l^m(c) = ⟨Ū_l^m(r̂) · β(cr̂)⟩_{S(1)}   (8.11a)

l v_l^m(c) = ⟨V̄_l^m(r̂) · β(cr̂)⟩_{S(1)}   (8.11b)

l(l + 1)(2l + 1)^{-1} w_l^m(c) = ⟨W̄_l^m(r̂) · β(cr̂)⟩_{S(1)}   (8.11c)

It should be noted that the case l = 0 in Eqs. (8.11b) and (8.11c) causes no trouble because V_0^0 = W_0^0 = 0, so those terms are absent from Eq. (8.10).

If β is a magnetic field, then u_0^0 = 0, and the terms involving U_l^m describe the part of β produced by electric currents entirely inside S(c), whereas the terms involving V_l^m describe the part of β produced by electric currents entirely outside S(c), and the terms involving W_l^m describe the part of β produced by electric currents crossing S(c) (Backus, 1986). Here 'electric current' means the sum of the conduction current and the curl of the magnetization. If the β in Eq. (8.10) is the magnetic field δ_C B produced on S(c) by the crust, we write its coefficients u_l^m, v_l^m and w_l^m as δ_C u_l^m, δ_C v_l^m and δ_C w_l^m. As the crust is inside S(c), therefore δ_C v_l^m(c) = δ_C w_l^m(c) = 0 for all l and m. If the statistics of the crustal magnetization M are invariant under rotation, then with some further simplifying assumptions Jackson (1990) has shown that for each l there is a coefficient J_C(c,l) that depends on the correlation length of M and for which

E[δ_C u_l^m(c) δ_C u_{l'}^{m'}(c)] = J_C(c,l) δ_{ll'} δ_{mm'}   (8.12)

With the help of Schur's lemma and the irreducibility of the spherical harmonic representations of the rotation group (Wigner, 1959), it can be shown that Eq. (8.12) follows from rotational invariance alone, without Jackson's further assumptions. We do not know enough about the crust to estimate J_C(c,l) from crustal measurements, but Jackson (1990) pointed out that J_C(c,l) can be estimated from the observed Lücke-Mauersberger-Lowes (LML) spectrum of the geomagnetic field. To describe this estimate we write the total geomagnetic field outside the Earth owing to sources inside it as

B = −∇φ   (8.13a)

where

φ(rr̂) = r_0 Σ_{l=1}^{∞} (r_0/r)^{l+1} Σ_{m=-l}^{l} g_l^m(r_0) H_l^m(r̂)   (8.13b)

Here r_0 can be any positive number, but the convergence of Eq. (8.13b) is assured only outside the source region, i.e. where r > b = 6371 km, the radius of the Earth. From Eq. (8.13a), Eq. (8.13b) and Eq. (8.8a), clearly

B(rr̂) = Σ_{l=1}^{∞} (r_0/r)^{l+2} Σ_{m=-l}^{l} g_l^m(r_0) U_l^m(r̂)   (8.13c)

We will call g_l^m(r_0) the Gauss coefficients of B on S(r_0). The Gauss coefficients on the two spheres S(r_0) and S(r_1) evidently are related by

r_0^{l+2} g_l^m(r_0) = r_1^{l+2} g_l^m(r_1)   (8.14)

The LML spectrum of B on S(r_0) is defined as

R(r_0, l) = (l + 1) Σ_{m=-l}^{l} |g_l^m(r_0)|²   (8.15a)

and clearly satisfies

r_0^{2l+4} R(r_0, l) = r_1^{2l+4} R(r_1, l)   (8.15b)

Cain et al. (1989) and Langel and Estes (1982) have estimated R(b,l) and R(c,l) from MAGSAT data. For 1 ≤ l ≤ 60, Cain et al. found a five-parameter expression that fits the data rather well:

R(c,l) = R_K(c,l) + R_C(c,l) + R_M(c,l)   (8.16a)

where

R_K(c,l) = 7.44 × 10⁸ (0.251)^l nT²   (8.16b)

R_C(c,l) = 14.8 (0.876)^l nT²   (8.16c)

R_M(c,l) = 0.091 nT²   (8.16d)

Langel and Estes (1982) and Cain et al. (1989) suggested that R_K and R_C are signals from the core and crust. Cain et al. suggested that R_M is measurement error. If we adopt their point of view, then

R_C(c,l) = (l + 1) Σ_{m=-l}^{l} |δ_C u_l^m(c)|²   (8.17)

If we regard the crustal field as random, we expect approximately

(2l + 1)^{-1} Σ_{m=-l}^{l} |δ_C u_l^m(c)|² = E[|δ_C u_l^m(c)|²]

where, because of statistical rotational invariance, the right side is independent of m. Hence, from Eq. (8.12),

R_C(c,l) = (l + 1)(2l + 1) J_C(c,l)   (8.18)

Cain et al. (1989) pointed out that their analysis of the MAGSAT data justifies Eq. (8.16c) only for 15 ≤ l ≤ 40; for l > 40 or 50 they believed that the data are mostly measurement errors. As an illustration of our trimming technique, we will use Eq. (8.16c) and Eq. (8.18) for all l, but it must be remembered that this is an extrapolation. There would be great interest in justifying such an extrapolation for l < 15, but at present we know of no way to do so.

The idealization of the errors described in the foregoing paragraphs does not by itself produce a diagonal error variance matrix Eq. (2.4). To achieve that goal we need one further step, a data compression K: Y → Ỹ that will make E[δ_C ỹ^i δ_C ỹ^j] diagonal without destroying the diagonality of E[δ_M ỹ^i δ_M ỹ^j]. This data compression amounts to using as data some conventional preliminary least-squares estimates of the spherical harmonic expansion coefficients of the total geomagnetic field. To define the compression, we let B be the total magnetic field Eq. (8.1). We choose an integer L in a way yet to be described, and for all l, m satisfying 1 ≤ l ≤ L and −l ≤ m ≤ l define

(l + 1) ũ_l^m = (D/3)^{-1} Σ_{i=1}^{D/3} Ū_l^m(r̂_i) · B(cr̂_i)   (8.19a)

l ṽ_l^m = (D/3)^{-1} Σ_{i=1}^{D/3} V̄_l^m(r̂_i) · B(cr̂_i)   (8.19b)

l(l + 1)(2l + 1)^{-1} w̃_l^m = (D/3)^{-1} Σ_{i=1}^{D/3} W̄_l^m(r̂_i) · B(cr̂_i)   (8.19c)

The dimension of the compressed data space Ỹ is 3L(L + 2), the number of compressed data. The dimension of the original data space Y is D. If Eqs. (8.19a)-(8.19c) really represent a compression, it follows that L must be small enough to satisfy

3L(L + 2) < D   (8.19d)

We want to calculate the dot product on Ỹ, as defined in Eq. (5.4). This requires that we calculate the error variance matrix of the compressed data Eqs. (8.19a)-(8.19c). To that end, we substitute Eq. (8.1) in Eqs. (8.19a)-(8.19c) and denote by δ_C ũ_l^m, δ_C ṽ_l^m and δ_C w̃_l^m the ũ_l^m, ṽ_l^m and w̃_l^m obtained from Eqs. (8.19a)-(8.19c) when B is replaced by δ_C B. We define δ_M ũ_l^m, δ_M ṽ_l^m and δ_M w̃_l^m similarly in terms of δ_M B. Then δ_C ũ_l^m, δ_C ṽ_l^m and δ_C w̃_l^m are the random errors contributed to the compressed data by the crust, and δ_M ũ_l^m, δ_M ṽ_l^m and δ_M w̃_l^m are the random errors contributed by inaccuracies of measurement. All these compressed random errors have expected value zero, and the correlations between the crustal and measurement errors vanish. To calculate the compressed dot product Eq. (5.4) on Ỹ we must find the correlation matrix of the crustal signals and that of the measurement errors.

Let us consider first the measurement errors. From Eq. (8.5) it follows that

E[δ_M B(cr̂_i) δ_M B(cr̂_j)] = σ_M² I^{(3)} δ_{ij}   (8.20)

where I^{(3)} is the identity tensor in real three-space. Appealing to the fact that the errors are real numbers, we can infer from Eq. (8.19a) that

(l + 1)(l' + 1) E[δ_M ũ_l^m δ_M ũ_{l'}^{m'}] = (D/3)^{-2} Σ_{i,j=1}^{D/3} Ū_l^m(r̂_i) · E[δ_M B(cr̂_i) δ_M B(cr̂_j)] · U_{l'}^{m'}(r̂_j)   (8.21a)

Substituting Eq. (8.20) in this equation gives

(l + 1)(l' + 1) E[δ_M ũ_l^m δ_M ũ_{l'}^{m'}] = σ_M² (D/3)^{-2} Σ_{i=1}^{D/3} Ū_l^m(r̂_i) · U_{l'}^{m'}(r̂_i)   (8.21b)

Now we suppose that l and l' are small enough that the sum in Eq. (8.21b) can be approximated by an integral, as in Eq. (8.2a). Then from Eq. (8.9b) it follows that

E[δ_M ũ_l^m δ_M ũ_{l'}^{m'}] = α (l + 1)^{-1} δ_{ll'} δ_{mm'}   (8.22a)

where

α = σ_M² (D/3)^{-1}   (8.22b)

Similarly,

E[δ_M ṽ_l^m δ_M ṽ_{l'}^{m'}] = α l^{-1} δ_{ll'} δ_{mm'}   (8.22c)

E[δ_M w̃_l^m δ_M w̃_{l'}^{m'}] = α (2l + 1) l^{-1} (l + 1)^{-1} δ_{ll'} δ_{mm'}   (8.22d)

and

E[δ_M ũ_l^m δ_M ṽ_{l'}^{m'}] = E[δ_M ũ_l^m δ_M w̃_{l'}^{m'}] = E[δ_M ṽ_l^m δ_M w̃_{l'}^{m'}] = 0   (8.22e)

These results depend on our replacing the sum in Eq. (8.21b) by an integral. To justify that replacement for all l, l' ≤ L, we must limit the size of L. The Cartesian components of U_l^m are scalar spherical harmonics of degree l + 1, so the terms summed in Eq. (8.21b) involve spherical harmonics of degree at most l + l' + 2, which is no larger than 2L + 2. On S(1) a spherical harmonic of large degree l has horizontal wavelength approximately 4π(2l + 1)^{-1} (see Appendix C), so the shortest wavelength in the sum Eq. (8.21b) is 4π(4L + 5)^{-1}. A 'square' on S(1) whose side is this wavelength will have area A_L = π²(L + 5/4)^{-2}, and to cover S(1) with such squares will require 4π/A_L of them. If there is any hope of approximating the sum in Eq. (8.21b) by an integral, then each of these squares must contain at least two points r̂_i in each direction, or four points altogether. This requires at least (16/π)(L + 5/4)² points r̂_i. However, only D/3 such points are available. Therefore, the argument leading to Eqs. (8.22a)-(8.22e) requires

(L + 5/4)² ≤ πD/48   (8.23a)

It should be noted that Eq. (8.23a) always implies Eq. (8.19d). Cain et al. (1989) used a data set with D ≈ 50000 (J.C. Cain, personal communication). Then Eq. (8.23a) becomes

L ≤ 56   (8.23b)
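The arithmetic behind Eqs. (8.22b), (8.23a) and (8.23b) can be reproduced directly; D and σ_M are the values quoted in the text:

```python
import math

# Arithmetic for the MAGSAT-sized data set used in the text.
D = 50000                              # number of scalar data, roughly
sigma_M = 6.0                          # nT, Langel et al. (1982)

alpha = sigma_M**2 / (D / 3)           # Eq. (8.22b)
assert abs(alpha - 0.00216) < 1e-5     # the ~0.002 nT^2 quoted later

L_max = math.sqrt(math.pi * D / 48) - 5.0 / 4.0    # Eq. (8.23a)
assert 55.0 < L_max < 56.5             # hence the L <= 56 of Eq. (8.23b)

# Eq. (8.23a) indeed implies the compression condition Eq. (8.19d):
L = 56
assert 3 * L * (L + 2) < D
print(alpha, L_max)
```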


Next we consider the variance matrix of the compressed crustal errors. For them, Eq. (8.19a) becomes

(l + 1) δ_C ũ_l^m = (D/3)^{-1} Σ_{i=1}^{D/3} Ū_l^m(r̂_i) · δ_C B(cr̂_i)   (8.24)

For the crust, Eq. (8.10) becomes

δ_C B(cr̂) = Σ_{l=1}^{∞} Σ_{m=-l}^{l} δ_C u_l^m(c) U_l^m(r̂)   (8.25)

On two grounds we will neglect the contribution to Eq. (8.24) from the terms in Eq. (8.25) with l > L. At high l we expect the phases of the H_l^m(r̂_i) to be random, producing cancellation in Eq. (8.24), and the work of Cain et al. (1989) suggests that for l > 40 or 50 geometric attenuation reduces the crustal signal on S(c) below the measurement noise. These arguments lead us to conclude that, at least approximately, if l ≤ L then

δ_C ũ_l^m = δ_C u_l^m(c),  δ_C ṽ_l^m = δ_C w̃_l^m = 0   (8.26a)

and

E[δ_C ũ_l^m δ_C ũ_{l'}^{m'}] = J_C(c,l) δ_{ll'} δ_{mm'}   (8.26b)

Eqs. (8.22a)-(8.22e) and Eq. (8.26b) show that the data compression Eqs. (8.19a)-(8.19c) does indeed produce a diagonal error variance matrix for the compressed data.

There are some rudimentary ways to check the foregoing estimates against the MAGSAT data. The values of the LML spectrum R(c,l) reported by Langel and Estes (1982) and Cain et al. (1989) are probably better represented by the formula

R(c,l) = (l + 1) Σ_{m=-l}^{l} |ũ_l^m(c)|²

than by Eq. (8.15a), as the reported LML spectra are obtained from estimates of g_l^m(c) that are rather like Eq. (8.19a). Indeed, Eq. (8.19a) gives simply the weighted least-squares fit of Eqs. (8.13a)-(8.13c) to the data, the weights being obtained by inverting the error variance matrix. Therefore, we might expect the R_C(c,l) and R_M(c,l) obtained from the MAGSAT data to be approximately

R_C(c,l) = (l + 1) Σ_{m=-l}^{l} (δ_C ũ_l^m)² = (l + 1)(2l + 1) E[(δ_C ũ_l^m)²] = (l + 1)(2l + 1) J_C(c,l)   (8.27a)

and

R_M(c,l) = (l + 1) Σ_{m=-l}^{l} (δ_M ũ_l^m)² = (l + 1)(2l + 1) E[(δ_M ũ_l^m)²] = α(2l + 1)   (8.27b)

From the MAGSAT data, Cain et al. (1989) estimated R_C(c,l) and R_M(c,l) as Eq. (8.16c) and Eq. (8.16d). Jackson (1990) found that the J_C(c,l) calculated from Eq. (8.27a) leads to a reasonable order of magnitude for the r.m.s. crustal magnetization. For Eq. (8.27b) a more quantitative check is available from the independent estimate of Langel et al. (1982) that σ_M = 6 nT. Then Eq. (8.22b) with D = 50000 implies α ≈ 0.002 nT². Cain et al. (1989) gave a plot of ln R(c,l) vs. l that appears to be rather flat for 40 ≤ l ≤ 60, suggesting that in that range of l they were seeing mostly R_M(c,l). In that range of l the variation of R_M(c,l) with l that appears in Eq. (8.27b) is probably not detectable against the noise in the data, so we can simply set l = 50 in Eq. (8.27b). Then that equation predicts R_M(c,l) ≈ 0.2 nT², slightly more than twice the value Eq. (8.16d) actually reported by Cain et al. (1989). The values of l used here are uncomfortably close to the limit Eq. (8.23b), so a factor of two is probably not a significant discrepancy. Incorporating the Holme-Bloxham effect would lower our estimate of α, but probably not by a factor of two. In the absence of a comparable magnetospheric signal, further checks on our approximate treatment of the errors could be obtained via the two analogues of the LML spectrum calculated from Eq. (8.22c) and Eq. (8.22d). However, the literature seems to contain no estimates of these LML spectra from the data.
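The two spectral checks just described are a few lines of arithmetic; the degree l = 20 used for the J_C example below is an arbitrary illustrative choice:

```python
# Spectral checks from the text: R_M(c,l) = alpha (2l+1), Eq. (8.27b),
# evaluated at l = 50, and J_C(c,l) from the Cain et al. fit Eq. (8.16c)
# through Eq. (8.18).
D = 50000
alpha = 6.0**2 / (D / 3)               # Eq. (8.22b), nT^2

R_M_pred = alpha * (2 * 50 + 1)        # Eq. (8.27b) at l = 50
assert abs(R_M_pred - 0.218) < 0.01    # ~0.2 nT^2
assert 2.0 < R_M_pred / 0.091 < 2.8    # "slightly more than twice" Eq. (8.16d)

# J_C from the LML fit: J_C(c,l) = R_C(c,l)/((l+1)(2l+1)), Eq. (8.18).
l = 20                                 # illustrative degree
R_C = 14.8 * 0.876**l                  # Eq. (8.16c), nT^2
J_C = R_C / ((l + 1) * (2 * l + 1))
assert J_C > 0
print(R_M_pred, J_C)
```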


Our idealization is a very rough description of actual satellite data, but the data compression (8.19), or a generalization of it, will probably be useful in treating actual data. In a realistic problem the satellite orbits will be confined to a spherical shell S(c_1,c_2) consisting of all positions r outside some spherical surface S(c_1) and inside some larger surface S(c_2). Let r_1, …, r_{D/3} be the locations in S(c_1,c_2) where the three Cartesian components of B are measured, and define r̂_i = r_i/r_i. If we choose c to satisfy c_1 < c < c_2, then Eqs. (8.19a)-(8.19c) can be applied directly to the real data. An improvement on Eqs. (8.19a)-(8.19c) is the following data compression:

(l + 1) û_l^m = (D/3)^{-1} Σ_{i=1}^{D/3} U_l^m(r_i/c) · B(r_i)    (8.28a)

l v̂_l^m = (D/3)^{-1} Σ_{i=1}^{D/3} V_l^m(r_i/c) · B(r_i)    (8.28b)

l(l + 1)(2l + 1)^{-1} ŵ_l^m = (D/3)^{-1} Σ_{i=1}^{D/3} W_l^m(r_i/c) · B(r_i)    (8.28c)

If the data roughly approach our idealizations (i.e. if there is good geographical coverage for all three components of B), then we expect that Eqs. (8.19a)-(8.19c) and Eqs. (8.28a)-(8.28c) will produce compressed error variance matrices that, although not diagonal, will be strongly diagonally dominant. We expect more diagonal dominance from Eqs. (8.28a)-(8.28c) than from Eqs. (8.19a)-(8.19c) because Eqs. (8.28a)-(8.28c) actually do diagonalize the compressed error variance matrix when there is no electric current in the shell S(c_1,c_2) and when the distribution of data points is sufficiently dense and uniform that the sums in Eqs. (8.28a)-(8.28c) can be approximated accurately by volume integrals over S(c_1,c_2). The defense of this claim is essentially the same as our discussion of Eqs. (8.19a)-(8.19c). A test of our suggestions about diagonal dominance, on the other hand, must await numerical experience. For example, although the diagonality Eqs. (8.22a)-(8.22e) is destroyed by the Holme-Bloxham effect, we expect that diagonal dominance will not be. In a realistic problem, the advantage of having a diagonally dominant error variance matrix is that it renders numerically tractable the matrix inversions and singular value decompositions required to implement Section 5. Calculating the error variance matrix in a realistic problem is, of course, more complicated than Eqs. (8.22a)-(8.22e) and Eqs. (8.26a), (8.26b). From Eqs. (8.28a)-(8.28c), for example,

(l + 1)(l' + 1) E[δû_l^m δû_{l'}^{m'}] = (D/3)^{-2} Σ_{i,j=1}^{D/3} U_l^m(r_i/c) · E[δB(r_i) δB(r_j)] · U_{l'}^{m'}(r_j/c)    (8.29)

with five similar results for E[δû_l^m δv̂_{l'}^{m'}], etc. If we accept Jackson's crustal model Eq. (8.12), then the second-order tensor E[δ_cB(r_i) δ_cB(r_j)] can be computed explicitly in terms of the J_c(c,l) in Eq. (8.12) as follows:

E[δ_cB(r) δ_cB(s)] = Σ_{l=1}^{∞} J_c(c,l) (c²/rs)^{l+2} K_l(r̂,ŝ)    (8.30a)

where

K_l(r̂,ŝ) = Σ_{m=-l}^{l} U_l^m(r̂) U_l^m(ŝ)    (8.30b)

Appendix E derives Eq. (8.30a) and gives an explicit expression for the second-order tensor K_l(r̂,ŝ) in terms of four Gegenbauer polynomials in r̂·ŝ. For some stochastic crustal models, the infinite sum in Eq. (8.30a) can be performed explicitly by Jackson's technique (Jackson, 1990).
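The claim that dense, uniform coverage lets the sums in Eqs. (8.28a)-(8.28c) be replaced by orthogonality integrals, and hence diagonalizes the compressed variance matrix, can be illustrated with a scalar sketch. The sketch below uses zonal harmonics only (Legendre polynomials on an area-uniform grid of latitudes), not the vector harmonics U, V, W of the text:

```python
# Scalar sketch of the diagonalization argument: for sites that sample the
# sphere uniformly, discrete sums of harmonic products approximate the
# orthogonality integrals.  Zonal (m = 0) harmonics reduce to Legendre
# polynomials P_l(z), z = cos(theta), with (1/2) * integral of P_l*P_l'
# over [-1, 1] equal to delta_ll' / (2l + 1).

def P2(z): return (3*z*z - 1) / 2
def P3(z): return (5*z**3 - 3*z) / 2
def P4(z): return (35*z**4 - 30*z*z + 3) / 8

N = 4000
zs = [-1 + 2*(i + 0.5)/N for i in range(N)]   # area-uniform latitudes

diag22 = sum(P2(z)*P2(z) for z in zs) / N     # should approach 1/5
diag33 = sum(P3(z)*P3(z) for z in zs) / N     # should approach 1/7
off24  = sum(P2(z)*P4(z) for z in zs) / N     # should approach 0

print(diag22, diag33, off24)
```

With 4000 sites the discrete products reproduce the orthogonality integrals to several decimal places; this is the mechanism behind the approximate diagonality of the compressed variance matrix.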

9. Application to core geomagnetism

To illustrate trimming, we shall study the inverse problem of estimating the low-degree Gauss coefficients of the magnetic field B_K produced outside the core by currents inside it. The data will be satellite measurements of the Cartesian components of the


total vector field B measured at locations c r̂_i, 1 ≤ i ≤ D/3, on a spherical surface S(c) concentric with the Earth. Following Cain et al. (1989) we take D = 50000 and c = 6791 km. We idealize the data as described in Section 8, and we use the notation introduced there. The various parts of the problem will be taken up in the following order: the data space, the compressed data space, the model space, the compressed data function, its singular value decomposition, and the prediction function. The problem is a QLMP in the sense of Section 7. The error estimates produced in this section will be intuitively obvious to most workers. The main purpose of the example is to check that trimming works as expected in an easy case. Perhaps some interest does attach to the fact that a prior bound as weak as the virial energy bound is useful if there are enough data.

The data space Y consists of the real D-dimensional vectors y = {B(c r̂_1), …, B(c r̂_{D/3})}. We need a way to refer separately to the D/3 individual 3-vectors that make up y, so we write B_v(y; c r̂_i) for the ith entry of y. Persuaded by Section 8, we compress the data. We choose a positive integer L̂ ≥ 56, as in Eq. (8.23b), and we let the compressed data space Ŷ consist of the L̂(L̂ + 2)-dimensional vectors

ŷ = K̂(y) = {û_l^m : 1 ≤ l ≤ L̂, −l ≤ m ≤ l}    (9.1a)

where

û_l^m = (l + 1)^{−1} (D/3)^{−1} Σ_{i=1}^{D/3} U_l^m(r̂_i) · B_v(y; c r̂_i)    (9.1b)

with U_l^m defined as in Eq. (8.8a). We call l,m 'L̂-triangular indices' if 1 ≤ l ≤ L̂ and −l ≤ m ≤ l. Given a compressed data vector ŷ and particular L̂-triangular indices l,m we write the entry û_l^m in Eq. (9.1a) as û_l^m(ŷ). If we know û_l^m(ŷ) for all L̂-triangular indices l,m then we know ŷ. As L̂ satisfies Eqs. (8.23a), (8.23b), we shall accept the approximation proposed in Section 8, based on the assumed rotational invariance of the crustal statistics:

E[δû_l^m δû_{l'}^{m'}] = δ_{ll'} δ_{mm'} σ(l)²    (9.2a)

where

σ(l)² = J_c(c,l) + (l + 1)^{−1} (D/3)^{−1} σ_M²    (9.2b)

From Eqs. (9.2a), (9.2b) and Eq. (5.4)

‖ŷ‖² = Σ_{l=1}^{L̂} σ(l)^{−2} Σ_{m=−l}^{l} |û_l^m(ŷ)|²    (9.3)

For any L̂-triangular indices l,m we define ŷ_l^m ∈ Ŷ by requiring for all L̂-triangular indices l',m'

û_{l'}^{m'}(ŷ_l^m) = δ_{ll'} δ_{mm'} σ(l)    (9.4)

Then Eq. (9.3) makes clear that the set {ŷ_l^m : 1 ≤ l ≤ L̂, −l ≤ m ≤ l} constitutes an orthonormal basis for Ŷ. We can bound ‖ŷ‖ by appealing to Eq. (5.16) and Eq. (8.6b) to conclude that if ŷ = K̂(y) then

‖ŷ‖² ≤ σ_M^{−2} Σ_{i=1}^{D/3} |B_v(y; c r̂_i)|²    (9.5)

The model space X consists of all magnetic fields B_K produced outside the core-mantle boundary S(a) by the currents inside it. Given x ∈ X and spatial location r outside S(a), we write B_K(x;r) for the magnetic field produced at r by the model x. Defining Gauss coefficients as in Eq. (8.13a), we write F_l^m(x) for the Gauss coefficient g_l^m(a) of the model x on S(a). For every location r outside S(a)

B_K(x;r) = Σ_{l=1}^{∞} (a/r)^{l+2} Σ_{m=−l}^{l} F_l^m(x) U_l^m(r̂)    (9.6)

Any particular model x is completely determined if we know all its Gauss coefficients F_l^m(x) for 1 ≤ l < ∞ and −l ≤ m ≤ l. We consider two prior hard quadratic bounds on the correct Earth model x_E, both of the form

Σ_{l=1}^{∞} Q(l) Σ_{m=−l}^{l} |F_l^m(x_E)|² ≤ q    (9.7a)

For the heat-flow bound (Parker, 1972; Gubbins, 1983; Backus, 1989)

Q_h(l) = l^{−1}(l + 1)(2l + 1)(2l + 3) and q_h = 3 × 10^17 nT²    (9.7b)

For the virial energy bound (Appendix A)

Q_v(l) = (2l + 1)^{−1}(l + 1) and q_v = 1.15 × 10^24 nT²    (9.7c)

The dot product induced on X by Eq. (9.7a) is

x_1 · x_2 = q^{−1} Σ_{l=1}^{∞} Q(l) Σ_{m=−l}^{l} F_l^m(x_1) F_l^m(x_2)    (9.8)

Therefore one orthonormal basis for X is the set {x̂_l^m : 1 ≤ l < ∞, −l ≤ m ≤ l} whose members are specified by requiring that for all l',m' such that 1 ≤ l' < ∞ and −l' ≤ m' ≤ l' we have

F_{l'}^{m'}(x̂_l^m) = δ_{ll'} δ_{mm'} [q/Q(l)]^{1/2}    (9.9)

In our later discussion of the data function F we shall need a simple truncation Ξ′ of (X,F), as in Eq. (7.1). Here we begin its construction by defining some truncated subspaces X_[L] of X: for every positive integer L let

X_[L] = span{x : F_l^m(x) = 0 if l > L}    (9.10)

For every location r outside S(a) and for every x ∈ X, if

x_[L] = Π_{X_[L]}(x)    (9.11a)

then

B_K(x − x_[L]; r) = Σ_{l=L+1}^{∞} (a/r)^{l+2} Σ_{m=−l}^{l} F_l^m(x) U_l^m(r̂)    (9.11b)

We can write each summand as {q^{−1/2} Q(l)^{1/2} F_l^m(x)} × {q^{1/2} Q(l)^{−1/2} (a/r)^{l+2} U_l^m(r̂)}, and Schwarz's inequality and an appeal to Eq. (9.8) then establish that

|B_K(x − x_[L]; r)|² ≤ ‖x − x_[L]‖² Σ_{l=L+1}^{∞} q Q(l)^{−1} (a/r)^{2l+4} Σ_{m=−l}^{l} |U_l^m(r̂)|²

Appealing to Eq. (E.12) in Appendix E, we can rewrite this result as (Backus, 1989)

|B_K(x − x_[L]; r)|² ≤ ‖x − x_[L]‖² Σ_{l=L+1}^{∞} (l + 1)(2l + 1) q Q(l)^{−1} (a/r)^{2l+4}    (9.11c)

Next we turn to the data function F: X → Y, which is specified by

F(x) = {B_K(x; c r̂_1), …, B_K(x; c r̂_{D/3})}

or, more succinctly, by

B_v(F(x); c r̂_i) = B_K(x; c r̂_i),    1 ≤ i ≤ D/3

We plan to use the compressed data function F̂ = K̂ ∘ F obtained from the data compression Eqs. (9.1a), (9.1b). From Eq. (9.5)

‖F̂(x − x_[L])‖² ≤ σ_M^{−2} Σ_{i=1}^{D/3} |B_K(x − x_[L]; c r̂_i)|²

so, from Eq. (9.11c),

‖F̂(x − x_[L])‖² ≤ ‖x − x_[L]‖² (D/3) q σ_M^{−2} Σ_{l=L+1}^{∞} (l + 1)(2l + 1) Q(l)^{−1} (a/c)^{2l+4}

From this result it follows that

‖F̂ ∘ (I_X − Π_{X_[L]})‖ ≤ γ_L(ζ)    (9.12a)

where

ζ = (a/c)²    (9.12b)

γ_L(ζ)² = (D/3) q σ_M^{−2} S_L(ζ)    (9.12c)

S_L(ζ) = Σ_{l=L+1}^{∞} (l + 1)(2l + 1) Q(l)^{−1} ζ^{l+2}    (9.12d)

Using the heat-flow bound Eq. (9.7b) or the virial energy bound Eq. (9.7c) in Eqs. (9.12a)-(9.12d) gives us a simple truncation Eq. (7.1), as γ_L(ζ) can be made arbitrarily small by choosing L sufficiently large. It is clear from Eq. (7.28b) and Eq. (7.33a) that very little additional accuracy can be obtained by making γ_L(ζ) much less than unity. For the heat-flow bound γ_L(ζ) < 1 as long as L ≥ 32, whereas for the virial bound γ_L(ζ) < 1 as long as L ≥ 51. Thus, for both these prior bounds it is possible to choose L large enough to make γ < 1 in Eq. (7.1) and small enough to satisfy Eqs. (8.23a), (8.23b).

The fact that L and L̂ satisfy Eqs. (8.23a), (8.23b) greatly simplifies the singular value decomposition of the compressed data function F̂|X_[L]. If l',m' are L̂-triangular indices and x_[L] ∈ X_[L] then û_{l'}^{m'}[F̂(x_[L])] can be approximated by an integral as in Eqs. (8.2a), (8.2b). Therefore, from Eq. (8.11a),

û_{l'}^{m'}[F̂(x)] = (a/c)^{l'+2} F_{l'}^{m'}(x)    (9.13)

If l,m are fixed L-triangular indices and x̂_l^m is defined by Eq. (9.9), then x̂_l^m ∈ X_[L], so Eq. (9.13) applies to x = x̂_l^m. Then, because of Eq. (9.9),

û_{l'}^{m'}[F̂(x̂_l^m)] = δ_{ll'} δ_{mm'} (a/c)^{l+2} q^{1/2} Q(l)^{−1/2}    (9.14)

If l,m are L̂-triangular indices, then Eq. (9.4) and the linearity of û_{l'}^{m'}: Ŷ → R permit us to write Eq. (9.14) as

û_{l'}^{m'}[F̂(x̂_l^m)] = û_{l'}^{m'}[(a/c)^{l+2} q^{1/2} Q(l)^{−1/2} σ(l)^{−1} ŷ_l^m]

As this equation is true for all L̂-triangular indices l',m', it implies

F̂(x̂_l^m) = λ_l ŷ_l^m    (9.15a)

where

λ_l = (a/c)^{l+2} q^{1/2} Q(l)^{−1/2} σ(l)^{−1}    (9.15b)

If l,m are L-triangular but not L̂-triangular then Eq. (9.14) implies F̂(x̂_l^m) = 0. The x̂_l^m for which l,m are L-triangular constitute an orthonormal basis for X_[L], and the ŷ_l^m for which l,m are L̂-triangular constitute an orthonormal basis for Ŷ. Therefore Eq. (9.15a) is a singular value decomposition of F̂: X_[L] → Ŷ, the singular values being the λ_l.

Next we consider the prediction function G: X → Z. We choose a positive integer Λ ≤ L ≤ L̂ and set ourselves the task of estimating F_l^m(x_E) for the Λ-triangular indices l,m. This geomagnetic inverse problem is a QLMP in the sense of Section 7, so we take Z = X_[Λ] for the prediction space and G = Π_{X_[Λ]} for the prediction function. As in Section 7, the desired predictions are

z_l^m = x̂_l^m · x_E = q^{−1/2} Q(l)^{1/2} F_l^m(x_E)    (9.16)

Finally we describe the trimming of (X,F,G). For the trimmed model space we take X_T = Z = X_[Λ]. For the trimmed data function we take F_T as in Eq. (7.3c), so as to permit regularization. For the trimmed prediction function we take G_T = I_{X_T}, the identity on X_T. The trimming will provide a maximum error δz_l^m for its estimate of z_l^m. For the frequentist we define δz_l^m by choosing a confidence level p(K) and then using Eq. (7.28b). For the Bayesian, we use Eq. (7.33b) and define the error as the standard deviation V(δz_l^m)^{1/2}. Then, if the regularization parameter θ satisfies Eq. (7.28a) for the frequentist or Eq. (7.33b) for the Bayesian, each will agree that

|δz_l^m| ≤ φ λ_l^{−1}    (9.17)

where the 'philosophy factor' φ takes the value (γ² + 1)^{1/2} for Bayesians and γ + K for frequentists. In both φs we have γ = γ_L(ζ) as in Eqs. (9.12a)-(9.12d). Regularization parameters larger than Eq. (7.28a) or Eq. (7.33b) will produce larger errors.

The error of interest to us is the relative error δz_l^m/z_l^m = δg_l^m(a)/g_l^m(a). The bound Eq. (9.17) on δz_l^m is independent of m, and the Earth's g_l^m(a) vary widely with m, so we obtain a better sense of the relative error if we compare Eq. (9.17) with the r.m.s. of all the z_l^m with a given l. This r.m.s. is the square root of

⟨(z_l^m)²⟩_m = (2l + 1)^{−1} Σ_{m=−l}^{l} (z_l^m)² = q^{−1} (2l + 1)^{−1} Q(l) Σ_{m=−l}^{l} F_l^m(x_E)²

According to the discussion of Eqs. (8.16a)-(8.16d), we expect approximately

Σ_{m=−l}^{l} F_l^m(x_E)² = (l + 1)^{−1} R_K(a,l) = (l + 1)^{−1} (c/a)^{2l+4} R_K(c,l)

where R_K(c,l) is the 'core' part of the curve fitted to the observed LML spectrum R(c,l) by Cain et al. (1989). Therefore

⟨(z_l^m)²⟩_m = (2l + 1)^{−1} (l + 1)^{−1} q^{−1} Q(l) (c/a)^{2l+4} R_K(c,l)    (9.18a)

|δz_l^m| / ⟨(z_l^m)²⟩_m^{1/2} ≤ δp_l    (9.18b)

where

(δp_l/φ)² = (2l + 1)(l + 1) σ(l)² R_K(c,l)^{−1}    (9.18c)

From Eq. (9.2b) and Eq. (8.18)

(2l + 1)(l + 1) σ(l)² = R_c(c,l) + (2l + 1)(D/3)^{−1} σ_M²    (9.18d)

If we use σ_M = 6 nT (Langel et al., 1982), we saw in Section 8 that for l > 40, where it is visible in the observed LML spectrum, the last term on the right in Eq. (9.18d) is about twice as large as the R_M(c,l) estimated by Cain et al. (1989). Cain et al. (1989) expressed reservations about their Gauss coefficients of such high degree, so we adopt the a priori error estimate of σ_M given by Langel et al. (1982). Our final result for the relative error Eq. (9.18b) in g_l^m(a) is

(δp_l/φ)² = [R_c(c,l) + (2l + 1)(D/3)^{−1} σ_M²] R_K(c,l)^{−1}    (9.18e)

This result is probably what an experienced worker would propose intuitively from the identification Cain et al. (1989) made of the three parts of their fit to the LML spectrum R(c,l), except that we have replaced R_M(c,l) by the a priori value (2l + 1)(D/3)^{−1} σ_M². This replacement is conservative for l > 40 and optimistic for lower l. It must be emphasized that Eq. (9.18d) involves extrapolating R_c(c,l) down to l < 15, where it is probably swamped by the core signal. At present, we know of no data to support this extrapolation. The problem is a very old one in main field geomagnetism.

Table 1 uses Eq. (9.18e) to give the values of δp_l/φ for those l that make δp_l/φ < 1. As the philosophy factor φ is of the order of unity for both Bayesians and frequentists, the values in the table are expressed as percentages, i.e. e(l) = 100 δp_l/φ. To obtain per cent errors for the Gauss coefficients, one must multiply the entries in Table 1 by the appropriate philosophy factor, φ = (γ² + 1)^{1/2} for Bayesians and φ = γ + K for frequentists. The entry in Table 1 for l = 1 is about 2.6 times too large because the observed value of R_K(c,1) is about 6.8 times larger than that computed from Eq. (8.16b). It is well known that the dipole is anomalously large.

Table 1
The values of e(l) = 100 δp_l/φ for those harmonic degrees l which make δp_l/φ < 1; here δp_l is the average relative error in the Gauss coefficients of degree l, and φ is the philosophy factor (see text)

l     e(l)
1     0.027
2     0.050
3     0.095
4     0.180
5     0.341
6     0.650
7     1.24
8     2.39
9     4.60
10    8.90
11    17.3
12    33.7
13    66.1

10. Conclusions

In a typical geophysical inversion the model space X is infinite dimensional, whereas the vector space Y of possible data vectors is necessarily finite dimensional. Therefore the data function F: X → Y is not invertible, even if one replaces Y by Y_T = F(X_T). From the data, one can hope to predict only a finite number of numerical properties of the correct model x_E. It is convenient to collect these properties into a finite-dimensional prediction vector z_E = G(x_E), where G: X → Z is the prediction function. The need for a finite-dimensional approximation X_T to the model space X has led us to a general formalism for inverse problems, a formalism we call a 'trimming' of the problem. A trimming is a quintuple


(X_T, P_T, F_T, Q_T, G_T), where P_T: X → X_T and Q_T: Y → Y_T are projectors (idempotents) and F_T: X_T → Y and G_T: X_T → Z are approximations to the data and prediction functions. Modeling with regularization is shown to be an example of trimming, so that the formalism provides error estimates in such modeling. Trimming provides a natural way to choose the regularization parameter. It turns out that a small amount of regularization always improves the accuracy of the inversion, too much degrades it, and no amount improves it very much. Trimming requires no commitment to either the subjectivist or the objectivist interpretation of probability; both camps can use the results. The formalism does require a careful distinction between random and systematic errors in the data and prediction vectors, and between probabilistic and deterministic prior information about x_E. Concerning the random error δ_R y in y_0 we know only that it can be regarded as drawn at random from a population in the data space Y whose probability measure η_R is known. (The question of estimating a few unknown parameters in η_R from the data has been discussed by Backus (1989).) About the systematic errors δ_S y in y_0 we know only that δ_S y ∈ V_S, a known subset of Y called the confinement set for δ_S y. There are also trimming errors δ_T y and δ_T z produced in trying to compute the correct data vector y_E and the correct prediction vector z_E from P_T(x_E), the trimmed version of x_E. The input to the inversion consists of η_R, V_S, y_0 and prior information about x_E. The latter can be a confinement set U_S ⊂ X for x_E or a probability measure μ_E describing where x_E was likely to be in X before the data vector y_0 was measured.
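The statement that a small amount of regularization always helps, too much hurts, and no amount helps very much can be seen on a single singular-value channel. The numbers below (singular value λ = 1, prior r.m.s. size τ = 2) are hypothetical, chosen only for illustration:

```python
# Illustration (not from the paper) of the regularization trade-off on one
# singular-value channel: data y = lam*x + n with unit noise variance and
# prior variance tau^2 for x.  The regularized estimate x_hat = lam*y/(lam^2+theta)
# has mean-square error  mse(theta) = (theta^2*tau^2 + lam^2) / (lam^2 + theta)^2.

lam, tau = 1.0, 2.0    # hypothetical singular value and prior r.m.s. size

def mse(theta):
    return (theta**2 * tau**2 + lam**2) / (lam**2 + theta)**2

no_reg   = mse(0.0)          # no regularization: 1/lam^2
best     = mse(1/tau**2)     # theta = 1/tau^2 minimizes mse
too_much = mse(100.0)        # heavy regularization: error approaches tau^2

print(no_reg, best, too_much)
```

Here mse(1/τ²) = 1/(λ² + 1/τ²) < 1/λ² = mse(0), while mse(θ) → τ² > mse(0) as θ → ∞; the improvement at the optimum is only the modest factor 1 + 1/(λ²τ²).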
The output of the general inversion scheme is an estimate z_0 for z_E and an expression for the error, δz = z_0 − z_E, in the form δz = Δz + δ_T z, where Δz depends on δ_R y, δ_S y, δ_T y and y_0. In this general form the inversion is of interest mainly to objectivists, because it can be used to find confidence sets for δz but is not easily adapted to untangling the non-linear interaction of random and systematic errors that is of interest to subjectivists. Subjectivists cannot 'soften' the confinement sets to probability distributions because almost certainly dim Y is so large that such softening generates spurious extra information. Appendix B discusses this phenomenon in some detail.

If the inverse problem and the trimming are linear, δz = δ_R z + δ_S z + δ_T z, where δ_R z is the random error produced by δ_R y, δ_S z is the systematic error produced by δ_S y, and δ_T z is the total trimming error produced by δ_T y and δ_T z. If the prior information about x_E is a confinement set, δ_T z is a systematic error. If that prior information is a probability measure for x_E, then δ_T z is a random error. These errors are displayed in a form that makes it easy for objectivists to calculate confidence sets. If dim Z ≥ 3, subjectivists who do not want to generate spurious 'data' through the inversion process may want to leave δz as the sum of a random error with known probability measure and a systematic error with known confinement set. If dim Z ≤ 2, many subjectivists will be willing to 'soften' the confinement set to a personal probability distribution and combine the systematic and random parts of δz into a single random error. We conclude that, if the systematic errors are comparable with or larger than the random errors, it usually makes no sense to try to estimate correlations between predictions. The sole exception is the subjectivist who makes at most two simultaneous predictions.

The data space Y can always be furnished with a dot product that makes the variance tensor of δ_R y equal to the identity tensor on Y. If the prior information about x_E is a quadratic bound, that bound furnishes a dot product on X under which X can be completed to a Hilbert space. When these dot products on X and Y are both available and the inverse problem is linear, a trimming that accomplishes the inversion can be described explicitly. The main computational burden is the singular value decomposition of an N × D matrix, where N is the dimension of the truncated model space X_T and D = dim Y. In some inverse problems, the value of D can be reduced very considerably by replacing y_0 with K̂(y_0), where K̂: Y → Ŷ is a linear 'compression function' chosen so that dim Ŷ << D.
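A toy scalar version of the compression function K̂ may make the last point concrete. With sites uniform in z = cos θ, weighted averages of the data against each harmonic recover that harmonic's coefficient, replacing D samples by a handful of numbers; the zonal model below, with hypothetical coefficients, stands in for the vector-harmonic compression of Eqs. (9.1a), (9.1b):

```python
# Toy (zonal, scalar) compression function: with sites uniform in z = cos(theta),
# the weighted averages below recover the harmonic coefficients directly,
# approximating the least-squares fit while reducing D samples to a few numbers.

def P1(z): return z
def P2(z): return (3*z*z - 1) / 2

c1_true, c2_true = 0.7, -1.3          # hypothetical model coefficients
N = 3000
zs = [-1 + 2*(i + 0.5)/N for i in range(N)]
data = [c1_true*P1(z) + c2_true*P2(z) for z in zs]   # D = N noiseless samples

# compressed data: one number per harmonic, using (1/2)*integral(P_l^2) = 1/(2l+1)
c1_hat = 3/N * sum(d*P1(z) for d, z in zip(data, zs))
c2_hat = 5/N * sum(d*P2(z) for d, z in zip(data, zs))

print(c1_hat, c2_hat)
```

For uniformly distributed sites the cross-terms average to zero and each compressed coefficient reproduces its model coefficient, which is the sense in which the compression approximates a least-squares fit.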
As an example, we trim the inverse problem of using MAGSAT satellite measurements of the magnetic field vector B to estimate the low-degree Gauss coefficients (spherical harmonic coefficients) of the magnetic field B_K generated outside the core by currents inside it. Full formulation of the inverse problem requires a model for the errors in the data and also prior information about the correct Earth

model x_E, in this case B_K. We compare two priors, the heat-flow bound and the virial energy bound. To model the errors, we neglect magnetospheric currents and assume that the measurement sites are nearly uniformly distributed on a single spherical surface S(c). We neglect recent progress on satellite orientation errors (Holme and Bloxham, 1995), and treat the errors δ_M y in data measurement as independent identically distributed gaussian random variables. Following Jackson (1990), we model the magnetic signal δ_c B from the crust as a random error, and we idealize it by assuming that the statistics of crustal magnetization are invariant under all rotations about the center of the Earth. To deal simultaneously with measurement errors and crustal signals, we invoke a linear data compression. In place of the measured data vector y_0, which consists of the Cartesian components of B at many locations on S(c), we use the coefficients of a truncated series of vector spherical harmonics. These coefficients are chosen so that with a uniform distribution of measurement sites the series approximates an error-weighted least-squares fit to the data. Thus the compressed data are preliminary estimates of the internal, external and toroidal Gauss coefficients of B on S(c). When the data are thus compressed, the measurement errors and crustal signal in our idealized problem have a diagonal variance matrix, so the trimming can be carried out analytically. As Cain et al. (1989) pointed out, the geomagnetic power spectrum estimated from MAGSAT data probably describes the crust only for spherical harmonic degrees l between 15 and 40, whereas measurement errors are probably detectable in the spectrum only for l > 40. To obtain error estimates for B_K we need the crustal and measurement variances for 1 ≤ l ≤ 14, so we must extrapolate them from the higher l where they are visible. Such extrapolation of the crustal variances to low l is currently without empirical justification, and we commit it only to illustrate the trimming. Extrapolating the measurement errors is defensible because for l > 40 the spectral estimate by Cain et al. (1989) is within a factor of two of the a priori estimate for satellite magnetometer and orientation errors obtained by Langel et al. (1982) from an analysis of the MAGSAT instrumentation. Our model of the measurement errors makes them smaller for low l than is indicated by a straightforward extrapolation of the term in the geomagnetic power spectrum which Cain et al. (1989) identified as measurement error.

In our geomagnetic example of trimming, the heat-flow and virial energy bounds on B_K provide essentially the same accuracy in the Gauss coefficients of B_K, but the virial energy bound, being weaker by a factor of 10^7, requires more computation than the heat-flow bound. To make the trimming (truncation) error as small as the measurement error in a real MAGSAT inversion requires that the data be fitted by a truncated series of spherical harmonics up to degree 32 for the heat-flow bound and 51 for the virial energy bound. The errors in the Gauss coefficients depend only on total harmonic degree l, and those calculated by frequentists and Bayesians differ only through a 'philosophy factor' which is independent of l. If the crustal magnetic spectrum found by Cain et al. for 15 ≤ l ≤ 40 can be extrapolated down to 1 ≤ l ≤ 14, if magnetospheric currents can be neglected, and if our idealization of the MAGSAT data is reasonable, then Bayesians and frequentists will say that somewhere between degrees 11 and 13 the errors in the Gauss coefficients of B_K become as large as the values of those coefficients themselves. Moreover, their different definitions of error will lead both camps to take 0.03% as a representative figure for the error in our current knowledge of the dipole moment of the core. None of the foregoing conclusions will surprise experienced geomagnetic modelers. The purpose of the example is to show how trimming and procrastination operate in a familiar case. New information about B_K will come only when trimming and procrastination are applied to real data.
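The truncation degrees 32 and 51 quoted above can be recomputed from Eqs. (9.12a)-(9.12d) with the heat-flow and virial bounds Eqs. (9.7b) and (9.7c). The script below assumes a core radius a = 3480 km (not stated in this excerpt) together with the values c = 6791 km, D = 50000 and σ_M = 6 nT used in Section 9:

```python
# Check of the truncation degrees quoted above, from
#   gamma_L(zeta)^2 = (D/3) * (q/sigma_M^2) * S_L(zeta),
#   S_L(zeta) = sum_{l>L} (l+1)(2l+1)/Q(l) * zeta^(l+2),  zeta = (a/c)^2.
# Assumed values: core radius a = 3480 km (not stated in this section),
# c = 6791 km, D = 50000 scalar data, sigma_M = 6 nT.

a_km, c_km, D, sigma_M = 3480.0, 6791.0, 50000, 6.0
zeta = (a_km / c_km)**2

def Q_heat(l):   return (l + 1)*(2*l + 1)*(2*l + 3) / l      # Eq. (9.7b)
def Q_virial(l): return (l + 1) / (2*l + 1)                  # Eq. (9.7c)

def gamma(L, Q, q):
    S = sum((l + 1)*(2*l + 1)/Q(l) * zeta**(l + 2) for l in range(L + 1, 600))
    return ((D/3) * q / sigma_M**2 * S) ** 0.5

def threshold(Q, q):
    # smallest truncation degree L with gamma_L < 1 (gamma_L decreases with L)
    return next(L for L in range(1, 200) if gamma(L, Q, q) < 1.0)

print(threshold(Q_heat, 3e17), threshold(Q_virial, 1.15e24))
```

With these assumptions the smallest L giving γ_L(ζ) < 1 is 32 for the heat-flow bound (q_h = 3 × 10^17 nT²) and 51 for the virial bound (q_v = 1.15 × 10^24 nT²), matching the degrees quoted above.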

Acknowledgements Some of Appendix A reflects comments by Kurt Voorhies, and some of Appendix B arose from discussions with Andrew Walker; neither colleague is responsible for errors. This work was supported in part by NASA Grants NAGW 2967 and NAG5 2967 and in part by NSF Grant 8907988.

Appendix A

A.1. The virial bound on magnetic energy

In this Appendix our goal is to prove that

(2μ_0)^{−1} ∫_{R³} dV B² ≤ (8πG)^{−1} ∫_{R³} dV g²    (A.1)

where μ_0 is the permeability of vacuum, B is the geomagnetic field, G is Newton's universal constant of gravitation, g is the Earth's gravitational acceleration field, and R³ is all of three-space. There is an appealing physical argument for Eq. (A.1). The left side of that inequality is E_B, the energy of the geomagnetic field, and its right side is −E_G, where E_G is the gravitational self-energy of the Earth. The total energy of the Earth is E_B + E_G + K + U, where K is kinetic energy (mostly rotational) and U is internal energy. If we imagine the Earth to be a perfect electrical conductor, and disassemble it by expanding it uniformly to infinity at constant angular momentum, then each of the four terms in the total energy will tend to zero. The assembled Earth presumably has a smaller energy than its disassembled state, and K and U are positive, so we have Eq. (A.1).

A more formal proof starts with the pre-Maxwell equations, obtained by neglecting the displacement current. The momentum equation for the matter in the Earth is then

ρ D_t u = ∇·T + J × B + ρg    (A.2)

where ρ is the density of matter, u is its velocity, T is its stress tensor, and J is the electric current density. If ∂_t denotes the time derivative at a point fixed in space, then D_t = ∂_t + u·∇ is the time derivative at a particle moving with the material. Therefore, if r is the position vector, then u = D_t r, from which follows

r · D_t u = ½ D_t² r² − u²    (A.3)

Another appeal to the pre-Maxwell equations yields the well-known result that

J × B = ∇·M    (A.4a)

where the Cartesian components of the magnetic stress tensor M are

M^{ij} = μ_0^{−1} (B^i B^j − ½ B² δ^{ij})    (A.4b)

It is less well known but equally useful that

ρg = ∇·G    (A.5a)

where the Cartesian components of the gravitational stress tensor G are

G^{ij} = −(4πG)^{−1} (g^i g^j − ½ g² δ^{ij})    (A.5b)

Eq. (A.5a) and Eq. (A.5b) follow from ∇·g = −4πGρ and ∇×g = 0. We write the Cartesian components of r as r^i or r_i and those of ∇ as ∂^i or ∂_i, so that we can use the Einstein summation convention. The Cartesian components of Eq. (A.2) can now be written

ρ D_t u^i = ∂_j Q^{ij}    (A.6a)

where

Q^{ij} = T^{ij} + M^{ij} + G^{ij}    (A.6b)

If we multiply Eq. (A.6a) by r_i and sum on i, then an appeal to Eq. (A.3) and the product rule for derivatives gives

½ ρ D_t² r² − ρu² = ∂_j(r_i Q^{ij}) − δ_{ij} Q^{ij}    (A.7)

We want to integrate Eq. (A.7) over all of R³. If ∂V is the surface of the region V occupied by the Earth, we can integrate Eq. (A.7) over V and R³\V separately and use Gauss's divergence theorem to cancel the contributions of ∂_j(r_i Q^{ij}) from the two sides of ∂V. Thus we obtain

½ ∫_{R³} dV ρ D_t² r² = ∫_{R³} dV ρu² − ∫_{R³} dV δ_{ij} Q^{ij}    (A.8)

For any scalar field f, we can write ∫_{R³} dV ρf as ∫_{R³} dm f, where dm = ρ dV is an element of mass. If dm is always the same parcel of mass in the moving material, dm is independent of time, so (d/dt) ∫_{R³} dm f = ∫_{R³} dm D_t f. Therefore

(d/dt) ∫_{R³} dV ρf = ∫_{R³} dV ρ D_t f    (A.9)

Throughout most of the Earth, δ_{ij} T^{ij} = −3p to high accuracy, and δ_{ij} M^{ij} = −B²/(2μ_0) and δ_{ij} G^{ij} = g²/(8πG). Thus, applying Eq. (A.9) twice to the left side of Eq. (A.8) and noting that ρ and p vanish outside V, we obtain

½ (d²/dt²) ∫_V dV ρr² = 2K + 3 ∫_V dV p + E_B + E_G    (A.10)

The left side of Eq. (A.10) is negligible compared with the pressure integral on the right, even in a large earthquake, and p > 0, K > 0, so Eq. (A.10) implies Eq. (A.1). Eq. (A.10) is the scalar virial equation. For other derivations of versions of it, the reader is referred to Chandrasekhar (1961, p. 577) and Parker (1979, p. 58).

If B is given by Eq. (8.13c), it is easy to convert Eq. (A.1) to the form Eq. (9.7a). For Q(l) the result is

Q_v(l) = (2l + 1)^{−1} (l + 1)    (A.11a)

If we adopt for the gravitational acceleration g(r) the simple but fairly accurate approximation that g(r) = (r/a) g(b) for 0 ≤ r ≤ a, g(r) = g(b) for a ≤ r ≤ b, and g(r) = (b/r)² g(b) for b ≤ r, then

q_v = 1.15 × 10^24 nT²    (A.11b)

Here S(a) is the CMB and S(b) is the surface of the Earth.

Appendix B

B.1. Risks in gaussian bound softening

In spaces of high dimension, 'softening' a quadratic bound on a vector x by imitating the bound with a personal probability measure generates very precise information about x that comes not from the bound but from the softening process (Backus, 1988b). For spaces of low dimension, the extent of this problem can be studied quantitatively in the special case that the personal probability measure is gaussian. In the present Appendix, we will consider a vector x about which we know only that it belongs to a real dot product space X of dimension N < ∞ and that

‖x‖ ≤ 1    (B.1)

When N is not extremely large, what are the consequences of trying to imitate Eq. (B.1) with a gaussian personal probability measure on X? How much new 'information' does this bound softening generate? Unless the density p(x) of the personal probability measure is a function of ‖x‖ alone, it will single out some directions in X, generating new 'information' unjustified by Eq. (B.1). Therefore, as p is gaussian, there must be a constant γ such that

p(x) = (γ/2π)^{N/2} exp(−γ‖x‖²/2)    (B.2)

It follows that γ‖x‖² has a χ² distribution with N degrees of freedom. The Bayesian literature discusses at length but does not settle how to choose γ (Jackson, 1979; Gubbins, 1983; Backus, 1988a). We will leave γ unspecified as our argument is independent of its value. We will study the random variable

s = γ‖x‖²/2    (B.3a)

The probability density function for s is (Kendall and Stuart, 1977, p. 397)

p(N,s) = Γ(N/2)^{−1} s^{N/2−1} e^{−s}    (B.3b)

The mean and variance of s are both N/2, and, for large N, p(N,s) is asymptotically gaussian with that mean and variance. Therefore for large N

1 − a(2/N)^{1/2} ≤ 2s/N ≤ 1 + a(2/N)^{1/2}    (B.4)

with probability φ = erf(a/√2). We summarize this situation by saying that Eq. (B.4) is true at confidence level φ (this statistical neologism is discussed further in Appendix C). However, whenever Eq. (B.4) is true, the ratio of the largest to the smallest admissible s is

r = [1 + a(2/N)^{1/2}] / [1 − a(2/N)^{1/2}]    (B.5)

As s is a constant multiple of ‖x‖², at confidence level φ we know ‖x‖² to within the factor r(N,φ). For fixed φ and large N, r(N,φ) is very close to unity. Therefore, softening Eq. (B.1) to Eq. (B.2) enables us to extract, apparently from Eq. (B.1) and at confidence level φ, the actual value of ‖x‖² with a very small per cent error. This paradox is the objection to bound softening when dim X is large. Softening the upper bound Eq. (B.1) introduces a lower bound for ‖x‖ that is generated entirely from

the belief that the upper bound can be described probabilistically. The question remains whether such a paradox persists when N is not large. To pursue this question, let us suppose that Eq. (4.3) is the probability density for s. Let us fix N, 4' and s B, and let us choose sr(N,4',s 8) so that with probability 4'

s B < s <_ s r

(B.6)

If Eq. (B.6) holds, then we know s to within the factor

r( N,4',ss ) = sr( N,4',sB ) / s B

(B.7)

If we minimize r(N,4',s B) with respect to s 8, we obtain the boldest assertion permitted by Eq. (B.2) for the given N and 4': at confidence level 4' we know Ilxll 2 to within a factor

r(N,φ) = min s_T(N,φ,s_B)/s_B

(B.8)

the minimum being taken with respect to s_B. Table 2 gives r(N,φ) for various dimensions N and confidence levels φ. That table means, for example, that if N = 30 then replacing Eq. (B.1) by Eq. (B.2) and choosing γ provides us with the value of ||x||² to within a factor of 2.4, 2.8 or 3.9 according as we accept a confidence level of 0.9, 0.95 or 0.99. This conclusion is independent of the value we choose for γ. If Eq. (B.1) really is all we know about x, purists of the objective school of probability will use Table 2 to reject Eq. (B.2) as a substitute for Eq. (B.1),

whatever the value of N. Subjectivists who accept hypotheses verified at a confidence level of 0.95 might be willing to substitute Eq. (B.2) for Eq. (B.1) when N = 1 or 2, especially if they could convince themselves that they really were able to estimate ||x||² to within a factor of 2500 or of 112. When N ≥ 3, softening introduces a lower bound on ||x||² high enough that whether to soften Eq. (B.1) to Eq. (B.2) begins to be a personal matter. Of course, this is not a drawback for subjectivists. In any case, Table 2 will provide each subjectivist with a quantitative basis for deciding whether to soften the hard bound Eq. (B.1). There is one more argument against softening a hard bound when N ≥ 3. The density function Eq. (B.3b) vanishes at s = 0 for N ≥ 3 but not for N = 1 or 2. Thus, accepting Eq. (B.2) when N ≥ 3 really does express a disbelief in values of ||x||² considerably less than unity. As Bayesians sometimes advocate softening Eq. (B.1) to a uniform probability measure in the unit ball rather than a gaussian, it is worth pointing out that at a given confidence level the uniform measure produces confidence intervals for ||x|| even shorter than those given in Table 2. Thus, the uniform softening generates more spurious prior information than the gaussian. The underlying difficulty is purely geometrical. In spaces of dimension even as low as three or four, most of the volume of the unit ball lies very close to the surface, and the problem becomes worse in higher dimensions.
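The factor r(N,φ) of Eqs. (B.6)-(B.8) is easy to reproduce by Monte Carlo. The sketch below is our own illustration, not part of the paper: it assumes that s is distributed as half a chi-squared variable with N degrees of freedom (so that its mean and variance are both N/2, as above), draws samples of s, and minimizes the ratio of the endpoints of a window covering a fraction φ of the sorted samples; the name r_factor is ours.

```python
import math
import random

def r_factor(N, phi, draws=50_000, seed=1):
    """Monte Carlo estimate of r(N, phi): the smallest ratio s_T/s_B over
    intervals [s_B, s_T] that contain s with probability phi, where s is
    distributed as half a chi-squared variable with N degrees of freedom."""
    rng = random.Random(seed)
    s = sorted(0.5 * sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(N))
               for _ in range(draws))
    k = math.ceil(phi * draws)  # number of samples a phi-window must cover
    # slide the window over the sorted samples, minimizing the endpoint ratio
    return min(s[i + k - 1] / s[i] for i in range(draws - k + 1))
```

For fixed φ the factor shrinks toward unity as N grows, and it grows rapidly as N decreases; r_factor(30, 0.9) should fall near the value 2.4 that Table 2 lists for N = 30, φ = 0.9.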

Table 2
The factor r(N,φ) to within which ||x||² is specified by gaussian softening of a hard quadratic bound, as a function of the dimension N of x and the acceptable confidence level φ

φ \ N     1      2      3      4      5      10     20     30     50     100    1000   ∞
0.5       9.8    4.4    3.2    2.7    2.4    1.85   1.54   1.42   1.31   1.21   1.06   1
0.9       520    47     19.8   12.4   9.2    4.6    2.9    2.4    1.94   1.59   1.16   1
0.95      2500   112    38     21     14.5   6.2    3.5    2.8    2.2    1.74   1.19   1
0.99      8400   760    150    64     38     11.4   5.3    3.9    2.8    2.1    1.26   1


Appendix C

C.1. Notation

If U, V and W are sets, p ∈ V means that p is a member of V, and U ⊂ V means that U is a subset of V. If U₁, ..., U_N are sets then U₁ × ... × U_N, their Cartesian product, is the set of all ordered N-tuples (u₁, ..., u_N) with u_i ∈ U_i, i = 1, ..., N. The set V ∪ W consists of all objects belonging to V or W or both. The set V ∩ W consists of all objects common to V and W. The set W \ V consists of all objects in W and not in V. If x and y are real numbers, we shall denote the lesser of them by x ∩ y and the greater by x ∪ y. The context will prevent confusion with the ∩ and ∪ of set theory. The set with no members is written ∅. The set whose only members are u₁, ..., u_N is written {u₁, ..., u_N}. The set whose members are all the subsets of X is written 2^X. If p(x) is a statement about objects x, then {x: p(x)} is the set of all objects for which p(x) is true. (To avoid Russell's paradox, the xs are understood to be some of the members of a previously defined set.) If U and V are arbitrary subsets of a vector space Y, then U + V is the set of all vectors in Y that can be written as u + v for at least one u ∈ U and one v ∈ V. The set U − V is defined similarly from u − v. The sets {u} + V and {u} − V are usually abbreviated as u + V and u − V. If U is a subset of vector space Y then span U is the set of all finite linear combinations of members of U; span U is a subspace of Y.

If X and Y are sets, a function f: X → Y is a rule that assigns to each x ∈ X a unique value f(x) ∈ Y; multivalued functions are disallowed. The sets X and Y are called the domain and codomain of f, and x is the 'argument' of f. The notation f(x) always stands for a value of the function f, never for the function itself. The function is sometimes written f(·). Thus, for example, if f: U × V → Y and u₀ ∈ U and v₀ ∈ V, then f(u₀, ·): V → Y and f(·, v₀): U → Y are defined as follows: f(u₀, ·)(v) = f(u₀, v) for all v ∈ V, and f(·, v₀)(u) = f(u, v₀) for all u ∈ U. If U ⊂ X then f(U) is the set of all y ∈ Y such that y = f(x) for at least one x ∈ U. The set f(X) is the 'image' of f: X → Y. If V ⊂ Y then f⁻¹(V) is the set of all x ∈ X such that f(x) ∈ V. The notation does not require or imply that the inverse function f⁻¹: Y → X exists. However, the function f⁻¹: 2^Y → 2^X always exists. When f⁻¹ exists in both senses, the two meanings of f⁻¹(V) agree. If for each y ∈ f(X), f⁻¹({y}) contains only one member, f is an 'injection'. If f(X) = Y, f is a 'surjection'. If f is both, it is a 'bijection', and then it has an inverse, f⁻¹: Y → X.

If f: X → Y and U ⊂ X then f|U, the 'restriction' of f to U, is that function g: U → Y such that g(x) = f(x) whenever x ∈ U. If x ∉ U, g(x) is not defined. If A, B, C are sets and p: A → B and q: B → C then the 'composite' of p and q, written q∘p, is that function q∘p: A → C such that for each x ∈ A, (q∘p)(x) = q(p(x)). Usually (q∘p)(x) is abbreviated q∘p(x). If D is another set and r: C → D is another function, clearly r∘(q∘p) = (r∘q)∘p. If U is any set then the identity function on U, I_U: U → U, is defined by requiring that I_U(u) = u for each u ∈ U. If f: U → V then clearly f = I_V∘f = f∘I_U. If f: U → V has an inverse, f⁻¹: V → U, then f∘f⁻¹ = I_V and f⁻¹∘f = I_U. We denote the set of all real numbers by R. A real-valued function f: U → R is a 'functional' on U. If Y is a real vector space, X is any set, and f: X → Y, g: X → Y, and a, b ∈ R, we define the function (af + bg): X → Y by requiring for each x ∈ X that

(af + bg)(x) = af(x) + bg(x)

(C.1)

Now we turn to probability measures. If X is any set, a 'σ-algebra' on X is an object Σ_X with these properties: (1) Σ_X is a set whose members are some of the subsets of X; (2) X ∈ Σ_X; (3) if U, V ∈ Σ_X then U \ V ∈ Σ_X; (4) if V_i ∈ Σ_X for i = 1, 2, ... then V₁ ∪ V₂ ∪ V₃ ∪ ... ∈ Σ_X. A probability measure on X is a pair (μ, Σ_X) in which Σ_X is a σ-algebra on X and μ: Σ_X → R. The function μ must have these additional properties: (1) μ(X) = 1; (2) μ(U) ≥ 0 if U ∈ Σ_X; (3) μ is countably additive; that is, if V₁, V₂, ... ∈ Σ_X and V_i ∩ V_j = ∅ for all i ≠ j then

μ(V₁ ∪ V₂ ∪ V₃ ∪ ...) = Σ_{i=1}^∞ μ(V_i)

The sets belonging to Σ_X are the 'μ-measurable' subsets of X. Halmos (1950) gave the theory of integration with respect to measures. If (μ, Σ_X) is a probability measure on X and if the function f: X → R is μ-integrable, then its integral is written ∫_X dμ(x) f(x). We abbreviate the integral as E[μ,f] and call it the expected value or mean of f with respect to μ. When there is no danger of confusion, we write simply E[f]. Let us suppose that Σ_X and Σ_Y are σ-algebras on X and Y, and F: X → Y. We say that F is 'measurable' if F⁻¹(V) ∈ Σ_X whenever V ∈ Σ_Y. If F is measurable, it defines a function F⁻¹: Σ_Y → Σ_X that exists whether or not F has an ordinary inverse F⁻¹: Y → X. If (μ, Σ_X) is a probability measure on X and F: X → Y is measurable and η = μ∘F⁻¹, then (η, Σ_Y) is a probability measure on Y. For any V ∈ Σ_Y, η(V) = μ(F⁻¹(V)). The measure η is 'induced' on Y by F and μ. Induced measures give a succinct expression to the formula for changing variables of integration. If f: Y → R then E[μ, f∘F] = E[η, f], or

E[μ, f∘F] = E[μ∘F⁻¹, f]

(C.2)
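Eq. (C.2) has a direct computational reading: integrating f∘F against samples from μ is the same as integrating f against the induced measure η = μ∘F⁻¹, whose samples are simply the images under F of samples from μ. A minimal sketch (the choices of μ, F and f are our own illustration, not from the paper):

```python
import random

rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(100_000)]  # draws from mu on X = [0, 1]

F = lambda x: x * x    # a measurable map F: X -> Y
f = lambda y: 3.0 * y  # an integrand f: Y -> R

# E[mu, f o F]: integrate f o F against mu
lhs = sum(f(F(x)) for x in xs) / len(xs)

# E[mu o F^{-1}, f]: integrate f against eta = mu o F^{-1}; draws from the
# induced measure eta are just the images F(x) of the draws x from mu
ys = [F(x) for x in xs]
rhs = sum(f(y) for y in ys) / len(ys)

assert abs(lhs - rhs) < 1e-12  # Eq. (C.2), exactly, sample by sample
assert abs(lhs - 1.0) < 0.02   # both approximate the integral of 3x^2 on [0, 1]
```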

We propose an ellipsis which slightly extends the language used to describe confidence sets. Let us suppose x is a real-valued physical quantity which we want to measure with an apparatus whose random error we know to be normal with mean zero and standard deviation σ. Let us suppose we find the sample mean of N measurements of x to be M. The sample mean is a normal random variable with mean x and standard deviation σ/N^{1/2}, so the probability that it lies further than 2σ/N^{1/2} from x is 0.0456. Thus, either

M − 2σ/N^{1/2} < x < M + 2σ/N^{1/2}

(C.3)

or an event has occurred whose probability is less than 0.0456. The usual way for a frequentist to report this finding is that at a significance level of 0.0456 the value of x lies in the confidence interval given by Eq. (C.3). We suggest that for some purposes it is intuitively clearer and simpler to say that at a confidence level of 0.9544 we can believe that x satisfies Eq. (C.3). Many statisticians would prefer to ascribe the confidence level to the test which produces Eq. (C.3) rather than to that conclusion itself, but the literature would appear to permit our ellipsis (Feller, 1968, p. 189; Freedman et al., 1991, p. 348). In the present paper, if Y is a finite-dimensional real vector space, then Σ_Y will always be the set of all Borel subsets of Y; that is, Σ_Y is the smallest σ-algebra that includes as members all open subsets of Y. Therefore, if Y and Z are finite-dimensional real vector spaces and H: Y → Z is continuous, H is

measurable (Halmos, 1950). In particular, if H is linear it is measurable. A dot product on a vector space X is a symmetric, scalar-valued bilinear form on X. A dot product space is a vector space together with a particular dot product. A real Hilbert space is a real dot product space that is complete in the norm defined by the dot product. The following remarks paraphrase Halmos (1951). For any subset U of a real dot product space X we define U⊥, the orthogonal complement of U, to be the set of all x ∈ X such that x·u = 0 for every u ∈ U. If X is a Hilbert space, U⊥ is a closed subspace of X. If U is a closed subspace of Hilbert space X then (U⊥)⊥ = U, and for each x ∈ X there are unique vectors u(x) ∈ U and u⊥(x) ∈ U⊥ such that x = u(x) + u⊥(x). Then we can define a function Π_U: X → U by setting Π_U(x) = u(x) for all x ∈ X. This function is the orthogonal projector of X onto U. It is linear, idempotent (i.e. Π_U∘Π_U = Π_U) and symmetric (i.e. x₁·Π_U(x₂) = x₂·Π_U(x₁)). Clearly, Π_U∘Π_{U⊥} = Π_{U⊥}∘Π_U = 0, and Π_U + Π_{U⊥} = I_X. In a dot product space X the open solid ball of unit radius centered on the origin will be written B_X. It is the set of all x ∈ X such that ||x|| < 1. If Y and Z are finite-dimensional real dot product spaces and F: Y → Z is linear, then F^T: Z → Y is the unique linear function such that for every y ∈ Y and every z ∈ Z

F(y)·z = y·F^T(z)

(C.4)

F^T is the 'adjoint' or 'transpose' of F. Clearly (F^T)^T = F. If either F or F^T has an inverse, so does the other, and (F^T)⁻¹ = (F⁻¹)^T. In that case we abbreviate both (F^T)⁻¹ and (F⁻¹)^T as F^{−T}. If W is another finite-dimensional real dot product space and G: W → Y is linear, then (F∘G)^T = G^T∘F^T.

Finally, we comment on the observation following Eqs. (8.22a)-(8.22e). If H_l(r) is any harmonic polynomial homogeneous of degree l in r, then on S(1) the horizontal wavelength of H_l is approximately 4π(2l + 1)⁻¹; i.e. the horizontal wavenumber is approximately l + 1/2. In fact it is k = [l(l + 1)]^{1/2}, which is nearly l + 1/2 for large l and is in error by only 6% even for l = 1. The argument is simple. In the notation introduced in Eqs. (8.8a)-


(8.8c), H_l satisfies ∇₁²H_l + k²H_l = 0. The Beltrami operator ∇₁², being the angular part of the Laplacian, is invariant under rotation of coordinates. For every r̂ ∈ S(1) we can choose spherical polar coordinates that put r̂ on the equator. There, ∇₁² = ∂_θ² + ∂_λ², with θ being colatitude and λ being longitude. However, near the equator θ and λ constitute a local Cartesian coordinate system in S(1). Thus, at every point r̂ on S(1), H_l(r̂) satisfies the two-dimensional Helmholtz (wave) equation with wavenumber k.
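The 6% figure for l = 1 is immediate to check, since k = √2 ≈ 1.414 while l + 1/2 = 1.5; a one-line verification (ours):

```python
import math

l = 1
k = math.sqrt(l * (l + 1))    # exact horizontal wavenumber
approx = l + 0.5              # the large-l approximation
rel_err = abs(k - approx) / k
assert 0.05 < rel_err < 0.07  # about 6% for l = 1

# the approximation improves rapidly with l
assert abs(math.sqrt(10 * 11) - 10.5) / math.sqrt(10 * 11) < 0.002
```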

Appendix D

D.1. Linear images of solid ellipsoids

In this appendix we show that the linear image of a solid ellipsoid centered on the origin is another such ellipsoid, and we discuss briefly how to find the image. This image question arises in several contexts when we calculate confinement sets, so we will try to keep the discussion as general as possible. We begin with a real vector space X about which we know nothing else. Its dimension may be finite or infinite, and it may or may not have a topology, a norm or a dot product. In such an 'unfurnished' vector space the simplest definition of a solid ellipsoid centered on 0 is this: let Q: X × X → R be any positive definite symmetric (pds) bilinear form on X, just as in Eq. (4.1). We let El(X,Q) be the set of all x ∈ X such that

Q(x,x) < 1

(D.1)

Any subset of X constructed in this way from a pds bilinear form on X will be called a solid ellipsoid in X centered on 0. Given any such ellipsoid, we can use it to furnish X with a dot product, namely

x₁·x₂ = Q(x₁,x₂)

(D.2)

Under this dot product and its norm, El(X,Q) is the solid unit ball consisting of all x ∈ X such that ||x|| < 1. To find the linear images of El(X,Q) we will need the orthogonal projectors of X provided by Eq. (D.2) onto the closed subspaces of X. Therefore we ask that X be complete in the norm generated by Eq. (D.2). That is, X must be a real Hilbert space with dot product Eq. (D.2). If dim X < ∞, X is automatically complete. If dim X = ∞, we must assume completeness, and we do so.

Next, let us suppose that Y is another real vector space, and dim Y < ∞. We endow Y with the unique topology that makes it a topological linear space (Dunford and Schwartz, 1958, p. 372), i.e. the ordinary Euclidean topology in which convergence of a sequence of vectors means convergence of all their component sequences relative to each basis for Y. Let us suppose that L: X → Y is linear and continuous. (Continuity follows from linearity if dim X < ∞.) We will show that there is a pds bilinear form Q̂ on L(X) such that

L(El(X,Q)) = El(L(X),Q̂)

(D.3)

That is, L(El(X,Q)) is a solid ellipsoid centered on 0 in L(X). To find Q̂, we let N_L be the null space of L, the set of all x ∈ X such that L(x) = 0. As L is continuous, N_L is a closed subspace of X. Under Eq. (D.2), we let P⊥: X → N_L be the orthogonal projector of X onto N_L, and let P = I_X − P⊥. Then P is the orthogonal projector of X onto N_L⊥. We define K: P(X) → L(X) as

K = L|P(X)

(D.4a)

We claim that K has an inverse,

K⁻¹: L(X) → P(X)

(D.4b)

To prove this, we show that K is both injective and surjective. As K is linear, to show that it is injective we need show only that x ∈ P(X) and K(x) = 0 imply x = 0. However, if x ∈ P(X) then x ∈ N_L⊥, whereas if K(x) = 0 then L(x) = 0 so x ∈ N_L. As N_L ∩ N_L⊥ = {0}, the conclusion follows. To prove that K is surjective, let us suppose that y ∈ L(X). Then y = L(x) for some x ∈ X. Thus y = L∘P(x) + L∘P⊥(x). As P⊥(x) ∈ N_L, therefore L∘P⊥(x) = 0, and y = L∘P(x). However, P(x) ∈ P(X), so L∘P(x) = K(P(x)). Thus y ∈ K(P(X)), and K is surjective. Knowing that K⁻¹ exists, we can now define Q̂: L(X) × L(X) → R as

Q̂(y₁,y₂) = Q(K⁻¹(y₁), K⁻¹(y₂))

(D.5)

for all y₁, y₂ ∈ L(X). Clearly, Q̂ is a pds bilinear form on L(X). To prove Eq. (D.3), first we show that L(El(X,Q)) ⊂ El(L(X),Q̂). Let us suppose that y ∈ L(El(X,Q)). Then y = L(x) for some x ∈ X that satisfies Eq. (D.1). However, L(x) = L∘P(x) = K(P(x)), so P(x) = K⁻¹(y). Moreover, ||P(x)|| ≤ ||x||, so Q(K⁻¹(y), K⁻¹(y)) < 1. Hence Q̂(y,y) < 1, and y ∈ El(L(X),Q̂). To prove that El(L(X),Q̂) ⊂ L(El(X,Q)) we suppose that y ∈ El(L(X),Q̂). Then y ∈ L(X) and Q̂(y,y) < 1. We let x = K⁻¹(y). Then x ∈ P(X), and Q(x,x) < 1. Thus y = K(x) = L(x) for an x ∈ El(X,Q). This completes the proof.

When dim X = ∞, the foregoing calculations require summing infinite series, and cannot be done exactly on a computer. The remedy is to construct a trimming as in Section 3 and Section 6, as if (X, L, L(X)) were an inverse problem.
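In finite dimensions the construction of Eqs. (D.4a)-(D.5) can be checked numerically. With the dot product x₁·x₂ = x₁ᵀQx₂ of Eq. (D.2), K⁻¹ is the minimum-Q-norm right inverse of L, which for surjective L works out to K⁻¹(y) = Q⁻¹Lᵀ(LQ⁻¹Lᵀ)⁻¹y, so that Q̂ has matrix (LQ⁻¹Lᵀ)⁻¹. This closed form is our own reduction, not taken from the paper, and the sketch assumes numpy is available:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 4, 2
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)       # pds bilinear form Q on X = R^4
L = rng.standard_normal((m, n))   # linear, surjective onto Y = R^2

# K^{-1}: the minimum-Q-norm right inverse of L; Qhat = (L Q^{-1} L^T)^{-1}
Qinv = np.linalg.inv(Q)
M = np.linalg.inv(L @ Qinv @ L.T)  # matrix of the bilinear form Qhat
Kinv = Qinv @ L.T @ M

assert np.allclose(L @ Kinv, np.eye(m))  # K^{-1} really is a right inverse of L

# points with Q(x, x) = 1 (the boundary of El(X, Q)) must map into the
# closed ellipsoid Qhat(y, y) <= 1, and their images fill it out to its boundary
xs = rng.standard_normal((20_000, n))
xs /= np.sqrt(np.einsum('ij,jk,ik->i', xs, Q, xs))[:, None]  # normalize to Q(x,x) = 1
ys = xs @ L.T
vals = np.einsum('ij,jk,ik->i', ys, M, ys)                   # Qhat(y, y) per sample
assert vals.max() <= 1.0 + 1e-9
assert vals.max() > 0.9
```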

Appendix E

E.1. Error variance of real data from a stochastic crust with rotationally invariant statistics

We let S(c) be any spherical surface outside the Earth. From Eq. (8.13c) the field δ_cB(r) produced by the crust at position r outside the crust is

δ_cB(r) = Σ_{l=1}^∞ (c/r)^{l+2} Σ_{m=−l}^{l} δ_cU_l^m(c) U_l^m(r̂)

(E.1)

As δ_cB is real, Eq. (E.1) implies

E[δ_cB(r) δ_cB(s)] = Σ_{l,λ} Σ_{m,μ} (c/r)^{l+2} (c/s)^{λ+2} U_l^m(r̂) U_λ^μ(ŝ) E[δ_cŪ_λ^μ(c) δ_cU_l^m(c)]

Inserting Eq. (8.12) into this equation gives

E[δ_cB(r) δ_cB(s)] = Σ_{l=1}^∞ (c/r)^{l+2} (c/s)^{l+2} J_c(l) Σ_{m=−l}^{l} U_l^m(r̂) U_l^m(ŝ)

(E.2)

Then Eq. (8.30a) follows from Eq. (8.30b). It remains to express the K_l of Eq. (8.30b) in terms of Gegenbauer polynomials C_l^ν (Erdelyi et al., 1953, p. 174ff.). From Eq. (8.8a)

U_l^m(r̂) U_l^m(ŝ) = [r̂(l + 1) − ∇₁ᵣ][ŝ(l + 1) − ∇₁ₛ] {H_l^m(r̂) H_l^m(ŝ)}

(E.3)

where ∇₁ᵣ is the surface gradient ∇₁ of Eqs. (8.8a)-(8.8c) when r is the independent variable, and similarly for ∇₁ₛ. Summing Eq. (E.3) over m and using the addition theorem for the Schmidt normalized spherical harmonics H_l^m, we obtain

K_l(r̂,ŝ) = [r̂(l + 1) − ∇₁ᵣ][ŝ(l + 1) − ∇₁ₛ] P_l(μ)

(E.4a)

where μ = r̂·ŝ

(E.4b)

and P_l is the lth Legendre polynomial. It remains only to compute the derivatives in Eq. (E.4a). We abbreviate P_l(μ) as P_l and note that ∇₁ᵣŝ = 0. Therefore, following Backus (1986, Backus, 1988a), Eq. (E.4a) can be written

K_l(r̂,ŝ) = r̂ŝ(l + 1)²P_l − (l + 1)r̂(∇₁ₛP_l) − (l + 1)(∇₁ᵣP_l)ŝ + ∇₁ᵣ∇₁ₛP_l

(E.5)

As ∇₁ᵣP_l(μ) = ∂_μP_l(μ) ∇₁ᵣμ, we need to compute ∇₁ᵣμ. We begin by observing that ∇r = I⁽³⁾


where I⁽³⁾ is the identity tensor in three space. As ∇ = r̂∂_r + r⁻¹∇₁, the foregoing equation implies that r̂r̂ + r⁻¹∇₁r = I⁽³⁾. However, ∇₁r = ∇₁(r r̂) = r∇₁r̂, so

∇₁r̂ = I⁽³⁾ − r̂r̂

(E.6)

This ∇₁ is ∇₁ᵣ, so dotting ŝ on the right in Eq. (E.6) and using Eq. (E.4b) gives

∇₁ᵣμ = ŝ − μr̂

(E.7a)

By symmetry,

∇₁ₛμ = r̂ − μŝ

(E.7b)

With the help of Eq. (E.6) and Eqs. (E.7a) and (E.7b) it requires only simple algebra to obtain from Eq. (E.5) the equation

K_l(r̂,ŝ) = I⁽³⁾∂_μP_l − (r̂r̂ + ŝŝ)∂_μ[μ∂_μP_l + (l + 1)P_l] + (r̂ŝ + ŝr̂)∂_μ²P_l + r̂ŝ[(l + 1)²P_l + (2l + 3)μ∂_μP_l + (μ² − 1)∂_μ²P_l]

Invoking Legendre's equation, we can eliminate ∂_μ²P_l in the last term and obtain

K_l(r̂,ŝ) = I⁽³⁾∂_μP_l − (r̂r̂ + ŝŝ)∂_μ[μ∂_μP_l + (l + 1)P_l] + (r̂ŝ + ŝr̂)∂_μ²P_l + r̂ŝ(2l + 1)[μ∂_μP_l + (l + 1)P_l]

(E.8)

However P_l = C_l^{1/2}, and the recursion relations for Gegenbauer polynomials (Erdelyi et al., 1953, p. 176) include these:

∂_μC_l^ν = 2νC_{l−1}^{ν+1}

(E.9a)

(μ∂_μ + l + 2ν)C_l^ν = 2νC_l^{ν+1}

(E.9b)

These relations reduce Eq. (E.8) to the final result

K_l(r̂,ŝ) = I⁽³⁾C_{l−1}^{3/2} − (r̂r̂ + ŝŝ)3C_{l−1}^{5/2} + (r̂ŝ + ŝr̂)3C_{l−2}^{5/2} + r̂ŝ(2l + 1)C_l^{3/2}

(E.10)

The argument of each Gegenbauer polynomial C_l^ν in Eq. (E.10) is the μ of Eq. (E.4b). A special case of Eq. (E.10) will be useful. Erdelyi et al. (1953) pointed out that

C_l^ν(1) = (l + 2ν − 1)! / [l!(2ν − 1)!]

(E.11)

If we set r̂ = ŝ in Eq. (E.10), invoke Eq. (E.11), and then take the trace, we obtain (Backus, 1988a)

Σ_{m=−l}^{l} |U_l^m(r̂)|² = (l + 1)(2l + 1)

(E.12)
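The Gegenbauer identities (E.9a), (E.9b) and (E.11) invoked above can be verified numerically; the sketch below is ours and assumes numpy and scipy are available (scipy.special.eval_gegenbauer evaluates C_l^ν):

```python
import numpy as np
from math import factorial
from scipy.special import eval_gegenbauer as C  # C(l, nu, x) evaluates C_l^nu(x)

mu = np.linspace(-0.9, 0.9, 7)
h = 1e-6

for l in range(2, 6):
    for nu in (0.5, 1.5):
        # (E.9a): d/dmu C_l^nu = 2 nu C_{l-1}^{nu+1}  (central difference)
        dC = (C(l, nu, mu + h) - C(l, nu, mu - h)) / (2 * h)
        assert np.allclose(dC, 2 * nu * C(l - 1, nu + 1, mu), atol=1e-4)
        # (E.9b): (mu d/dmu + l + 2 nu) C_l^nu = 2 nu C_l^{nu+1}
        lhs = mu * dC + (l + 2 * nu) * C(l, nu, mu)
        assert np.allclose(lhs, 2 * nu * C(l, nu + 1, mu), atol=1e-4)

# (E.11) for nu = 3/2, where (2 nu - 1)! = 2!: C_l^{3/2}(1) = (l + 2)!/(2 l!)
for l in range(6):
    exact = factorial(l + 2) / (2 * factorial(l))
    assert abs(C(l, 1.5, 1.0) - exact) < 1e-8
```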


References

Backus, G.E., 1970a. Inference from inadequate and inaccurate data, I. Proc. Nat. Acad. Sci., 65: 1-7.
Backus, G.E., 1970b. Inference from inadequate and inaccurate data, II. Proc. Nat. Acad. Sci., 65: 281-287.
Backus, G.E., 1970c. Inference from inadequate and inaccurate data, III. Proc. Nat. Acad. Sci., 67: 282-289.
Backus, G.E., 1970d. Non-uniqueness of the external geomagnetic field determined by surface intensity measurements. J. Geophys. Res., 75: 6339-6341.
Backus, G.E., 1975. Gross thermodynamics of heat engines in deep interior of earth. Proc. Nat. Acad. Sci., 72: 1555-1558.
Backus, G.E., 1986. Poloidal and toroidal fields in geomagnetic field modelling. Rev. Geophys., 24: 75-109.
Backus, G.E., 1988a. Bayesian inference in geomagnetism. Geophys. J., 92: 125-142.
Backus, G.E., 1988b. Comparing hard and soft bounds in geophysical inverse problems. Geophys. J., 94: 249-261.
Backus, G.E., 1989. Confidence set inference with a prior quadratic bound. Geophys. J., 97: 119-150.
Backus, G.E. and Gilbert, J.F., 1967. Numerical applications of a formalism for geophysical inverse problems. Geophys. J. R. Astron. Soc., 13: 247-276.
Cain, J.C., Wang, Z., Schmitz, D. and Meyer, J., 1989. The geomagnetic spectrum for 1980 and core-crustal separation. Geophys. J., 97: 443-447.
Chandrasekhar, S., 1961. Hydrodynamic and Hydromagnetic Stability. Oxford University Press, Oxford, 654 pp.
Constable, C.G. and Parker, R.L., 1988. Statistics of the geomagnetic secular variation for the past 5 m.y. J. Geophys. Res., 93: 11569-11581.
Donoho, D.L., 1994. Statistical estimation and optimal recovery. Ann. Stat., 22: 238-270.
Dunford, N. and Schwartz, J.T., 1958. Linear Operators, Part I. Interscience, New York, 858 pp.
Erdelyi, A., Magnus, W., Oberhettinger, F. and Tricomi, F.G., 1953. Higher Transcendental Functions, Vol. 2. McGraw-Hill, New York, 396 pp.
Feller, W., 1968. An Introduction to Probability Theory and its Applications, 3rd edn., Vol. 1. Wiley, New York, 509 pp.
Franklin, J., 1970. Well-posed stochastic extensions of ill-posed linear problems. J. Math. Anal. Appl., 31: 682-716.
Freedman, D., Pisani, R., Purves, R. and Adhikari, A., 1991. Statistics, 2nd edn. W.W. Norton, New York, 514 pp.
Golub, G.H. and van Loan, C.F., 1983. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 476 pp.
Graves, L.M., 1946. The Theory of Functions of Real Variables. McGraw-Hill, New York, 300 pp.
Gubbins, D., 1983. Geomagnetic field analysis, I. Stochastic inversion. Geophys. J. R. Astron. Soc., 73: 641-652.
Gubbins, D., 1984. Geomagnetic field analysis, II. Secular variation consistent with a perfectly conducting core. Geophys. J. R. Astron. Soc., 77: 753-766.
Gubbins, D. and Bloxham, J., 1985. Geomagnetic field analysis, III. Magnetic fields on the core-mantle boundary. Geophys. J. R. Astron. Soc., 80: 695-714.
Halmos, P.R., 1950. Measure Theory. Van Nostrand, New York, 304 pp.
Halmos, P.R., 1951. Introduction to Hilbert Space. Chelsea, New York, 114 pp.
Halmos, P.R., 1958. Finite Dimensional Vector Spaces. Van Nostrand, New York, 199 pp.
Holme, R. and Bloxham, J., 1995. Alleviation of the Backus effect in geomagnetic field modelling. Geophys. Res. Lett., 22: 1641-1644.
Jackson, A., 1990. Accounting for crustal magnetization in models of the core magnetic field. Geophys. J. Int., 103: 657-673.
Jackson, A., 1994. Statistical treatment of crustal magnetization. Geophys. J. Int., 119: 991-998.
Jackson, D., 1979. The use of a priori data to resolve non-uniqueness in linear inversion. Geophys. J. R. Astron. Soc., 57: 137-158.
Jeffreys, H., 1961. Theory of Probability. Oxford University Press, Oxford, 459 pp.
Jeffreys, H., 1969. Cartesian Tensors. Cambridge University Press, New York, 92 pp.
Kendall, M. and Stuart, A., 1977. The Advanced Theory of Statistics I. Macmillan, New York, 472 pp.
Kendall, M. and Stuart, A., 1979. The Advanced Theory of Statistics II. Macmillan, New York, 748 pp.
Langel, R.A. and Estes, R.H., 1982. A geomagnetic field spectrum. Geophys. Res. Lett., 9: 250-253.
Langel, R.A., Ousley, G., Berbert, J., Murphy, J. and Settle, M., 1982. The MAGSAT mission. Geophys. Res. Lett., 9: 243-245.
MacLane, S. and Birkhoff, G., 1967. Algebra. Macmillan, New York, 598 pp.
Neyman, J., 1937. Outline of a theory of statistical estimation based on the classical theory of probability. Philos. Trans. R. Soc. London, Ser. A, 236: 333-380.
Parker, E.N., 1979. Cosmical Magnetic Fields. Oxford University Press, Oxford, 841 pp.
Parker, R.L., 1972. Inverse theory with grossly inadequate data. Geophys. J. R. Astron. Soc., 29: 123-138.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P., 1992. Numerical Recipes in C. Cambridge University Press, Cambridge, 994 pp.
Stark, P.B., 1992a. Minimax confidence intervals in geomagnetism. Geophys. J. Int., 108: 329-338.
Stark, P.B., 1992b. Inference in infinite-dimensional inverse problems: discretization and duality. J. Geophys. Res., 97: 14055-14082.
Tarantola, A., 1987. Inverse Problem Theory. Elsevier, New York, 613 pp.
Tikhonov, A.N. and Arsenin, V.Y., 1977. Solutions of Ill-Posed Problems. Halsted, New York, 258 pp.
Wigner, E.P., 1959. Group Theory. Academic Press, New York, 372 pp.