Signal Processing 15 (1988) 203-211 North-Holland
A COMPARISON OF STOCHASTIC GRADIENT AND MINIMUM ENTROPY DECONVOLUTION ALGORITHMS

A.T. WALDEN
B.P. Exploration, Britannic House, Moor Lane, London EC2Y 9BU, United Kingdom

Received 9 November 1987
Revised 29 February 1988
Abstract. This paper examines the connection between a stochastic gradient method for deconvolution, originally formulated in the electrical engineering literature, and minimum entropy type (ME-type) deconvolution methods, now well known in geophysics. It is shown that Gray's favoured ME-type deconvolution algorithm has a direct stochastic gradient equivalent which differs from the method examined here only in the presence of the inverse autocovariance matrix in Gray's method. The two procedures can be made more alike if the stochastic gradient algorithm is used merely for phase estimation, and is preceded by standard whitening deconvolution. In any event, the stochastic gradient approach is deficient in two respects, namely the assumption of an independent and identically distributed (iid) input sequence, and knowledge of its first and second absolute moments, assumptions which might be reasonable in the setting of data transmission problems, but not for seismic reflection coefficient sequences.

Keywords. Stochastic gradient, minimum entropy, deconvolution, seismogram.
1. Introduction

Goursat [4] examined deconvolution of seismic data by a stochastic gradient method formulated in [1]. The presentation is rather abstract and consequently the connection to minimum entropy type
(ME-type) deconvolution methods alluded to, while clearly present, is not easily appreciated in any detail. The purpose of this note is to clarify the connection between the two methods.

Gray [5] gives an interesting account of how his variable norm deconvolution technique (one of the ME-type family of methods; see [15]) can be implemented numerically as a Newton-type method. In this approach all the input data are available (an off-line method) and the algorithm iterates using all the data. In [4], the algorithm's iteration is due to the arrival of more input data (an on-line adaptive method). However, as discussed later, by cycling the total input sequence, Goursat's method becomes essentially equivalent to an off-line method. In order to distinguish the updating in Gray's method from that in Goursat's, we shall use a superscript [n] for the nth iteration in the former, and a superscript (i) for the ith iteration in the latter. Hence [n] will stand for a non-adaptive standard iteration, while (i) stands for a data-adaptive iteration.

The deconvolution filter will have anticipatory as well as memory terms (i.e., it is not causal), and is taken to be of length L + 1 = 2T + 1, say. For the non-data-adaptive approach the filter will be denoted, at the nth iteration, by

$$\mathbf f^{[n]} = (f_{-T}^{[n]}, \ldots, f_0^{[n]}, \ldots, f_T^{[n]})^{\mathrm T},$$

while for the data-adaptive case, it will be denoted, at the ith iteration, by

$$\mathbf g^{(i)} = (g_{-T}^{(i)}, \ldots, g_0^{(i)}, \ldots, g_T^{(i)})^{\mathrm T}.$$

The timing of the data of length M to be filtered by $\mathbf f^{[n]}$ or $\mathbf g^{(i)}$ will be adjusted, for convenience, to give an output sequence with time subscripts $1, \ldots, N$; $N = M + L$. We consider the following convolutional model for a normal incidence seismogram, $x_k$:

$$x_k = w_k * r_k + n_k,$$
where $*$ denotes convolution, $\{r_k\}$ is the reflection coefficient sequence, assumed independent and identically distributed (iid), $\{w_k\}$ is the time-invariant wavelet, and $\{n_k\}$ is additive noise. The noise is not explicitly allowed for in the analysis methods to be described herein. The iid assumption is discussed later.
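To fix ideas, the model can be simulated directly; the following minimal Python sketch does so under assumptions of our own choosing (a spiky Laplace-distributed iid reflectivity, an illustrative damped-sinusoid wavelet, and a small Gaussian noise level), none of which are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 500                                        # number of reflectivity samples
r = rng.laplace(scale=0.05, size=M)            # iid, super-Gaussian reflectivity r_k (illustrative)

t = np.arange(0, 0.128, 0.004)                 # 4 ms sampling interval (illustrative)
w = np.exp(-30.0 * t) * np.sin(2 * np.pi * 25.0 * t)    # hypothetical wavelet w_k

n = 0.005 * rng.standard_normal(M + len(w) - 1)          # additive noise n_k
x = np.convolve(w, r) + n                      # x_k = w_k * r_k + n_k
```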
2. ME-type approach

Assume we observe $x_{1+T}, \ldots, x_{M+T}$, where we have made the timing adjustment mentioned above, and let

$$\hat r_k^{[n]} = \sum_{j=-T}^{T} f_j^{[n]} x_{k-j}, \qquad k = 1, \ldots, N.$$

See Fig. 1.
Fig. 1. The noise-free seismogram $x_k$ is formed as the convolution of the input reflectivity $r_k$ with the seismic wavelet $\{w_k\}$. At the nth step the deconvolution filter $\{f_j^{[n]}\}$ produces as output the sequence $\hat r_k^{[n]}$.
Gray [5] suggested that the deconvolution filter of length L + 1 should be chosen as the one which maximizes the objective

$$G_q(\hat{\mathbf r}^{[n]}) = \frac{\left[\sum_{j=1}^{N} |\hat r_j^{[n]}|^q / N\right]^{1/q}}{\left[\sum_{j=1}^{N} |\hat r_j^{[n]}|^s / N\right]^{1/s}},$$
where N = M + L, with the values of s and q to be selected. Gray [5] gave a Newton-type method for carrying out the maximization: the nth estimate of the reflectivity series, $\hat{\mathbf r}^{[n]}$, is the convolution of the input and the nth estimate of the deconvolution filter $\mathbf f^{[n]}$, written

$$X\mathbf f^{[n]} = \hat{\mathbf r}^{[n]},$$

where X is the (M + L) by (L + 1) matrix

$$X = \begin{pmatrix}
x_{1+T} & 0 & \cdots & 0 \\
x_{2+T} & x_{1+T} & & \vdots \\
\vdots & x_{2+T} & \ddots & 0 \\
x_{M+T} & \vdots & & x_{1+T} \\
0 & x_{M+T} & & x_{2+T} \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & x_{M+T}
\end{pmatrix},$$
and $\hat{\mathbf r}^{[n]} = (\hat r_1^{[n]}, \ldots, \hat r_N^{[n]})^{\mathrm T}$. The vector $\hat{\mathbf r}^{[n]}$ is updated through

$$\hat{\mathbf r}^{[n+1]} = \hat{\mathbf r}^{[n]} - [\nabla^2 G_q(\hat{\mathbf r}^{[n]})]^{-1}\,\nabla G_q(\hat{\mathbf r}^{[n]}),$$

where

$$[\nabla G_q(\hat{\mathbf r}^{[n]})]_k = \frac{\partial G_q(\hat{\mathbf r}^{[n]})}{\partial \hat r_k^{[n]}}
\quad\text{and}\quad
[\nabla^2 G_q(\hat{\mathbf r}^{[n]})]_{kl} = \frac{\partial^2 G_q(\hat{\mathbf r}^{[n]})}{\partial \hat r_k^{[n]}\,\partial \hat r_l^{[n]}}.$$

Thus

$$X\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -[\nabla^2 G_q(\hat{\mathbf r}^{[n]})]^{-1}\,\nabla G_q(\hat{\mathbf r}^{[n]}).$$
Multiplying through by $X^{\mathrm T}$:

$$X^{\mathrm T}X\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -X^{\mathrm T}[\nabla^2 G_q(\hat{\mathbf r}^{[n]})]^{-1}\,\nabla G_q(\hat{\mathbf r}^{[n]}).$$

The expectation of the Hessian (second derivatives) matrix is a constant diagonal matrix, with the element depending on the distribution of $\hat{\mathbf r}^{[n]}$ [5]. Let us denote this element by $c^{[n]}$, and substitute the expectation of the Hessian matrix. Then

$$c^{[n]}X^{\mathrm T}X\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -X^{\mathrm T}\nabla G_q(\hat{\mathbf r}^{[n]})$$

or

$$c^{[n]}R\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -X^{\mathrm T}\nabla G_q(\hat{\mathbf r}^{[n]}),$$
where R is an estimated autocovariance matrix. For non-Gaussian data, $c^{[n]}$ is finite and

$$\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -(1/c^{[n]})\,R^{-1}X^{\mathrm T}\nabla G_q(\hat{\mathbf r}^{[n]}).$$

The terms in $\nabla G_q(\hat{\mathbf r}^{[n]})$ have a simple form:
[VGq(rt"l)]k
l,~ I~,1IO/N j~ I:~"~l'/N
sgn(::J)'
where sgn denotes the sign function. For the particular case $G_2$ (Gray's preferred objective function),

$$[\nabla G_2(\hat{\mathbf r}^{[n]})]_k = \frac{1}{\sum_{j=1}^{N} (\hat r_j^{[n]})^2 / N}\left[\hat r_k^{[n]} - \frac{\sum_{j=1}^{N} (\hat r_j^{[n]})^2 / N}{\sum_{j=1}^{N} |\hat r_j^{[n]}| / N}\,\operatorname{sgn}(\hat r_k^{[n]})\right].$$
Let us write

$$\frac{\sum_{j=1}^{N} (\hat r_j^{[n]})^2 / N}{\sum_{j=1}^{N} |\hat r_j^{[n]}| / N} = \hat\beta^{[n]},$$

and

$$\frac{1}{N}\sum_{j=1}^{N} (\hat r_j^{[n]})^2 = \hat\sigma^{2[n]}.$$

Then, setting $\Psi_N(\hat r_k^{[n]}) = \hat\beta^{[n]}\operatorname{sgn}(\hat r_k^{[n]}) - \hat r_k^{[n]}$, we can write

$$[\nabla G_2(\hat{\mathbf r}^{[n]})]_k = -\Psi_N(\hat r_k^{[n]})/\hat\sigma^{2[n]}.$$
Thus

$$\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = [c^{[n]}\hat\sigma^{2[n]}]^{-1}R^{-1}X^{\mathrm T}\Psi_N(\hat{\mathbf r}^{[n]}). \qquad (1)$$
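As a concrete reading of (1), the following short Python helper (a sketch under the definitions above; the function name and the use of NumPy are our own) computes $\hat\beta^{[n]}$, $\hat\sigma^{2[n]}$ and the vector of values $\Psi_N(\hat r_k^{[n]})$ from a current reflectivity estimate.

```python
import numpy as np

def gray_update_terms(r_hat):
    """Return (beta_hat, sigma2_hat, psi) for a current estimate r_hat of the reflectivity."""
    r_hat = np.asarray(r_hat, dtype=float)
    N = r_hat.size
    sigma2_hat = np.sum(r_hat ** 2) / N                    # sigma-hat^2[n]
    beta_hat = sigma2_hat / (np.sum(np.abs(r_hat)) / N)    # beta-hat^[n]
    psi = beta_hat * np.sign(r_hat) - r_hat                # Psi_N(r_hat_k), elementwise
    return beta_hat, sigma2_hat, psi
```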
The way chosen here to represent $\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\}$ is essential for seeing the connection with the stochastic gradient approach. There are several useful results in [5] which help us understand (1). Firstly, $c^{[n]}\hat\sigma^{2[n]}$ is scale-free (see [5], equations (8)-(19)). When using Newton's method, as here, to maximize the objective function with gradient $[\nabla G_2(\hat{\mathbf r}^{[n]})]_k = -\Psi_N(\hat r_k^{[n]})/\hat\sigma^{2[n]}$, the term $c^{[n]}$ is required to be negative, because normally the Hessian matrix is negative-definite for maximization. Gray [5] shows that $c^{[n]}$ is indeed negative provided the current estimated reflectivity series is described by the generalized Gaussian distribution with shape parameter $\alpha < 2$. This will almost always be a reasonable assumption (e.g., [17]). In this case write
{ft"+'l-ft"l}=-[lct"lld-et"l ] Signal Processing
' R - ' x T q t N (~t"~).
(2)
Also from [5] we can deduce that $|c^{[n]}|\hat\sigma^{2[n]}$ increases as the input becomes more spikey, i.e., as $n \to \infty$. In the unlikely event that the estimated reflectivity is described by a generalized Gaussian distribution with $\alpha > 2$, $c^{[n]}$ will be positive, but we can write (1) in the form (2) by replacing $\Psi_N(\hat{\mathbf r}^{[n]})$ by $-\Psi_N(\hat{\mathbf r}^{[n]})$, i.e.,

$$\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -[|c^{[n]}|\hat\sigma^{2[n]}]^{-1}R^{-1}X^{\mathrm T}\{-\Psi_N(\hat{\mathbf r}^{[n]})\}.$$

In this case the maximization is with respect to $-G_2$, so that we are in fact minimizing rather than maximizing the original objective (see, e.g., [15] for further discussion of this type of problem). In line with current terminology, the case $\alpha < 2$ will be called super-Gaussian (more spikey and long-tailed than the Gaussian) and $\alpha > 2$ will be called sub-Gaussian (flatter and shorter-tailed than the Gaussian). Then in the super-Gaussian case deconvolution using $G_2$ would give
$$\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -[|c^{[n]}|\hat\sigma^{2[n]}]^{-1}R^{-1}X^{\mathrm T}\{\hat\beta^{[n]}\operatorname{sgn}(\hat{\mathbf r}^{[n]}) - \hat{\mathbf r}^{[n]}\}, \qquad (3\mathrm a)$$

while in the sub-Gaussian case it would give

$$\{\mathbf f^{[n+1]} - \mathbf f^{[n]}\} = -[|c^{[n]}|\hat\sigma^{2[n]}]^{-1}R^{-1}X^{\mathrm T}\{\hat{\mathbf r}^{[n]} - \hat\beta^{[n]}\operatorname{sgn}(\hat{\mathbf r}^{[n]})\}, \qquad (3\mathrm b)$$

where the sign function is applied elementwise.
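To make the structure of (3a) concrete, here is a minimal Python sketch of one super-Gaussian iteration. The convolution matrix X and R = X^T X are built as above, and Ψ_N is evaluated from the current estimate; the positive scalar multiplying the step, which in Gray's method is $[|c^{[n]}|\hat\sigma^{2[n]}]^{-1}$ with $c^{[n]}$ obtained from a generalized Gaussian fit [5], is treated here simply as a user-supplied constant, so this is an illustration of the update's form rather than a full implementation of Gray's algorithm.

```python
import numpy as np
from scipy.linalg import toeplitz

def convolution_matrix(x, L):
    """(M + L) x (L + 1) full convolution matrix X, so that X @ f = np.convolve(x, f)."""
    M = len(x)
    col = np.concatenate([x, np.zeros(L)])          # first column of X
    row = np.concatenate([[x[0]], np.zeros(L)])     # first row of X
    return toeplitz(col, row)

def gray_iteration_3a(x, f, step):
    """One super-Gaussian update of the filter f, following the form of (3a).

    `step` stands in for [|c^[n]| * sigma2_hat^[n]]^{-1}; choosing it is the
    part of Gray's method not reproduced in this sketch.
    """
    L = len(f) - 1
    X = convolution_matrix(np.asarray(x, dtype=float), L)
    R = X.T @ X                                      # (unnormalized) autocovariance estimate
    r_hat = X @ f                                    # current reflectivity estimate
    N = r_hat.size
    sigma2 = np.sum(r_hat ** 2) / N
    beta = sigma2 / (np.sum(np.abs(r_hat)) / N)
    psi = beta * np.sign(r_hat) - r_hat              # {beta sgn(r_hat) - r_hat} elementwise
    delta = -step * np.linalg.solve(R, X.T @ psi)    # -step * R^{-1} X^T Psi_N(r_hat)
    return f + delta
```

In Gray's method the corresponding scale shrinks automatically as $|c^{[n]}|\hat\sigma^{2[n]}$ grows with increasingly spiky output; in this sketch a small constant `step` simply keeps the illustration stable.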
3. Stochastic gradient approach

Given a segment of seismic trace values $\mathbf X^{(i)} = (x_{i+T}, \ldots, x_{i-T})^{\mathrm T}$ from the observations $x_{1+T}, \ldots, x_{M+T}$, and an estimated deconvolution filter $\mathbf g^{(i)} = (g_{-T}^{(i)}, \ldots, g_0^{(i)}, \ldots, g_T^{(i)})^{\mathrm T}$, we have an estimated value of the corresponding reflection coefficient given by

$$\hat r^{(i)} = \sum_{j=-T}^{T} g_j^{(i)} x_{i-j},$$

or, alternatively, $\hat r^{(i)} = \mathbf g^{(i)\mathrm T}\mathbf X^{(i)}$ or $\mathbf X^{(i)\mathrm T}\mathbf g^{(i)}$. The difference between the estimate $\hat r^{(i)}$ and the corresponding value $r^{(i)}$ is $\hat r^{(i)} - r^{(i)}$. If the mean square error $\mathrm E\{[\hat r^{(i)} - r^{(i)}]^2\}$ is to be minimized, then the appropriate gradient descent method for updating $\mathbf g^{(i)}$ is ([6])

$$\mathbf g^{(i+1)} = \mathbf g^{(i)} + \gamma^{(i)}\{C\mathbf g - C\mathbf g^{(i)}\},$$

where C is the autocovariance matrix of the $x_k$, $\gamma^{(i)} > 0$ is a convergence factor, and $\mathbf g$ is the optimum filter (i.e., the one which minimizes the mean square error). By replacing C with the instantaneous value $\mathbf X^{(i)}\mathbf X^{(i)\mathrm T}$ we obtain

$$\mathbf g^{(i+1)} = \mathbf g^{(i)} + \gamma^{(i)}\{\mathbf X^{(i)}\mathbf X^{(i)\mathrm T}[\mathbf g - \mathbf g^{(i)}]\} = \mathbf g^{(i)} + \gamma^{(i)}\{r^{(i)} - \hat r^{(i)}\}\mathbf X^{(i)},$$

i.e.,

$$\{\mathbf g^{(i+1)} - \mathbf g^{(i)}\} = \gamma^{(i)}\{r^{(i)} - \hat r^{(i)}\}\mathbf X^{(i)}.$$
Note that the term in curly brackets is the quantity whose mean squared error is to be minimized. This algorithm is generally known as the Widrow-Hoff LMS (or least mean square) algorithm. It was noted in [13] that the inclusion of $C^{-1}$ in the Widrow-Hoff LMS algorithm makes it equivalent to Newton's method in numerical analysis, and also speeds convergence. The rate of convergence of the method is dependent on the spread of the eigenvalues of the autocovariance matrix of the observations, and the modification helps in this respect. Hence, an alternative scheme is
$$\{\mathbf g^{(i+1)} - \mathbf g^{(i)}\} = -\gamma^{(i)}\{\hat r^{(i)} - r^{(i)}\}C^{-1}\mathbf X^{(i)}. \qquad (4)$$
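For the case where the desired output $r^{(i)}$ is available (the learning-sequence setting of data transmission rather than the blind seismic problem), the LMS update and its $C^{-1}$-modified form (4) can be sketched in a few lines of Python; the function names are ours, and `C_inv` is assumed to have been estimated beforehand.

```python
import numpy as np

def lms_step(g, X_i, r_i, gamma):
    """Widrow-Hoff LMS update: g^(i+1) = g^(i) + gamma^(i) {r^(i) - r_hat^(i)} X^(i)."""
    r_hat = g @ X_i                                  # current estimate X^(i)T g^(i)
    return g + gamma * (r_i - r_hat) * X_i

def lms_newton_step(g, X_i, r_i, gamma, C_inv):
    """Modified update of (4), with the inverse autocovariance matrix C_inv included."""
    r_hat = g @ X_i
    return g - gamma * (r_hat - r_i) * (C_inv @ X_i)
```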
In the case of blind deconvolution, we do not know the correct value of $r^{(i)}$ (it is what we seek) and so a modified approach is needed. In deconvolution (or equalization) for data transmission, the data $\{r^{(i)}\}$ to be estimated belong to a finite alphabet. In the discrete alphabet case the system can be put in the form of a self-adaptive equalizer or SAE (see, e.g., [10]):

$$\{\mathbf g^{(i+1)} - \mathbf g^{(i)}\} = -\gamma^{(i)}\{\hat r^{(i)} - \operatorname{dec}(\hat r^{(i)})\}\mathbf X^{(i)},$$

where dec(·) represents the decision made by a Zero-Memory NonLinear (ZMNL) device based on the observed output $\{\hat r^{(i)}\}$; it is the value in the alphabet that is nearest to the equalized signal. In data transmission a learning sequence is often transmitted, so that dec(·) is designed adequately; no such luxury is available in seismic deconvolution. Macchi and Eweda [10] comment, however, that even when no learning sequence is available, convergence towards $\mathbf g$ is still observed in practice. In the case of a set of equiprobable and symmetrical levels for the data $r^{(i)}$ ([14]), the SAE estimates only the sign bit of the transmitted data and uses this according to the decision-directed algorithm

$$\{\mathbf g^{(i+1)} - \mathbf g^{(i)}\} = -\gamma^{(i)}\{\hat r^{(i)} - \beta\operatorname{sgn}(\hat r^{(i)})\}\mathbf X^{(i)},$$

where $\beta$ is a scaling factor that depends on the data levels, and $\gamma^{(i)}$ is a positive value, taken as constant in [14]. Note that the structure of this equation is similar to (3b) rather than (3a), i.e., it is appropriate for the sub-Gaussian case. This makes sense, since the problem in [14] involved uniform (discrete) input, while the (continuous) uniform distribution is a special case of the generalized Gaussian distribution, with shape parameter $\alpha = \infty$, i.e., sub-Gaussian. This method was further examined in [3] and it was pointed out that
$$\{\mathbf g^{(i+1)} - \mathbf g^{(i)}\} = -\gamma^{(i)}\{\hat r^{(i)} - \beta_1\operatorname{sgn}(\hat r^{(i)})\}\mathbf X^{(i)}, \qquad (5)$$

with $\beta_1 = \mathrm E\{r^{(i)2}\}/\mathrm E\{|r^{(i)}|\}$, minimizes the cost function

$$\mathrm E\{|\hat r^{(i)}| - \beta_1\}^2 = \mathrm E\{\hat r^{(i)} - \beta_1\operatorname{sgn}(\hat r^{(i)})\}^2. \qquad (6)$$
The iterative algorithm (3b) can be readily associated with a stochastic gradient algorithm, see e.g., [12]. The stochastic gradient equivalent is
$$\{\mathbf g^{(i+1)} - \mathbf g^{(i)}\} = -\gamma^{(i)}C^{-1}\mathbf X^{(i)}\{\hat r^{(i)} - \beta_1\operatorname{sgn}(\hat r^{(i)})\}. \qquad (7)$$
Since $C^{-1}$ in (4) makes the Widrow-Hoff LMS algorithm equivalent to Newton's method, it is not surprising to find $C^{-1}$ in (7), since Gray's method described herein is also a Newton method. Similarly, we would expect that if we leave $C^{-1}$ out of (7), then the resulting algorithm (5) would be less efficient than (7). Equation (5) is the algorithm proposed by Goursat [4], following [1], for the sub-Gaussian reflectivity case. It is required that $\gamma^{(i)} > 0$ and $\gamma^{(i)} \to 0$ as i increases, properties that are correspondingly fulfilled in Gray's method (3b), since $[|c^{[n]}|\hat\sigma^{2[n]}]^{-1} > 0$ and $[|c^{[n]}|\hat\sigma^{2[n]}]^{-1} \to 0$ as n increases. The algorithm (5) may be represented as in Fig. 2. Note that in [1] it is assumed that the two expectations $\mathrm E\{r^{(i)2}\}$ and $\mathrm E\{|r^{(i)}|\}$ are known.
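A compact Python sketch of Goursat's algorithm (5), with the data cycled repeatedly as in [4], is given below. β₁ = E{r²}/E{|r|} is assumed known; the starting filter and the decreasing step sequence γ^(i) are illustrative choices of ours, and no claim is made that they match the settings used in [4].

```python
import numpy as np

def goursat_blind_deconvolution(x, T, beta1, n_cycles=10, gamma0=0.01):
    """Blind stochastic gradient deconvolution, sketching algorithm (5).

    x        : observed trace, iterated over cyclically as Goursat did
    T        : half-length of the two-sided filter (length L + 1 = 2T + 1)
    beta1    : E{r^2} / E{|r|} of the reflectivity, assumed known
    """
    g = np.zeros(2 * T + 1)
    g[T] = 1.0                               # start from a spike (identity filter)
    step = 0
    for _ in range(n_cycles):
        for i in range(T, len(x) - T):
            step += 1
            gamma = gamma0 / step            # gamma^(i) > 0, decreasing towards 0
            X_i = x[i - T:i + T + 1][::-1]   # segment (x_{i+T}, ..., x_{i-T})
            r_hat = g @ X_i                  # r-hat^(i) = X^(i)T g^(i)
            g = g - gamma * (r_hat - beta1 * np.sign(r_hat)) * X_i
    return g
```

For instance, `g = goursat_blind_deconvolution(x, T=10, beta1=0.08)` would return a two-sided filter of length 21; the numerical values here are purely illustrative.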
Fig. 2. The stochastic gradient algorithm of Goursat, showing the role of the hard limiter.
4. Significance of sub- and super-Gaussian distributions for reflectivity
Suppose the true reflectivity sequence is independent and identically distributed (more on this later), and sub/super-Gaussian. Then the estimated reflectivity, being a filtered version of the true reflectivity, will be closer to a Gaussian distribution (e.g., [11]), but still sub/super-Gaussian. The analogue of (5) for the super-Gaussian case is of course

$$\{\mathbf g^{(i+1)} - \mathbf g^{(i)}\} = -\gamma^{(i)}\{\beta_1\operatorname{sgn}(\hat r^{(i)}) - \hat r^{(i)}\}\mathbf X^{(i)}, \qquad (8)$$
which is the stochastic gradient version of (3a), apart from the absence of the inverse of the covariance matrix. Benveniste et al. [1] examined the stationary points of algorithms such as (5) and (8) for the sub- and super-Gaussian cases. The stationary points of (5) and (8) are those vectors $\mathbf g$ for which the algorithm increment has zero mean, i.e.,

$$\mathrm E\{[\hat r^{(i)} - \beta_1\operatorname{sgn}(\hat r^{(i)})]\mathbf X^{(i)}\} = \mathbf 0,$$

or

$$\mathrm E\{C\mathbf g\} - \beta_1\mathrm E\{\operatorname{sgn}(\mathbf X^{(i)\mathrm T}\mathbf g)\mathbf X^{(i)}\} = \mathbf 0.$$

Assuming infinite dimensional filters, they showed that the unique solution (or stable attractor) in the sub-Gaussian case is the inverse of the wavelet, apart from a polarity and timing uncertainty. However, they concluded that in the super-Gaussian case, the stationary points corresponding to $\mathbf g$ equal to the inverse of the wavelet, or its polarity reverse, may not be global extrema (and hence may be unstable attractors).

How can the inverse filter be properly identified in the super-Gaussian case? Benveniste et al. [1] suggest a procedure which both improves computational efficiency by reducing the spread of the eigenvalues of the input (the same motivation for having the inverse of the covariance matrix in (3a) and (3b), or in (7)) and also allows filter identification in both the sub- and super-Gaussian cases. Recall that in [1] knowledge of $\mathrm E\{r^{(i)2}\}$ is assumed, i.e., the second moment of the true reflectivity. Assuming the reflectivity has zero mean, it follows that its variance is $\mathrm E\{r^{(i)2}\}$. Suppose that a whitening filter $\{b_k\}$ is used to whiten $x_k$, as in Fig. 3. This corrects the amplitude distortion caused by $w_k$. Ideally the whitened input, $y_k$ say, has a variance of $\mathrm E\{r^{(i)2}\}$, the variance of the reflectivity, assumed known, and is uncorrelated for all non-zero lags. The new algorithm now merely seeks to correct phase distortion; the filter is defined by

$$\mathbf h^{(i+1)} = A^{(i)}\big[\mathbf h^{(i)} - \{\hat s^{(i)} - \beta_1\operatorname{sgn}(\hat s^{(i)})\}\mathbf Y^{(i)}\big],$$
Fig. 3. Deconvolution expressed in two stages, i.e., by a whitening filter $\{b_k\}$ and a phase correcting filter $\{h_k\}$. The output from the whitening stage, $y_k$, is now related to $r_k$ simply by an approximation to an all-pass filter.
where $A^{(i)}$ is chosen so that $\sum_j h_j^{(i+1)2} = 1$, $\mathbf Y^{(i)}$ is a segment of whitened seismic trace values, and $\hat s^{(i)} = \mathbf h^{(i)\mathrm T}\mathbf Y^{(i)}$ or $\mathbf Y^{(i)\mathrm T}\mathbf h^{(i)}$. Note that a spherical restriction is imposed on $\mathbf h^{(i+1)}$; this makes sense, since if $\hat s^{(i+1)}$ is estimating $r^{(i+1)}$, the true reflectivity, we would require $\operatorname{Var}\{\hat s^{(i+1)}\} = \operatorname{Var}\{r^{(i+1)}\} = \mathrm E\{r^{(i+1)2}\}$. However,

$$\operatorname{Var}\{\hat s^{(i+1)}\} = \operatorname{Var}\{\mathbf h^{(i+1)\mathrm T}\mathbf Y^{(i+1)}\} = \mathrm E\{r^{(i+1)2}\}\sum_j h_j^{(i+1)2} = \mathrm E\{r^{(i+1)2}\},$$

provided that $\sum_j h_j^{(i+1)2} = 1$. (In standard whitening deconvolution the scale of the true reflectivity is taken as completely unknown, but the results of the deconvolution can be rescaled in order that, say, a seafloor reflector has approximately the correct magnitude.) The spherical restriction on the filter $\mathbf h$ means that a theorem of [1] can be invoked to show that even in the super-Gaussian case, the deconvolution is now stable.
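The two-stage scheme of Fig. 3 can be sketched in Python as follows. The whitening step shown here is a simple frequency-domain amplitude flattening rescaled to the assumed reflectivity variance (one of several ways the filter {b_k} could be realized), the phase stage follows the displayed recursion with $A^{(i)}$ implemented as renormalization to unit Euclidean norm, and a small gain `mu` is included purely to keep this illustrative loop well behaved; it is not part of the recursion as written above.

```python
import numpy as np

def whiten_trace(x, var_r, eps=1e-3):
    """Flatten the amplitude spectrum of x and rescale to the assumed reflectivity variance var_r."""
    Xf = np.fft.rfft(x)
    y = np.fft.irfft(Xf / (np.abs(Xf) + eps), n=len(x))   # phase kept, amplitude flattened
    return y * np.sqrt(var_r / np.var(y))

def phase_stage(y, T, beta1, n_cycles=10, mu=0.05):
    """Spherically constrained phase-correction filter h (sum_j h_j^2 = 1)."""
    h = np.zeros(2 * T + 1)
    h[T] = 1.0                                   # start from an identity (zero-phase) filter
    for _ in range(n_cycles):
        for i in range(T, len(y) - T):
            Y_i = y[i - T:i + T + 1][::-1]       # segment (y_{i+T}, ..., y_{i-T})
            s_hat = h @ Y_i                      # s-hat^(i) = h^(i)T Y^(i)
            h = h - mu * (s_hat - beta1 * np.sign(s_hat)) * Y_i
            h /= np.linalg.norm(h)               # A^(i): renormalize onto the unit sphere
    return h
```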
5. Some general comments

(i) The idea of performing an amplitude equalization followed by phase recovery is now well established. For the phase recovery, it was suggested [2] that rather than trying to estimate an entire phase spectrum, a simple phase shift could be estimated (i.e., a single phase adjustment, the same for each frequency). This attractive idea has subsequently been examined in more detail in [7, 8 and 9].

(ii) Macchi and Eweda [10] point out that in [1] convergence of the deconvolution filter is not studied, but rather the stationary points of the algorithm.

(iii) The reduction of the eigenvalue spread which results from the amplitude equalization suggested in [1] is of course included in Gray's method, but in a different way, i.e., through the presence of the estimated inverse autocovariance matrix in (3a) and (3b).

(iv) In his practical implementation, Goursat [4] fed the stochastic gradient equation the cyclically repeated data sequence. Poljak and Tsypkin [12] note that such repetition is equivalent to ordinary iterations in the standard off-line approaches (such as Gray's method).

(v) It is well known that primary reflectivity is usually far from an iid sequence ([16]), having an amplitude spectrum deficient in low frequencies. A simple method for correcting for coloured reflectivity in ordinary whitening deconvolution is given in [18]. An examination of phase distortions introduced by reflectivity is given in [7].
6. Conclusions

By expressing relevant equations in a common format, it has been demonstrated that Gray's ME-type deconvolution method has a direct stochastic gradient equivalent which differs from Goursat's method for the sub-Gaussian case only in the presence of the inverse autocovariance matrix in Gray's method. The introduction of whitening prior to stochastic gradient deconvolution (which turns the latter into a pure phase recovering method) means that the stationary points of the algorithm are global extrema in both the sub- and super-Gaussian cases, allowing filter identification. The whitening stage is a way of reducing eigenvalue spread, analogous to the effect of using the autocovariance matrix in Gray's approach. The desirability of the whitening stage fits in well with recent successes in the much less ambitious and better constrained problem of estimating phase shifts from seismic data following standard whitening deconvolution ([7, 8]). The techniques examined in this paper take no account of the non-iid nature of real reflectivity, and are not well-suited as they stand for extracting accurate phase information from real seismic data.
Acknowledgments

This work was carried out while the author was visiting the Departments of Statistics and Geophysics at the University of Washington, Seattle, U.S.A. The author thanks British Petroleum and the University for making this possible, and the Chairman and Board of Directors of The British Petroleum Company plc for permission to publish the paper.
References

[1] A. Benveniste, M. Goursat and G. Ruget, "Robust identification of a nonminimum phase system: blind adjustment of a linear equalizer in data communications", IEEE Trans. Automatic Control, Vol. 25, 1980, pp. 385-399.
[2] J. Fourmann, "Summary of remarks made at 1984 SEG Seismic Deconvolution Workshop", Geophysics, Vol. 50, 1985, p. 722.
[3] D.N. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communications systems", IEEE Trans. Communications, Vol. 28, 1980, pp. 1867-1875.
[4] M. Goursat, "Numerical results of stochastic gradient techniques for deconvolution in seismology", Geoexploration, Vol. 23, 1984, pp. 103-119.
[5] W. Gray, "Variable norm deconvolution", Stanford Exploration Project, Vol. 19, 1979, pp. 1-101.
[6] L.J. Griffiths, F.R. Smolka and L.D. Trembly, "Adaptive deconvolution: a new technique for processing time-varying seismic data", Geophysics, Vol. 42, 1977, pp. 742-759.
[7] J.W.J. Hosken, J. Longbottom, A.T. Walden and R.E. White, "Maximum kurtosis phase and the phase of the seismic wavelet", Proc. SEG/EAEG Research Workshop on Deconvolution and Inversion, Rome, 3-5 September 1986, pp. 38-51.
[8] S. Levy and D.W. Oldenburg, "Automatic phase correction of CMP stacked data", Geophysics, Vol. 52, 1987, pp. 51-59.
[9] J. Longbottom, A.T. Walden and R.E. White, "Principles and applications of maximum kurtosis phase estimation", Geophysical Prospecting, Vol. 36, 1988, pp. 115-138.
[10] O. Macchi and E. Eweda, "Convergence analysis of self-adaptive equalizers", IEEE Trans. Inform. Theory, Vol. 30, 1984, pp. 161-176.
[11] C.L. Mallows, "Linear processes are nearly Gaussian", J. Appl. Prob., Vol. 4, 1967, pp. 313-329.
[12] B.T. Poljak and J.Z. Tsypkin, "Robust identification", Automatica, Vol. 16, 1980, pp. 53-63.
[13] S.S. Reddi, "A time-domain adaptive algorithm for rapid convergence", Proc. IEEE, Vol. 72, 1984, pp. 533-535.
[14] Y. Sato, "A method of self-recovering equalization for multilevel amplitude-modulation systems", IEEE Trans. Communications, Vol. 23, 1975, pp. 679-682.
[15] A.T. Walden, "Non-Gaussian reflectivity, entropy, and deconvolution", Geophysics, Vol. 50, 1985, pp. 2862-2888.
[16] A.T. Walden and J.W.J. Hosken, "An investigation of the spectral properties of primary reflection coefficients", Geophys. Prosp., Vol. 33, 1985, pp. 400-435.
[17] A.T. Walden and J.W.J. Hosken, "The nature of the non-Gaussianity of primary reflection coefficients and its significance for deconvolution", Geophys. Prosp., Vol. 34, 1986, pp. 1038-1066.
[18] A.T. Walden and K.R. Nunn, "Correcting for coloured primary reflectivity in deconvolution", Geophys. Prosp., Vol. 36, 1988, pp. 282-297.