NUCLEAR
INSTRUMENTS
AND METHODS
76 (1969) 349-356; © N O R T H - H O L L A N D
PUBLISHING
CO.
T H E E S T I M A T I O N OF S P E C T R A F R O M E X P E R I N I E N T A L D I S T R I B U T I O N S D. W. GREEN
PhysicsDepartment, Universityof Calgary,Alberta,Canada Received 28 May 1969 The problem of estimating physical spectra from experimental pulse-height distributions is analyzed by means of Fourier transform techniques. After the existence of optimum numbers of degrees of freedom to be assigned to the input data and the spectral estimate is established, the problem is reformulated in terms
of the Fourier transform of the input data. The estimation of stable, smoothed spectra from real, statistical data then becomes a computationally simple procedure requiring no a priori spectral information. Several examples are given.
1. Introduction The relation between a pulse-height distribution, p(x), and the spectrum, n(E), from which it is derived may be expressed by the integral equation
Two special cases have arisen often enough to merit discussion. If s = t, then
p(x) =
o
,(E). r(x,E)dE,
Q*- Q-l, z = 0,
(1)
provided that Q is non-singular. I f s > t, then
where r(x,E) is the detector response function, i.e., the normalized distribution in x arising from a delta function at E, multiplied by the total detector efficiency and geometry factor. A straightforward numerical solution requires that eq. (1) be replaced by a set of linear equations. This may be accomplished by means of a substitution of the form
n(E) = ~ njqj(E), j=l
(2)
where qj are a set of suitable functions. Since p(x) is usually available as a discrete set of s numbers, {Pi}, integration over channel width, Ax, results in the expression
Pi = ~ Quni,
i = 1 , 2 .... s,
i=l
where ~--~ij
=f
Axl
f Eqj(E)'r(x,E)dEdx.
The solution of eq. (1) is thus reduced to the solution of the matrix equation
p = Qn,
(3)
where Q is an s x t matrix. The best estimate of n in the sense of least-squares is then given by 1)
= O t p + z,
(4)
where Q , is the pseudoinverse of Q, existing for any matrix, and where z is an arbitrary column vector orthogonal to the column space of QT, the transpose
of Q.
Q , = ( Q T Q ) - 1. QT,
provided that Q T Q is non-singular. The first case occurs in the " m a t r i x " method of spectrum unfolding 2) in which the solution is required to have the same number of channels as the pulse-height distribution, and the second case typically arises when eq. (2) is a polynominal expansion 3) of degree t < s. An objection to such methods is that s and t, the number of degrees of freedom assigned to data and solution respectively, should properly be regarded as measures of their information capacity and thus be related through the response function. In practice, they are essentially arbitrary and related only by unnecessary constraints such as the customary implicit requirement that s > t. A more serious difficulty is that solutions obtained using real, "noisy" data exhibit such violent oscillations that they are almost always wholly useless. Such solutions are nonetheless least-squares estimates and when substituted in eq. (3) will closely reproduce the data. It follows that the spurious information introduced by the statistical noise greatly outweighs the real information content of the data. Several methods have been devised which, using eq. (3) as a starting point, attempt to overcome this problem by the use of statistical procedures minimizing a quantity other than the sum of the squares of the residuals3),
( p - Q~)~ ( p - Q~), or by approximating the solution matrix and in-
349
350
GREEN
D.W.
corporating the a priori requirement that solutions shall be non-negative4). These methods provide smoothed estimates and are difficult to criticize except upon the general ground that they depart considerably at an early stage from the established methods of linear estimation. Although solutions may be esthetically acceptable in that oscillations are minimized, their precise relationship to n(E) is not entirely clear. In this paper, an analysis resting upon reformulation of eq. (l) by Fourier transform methods is presented. In addition to providing a rational basis for the selection of s and t, it renders the problem of noise susceptible to attack by standard estimation procedures. 2. Frequency domain representation In the following, functions of x and E and their Fourier transforms will be designated by lower and upper case letters respectively. The term Fourier transform will be denoted by the symbol FT. The transform equations will be used in the symmetric form
c(f) =
g(x) e-i2"ZXdx, -oo
G(f)e+i2"IXdf.
g(x) = -oo
Also useful is the convolution theorem s) which states that if
w(x)=y(x)'z(x),
Substituting this expression into eq. (5) yields
P(f) = where
R'(I,¢) = 2
Y(f-¢).Z(¢)dga,
o0
n(E).R(f,E)dE,
p(x)cos(2
(5)
fx)dx,
0
may be regarded as the FT of an even function Re(f,E) which is identical to R(f,E) for E > 0. 3. Reduction to numerical form Since the FT of any function other than a 6-function tends to zero in some fashion with increasing frequency, there exists a value 4~o beyond which R'(f,q~) is negligible. Consequently, eq. (6) may be well-approximated by
e(f) =
N(dp)R'(f,q~)dcb,
regardless of N(~b). There is then little point in seeking N(~b) for I~l> ~o and it may for all intents and purposes be replaced by N'(c~), a function identical to N(~b) for 14~1< 4~o and zero thereafter. Eq. (5) then becomes
P(I) =
n'(E). R(f,E) dE,
(7)
f
#o N'(~b)cos(2nq~E)d~b,
is a band-limited (i.e., smoothed) version of n(E). The integrand of eq. (7) is now a band-limited function as may be verified by expressing its FT in the form f+oo N' ( q ~- y) .R' ( f ,y) d y, -oo
by means of the convolution theorem. It is easily seen that this expression is negligible outside 0 < I~ [ < 2~bo. As a consequence of the sampling theorem in a form appropriate to integrated functions (appendix), if(E) and R(f,E) need then be known only at intervals dE_< 1/(2¢o),
and where
R(f,E) = 2
¢E)dE,
,/0
0
p(f) = 2
g(/,E)cos(2,
n'(E) = 2
where y and z are any two functions. The theorem also holds if lower and upper case symbols are interchanged. For convenience, p(x) and r(x,E) will be considered to be even functions of x. The FT of eq. (1)with respect to x is given by
f
(6)
0
- oo
where
N(q~).R'(j;~b) d~b, 0
where
+ctz
p(f) =
f
~
0
then
W(f) =
n(E) = 2 ) o
r(x,E)cos(2rrfx)dx. 0
For purpose of discussion, a further transformation with respect to E is desirable. If r is defined to be zero for E < 0 then n(E) may be defined to be even so hat
in order that eq. (7) be equivalent to
e(f) = ~., n'(Ej)R(f, Ei)AE. J
(8)
The discussion will be restricted to cases where p(x) vanishes, at least in a statistical sense, beyond-some
ESTIMATION
OF SPECTRA
FROM
x-value, c~. Otherwise the data are a sample from a distribution of indeterminate length and cannot be analysed unambiguously. P(f) is then clearly a bandlimited function and need be specified only at intervals
Af <
1/(2e).
Moreover, a corresponding E-value,/~, beyond which
n(E) must vanish may be estimated by inspection from r(x,E). Although n'(E) is a band-limited function and thus cannot vanish over any finite interval, it is a close approximation to n(E) and must rapidly become negligible for E > / L If fo is the value o f f beyond which P(f) is negligible, then eq. (8) becomes t
P(f~)= 2
j=l
n('Ej).R(f~,Ej)AE,
i = 1 , 2 .... s,
(9)
where
s =-fo/Af= 2 fo, and t -
E =
are the number of degrees of freedom assigned to and n'(E) respectively. Values of fo and ~bo may easily be estimated from the response function. For example, in the important case where Gaussian smoothing is imposed upon p(x), P(f) will be dominated at high frequencies by contributions from the narrowest peak in p(x). If a is the standard deviation of the narrowest possible peak in the region of interest, then a' = 1/(2na) is the standard deviation of the Gaussian envelope of its FT so that any P(f) and thus R(f,E)are effectively cut off beyond 3 or 4 times a'. A good estimate offo is then given by (say)
P(f)
fo = 2~(ha). If r(x,E)is written as the product s(E).g(x,E), where s(E) is the total detector efficiency, then if s is a slowly varying function it will in most eases be adequate to take q~o =fo. If on the other hand, s is a rapidly varying function, ~bo must be taken somewhat larger. Under such circumstances, a rough and ready estimate of q~o may be obtained by the methods discussed and wellillustrated in 5) for determining the gross characteristics of an FT. There are several problems which may arise. Since r(x,E) is to be terminated near E = 8, it is necessary to ensure that it goes smoothly to zero and is not truncated, since the high frequencies associated with the resultant jump discontinuity decrease only as 1/~. Since/~ has been chosen to be the upper bound of the region of interest, a smooth termination may be added
EXPERIMENTAL
351
DISTRIBUTIONS
in a way that minimizes ~bo without introducing any error into the solution. In addition, the extremely rapid variation of some response functions near E = 0 would require an inconveniently small value of AE for adequate sampling in eq. (8). The value of t may then be kept to a reasonable size only ifr(x,E) is smoothed locally. This is most simply accomplished by undersampling R(f, Ej) at intervals AE which are too coarse below some E-value, flo. The required smoothing is then automatically imposed with the consequence that spectral estimates below flo are unreliable. An internal jump discontinuity at some value of E such as would arise from the presence of an absorption edge in s(E) or from the onset of fluorescent X-ray escape effects, also results in the introduction of some error into the procedure. If the discontinuity is smoothed over a small interval, the resultant error in the estimate of n'(E) will be essentially confined to the region of the discontinuity and will be insignificant elsewhere. Provided that there are no prominent features in n(E) at that point, errors in practice will not be discernible. A convenient method of smoothing, applicable to interior discontinuities, consists of replacing two adjacent points, one on each side of the discontinuity, by a weighted average of that point and its two neighbors using weights of ¼, ½ and ¼. This is a good approximation to localized frequency filtering using a "Hanning" filter 6) having a cutoff at 4)= 1/2AE. A similar procedure may be applied to the "discontinuity" at E = fl and will increase fl by 2AE. 4. Noise and data transformation If p(x) is regarded as the idealized, noise-free, even function corresponding to k real events, then the raw detector output is a random sample of k positive, independent values of x from a population with probability density
po(X) =-p(x)/k. The even data function is then represented by k
p(x) =
Z
j=l
the FT of which is found directly to be k
P(f) = 2 E cos(2nfxj). j=l
The expected value of
cos(2nfxj)
is given by
(10)
352
O.W. GREEN Coo ( c°s (2rcfxs)> - J o po(x).cos(21zfx)dx = ½Po(f),
so that
(P(f)> = k. Po(f) = P(f).
(11)
Eq. (10) is thus an unbiased estimator of P(f) for allf. Since data are collected in channels of width Ax' and consists of a set of m numbers, {p~}, an alternative estimator is needed. The contents of the i th channel are given by
p, = j p(x). w(x,- x)dx, where
w(xi-x)=l, = 0,
x,-½Ax'
is the channel response function. Equivalently, {Pi} may be obtained by sampling the function
pz(x) -
at intervals Ax'. Applying the convolution theorem to the last expression, we obtain
el(f) = P(f)" W(f), where (12)
The FT of the sampled function is given by (appendix)
Pdf) = 2 ~ psCOS(2nfxj)Ax', j=l
(13)
and this is equal to Pl(f) in the range Ifl fn. This condition cannot be satisfied with real data since from eq. (10), P(f) is not negligible for Ifl > any f~ for a finite number of events, and since W(f) decreases only as 1/f. P2(f) is then aliased, i.e., components of Pl(f) in successive frequency bands of width 2f~ are superposed upon those in the band 0 < Ifl
P'(f) - P2(f)/ W(f)
P'(f) = {2rcf/sin(rcfAx')} ~ pscos(2~fxj)Ax',
(14)
always provides a noisier estimate than eq. (10) and this effect becomes more pronounced as Ax' increases. The common practice of increasing channel width to improve the statistical quality of poor spectral data thus has the opposite effect. The apparent improvement is due to a reduction of high frequencies at the cost of adding spurious low oaes which are not easily recognized and which do not offend the eye.
(15)
j=l
where Ax' should approach Ax o as closely as possible, regardless of the quality of data statistics. The covariance matrix of the input noise has elements ~kis = c o v [-P(fi), P(fj)] = (P(f,).P(fj)> - < P(f,)> ( P(fj)>. (16) Using expression (10) for P(f), a straightforward application of the definition of expected value leads to the result
( P(fi)" P(fj)) = k. Po(fi - f j ) + k. Po(fi +fj) + + k ( k - 1).
p(y)" w(x - y)dy, -oo
W(f) = sin(rcfAx') / Otf).
The ideal estimate eq. (10) is of course unobtainable since events may be located only to within an interval Ax o determined by the overall electronic system resolution. From eqs. (12), (13) and (14), the best practical estimate of P(f) is then given by
eo(f,)'eo(fj),
and the last term of eq. (16) may be obtained using eq. (11). If k is now allowed to be a random variable from a Poisson distribution with mean value N, k and k ( k - 1) must be replaced by their expected values, N and N 2 respectively, so that
~klj = N'Po(fi - f j ) + N'Po(fi + fj) = V(f, - f j ) + P(fi +fj). The fractional standard deviation of P(f~) is given by
(AP/P)i = (¢u)½/P(f,) = {2 + Po(2fi)}½/{ N" Po(f,)} ~. Since P(f) tends to decrease rapidly as f increases, (AP/P)~ is typically small at low frequencies and orders of magnitude larger at high frequencies, so that in general, low frequency estimates are good and the information in high-frequency estimates is virtually non-existent. 5. Spectral estimation In matrix notation, eq. (9) becomes P = Rn'.
(17)
If P is replaced by P' from eq. (15), fl obtained as in eq. (4) will exhibit the familiar signs of instability. This effect is moreover not confined to high frequency components of fl since it cannot be eliminated by low-pass filtering. This implies that the high-frequency noise in P' is cross-coupled by R to low-frequency components of n'. Improved results would in principle be obtained by
E S T I M A T I O N OF S P E C T R A FROM E X P E R I M E N T A L D I S T R I B U T I O N S
the use of a minimum variance estimator which is not identical to the least-squares estimator since the covariance matrix d/is not in the form of a constant times the s x s identy matrixV). However, the true covariance matrix needed for this calculation is not itself wellknown so that such estimates are unrealizable in practice. An estimate of n' may still be obtained by the following device. If all but the first h rows of P and R are deleted from eq. (17), the resulting subsystem
Ph = Rhn'
(18)
expresses a relation between n' and the low-frequency elements of P which serve to determine the gross features of p(x). For some choice of h < s, eq. (18) will retain most of the available real information about P contained in P' without including the spurious high-frequency elements. The appropriate estimator of n' is then (~)n = R~P~,+z.
353
where ¢/h is the submatrix of 0/ remaining after deletion of all but the first h rows and columns. It will usually be necessary to consider the case h < t. Then R~ = RT(R h R T ) - ` The situation where (RhR~ is singular occurs when the rank of Rn is less than h and is rarely encountered in physical applications. If h < t, z is no longer restricted to being the null vector. However, since R h ' Z = 0,
z makes no contribution to P~, plays no part in determining the gross features ofp(x), and by implication contains only high-frequency components. Hence if z is taken to be zero, (~)h and n' will differ in detail only so that a low-pass filtering procedure applied to each will lead to the same result. Such a smoothed approximation is no longer a least-squares estimate, but is simply related to n' and easily interpreted. The optimum filter cutoff frequency, ~bc, depends upon the type of filter used. For the Hanning filter, whose convolution equivalent is given by
The best choice of h will be dependent upon r(x,E) as well as data statistics and is not in general predictable. The procedure is then one of cut and try, using progressively larger values of h until the errors in (~)h are no longer tolerable. These may be obtained from the output covariance matrix 8)
d(E,~b¢)= ½d'(E) + ¼[a'{E- (2q5c)- a} + d'{E + (2q~)-1 }], d'(S) = sin (2rc~bcE)/(he),
Oh= R~n(R~) T,
a rule of thumb is that ~bc =fh. The fwhm of the resul-
800
Q
d <
I 0
40C
.,
."
%
k 0 0
1
.... t.~•k
f.
+
I
too
t
i %''-.
J_ 200
CHANNEL NUMBER
Fig. 1. Simulated pulse-height distribution due to 5.6 keV photons. Resolution of photopeak is 18% and the channel width is 0.04 keV.
354
D.W. GREEN
tant residual resolution is 1/~bc, and the smoothed estimate is = d. (A)h.
The ij th element of the smoothing matrix d is given by d u = d ( E , - E j , ,~o),~E,
o = 0.19 E ½,
and the covariance matrix of ~ is obtained from 0 = d0h dT.
(19)
Other useful filters may be found in refs. 6,9). 6. Results and discussion
Fig. 1 shows a Monte Carlo simulation of the response of a hypothetical soft X-ray detector to line
2(3-
1C:
0 ":---'--
O
IT
_1
I
I~
30
x
+ :z
radiation at 5.6 keV. The composite escape peak is due to fluorescent X-rays with energies 4.1, 4.4, 4.5 and 4.6 keV having intensities relative to the incident intensity of 0.05, 0.1, 0.15 and 0.15 respectively. The detector resolution function is Gaussian with standard deviation
corresponding to approximately 18% resolution at 5.6 keV. The distribution contains 25000 counts total collected in channels of width 0.04 keV. The response function is not well-behaved near E = 0. Consequently, it was assumed that the lower limit of energies of interest was 1 keV so that the minimum value of a encountered was taken to be 0.19 keV. It was assumed also that P(f) was negligible beyond 3.5 standard deviations of the corresponding frequency Gaussian so that f0 = 2.93/keV. Elsewhere, the discontinuities were smoothed by the methods given at the end of section 3. The full response matrix was 45 x 37. Fig. 2 shows the smoothed spectral estimates obtained using several sizes of submatrix. The residual resolution of 13.8% at 5.6 keV obtained for the 22-row submatrix represents a considerable improvement and the escape peak has been completely removed. The negative excursion adjacent to the peak is characteristic of the Hanning filter used and is not indicative of instability. The latter, clearly present in the estimate obtained using the 24-row submatrix, is accounted for by the representative errors shown. These correspond to one standard deviation and were derived from the diagonal elements of 0 given by (19). Fig. 3 shows a pulse-height distribution generated from a spectrum of the form
n(E) oc e -½E, 40-
'2 0
1
2
3 4 ENERGY, keV
5
6
7
Fig. 2. S m o o t h e d spectral estimates for distribution o f fig. 1
corresponding to 20-, 22- and 24-row submatrices in that order from top to bottom.
E < 6 keV,
= 0, E > 6 keV, corresponding to 150000 events incident upon the same "detector" used in the previous example. The smoothed estimate obtained with a 16-row submatrix is shown in fig. 4. The depression below 1 keV, amounting to 21% at E = 0, is too large by roughly a factor of two to be accounted for by smoothing alone. The discrepancy is traceable to the assumption, clearly invalid in this example, that the lower limit of interest was 1 keV. This region of the spectrum should thus be disregarded. Otherwise, the estimate is in excellent agreement with the incident spectrum. Computing time requirements vary with the size of the submatrix used, but tend to be modest, averaging less than one minute per estimate on an IBM 360/50 for the examples given.
ESTIMATION OF S P E C T R A FROM EXPERIMENTAL D I S T R I B U T I O N S
• t
"
355
s
#. z z < "r (o
Z 0
++ % •
-.
10
I
I
I
0
I
I 1O0
i
I
.
L
I 200
C H A N N E L NUMBER
Fig. 3. Simulated distribution due to a truncated exponential spectrum. The detector response and channel width are the same as for fig. 1. For purposes of comparison, the solid line shows the incident spectrum scaled to channels of the width used.
"%
Z
Po lo' >:
Z LU
z
i 0
1
2
3
4
ENERGY,
keV
5
6
7. Conclusions The application of transform analysis to the problem of spectral estimation has provided criteria for optimal choice of the number of degrees of freedom to be assigned to the data and the solution. Reformulation of the problem in terms of the Fourier transform of the experimental distribution has shown that the number of usable degrees of freedom assigned to the data is drastically reduced by the presence of statistical fluctuations. In the transform domain, the tendency towards concentration of the noise at high frequencies permits a computationally simple estimation procedure which uses only the low-frequency components of the data and an abbreviated response matrix. This introduces an element of arbitrariness into the high-frequency detail in the estimate so that only smoothed results are physically meaningful. Such residual resolution appears to be a common feature of all unfolding techniques which yield reasonably stable estimates. However, in the present method, the smoothing is independent of energy and a fairly wide
Fig. 4. Smoothed spectral estimate for the distribution of fig. 3. A 16-row submatrix was used and the fwhm of the smoothing is 1.07 keV. The incident spectrum is indicated by a dashed line.
356
D.W. GREEN
selection of smoothing functions is available. In cases where removal of the effects of complicated detector response is of p a r a m o u n t importance, smoothing is of little consequence. Although helpful in simplifying distributions generated by line spectra, the technique will probably find its greatest use in the analysis of continuous spectra. The author wishes to thank Professor B. G. Wilson for his encouragement and interest during the course of this work. The financial assistance of the National Research Council of Canada is gratefully acknowledged.
exactly from a set of samples at intervals Ax provided that
Ax <=½fo.
I f this condition is not satisfied it is impossible to determine G(f) from g'(x) and G'(f) is then said to be aliased. If, however, G(f) is known to decrease rapidly for [f[ > f 0 , then G'(f) given by eq. (A2) may be made as close to G(f) as desired by taking Ax sufficiently smaller than ½fo. This result was used in obtaining expressions (9) and (14). In cases where it is necessary to integrate a bandlimited function, eq. (A5) is unnecessarily restrictive. F r o m eq. (A1), the integral of g'(x) is
Appendix The following results reflect the well-known sampling theorem of communications theory and may be found in one form or another in the literature relevant to that subject. For reasons of convenience, they are given here in a coherent form suited to the purpose of the foregoing discussion. Given a function g(x) known only at a discrete set of values, (xj}, separated by Ax, we wish to obtain the transform,
I =
f+co -
+co
g'(x)dx = E g(xi)dx"
oo
-o0
I f eq. (A3) is transformed term by term, then
g'(x) = g(x) + ~ g(x)e ±2~'j2f'x.
j=l
An alternative expression for I is thus given by I =
g x e-i2~ZXdx.
f+co ()
+co
The known values may be represented by the function
g'(x) = g(x)Ax E
6(x-xj).
(A1)
Transforming eq. (A1) directly, the result is
+co c'(f) = E g(xj) e-i2 f' x.
(A2)
-co
Another expression for G'(f) may be obtained by applying the convolution theorem to eq. (A1).
G'(f) =
f+co G ( f - c ~ ) . +co Z cS(~b-j.2f,)d~b -co
-co +co
= G ( f ) + ~ G(f+j'2fn ),
(A3)
j=t
where f~ - ½Ax. If G(f) is cut off at Ifl = f o
+co G'(f) = G ( f ) = ~ g(xj)e-i2~fxJAx,
j=l
The condition necessary for
-co
+oo
G(+j.2fn).
g(x)dx + -c~
G(f) =
(A5)
Ill < f , ,
(A4)
-o0
t r o m eqs. (A2) and (A3). Hence the FT of a function which is band-limited at some fo may be determined
I = ~, g(xj)dx = -o0
f+co
g(x)dx
-co
is now seen to be that G(f) - 0, [fl > 2fn. The integral of a function which is band-limited at some fo may thus be determined exactly from a set of samples at intervals Ax provided that Ax < 1If o, an interval twice as large as the one permitted by eq. (A5). This result is used in obtaining expression (8).
References 1) R. Deutsch, Estimation theory (Prentice-Hall, New York, 1965) p. 82. 2) j. S. Pruitt and H. W. Koch, Methods of experimental physics (ed. L. Marton; Academic Press, New York, 1963) p. 520. a) j. T. Grissom, D. R. Koehler and B. G. Gibbs, Nucl. Instr. and Meth. 45 (1966) 190. a) D. D. Slavinskas, T. J. Kermett and W. V. Prestwich, Nucl. Instr. and Meth. 41 (1966) 341. 5) R. N. Bracewell, The Fourier transform and its applications (McGraw-Hill, New York, 1965). 0) R. B. Blackman and J. W. Tukey, The measurement of power spectra (Dover, 1959) p. 95. 7) C. R. Rao, Research papers in statistics, Festschrift for J. Neyman (ed. F. N. David; Wiley, New York, 1966) p. 263. s) R. Deutsch, op. tit., p. 29. 9) G. M. Jenkins and D. G. Watts, Spectral analysis and its applications (Holden-Day, 1968) p. 243.