NUCLEAR
INSTRUMENTS
AND METHODS
IO0
(I972) 317-324; ©
NORTH-HOLLAND
PUBLISHING
CO.
AN A U T O M A T I C P E A K - E X T R A C T I O N T E C H N I Q U E A. ROBERTSON, W. V. PRESTWICI-I and T. J. KENNETT
McMaster University, Hamilton, Ontario, Canada Received 21 September 1971 A simple additive correlation procedure is proposed as a tool in the automatic examination of spectra for peak centroid and intensity information. The capabilities of the technique are illustrated in the analysis of data collected during a study of thermal neutron capture in argon.
1. Introduction The advent of the small dedicated computer has stimulated interest in the possibility of handling spectral data automatically. In recent years several scientific groups have exerted considerable effort towards achieving this end1-13). In all instances data have been smoothed directly or indirectly, and in many routines optimum fitting to either line-shapes or mathematical functions is invoked. However, some use is also made of convolution 2'7'9' 12) and filtering procedures 3-5). Further, several peak-finding methods are reportedl't°-13). Undoubtedly m a n y of the computational codes employed yield results of reasonable accuracy but most require considerable processor time and access to a large memory unit. The technique delineated below is designed to be fast, efficient, and applicable to small computers. It uses a zero-area fold-in function 7) to extract signal information. 2. General outline The correlator F(x) is chosen to be constituted of a symmetric function, which resembles somewhat a typical line-shape, less a constant in order to achieve a total area of zero. Consequently constant and first order background components do not contribute to the correlation spectrum. However, since most spectra involve a slowly varying background B(x) about the desired signal S(x), such a fold-in function is sufficient to generally zero all background content. The correlation spectrum therefore reduces to k+n
C(xk)= Z S(x,_k)[S(x,)+B(x,)]
2.1. OPTIMIZATION The cancellation of a signal background will in fact be subject to statistical variations which mask the correlation. The resulting non-zero expectation value of the correlation background b(x) can be examined as follows: k+n
b(xk)=
Therefore, the variance in b(x) is given by
( bk 2 2 a2k = ¢=k-,E \ OBff %` k+n i=k-n k+n
2
~=k-n
= Z
I
k+n
R(xk) = i
"-]2
k +,
BE
e _k
i=k--n
can easily be shown, through application of the Schwartz inequality relationship, to be maximized for F(x) = S(x). Hence optimization m a y be obtained by using an exact mapping of the particular spectral lineshape as correlator. However, in this proposal there is the additional constraint of zero area. That is,
F(x) =- Y ( x ) - K(a constant), where 1
~=k--n
where the complete width of the correlator is defined as extending over 2n + 1 channels.
FL .
The correlation signal to noise ratio defined as
i=k--n k+n
F(xi-k)B(xi).
Z i=k--t~
K - 2n+l
k+n
Z
i=k=rt
r(x,),
thus providing a correlator with negative wings whose 317
318
a. ROBERTSON et al.
total area of content is exactly equal to the remaining positive area. One must conclude that maximum signal to noise is achieved for Y(x) ~ S(x) if K is relatively small, which in turn implies that the limits of summation should be wide. This is verified later in fig. 4(a) where one sees that the signal to noise ratio increases with increasing width of the negative portion of the correlator used. Since this points to use of an infinite width, no real optimum is achievable. However, it should be realized that fig. 4(a) is meant to illustrate trends for a peak whose background varies negligibly across the width of the peak. In reality, since one might ascribe a slope to the background below a signal, an optimum does exist but the associated correlator width would be much greater than could be conveniently used by the analyst. An alternate picture of optimization can be visualized in the frequency domain. Standard textbooks t*) show that the matched filter is given by
n(w)
already knows the signal and noise content of the original spectrum, in which instance, there is no need to filter. In fig. 1 is shown an optimum filter typical of Gaussian peaks on an exponential background. Only the general shape of the filter need be noted, although the parameters of the Lorentzian and Gaussian involved have been chosen to be reasonably typical of a gamma-ray spectrum collected through use of a Ge(Li) spectrometer. 2.2. SHAPE DISCRIMINATION In dealing with gamma-ray spectra for example one often wishes to discriminate against "Compton edges" or the similarly shaped low energy cutoff. These undesirable components are clearly revealed to have shapes after correlation different from that of spectral peaks by considering the following analytic example. Assume a line-shape to be of the form Sp(X) = s i n 2 rex 2 '
,q* (w) o--jwt°
'-'+ O.(w)
0 < x <--2,
,
= 0,
where G.(w) represents the noise content. I f one now disregards the phase involved, this reduces to
x<0;
x>2.
A possible zero-area fold-in function is consequently given by
f(x) = sin 2 rex
IH(w)] = IS(w)[
2
Thus the true optimum filter can be found if one
= --
1 2
½COS7CX,
2.0
1"5
F2
--.Fi
0
~'".,~
F=FI/F2
I
I
I
2
I
5 5 w/'rr
--
4
5
6
LI
7
Fig. 1. Illustration o f a typical o p t i m u m filter [solid line) for a Oaussian peak on an exponential background. The dotted c o m p o n e n t parts o f the filter represent Gaussian and Lorentzian distributions which are respectively the Fourier transforms of the Gaussian signal and the exponential noise.
AN AUTOMATIC
PEAK-EXTRACTION
where ½ represents the constant subtracted to achieve zero area. The correlation is then 1
I
I
_,f2
Ct,(x') = -2- ~'-1 c°sn(x- x' + l )sin2 -2-~rxdx, which after integration reduces to
I'O
(a)
x
Cl,(x') = -~1
sinrcx, + zc(3_ x,)cos~x, l ,
l-x< '<3.
A plot of C(x'), which is symmetric about x ' = 1, is shown in fig. 2(a). The result of similar calculations for a "Compton edge", approximated by
0.5
0"0
Sc(x) = ~
v
319
TECHNIQUE
2
1
sin rex
x 2 '
- 2_< x_< 0,
= 1,
x<-2,
= 0,
x>0,
is shown in fig. 2(b). A graph of C(x) for a low energy cutoff would be similar to that for a "Compton edge" but would be inverted.
0
-2 2 . 3 . NEAR OPTIMIZATION -4
i
i
-I
0
I
i
2
3
X I'0 (bl
x
0"5
O~
0"0 6 4 x
2
U
?o -2 -4 -6 -3
-2
-I
0
I
,
Fig. 2. S h o w n are (a) a typical line-shape, a n d (b) a " C o m p t o n edge", before a n d after correlation with a zero-area fold-in function. T h e width o f t h e positive signal in t h e correlation s p e c t r u m is defined for future reference as Z W .
In dealing with line-shapes, one usually resorts to a mathematical model such as a Gaussian. Although most mathematical models yield excellent results, when used as correlators, they suffer from the disadvantages of requiring floating-point arithmetic and the consequent increase in central-processor time. Further, they are not conveniently adaptable for use in a small computer. A correIator which overcomes these difficulties is the zero-area rectangular fold-in function shown as an insert in fig. 4(a). This merely involves the addition and subtraction of data points. The adequacy of this correlator is evident if one visualizes the convolution in terms of digital filtering. In fig. 3(a) and (b) are shown the filters achieved by use of a zero-area Gaussian and a rectangular fold-in function respectively. One should notice the gross similarities. The representations, for the rectangular function, of sensitivity and resolution are also shown, in fig. 4(a) and (b) respectively. In fig. 4(a) the dotted line illustrates the asymptotic trend of the position of optimum signal to noise as the lower half-width (LW) of the correlator is increased. For the curves shown the maxima correspond to U W = 1.215o-, 1.2880- and 1.3210-. When the lower half-width reaches 15o-the maximum is given by 1.3790- and the rate of change with LW is less than one part in a thousand. In general, fig. 4 indicates that as the lower half-width of the correlator is increased the signal to noise ratio increases but the resolution decreases. However, the speed of execution of this convolution by a dedicated computer affords the
320
A. ROBERTSON et al.
"~0 - 5 ",
g r o u n d estimate a n d use o f floating-point arithmetic. F o r t u n a t e l y these necessities are o b v i a t e d by a timely p u b l i c a t i o n by InouyelS). H e has r e p o r t e d t h a t i f one uses the square r o o t o f the original s p e c t r u m the s t a n d a r d deviation associated with the contents o f each channel will be asQ = 0.5 a n d the signal to noise ratio following filtering will be e n h a n c e d by a factor o f four. T h e square r o o t spectrum m a y be calculated by m e a n s o f an integer r o u t i n e with r o u n d off to the nearest integer thus p r o v i d i n g a m a x i m u m error o f one h a l f o f one event. It is therefore s t r a i g h t f o r w a r d to calculate the variance a~ i n t r o d u c e d into each p o i n t in the c o r r e l a t i o n spectrum. F o r a fixed c o r r e l a t o r this
(a)
o.o
I.,o I.o~- -~'_-'/~N
cb) F=F~+Fa
0"5
". I0
"~ o-o
"
......... - - - - ~ ~
(a) 8
/F
-0,5
iu~w 6 LU CO
0
I
2
3
4
5
6
7
5 w/Tr
Fig. 3. Illustration of filters (solid curves) and their component parts (dotted ]ines) associated with (a) a zero-area Gaussian correlator and (b) a zero-area rectangular correlator. In (a) F1 and F~ are derived as the Fourier transforms of the narrow Gaussian and the wide negative rectangular parts of the correlator respectively, whereas in (b) Ft and Fe correspond to the transforms of the narrow positive rectangtflar part and the wide negative rectangular part. In each case the correlator half-width extends to five standard deviations of a typical Gaussian signal. The positive rectangular half-width in (b) is chosen for optimum signal to noise ratio as per fig. 4(a).
analyst c o n s i d e r a b l e flexibility in dealing with ind i v i d u a l spectra o r in establishing a general p r o c e d u r e p e c u l i a r to his wishes.
~4
---3
~.
4o-
5o"
/----
i i
.J
ii/I 2
i/t 0
8~
"- 4 b (b)
N 2 2.4. AUTOMATIC EXTRACTION I n establishing an a u t o m a t i c extraction, r o u t i n e one is faced with the p r o b l e m of identifying the c o r r e l a t i o n noise b a n d in o r d e r to discriminate on b e h a l f o f valid signal content. H o w e v e r , since the b a c k g r o u n d in the original s p e c t r u m varies c o n s i d e r a b l y across the extend o f the spectrum, the c o r r e l a t i o n noise b a n d will also vary. O n e c a n c o n v e r t this noise t o a c o n s t a n t b a n d width by dividing each c o r r e l a t i o n channel c o n t e n t by the square r o o t o f the b a c k g r o u n d associated with the same channel in the original spectrum., This is unsatisf a c t o r y since it requires b o t h the detection o f a b a c k -
0
I
2 UW(o- unils)
3
4
5
Fig. 4. (a) This shows the variation of signal/noise (sensitivity) with the parameters involved ill a zero-area rectangular fold-in function when convolved with a Gaussian signal of standard deviation o. The dotted line show's how the position of optimum sensitivity varies with increasing width of the correlator. (b) This illustrates the variation of width of positive signal (ZW) in the correlation spectrum with the parameters of a zero-area rectangular fold-in function. This in turn gives an indication of resolution achieved (actually I/ZW).
AN AUTOMATIC
PEAK-EXTRACTION
variance will be constant and therefore need only be calculated once. One can then choose a noise discrimination level in terms of ac. For example, one might wish to discriminate against the possibility of one high channel in 4096. In terms of a Gaussian distribution of statistics this 99.756% probability of the detected signal being a peak is provided for by setting a level at 3.4873 ac. However, we should not lose sight of the fact that the correct initial distribution was e - ~2y P(m,2) = - cSy,,,, y[
m = 0,1,2 .....
and we have transformed it, by taking the square root of the spectrum, to
P(~c/m, ~/2) = e-y2! ~)Y~ c~y,,/m'
m = 0,1,2,o...
In this new distribution the 99.756% confidence level is achieved before reaching 3.4873 Gc. The 3.4873 o-c is actually approached from below as the mean value of the distribution increases thus ensuring a higher confidence level than required, by treating the distribution as Gaussian in form. It should be noted that the above treatment for a fixed correlator may also be readily applied to a variable correlator. The realization that signal description (a) in some instances varies considerably across the extent of a spectrum may make such a correlator desirable. Once again, floating-point arithmetic may be avoided by either periodically altering the correlator parameters or more simply by using a different correlator for several passes through the spectrum. The latter is made practical by the speed of convolution that can be achieved. For example, using a PDP-15 computer and a correlator with art upper width of two channels and a lower width of eight channels a 2K spectrum can be correlated in less than one second. Now, keeping in mind that under this method of correlation all odd moments disappear, it is a simple matter to determine the centroid positions, of the consequently symmetric peaks produced, by finding the centres of gravity for the respective positive points in the correlation spectrum. However, since the squareroot spectrum has been chosen for convenience of noise treatment, areas are not easily determined from the correlation spectrum. Therefore, in keeping with the philosophy of simplicity, a threepoint smoothing search routine 13) was devised to locate average background content, on either side of the centroid, for linear interpolation. The search can be conveniently conducted in the square-root spectrum where the
TECHNIQUE
321
standard deviations are constant. This procedure has has proven to be successful except in instances where peaks overlap.
3. Application In general, it is difficult to test the reliability of a procedure unless one has fore-knowledge of the correct results. For that reason an analysis was conducted of a formed spectrum with Gaussian peaks on an exponential background, firstly without, then with statistics incorporated. Invariably the centroids of detected peaks showed errors consistent with expected statistical deviations. Furthermore, in dealing with the spectrum without statistics, the area of all peaks detected were found to better than 1%. The non-linearity of the background can account for such an error. The error detected in areas extracted from the spectrum with statistics proved to be consistent with the inherent significance of the data. As a further test, more relevant to real spectra, data collected from a triple-coincidence spectrometer during an experiment on thermal neutron capture in natural Argon was re-analysed. The data, as well as the result of the action of the correlator are shown in fig. 5. In the correlation spectra of fig. 5(a) and (b) a noise discrimination level has been applied whereas fig. 5(c) shows the more complete correlation picture for a section of the original spectrum in fig. 5(b). One should take particular note of the noise content and the lineshapes illustrated in the correlation part of fig. 5(c). The system was calibrated for zero, gain and linearity using the 15N gamma-ray energies listed by Greenwood16,17). In the previous analysis the area content had been calculated by subjectively deciding the limits for a linear background fit. A comparison of area results is shown in table 1. The average deviation of the programmed method from the subjective analysis is found to be - 2 % . Part of the "error" in results may be attributed to nonlinearity of background, small peaks in the shoulders of large ones, and difficulty in working with pair-peak line-shapes. The large 12% deviation found for peak no. 31 is also due to its being a very wide composite peak with poor statistics. In general the neutron separation energies of table 2 indicate very good extraction of peak centroids. The value obtained for 41A shows excellent agreement with that listed by LycklamalS). Although the average separation energy for 37A differs from that reported by Hardel119) by - 1 . 2 keV it should be noted the cascade sums calculated in the present work show greater internal consistency. In addition, Skeppstedt 2°)
0
z
I--
"r
z
d I,LI
t o°
io t
! o ~'
io 3
IO 4
i05
o
50
IOO
150
200
250
2000
3000
4000
102L~-
oi
o
,2
.... J, k JJ, I
hL
[,
(b)
I000
CHANNEL
2000
r
NO.
3000
,oo, j, l 1,
2O0
3OO
4OO
I
4000
I,
2310
2 OK
4d
6o"
8 o~
-2O
0
20
40
2550
2390
I I
2430
.,
(c)
Fig. 5. The result of zero-area correlation with (a) an A(n,y) pair-spectrum, (b) an A(n,?) singles spectrum and (c) the portion of the singles spectrum delineated in (b) for illustration of treatment of line-shapes and "Compton edges". In all cases the correlation spectra are shown immediately above the data which were treated. Parts (a) and (b) also illustrate application of a noise discrimination level of 2a. The peculiarities of the pair-peak line-shapes, which make mathematical modelling difficult, are evidenced in the large peaks in part (a).
I000
(a)
2470
Z
01
©
.>
h~
AN AUTOMATIC PEAK-EXTRACTION TECHNIQUE TABLE1 Comparison of area content of the 36 most intense gamma rays shown in fig. 5(a) as found by a three point averaging procedure method 1) and by subjectively chosen background limits (method 2). Number
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Area Method 1 Method 2
1945860 542216 193981 65232 53378 52151 48420 35606 27042 19577 18962 18201 17035 14314 13678 11651 9878 5908 4689 4061 3274 3194 2727 2730 2699 2413 3559 2214 2332 2148 2000 2226 2116 2041 1867 1773
1939994 541709 193962 68017 53380 53009 50489 36722 26939 20250 19138 18448 17758 14307 14271 11345 10222 6256 4893 4284 3249 3195 2933 2896 2681 2524 2427 2333 2299 2276 2265 2173 2169 2121 1827 1722
Ratio (Method 1/ Method 2) 1.00 1.00 1.00 0.97 1.00 0.98 0.96 0.97 1.00 0.97 0.99 0.99 0.96 1.00 0.96 1.03 0.97 0.95 0.96 0.95 1.01 1.00 0.93 0.97 1.01 0.95 1.05 0.95 1.01 0.94 0.88
1.02 0.97 0.96 1.02 1.03
reports a Q-value for the a6Ar(n,7)37A reaction of 8787 __ 5 keV which compares favourably with the value of 8787.8 ± 0.5 keV found in the present report. A further test was attempted using a Gaussian
323
fitting programme. Centroid locations were found to be reasonably comparable, but due to the peculiar line-shape of pair-peaks detected, extracted areas contained large errors.
4. Discussion In summary, the proposed procedure reported above involves 1. the taking of the square-root of the original spectrum, 2. the convolution with a zero-area rectangular correlator, 3. the detection of signal content by setting an appropriate discrimination level in the correlation spectrum, 4. the determination of centroids from the positive points in the convolution data and 5. the calculation of areas in the original spectrum after setting limits using three channel averaging (or summation). The avoidance of floating-point arithmetic is intended to increase execution speed and decrease any unnecessary effort. The square-root programme, which should be an integer routine, need only be applied once to the original data. Typically, using a PDP-15 computer, the square-root of 2 K spectrum can be taken in approximately 5 s. Even centroids can be calculated by integer arithmetic with suitable treatment of the remainder. Also experimenters with limited computer memory capacity can process data in blocks. The system proposed is really an interactive one with a minimum of operator decisions. For those scientific groups with computer displays, software shape discrimination may be avoided by slowly sliding an expanded version of the original or the square-root spectrum 21) across the screen for examination of detected "peaks" identified by intensified points. One would then require a monitor with "halt", "erase" and "insert" capabilities. Amongst the several advantages of the proposed technique are 1. the simplicity of the algorithm, 2. the speed of the correlation, 3. the possibility of fast and reasonable energy and efficiency calibration of the system for each experiment, and 4. the possibility of application to several fields of interest. The authors wish to thank the National Research Council and the Province of Ontario for their generous financial support,
324
A. I1.OBERTSON et al. TABLE 2 Comparison table for neutron separation energies Sn of 41A and 87A. Present work
Isotope Cascade energies (keV)
Sn (keV)
41A
5581.8 349.1 a 167.2 a 5581.8 516.2 a 5063.6 868.03 167.2 4744.0 1353A 4744.0 1186.3 167.2 4744.0 837.7 516.2 3700.4 1881.7 516.2 3700.4 1043.9 1353.1 3365.4 2565.7 167.2 3150.2 2780.6 167.2 3150.2 2432.0 516.2 3089.1 2841.7 167.2 2771.3 2810.3 516.2 2130.7 3451.5 516.2 2130.7 2613.9 1353.1 Average separation energy =
6098.1 6098.0 6098.8 6097.1 6097.5 6097.9 6098.1 6097.4 6098.1 6098.0 6098.4 6098.0 6097.8 6098.4 6097.7 6098.0 ± 0.4
37A
8787.9 6297.8 2489.9 5270.2 3517.5 5270.2 2106.9 1410.9 Average separation energy =
8787.9 8787.7 8787.7 8788.0 8787.8 ± 0.1 b
Lycklama et al. 18) Sn (keV) 6098.3 6098.2 6099.6 6097.3 6097.4 6097.9 6097.0 6098.0 6097.9 6097.8 6098.5 6098.0 6097.7 6098.1 6097.8 6098.0 ± 0.6 Hardell et al. 19) 8788.7 8789.7 8788.3 8790.0 8789.0 ± 0.5 b
a These values are taken from the accurate low energy work of Lycklama. ~ These errors illustrate the respective internal consistencies. When systematic errors are included the quoted values become Sn = 8787.8 + 0.5 and 8789.0 _+ 1.2 keV respectively.
References 1) M. A. Mar±scott±, Nucl. Instr. and Meth. 50 (1967) 309. 9) H. P. Yule, Nucl. Instr. and Meth. 54 (1967) 61. ~) M. H. Young and W. R. Burrus, Nucl. Instr. and Meth. 62 (1968) 82. 4) j. A. Blackburn, Nucl. Instr. and Meth. 63 (1968) 66. ~) T. Inouye, T. Harper and N. C. Rasmussen, Nucl. Instr. and Meth. 67 (1969) 125. 6) F. C. P. Huang, C. H. Osman and T. R. Ophel, Nucl. Instr. and Meth. 68 (1969) 141. v) W. W. Black, Nucl. Instr. and Meth. 71 (1969) 317. s) D. C. Robinson, Nucl. Instr. and Meth. 78 (1970) 120. 9) F. Ross±to and M. Terrani, Nucl. Instr. and Meth. 79 (1970) 341. 10) S. J. Mills, Nucl. Instr. and Meth. 81 (1970) 217.
11) I. A. Slavic and S. P. Bingulac, Nucl. Instr. and Meth. 84 (1970) 261. 12) Z. Kosina, Nucl. Instr. and Meth. 88 (1970) 163. 1~) j. C1. Philippot, IEEE Trans. Nucl. Sci. NS-17, no. 3 (1970) 446. 14) M. Schwartz, Information transmission, modulation, and noise (McGraw-Hill, New York, 1959). 15) T. Inouye, Nucl. Instr. and Meth. 91 (1971) 581. 16) R. C. Greenwood, Nuc|. Data A 4, no. 3 (1968) 316. 17) R. C. Greenwood, Phys. Letters 27B, no. 5 (1968). 18) lY. Lycklama, N. P. Archer and T. J. Kermett, Nucl. Phys. A100 (1967) 33. 19) R. Hardell and C. Beer, Phys. Scr. 1, no. 2-3 (1970) 85. 2o) O. Skeppstedt, R. Hardell and S. E. Arnell, Arkiv l=ysik 35 (1968) 527. 21) I. N. Hooton, Nucl. Instr. and Meth. 56 (1967) 277.