An effective method for spectra storage

Nuclear Instruments and Methods in Physics Research B14 (1986) 360-362
North-Holland, Amsterdam

L. ZOLNAI and S. SZILAGYI

Institute of Nuclear Research of the Hungarian Academy of Sciences, H-4001 Debrecen, Pf 51, Hungary

Received 5 September 1985

The efficiencies of some commonly used methods for spectrum storage are compared. A new encoding method is proposed, by which a considerable amount of storage can be saved.

0168-583X/86/$03.50 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)

1. Introduction

During nuclear physics measurements a lot of preliminary information is often produced, mainly as amplitude spectra. If not enough storage capacity is available, storage of these spectra may be problematic until the evaluation is performed. Information theory offers a way of reducing the demand on storage capacity by suitable encoding methods. If such codes can be produced, the storage demand can be converted into CPU time to some degree. This does not mean a significant loss of processing time, because amplitude spectra are usually stored and read only once. After a review of some concepts of information theory we examine below the frequency distribution of channel contents of nuclear physics spectra, and give a new encoding method. Further, for practical cases, we compare the efficiency of the introduced method with that of other widely used methods.

2. Information content of spectra

Let us suppose that there is a set of messages $[X_1, X_2, \ldots, X_n]$ which are transmitted with probabilities $[p(X_1), p(X_2), \ldots, p(X_n)]$ by an information source. In this case $H$, the average amount of information of one message, called the entropy in information theory, can be calculated according to [1] as follows:

$$H(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log p_i . \tag{1}$$

If transmitting a message $X_i$ requires $n_i$ signals, then it requires

$$L = \sum_{i=1}^{n} p_i n_i \tag{2}$$

signals on the average. The efficiency $\eta$ of a given encoding method $A$ is

$$\eta(A) = \frac{H}{L \log D} , \tag{3}$$

where $D$ is the number of letters of the encoding alphabet. If we have $k$ different signals to encode the letters of the alphabet, $D = 2^k$; in the case of the most often used binary alphabet $D = 2$. For any noiseless and independent information source the so-called Shannon-Fano code [1] can be constructed, which is optimal in a certain sense under given conditions from the point of view of storage. The disadvantage of this code is that its realization is too difficult for practical cases.

3. Channel content distribution of nuclear physics spectra

In order to develop an effective code, we have studied the frequency of channel contents of amplitude spectra obtained from nuclear physics measurements. These spectra were analysed by a modified listing type coincidence spectrum processing program [2]. Some 1 million channel contents were taken to make frequency statistics. The composition of the sample was as follows:

20% X-ray spectrum from proton excitation, measured with Si(Li),
20% Ge(Li) spectrum from nuclear reaction measurements,
20% NaI(Tl) spectrum from nuclear reaction measurements,
40% charged particle spectrum from nuclear reaction measurements.

The sample was chosen to reflect the structure of the measurements in our laboratory. The derivatives of the same spectra were also analysed. Series of channel content zero were considered separately. The empirical probability distributions obtained from the original and the derived spectra are shown in figs. 1 and 2. They show a surprisingly small probability of large channel content. The ratio of zero channel content is about 10-15%. If the small value channel contents were stored in a significantly shorter form, a lot of storage could be saved. We introduce such a code below.

Fig. 1. Empirical probability distribution of channel contents of amplitude spectra obtained from nuclear physics measurements (axis labels: channel content; n = number of grouped zeros).

Fig. 2. Empirical probability distribution of channel contents of derived spectra (axis labels: channel content; n = number of grouped zeros).

4. A new coding method

Let us suppose that an analyser represents a channel by m bits, and the data transferring system has n bit long data units (e.g. 1 byte = 8 bits). We denote by N the multiple of n which is not less than m. The channel contents could be stored by N/n units if the same number of units were used for each channel. Instead we proceed as follows. An n bit unit is divided into a 2 bit and an n - 2 bit piece. The n - 2 bit pieces are to represent numbers; the four combinations given by a 2 bit piece are to distinguish the following cases. Cutting the nonzero channel contents into n - 2 bit pieces, they are written on the storage media from the least significant part. The writing of one channel is finished when the remaining more significant parts are zero. The n - 2 bit pieces are always completed with a 2 bit combination, using different combinations for intermediate and closing n - 2 bit pieces. If a channel content is negative (derived spectra) its absolute value is represented as described above, but a third kind of 2 bit combination is used for closing. The fourth 2 bit combination is to represent zeros following each other, in which case the n - 2 bit piece contains the number of zeros. One unit can represent a maximum of $2^{n-2} - 1$ zeros in this way.

5. Efficiency of different coding methods

We have studied the efficiency of the method described above and some well-known methods for uniform distribution and for the distributions given in sect. 3. The calculations were made for m = 24 and n = 6, n = 8. The results are shown in table 1, where N and D denote the distributions shown in figs. 1 and 2 respectively, while U refers to a uniform distribution. The numbers in the table represent the following methods:

1. Method described in sect. 4, m = 24, n = 6.
2. Method described in sect. 4, m = 24, n = 8.
3. Method used in the program of Nuclear Data 50/50 Physics Analyser [3].
4. Spectrum written in FORTRAN on PDP-11/40 using unformatted binary code.
5. Spectrum written in FORTRAN on PDP-11/40 using FORMAT (10I8).
6. Shannon-Fano code.

It should be mentioned that in some Fortran implementations the necessary double integer feature is not available; a possible substitution is then the use of real numbers. However, in this case the information loss caused by the truncation can be significant.

Table 1
Efficiency of some coding methods for different distributions (N, D and U) of channel contents.

Coding method               Distribution
                            N       D       U

1 - Present work a)         0.53    0.70    0.76
2 - Present work b)         0.50    0.62    0.75
3 - Ref. [3]                0.18    0.18    0.75
4 - Method c)               0.18    0.18    0.75
5 - Method d)               0.08    0.08    0.37
6 - Shannon-Fano code       0.94    0.98    0.88

a) With m = 24, n = 6.
b) With m = 24, n = 8.
c) With unformatted binary code.
d) FORMAT (10I8) used.
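As a rough illustration of how an efficiency value of the kind listed in table 1 follows from eqs. (1)-(3), the sketch below evaluates H, L and the efficiency for a binary alphabet (D = 2). The four-message source and its code lengths are hypothetical and are not taken from the measured distributions. The last two lines assume a 32 bit integer for the unformatted binary record and 8 bit characters for FORMAT (10I8); the paper does not state these sizes, but under a uniform 24 bit distribution they give values close to the 0.75 and 0.37 listed for methods 4 and 5.

```python
import math


def entropy(probs):
    """Eq. (1) in bits: H = -sum p_i log2 p_i (zero-probability terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_length(probs, lengths):
    """Eq. (2): average number of signals per message, L = sum p_i n_i."""
    return sum(p * n for p, n in zip(probs, lengths))


def efficiency(probs, lengths, alphabet_size=2):
    """Eq. (3): eta = H / (L log D), with H and log D both taken to base 2."""
    return entropy(probs) / (mean_length(probs, lengths) * math.log2(alphabet_size))


def fixed_width_efficiency(m_bits, stored_bits):
    """Uniform distribution over m-bit contents: H = m, L = stored_bits, D = 2."""
    return m_bits / stored_bits


# Hypothetical source: small values are frequent, large ones rare.
probs = [0.5, 0.25, 0.125, 0.125]
print(efficiency(probs, [2, 2, 2, 2]))  # fixed-length code      -> 0.875
print(efficiency(probs, [1, 2, 3, 3]))  # variable-length code   -> 1.0

# Assumed record sizes (not stated in the paper): 32 bit word, 8 chars x 8 bits.
print(fixed_width_efficiency(24, 32))   # -> 0.75
print(fixed_width_efficiency(24, 64))   # -> 0.375
```

The variable-length code reaches efficiency 1 here only because the hypothetical probabilities are exact powers of 1/2; for measured channel content distributions the attainable value is lower.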

As is shown in table 1, some 50-70% of storage and peripheral usage can be saved with the above described code. The efficiency of the method is a little better for the derived spectra than for the original ones, and approaches that of the Shannon-Fano code, as a theoretical maximum. The practical usefulness of the method was tested by coding a typical 4096 channel gamma spectrum with the first five methods listed above. The required storage in 512 byte blocks and the CPU time in seconds are given in table 2.

Table 2
Storage in 512 byte blocks on magnetic disk and CPU time in seconds (including I/O operations) for a typical 4096 channel gamma spectrum. Data in brackets refer to the derived spectrum.

Coding method                       Storage    CPU time
                                               Encoding                    Decoding
                                               FORTRAN        MACRO        FORTRAN       MACRO

1 - Present work                    9 (8)      17.1 (16.1)    2.7 (2.5)    11.6 (9.9)    2.1 (1.9)
2 - Present work                    10 (9)     13.2 (12.5)    2.2 (2.0)    9.1 (7.8)     1.X (1.6)
3 - Ref. [3]                        26         11.6           4.1          9.6           3.7
4 - Unformatted binary code used    33         1.4            1.4          1.4           1.4
5 - FORMAT (10I8) used              66         10.7           10.7         10.7          10.7

6. Conclusion

The encoding method described above can be easily realised and its realisation requires minimal storage. In our practice, a MACRO subroutine including both the coding and the decoding procedures for normal and derived spectra needs about 1 kbyte. We have programmed it in PAL III on PDP-8/I, and in MACRO-11 and FORTRAN on the PDP-11/40 processor. The realisation in firmware is also under consideration.
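The coding and decoding procedures of sect. 4 can be sketched in a few lines. This is a minimal sketch and not the authors' PAL III, MACRO-11 or FORTRAN code. The paper does not say which 2 bit combination stands for which case, nor where the 2 bit piece sits inside a unit, so the tag values below and the choice of the two most significant bits are assumptions made only for illustration; the function names are hypothetical as well.

```python
# Hypothetical tag assignment for the 2 bit piece (not given in the paper).
INTERMEDIATE = 0  # more pieces of the same channel follow
CLOSE_POS = 1     # last piece of a non-negative channel content
CLOSE_NEG = 2     # last piece of a negative channel content (derived spectra)
ZERO_RUN = 3      # the n - 2 bit piece holds a count of consecutive zeros


def encode(channels, n=8):
    """Pack channel contents into n bit units, least significant piece first."""
    payload_bits = n - 2
    mask = (1 << payload_bits) - 1  # also the longest zero run: 2**(n-2) - 1
    units = []
    i = 0
    while i < len(channels):
        if channels[i] == 0:
            run = 0
            while i < len(channels) and channels[i] == 0 and run < mask:
                run += 1
                i += 1
            units.append((ZERO_RUN << payload_bits) | run)
            continue
        closing = CLOSE_NEG if channels[i] < 0 else CLOSE_POS
        magnitude = abs(channels[i])
        while True:
            piece = magnitude & mask
            magnitude >>= payload_bits
            if magnitude == 0:  # remaining more significant parts are zero
                units.append((closing << payload_bits) | piece)
                break
            units.append((INTERMEDIATE << payload_bits) | piece)
        i += 1
    return units


def decode(units, n=8):
    """Inverse of encode(): rebuild the list of channel contents."""
    payload_bits = n - 2
    mask = (1 << payload_bits) - 1
    channels, magnitude, shift = [], 0, 0
    for unit in units:
        tag, piece = unit >> payload_bits, unit & mask
        if tag == ZERO_RUN:
            channels.extend([0] * piece)
            continue
        magnitude |= piece << shift
        shift += payload_bits
        if tag != INTERMEDIATE:
            channels.append(-magnitude if tag == CLOSE_NEG else magnitude)
            magnitude, shift = 0, 0
    return channels


spectrum = [0, 0, 0, 5, 70, -3]
packed = encode(spectrum, n=8)
print(packed)                            # [195, 69, 6, 65, 131]
print(decode(packed, n=8) == spectrum)   # True
```

With n = 8 the six channels above occupy five units, where a fixed-length record with m = 24 would need N/n = 3 units per channel, i.e. 18 units. Runs of zeros longer than $2^{n-2} - 1$ are simply split over several units in this sketch.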

References

[1] F.M. Reza, An Introduction to Information Theory (McGraw-Hill, New York, 1954).
[2] L. Zolnai, Atomki Közl. 21 (1979) 377.
[3] Series 50/50 Physics Analyser Basic Software Instruction Manual (Nuclear Data Inc., Palatine, Illinois, 1970).