Nuclear Instruments and Methods in Physics Research A 350 (1994) 322-326
North-Holland
A hardware neural network for on-line particle identification

M. Danielsson a,*, A. Go b, K. Jon-And a, Th. Lindblad a, E. Machado c, M. Timm a

a Royal Institute of Technology, Department of Physics, Frescati, Stockholm, Sweden
b Boston University, Department of Physics, Boston, MA, USA
c LIP Coimbra, University of Coimbra, Physics Department, Coimbra, Portugal

Received 26 May 1994
* Corresponding author. Email: mats@cernvm.cern.ch.
0168-9002/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0168-9002(94)00678-Z

The possibility of implementing a neural network in hardware to make an on-line particle identification of pions and kaons based on scintillation signals is demonstrated. The main aim of the investigation is to show the simplicity of implementing commercial neural network integrated circuits in high energy physics experiments.

1. Introduction

The use of artificial neural networks (ANN) for event classification in high energy physics has recently attracted attention [1-3]. The advantages of using a neural network are that it can be taught to recognize complicated correlations between the input variables, that it is fast and rather insensitive to noise, and that it generally exhibits significant redundancy. A conventional approach using lookup tables or programmable integrated circuits does not yield all these highly desirable features at the same time.

In general the trigger decisions rely to some extent on fast particle identification. The present case does not include any complicated neural network architectures; the aim of the paper is to show that even for relatively simple cases it is possible to use neural nets for particle identification and that the required effort with respect to training and implementation is quite reasonable. As an example, a neural net of feed forward type with two hidden layers has been trained using scintillator signals to distinguish between charged pions and kaons. This separation is possible since the energy loss per unit length is rather different for pions and kaons [4] in the low momentum region below 500 MeV/c. The difference in dE/dx is thus one possibility to separate the two particles.

To train and test the network, pion and kaon data from the CPLEAR experiment [5] at CERN have been used. The dE/dx distributions of pions and kaons are shown in Fig. 1. Off-line, a neural network [6] in software has been successfully used in CPLEAR for particle identification in the analysis during recent years. This network has been trained to separate pions and electrons and uses much more extensive information as input, compared to the hardware neural network described in this paper.
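The size of the dE/dx difference at low momentum can be illustrated with a short kinematic sketch. It assumes only the leading 1/β² behaviour of the Bethe formula and leaves out the logarithmic and density terms, so the numbers are indicative at best. The function names are chosen here for illustration, and the masses are the standard charged pion and kaon values.

```python
import math

# Charged particle masses in MeV/c^2 (standard values, rounded).
M_PION = 139.57
M_KAON = 493.68


def beta(p_mev: float, mass_mev: float) -> float:
    """Velocity beta = p / E for momentum p and mass m (natural units)."""
    return p_mev / math.hypot(p_mev, mass_mev)


def relative_dedx(p_mev: float, mass_mev: float) -> float:
    """Energy loss relative to a beta = 1 particle, assuming dE/dx ~ 1/beta^2.

    Simplification: the logarithmic term of the Bethe formula is left out.
    """
    return 1.0 / beta(p_mev, mass_mev) ** 2


if __name__ == "__main__":
    for p in (300.0, 400.0, 500.0):
        ratio = relative_dedx(p, M_KAON) / relative_dedx(p, M_PION)
        print(f"p = {p:5.0f} MeV/c: kaon/pion dE/dx ratio ~ {ratio:.2f}")
```

In this approximation a kaon of 400 MeV/c loses a bit more than twice as much energy per unit length as a pion of the same momentum, and the ratio grows as the momentum decreases.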
Fig. 1. Distributions of energy loss per unit length, dE/dx (MeV/cm), calculated from the ADC values for pions (solid line) and kaons (dashed line) in the transverse momentum intervals 270-400 MeV/c (a) and 400-800 MeV/c (b). The figure is based on the test data used to test the performance of the trained neural network.

2. Input signals and learning samples

The idea is to feed the network with signals that are available at an early trigger stage and carry information on the particle momentum and the energy loss per unit length, dE/dx. In the cylindrically shaped CPLEAR detector [5], the energy loss used for this investigation is measured in a scintillator in the particle identification detector (PID) [7]. The parts of the CPLEAR detector used in this work are shown in Fig. 2. The signals from the photomultipliers connected to the scintillator are converted into digital form within 500 ns using an 8-bit flash ADC [8]. The digitized value is kept in a buffer and can be read out either through the backplane to the data taking stream and eventually to tape, or directly through the fast front panel interface at a transfer speed of 40 ns per channel. This second path can be used as input to a neural network.

Alternatively, one could use the analog signal directly from the photomultipliers as an input to the neural network. In this case the signal from the photomultiplier needs to be split into this extra path and integrated through a capacitor to get a voltage proportional to the pulse height. The advantage of this solution is that time is saved (350 ns) because the digitization is skipped and the signal is put straight into the analog neural network. The investigations described in this work are however based on the ADC values.

The momenta of charged tracks are measured in a magnetic field of about 0.44 T. Assuming charged particles coming from the centre of the detector and a uniform magnetic field, the hit maps from the wires of only the inner and outer drift chambers can already give crude information about the transverse momentum of the particle. This is done by looking at the difference in the wire numbers between chambers, where a greater difference implies a lower PT [9]. The wire number information can be readily obtained through a minor modification of the existing logic and is then available for the neural network after around 500 ns. Kaon candidates [5] having four wires of difference or more, corresponding to a PT value of less than about 270 MeV/c, are cut by an earlier trigger stage together with particles fast enough to fire the Cherenkov. Particles with PT larger than around 800 MeV/c are very scarce since in this experiment the particles are produced in proton antiproton annihilations at rest [5]. The momentum component along the beam axis, the z-component, can be obtained from the transverse momentum knowing the z-coordinate of the hit in the scintillator.

To summarize, three input signals provide information to the network on the desired quantities:

(1) The sum of the upstream and downstream ADC values (DENN), calibrated online to the unit of minimum ionizing particles, provides information on the energy loss in the scintillator.
(2) The difference of the upstream and downstream ADC values (ZNN) provides information on the z-coordinate of the point where the particle hits the scintillator. As a first test the off-line value of the z-coordinate with an accuracy better than 1 cm was used as an input to the net. The resolution for the z-coordinate using the difference in ADC signals is about 20 cm.
(3) The difference of wire numbers comparing hits in the inner and outer drift chambers (PTNN) provides information on the transverse momentum.

To train the network a sample of kaon and pion signatures from CPLEAR experimental data is used. All input signals are normalized to take values in the range -1 to 1 before being sent to the net.
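The construction of the three inputs can be sketched as follows. The raw value ranges (8-bit ADC counts, surviving wire-number differences of 1 to 3) and all function names are assumptions made for illustration only, and the on-line calibration to minimum ionizing units is omitted.

```python
def scale_to_unit(value: float, low: float, high: float) -> float:
    """Map value linearly from [low, high] to [-1, 1], clipping at the edges."""
    clipped = min(max(value, low), high)
    return 2.0 * (clipped - low) / (high - low) - 1.0


def network_inputs(adc_up: int, adc_down: int, wire_inner: int, wire_outer: int):
    """Return (DENN, ZNN, PTNN), each normalized to the range -1 to 1.

    Assumed raw ranges (hypothetical, not taken from the detector):
      - each ADC delivers 0..255 counts (8-bit flash ADC),
      - wire-number differences of 1, 2 or 3 survive the earlier trigger.
    """
    de_nn = scale_to_unit(adc_up + adc_down, 0, 510)            # energy loss
    z_nn = scale_to_unit(adc_up - adc_down, -255, 255)          # z-coordinate of the hit
    pt_nn = scale_to_unit(abs(wire_outer - wire_inner), 1, 3)   # transverse momentum
    return de_nn, z_nn, pt_nn
```

With these assumed ranges the third input can only take the values -1, 0 and 1, while the first two vary in steps set by the ADC resolution.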
Fig. 2. Transverse (a) and longitudinal (b) view of the scintillators together with inner and outer drift chambers for a section of the CPLEAR detector (not drawn to scale).

3. The neural network design and training

The architecture of the network employed in the present case is, except for the number of outputs, a fairly conventional one, i.e. a network of the feed forward type with two hidden layers as shown in Fig. 3. The network has three input nodes and six neurons in each of the hidden layers. The output consists of four neurons for reasons discussed below. Finally there are three bias nodes feeding the hidden and the output layers as shown in Fig. 3.

Fig. 3. Schematic view of the neural network used in the present investigations. There are three inputs and four outputs and two layers of six hidden neurons each. These layers and the output layer also have one bias node each.

There exist several ways of implementing a neural network architecture as described above in an integrated circuit. In view of the analog signals generally available from the detectors, it is natural to use an analog neural network chip like the ETANN 80170NX [10] or the CLNN32 [11]. These can of course also be used in the digital case together with a DA-converter. In view of its faster performance we have chosen to use the ETANN 80170NX chip. The implementation of a neural network in an ETANN chip is described in Ref. [10].

The training is performed in the following way. The network is first simulated and trained in software until the rms error no longer decreases. Then the obtained weights are downloaded to the chip and the training is continued with the "chip-in-the-loop" [10]. The error then usually increases rather significantly due to imperfections in the chip, but after some further training the same performance as in software is normally achieved. This was, for example, the case for the training using the ETANN chip in the present work.

The network was trained using the well known back propagation [12,13] paradigm. 2000 training vectors containing pion and kaon signatures were employed. Each vector contains, apart from the three input values, the desired output. The desired output is a vector of numbers which is (0.8, -0.8, -0.8, 0.8) for kaons and (-0.8, 0.8, 0.8, -0.8) for pions. Using several outputs instead of a single node has proven to be a successful design in previous investigations [14,15]. It will in most cases speed up and simplify the training of the network, though the main reason is to have an increased redundancy in the trained system. In this case the network was easier to train and displayed a more stable performance with four outputs compared to only two nodes. We believe however that it would be possible to achieve the same performance with fewer outputs if considerably more effort had been spent on training the network.

The first two input signals are normalized to take values in the region -1.0 to 1.0. The third input signal also ranges from -1.0 to 1.0, but only takes on the discrete values -1.0, 0.0 or 1.0, corresponding to the three transverse momentum intervals described earlier. To summarize, we use a two hidden layer neural network with three inputs to classify measured data in two groups.

During the training we change the learning parameter, determining the rate of change of the weights, interactively from 0.2 to 0.005. The training required 7000 epochs to converge. The average root mean square error (Δ) was then around 20%. The error, Δ, is defined as:

$$\Delta = \frac{1}{n} \sum_{i=1}^{n} \left(O_i - T_i\right)^2 \qquad (1)$$

where n is the number of output nodes, O_i is the actual output and T_i the desired output value of node i. A test revealed that around 88% of the pions and kaons were correctly classified, where correctly classified means demanding output nodes 1 and 4 to have a value greater than 0.0 and output nodes 2 and 3 less than 0.0 simultaneously for kaons, and the opposite for pions.

Trials with a few other network architectures resulted in similar or worse results. For example, networks with only one hidden layer gave a significantly higher number of misclassified particles, while attempts with fewer output nodes only gave slightly worse results.

Note that the network is trained with desired outputs equal to either -0.8 or 0.8 while the swing of the output is from -1 to 1. This means that outputs with an absolute value larger than 0.8 are also allowed. This yields a certain "slack" in operation, making the network easier to train, because we are not working at the extreme values of the transfer function. It also yields a smoother operation in hardware.

Fig. 4. Output and errors of the 450 test vectors arranged according to increasing error (horizontal axis: event number). In the left part of the figure where the events are correctly identified the output of the four neurons are close to the desired output and the error is small. An error Δ > 0.64 means that the outputs are swapped and hence the event misinterpreted. As can be seen from the figure about 12% of the test vectors (event number > 400) are misinterpreted. Cf. Fig. 5 and Ref. [19].

Fig. 5. The output from the first output neuron for the 450 test vectors plotted according to increasing errors (horizontal axis: event number). The output values for the other three output nodes have very similar distributions.
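A minimal software sketch of the 3-6-6-4 network, the error Δ and the classification rule is given below. The tanh transfer function, the random starting weights and all function names are assumptions for illustration and do not reproduce the ETANN simulation that was actually used. The error is written as the mean squared deviation per output node, which evaluates to 0.64 when all outputs are zero.

```python
import math
import random

KAON_TARGET = (0.8, -0.8, -0.8, 0.8)
PION_TARGET = (-0.8, 0.8, 0.8, -0.8)
LAYER_SIZES = (3, 6, 6, 4)  # inputs, two hidden layers, outputs


def init_weights(seed: int = 0):
    """Random starting weights; the last entry of each row is the bias weight."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
    ]


def forward(weights, inputs):
    """Propagate the three inputs through the network and return four outputs."""
    activations = list(inputs)
    for layer in weights:
        extended = activations + [1.0]  # bias node feeding this layer
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, extended))) for row in layer
        ]
    return activations


def error(outputs, targets) -> float:
    """Mean squared deviation between actual and desired outputs."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def classify(outputs):
    """Apply the sign pattern used in the text; None means ambiguous."""
    o1, o2, o3, o4 = outputs
    if o1 > 0.0 and o4 > 0.0 and o2 < 0.0 and o3 < 0.0:
        return "kaon"
    if o1 < 0.0 and o4 < 0.0 and o2 > 0.0 and o3 > 0.0:
        return "pion"
    return None
```

Training by back propagation would adjust the entries returned by `init_weights`; the sketch stops at the forward pass because that is the part which is eventually mapped onto the chip.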
4. Test of the trained network

Some 450 test vectors were used to investigate the performance of the network. Figs. 4, 5 and 6 show the result of the classification using these 450 test vectors.

In Fig. 4 the rms errors Δ for the vectors, arranged in order of increasing error, are displayed. According to the definition of Δ its value is greater than 0.64 for misclassified events. The value 0.64 corresponds to all outputs being equal to zero (1.5 V in hardware), since each output then deviates by 0.8 from its desired value. The output from the first output neuron is plotted in Fig. 5, also in order of increasing error. There is a very strong correlation between the output of one node and the total rms error. It may be concluded from Fig. 4 and Fig. 5 that the vast majority of the test vectors result in a small rms error and are properly and unambiguously classified. The same plots as Fig. 5 for the other output nodes look very similar. This means that, after the training, it would be enough to use the information from one or two of the output nodes.

In Fig. 6 the first input variable (DENN) is plotted versus the second (ZNN) for kaons (a) and pions (b) for the 450 test vectors. In Fig. 6c the wrongly classified pions and kaons are displayed. The numbers of misidentified kaons and pions are approximately the same and have similar distributions.

As a comparison one can, instead of using the neural network, apply conventional cuts on a calculated dE/dx value [16] for the different momentum regions to separate pions and kaons. This corresponds to making a cut in Fig. 1 and results in an inefficiency of around 14%, to be compared with the roughly 12% of misinterpreted test vectors obtained with the network.
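The conventional alternative can be sketched as a dE/dx threshold per transverse momentum interval. The cut values below are hypothetical placeholders and are not the ones used in Ref. [16]; only the two momentum intervals are taken from Fig. 1, and the function names are chosen for illustration.

```python
# Hypothetical dE/dx cuts in MeV/cm, keyed by transverse momentum interval in MeV/c.
CUTS = {(270.0, 400.0): 4.0, (400.0, 800.0): 3.0}


def classify_by_cut(dedx: float, pt: float, cuts=CUTS):
    """Call a track a kaon if its dE/dx lies above the cut for its PT interval."""
    for (pt_low, pt_high), threshold in cuts.items():
        if pt_low <= pt < pt_high:
            return "kaon" if dedx > threshold else "pion"
    return None  # outside the momentum range covered by the cuts


def inefficiency(events, cuts=CUTS) -> float:
    """Fraction of (dedx, pt, true_label) events that the cuts get wrong."""
    events = list(events)
    wrong = sum(
        1 for dedx, pt, label in events if classify_by_cut(dedx, pt, cuts) != label
    )
    return wrong / len(events)
```

Evaluating `inefficiency` on the same test vectors as the network gives a like-for-like comparison of the two methods.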
Fig. 6. Input variables DENN and ZNN plotted versus each other (a) for kaons, (b) for pions, (c) for the incorrectly classified kaons and pions.

5. Interfacing the trained chip to a physics experiment

In our case the most straightforward way of implementing the ETANN chip in the environment of a high energy physics experiment is to use a cascadable VME structure with three ETANN 80170NX chips on a single VME board. Such a board is presently being developed [17] for general applications. In the present case, a single VME board would be sufficient and each chip would hold only a single layer, i.e. two chips for the hidden layers and one chip for the output layer. This would speed up the processing compared to having only two chips, when one would have to use the time-multiplexing feature of the ETANN. The outputs of the network may be available as simple ECL or TTL outputs on the VME front panel and yield signals indicating if the neural network has identified a pion or a kaon. The processing time for a single ETANN pass is a few μs, but depends strongly on the gain of the sigmoid. The cascadable VME board is controlled from a workstation through a VMIC [18] SBus to VME interface. This means that the results of the neural network could be conveniently monitored.

In case one wants to implement only a single hidden layer network, an immediate way is to use a VME module very similar to the one used in Ref. [3]. One simply needs to change the layout to include only three inputs and two outputs. The information is in this case fed to the on-board SCC68070 computer and will thus be available on the VME bus. As indicated above, in our case we get slightly worse results with a single hidden layer architecture and this approach may not be useful for the present application.
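A rough latency budget for the three-chip arrangement can be sketched from the numbers quoted in the text. The pass time of 3 μs is an assumed stand-in for "a few μs", and the sketch further assumes that the two ADC channels are read out one after the other, that the ADC and wire-number paths run in parallel, and that each layer costs one ETANN pass. None of these assumptions has been checked against the hardware.

```python
ADC_CONVERSION_NS = 500.0       # flash ADC digitization time (from the text)
READOUT_NS_PER_CHANNEL = 40.0   # front panel transfer per channel (from the text)
WIRE_INFO_NS = 500.0            # wire-number information available (from the text)


def inputs_ready_ns(n_adc_channels: int = 2) -> float:
    """Time at which all three network inputs are available."""
    adc_path = ADC_CONVERSION_NS + n_adc_channels * READOUT_NS_PER_CHANNEL
    return max(adc_path, WIRE_INFO_NS)


def decision_latency_ns(pass_time_ns: float = 3000.0, n_layers: int = 3,
                        n_adc_channels: int = 2) -> float:
    """Inputs-ready time plus one assumed ETANN pass per layer."""
    return inputs_ready_ns(n_adc_channels) + n_layers * pass_time_ns


if __name__ == "__main__":
    print(f"inputs ready after {inputs_ready_ns():.0f} ns")
    print(f"decision after about {decision_latency_ns() / 1000.0:.1f} us")
```

Under these assumptions the chip passes dominate the budget, which is why the gain of the sigmoid matters more than the input path.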
6. Summary

We have shown, by highlighting a particular example of particle identification, that a rather limited effort is required to train a neural network to perform particle identification and to implement the network in hardware in a way that it would fit into a high energy physics trigger. Although the proposed neural network solution may not be faster than a system using conventional lookup tables, it has the advantage of simple implementation and a priori a higher redundancy.
Acknowledgements

First of all we would like to thank the CPLEAR group and its spokesperson P. Pavlopoulos for their support. Comments from F. Martin of the Intel Corp., Santa Clara, CA are gratefully acknowledged and we are indebted to C.S. Lindsey for cross checking our results by training and testing a similar network developed in C++. We would also like to thank P. Carlson, J. Miller and P. Bloch for valuable suggestions and fruitful discussions. This work was performed in part under contract with the Swedish Natural Sciences Research Council, the Swedish Research Council for Engineering Sciences, Junta Nacional de Investigação Científica e Tecnológica in Portugal and the United States National Science Foundation.
References

[1] L. Lönnblad et al., Phys. Rev. Lett. 65 (1990) 1321.
[2] P. Pavlopoulos and S. Vlachos, Nucl. Instr. and Meth. A 324 (1993) 320.
[3] T. Akkila, Th. Lindblad and Å. Eide, Nuclear Instruments and Methods 545 (1992) 871.
[4] Review of particle properties, Phys. Rev. D 45 (1993).
[5] L. Adiels et al., Proposal for the experiment PS195, CERN PSCC/85-6 P82 (1985), PSCC/85-30 P82 (1985), PSCC/86-34 M263 (1986), PSCC/87-14 M272 (1987).
[6] M. Dejardin, CPLEAR Internal report, November 1992.
[7] A. Angelopoulos et al., Nucl. Instr. and Meth. A 311 (1992) 78.
[8] M. Kreciejewski et al., Nucl. Instr. and Meth. A 301 (1991) 424.
[9] E. Machado, Thesis in preparation.
[10] Electrically Trainable Analog Neural Network 80170NX, Intel Corp., Santa Clara, CA, Data Sheet 1990-1993.
[11] Cascadable Learning Neural Network CLNN32, Bellcore Preliminary Data Sheet 1993, and references therein.
[12] D. Rumelhart and J. McClelland, Parallel Distributed Processing (MIT Press, 1986).
[13] M. McCord Nelson and W.T. Illingworth, A Practical Guide to Neural Nets (Addison-Wesley, 1991) ISBN 0-201-52376-0.
[14] Th. Lindblad, T. Akkila, B. Lund-Jensen, G. Szekely and Å. Eide, Nucl. Instr. and Meth. 327 (1993) 603.
[15] B. Denby, Th. Lindblad, C.S. Lindsey, G. Szekely, J. Molnar and Å. Eide, Nucl. Instr. and Meth. A 335 (1993) 296.
[16] M. Timm, Diploma Thesis, Royal Institute of Technology, Stockholm, 1993.
[17] C.S. Lindsey et al., to be published.
[18] VMI Corporation, Huntsville, AL, VMISBS-5521L User's Manual.
[19] Th. Lindblad, C.S. Lindsey and A. Jayakumar, to be published.