Computer Physics Communications 57 (1989) 478—482 North-Holland
NEURAL NETWORKS FOR EVENT FILTERING AT D0 *

Dave CUTTS, Jan S. HOFTUN, Andrew SORNBORGER
Brown University, Providence, RI 02912, USA

Christopher R. JOHNSON and Raymond T. ZELLER
ZRL, Cranston, RI 02905, USA

* Work supported in part by the US Department of Energy.
Neural networks may provide important tools for pattern recognition in high energy physics. We discuss an initial exploration of these techniques, presenting the results of network simulations of several filter algorithms. The D0 data acquisition system, a MicroVAX farm, will perform critical event selection; we describe a possible implementation of neural network algorithms in this system.
1. Introduction

Pattern recognition in online event filters is crucially important for the success of the new round of experiments at high luminosity colliders. At the D0 experiment at Fermilab's 2 TeV Collider, an event rejection of 400/1 must be achieved by high-level filtering in the D0 MicroVAX data acquisition system [1]. The filter algorithms must be accurate, easy to set up and modify, and fast. We think there may be a role for neural network-derived tools in the D0 system, and more generally, in many areas of pattern recognition [2]. Neural network algorithms differ in important ways from those common in high energy physics. In standard methods, logic is serial and tree-structured, with larger, more complex data sets requiring correspondingly larger and more complex code. In contrast, an event with more complex data can in principle be recognized more quickly by a neural network than an event with little information: for example, we recognize a person via a photograph more quickly than with a crude sketch. Thus, as HEP data grows more complex, we can hope to
find algorithms that do not scale in execution time with the event size. The key to the neural networks' success is that they are intrinsically parallel. The structure of the network is one of parallel units operating together, and as the problem grows, so does the parallelism. To explore a neural network, one trains a simulation with data (real or Monte Carlo). The network "learns": it generalizes the patterns intrinsic in the data. One obtains "weights" which define the strengths of connections between individual units; these weights can then be used to hardcode or hardwire the algorithm. In this process of developing an algorithm, the network will have forced a parallel solution. This property, that the resultant algorithm is necessarily parallel, is very important.
2. Back propagation networks

In our exploration of neural networks, we have concentrated on networks which are called "feed forward" (the data flows in one direction) and "back propagation" (in training, the errors are propagated backwards to adjust the connections between the units) [3]. The data flow through the network looks like:
DATA → input layer (N1 units) → hidden layer(s) (N2 units) → output layer (N3 units) → RESULT.
Here, the input data is a vector of dimension N1 (the number of parameters for the problem), and the output data is another vector of dimension N3. As an example, if the network is recognizing some pattern, the output vector could have dimension 2, where (1, 0) and (0, 1) might mean TRUE and FALSE. The connections between layers are specified by the "weights" between each unit j in one layer and each unit i of the next layer. Then, to find the net input V_i to unit i of layer 2, we perform the multiplication
$$V_i = \sum_{j=1}^{N_1} w_{ij} o_j ,$$
Fig. 1. Multiport memory with floating point processor for D0. (The special function port as drawn comprises an ADSP-1401 sequencer with writable control store, address counters, and ADSP-3213/ADSP-3223 floating point multiplier/divider and ALU.)
where the w_ij are the weights for the connections from layer 1 to layer 2, and o_j is the output of unit j of layer 1. Given a unit's net input, we can define its net output via the "Sigmoid" activation function:

$$o_i = \left[ 1 + \exp(-V_i) \right]^{-1} . \tag{1}$$
Thus, given the input vector, the result is found with two matrix multiplications and table lookups. The pattern recognition problem, once the network is trained, has been reduced to a few parallel operations.
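To make this recall step concrete, here is a minimal sketch in Python (our own package was written in VMS PASCAL; the layer sizes, weight-matrix layout, and table granularity below are illustrative assumptions, not details of either package):

```python
import numpy as np

# Precomputed sigmoid lookup table over a clipped range of net inputs,
# standing in for the "table lookups" of eq. (1); 1024 entries is arbitrary.
TABLE_SIZE = 1024
V_MIN, V_MAX = -8.0, 8.0
_v = np.linspace(V_MIN, V_MAX, TABLE_SIZE)
SIGMOID_TABLE = 1.0 / (1.0 + np.exp(-_v))

def sigmoid_lookup(v):
    """Net output o = [1 + exp(-V)]^-1, read from the precomputed table."""
    idx = (v - V_MIN) / (V_MAX - V_MIN) * (TABLE_SIZE - 1)
    return SIGMOID_TABLE[np.clip(idx.astype(int), 0, TABLE_SIZE - 1)]

def recall(x, w1, w2):
    """Trained-network recall: two matrix multiplications plus lookups.

    x  : input vector of dimension N1
    w1 : (N2, N1) weight matrix, input layer -> hidden layer
    w2 : (N3, N2) weight matrix, hidden layer -> output layer
    """
    hidden = sigmoid_lookup(w1 @ x)    # V_i = sum_j w_ij o_j, then eq. (1)
    return sigmoid_lookup(w2 @ hidden)
```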
3. Implementation at D0

The D0 experiment poses a filtering challenge requiring innovative techniques, and it may be that the D0 data acquisition system will be a natural site for neural network-derived algorithms. As described elsewhere [1], in D0 the event data is digitized in 100 VME crates and then flows directly over eight parallel cables (overall bandwidth 8 × 40 = 320 Mbyte/s) into memories associated with a selected MicroVAX Level-2 node (one of many). Each MicroVAX will have
eight channels of a recently designed "multiported" memory [4], which incorporates, in addition to the external 40 Mbyte/s input port, a direct processor connection, a high-speed output connection (for sending events to the host), and a special function port, shown in fig. 1 with an array processor implementation. All the ports of the multiport memory will have direct access to the event data (it should never have to be moved during the filter process). This memory, with a special function implementation such as that illustrated in fig. 1, looks like a natural site for just those calculations described above, for the operation of pattern recognition based on neural network studies.
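To give a rough sense of scale (the throughput figure here is our assumption, not a measured specification of these chips): the (121, 30, 5, 2) network of section 4.2 requires 121 × 30 + 30 × 5 + 5 × 2 = 3790 multiply-accumulates plus 37 activation-function lookups per event, so a floating point pipeline sustaining of order 10^7 multiply-accumulates per second would evaluate the network in a few hundred microseconds.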
4. Exercises

As an initial exploration of neural network applications, we have performed several exercises with high energy physics events. These studies utilize a simulation tool which models a network; one provides the simulation package with data for both the training, or learning, phase (input as well as desired output) and the recall (testing) phase,
where one measures the efficiency of the developed algorithm. For this work, we have used the commercially available "Professional-II" [5], which runs on PCs and some workstations. This package is very user-friendly; it is menu-driven, has excellent graphics, and is very flexible, with a wide variation in network architectures, learning rules, and so forth being available. As our work is largely VAX/VMS based, we recently wrote our own package in VMS PASCAL. With its easy access to physics data this package has been very useful, although it is specialized to back propagation networks. Both simulation tools have been used for the two exercises described below.
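As a sketch of what such a package does during the learning phase, the following Python fragment implements plain back propagation for the feed-forward network of section 2 (the learning rate, epoch count, and weight initialization are arbitrary illustrative choices, not values from either package):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train(inputs, targets, n_hidden, rate=0.5, epochs=1000, seed=0):
    """Back propagation for a single-hidden-layer feed-forward network.

    inputs  : (n_events, N1) array of input vectors
    targets : (n_events, N3) array of desired output patterns
    Returns the weight matrices w1 (n_hidden x N1) and w2 (N3 x n_hidden).
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(n_hidden, inputs.shape[1]))
    w2 = rng.normal(scale=0.5, size=(targets.shape[1], n_hidden))
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            h = sigmoid(w1 @ x)               # hidden-layer outputs
            o = sigmoid(w2 @ h)               # output-layer outputs
            # Errors propagated backwards through the sigmoid derivative:
            delta_o = (t - o) * o * (1.0 - o)
            delta_h = (w2.T @ delta_o) * h * (1.0 - h)
            w2 += rate * np.outer(delta_o, h)
            w1 += rate * np.outer(delta_h, x)
    return w1, w2
```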
4.1. Electron/photon separation in BNL E734

In a first exercise, we studied the ability of a neural network to recognize electrons, as in the neutrino-electron elastic scattering experiment E734, completed a few years ago. In E734 neutrino-electron elastic scatters were identified by the presence in the detector of a single, forward electromagnetic shower, a signal which could also be produced by the more common charged current π0 production. For this reason the detector was designed as an active target, with alternating planes of liquid scintillator and (x, y) PDT cells. Then, with the long radiation length (about six modules), repeated measurements of dE/dx at the start of the shower could discriminate between electron and photon showers, as coming from an initial one or two particles. The limitation of this electron/photon separation came from the partial overlap of the dE/dx distributions due to poor resolution of the chambers, Landau tails, and chamber inefficiencies. The algorithm used, derived after studies of several variants, was essentially the following: for each pair of (x, y) dE/dx values, pick the smaller; then, average over the first three modules at the start of the shower. The result of this technique was a rejection of 50% of the photons from the candidate sample, when 90% of the electrons were retained by the cut *.
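Stated as code, the cut is only a few lines; a sketch in Python, where dedx_x and dedx_y hold one dE/dx value per module and the cut value is a hypothetical threshold tuned to retain 90% of electrons:

```python
import numpy as np

def e734_average_dedx(dedx_x, dedx_y, n_modules=3):
    """For each module take the smaller of the (x, y) pair of dE/dx values,
    then average over the first n_modules at the start of the shower."""
    smaller = np.minimum(dedx_x[:n_modules], dedx_y[:n_modules])
    return smaller.mean()

def is_electron(dedx_x, dedx_y, cut):
    """Keep the event as an electron candidate if the averaged dE/dx lies
    below the cut (a single particle deposits roughly half the dE/dx of
    the two particles from a converted photon)."""
    return e734_average_dedx(dedx_x, dedx_y) < cut
```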
* Additional features of the neutrino-electron data led to excellent signal/noise; see ref. [6].
Fig. 2. (a) Diagram of network used for electron/photon discrimination and (b) its response to a typical event (see text).
For a neural network exercise, we generated data similar to that from E734 and used all six values of dE/dx at the start of the shower as input to a network, as shown in fig. 2a. This network has six input units corresponding to the six dE/dx values, a hidden layer of three units, and a two-unit output layer. For training, we presented an output pattern of (1, 0) if the input values were from an electron, and (0, 1) for input from a photon. Fig. 2b is a diagram of the network after training, showing its response to an event. In this figure, the size of the dark squares is proportional to the unit's activation; the network's output properly indicates a preference for the photon hypothesis, despite low input in unit 3 and zero input ("inefficient chamber") to unit 7. Fig. 3 provides a summary of this study, and suggests that (selecting on the network output) the neural network-derived algorithm would certainly have been competitive with that used in the actual analysis.
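For illustration, training the 6-3-2 network could be driven as below, reusing the train() sketch from section 4 (the gamma-distributed samples are purely hypothetical stand-ins for the E734-like generated data, not the actual distributions):

```python
import numpy as np

# Hypothetical stand-ins for the generated samples: six dE/dx values per
# event (photons start as two particles, hence roughly doubled dE/dx).
rng = np.random.default_rng(1)
electron_dedx = rng.gamma(2.0, 1.0, size=(500, 6))
photon_dedx = rng.gamma(4.0, 1.0, size=(500, 6))

inputs = np.vstack([electron_dedx, photon_dedx])
targets = np.vstack([np.tile([1.0, 0.0], (500, 1)),   # (1, 0) = electron
                     np.tile([0.0, 1.0], (500, 1))])  # (0, 1) = photon

# train() is the back propagation sketch from section 4; three hidden
# units give the 6-3-2 architecture of fig. 2a.
w1, w2 = train(inputs, targets, n_hidden=3)
```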
Fig. 3. Response of the network to test data: (a) electrons and (b) photons.
4.2. Recognition of γ showers from Higgs → γγ

For a second exercise we used data generated for an SSC study [7] of the detection of the Higgs via its H → γγ decay mode. Our goal was to compare the ability of a neural network simulation to recognize a γ from H → γγ with the corresponding performance of the algorithm used in the study. The data consisted of ISAJET-generated events with energy deposits in a model calorimeter with bins of 0.1 in δη and δφ. These data were from both Higgs decays as well as 2-jet background, which at a low but non-zero rate could also give the signature of isolated high E_t deposits in the calorimeter. One of several algorithms used in the study [7] was based on the ratio of the energy deposited in the calorimeter inside and outside some radius about the peak. In a similar approach, we used a
simulator to fashion a back propagation network of (121, 30, 5, 2) units. As the input vector, we used the 121 energies in an 11 by 11 array of (η, φ) bins, centered on the peak in E_t, and trained by presenting to the output layer a vector (1, 0) for photon showers from H → γγ, or a vector (0, 1) for background data. Figure 4 illustrates some results: plotted are different data sets as a function of the network response of the first output unit (ideally 1 for Higgs and 0 for background). Clearly the network has recognized characteristics of these data, and the actual performance is very similar to the standard radial cut.

Fig. 4. Response of a network to showers from simulated (a) Higgs → γγ photons, and background showers from (b) Higgs and (c) 2-jet data.
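As an illustration of the preprocessing this implies, a sketch of how the 121-component input vector might be assembled (the peak-finding and edge padding here are our assumptions, not details of the SSC study):

```python
import numpy as np

def shower_input_vector(energies, half_width=5):
    """Flatten the 11 x 11 window of (eta, phi) calorimeter bins centered
    on the E_t peak into the 121-component network input vector.

    energies : 2-D array of energy deposits in 0.1 x 0.1 bins.
    """
    # Pad with zeros so the window stays inside the array even when the
    # peak lies near an edge of the calorimeter grid.
    padded = np.pad(energies, half_width)
    i, j = np.unravel_index(np.argmax(energies), energies.shape)
    window = padded[i:i + 2 * half_width + 1, j:j + 2 * half_width + 1]
    return window.flatten()
```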
5. Summary

We believe that neural networks may provide useful tools for pattern recognition in high energy physics. High-speed implementation of algorithms based on back propagation networks, such as we have simulated, is possible using array processor chips. The new multiported memories for the D0 data acquisition MicroVAX farm have special function ports which can be used for such implementations. Our initial exercises have shown that these networks do learn to recognize patterns in HEP data. Results suggest that neural net algorithms are, at least, competitive with standard methods. Because a neural network simulation forces a parallel solution, an algorithm so derived is potentially much faster. In a continuation of this effort, we are beginning studies of D0-specific filtering and pattern recognition problems.
References

[1] D. Cutts et al., IEEE Trans. Nucl. Sci. 36 (1989) 738; J.S. Hoftun, Comput. Phys. Commun. 57 (1989) 339.
[2] See also B. Denby and S.L. Linn, Comput. Phys. Commun. 57 (1989) 297.
[3] For a full discussion of neural networks, see D. Rumelhart et al., Parallel Distributed Processing (MIT Press, Cambridge, MA, 1986).
[4] ZRL Q22MPM, ZRL, 8 Rushton Drive, Cranston, RI 02905, USA.
[5] Neuralware Inc., 103 Buckskin Court, Sewickley, PA 15143.
[6] K. Abe et al., Phys. Rev. Lett. 62 (1989) 1709.
[7] Barter et al., Proc. Summer Study on High Energy Physics in the 1990's, Snowmass, 1988, to be published.