Nuclear Instruments and Methods in Physics Research A307 (1991) 47-51 North-Holland
Particle identification by Cherenkov ring imaging using a neural network approach

Tom Francke and Thomas Lindblad
Manne Siegbahn Institute of Physics, Stockholm, Sweden

Age Eide
Ostfold College, Halden, Norway

Francois Piuz and David Williams
CERN, Geneva, Switzerland

Paolo Martinengo
University of Genova, Genova, Italy

Rui Ribeiro
LIP, Coimbra, Portugal

Martin Suffert
Centre de Recherches Nucléaires, Strasbourg, France

Received 25 April 1991
The performance of a back-propagation neural network for particle identification using a RICH detector has been studied. When trained on 8000 simulated events in 14 iterations using a general back-propagation algorithm, it correctly identifies 86% of the events out of a sample of 1000 experimentally measured pion and proton events at 3.5 GeV/c beam momentum. The identification efficiency is 70%. This is compatible with what is obtained by conventional, but mathematically much more complicated, identification algorithms.

1. Introduction
Recently neural networks have gained interest among particle physicists as a tool for pattern recognition in the analysis and triggering of the complex particle physics experiments that are carried out today. This article presents a study of particle identification by Ring Imaging Cherenkov (RICH) detectors using a neural network approach. To perform this study the following approach has been used: 1) simulation of hit patterns generated by means of a sophisticated Monte Carlo simulation program; 2) training of a neural network with the simulated data; 3) running the trained neural network on experimental data; 4) comparing the results with the identification obtained by time-of-flight measurements.

2. The RICH detector
The RICH detector, normally called the NaF RICH prototype [1], uses a 10 mm thick crystal of NaF as Cherenkov radiator; the principal layout is shown in fig. 1. The cone of photons emitted in the NaF radiator refracts out of the crystal and is expanded into a ring of Cherenkov light in a drift volume filled with helium. The Cherenkov photons enter the photosensitive volume through a 3 mm thick quartz window before they are converted into photoelectrons by the photosensitive gas tetrakis-(dimethylamine)-ethylene (TMAE). These photoelectrons drift towards a multiwire proportional chamber (MWPC) where they are amplified, and the induced signals on the anode wires as well as on the cathode plane are read out. The cathode is segmented into 144 square 8 × 8 mm² pads which give the unambiguous two-dimensional coordinates of each photoelectron, while the anodes give a projection of the hit pattern.
Fig. 1. Setup of the NaF RICH detector using pad readout and operated at atmospheric pressure. The Cherenkov light produced by a β = 0.82 particle at normal incidence is indicated.
For particles at normal incidence on the radiator, the hit pattern is a ring, while non-perpendicular incidence gives a hit pattern in the form of a wide U-shape. The MWPC is operated at low gain to minimize the effect of photon feedback and to minimize the signal from the ionization of the amplification gas by the primary particle. Typically, the single-photoelectron efficiency of the detector is around 50%. Each photoelectron fires on average slightly more than 2 pads and the ionization fires 2-5 pads around the track. This ionization puts a lower limit on how small a ring radius can be detected without interference. The pad plane is 96 × 96 mm² and the largest ring radius that can be detected is about 38 mm. The ionization limits the smallest detectable ring radius to about 20 mm, so the NaF-to-quartz distance has to be adjusted for every particle velocity to keep the Cherenkov ring inside the sensitive area of the detector. Since the measurable ring-radius interval is very limited, only events from different particles at very high momentum can be compared with this prototype at a constant NaF-quartz distance. Throughout this article the momentum is fixed to 3.5 GeV/c, which gives a proton velocity of 0.966c and a pion velocity of 0.9992c, where c is the velocity of light in vacuum. The NaF-quartz distance is fixed to 9 mm.
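For reference, the quoted velocities follow from relativistic kinematics, β = p/(p² + m²)^{1/2} in units where c = 1. A minimal check is sketched below; the particle masses are standard values and are not taken from this paper.

```python
import math

def beta(p_gev, m_gev):
    """Velocity in units of c for momentum p (GeV/c) and mass m (GeV/c^2)."""
    return p_gev / math.sqrt(p_gev**2 + m_gev**2)

P = 3.5            # beam momentum used throughout the paper, GeV/c
M_PROTON = 0.938   # GeV/c^2, standard value (not from the paper)
M_PION = 0.1396    # GeV/c^2, standard value (not from the paper)

print(f"beta(proton) = {beta(P, M_PROTON):.4f}")  # ~0.966
print(f"beta(pion)   = {beta(P, M_PION):.4f}")    # ~0.9992
```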
3. The simulation

The Monte Carlo simulation program is based on a ray-tracing technique where every photon emitted in the radiator and in the quartz window is traced [2]. Typically, between 300 and 500 photons are traced for every event, making the computational work large.

If a photon enters the photosensitive detector and is converted into a photoelectron, it generates an induced pulse in the pad plane. This pulse is normalized to have the same amplitude as the experimental data, and the threshold for considering a pad to have fired is set to the threshold used in the electronics of the experimental measurements. The only calibration parameters needed for the simulation program are thus the amplitude of the signal generated by a single photoelectron and the threshold used in the electronics. These quantities are normally expressed in ADC channels. The ionization signal is treated separately. Typically, 2-5 pads are fired per event by ionization of the amplification gas by the primary particle. The distribution of the number of pads that are hit, as well as the amplitude as a function of the distance from the track, is measured experimentally and used to simulate the ionization.

4. The training

An image in the form of a 12 × 12 matrix with intensities ranging from 0 to 10 is chosen as input to the neural network. These intensities are proportional to the logarithm of the simulated signal amplitude on each pad. Fig. 2 shows an example of simulated proton and pion events. The upper two figures show a typical simulated proton and pion event, respectively; the two lower figures show the corresponding matrices that are used for training or testing the network. A feed-forward neural network is used with 144 nodes in the input layer, 128 nodes in the hidden layer and two nodes in the output layer, as shown in fig. 3. The output will then be the probability that the event originates from a pion (one output node) or a proton (the other output node). Each neuron performs a weighted sum of the output values from all the nodes of the previous layer, where the input to and the output from a node in the hidden layer are given by:

IN_j = \sum_{i=1}^{144} W_{ij} OUT_i,    (1)

OUT_j = 1 / (1 + e^{-IN_j}).    (2)

In the same way, the input to and the output from a node in the output layer are given by:
IN_l = \sum_{j=1}^{128} W_{jl} OUT_j,    (3)

OUT_l = 1 / (1 + e^{-IN_l}).    (4)
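As an illustration of eqs. (1)-(4), a minimal sketch of the forward pass is given below. The function names, the random initial weights and the exact logarithmic scaling of the pad amplitudes onto the 0-10 intensity range are our assumptions; only the layer sizes and the sigmoid transfer function are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes as described in the text: 144 input, 128 hidden and 2 output nodes.
N_IN, N_HID, N_OUT = 144, 128, 2

# Connection weights; random values stand in for the trained ones.
W_ih = rng.normal(scale=0.1, size=(N_IN, N_HID))   # W_ij, input -> hidden
W_ho = rng.normal(scale=0.1, size=(N_HID, N_OUT))  # W_jl, hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(pad_amplitudes, max_amplitude=1023.0):
    """Map a 12 x 12 array of pad amplitudes (ADC channels) onto intensities 0-10,
    proportional to the logarithm of the amplitude (assumed scaling)."""
    a = np.clip(np.asarray(pad_amplitudes, dtype=float), 0.0, None)
    return (10.0 * np.log1p(a) / np.log1p(max_amplitude)).reshape(N_IN)

def forward(x):
    """Forward pass of eqs. (1)-(4): weighted sums followed by sigmoids."""
    hidden = sigmoid(x @ W_ih)       # OUT_j of the hidden layer, eqs. (1)-(2)
    output = sigmoid(hidden @ W_ho)  # OUT_l of the output layer, eqs. (3)-(4)
    return hidden, output

# Example with a random stand-in for a simulated hit pattern.
pads = rng.integers(0, 50, size=(12, 12))
_, out = forward(encode(pads))
print(out)  # two values, read as pion and proton probabilities
```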
The general back-propagation algorithm [3] has been applied to train the neural network.
Fig. 2. Simulated proton and pion events at 3.5 GeV/c momentum and a NaF-quartz distance of 9 mm. The upper two figures show the hit pattern of each event and the lower two the corresponding matrices that are fed into the neural network for training.
The purpose was to establish the right values for the weights, such that each output node assigned to a specific class of events has an output value close to 1 each time the parameters for such an event are presented at the input nodes. The deviation from the target value (usually equal to 1 in the output layer) represents the error δ for that node in the network.

Fig. 3. Schematic representation of the feed-forward neural network used in the present work. There are 144, 128 and 2 neurons in the input, hidden and output layers, respectively. The connection weights are associated with the lines connecting the neurons. Hence, there are 256 connection weights, W_{jl}, between the hidden and the output layer.

In back-propagation this error is part of the basis for changing the weights, starting with the weights between the output layer and the hidden layer. Hence, the effect of the error is propagated backwards through the network. After all weights have been adjusted, the set of training vectors is again presented to the network. The weight adjustment ΔW_{jk} is given by:

\Delta W_{jk} = \eta \delta_j OUT_k,    (5)

where the direction of the input-signal propagation is from node k to node j. The constant η controls the rate of the weight adjustment; typically η is of the order of 0.1. Observe that the error δ_j is different for the output layer and the hidden layer:

\delta_j = (t_j - OUT_j) f'(IN_j),    (6)

\delta_j = \left(\sum_k \delta_k W_{kj}\right) f'(IN_j),    (7)

for the output and the hidden layer, respectively, where t_j is the target value for node j and f'(x) is the derivative of the sigmoid function f. Here k refers to the summation over all nodes, e.g. the output nodes, which are in front of node j.
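Continuing the sketch given after eq. (4), one back-propagation update following eqs. (5)-(7) could be written as below. The absence of bias terms and the in-place update of both weight matrices in a single step are assumptions based on the equations as written.

```python
def backprop_step(x, target, W_ih, W_ho, eta=0.1):
    """One weight update following eqs. (5)-(7).
    x: 144-element input vector; target: 2-element target vector of 0s and 1s."""
    # Forward pass, eqs. (1)-(4).
    hidden = sigmoid(x @ W_ih)
    output = sigmoid(hidden @ W_ho)

    # For the logistic sigmoid, f'(IN) = OUT * (1 - OUT).
    # Output-layer error, eq. (6): delta_l = (t_l - OUT_l) * f'(IN_l).
    delta_out = (target - output) * output * (1.0 - output)
    # Hidden-layer error, eq. (7): delta_j = (sum_k delta_k W_kj) * f'(IN_j).
    delta_hid = (delta_out @ W_ho.T) * hidden * (1.0 - hidden)

    # Weight adjustments, eq. (5): Delta W_jk = eta * delta_j * OUT_k (in place).
    W_ho += eta * np.outer(hidden, delta_out)   # hidden -> output weights
    W_ih += eta * np.outer(x, delta_hid)        # input  -> hidden weights
    return output
```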
The training set consists of 1000 simulated images, examples of which are shown in fig. 2. There are 500 proton images and 500 pion images in random order in the input data. The input matrix is mirrored in 8 planes, i.e. the actual number of input images is 8000. It was chosen to train the network on a relatively clean sample of data rather than to handle background noise in the training procedure. With the network and training set mentioned above, it is found that when 3000 of the 8000 hit patterns have been read into the neural network, approximately 85% are identified correctly (see fig. 4). Reading in the patterns several times improves this value, and after 14 iterations 95% of the training set is identified correctly, with no significant improvement thereafter. Throughout this experiment the learning-rate parameter η is set to its default value of 0.1. An adjustment of this rate, on a trial-and-error basis, might have improved these results further.

One may also consider changing the number of neurons in the hidden layer. When this number is decreased below a certain value, one generally encounters problems in training the network. This can be seen from the distribution of the connection weights. Using 144 neurons in the input layer and 128 neurons in the hidden layer, one obtains a narrow, bell-shaped distribution of the weights around W_{ij} = 0. Lowering the number of neurons in the hidden layer scatters this distribution again. This might be compensated by lowering the learning rate.

Fig. 4. Convergence of the training phase (number of incorrectly identified events versus the number of hit patterns read in). The network starts with random weights and, after being trained on 3000 hit patterns, the number of incorrectly identified events is down to 15%. After 14 iterations on 8000 hit patterns, this number is reduced to 5%. Further training does not improve the performance.
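The training procedure described above could be reproduced roughly as follows, again continuing the earlier sketch. The reading of "mirrored in 8 planes" as the eight rotations and reflections of the 12 × 12 matrix, as well as the target-vector convention, are our assumptions.

```python
def mirror_images(image):
    """The eight symmetry-related versions of a square pad image:
    four rotations, each with and without a left-right reflection."""
    versions, m = [], np.asarray(image)
    for _ in range(4):
        versions.append(m)
        versions.append(np.fliplr(m))
        m = np.rot90(m)
    return versions

def train(simulated_events, targets, iterations=14, eta=0.1):
    """simulated_events: 1000 simulated 12 x 12 amplitude arrays (500 protons, 500 pions);
    targets: matching 2-element target vectors, e.g. (1, 0) for a pion and (0, 1) for a proton."""
    patterns = [(encode(m), np.asarray(t, dtype=float))
                for image, t in zip(simulated_events, targets)
                for m in mirror_images(image)]       # 1000 images -> 8000 hit patterns
    for _ in range(iterations):                      # 14 passes over the training set
        for idx in rng.permutation(len(patterns)):   # present the patterns in random order
            x, target = patterns[idx]
            backprop_step(x, target, W_ih, W_ho, eta)
```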
5. Testing the network

Although the performance of the network can be inferred from the error function and from the above-mentioned distribution of the connection weights during the learning phase, the real test comes when the network is run on a set of experimental data. Here, the test data come from tests of the NaF RICH prototype in a particle beam, containing mainly pions and protons, at the CERN PS [4]. The beam momentum is set to 3.5 GeV/c to allow the rings produced both by protons and by pions to be inside the sensitive area of the detector at a fixed NaF-quartz distance of 9 mm. This momentum is close to the limit where the two particle types are separable with this detector, and the ring radii of the two particles differ by only approximately 10 mm. The experimental data may include multiparticle events that originate from several particles traversing the detector simultaneously. In this case several rings are superimposed, creating an unrecognizable pattern. Some electronic problems, where a group of 16 adjacent pads is fired due to a baseline shift in the electronics, also produce hit patterns that are not included in the simulation program and are difficult to recognize. It is thus natural that several patterns differ significantly from the learning patterns generated by the simulation program. An example is shown in fig. 5. Clearly it is impossible to interpret such data. The number of such events is of the order of 5% of the total number of events. Hence, the neural network is actually doing better than what is shown in fig. 4.
Fig. 5. Bad event. The hit pattern originates from two or more particles traversing the detector simultaneously, producing overlapping rings and problems of baseline shift in the electronics. About 5% of all measured events are of this type and have not been excluded from the test.
6. Analysis and results

The results of the test phase are presented using the training described above, i.e. with the network correctly identifying 95% of the simulated data. Running the trained neural network on 1000 experimental events yields a correct identification in 86% of all cases. The identification efficiency is 70%, i.e. for 30% of the data the neural network could not tell whether the event was a pion or a proton. These results are compatible with what is obtained by fitting the ring radius or the Cherenkov angle to each event. Remember that the test set consists of experimental data and that the decision whether the network is correct or not depends on how the probability it presents for the identification is interpreted. Requiring a high identification probability (> 99%) to accept the result as correct naturally yields the highest number of correctly identified events, but also reduces the identification efficiency of the system.
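The acceptance criterion discussed here amounts to a simple decision rule on the two output nodes. The sketch below is illustrative only, with the 99% cut taken from the text and the output-node ordering assumed as in the earlier sketches.

```python
def classify(pad_amplitudes, min_probability=0.99):
    """Classify one measured event as 'pion', 'proton' or 'unidentified'.
    The identification is accepted only if the favoured output node exceeds
    min_probability, as in the > 99% requirement discussed in the text."""
    _, out = forward(encode(pad_amplitudes))
    p_pion, p_proton = out                  # output-node ordering is an assumption
    if max(p_pion, p_proton) < min_probability:
        return "unidentified"               # contributes to the undecided fraction
    return "pion" if p_pion > p_proton else "proton"
```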
7. Conclusion

It has been shown in this paper that events generated by Monte Carlo simulations may be used to train a feed-forward neural network and that this network then performs well when identifying experimental data.
It is fair to say that for 86% of all data the neural network makes a correct identification. If one considers that maybe 5% of the experimental data are multiparticle events or events with electronic problems, and that these have not been discarded as garbage, this number is just as good as what can be obtained by any other identification algorithm in which the ring radius or the Cherenkov angle is fitted to each event and used for particle separation.
References
[1] T. Francke, F. Piuz, D. Williams, P. Martinengo, R. Ribeiro and M. Suffert, A fast and compact solid radiator RICH counter, Proc. 2nd London Conf. on Position-Sensitive Detectors, Imperial College, London, UK, 4-7 September 1990, to be published in Nucl. Instr. and Meth.
[2] T. Francke, thesis, Royal Institute of Technology, Stockholm, Sweden (1991).
[3] D.E. Rumelhart, G.E. Hinton and R.J. Williams, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, eds. D.E. Rumelhart and J.L. McClelland (MIT Press, Cambridge, MA, 1986) p. 318.
[4] F. Piuz, R.S. Ribeiro, T. Francke, P. Martinengo, M. Suffert and T.D. Williams, Development of a fast RICH detector using a MWPC at low gain with pad readout and a NaF radiator, Proc. San Miniato Conf., San Miniato, Italy, June 1990, CERN preprint CERN/PPE 91-7.