Nuclear Instruments and Methods in Physics Research A 379 (1996) 271-275
Comparison of the BP training algorithm and LVQ neural networks for e, μ, π identification*
Z.P. Zhang, H.F. Chen*, S.W. Ye, J.W. Zhao
Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui, 230026, China
Received 22 April 1996
Abstract
Two different kinds of neural networks, the feed-forward multi-layer type with the Back-Propagation training algorithm (BP) and Kohonen's Learning Vector Quantization networks (LVQ), are adopted for the identification of e, μ, π particles in the Beijing Spectrometer (BES) experiment. The data samples for training and testing consist of μ from cosmic rays and of e and π strictly selected from experimental data. Although their momentum spectra are non-uniform, the identification efficiencies given by BP are quite uniform versus momentum, and those given by LVQ are slightly worse. At least in this application BP is shown to be more powerful in pattern recognition than LVQ.
1. Introduction
In recent years neural networks, both the feed-forward multilayer type trained with the Back-Propagation algorithm (BP) and Kohonen's Learning Vector Quantization (LVQ), have shown great power in pattern recognition and have been widely used in high energy physics data analysis [1-5]. In the BES experiment, e, μ, π particle identification is a very important task for many off-line physics analyses. The conventional discriminant method has been shown to perform poorly on this problem, so new approaches are being sought [6]. In applying BP and LVQ neural networks to this problem, the following points are carefully studied:
- Since the detector simulation of BES is not very accurate, we cannot rely on Monte Carlo (MC) data samples to train and test the neural networks. Instead we take strictly selected real e and π data and cosmic μ data for this study. The question is whether, given the non-uniform momentum distributions of these data samples, the resulting selection efficiencies will be uniform over the whole momentum range, i.e., whether the networks can learn enough information about the event characteristics from these kinds of data samples.
- The performance of the BP and LVQ networks is carefully compared under exactly the same conditions to see which one is more effective, at least in this application.
- As the final result of this study, the e, μ, π selection efficiencies and background suppression levels are given.
2. The BES detector
The BES is a general purpose solenoidal detector at the Beijing Electron and Positron Collider (BEPC). It consists of a central drift chamber (CDC), a main drift chamber (MDC), a time-of-flight counter (TOF) and a shower counter (SC) within a 4 kG magnetic field. Outside the magnet is a muon counter. The CDC is used for the event trigger. The MDC measures charged tracks and dE/dx with 4π × 85% coverage and a momentum resolution of σ_p/p = 2.1% (1 + p²)^(1/2). The TOF measures the flight time over 4π × 76% with a resolution of 330 ps for Bhabha events. The barrel SC (BSC) covers 4π × 80% with an energy resolution of σ_E/E = 22%/√E and spatial resolutions of σ_z = 3.6 cm and σ_φ = 7.9 mrad. The endcap SC covers 4π × 13% with an energy resolution of σ_E/E = 21%/√E and spatial resolutions of σ_x = 1.5 cm and σ_y = 1.7 cm. The muon counter has 4π × 67% coverage with spatial resolutions of σ_rφ = 3 cm and σ_z = 4.5 cm for muon momenta above 0.55 GeV/c. Details of the BES can be found in Ref. [7].
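As a small worked illustration of the resolution formulas quoted for the MDC and the barrel shower counter, the sketch below evaluates them at a few momenta and energies. It assumes p is in GeV/c and E in GeV (the usual convention, not stated explicitly in the text); the function names are illustrative only.

```python
import math

def momentum_resolution(p):
    """Relative MDC momentum resolution sigma_p/p = 2.1% * sqrt(1 + p^2).

    p is assumed to be in GeV/c.
    """
    return 0.021 * math.sqrt(1.0 + p * p)

def bsc_energy_resolution(energy):
    """Relative barrel shower counter resolution sigma_E/E = 22% / sqrt(E).

    energy is assumed to be in GeV.
    """
    return 0.22 / math.sqrt(energy)

# Evaluate the formulas over the momentum range relevant for this study.
for p in (0.2, 0.5, 1.0, 1.5):
    print(f"p = {p:.1f} GeV/c: sigma_p/p = {momentum_resolution(p):.4f}")
for e in (0.25, 1.0, 1.5):
    print(f"E = {e:.2f} GeV: sigma_E/E = {bsc_energy_resolution(e):.4f}")
```

The relative momentum resolution varies only slowly below 1 GeV/c, whereas the relative shower energy resolution degrades quickly at low energy.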
* This work is supported by the Chinese National Natural Science Foundation.
* Corresponding author. Tel. +86 551 3601178, fax +86 551 3631760, e-mail hfchen@b&l.mphy.ustc.edu.cn.
0168-9002/96/$15.00 Copyright © 1996 Elsevier Science B.V. All rights reserved
PII S0168-9002(96)00733-4

3. Selection of data samples
In this study the very clean μ samples, provided by our BES colleagues, are selected from cosmic rays with the following cuts: the total energy deposit of the event in the BSC, E_BSC, must be less than 1.5 GeV; there are at least 2 tracks in the event, with 1 or 2 charged tracks and 0 or 1 neutral track; the vertex cuts on the track are R < 10 cm and |V_z| < 30 cm; the total number of hits in the muon tubes must be more than 3; in addition, there must be at least one track in the range 3.2 < φ < 6.2 and one track with good TOF quality, i.e. TQUA = 1.0.

The e and π samples are selected by us from experimental data with the following tight cuts.

Electron samples are selected from eeγ channel events. The basic idea is to make sure that the most energetic track is an electron; the second, lower-energy track must then be an electron as well, since the data were collected below the τ pair production threshold. The selection criteria for the most energetic electron track are: it lies in the region |cos θ| ≤ 0.8 and has momentum p ≥ 1.0 GeV; its velocity calculated from the TOF (time-of-flight) and momentum satisfies 0.8 ≤ v/c ≤ 1.4; there are no hits in the muon counter; the measured dE/dx deviates from the expected value by less than 2σ; the ratio of the energy deposit in the BSC, E_BSC, to the momentum p satisfies 0.6 ≤ E_BSC/p ≤ 2.0; the difference between the total hit number in the first five layers and 12 must not be less than 3.

π samples are selected from the J/ψ → ρπ, ω2π, ω4π processes. Four-momentum conservation and zero total charge of the event are required; the number of neutral tracks must be between 2 and 6; the kinematic fit must give χ² < 25; the invariant mass of a γ pair that may come from a π⁰ must satisfy |m_γγ − 0.135| < 0.05 GeV/c². In addition, some specific cuts for the individual channels are:
- J/ψ → ρπ channel. The charged track number must be 2, and the invariant mass of the two π that may be the decay products of the ρ is required to satisfy |m_2π − 0.770| < 0.2 GeV/c².
- J/ψ → ω2π, ω → π⁺π⁻π⁰ channel. The charged track number must be 4, and the invariant mass of the ω is required to satisfy |m_3π − 0.782| < 0.05 GeV/c².
- J/ψ → ω4π channel. The charged track number must be 6, with the same ω invariant mass cut as above.
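The cuts on the most energetic electron track listed above can be written as a simple predicate. The following is a minimal sketch only: the `Track` fields and the function name are hypothetical and are not identifiers from the BES software, and the cut on the hit number in the first five layers is left out because its exact definition is not spelled out in the text.

```python
from dataclasses import dataclass

@dataclass
class Track:
    # All field names are hypothetical, chosen for this illustration.
    cos_theta: float    # cosine of the polar angle
    p: float            # momentum in GeV
    beta: float         # v/c from TOF and momentum
    n_muon_hits: int    # number of hits in the muon counter
    dedx_pull_e: float  # (measured - expected dE/dx) / sigma, electron hypothesis
    e_bsc: float        # energy deposit in the BSC in GeV

def passes_leading_electron_cuts(t: Track) -> bool:
    """Apply the quoted cuts for the most energetic electron candidate."""
    return (
        abs(t.cos_theta) <= 0.8
        and t.p >= 1.0                      # also guards the division below
        and 0.8 <= t.beta <= 1.4
        and t.n_muon_hits == 0
        and abs(t.dedx_pull_e) < 2.0
        and 0.6 <= t.e_bsc / t.p <= 2.0
    )

candidate = Track(cos_theta=0.3, p=1.2, beta=1.0,
                  n_muon_hits=0, dedx_pull_e=0.5, e_bsc=1.1)
print(passes_leading_electron_cuts(candidate))
```

Keeping each cut as one term of a single boolean expression makes it easy to compare the code against the list in the text.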
4. Training and test of BP
4.1. The structure and training of BP
Several ways of training the BP networks were tried. First, as the normal procedure, the data samples over the whole momentum range were fed to the networks. Second, considering that there is no muon counter information when the momentum is as low as p < 0.5 GeV, training and testing were done separately with data samples of p < 0.5 GeV and p ≥ 0.5 GeV. The results of the latter scheme, which are given below, turned out to be only slightly better.

After careful study of the characteristics of each kind of particle and of their responses in the detector, the following 23 physical variables are selected for p < 0.5 GeV particles:
(1) p, momentum of the charged track.
(2, 3, 4) xse, xsu, xsp, deviation of the measured dE/dx from its expected value for e, μ, π respectively, in standard deviations.
(5) TOF, time-of-flight of the track.
(6) E_BSC, total energy deposit in the BSC.
(7-12) Hit(i), i = 1, ..., 6, number of hits in the ith layer of the BSC.
(13-18) See(i), i = 1, ..., 6, dE/dx in the ith layer of the BSC.
(19, 20) |cos(θ_MDC − θ_BSC)|, |cos(φ_MDC − φ_BSC)|, polar and azimuthal differences between the track direction determined by the main drift chamber (MDC) and that given by the hit position in the first layer of the BSC.
(21) total number of hit cells in the BSC.
(22, 23) the maximum number of hit cells in a single BSC layer, and the index of the layer in which this maximum occurs.
For p > 0.5 GeV the following 8 additional variables related to muon chamber information are added:
(24-29) D_R(i), D_z(i), i = 1, 2, 3, differences between the hit positions in each layer of the muon counter and the positions extrapolated from the main drift chamber, in the transverse (R) and longitudinal (z) directions relative to the beam.
(30, 31) the real and expected total numbers of hit layers in the muon counter.

All these variables, which (except for the momentum p) more or less reflect the differences between the three kinds of particles, are properly normalized to [0, 1] and serve as the BP inputs. Varying the number of hidden-layer neurons from 15 to 30 produces no significant change in the network performance. There are 3 neurons in the output layer, corresponding to e, μ, π events respectively. Each of the e, μ, π training data sets consists of more than 5000 events, to make sure that the BP networks can learn enough characteristic information about these three classes of particles. The momentum distributions of these data sets are shown in Fig. 1.

Fig. 1. Momentum spectra of (a) e from the eeγ channel, (b) μ from cosmic ray data and (c) π from the J/ψ → ρπ, ωππ, ω4π channels.

In the training stage the three data sets are fed into the networks sequentially, and each event within a set is chosen randomly. The target value of the ith output neuron is set to 1 for the ith (i = e, μ, π) class data set and 0 for the other two classes. The inter-layer linking structure and training algorithm are almost the same as in Ref. [4], but the temperature parameter T in the sigmoid activation function

$$ f(x) = \frac{1}{1 + \exp(-x/T)} \qquad (1) $$

is set to 0.4 so that it behaves more like a threshold function, increasing the compressing ability. The learning strength η and the "momentum" parameter α are initially taken to be 0.05 and 0.3. As the training goes on, η is gradually reduced every 10 000 loops according to

$$ \eta_{k+1} = 0.99\,\eta_k, \qquad (2) $$

but not below 0.0001; α may be changed to 0.2 after sufficiently many training loops. The linking weights w_ij and thresholds θ_j, randomly initialized in [−0.1, 0.1], are adjusted for each loop and updated every three loops. This training procedure typically takes 2 million loops. After the first training run, if the result is not optimal, the output weights and thresholds can be taken as initial values to restart the next run with newly adjusted η and α parameters; in this way CPU time is saved in the next training procedure.

Fig. 2. Selection efficiencies of e, μ, π as functions of momentum given by BP with cuts y_out > 0.5 on the corresponding output neuron (the dashed and dotted lines represent the probabilities of μ and π being misidentified as e, e and π as μ, and e and μ as π, respectively). (a) p ≥ 0.5 GeV, (b) p < 0.5 GeV.

Fig. 3. Variation of the selection efficiencies and contamination probabilities of e, μ, π with y_out in the momentum ranges (a) p ≥ 0.5 GeV and (b) p < 0.5 GeV.
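The activation function of Eq. (1) and the learning-strength schedule of Eq. (2) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function and constant names are hypothetical, and the sigmoid is written in a numerically stable form that is mathematically identical to Eq. (1).

```python
import math

T = 0.4           # temperature parameter of Eq. (1)
ETA0 = 0.05       # initial learning strength
ETA_MIN = 0.0001  # lower bound on the learning strength
DECAY = 0.99      # reduction factor of Eq. (2)
STEP = 10_000     # number of loops between two reductions

def sigmoid(x, temperature=T):
    """Eq. (1): f(x) = 1 / (1 + exp(-x / T)), evaluated without overflow."""
    z = x / temperature
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

def learning_strength(loop):
    """Eq. (2) applied once every STEP loops, floored at ETA_MIN."""
    k = loop // STEP  # number of reductions applied so far
    return max(ETA0 * DECAY ** k, ETA_MIN)

# A smaller T gives a steeper, more threshold-like response around x = 0.
print(sigmoid(0.4, temperature=1.0), sigmoid(0.4, temperature=T))
print(learning_strength(0), learning_strength(2_000_000))
```

Passing the loop index rather than keeping η as mutable state makes a restarted run (with newly adjusted parameters) easy to express by changing the constants.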
Table 1
Selection efficiencies of e, μ, π by BP and LVQ

Identified      BP inputs                   LVQ inputs
as              e        μ        π         e        μ        π
e               96.9%    0.4%     1.9%      87.6%    1.2%     11.2%
μ               0.3%     90.6%    8.7%      0.8%     87.1%    12.1%
π               1.9%     5.3%     91.9%     1.8%     8.0%     90.2%
4.2. Test results of BP
Once the networks are successfully trained, all the weights w_ij and thresholds θ_j are fixed, and other independent e, μ, π data sets can be taken as inputs of first, second and third class events to test the BP performance. By setting a cut y_out on each of the three output neurons separately, the selection efficiency for the corresponding signal particles and the contamination probabilities from the other two classes are obtained. These results are plotted in Fig. 2 for a cut y_out > 0.5 on each of the output neurons. The efficiencies stay quite constant over the complete momentum range except in the vicinity of p = 0.2 GeV/c. This is a very interesting result for practical applications, considering how strongly non-uniform the momentum spectra shown in Fig. 1 are. The variation of the selection efficiencies and background levels of each particle class with y_out is given separately for the p > 0.5 GeV and p < 0.5 GeV momentum ranges in Fig. 3. To give a quantitative impression, the selection efficiencies for each signal class and the background contamination probabilities from the other two classes are listed in Table 1 for a cut y_out ≥ 0.5 on each corresponding signal output neuron. Also given in Table 1 are the LVQ test results, which are explained in the next section, to allow a comparison between these two different kinds of neural networks.

5. Training and test of LVQ networks
5.1. Structure and training algorithm of LVQ
If the n physical variables that represent a particle event are considered to be the components of a vector in feature space, the distribution of these vectors in the space should differ for each class of particles. Based on this, the basic idea of Kohonen's LVQ training algorithm is to find l reference vectors m_i^c (i = 1, 2, ..., l; c = e, μ, π) of class c particles whose local point density in feature space best describes the distribution of this class of particles. These reference vectors are also called neurons. When an unknown signal s is presented, the vector m_i^{c'} with the smallest distance |s − m_i^{c'}| from s is found among all the reference vectors, and s is then taken to belong to class c'. The detailed training algorithm adopted in this study is as follows:
(1) Initialization of neurons: for each class of the e, μ, π particles, 150 initial neurons are set with values read in from the corresponding class of data samples.
(2) For each neuron a training counter p_i^c is set with initial value 0.
(3) The e, μ, π data samples are presented sequentially to the LVQ networks, and for each event s the closest neuron m_i^c is searched for. Supposing s belongs to particle class c', m_i^c is updated according to:

$$
\begin{aligned}
m_i^c(t+1) &= m_i^c(t) + \alpha(t)\,[s - m_i^c(t)], && \text{if } c = c', \\
m_i^c(t+1) &= m_i^c(t) - \alpha(t)\,[s - m_i^c(t)], && \text{if } c \neq c', \\
m_j^c(t+1) &= m_j^c(t), && \text{for all others, } j \neq i.
\end{aligned} \qquad (3)
$$

(4) Destruction step: after the whole set of training data samples has been presented, those neurons are eliminated which are undertrained, or which are overtrained by vectors belonging to classes different from that of the neuron, i.e. which are too heavily contaminated. By careful tuning according to the training counters p_i^c, the following cuts are chosen, i.e., the ith neuron satisfying any one of the following formulas is pruned off:

[Eqs. (4) and (5): an undertraining condition and a contamination condition on the training counters p_i^c, with contamination thresholds of 0.2 for e, 0.5 for μ and 0.5 for π.]

If the number of neurons of a class becomes too small after this destruction, new neurons are added with initial values read in from this class of data set, and then the next training loop restarts.

5.2. Test result of LVQ networks
The test results of the successfully trained LVQ networks on another independent data set are shown in Fig. 4, in which the selection efficiencies of each individual class of particles and the contamination probabilities from the other two classes are given as functions of momentum. Comparing this result with Fig. 2, it is obvious that the selection efficiencies of LVQ are lower than those of the BP networks, and their variation with momentum is also less uniform than that of BP. A quantitative comparison of these two different kinds of neural networks is listed in Table 1.
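For reference, the nearest-reference-vector classification and the update rule of Eq. (3) used to train the LVQ networks of Section 5.1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Euclidean distance is an assumption (the text only writes |s − m|), and the schedule of α(t), the training counters and the destruction step are left out.

```python
import math

def nearest_neuron(neurons, s):
    """Index of the reference vector closest to s (Euclidean distance assumed)."""
    return min(range(len(neurons)), key=lambda i: math.dist(neurons[i], s))

def lvq_update(neurons, labels, s, s_label, alpha):
    """Apply Eq. (3) for one training vector s; only the closest neuron moves.

    The closest neuron is pulled towards s if its class matches s_label
    and pushed away otherwise. Returns the index of the updated neuron.
    """
    i = nearest_neuron(neurons, s)
    sign = 1.0 if labels[i] == s_label else -1.0
    neurons[i] = [m + sign * alpha * (x - m) for m, x in zip(neurons[i], s)]
    return i

def classify(neurons, labels, s):
    """Assign s to the class of its closest reference vector."""
    return labels[nearest_neuron(neurons, s)]

# Toy example with two neurons in a two-dimensional feature space.
neurons = [[0.0, 0.0], [1.0, 1.0]]
labels = ["e", "mu"]
lvq_update(neurons, labels, [0.2, 0.0], "e", alpha=0.5)  # same class: attract
lvq_update(neurons, labels, [0.8, 1.0], "e", alpha=0.5)  # other class: repel
print(neurons, classify(neurons, labels, [0.1, 0.1]))
```

Because classification only needs the closest reference vector, the cost of testing an event grows linearly with the number of neurons kept after training.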
Fig. 4. Selection efficiencies of e, μ, π as functions of momentum given by LVQ (symbols are the same as in Fig. 2). (a) p ≥ 0.5 GeV, (b) p < 0.5 GeV.

6. Conclusions
This study shows that neural networks can be used effectively for particle identification in the BES experiment, even though, because of the somewhat poor BES detector simulation, MC data samples cannot be used for training and testing the networks, and real experimental data with seriously non-uniform momentum spectra must be used instead. This means that the networks still learn enough information in this case, which is of great practical importance. Another conclusion is that, at least in this study, the pattern recognition ability of LVQ is not as powerful as that of the BP networks. This differs from the result of Ref. [5], where, for the discrimination of p p̄ → t t̄ events, LVQ is stated to be better than BP, and both BP and LVQ to be worse than the Fisher statistical discrimination method.
References
[1] L. Lönnblad et al., Nucl. Phys. B 349 (1991) 675.
[2] Wayne S. Babbage and Lee F. Thompson, Nucl. Instr. and Meth. A 330 (1993) 482.
[3] M. Joswig et al., DESY 93-167 (1993) ISSN 0418-9833.
[4] Z.P. Zhang, Y.F. Wang and V. Innocente, High Energy Phys. and Nuclear Phys. (in Chinese) 18 (1994) 769.
[5] A. Cherubini and R. Odorico, Z. Phys. C 53 (1992) 139.
[6] Z.J. Jiang, T.J. Wang, Y.G. Xie et al., Nucl. Instr. and Meth. A 345 (1994) 541.
[7] BES Collaboration, J.Z. Bai et al., Nucl. Instr. and Meth. A 344 (1994) 319.