Nuclear Instruments and Methods in Physics Research A 376 (1996) 411-419
Results from a MA16-based neural trigger in an experiment looking for beauty

C. Baldanza^a, J. Beichter^b, F. Bisi^a, N. Bruels^b, C. Bruschini^c, A. Cotta-Ramusino^a, I. D'Antone^a, L. Malferrari^a, P. Mazzanti^a, P. Musico^c, P. Novelli^c, F. Odorici^a, R. Odorico^a,*, M. Passaseo^d, M. Zuffa^a

^a INFN/Bologna, Via Irnerio 46, 40126 Bologna, Italy
^b Siemens AG, ZFE T ME2, 81730 Munich, Germany
^c INFN/Genoa, Via Dodecaneso 33, 16146 Genoa, Italy
^d CERN, 1211 Geneva 23, Switzerland

Received 15 September 1995; revised form received 20 November 1995
Abstract

Results from a neural-network trigger based on the digital MA16 chip of Siemens are reported. The neural trigger has been applied to data from the WA92 experiment, looking for beauty particles, which were collected during a run in which a neural trigger module based on Intel's analog neural chip ETANN operated, as already reported. The MA16 board hosting the chip has a 16-bit I/O precision and a 53-bit precision for internal calculations. It operated at 50 MHz, yielding a response time for a 16 input-variable net of 3 μs for a Fisher discriminant (1-layer net) and of 6 μs for a 2-layer net. Results are compared with those previously obtained with the ETANN trigger.
1. Introduction

The recent appearance of neural microprocessors, able to implement multi-layered neural networks with response times of a few microseconds, has stimulated efforts towards the realization of on-line neural triggers capable of exploiting non-leptonic signatures of heavy flavors at high-luminosity particle accelerators. We have already reported [1,2] the results from a neural trigger based on the analog ETANN chip [3], which operated in experiment WA92 at CERN during the 1993 run. The neural trigger also included two boards based on the digital microprocessor MA16 [4,5], which were not ready at the time of the run because of delays in the debugging of the board control code. We present here results from the MA16 board of the neural trigger module, which has been applied off-line to the same experimental data on which the ETANN board operated on-line. An early report on the MA16 results has been presented in Ref. [6]. WA92 is an experiment looking for the production of beauty particles by a π⁻ beam impinging on a fixed target [7]. The task of the neural trigger was to select events by exploiting a non-leptonic decay signature and to accept them into a special data stream, meant for early analysis.
* Corresponding author. Tel. +39 51 242018 or 247244; e-mail [email protected].
The training of the neural network was done using WA92 data collected in the 1993 run and certified off-line by the Trident event reconstruction program. The events used had been accepted by the WA92 standard trigger, so that the task of the neural trigger was to provide additional enrichment for the events it accepted into the special data stream. Input to the neural trigger was provided by the Beauty Contiguity Processor (BCP) [8,9], which determined tracks and their impact parameters on-line, using hit locations in the silicon microstrip vertex detector. Section 2 describes the aims of the neural trigger in experiment WA92 and Section 3 the input to the neural trigger from the WA92 trigger apparatus. Section 4 highlights the main characteristics of the MA16 hardware. Section 5 details the neural network architecture and the training procedure used to account for the finite bit-precision of the MA16 chip. Section 6 presents the experimental trigger results. Section 7 contains the conclusions.
2. Aims of the neural trigger in WA92

WA92 is an experiment at CERN looking for the production of beauty particles by a π⁻ beam at 350 GeV/c impinging on a Cu target (during the 1993 run) [7]. The neural trigger hosted in the experiment had the task of selecting events, already accepted by the WA92 standard trigger,
by exploiting a non-leptonic beauty decay signature, and of accepting them into a special data stream, meant for early analysis. Specifically, the neural trigger was trained to enrich the fraction of events with C3 secondary vertices, i.e. vertices branching into three tracks with a sum of electric charges equal to +1 or -1. C3 vertices are sought for further analysis aimed at identifying charm and beauty non-leptonic decays. Training was done with events previously collected in the same 1993 run and certified off-line, by the Trident event reconstruction program, as containing or not containing a C3 vertex. The off-line reconstruction of C3 vertices makes extensive use of data from the fine-grained silicon microstrip decay detector [10], which has a slow response and cannot be used for triggering. In order to enrich the fraction of C3 events in the selected sample, the neural trigger exploits statistical correlations, characteristic of C3 events, among quantities which can be calculated on-line from the hits measured in the silicon microstrip vertex detector, which has a fast response and can be used for triggering. Such correlations are "learned" by the neural network, to be loaded on the neural trigger, in a previous off-line stage (the training stage). During training the neural network is presented with two separate event samples, which have been certified off-line as containing and not containing, respectively, C3 vertices. An appropriate training algorithm changes the internal parameters of the neural network so that its response becomes as widely different as possible for the two classes of events. Since the events to be triggered on had already been accepted by the WA92 standard trigger, the task of the neural trigger was to provide additional C3 enrichment for the events it accepted into the special data stream.
Fig. 1. Schematic view of the parts of the WA92 apparatus relevant for the neural trigger (the Vertex Detector trigger microstrips and the Butterfly Hodoscope).
3. Input from the WA92 trigger apparatus

The part of the WA92 trigger detector [7] relevant for the input to the neural chips (see Fig. 1) consists of 6 silicon microstrip planes of the Vertex Detector (VD), measuring the z coordinates of tracks, where the z axis has the direction of the magnetic field (bottom-up direction). The last plane is positioned about 41 cm downstream of the target and has a square shape with a 5 cm side, centered around the beam. The hit information from the 6 planes is processed by the Beauty Contiguity Processor (BCP) [8], which reconstructs tracks and their impact parameters (IP), with a global response time of about 40 μs [9]. IP is defined as the signed difference between the z value of the track extrapolated back to the target-center plane and the beam z position. The BCP supplies to the neural trigger 5 words of 64 bits each, plus a standard go-ahead termination word. Each one of the 5 words directly represents the hit-map of tracks on the last (6th) VD layer along the z axis, divided into 64 bins, for tracks falling within a given IP window. The 5 IP windows used are those listed in Table 1. The BCP could supply up to 8 hit-maps, with a resolution which can reach 256 bits; data transfer times would be correspondingly increased. Simulation has shown that the arrangement we eventually employed is adequate for the task and does not lead to any significant loss of performance (Section 5.2). The BCP also makes a first-level decision about accepting the event. Defining as secondary tracks those with |IP| > 100 μm and as primary the other tracks, the BCP Trigger (BCPT) requires the presence of 3 primary tracks, 2 secondary tracks and 1 track in the Butterfly Hodoscope with high p_T (defined in a detector-dependent mode, roughly corresponding to p_T > 0.6 GeV/c along the z axis). The neural trigger considers only events which have already been accepted by the BCPT.

Table 1
The 5 IP windows used

#IP   Range [μm]
1     -200 to  200
2      200 to  400
3     -400 to -200
4      400 to  900
5     -900 to -400
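To make the data format concrete, the following sketch (our own illustration, not the BCP firmware; all names are ours) decodes one 64-bit hit-map word into its 64 z-bins:

```python
# Sketch: decode a 64-bit BCP hit-map word into its 64 z-bins. Bit i of the
# word marks a hit in z-bin i of the last VD plane, for tracks whose IP falls
# in the corresponding window of Table 1. Illustrative only; names are ours.

def decode_hitmap(word: int) -> list[int]:
    """Return the 64 z-bin occupancies (0/1) encoded in a hit-map word."""
    return [(word >> i) & 1 for i in range(64)]

# One word arrives per IP window; e.g. with dummy data:
hitmaps = {ip: decode_hitmap(0x0000_0F00_0000_0003) for ip in range(1, 6)}
```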
4. Neural trigger module

A detailed presentation of the neural trigger hardware will be made elsewhere [11]. Here we only report its main features. The neural trigger hardware consists of a crate hosting VME9U boards: i) an interface board receiving the five 64-bit words plus a termination word from the BCP, and control bits to synchronize with the beam pulse;
ii) four preprocessing boards hosting two independent preprocessing units each, each unit calculating the input variables, referring to a given impact parameter window, entering the neural chips (with 5 impact parameter windows only 5 such units are actually used); iii) an ETANN board, hosting two independent ETANN neural chips; iv) two MA16 boards, hosting a MA16 neural chip each; v) a (VME6U) VIC board of CES interfacing the VME bus to a personal computer for control and monitoring operations, which also allows one to simulate on-line running conditions by having recorded experimental inputs passed through the crate, with subsequent collection of the corresponding outputs. At the time of running the MA16 boards were not yet operating, waiting for the debugging of the microcode controlling the boards. The ETANN trigger bit was sent to the WA92 trigger apparatus, and fuller information about the neural chip responses was sent to the WA92 event recording apparatus, where it was written in the event record on tape.

4.1. The MA16 microprocessor

The MA16 is a digital, systolic chip developed by Siemens [4,5]. It has a precision of 16 bits for input variables, 16 bits for weights, 16 bits for scalar multipliers effectively modifying the transfer function shape, 47 bits for thresholds, 53 bits for internal calculations, and 38 significant bits for the output (out of a 48-bit output word). It can accommodate an arbitrarily large number of input variables, the processing time increasing with their number. It operates up to a clock frequency of 50 MHz. It processes four input patterns simultaneously in a pipelined fashion. The latter feature cannot be directly used in our trigger application (where one has to wait for one event at a time), but can be exploited to reduce by a factor of four the clock frequency requirements on parts of the chip periphery on the board. The MA16 is a pure processor, with no memory available to store the neural-network parameters. The latter must be stored on external memories and supplied to the MA16 at the required clock times, according to the operating code loaded into the MA16, during processing. The operating code of the MA16 determines the configuration of the internal switches, which can be dynamically changed during running time. According to the operating code setup, the MA16 can perform a number of operations, including those required in a neural network of the feed-forward or of the learning vector quantization (LVQ) type. In the application considered here, the operating code is set so that the MA16 processing corresponds to that of a layer in a feed-forward net:

$$o_i = \lambda_i \sum_j w_{ij} y_j + \theta_i \,, \qquad (1)$$
where $y_j$ are the variables in input to the layer, $w_{ij}$ are the weights, $\theta_i$ are the thresholds, $\lambda_i$ are the scalar multipliers
and $o_i$ are the layer outputs. The variables $y_j$ are fed serially to the input port of the chip, and the outputs $o_i$ are delivered serially from its output port. The MA16 is supported by development software to generate and check microcodes for MA16 applications in a UNIX/XView environment, including a MA16 circuit simulator [12].

4.2. The MA16 board

The block diagram of the VME9U MA16 board is shown in Fig. 2. The MA16 is serviced by nine 16-bit EPROMs, sustaining clock frequencies in excess of 50 MHz: i) one storing the MA16 operating code file; ii) one storing the board control file; iii) three storing 16 bits each of the 47-bit words of the file containing the thresholds ($\theta_i$) and the (16-bit) scalar multipliers ($\lambda_i$), which are loaded at different times; iv) four storing the four weight files, each one containing the 16-bit weights ($w_{ij}$) of four nodes. The board is conceived to accommodate a multi-layer neural network. The MA16 receives the 16-bit input variables from the board inputs in the 1st pass, and in the following passes from its output fed back through a transfer function realized by a Look-Up Table (LUT) implemented on 16-bit addressable EPROMs. The 1st-pass input variables are temporarily stored on a RAM (accommodating up to 32 of them), which is normally loaded via a proprietary bus with the output from the preprocessing boards, but can also be loaded via the VME bus (e.g., from a PC). From there, they are transferred to input latches and loaded serially into the MA16. From the 47-bit (plus a 1-bit overflow flag) output of the MA16 a 16-bit slice is cut, truncating the 16 least significant bits and keeping the 15 following bits and the sign bit, which is sent in input to the LUT. There are two output latches on the signal emerging from the LUT. The board controller can trigger them at the clock times one prefers. By organizing the weights to accommodate two independent nets, one can thus pick up their separate outputs. The transfer function $T(x)$ implemented on the LUT is chosen to be a sigmoid:

$$T(x) = \frac{1}{1 + e^{-x}} \,. \qquad (2)$$
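In floating point, and stripped of the fixed-point details discussed in Section 5.3, the operation of Eqs. (1) and (2) for one layer can be sketched as follows (our illustration, not MA16 microcode):

```python
import math

def sigmoid(x: float) -> float:
    """Transfer function of Eq. (2), realized on the board as a LUT."""
    return 1.0 / (1.0 + math.exp(-x))

def ma16_layer(y, W, theta, lam):
    """One feed-forward pass as in Eq. (1):
    o_i = lam[i] * sum_j W[i][j] * y[j] + theta[i]."""
    return [lam_i * sum(w * yj for w, yj in zip(row, y)) + th_i
            for row, th_i, lam_i in zip(W, theta, lam)]

# A 2-layer net is two such passes with the sigmoid in between:
# h = [sigmoid(o) for o in ma16_layer(y, W1, th1, lam1)]
# out = ma16_layer(h, W2, th2, lam2)
```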
A check is made on the LUT input for possible 16-bit complement-of-2 underflows and overflows. In both cases the LUT is bypassed and its output is replaced by zero for a negative overflow and by one (actually hexadecimal FFFF, see the discussion of training below) for a positive overflow. Each one of the two output latches lets the signal persist, so that it is made available in output from the board. Both output signals are also sent to separate threshold comparators, with threshold values preset via the VME bus, and the resulting trigger control bits are available in output from the board.
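Our reading of this slice-and-bypass logic can be summarized in the following sketch (names and structure are ours):

```python
LUT_MIN, LUT_MAX = -(1 << 15), (1 << 15) - 1

def lut_stage(o: int, lut: list[int]) -> int:
    """Cut the 16-bit LUT-input slice out of the signed 47-bit MA16 output o
    (drop the 16 LSBs, keep the next 15 bits plus sign) and apply the LUT.
    On a complement-of-2 overflow the LUT is bypassed: 0 for a negative
    overflow, 0xFFFF ("one") for a positive one."""
    k = o >> 16                      # truncate the 16 least significant bits
    if k < LUT_MIN:
        return 0x0000                # negative overflow: LUT bypassed
    if k > LUT_MAX:
        return 0xFFFF                # positive overflow: LUT bypassed
    return lut[k - LUT_MIN]          # LUT indexed over the signed 16-bit range
```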
Fig. 2. Block scheme of the MA16 board.
The contents of the two output latches can also be read via the VME bus from the PC, e.g. for off-line operations. The board operates at a clock frequency of 50 MHz. The response time for a 1-layer net is under 3 μs and for a 2-layer net under 6 μs.
5. Neural network architecture and training
5.1. Training target

The training target for the MA16 neural trigger has been taken to be exactly the same as for the ETANN trigger [2]. As stated in Ref. [2], in principle we could have set as target for the neural trigger the direct recognition of beauty particle decays, defined by simulated events obtained from an event generator with an added GEANT simulation of the WA92 apparatus. However, we refrained from doing that mainly for two reasons: i) although such simulated events were available, they were suitable for off-line studies but not appropriate for the quantitative tuning of a neural trigger, because of the general dependence of event generators on model assumptions, which are especially critical for the correlations involving the relatively low-mass beauty particles and a complex nuclear target like copper; ii) the necessity of depending on graphical scanning, with the long waiting times involved, to obtain indications on the actual performance of the neural trigger. Taking into account the effective necessities of the experiment, which required the analysis of many tens of millions of events to extract the few thousand affordable by graphic scanning, we rather focused on the enrichment in the events which are selected for that purpose. Especially interesting, from this point of view, are events with C3 secondary vertices, branching into three tracks with a sum of electric charges equal to ±1. A C3 vertex may be due to a D → Kππ decay, which in its turn may originate from the decay of a beauty particle. Certification of these events comes from the Trident event reconstruction program. By applying the event reconstruction program to real data obtained with a given experimental setup, one gets the necessary event training samples to tune the neural trigger for enrichment in C3 events with the same experimental setup. In this way one avoids altogether dependence on theoretical modelling and relies only on the event reconstruction program. This is the target we have chosen for the neural trigger. In this the neural trigger does not act alone, but in conjunction with the BCP Trigger (BCPT, see Section 3): only events already accepted by the BCPT are processed by the neural trigger. By concentrating on BCPT events only, the neural trigger can better focus on the statistical correlations, present within those events, which discriminate signal from background. The non-leptonic signature represented by the combination of the BCPT and
the neural trigger has been used to send events to a special data stream, meant for early analysis. In order to train the neural net, one not only needs samples of signal and background events certified by the event reconstruction program, but also the corresponding input patterns to the neural chips calculated from them. As specified in Section 3, the input variables to the neural chips are calculated from the BCP output, but the latter is not recorded on the event tapes. To recover it we used the BCP simulator, which exactly reproduces the BCP calculations [13], applied to the raw data on tape. The training event samples we used consisted of 3000 C3 events and 10 000 non-C3 events, accepted by the BCPT. The collection of the C3 event sample required running the Trident event reconstruction program for about 4 days on 3 AlphaVAX 3000/400 machines. C3 vertices are defined by the Trident event reconstruction program, release 11, requiring in addition: i) IP in the y-z plane (orthogonal to the beam direction) > 60 μm; ii) distance in x from the primary vertex > 6σ; iii) x of the vertex not before the target and not beyond the vertex detector; iv) at least two tracks with |IP| along the z direction > 20 μm.
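The four certification cuts can be collected into a single predicate; the sketch below is our own rendering (the vertex record and its field names are hypothetical, not Trident code):

```python
def certified_C3(v) -> bool:
    """C3 certification cuts i)-iv) quoted above (hypothetical field names)."""
    return (v.ip_yz_um > 60.0            # i)  IP in the y-z plane > 60 um
            and v.dist_x_sigma > 6.0     # ii) x-distance from primary vertex > 6 sigma
            and v.x_in_fiducial          # iii) vertex x in [target, vertex detector]
            and v.n_tracks_ipz_over_20um >= 2)  # iv) >= 2 tracks with |IP_z| > 20 um
```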
5.2. Input variables

The input variables used for the MA16 trigger are also exactly the same as those used for the ETANN trigger [2]. In order to select them, we could not wait for the 1993 data to become available, but had to rely on the 1992 WA92 data, obtained with a somewhat different BCPT setup. Since the 1993 BCPT included in part the correlations exploited in the preliminary neural net studies on the 1992 data, it turned out that some of the input variables which had been chosen had become irrelevant. The performance of the neural trigger has degraded to some degree because of that. Only input variables simple to calculate with the available hardware were taken into consideration. The basic strategy in devising them was to have them convey multi-scale information on the hit maps provided by the BCP. We first considered the maximum resolution made available by the BCP: 256 bits. By directly studying the corresponding hit maps for a set of events, however, we saw that 64-bit hit maps were enough. Using such a resolution had the advantage of reducing transfer times, and it simplified the collection of the hit maps from the BCP by employing an already existing but unused 64-bit output connector. We then considered variables counting the number of hits with 1-bit, 4-bit, 8-bit, 16-bit and 32-bit resolutions. We submitted them to significance tests based on Fisher discrimination [2]. It turned out that the significant variables were those associated with the 1-bit and 16-bit resolutions. The study was made by varying the IP window segmentation. Eventually, we found that an adequate IP window segmentation is the one consisting of the 5 IP windows specified in Table 1. As to the choice of variables, we found that the following 16 variables were adequate according to significance tests based on Fisher discrimination. Each 64-bit map (#IP = 1-5) is divided into 4 × 16-bit groups, ordered bottom-up in z (Fig. 1), and on the basis of that the following variables are defined:

K16MAP(i) = #hits in group i, i = 1-4 (range: 0-16),
K16 = #groups hit (range: 0-4),
KDW16 = #groups in bottom half hit (range: 0-2),
KUP16 = #groups in top half hit (range: 0-2).

The 16 variables which have been chosen are those listed in Table 2. They range over integer values from 0 to 16, so that a 5-bit precision would be enough for their handling, versus the 16-bit input precision of the board. The slight asymmetry between windows with opposite IP is due to the fact that the BCP seeks the beam particle searching from bottom up, and that there is a substantial fraction of events with two beam particles, one of which is just passing through. If the latter is picked up by the BCP as the actual beam particle in the event, one gets a spurious determination of the IP of the tracks, leading on average to the above asymmetry.
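For illustration, the counting quantities just defined can be computed from a decoded 64-bit hit map as in the following sketch (our code, reusing the hypothetical decoder of Section 3):

```python
def counting_variables(bins: list[int]) -> dict:
    """K16MAP, K16, KDW16, KUP16 for one 64-bit hit map, as defined above.
    'bins' is the list of 64 z-bin occupancies; group 1 is the bottom-most
    16-bit group in z."""
    groups = [bins[16 * i:16 * (i + 1)] for i in range(4)]
    k16map = [sum(g) for g in groups]          # hits per group (0-16)
    hit = [int(n > 0) for n in k16map]         # group hit or not
    return {"K16MAP": k16map,                  # K16MAP(1)..K16MAP(4)
            "K16": sum(hit),                   # groups hit (0-4)
            "KDW16": hit[0] + hit[1],          # bottom-half groups hit (0-2)
            "KUP16": hit[2] + hit[3]}          # top-half groups hit (0-2)
```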
5.3. Neural net architecture and training

The MA16 board can accommodate two independent feed-forward neural networks (on the same MA16 chip), as stated in Section 4.2. As one of them, we picked a 1-layer net with 1 output, i.e. a Fisher discriminant [14]. Its training simply requires the inversion of the correlation (or covariance) matrix (see, e.g., [15]). For the other one, we took a 2-layer net with 5 hidden nodes and 1 output node, using the back-propagation algorithm for training. Its performance did not change appreciably when increasing the number of hidden nodes.
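For the 1-layer net, the training amounts to the classical Fisher construction; a minimal sketch (our code, with the usual pooled-covariance convention as one possible choice) is:

```python
import numpy as np

def fisher_weights(X_sig: np.ndarray, X_bkg: np.ndarray) -> np.ndarray:
    """Fisher discriminant weights w = C^-1 (mu_sig - mu_bkg), obtained by
    inverting the covariance matrix as mentioned above; X_* are
    (events x 16) arrays of input variables."""
    mu_s, mu_b = X_sig.mean(axis=0), X_bkg.mean(axis=0)
    C = np.cov(X_sig, rowvar=False) + np.cov(X_bkg, rowvar=False)
    return np.linalg.solve(C, mu_s - mu_b)  # solve instead of explicit inverse
```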
Table 2
The 16 input variables used

#    Variable     #IP
1    K16          1
2    K16MAP(1)    1
3    K16MAP(4)    1
4    KDW16        2
5    K16MAP(1)    2
6    KUP16        3
7    K16MAP(3)    3
8    K16MAP(4)    3
9    KDW16        4
10   K16MAP(1)    4
11   K16MAP(2)    4
12   K16MAP(3)    4
13   KUP16        5
14   K16MAP(2)    5
15   K16MAP(3)    5
16   K16MAP(4)    5
The training of the neural net has to face the problem of complying with the finite-precision restrictions on the net parameters, which must be integer. A crude way to handle this might seem to be to ignore the restrictions during training, using a floating-point representation for the parameters, and eventually to replace them with their closest integer values. But that would lead to uncontrollable and likely large differences with respect to the floating-point results, not to mention that the available bit precision would be conspicuously underused. Therefore the restrictions must be taken into account during training itself, in a way that optimizes the utilization of the bit precision available at the various stages of the calculation. In Eq. (1), the parameters $y_j$, $w_{ij}$ and $\lambda_i$ have 16-bit precision, $\theta_i$ 47-bit precision, and $o_i$ 47-bit precision (plus one bit as overflow flag). For all of them a complement-of-2 representation is used, except for the $y_j$ in input to the 2nd layer, which are configured as 16-bit positive unsigned numbers by the MA16 opcode. After the sum in the equation is carried over the first 4 terms, the multiplication by $\lambda_i$ is done, and the 14 least significant bits of this partial result are truncated before accumulating it to $\theta_i$. Then the MA16 sums over the next 4 terms, and so on. The partial result, before truncation, is held with 53-bit precision and therefore no overflow can occur. From $o_i$ a 16-bit window must be extracted, holding a 16-bit complement-of-2 number representing the input to the $T(x)$ LUT. The LUT output is configured as a 16-bit positive unsigned integer. If the LUT input overflows, the LUT EPROMs are bypassed: 0 is returned for a negative overflow and hexadecimal FFFF for a positive one. For the training, one ought to consider a number of points. i) The weights $w_{ij}$ and the thresholds $\theta_i$ should be kept as large as possible, so as to cope with the precision losses due to cancellations. For the $w_{ij}$ that can be done with the help of the $\lambda_i$: at each weight update, a power-of-2 factor can be exchanged between $\lambda_i$ and the $w_{ij}$ of the corresponding node, so that the maximum absolute value of the $w_{ij}$ of the node is just below the maximum allowed by 16-bit precision ($2^{15} - 1$). One can further exploit the arbitrariness of the LUT-input scale, and factor out of the $\lambda_i$ and $\theta_i$ of all nodes (of all layers) a common factor, call it $\lambda_T$, to be absorbed in the LUT definition. That gives extra flexibility in the setting of the $\lambda_i$ and of the $\theta_i$, and it can help in maximizing them so as to reduce the effects of the truncation performed before accumulating to $\theta_i$. Further help for this purpose can come from the choice of the scale factor for the $y_j$: one can multiply all $y_j$ by a common power-of-2 factor so that their range of values over the training set saturates the 16-bit allowed range. For the $y_j$ of the first layer that can be done by a bit left-shift operation, realized by simple hardwiring on the board, and for the $y_j$ of the 2nd layer by an appropriate choice of the LUT-output scale. ii) The 16-bit window, to be extracted out of the 47-bit $o_i$ output of the MA16 and to be sent in input to the LUT,
must be picked with care. The board can accept any choice for this (input-independent) window, which remains the same for all layers. The window must not be set too low (i.e. too much shifted towards the least significant bits), otherwise there would be too many LUT-input overflows, which might impair the proper setting of the trigger threshold. It must not be set too high, otherwise the bit truncation becomes so severe that precision would collapse (in the extreme case, all LUT inputs would be reduced to zero). In order to make a wise choice for the window setting, one must look at the layer-output distribution on the training set and position the window so as to accommodate the bulk of the layer outputs, in the sense that most of them should be close to overflow, but with the actual number of overflows kept to a minimum or, if possible, zero. The problem is made more complicated by the fact that the same window must hold for all layers. For a 2-layer net, however, this problem can be eased by an adroit forcing of the LUT-output scale factor $\lambda_T$ (and retraining). If the values of the 1st-layer input variables do not saturate the 16-bit range, extra flexibility can be gained by multiplying them by a suitable power-of-2 factor, by means of a simple bit left-shift operation, as stated at point i) above. In principle, though, once a bit slice for entry to the LUT is selected, training itself should optimize the net for the given bit-slice selection, provided one does not fall into initial configurations yielding systematically null entries to the LUT over the training sample. That can be avoided by initializing the $w_{ij}$, $\theta_i$ and $\lambda_i$ to high values, if the input variables are always non-negative. The danger of getting stuck in this way in the saturation region for some of the nodes can be dispelled, as usual, by adding a small positive term to the exact sigmoid derivative. In sum, the training procedure included the following steps. i) The input variables (to the 1st layer) are multiplied by a power-of-2 factor (a bit left shift). ii) At each back-propagation update of the net parameters: a) for each node, the maximum absolute value of the node weights $w_{ij}$ is found and a power-of-2 factor is exchanged with $\lambda_i$ so as to bring it just below its maximum 16-bit allowed value; $\lambda_i \ge 1$ must be preserved, and if $\lambda_i > 2^{14}$ would result, the factor is applied in its entirety to the $w_{ij}$ but $\lambda_i = 2^{14}$ is kept, thus introducing a distortion; b) the minimum and maximum values of $\lambda_i$ over all nodes of all layers are determined, and the corresponding bit range is centered between 0 and 14 by exchanging a power-of-2 factor between the $\lambda_i$ and $\theta_i$ on one side and $\lambda_T$ on the other; c) the values of $w_{ij}$ and $\theta_i$ are truncated to integer values ($\lambda_i$ and $\lambda_T$ are always integer by construction). iii) The sum in Eq. (1) is carried out in steps of 4 terms. At each step the partial result of the sum is truncated to an integer and furthermore has its 14 least significant bits dropped before being accumulated to $\theta_i$.
Let us call $\tilde o_i$ the corresponding final result for the expression, including the multiplication by $\lambda_T$. iv) Before entering the transfer function, the 16 least significant bits are dropped from $\tilde o_i$; let us call the result $\hat o_i$. It enters the transfer function, defined using Eq. (2) as:

$$t_i = (2^{16} - 1)\, T(x) \,, \qquad x = f_0 \, \frac{\hat o_i}{2^{15} - 1} \,, \qquad f_0 = 36 \,. \qquad (3)$$

The value of $f_0$ makes sure that saturation is reached when $\hat o_i$ approaches $2^{15} - 1$. v) At the end of training, a power-of-2 factor is traded between the $\lambda_i$ and $\theta_i$ on one side and $\lambda_T$ on the other, so that the maximum $\lambda_i$ gets as close as possible to $2^{14}$, compatibly with $\lambda_T \ge 1$. The 16-bit I/O LUT transfer function is then defined as:

$$L_k = (2^{16} - 1)\, T(x) \,, \qquad x = f_0 \, \frac{\lambda_T\, k}{2^{15} - 1} \,, \qquad -2^{15} \le k \le 2^{15} - 1 \,. \qquad (4)$$
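Step ii-a above (the power-of-2 exchange between the weights of a node and its scalar multiplier) can be rendered as follows; this is our own sketch of the procedure described in the text, not code from the NEURAL program [19]:

```python
def rescale_node(w: list[float], lam: float):
    """Exchange a power-of-2 factor f between a node's weights (w -> f*w)
    and its scalar multiplier (lam -> lam/f), so that max|f*w| sits just
    below 2^15 - 1 (step ii-a).  lam >= 1 is preserved; if lam would exceed
    2^14, the factor is still applied in full to the weights but lam is
    clamped to 2^14, introducing the distortion mentioned in the text."""
    wmax = max(abs(x) for x in w)
    if wmax == 0.0:
        return w, lam
    f = 1.0
    while wmax * f * 2.0 <= (1 << 15) - 1:   # grow the weights by powers of 2...
        f *= 2.0
    while wmax * f > (1 << 15) - 1:          # ...or shrink them
        f /= 2.0
    while f > 1.0 and lam / f < 1.0:         # preserve lam >= 1
        f /= 2.0
    return [f * x for x in w], min(lam / f, float(1 << 14))
```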
For the rest, the training consists of the standard back-propagation training [16]. Both Euclidean and non-Euclidean [17] cost functions and various sets of values for the training parameters (e.g. correction size, momentum term, update frequency, number of epochs) have been tried, with marginal changes in performance. The performance of the neural net complying with the MA16 restrictions obtained in this way has been verified to be indistinguishable from that of a neural net without such restrictions. This is in line with the conclusions of Ref. [18], where it is reported that such restrictions start to be felt only for bit precisions under 12-13 bits. For the Fisher discriminant, the effects of integer arithmetic are marginal. Care has been taken to make the coefficients of the Fisher vector saturate the allowed 16-bit range. The training has been carried out using the program NEURAL [19], which incorporates all the above features and writes the resulting neural net parameters and the transfer function LUT in a form allowing for easy loading on the board EPROMs.
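Once $\lambda_T$ and $f_0$ are fixed, the LUT of Eq. (4) can be tabulated directly; a minimal sketch (ours, not the NEURAL program's output format):

```python
import math

def build_lut(lam_T: int, f0: float = 36.0) -> list[int]:
    """Tabulate the 16-bit I/O LUT of Eq. (4):
    L_k = (2^16 - 1) * T(x),  x = f0 * lam_T * k / (2^15 - 1),
    for k = -2^15 .. 2^15 - 1 (sigmoid clipped for numerical safety)."""
    out = []
    for k in range(-(1 << 15), 1 << 15):
        x = f0 * lam_T * k / ((1 << 15) - 1)
        t = 0.0 if x < -60.0 else 1.0 if x > 60.0 else 1.0 / (1.0 + math.exp(-x))
        out.append(round(((1 << 16) - 1) * t))
    return out
```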
6. Results

The neural trigger, for the ETANN part, operated for two continuous weeks within the apparatus of WA92. The results we present for the MA16 part are for exactly the same event samples and the same input variables as those used for ETANN. The samples of events on which the results are based were collected in September 1993. They were chosen a few days apart to check for stability. The training sample dates back to mid August. Specifications of the event samples used for testing are given in Table 3. Two independent neural networks with 16 input variables have been simultaneously loaded on the MA16 board: i) a 1-layer net, yielding a linear Fisher discriminant; ii) a 2-layer net, with 5 hidden nodes and one output node, whose response is referred to in the following as the neural discriminant. C3 acceptance is defined as the fraction of C3 events retained in the neural-trigger-selected sample out of the full original sample (of events already accepted by the BCP trigger). C3 enrichment is defined as the ratio of the C3 event fraction in the neural-trigger-selected sample to the C3 event fraction in the full original sample. Fig. 3 and Fig. 4 show the dependence on the trigger threshold of the acceptance and the enrichment for C3 events given by the neural trigger, applied to events already selected by the BCP Trigger. Moving up the trigger threshold, the C3 enrichment factor increases up to a value of ~7. To better see the tradeoff between enrichment and acceptance, one can remove the threshold axis and plot the C3 enrichment directly versus the corresponding C3 acceptance, as done in Fig. 5 and Fig. 6. The small deviations in performance, observable when moving from one event sample to another, are due to slight modifications in the experimental apparatus, e.g. caused by temperature excursions. The overall stability of the neural trigger performance rather shows its robustness toward changes in the external conditions. Results from the Fisher and neural discriminants are qualitatively similar, which indicates that the problem at hand is essentially a linearly separable one and does not need more than one layer to be handled. The MA16 results are also qualitatively similar to those provided by the ETANN trigger [1,2], whose analog precision is of a few percent. Clearly this neural trigger application is not the one best suited to profit from the high precision of the MA16: the relevant input variables extracted from the 64-bit hit-maps in input from the BCP have integer values ranging from 0 to 16 (decimal), with most of them actually varying only between 0 and 1. From Fig. 5 and Fig. 6 one can read that for a C3 acceptance of 15% one obtains a C3 enrichment factor of 6-7, depending on the event sample. That is just the factor by which the C3 enrichment already given by the BCPT is boosted. The global C3 enrichment factor provided by the BCPT and neural trigger combination is found to be ~150, with a statistical error of ~30% due to the limited statistics available for interaction trigger runs (i.e. with the BCPT turned off). The corresponding global C3 acceptance is ~1%. Its smallness is largely due to the fact that in the off-line event reconstruction C3 vertices are required to have two tracks with |IP| > 20 μm (using the data from the high-resolution microvertex Decay Detector, not usable for triggering), whereas the BCPT (using only data from the low-resolution Vertex Detector) requires two tracks with |IP| > 100 μm. The event sample which is eventually selected contains one event with a C3 vertex about every ~7 events.
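With these definitions, acceptance and enrichment follow directly from event counts; a trivial sketch (ours; the selected-sample counts in the example are hypothetical, chosen to reproduce the quoted working point):

```python
def c3_figures(n_sel: int, n_sel_c3: int, n_tot: int, n_tot_c3: int):
    """C3 acceptance and enrichment of a neural-trigger-selected sample,
    relative to the full BCPT-accepted sample (definitions as in the text)."""
    acceptance = n_sel_c3 / n_tot_c3                      # fraction of C3 retained
    enrichment = (n_sel_c3 / n_sel) / (n_tot_c3 / n_tot)  # ratio of C3 fractions
    return acceptance, enrichment

# Totals from Table 3 (test set 1); the selected counts are hypothetical:
acc, enr = c3_figures(n_sel=5_800, n_sel_c3=960, n_tot=253_149, n_tot_c3=6403)
# -> acc ~ 0.15, enr ~ 6.5
```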
Table 3
Main characteristics of the event samples used to obtain the neural trigger results for C3 event enrichment and acceptance shown in Figs. 3-6

                         Test set 1        Test set 2        Test set 3
#Events reconstructed    388 178           482 286           485 493
#Events BCPT             253 149           308 538           316 133
#Events BCPT with C3     6403              6083              6621
Run start                3-Sep-93 13:04    5-Sep-93 21:05    10-Sep-93 15:16
Run stop                 4-Sep-93 00:28    6-Sep-93 04:27    10-Sep-93 23:36
Fig. 3. C3 event enrichment and acceptance obtained when varying the trigger threshold (range: 0000-FFFF hex) on the MA16 Fisher trigger output, for the WA92 1993 run data (rates after the standard non-leptonic trigger). Results are shown for three separate event samples, whose main characteristics are given in Table 3.

Fig. 4. Same as in Fig. 3 for the neural trigger.

Fig. 5. Same as in Fig. 3, with the C3 event enrichment plotted directly versus the corresponding C3 acceptance.

Fig. 6. Same as in Fig. 4, with the C3 event enrichment plotted directly versus the corresponding C3 acceptance.
7. Conclusions

The MA16 part of the neural trigger module, which operated on-line during the 1993 run of the WA92 experiment, has been completed and applied off-line to the same data on which the ETANN board operated. Unlike ETANN, the MA16 board does not have stability problems to care about, and thus its testing does not need real operating conditions. The performance of the MA16 neural trigger is not very different from that of the ETANN trigger in the problem at hand: with a relative neural trigger C3 acceptance of ~15% one obtains a relative C3 enrichment of 6-7, which when combined with the standard BCP trigger yields a total enrichment of ~150 and a selected sample containing one C3 event about every 7 events. The performance is also not very different between a 1-layer (Fisher discriminant) and a 2-layer net, which is likely due to the linearly separable nature of the classification problem. The response time of the MA16 board is under 3 μs for a 1-layer net and 6 μs for a 2-layer net. This first digital neural trigger applied to a standard high-energy physics experiment shows the viability and the merits of neural trigger technology. The experimental setup for which it operated, dealing with a relatively low-energy beam, is not the best suited to show its full potential, especially in connection with the 16-bit MA16 microprocessor, since the trigger signature turns out to be a relatively simple one, easily handled by a low-level neural network. That is expected to change with the high-multiplicity events which will have to be sorted out at the LHC experiments.
Acknowledgments

We would like to thank U. Ramacher (Siemens AG) for supporting the MA16 board project. We would also like to thank the WA92 Collaboration for hosting this demonstration experiment. One of us, R.O., acknowledges useful conversations with R. Battiti about neural net training procedures and would like to express his appreciation to P. Ribarics for calling his attention to the MA16.
References

[1] C. Baldanza et al. (ANNETTHE Collaboration), Proc. 3rd Int. Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, eds. K.-H. Becks and D. Perret-Gallix (World Scientific, Singapore, 1990) p. 391.
[2] C. Baldanza et al. (ANNETTHE Collaboration), Nucl. Instr. and Meth. A 361 (1995) 506.
[3] 80170 Electrically Trainable Analog Neural Network Data Booklet, Intel Corp., 2250 Mission College Boulevard, Santa Clara, CA 95052-8125, USA.
[4] U. Ramacher et al., Proc. Int. Conf. on Systolic Arrays, Killarney, Ireland (Prentice Hall, 1989) p. 277.
[5] U. Ramacher, J. Beichter and N. Bruels, MA16 - Programmable VLSI Array Processor for Neural Networks and Matrix-based Signal Processing, User Description, Version 1.4, Siemens AG, ZFE T ME2, 81730 Munich, Germany (available from N. Bruels, e-mail: [email protected]).
[6] C. Baldanza et al., Proc. 4th Int. Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, 1995, Pisa, Italy, p. 359; also in Int. J. Mod. Phys. C 6 (1995) 567.
[7] M. Adamovich et al. (WA92 Collaboration), CERN/SPSC 90-10 (1990); Nucl. Phys. (Proc. Suppl.) B 27 (1992) 251.
[8] G. Darbo and L. Rossi, Nucl. Instr. and Meth. A 329 (1993) 117.
[9] C. Bruschini et al., 1992 Nuclear Science Symposium, Orlando, Florida, 1992, Conf. Record p. 320.
[10] M. Adinolfi et al., Nucl. Instr. and Meth. A 289 (1990) 584.
[11] C. Baldanza et al., University of Bologna report DFUB 95/15 (1995); Nucl. Instr. and Meth. A, in press.
[12] N. Bruels, MA16 - Development Software, User Manual, Version 1.0, Siemens AG, ZFE T ME2, 81730 Munich, Germany.
[13] C. Bruschini, University of Genoa Thesis (1992).
[14] R.A. Fisher, Annals Eugenics 7 (1936) 179.
[15] P. Mazzanti and R. Odorico, Z. Physik C 59 (1993) 273.
[16] D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning internal representations by error propagation, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, eds. D.E. Rumelhart and J.L. McClelland (MIT Press, 1986) p. 318.
[17] A. Van Ooyen and B. Nienhuis, Neural Networks 5 (1992) 465.
[18] M. Hoehfeld and S.E. Fahlman, IEEE Trans. Neural Networks 3 (1992) 602.
[19] R. Odorico, program NEURAL, University of Bologna report DFUB 95/16 (1995); Comp. Phys. Commun., in press.