Nuclear Instruments and Methods in Physics Research A 443 (2000) 503-509
Energy calibration and particle recognition by a neural network

C.M. Iacono Manno, S. Tudisco*

INFN-Laboratorio Nazionale del Sud, Via S. Sofia 44, 95123 Catania, Italy
Dipartimento di Fisica dell'Università di Catania, Corso Italia 57, 95129 Catania, Italy

Received 15 July 1999; received in revised form 22 September 1999; accepted 22 October 1999
Abstract

In this work a neural network has been used to reconstruct the energy and classify the atomic number of the particles detected in a silicon-CsI ΔE-E telescope. The adopted net is described and the whole procedure has been compared with the standard calibration methods for the E stage. © 2000 Elsevier Science B.V. All rights reserved.

PACS: 07.05.Mh; 07.05.Kf; 29.40.Mc

Keywords: Neural network; Silicon-CsI detector telescope calibration; Atomic number recognition
1. Introduction

The great number of detection modules used in the most sophisticated experiments in heavy-ion physics at low and intermediate energy requires large-area and low-cost detectors. Considerable thickness is also necessary to stop the light charged particles emitted with energies up to several hundred MeV. The relatively simple handling and the light-output performance of CsI(Tl) crystals coupled with photodiodes [1-3] make them very suitable for sophisticated detector assemblies [4-6], where they are often used as the second stage of a ΔE-E telescope. In this technique the correlation between the energy loss in the first stage (ΔE) and the residual energy in the second stage (E) is used to identify the atomic number of the incident particle [7].
* Corresponding author. E-mail address: [email protected] (S. Tudisco).
The main obstacle to their widespread use has been the nonlinearity of their response to highly ionizing charged particles, so a detailed calibration is necessary for each of the nuclear species to be detected, over the whole energy range. The literature shows several kinds of parameterization [1,2,8,9], but the choice is sometimes dictated by the results of the fit procedure, and more than one function is needed to cover the whole Z range of detected particles. Another relevant problem is the classification of each detected nucleus. This operation is usually done by a manual procedure, making some graphic cuts, and it is needed for the calibration step and to build conditions for the off-line analysis. As a consequence, the data analysis requires a long time, particularly in the 4π detector systems [5]. Many efforts have been made to make this operation automatic [10]. In this work we used a neural network gathering information from the energy loss in the ΔE stage and the light output (L) of the CsI; the net automatically classifies the atomic
0168-9002/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0168-9002(99)01166-3
number Z of the detected particle and reconstructs its residual energy (E). The following section treats some general issues about neural nets; Section 3 shows the net architecture and the learning step; Section 4 discusses general net results and compares the standard calibration procedure to the neural approach.
2. Neural networks: general issues

Neural nets are highly parallel computing devices; their architecture reminds one of the brain's structure, where a very large number of simple processors (neurons) are tightly connected and simultaneously process information. In the artificial models the number of processors and connections is very small compared to biological systems. Nevertheless, the most important feature remains the same in both cases: the connections determine the performance more than the computing power of each processor, so this approach to the problems of artificial intelligence is usually called connectionism [11]. The weight (ω_i) of the connection between two nodes is stronger or weaker depending on how strong the correlation is. Each neuron sums up the contributions from all the preceding nodes (I_i) and through an activation function (f) fires its output (S), according to [11]
S = f( Σ_i ω_i I_i + θ ).    (1)

The threshold (θ) is used to compensate for data with asymmetric distributions. During the learning step some input vectors are given to the net together with a desired output vector; for each of the input vectors the learning algorithm adjusts the weights in order to match the actual output to the desired one. Both the net architecture and the activation functions must be carefully chosen for each specific problem. Sometimes, when a direct mapping from the input to the output space is not compatible with a reasonable net complexity, a preprocessing step is used to perform some partial mapping. The most common architecture is the feedforward net with the sigmoid activation function; the
backpropagation technique, based on the steepest descent algorithm, is used in the learning step [11]. The applications range from data compression through noise reduction to pattern recognition. In high-energy nuclear physics neural nets are used to perform on-line control tasks [12] or to act as a fast data trigger when the event pattern is too complex to be treated by ordinary on-line computing devices. Some off-line applications have also been developed for data analysis [13].
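As an illustration of the feedforward/backpropagation scheme just described, the following sketch (our own, not the authors' code; the toy target function, layer sizes and learning rate are arbitrary choices) trains a small net by steepest descent:

```python
import numpy as np

# Minimal feedforward net with one hidden layer, trained by
# backpropagation (steepest descent) to fit a smooth 1-D function.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: y = sin(x) rescaled into (0, 1), the sigmoid's output range.
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = 0.5 * (np.sin(x) + 1.0)

# One hidden layer of 10 nodes, as in net configuration (1) of the paper.
W1 = rng.normal(0, 0.5, (1, 10)); b1 = np.zeros(10)
W2 = rng.normal(0, 0.5, (10, 1)); b2 = np.zeros(1)

eta = 0.5  # learning rate (arbitrary)
losses = []
for epoch in range(2000):
    # Forward pass: Eq. (1) applied layer by layer.
    h = sigmoid(x @ W1 + b1)        # hidden activations
    out = sigmoid(h @ W2 + b2)      # network output S
    err = out - y
    losses.append(float(np.mean(err ** 2)))
    # Backward pass: chain rule through the sigmoids.
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Steepest-descent weight updates.
    W2 -= eta * h.T @ d_out / len(x); b2 -= eta * d_out.mean(axis=0)
    W1 -= eta * x.T @ d_h / len(x);  b1 -= eta * d_h.mean(axis=0)

print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The mean squared error falls steadily over the training epochs, the behaviour exploited during the learning step described above.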
3. Net architecture and data normalization

In this work the net has to associate two output parameters (residual energy E and atomic number Z) to each instance of an input pattern set made of two parameters (ΔE and the light output of the CsI detector). It is important to note that the two output parameters are quite different: the residual energy E varies in a continuous range while the atomic number Z is an integer assuming, in the studied reaction, about 20 different values. The net was programmed to perform a best fit on the energy parameter and a classification on Z, which may appear as two different tasks. Actually, the only difference consists in the higher degree of accuracy required for the energy, so that a best fit may be considered similar to a classification with the highest possible number of classes, one for each allowed output value. In other words, the target outputs are a few separated values for Z and a continuous distribution for the residual energy. As a result, the better performance of the net is expected for the Z output. A feedforward net [11] was chosen because it is able to associate two different vectors as input and output (hetero-associative net). Other architectures like the Hopfield nets can only recall, among the stored patterns, those which are most similar to the input (auto-associative net), and so they cannot be used for best-fit applications. A neural net used for best fit has one node in the input layer for each of the input parameters and one node in the output layer for each of the functions to fit. The Kolmogorov theorem states that a net with only one hidden layer is able to approximate any function in L₂ with whatever accuracy
[14]. Nevertheless, other hidden layers are often added to reduce the number of nodes with little loss in accuracy. As the output pattern may have more than one component, a single neural net can perform a multi-function fit, provided that the ranges of the input and output data are equal for both functions. This can be achieved by data normalization. In the nets used for classification purposes, the nonlinear regions of the output range correspond to the sure answers, so they are the end-points of the correct classification process, leading to sharp yes-no answers. In a best-fit problem the output values spread over a continuous range and nonlinearity is mainly intended to assure stability; it prevents the data departing from the mean values from changing the configuration of weights. As this problem is more serious in the last layers, these are provided with a sigmoid activation function, and the interval for the normalization of the output data is chosen in the linear region of the sigmoid. The input and the first hidden layers use the identity activation function. Fig. 1 shows the adopted net architecture. As a neural net is usually required to generalize its results to patterns similar to those used for the learning step, the over-training problem must also be taken into account. Clearly, the exact reproduction of the training patterns is in contradiction with the goal of generalization. In this work we used the classical approach: verify another pattern
Fig. 1. Net architecture. Filled nodes correspond to sigmoid activation function; empty nodes correspond to identity activation function.
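The output normalization described in this section can be sketched as follows; this is our own illustration, and the interval (0.2, 0.8) is an assumed choice for the roughly linear region of the sigmoid, since the paper does not quote the exact bounds used:

```python
import numpy as np

# Min-max normalization of a target variable into a sub-interval of the
# sigmoid output range, plus the inverse mapping back to physical units.
LO, HI = 0.2, 0.8  # assumed "linear region" of the sigmoid

def normalize(v, vmin, vmax):
    """Map values from [vmin, vmax] into [LO, HI]."""
    return LO + (HI - LO) * (v - vmin) / (vmax - vmin)

def denormalize(n, vmin, vmax):
    """Invert the mapping to recover physical units."""
    return vmin + (n - LO) * (vmax - vmin) / (HI - LO)

# Example: residual energies in MeV (hypothetical values).
E = np.array([10.0, 55.0, 120.0, 240.0])
n = normalize(E, E.min(), E.max())
E_back = denormalize(n, E.min(), E.max())
print(n)       # all values lie in [0.2, 0.8]
print(E_back)  # round-trips to the original energies
```

With both output variables mapped into the same interval, a single net can fit E and classify Z simultaneously, as required above.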
Fig. 2. ΔE versus light matrix from the silicon-CsI(Tl) detector telescope.
set while learning is developing, and stop the process when the net performance begins to get worse on the verification set. To perform our analysis we used data from the Ca+Ca reaction at 25 MeV/A collected with the TRASMA apparatus [4]. The single telescope module consists of a silicon strip detector, 300 μm thick, as first stage, followed by a 6 cm thick CsI(Tl) crystal (with photodiode readout) as second stage. All the telescope modules are placed at forward angles; for a more detailed description see Ref. [4]. Fig. 2 shows a typical ΔE-Light identification scatter plot used in this work. To properly calibrate the ΔE stage different reference points were used: the elastic scattering point of Ca+Ca at 25 MeV/A and the maximum energy released in the ΔE stage for each Z (punch-through). These last points were identified using the ΔE-Time identification scatter plot. Training was performed with 340 patterns; each pattern consists of four parameters: the energy of the ΔE stage and the light output of the CsI are the input values; the atomic number Z and the residual energy E are the output values. The latter has been evaluated using an energy-loss program; the others are taken from the ΔE-Light identification scatter
plot. About 20 points were considered for each Z. Another set of 250 patterns has been used for verification during the learning step. Three different net organizations have been considered: (1) one hidden layer with 10 nodes; (2) two hidden layers, each with five nodes; (3) two hidden layers, each with 10 nodes.

4. Net results

As the latter is the more complex and powerful net, it leads to the best performance, as expected, at the cost of a longer training time. This can be proved by computing the percentage relative error for the output parameters. Fig. 3 shows the error distributions of the Z and E outputs for the three different nets, and in Table 1 the results of the Gaussian fit are reported. The third configuration shows the best performance in both symmetry and standard deviation. Nevertheless, if high accuracy is not required, the simpler configurations can be used instead, leading to a faster computation. This is the case, for instance, of on-line applications (data acquisition systems) or of Z reconstruction alone.

Fig. 3. Error distributions for the different net arrangements (histogram) with Gaussian fit (continuous line).

Table 1
Parameters of the Gaussian approximation of the error distributions for the three different nets (see text). For the two output parameters the mean value and the standard deviation are reported. The best values, in both symmetry and sharpness, are obtained with configuration (3).

Net   Z-output mean   Z-output sigma   Energy-output mean   Energy-output sigma
3     -0.045          0.3              -0.078               1.668
2     0.088           0.51             -0.27                1.718
1     0.287           0.668            0.563                1.839

Net performance related to the atomic number recognition is shown in Figs. 4 and 5, where all the data coming from the ΔE-Light scatter plot of Fig. 2 have been treated. Fig. 4 shows the atomic number Z versus the residual energy output. A linear trend appears for each Z, with a good separation between two adjacent values. In order to better evaluate this separation we show the Z-axis projection in Fig. 5. A constant value marks the distance between two adjacent peaks and a sharp separation is obtained for the whole Z range.

Fig. 4. Atomic number Z versus residual energy E evaluated by the net.

Fig. 5. Z-axis projection of the Z versus E matrix.

The net uses information contained in both input parameters to determine the output values. As clear evidence, Fig. 6 shows the light output of the CsI (L input of the net) versus the residual energy (E output). The separation achieved for all Z values in the whole energy range proves that the net is able to correlate the different spaces of the treated variables.

Fig. 6. Correlation between CsI light (net input) and residual energy (net output).

To check the validity of the residual energy output a good calibration of the CsI detector is necessary. The standard way to perform this essentially consists of two steps:
- selection, for each atomic number, of some points with a well-known energy;
- performing a fit using the most appropriate parameterization.

The first step often requires several dedicated measurement runs, possibly using different kinds of reactions. Alternatively, where high energy resolution is not required, the ΔE stage calibration can be used. For this work only the ΔE stage calibration was available. Determining parameters covering all the Z values in the whole energy range [8] is difficult: the parameterization is strictly related to the detector's geometrical shape and construction, but it also depends on the accuracy of the known energy reference points. Often different parameterizations are needed to cover the whole E and Z ranges of the detected particles. In this work, to compare the net results with the standard calibration procedure, several fits were performed using some of the relationships reported in the literature [1,2,8,9]. The fit has been performed for each Z and the best chi-square has been found using the following relation [9]:

E = a(Z) + b(Z)·L + c(Z)/(d(Z) + L)    (2)
where a, b, c and d are the free parameters of the fit, E is the residual energy and L the light output of the CsI(Tl). The energy reference points used for the fit are the same as those used for the learning step. Fig. 7 shows the particle spectra obtained from the calibration procedure (filled histogram) and the particle spectra from the net (normal histogram). A good agreement between the two approaches has been found once again.
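A per-Z fit of relation (2) can be sketched as follows; this is our own illustration with synthetic data and hypothetical parameter values, not the authors' calibration. Since the model is linear in (a, b, c) for a fixed d, the nonlinear parameter can simply be scanned while solving a linear least-squares problem at each step:

```python
import numpy as np

# Fit of relation (2), E = a + b*L + c/(d + L), for a single Z.
def model(L, a, b, c, d):
    return a + b * L + c / (d + L)

true = (5.0, 0.8, -40.0, 20.0)            # hypothetical parameter set
L = np.linspace(10.0, 400.0, 40)          # CsI light output (arb. units)
rng = np.random.default_rng(1)
E = model(L, *true) + rng.normal(0.0, 0.1, L.size)  # synthetic reference points

best = None
for d in np.linspace(1.0, 60.0, 600):     # scan the nonlinear parameter d
    # For fixed d the model is linear in (a, b, c): solve by least squares.
    A = np.column_stack([np.ones_like(L), L, 1.0 / (d + L)])
    coef, *_ = np.linalg.lstsq(A, E, rcond=None)
    chi2 = float(np.sum((A @ coef - E) ** 2))
    if best is None or chi2 < best[0]:
        best = (chi2, *coef, d)

chi2, a, b, c, d = best
rms = np.sqrt(chi2 / L.size)
print(f"a={a:.2f} b={b:.3f} c={c:.1f} d={d:.1f} rms={rms:.3f}")
```

The residual of the best fit comes out at the level of the injected noise, which is the behaviour one expects when the parameterization matches the light-energy relation; in the real procedure this fit must be repeated for every Z.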
However, the neural approach is more general than the fit procedure: in fact, for each charge, the fit procedure gives different values for the fit parameters.

Fig. 7. Particle energy spectra for different Z values. Comparison between net (filled histogram) and energy calibration (normal histogram) results.

5. Conclusion and outlooks

In this paper a new approach to Z recognition and energy calibration, based on a neural network, has been investigated. The sharp separation obtained in the whole range between two adjacent Z values makes this technique suitable for automatic selection procedures with no graphic contours, greatly reducing the time required for the off-line analysis, particularly in the 4π detector systems. If a large number of detectors has the same characteristics, the learning step can be performed only once for one of them and the resulting net can be used for the whole set of detectors. The rather good resolution of the particle energy spectra suggests a widespread use in off-line applications. On the other hand, this approach can also be used as a second-level on-line trigger where high precision is not required; for instance, the reconstruction of the total energy and/or the total charge is useful to select the multifragmentation events [5]. The relative net complexity could suggest a preprocessing step; however, this is very difficult because a general relationship is not well known, as pointed out for the fit procedure. The energy resolution achieved by the net can be improved using dedicated calibration runs giving more accurate energy reference points for the learning step. This could lead alternatively to a simpler net configuration or to a more accurate performance compared to the current one.

Acknowledgements
For this work we wish to thank the TRASMA collaboration for the experimental data, and in particular, G. Cardella and A. Musumarra for their helpful suggestions. We also thank Mrs. B. Parker for her careful reading of this manuscript.
References

[1] C.J.W. Twenhofel et al., Nucl. Instr. and Meth. B 51 (1990) 58.
[2] D. Horn et al., Nucl. Instr. and Meth. A 320 (1992) 273.
[3] S. Aiello et al., Nucl. Instr. and Meth. A 369 (1996) 50.
[4] A. Musumarra et al., Nucl. Instr. and Meth. A 370 (1996) 558.
[5] S. Aiello et al., Nucl. Phys. A 583 (1995) 461c.
[6] D. Drain et al., Nucl. Instr. and Meth. A 281 (1989) 528.
[7] G.F. Knoll, Radiation Detection and Measurement, Wiley, New York, 1989.
[8] P.F. Mastinu et al., Nucl. Instr. and Meth. A 338 (1994) 419.
[9] B. Ocker, pp-Korrelationen in den Reaktionen Au+Au bei Einschussenergien von 100, 150 und 200 AMeV, Ph.D. Thesis, DISS 98-22, GSI, November 1998.
[10] S. Aiello et al., Microprocess. Microsyst. 22 (1998) 111.
[11] Y.H. Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, MA, 1989.
[12] I. D'Antone, IEEE Trans. Nucl. Sci. NS-39 (1992) 58.
[13] C. David et al., Phys. Rev. C 51 (1995) 1453.
[14] R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, Reading, MA, 1989.