Applied Surface Science 149 (1999) 97–102
Unfolding positron lifetime spectra with neural networks

I. Pázsit a,*, R. Chakarova a, P. Lindén a, F.H.J. Maurer b

a Department of Reactor Physics, Chalmers University of Technology, 412 96 Göteborg, Sweden
b Department of Polymer Technology, Göteborg, Sweden
Abstract

A new method for unfolding mean lifetimes and amplitudes, as well as lifetime distributions, from positron lifetime spectra is suggested and partially tested in this paper. The method is based on the use of artificial neural networks (ANNs). By using data from simulated positron spectra, generated by a simulation program, an ANN can be trained to extract lifetimes and amplitudes, as well as their distributions, from a positron spectrum as an input. In principle, the method has the potential to unfold an unknown number of lifetimes and their distributions from a measured spectrum. So far, only a proof-of-principle type preliminary investigation has been made, by unfolding three or four discrete lifetimes. These investigations show that the task of designing a proper and efficient network is not trivial. Unfolding a number of distributions requires both careful design of the network and long training times. In addition, the performance of the method in practical applications depends on the quality of the simulation model. However, the chances of satisfying the above criteria appear to be good. When appropriately developed, a trained network could be a very effective and efficient alternative to the existing methods, with very short identification times. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Artificial neural networks (ANNs); Amplitudes; Simulation model
* Corresponding author. Tel.: +46-31-772-3086; Fax: +46-31-772-3079; E-mail: [email protected].

1. Introduction

Determination of mean lifetimes and their distributions, as well as the corresponding amplitudes, from positron lifetime spectra is traditionally achieved by algorithmic procedures such as POSITRONFIT [1], CONTIN [2–4] and MELT [5–8]. These programs use an algorithm based on some kind of statistical minimisation principle. Another possibility is suggested and partly demonstrated in this paper, namely the use of artificial neural networks (ANNs). By using data from simulated positron spectra, generated by a simulation program, an ANN can be trained to extract various parameters, such as lifetimes and amplitudes as well as their distributions, from a positron spectrum as an input. This identification is done transparently, by a non-parametric algorithm. One advantage of this method is, as usual for ANNs, that once the network is trained, the identification procedure is very fast (on the order of milliseconds), which may be an advantage in case many spectra are to be processed. In its most advanced form, the number of lifetimes or distributions does not need to be known in advance; it can be determined by the program itself. However, experience so far shows that to achieve the same flexibility as with MELT, i.e., identifying an
unknown number of distributions, the network and the training must be very carefully designed; otherwise the training may require prohibitively large CPU times (on the order of weeks or months). The principles described above are partly demonstrated in the paper. A simulation program, originally developed elsewhere, was used both for the training of the network and for testing the performance of the trained network. The work started with the simplest possible case of determining three discrete lifetimes and amplitudes. Already here the need for long training times was noticed. In a pilot study the required level of precision was set to a few percent in τ3, and it was demonstrated how this can be achieved. A further step was the investigation of having either three or four discrete lifetimes and training the network to recognise the number of lifetimes as well as the intensities and lifetimes. A significant increase in training time was noticed. Determination of the lifetime distribution was investigated by trying to identify mean lifetimes and their variances, but as of this writing the study is not yet conclusive. In the following, the above investigations are described in detail.
2. General principles

Determining parameters of a distribution from measured values of the distribution is a so-called inverse task. It is very seldom that an inverse task can be solved analytically in an exact manner by inverting the relationship. Even then, the inversion is seldom unique, so some extra criteria need to be used to select the true solution. In addition, a measured distribution often contains fluctuations and measurement errors; thus, the inversion also needs to be the most likely one and needs to be optimised in some sense. Artificial neural networks (ANNs) represent a powerful method for dealing with such problems [9]. In particular, they are very effective in extracting parameters from a distribution which is a non-linear function of the searched parameters, and which also contains noise. In order to achieve this, the network needs to be trained on a large number of so-called training samples, i.e., realisations of the distribution for which the searched parameters are also known. An actual analytical relationship connecting the parameters with the distribution need not exist or be known at this stage; only numerical values of the training samples (input and output) are needed.

In the case of positron annihilation lifetime spectroscopy (PALS), the measured quantity is a lifetime spectrum, consisting of three (or more) exponentials. As described in several publications [7,8], one component of a measured spectrum y(t) can be written in the form

    y(t) = N0 R(t) * ∫0^∞ [f(τ)/τ] exp(−t/τ) dτ + B        (1)

Here, R(t) is the detector resolution function, usually assumed to be approximately Gaussian, with which the measured signal is convolved, as indicated by the asterisk; B is a random background, and f(τ) is the positron lifetime profile. If there are only discrete lines in the lifetime profile, it can be represented as

    f(τ) = Σ(i=1..3) Ii δ(τ − τi)        (2)
where Ii are the intensities and τi the lifetimes. In more general cases, any of the three lifetimes can exhibit a variation around its mean value, e.g., a narrow Gaussian distribution instead of the form given in Eq. (2), i.e.,

    f(τ) = Σ(i=1..3) [Ii / (√(2π) σi)] exp[−(τ − τi)² / (2σi²)]        (3)
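A short numerical sketch may clarify Eqs. (1) and (2): inserting the discrete profile of Eq. (2) into the integral of Eq. (1) reduces it to a sum of exponentials, which is then convolved with a Gaussian resolution function. This is not the authors' code; the function name, the time grid and the resolution fwhm are illustrative assumptions, while the parameter values are the mean values quoted in Section 2.1.

```python
import numpy as np

def pals_model(t, intensities, lifetimes, n0=1.0, fwhm=0.25, background=0.0):
    """Evaluate Eq. (1) with the discrete profile of Eq. (2).

    Inserting f(tau) = sum_i I_i delta(tau - tau_i) into the integral of
    Eq. (1) gives sum_i (I_i / tau_i) exp(-t / tau_i) for t >= 0, which
    is then convolved with a Gaussian resolution function R(t)."""
    decay = np.zeros_like(t)
    for I, tau in zip(intensities, lifetimes):
        decay += np.where(t >= 0.0, I / tau * np.exp(-np.maximum(t, 0.0) / tau), 0.0)
    # Gaussian R(t), centred on the grid and normalised to unit sum
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    kernel = np.exp(-0.5 * ((t - t[len(t) // 2]) / sigma) ** 2)
    kernel /= kernel.sum()
    return n0 * np.convolve(decay, kernel, mode="same") + background

t = np.arange(-2.0, 20.0, 0.0232)              # ns; 23.2 ps channel width
y = pals_model(t, [0.25, 0.45, 0.32], [0.130, 0.35, 2.5])
```

Since the kernel has unit sum, the time integral of the model (without background) approximately equals N0 times the total intensity I1 + I2 + I3.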
Also, as mentioned above, in certain cases a fourth lifetime constant τ4 can be present.

The principle of applying neural networks to determine the searched intensity and lifetime parameters from a PALS spectrum is to use a large number of spectra with known intensities and lifetimes for training the network. Such data used for the training are called training samples or patterns. The intensities and lifetimes must differ between the training samples and, moreover, need to cover the regions of possible values of these parameters in the application of the trained network, i.e., when it is applied to a measurement. For obvious reasons, it is impractical to use measurements as training samples; however, the samples can easily be generated by a simulation program. Such programs have also been used in the past to test various algorithms [7,8].

Table 1
Results from the training of an ANN. The number of input–output training patterns is 575 and the number of test patterns is 50. The desired target limit for the root-mean-square error was 5%.

                          I1 (%)   τ1 (ns)     I2 (%)   τ2 (ns)     I3 (%)   τ3 (ns)
Training range            10–40    0.10–0.16   30–60    0.25–0.44   20–45    1.6–4.0
Average recall error (%)  −11.2    −4.1        7.3      −1.1        −0.5     1.0

2.1. The PALS spectrum generator

The simulated lifetime spectra were generated with an algorithm based on Eqs. (1) and (2). The intensities and lifetimes were varied around the mean values I1 = 0.25, τ1 = 0.130 ns, I2 = 0.45, τ2 = 0.35 ns, I3 = 0.32, τ3 = 2.5 ns; the ranges of variation are given in detail in Table 1. The total number of counts was taken to be 10 million. In principle, this parameter could also be varied during the training, so as to make the trained network able to interpret data from measurements with different numbers of counts. However, in this first pilot study we only check the algorithm with simulated data even in the recall phase, and could therefore choose the number of counts to be 10 million for the test patterns as well. The influence of the number of counts on the training and recall can be investigated in a later phase. Finally, a background and Poisson noise are added to the simulated counts in each channel. A channel width of 23.2 ps was chosen, similar to other reported simulation studies and measurements. We have employed a program used by several other groups before, installed and run on a UNIX Sparcstation. An example of a simulated spectrum is shown in Fig. 1.
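A generator of this kind can be sketched as follows. The parameter ranges are taken from Table 1; the resolution fwhm, the flat background level and the renormalisation of the intensities to unit sum are illustrative assumptions of this sketch, not details given in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

CHANNEL_WIDTH = 0.0232      # ns (23.2 ps, as in the text)
N_CHANNELS = 1500           # number of spectrum channels / input nodes
N_COUNTS = 1.0e7            # total number of counts

def generate_spectrum(intensities, lifetimes, fwhm=0.25, bg_frac=1e-5):
    """One training pattern: ideal spectrum from Eqs. (1)-(2), scaled to
    N_COUNTS, with a flat background and Poisson noise in each channel."""
    t = np.arange(N_CHANNELS) * CHANNEL_WIDTH
    ideal = np.zeros(N_CHANNELS)
    for I, tau in zip(intensities, lifetimes):
        ideal += I / tau * np.exp(-t / tau)
    # convolve with the Gaussian resolution function
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    k = np.arange(-50, 51) * CHANNEL_WIDTH
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()
    ideal = np.convolve(ideal, kernel, mode="same")
    expected = ideal / ideal.sum() * N_COUNTS + bg_frac * N_COUNTS
    return rng.poisson(expected)

def random_sample():
    """Intensities and lifetimes drawn uniformly from the training
    ranges of Table 1 (intensities renormalised to sum to one)."""
    I = rng.uniform([0.10, 0.30, 0.20], [0.40, 0.60, 0.45])
    I /= I.sum()
    tau = rng.uniform([0.10, 0.25, 1.6], [0.16, 0.44, 4.0])
    return I, tau

I, tau = random_sample()
spectrum = generate_spectrum(I, tau)
```

Repeated calls to random_sample and generate_spectrum would produce the training patterns, with the known (I, τ) vectors serving as the targets.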
Fig. 1. A simulated positron lifetime spectrum.

2.2. The network structure and training

In what follows, we describe the network that can be used to determine three intensities and three discrete lifetimes as defined by Eq. (2). The extensions of the procedure to include four lifetimes and/or lifetime distributions will be briefly discussed later.

We have used a simple three-layered feed-forward network with backward error propagation. The principles of such networks are described in standard textbooks and review articles [9]. A layout of the network is shown in Fig. 2. The numbers of input and output nodes are defined by the input and output data; the number of nodes in the hidden layer is a free parameter, whose optimal value in each concrete case is usually found by trial and error. In our case, the input of the network consists of the PALS spectrum data, so the number of input nodes equals the number of channels; up to 1500 channels were used in this work. The number of output nodes equals the number of parameters to be determined; in Fig. 2 this is six, i.e., three intensities and the corresponding time constants. The number of nodes in the hidden layer was chosen to be 40.

Fig. 2. Structure of the implemented neural network for the case of three lifetimes and intensities.

In the training, 575 samples, i.e., different spectra, all with different intensity and lifetime values, were used. As usual, these spectra were presented in a random sequence repeatedly until the root-mean-square (r.m.s.) network error decreased below the desired limit of 5%. The r.m.s. error is defined as the squared sum of the deviations of each actual node output from the correct ('target') value, summed over several training patterns. The requested r.m.s. value of the training determines the precision of the network, which will be comparable to the r.m.s. value of the training. In these exploratory tests, we have used an r.m.s. value which leads to worse precision than the usual traditional algorithms. The reason is that achieving better precision, i.e., requiring a smaller r.m.s. value, would need many more training samples, more training cycles, and consequently much longer training times. Already at this stage it was clear that, mostly due to the large size of the network, i.e., 1500 input nodes, the training times are quite long.

After the training was completed, i.e., the r.m.s. value had decreased below 5%, another 50 samples were produced by the PALS generator and used to test the performance of the network.

2.3. Results

The results are summarised in Table 1 and Fig. 3. In both cases, the relative error is used to describe network performance. In Table 1, the relative standard deviation is given, as calculated from the results of all 50 test samples. In Fig. 3, the relative error is shown for each test pattern as a function of the parameter to be determined. Since it is primarily the lifetimes which carry the physically interesting information, only the errors in the lifetimes are shown. It is seen that the relative standard deviation is larger than what one can obtain from traditional algorithms. This is mostly a consequence of the large r.m.s. error limit set in the training in order to reduce training time. With longer training times, the accuracy can no doubt be brought down to a level comparable with current methods. Work is in progress in this direction; both the application of quicker training methods and parallelisation of the algorithm are planned.

2.4. Possible extensions of the algorithm
2.4.1. Determination of four lifetimes

Constructing a network to identify four different intensities and lifetimes requires only a trivial change in the network, i.e., using eight output nodes instead of six, and possibly a modification of the number of hidden nodes. Also, the training program is to be used with four terms. We in fact performed this study, with similar conclusions as above; the results will therefore not be presented numerically.

A convenient option would be to use the algorithm with an unknown number of lifetimes, so that one does not need to select the number of components in advance when running the algorithm. One way to achieve this is to train a network to determine the number of lifetimes as well as the intensities and lifetimes in the same step. The network capable of doing this has a structure similar to that in Fig. 2, with the difference that it contains nine output nodes. The first node gives the number of lifetimes found, i.e., three or four; the rest give the intensities and lifetimes of the components identified. In case there are only three components, the content of the last nodes is irrelevant and can thus be set to zero by the network itself.

Fig. 3. Individual identification errors of the lifetimes by the trained network.

The training of this network naturally requires simulated data containing both three and four lifetimes. This requires more training patterns and thus also longer training cycles and training times. In addition, the accuracy of the unfolded lifetime values can in principle deteriorate, and the possibility of faulty identification (identifying only three lifetimes when in reality there are four, and vice versa) also exists. The applicability of identifying both the parameters and the number of lifetimes by neural networks was investigated by simulations as in the previous case. The results can be summarised as follows. Regarding the intensities and the lifetimes, a similar accuracy was achieved as in the previous case of three lifetimes. However, there is also an uncertainty whether the number of lifetimes was determined correctly or not. In this regard, there was a failure rate in the same range as the accuracy of the quantitative values of the parameters themselves. That is, in about 18% of the
identifications, the number of components was determined incorrectly. Again, this is mostly a result of the high r.m.s. error limit set in the training.

2.4.2. Determining lifetime distributions

Determining distributions through ANN identification requires a further extension of the network structure and training. In the most general case, one could attempt to identify the distributions of an undefined number of lifetimes, similarly to how MELT works. Then, the value of each output node should correspond to the probability of a lifetime at the lifetime value associated with that node. The number of nodes is then defined by the number of lifetime values at which such a distribution is searched for. This would entail several hundreds of output nodes. A network of such size would, with current techniques and CPU times, be impossible to use in practice. Using such a network would only be
possible with ingenious speed-up of the training and the use of parallel computing techniques throughout.

A more modest objective would be, e.g., the determination of the widths of the lifetime distributions in addition to the mean values. This primarily requires extending the network with nodes that correspond to the extra parameters. For a simplified study, we assumed only three components, of which the first two lifetimes were fixed and only τ3 was assumed to have a finite width. The network structure shown in Fig. 2 was appended with one more node, whose value gives σ3. During the training, samples with a finite fwhm of τ3 were used, generated according to Eq. (3). In the test, new samples generated with the same algorithm were used. This work is in progress and results will be reported soon. So far, the algorithm was able to determine the mean values with the same precision as before (a few percent inaccuracy), but gave no useful values for the variance (width).
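Training samples with a finite width of τ3 can be generated, e.g., by Monte Carlo sampling of the τ3 term of Eq. (3): drawing lifetimes from a Gaussian around τ3 and averaging the corresponding exponential terms of the Eq. (1) integrand. This is only a sketch of one possible implementation; the function name, the number of draws and the σ3 value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def broadened_component(t, i3, tau3, sigma3, n_draws=2000):
    """Monte Carlo estimate of the tau_3 term of Eq. (3), inserted into
    the integral of Eq. (1) before convolution with the resolution
    function: average (1/tau) exp(-t/tau) over Gaussian-distributed tau."""
    taus = rng.normal(tau3, sigma3, n_draws)
    taus = taus[taus > 0.0]                  # discard unphysical draws
    return i3 * np.mean(np.exp(-t[:, None] / taus) / taus, axis=1)

t = np.arange(1500) * 0.0232                 # ns; 23.2 ps channels
component = broadened_component(t, 0.32, 2.5, 0.15)
```

In the limit σ3 → 0 this reduces to the single exponential (I3/τ3) exp(−t/τ3) corresponding to Eq. (2).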
3. Conclusions

An extrapolation of the results shown here indicates that, as concerns the determination of intensities and simple lifetimes, the ANN-based unfolding procedure can work very reliably. It can also be used to extract either three or four lifetimes without prior knowledge of the number of lifetimes present
in the measurement. The determination of lifetime distributions has not yet been as successful as with the maximum entropy program MELT: in the preliminary tests so far, only the mean lifetime could be determined, whereas the width of the lifetime distribution was not given correctly.

The largest benefit of an ANN-based unfolding procedure is expected from its extreme speed in the identification. Whereas it takes a very long time to train a network sufficiently well, one identification with a trained network takes running times on the order of milliseconds. Thus, if a large number of unfolding procedures need to be made, as may be the case with depth profiling, the ANN-based identification procedure may show significant advantages.

References

[1] M. Eldrup, D. Lightbody, J.N. Sherwood, Chem. Phys. 63 (1981) 51.
[2] S.W. Provencher, Comput. Phys. Commun. 27 (1982) 213.
[3] R.B. Gregory, Y. Zhu, Nucl. Instrum. Methods A 290 (1990) 172.
[4] R.B. Gregory, Nucl. Instrum. Methods A 302 (1991) 496.
[5] L. Hoffmann, A. Shukla, M. Peter, B. Barbiellini, A.A. Manuel, Nucl. Instrum. Methods A 335 (1993) 276.
[6] A. Shukla, M. Peter, L. Hoffmann, Nucl. Instrum. Methods A 335 (1993) 310.
[7] C.L. Wang, F.H.J. Maurer, Macromolecules 29 (1996) 8249.
[8] C.L. Wang, T. Hirade, F.H.J. Maurer, M. Eldrup, N.J. Pedersen, J. Chem. Phys. 108 (1998) 4654.
[9] I. Pázsit, M. Kitamura, Adv. Nucl. Sci. Technol. 24 (1996) 95.