Nuclear Instruments and Methods in Physics Research A 440 (2000) 431}437
Constrained minimization in the C# # environment S.N. Dymov, V.S. Kurbatov, I.N. Silin, S.V. Yaschenko* Laboratory of Nuclear Problems, Joint Institute for Nuclear Research, 141980 Dubna, Russia Received 11 December 1998; received in revised form 24 June 1999 accepted 6 July 1999
Abstract On the basis of the ideas, proposed by one of the authors (I.N. Silin), a suitable software has been developed for constrained data "tting. Constraints may be of arbitrary type; i.e. equalities and inequalities. The simplest possible way has been used. The widely known program FUMILI was re-written in the C## language. Constraints in the form of inequalities /(h)5a were taken into account by changing them into equalities /(h)"t and simple inequalities of type t5a. The equalities were taken into account by means of quadratic penalty functions. The suitable software was tested on the model data for the ANKE setup (COSY accelerator, Forschungszentrum JuK lich, Germany). ( 2000 Elsevier Science B.V. All rights reserved. Keywords: FUMILI; ANKE; Codes
1. Introduction In the present paper we describe two realizations (in C## language) of constrained minimization for s2-like functionals. One of them is the algorithm of the FUMILI code, which was available for users as a part of CERN library [1]. The description of this algorithm was published in Russian [2] at the end of the 1960s. Due to the fact that the access to this publication is not easy for an English reader, we give a short description of the FUMILI algorithm. This algorithm is now coded in the C## language. The second part is the realization of the idea proposed by one of the authors (I.N. Silin) for solving the constrained minimization problem in a general case, where constraints are of arbitrary * Corresponding author.
type (arbitrary equalities and inequalities) [3]. Technically, here constraints are taken into account by the method of penalty functions (though there are other ways of doing this [3]). The algorithm described below was tested on the model data for the calibration process ppPdp` under the conditions of the ANKE setup [4].
2. Algorithm of FUMILI For simplicity, let us assume that the function to be minimized has the form1
A
B
1 n f (x , h)!F 2 j s2" + j j 2 p j j/1
(1)
1 What follows can be easily generalized to the case where the covariance matrix of the data F has non-diagonal terms. j
0168-9002/00/$ - see front matter ( 2000 Elsevier Science B.V. All rights reserved. PII: S 0 1 6 8 - 9 0 0 2 ( 9 9 ) 0 0 7 5 8 - 5
432
S.N. Dymov et al. / Nuclear Instruments and Methods in Physics Research A 440 (2000) 431}437
where f (x , h) are the measured functions at the j j points x , F are the measured values, p are their j j j errors, and h are parameters to be estimated. The minimum condition is Ls2 n 1 Lf "+ ) j [ f (x , h)!F ]"0, j Lh p2 Lh j j i i j/1 j
i"1,2, m (2)
where m is the number of parameters. Expanding the left hand side of Eq. (2) in parameter increments and retaining only linear terms we get
A B
A
B
Ls2 L2s2 #+ ) (h !h0)"0. k k Lh h h0 Lh Lh h h0 i / i k / k Here h0 is some initial value of parameters. In a general case L2s2 n 1 Lf Lf n ( f !F ) L2f j ) j . "+ ) j ) j #+ j Lh Lh p2 Lh Lh p2 Lh Lh i k j/1 j j i k j/1 i k (3)
In the FUMILI algorithm an approximate expression (3) for L2s2/Lh Lh is used in which the last i k term is discarded (it is often done, not always wittingly, and sometimes causes trouble), i.e.: L2s2 n 1 Lf Lf +Z " + ) j ) j. ik Lh Lh p2 Lh Lh i k i k j/1 j As a result the equations for parameter increments have the following form:
A B
Ls2 #+ Z ) (h !h0)"0, i"1,2, m. ik k k Lh h h0 i / k A remarkable feature of the algorithm is the technique used for step restriction. For the current approximation of the optimal parameters h0 a parallelepiped P is built with a centre at h0 and axes 0 parallel to the coordinate axes h . The lengths of the i parallelepiped sides along the i-th axis are 2 ) b , i where b have such values that the functions f (h) i j are quasi-linear all over the parallelepiped. If the step *h gives a new point h1"h0#*h outside P , 0 the crossing h1 of the vector *h with the surface of P is found and taken as a new value for the 0 parameters. After selection of a new value for the parameters, it is checked whether the function
reduction is big enough compared with that expected in the quadratic approximation. If it is not, the step reduction is performed. Some parallelepiped lengths can be increased too. In addition, FUMILI takes into account simple linear inequalities such as (4) h.*/4h 4h.!9. i i i They form a parallelepiped P (P may be deformed 0 by P). If the value of the parameter coincides with its restriction (the approximation lies on the surface of P) and the gradient component is such that s2 is not going to increase beyond P, the corresponding parameter is "xed. Then the step is calculated for all non-"xed parameters and if some parameters, lying on the surface of P, go beyond P, one of them is temporarily "xed too (the parameter, for which the ratio D*h D/J(Z~1) is maximal), and so on. i ii The criterion to be ful"lled for the end of the iteration process is that all parameters are "xed due to only the gradient component signs and step increments for non-"xed parameters D*h D(e ) J(Z~1) i ii where e is a small number &0.01. As the number of "xation combinations is "nite, the number of steps will also be "nite, at least in the convex quadratic case. Very similar step formulae are used in FUMILI for the negative logarithm of the likelihood function with the same idea of linearizing the functional argument. 3. Minimization of s2 functionals with arbitrary constraints 3.1. Formulation of the problem Again, let us assume that the function to be minimized has the same form (1), but in addition to simple linear constraints (4) there are two more types of constraints, i.e. non-linear inequalities and equalities: a 4/ (h)4b , r"1,2, m r r r d t (h)"c , s"1,2, m . s s e
(5) (6)
S.N. Dymov et al. / Nuclear Instruments and Methods in Physics Research A 440 (2000) 431}437
Here / (h), t (h) are the regular functions of the r s parameter h; a , b , m are the low and upper r r d boundaries of the inequalities and their number; c , m are any constant and number of equations. s e The regularity is taken to mean continuous second-order derivatives. The problem of taking into account the constraints in the form of the equalities of type (6) was solved before [5}7]. As for the constraints in the form of inequalities (5), the authors did not know a simple solution until one of them (I.N. Silin) proposed a method for taking them into account [3]. According to Ref. [3], any constraint of the form a 4/ (h)4b can be rer r r placed by a simple inequality and equality, namely a 4t 4b (7) r r r / (h)"t . (8) r r Here t is an additional variable constrained by two r boundaries a , b , (8) is a constraint in the form of r r the equation. You can see that constraints (7) have the same form and structure as those of Eq. (4), so we can combine them and introduce just one type of simple constraint: h.*/4h 4h.!9 i i i where index i changes in a wider range: i"1,2, m,2, m#m and for i'm d h "t , h.*/"a , h.!9"b . i i~m i i~m i i~m Then the problem of constrained minimization in a general case can be reformulated as follows. We "nd a minimum of function (1) under the constraints: (9) h.*/4h 4h.!9, i"1,2, m,2, m#m i d i i m (h)"d , u"1,2, m ,2, m #m (10) u u d d e where for 14u4m , m "/ , d "t and for d u u u u m (u4m #m , m "t d , d "c d . d d e u u~m u u~m After such reformulation the number of the parameters to be "tted and of the constraints in the form of simple inequalities of type (9) becomes m#m ; d and the number of constraints in the form of equations of type (10) becomes m #m . d e When non-simple constraints are only equations they can be taken into account by using either the method proposed in Ref. [7] or the penalty function method. Here we use the latter. In the penalty function method a minimum of function (11) is
433
searched for as ¹PR.
A
B
1 n f (x , h)!F 2 j U" + j j 2 p i j/1 md (/ !h )2 me (t !c )2 1 r`m # + s s . # ¹ + r p2 p2 2 r s r/1 s/1 (11)
A
B
Here ¹ is the penalty factor (normally it is a su$ciently big number), and p , p are the formally r s calculated errors of constraints. 3.2. Iteration scheme Let us rewrite Eq. (11) in the form 1 md (/ !h )2 r`m U"U # + r 1 2 w r r/1 where
A
B
1 n f (x , h)!F 2 1 me (t !c )2 j # ¹+ s s U " + j j 1 2 p p2 2 s j j/1 s/1 and w "p2/¹. With a chosen ¹ the minimum r r condition is LU LU md L/ (/ !h ) r) r r`m "0, " 1# + Lh Lh Lh w k k r r/1 k (k"1,2, m)
(12)
LU (/ !h ) r`m "0, (r"1,2, m ). (13) "! r d Lh w r`m r In both Eqs. (12) and (13) derivatives are taken only for those parameters which are not "xed, i.e. kOi , r#mOi , where i is the index of a "xed & & & parameters. The functions on the left-hand sides of Eqs. (12) and (13) depend on the m#m parad meters. Near the minimum we can expand the left-hand sides of the equations in parameter increments retaining only linear terms. For Eq. (12) we have
C
D
LU md L/ (/ !h ) 1# + r) r r`m Lh Lh w k r r/1 k d L2U m m L/ L/ 1 1 #+ r) r) #+ ) dh 1 Lh Lh Lh Lh w k 1 k 1 r l/1 r/1 md L/ dh r ) r`m "0. !+ Lh w r r/1 k
C
D
(14)
434
S.N. Dymov et al. / Nuclear Instruments and Methods in Physics Research A 440 (2000) 431}437
We wrote Eq. (14) in the approximation of the functional argument linearization method [8], in which the derivatives L2//Lh Lh are discarded. All k l values of functions and derivatives are taken for the current values of the parameters. Let us also note that index l (lOi ) in the second term runs over & indices of non-"xed parameters. By analogy, for Eq. (13). m L/ r ) dh !dh [/ !h ]# + "0. (15) r r`m l r`m Lh l l/1 From Eq. (15), for the non-"xed parameter h (r"1,2, m ) we have r`m d m L/ r dh . dh "[/ !h ]# + (16) r`m r r`m Lh l l/1 l Substituting Eq. (16) into Eq. (14) and after some algebra we obtain m G # + Z ) dh "0 k kl l l/1 where
C
(17)
D
LU md L/ (/ !h ) r) r r`m G " 1# + k Lh Lh w k k r r/1 L2U md L/ L/ 1 1 #+ r) r) . Z " kl Lh Lh Lh Lh w k l r/1 k l r A remarkable feature of the last expressions is that the index l runs only over non-"xed parameters l"1,2, m, and the index r runs only over those inequalities for which additional parameters t (7) are "xed! r Finally, the solution of Eq. (17) is dh"!(Z~1 ) G). The increments of the additional parameters dt "dh are calculated according to formula r r`m (16). The advantage of this iteration scheme is that the matrix inversion only of order m]m is done irrespective of the number of constraints. The scheme can be improved in order to avoid the in#uence of non-linear e!ects of permanently valid inequalities. To do this h which is neither r`m on low nor upper boundary is substituted by the
quantity t (h) (or its boundary value if the former r lies outside the boundary) after every iteration step.
4. Test Both realizations described above are coded in C## and tested on the model data for the calibration reaction ppPdp` under the conditions of the ANKE setup [4]. According to the plans, ANKE will consist of three detectors: a side detector, forward and backward ones. At the moment the side detector is fully assembled, only a scintillation hodoscope is ready for the forward detector. The side detector consists of two scintillation hodoscopes (START, STOP), and two proportional chambers each consisting of three sensitive planes. It allows one to reconstruct all the kinematic parameters of the ejectiles passing through the side detector. The scintillation hodoscope incorporated in the forward detector is capable of measuring the coordinates of the particle and its time of #ight. The "rst data were obtained in May and July 1998, the accuracies being studied. As the main calibration process required for the analysis of detector performance is the reaction ppPdp`, we took this process for the tests. A number of events was simulated for the beam kinetic energy ¹ "425 MeV with the p` meson passing "%!. through the side detector and the deuterons passing through the scintillation hodoscope of the forward detector. Simulation was done by the GEANT code with all physical processes switched on except the decay of p` mesons. In the case where the kinematic parameters of the beam proton and secondary p` are known, there is one constraint having the form of an equality, namely, the missing mass of the process should be equal to the mass of the deuteron: (E #M !E ` )2!( p !p ` )2"M2 (18) "%!. 1 p $ "%!. p where E , E are the energies of the beam pro"%!. p` ton and the secondary p` meson, p , p are "%!. p` their 3-momenta, M , M are the masses of the 1 $ proton and the deuteron, respectively. As noted above, for the deuterons detected by the forward hodoscope, their coordinates and times of #ight (t ) will be measured hopefully with the $
S.N. Dymov et al. / Nuclear Instruments and Methods in Physics Research A 440 (2000) 431}437
accuracies permitting 4c "t (using all 4 conservation laws). As at the moment not all accuracies are known, we assume that their coordinates and times of #ight lie between some boundaries and put forward requirements having the form of the following three inequalities: y 4y 4y , z 4z 4z , .*/ d .!9 .*/ d .!9 t 4t 4t . .*/ $ .!9
(19)
The "rst two requirements come from the geometrical dimensions of the scintillation hodoscope, whereas the last one does from the simulation data. Three functions y , z , t were expressed as the ones d d d (in the form of polynomials to the third-order inclusive) of two angles of the pion in the laboratory system of coordinates. The total number of "tted parameters was six, the "rst three are angles h , h of the pion relative xz yz
435
to the beam proton and the pion momentum in the laboratory system. The last three parameters were additional parameters t , corresponding to three r inequalities (19). Initial pion angles were always 0, and initial momenta were calculated as a function of these angles. The coordinates of the pion detected in the side detector were expressed as functions of three pion variables, i.e., two angles and momentum. The total number of events was &3000, the maximum number of iterations was 40. Two "ts corresponding to two di!erent realizations, described above, were performed. In the "rst "t the constraint of the form of non-linear equation (18) was disabled, while in the second one it was enabled. In Fig. 1 the accuracies for both realizations are shown. Figs. 1(a)}(c) are for the "rst "t, Figs. 1(d)}(f) are for the second one. It is necessary to stress a drastic improvement of accuracy in *p/p in the second case, which is the result of additional constraint.
Fig. 1. Accuracies of determining the particle kinematic parameter.
436
S.N. Dymov et al. / Nuclear Instruments and Methods in Physics Research A 440 (2000) 431}437
Fig. 2. Illustration of the Richardson approximation.
Each event was "tted to three values of the penalty factor ¹. The initial value of this factor was selected by using the formula n ¹"100 ) %91 n #0/ where n is the number of experimental points, %91 and n is the number of constraints. In our case #0/ n "6, n "4. %91 #0/ Each successive value of ¹ was ten times larger than the previous one. According to Ref. [3], in this case, according to Richardson, we should have the convergence, i.e. the parameter improvements * "p !p should be 10 times smaller than 32 3 2 * "p !p . Here p , p and p mean the values 21 2 1 1 2 3 of the "tted parameter for ¹ equal to ¹ , ¹ and 1 2 ¹ , respectively. 3
In Fig. 2 the ratios *h32/*h21, *h32/*h21, yz yz xz xz *p32/*p21, *(s2)32/*(s2)21 are shown. It is seen that they are close to 10, and this indicates that the statements made in Ref. [3] are correct.
5. Conclusion Two codes are developed for the minimization of s2-like functionals in the C## language. One of them is realization of the FUMILI code with constraints of the form of simple boundaries. The second one is the minimization with constraints of any type. With FUMILI as a starting point, the C## code is developed and tested on model data. The results of the test show a high performance of the algorithms developed. In conclusion, the authors express their gratitude to their colleagues from the
S.N. Dymov et al. / Nuclear Instruments and Methods in Physics Research A 440 (2000) 431}437
ANKE collaboration for providing the necessary details and to L. Pashkevich for her help in preparing an English version of the paper. References [1] I.N. Silin, CERN Program Library D 510, FUMILI, 1983. [2] I.N. Silin, Appendix III, Statisticheski metodi v eksperimentalnoi "zike, Atomizdat 1976 (translated into Russian from W.T. Eadie et al., Statistical Methods in Experimental Physics, CERN, Geneva, 1971, North-Holland, Amsterdam).
437
[3] I.N. Silin, Resolute progress in a constrained minimization problem, JINR Rapid Communications, No 3 [89]-98, pp. 25}30. [4] Forschungszentrum JuK lich, Germany, Annual Report 1996 (all the details about ANKE setup and related bibliography can be found in this report). [5] J.P. Berge, F.T. Solmitz, H.D. Taft, Rev. Sci. Instr. 32 (1961) 538. [6] R. Bock, CERN 60-30 (1960). [7] V.S. Kurbatov, I.N. Silin, Nucl. Instr. and Meth. A 345 (1994) 346. [8] S.N. Sokolov, I.N. Silin, Preprint JINR D-810, Dubna, 1961.