European Journal of Operational Research 111 (1998) 555–568

Theory and Methodology

Linear regression estimators for multinormal distributions in optimization of stochastic programming problems

István Deák 1,2

Operations Research Group, Department of Differential Equations, Technical University of Budapest, H-1111 Budapest, XI. Műegyetem rkp. 3., Hungary

Received 15 October 1996; accepted 11 July 1997
Abstract

Several linear regression estimators are presented which approximate the distribution function of the m-dimensional normal distribution, or the distribution function along a line. These regression estimators are quadratic functions, or simple functions of quadratic functions, and can be applied in numerical problems arising during optimization of stochastic programming problems. A root finding procedure is developed that can be used to find the intersection of a line and the border of the feasible set. Directional derivatives and the gradient of the normal distribution can be computed. Some numerical results are also presented. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Probability constrained programming; Multinormal distribution; Regression estimators; Quantile problem; Gradient of the normal distribution
1. Introduction

The computation of the values and some related quantities of the distribution function of the m-dimensional normal distribution is frequently required in optimization procedures of stochastic programming, when the random variables involved have a joint normal distribution [16,17,20]. The research presented here was mainly motivated by numerical difficulties in optimization of the STABIL model [21]. Other areas of possible applications are reliability theory, engineering design and genetics [1,2,8,12,13,15]. The distribution function of the m-dimensional normal distribution with expectation 0 and correlation matrix R is
1 This work was partly supported by grants from the National Scientific Research Fund, Hungary, T13980 and T16413, and the "Human Capital and Mobility" grant of the European Union ERB CJHR XCT 930087.
2 Address for correspondence: Universität Zürich, Institut für Operations Research, Moussonstrasse 15, Zürich, Switzerland. E-mail: [email protected] and [email protected]
$$\Phi(h) = \int_{-\infty}^{h_1} \cdots \int_{-\infty}^{h_m} (2\pi)^{-m/2}\, |R|^{-1/2} \exp\left(-\tfrac{1}{2}\, z' R^{-1} z\right) dz, \qquad (1)$$
where we assume that the distribution is nondegenerate (R is positive definite) and that all standard deviations are equal to 1 (nonstandard cases can be dealt with by simple transformations). Conventional numerical integration techniques and series expansion methods – due to the dimensional effect – fail in higher dimensions (m > 5). However, there are three main Monte Carlo methods for efficiently evaluating Eq. (1) in higher dimensions. The first one, called the method of orthonormalized estimators, was presented by Deák [3] (see also [4,5,7]) and can compute distribution function values Φ(h) with three accurate digits up to 50 dimensions. Another technique, based on Boole–Bonferroni inequalities, was proposed by Szántai [23] (see also [19,20]). Finally Gassmann, by combining the two previous approaches, gave a hybrid method [14]. The speed of these methods is about the same for the same example and the same accuracy; there is no conclusive evidence pointing to a factor greater than 10 in their relative performance. In spite of the existence of these methods, the evaluation of Eq. (1) and related quantities still requires a sizable portion (about 20–30%) of the computational time during optimization [18].

The Monte Carlo methods for computing Φ(h) produce realizations y_1, ..., y_n of a random variable η, where E(η) = Φ(h), so the unbiased estimator

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

is used in numerical computations as an approximate value of Φ(h). If D²(η) = σ², then D²(ȳ) = σ²/n, so the standard deviation of the estimator ȳ is σ/√n (which is generally called the error), decreasing with the square root of the sample size n. Invoking the central limit theorem, η is assumed to be normally distributed, that is η ∈ N(Φ(h), σ).
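As an illustration of this sampling scheme, the following sketch estimates Φ(h) by crude Monte Carlo and reports the σ/√n error estimate. It is a minimal example only, not the orthonormalized-estimator method of [3]; the function name and the equicorrelated test matrix are assumptions made for the sake of the illustration.

```python
import numpy as np

def mc_normal_cdf(h, R, n=10000, rng=None):
    """Crude Monte Carlo estimate of the multinormal CDF Phi(h).

    Each realization y_i is the indicator of the event {Z <= h}, so E(y_i) = Phi(h);
    the estimator is the sample mean and its error is estimated by std/sqrt(n).
    """
    rng = np.random.default_rng(rng)
    m = len(h)
    L = np.linalg.cholesky(R)               # R = L L', assumed positive definite
    Z = rng.standard_normal((n, m)) @ L.T   # samples from N(0, R)
    y = np.all(Z <= np.asarray(h), axis=1).astype(float)
    return y.mean(), y.std(ddof=1) / np.sqrt(n)

# Illustrative 3-dimensional equicorrelated example.
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
print(mc_normal_cdf([1.0, 1.0, 1.0], R, n=20000, rng=0))
```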
The use of regression estimators was first suggested in [6] to facilitate the solution of some numerical problems arising during stochastic optimization. Recently, several specific forms of the regression estimators were proposed in [8]. A summary of these ideas is given here and we show how these regression estimators can be used in numerical problems of stochastic optimization:
1. finding the intersection of a line and the surface of the set of feasible solutions (for detailed numerical results see [10]),
2. computing the gradient of the m-dimensional distribution function and directional derivatives (relevant numerical examples are presented here),
3. speeding up computations in a parallel asynchronous version of optimization (for the description of the full parallel model see [11]),
4. applying nonlinear optimization methods to probability constrained stochastic models.
The application of the suggested regression estimators basically means that quadratic approximations (or simple functions of the same) of the m-dimensional normal distribution function, or of some related quantities, can be constructed, and by some iterative procedure the approximation can be made more and more accurate in certain regions where it is important to have a good approximation.

The regression estimators are described first, then some details of the estimators are investigated. In Section 4, a quantile finding algorithm is presented, which can be used for finding the intersection of a line and the surface Φ(h) = p_rel. Section 5 is devoted to evaluating the gradient of the m-dimensional distribution function and directional derivatives. Some computational experiences are presented to show the performance and the efficiency of the quantile finding and the derivative computing procedures. In Section 6, remarks based on results of computer experiences are given and some comments are made on
other possible uses of the regression estimators in numerical optimization of stochastic programming problems.

2. Linear regression estimators for the normal distribution

Let us denote the R¹ → R¹ function Φ(z + x(d − z)) by f(x), where d and z are fixed and d is assumed to be an increasing direction, that is f(x_1) < f(x_2) if x_1 < x_2. (This assumption merely simplifies the discussion of the truncation and root finding procedure, but it is not essential.) The first three regression estimators described below approximate the function f(x); the next three estimators are m-dimensional functions approximating Φ(h). Assume that we have a Monte Carlo method for computing a noisy value p_i of the function value f(x_i), i = 1, ..., n, where p_i is a random variable, p_i = f(x_i) + ξ, ξ ∈ Ñ(0, σ̃). Here Ñ(0, σ̃) is a truncated version of the one-dimensional normal distribution with expectation 0 and variance σ², so Ñ has expectation 0 and variance σ̃²; the density function of the truncated distribution is

$$\tilde{\varphi}(x) = C \exp\left(-\frac{x^2}{2\sigma^2}\right), \qquad x \in [-3\sigma, 3\sigma],$$

where C = 1.0028/(√(2π) σ). Furthermore we assume that 0 < f(x_i) − 3σ and f(x_i) + 3σ < 1 hold, so that 0 < p_i < 1 for all i throughout this paper. Let S = {x_i, p_i}_{i=1}^n be the set of computed values; an approximation t(·) of f(·) is determined via regression, so that t(·) is optimal for the given set S.
2.1. Basic estimator – quadratic approximation of the original function f(·)

We are looking for an estimator t_1(x) of the type g_1(x) = a_1 x² + b_1 x + c_1, where the constants a_1, b_1, c_1 are determined to give the best fit to S. For a given set S the best approximation of the form g_1(x) can be obtained by solving the problem

$$\min_{a_1, b_1, c_1} \sum_{i=1}^{n} \left[ p_i - (a_1 x_i^2 + b_1 x_i + c_1) \right]^2. \qquad (2)$$

The first order necessary conditions of optimality are used now: a system of three equations is obtained by differentiating the above sum with respect to the unknown parameters a_1, b_1, c_1 and setting the derivatives to 0. This system is linear in the parameters sought, thus the solution is easily derived (explicit formulas can be found in Appendix A). Obviously 0 ≤ f(x) ≤ 1 holds, so an immediate truncation, called null truncation, has to take place; denoting I = {x | 0 ≤ g_1(x) ≤ 1}, we have t_1(x) = g_1(x) if x ∈ I. Since f(x) is monotone increasing because of the assumption, only the increasing part of the function g_1(x) can be used as an estimator of f(x). The distribution function Φ is logarithmically concave [20], so the function g_1(x) should also be concave (disregarding cases when errors in p_i make g_1(x) convex). Now another truncation, the constant truncation, is made: let the solution of the equation g_1'(x_t) = 0 be the truncation point, x_t = −b_1/(2a_1); then we define t_1(x) with null and constant truncation as

$$t_1(x) = \begin{cases} \max(0, \min(1, g_1(x))) & \text{if } -\infty < x \le x_t, \\ \max(0, \min(1, g_1(x_t))) & \text{if } x_t < x < \infty, \end{cases} \qquad (3)$$

where g_1(x_t) = c_1 − b_1²/(4a_1). In some cases another type of truncation can be advantageous, which is called the linear truncation; here the cut-off part of the function is replaced, instead of by the constant g_1(x_t), by an
I. De ak / European Journal of Operational Research 111 (1998) 555±568
increasing linear function, being tangential to g1
x at a point xtt xt ÿ j, for some small j > 0, until it reaches the value 1. Let the line be given by at x bt , then we have the equations at xtt bt g
xtt ; 2a1 xtt b1 at , from which the constants at and bt can be expressed. The ®nal form of t1
x (for the basic estimator with null and linear truncation) becomes 8 if ÿ 1 < x 6 xtt ; > < max
0; min
1; g1
x t1
x max
0; min
1; at x bt if xtt < x 6
1 ÿ bt =at ;
4 > : 1 if
1 ÿ bt =at < x < 1: For the other estimators only the constant truncation is described, though the null truncation should and the linear truncation could be used. Also we would not dwell on the case, when the direction d, given in the de®nition of f
x is not an increasing direction; we note, that it does not pose mathematical problems, this assumption merely simpli®es discussion, when roots of the equation t
x prel are looked for, but the function f
x can be approximated by a function of type g1
x, whatever the direction d is. Note here, that the approximating function t1
x is made monotone increasing, but concavity cannot be guaranteed, just hoped for. Similarly the functions t2
x; . . . ; t6
x to be described later are monotone increasing, but concavity or positive de®niteness of ÿQ cannot be ensured, though it is conjectured, that asymptotically they can be achieved in most practical cases. 2.2. Logarithmic estimator ± quadratic approximation of the function log f
x Now instead of approximating f
x directly, a quadratic function g2
x a2 x2 b2 x c2 is used to approximate log f
x, that is the logarithmic estimator t2
x exp fg2
xg approximates f
x. This is still a linear estimator, because the parameters of g2
x can be expressed from a system of linear equations. Using the notation qi log pi the problem to be solved is min
a2 ;b2 ;c2
n X i1
2 qi ÿ
a2 x2i b2 xi c2 wi ;
5
where normalized weights wi are attached (these weights are determined in the next section). The parameters a2 ; b2 ; c2 can be computed from Eq. (5) the same way, as it was done for the basic estimator t1
x (corresponding equations are given in the Appendix A). Prekopa proved [20], that U
h is a log-concave function, thus the function log f
x is concave and monotone increasing, so we want to make g2
x concave and monotone increasing as well. Let xt ÿb2 =
2a2 be the truncation point, so the estimator t2
x is given by if ÿ 1 < x 6 xt ; max
0; min
1; expfg2
xg
6 t2
x max
0; min
1; expfg2
xt g; xt < x < 1: 2.3. Reverse-logarithmic estimator ± quadratic approximation of the function log
1 ÿ f
x The function log
1 ÿ f
x is approximated by a quadratic function of the form g3
x a3 x2 b3 x c3 , that is the function t3
x 1 ÿ expfg3
xg approximates our original function f
x. This is a linear regression, too, and its parameters can be obtained by using the ®rst order necessary conditions of optimality of the problem min
a3 ;b3 ;c3
n X ri ÿ
a3 x2i b3 xi c3 2 wi ; i1
7
I. De ak / European Journal of Operational Research 111 (1998) 555±568
559
where ri log
1 ÿ pi ; i 1; . . . ; n, and the normalized weights wi are given in Section 3.1. Expressions for a3 ; b3 ; c3 can be obtained from those of a2 ; b2 ; c2 by replacing qi by ri . The function log
1 ÿ f
x is monotone decreasing, for making a truncation, assume, that the approximating function is concave (the convex case can be dealt similarly), then let xt ÿb3 =
2a3 be the truncation point; the reverse logarithmic estimator with null and constant truncation is de®ned as max
0; min
1; 1 ÿ exp fg3
xt g if ÿ 1 < x 6 xt ;
8 t3
x max
0; min
1; 1 ÿ exp fg3
xg if xt < x < 1: By ®tting a function of type t3
x to the distribution function of the one-dimensional standard normal distribution (see [9]), an approximation of the form 1 ÿ exp fg3
xg can be calculated via least square approach, which diers from the distribution function by only less than 10ÿ3 at any point on the hal¯ine 0 < x < 1. 2.4. Fitting m-dimensional linear regression estimators ± quadratic approximations ± to the m-dimensional normal distribution function The ideas used to develop estimators t1 ; t2 and t3 can equally well be applied to construct Rm ! R1 approximations to U
, they will be denoted by t4
x; t5
x; t6
x and called m-basic, m-logarithmic and mreverse-logarithmic estimator, respectively. Assume now, that for given m-dimensional points x
i ; i 1; . . . ; n the noisy function values pi ; i 1; . . . ; n are computed by some Monte Carlo method, where now pi 2 N~
U
x
i ; r~ (where N~ is a distribution with expectation U
x
i and variance r~2 ), which is a truncated version of the normal distribution N
U
x
i ; r. Similarly to the one-dimensional case we assume 0 < pi < 1; 8i. In order to ®nd the best ®t to the set fx
i ; pi gni1 among the functions of the form q4
x ÿx0 Qx b0 x c; where Q is an m m symmetric matrix, the following problem is to be solved min
qab ;bc ;c
n h i2 X pi ÿ q4
x
i
9
i1
for the unknown parameters qab ; bc ; c; with indices a; b; c 1; . . . ; n; a 6 b, where qab is the
a; bth element of Q and bc is the cth component of b. The ®rst order necessary conditions of optimality consists of
m 1
m 2=2 linear equations with the same number of unknowns. (Here, and in the sequel indices to the matrix Q, the vector b and the scalar c, indicating the estimator, to which they belong are dropped.) The corresponding system of linear equations to be solved for the unknown parameters qab ; bc ; c; where a; b; c 1; . . . ; n; a 6 b is n X i1 n X i1 n X i1
i where xa
i
pi x
i0 Qx
i ÿ b0 x
i ÿ cxa
i xb 0; pi x
i0 Qx
i ÿ b0 x
i ÿ cxa
i 0;
a; b 1; . . . ; m; a 6 b;
a 1; . . . ; m;
10
pi x
i0 Qx
i ÿ b0 x
i ÿ c 0;
is the ath element of x
i . This quadratic function q4
x in order to approximate the function U
h ± which is bounded, 0 6 U
h 6 1, and increasing for increasing directions ± should have the same
560
I. De ak / European Journal of Operational Research 111 (1998) 555±568
bounds and a positive gradient. Since rq4
x ÿ2Qx b, thus q4
x can be used for vectors lying in T4 fx j Qx 6 12 bg, which is a pointed cone with vertex at xt 12 Qÿ1 b. Summing up the constraints the mbasic estimator is de®ned in case of null and constant truncation as max
0; min
1; q4
x if x 2 T4 ;
11 t4
x max
0; min
1; q4
y if x 62 T4 ; where the vector y 2 T4 is a function of x, and is to be determined as the nearest vector to x; so y is the projection of x onto T4 . Two other estimators t5
x and t6
x can be constructed along the same lines as estimators t2
x and t3
x, respectively. The m-dimensional quadratic function q5
x approximates log U
x and the function q6
x approximates log
1 ÿ U
x. For these estimators the terms in the sum to be minimized can also have weights, that are to be determined similarly to the case of t2
x and t3
x. Truncation regions are de®ned as T5 fx j Qx 6 12 bg, T6 fx j Qx P 12 bg, since t5
x should be monotone increasing and t6
x monotone decreasing. Thus the m-logarithmic and the m-reverse-logarithmic estimators (disregarding truncations) are de®ned as t5
x exp fq5
xg
if x 2 T5 ;
t6
x 1 ÿ exp fq6
xg
if x 2 T6 :
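A compact way to solve problem (9) is to treat the distinct entries of Q, the components of b and the scalar c as one coefficient vector of a linear least squares problem. The sketch below is illustrative only (it uses a generic least squares solver rather than the explicit system (10), and the function name is an assumption).

```python
import numpy as np

def fit_m_basic(X, p):
    """Fit q4(x) = -x'Qx + b'x + c to points X (n x m) with noisy values p (n,)."""
    X = np.asarray(X, dtype=float)
    p = np.asarray(p, dtype=float)
    n, m = X.shape
    iu = np.triu_indices(m)                      # index pairs (a, b) with a <= b
    quad = -(X[:, iu[0]] * X[:, iu[1]])          # columns -x_a * x_b
    A = np.hstack([quad, X, np.ones((n, 1))])    # design matrix of problem (9)
    coef, *_ = np.linalg.lstsq(A, p, rcond=None)

    # Repack the quadratic coefficients into a symmetric Q reproducing -x'Qx.
    Qu = np.zeros((m, m))
    Qu[iu] = coef[: iu[0].size]
    Q = (Qu + Qu.T) / 2.0
    b = coef[iu[0].size: iu[0].size + m]
    c = coef[-1]
    return Q, b, c
```

The fitted value at a point x is then −x·Qx + b·x + c; the normal equations (10) are solved implicitly by the least squares call.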
3. Side effects of the logarithmic transformation

The logarithmic transformation used in the logarithmic and reverse-logarithmic estimators changes the variance and the expected value of the error of sampling. According to the Gauss–Markov conditions one would like to have an error with zero expectation and the same variance (see [22]). The change in the variance can be balanced by introducing weights, while the distortion of the expected value can be counteracted by applying a correcting term. Weighting is to be used in all estimators having the logarithmic transformation if the range of the values x_i or x^(i) is wide, that is, if some function values f(x_i) differ from each other by a value greater than σ, say. Applying a correction term is recommended only when very accurate function values are required.
3.1. Weights for logarithmic transformations

For the sake of simplicity we investigate the estimator t_2(x) only, but the reasoning and the formula can be applied to the estimators t_3(x), t_5(x), t_6(x) as well. Assume that at x_0 the value p_0 is computed by a Monte Carlo method, p_0 ∈ Ñ(f(x_0), σ̃), that is, p_0 = f(x_0) + ξ can be written with ξ ∈ Ñ(0, σ̃). Obviously, in actual computations null truncation of the value p_0 takes place, giving p_0 = max(0, min(1, f(x_0) + ξ)), but for clearness of presentation in this section we will generally use the normal distribution of p_0 as a numerically feasible substitution. Assuming a completely accurate approximation (when g(x) = f(x)), in the case of the basic estimator the squares of the errors [p_0 − f(x_0)]² = ξ_i² are summed up and minimized; the errors are ξ_i, independent of the value of f(x_0). However, in the case of the estimator t_2(x) the terms in the sum to be minimized are [log p_0 − log f(x_0)]², which depend on f(x_0). Expanding log p_0 = log(f(x_0) + ξ) around f(x_0) gives

$$\log p_0 = \log f(x_0) + \frac{1}{f(\bar{x})}\,\xi, \qquad (12)$$

where f(x̄) ∈ [f(x_0), f(x_0) + ξ], by the mean value theorem. Rearranging the terms in Eq. (12) we have log p_0 − log f(x_0) = ξ/f(x̄). Thus the terms in the sum to be minimized will be independent of f(x_0) if weights f²(x̄) are attached to the terms. The value of f(x̄) is unknown, but it can be approximated by the average of p_0 = f(x_0) + ξ and f(x_0); the first one is known, the second is unknown but can be approximated by any previously computed or known estimator. So let the normalized weights w̄_i be given by the following equations:

$$\bar{w}_i = w_i / s, \qquad w_i = \left[ p_i + t(x_i) \right]^2 / 4, \quad i = 1, \ldots, n, \qquad s = \sum_{i=1}^{n} w_i,$$

where t(·) is either an earlier version of the same estimator or any other estimator of f(x). We can also use the linear regression of the form t_0(x) = a_0 x + b_0 for this purpose, which can be easily determined. The weights for the reverse-logarithmic estimators are obtained by the same train of thought as

$$\bar{w}_i = w_i / s, \qquad w_i = \left[ (1 - p_i) + (1 - t(x_i)) \right]^2 / 4, \quad i = 1, \ldots, n, \qquad s = \sum_{i=1}^{n} w_i.$$
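A minimal sketch of the weighted logarithmic fit follows; it assumes a previously computed estimator is available as a callable t_prev (for instance the basic estimator from the earlier sketch), and the function name is hypothetical.

```python
import numpy as np

def fit_logarithmic_estimator(x, p, t_prev):
    """Weighted quadratic fit of g2(x) to q_i = log p_i (the estimator is t2 = exp(g2)).

    t_prev: callable giving an earlier estimator of f, used in the weights
    w_i = (p_i + t_prev(x_i))**2 / 4 of Section 3.1.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    w = (p + t_prev(x))**2 / 4.0
    w = w / w.sum()                        # normalized weights
    # np.polyfit's weights multiply the residuals, hence the square root here.
    a2, b2, c2 = np.polyfit(x, np.log(p), 2, w=np.sqrt(w))
    return a2, b2, c2
```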
As was noted in the beginning of this section, the change of the variance of the error caused by the logarithmic transformation becomes disturbing only in the case when the values of f(x_i) can be very different. So, having the first approximation t(x) computed, if the inequality t(max_i x_i) − t(min_i x_i) < σ holds, then there is practically no need for computing the weights, because the error σ of the function evaluation overwhelms the error resulting from the logarithmic transformation.

3.2. Correction to the estimator

Originally the sampling error has expected value 0; by adding a correcting term to the estimator the transformed error's expected value can be made close to 0. Assume now that p is computed as an approximate value of f(x), for a given x, so that p = f(x) + ξ, ξ ∈ Ñ(0, σ̃). In stochastic programming a usual reliability level is f(x) = μ = 0.9, which in the case of σ = 0.05 gives a heavily lopsided distribution; this error is much bigger than the correction that is described in the later part of this section. Obviously, a standard deviation σ = 0.01 makes this problem practically nonexistent for values up to f(x) = 0.95. We assume p has mean μ = f(x), standard deviation σ̃ (which is very close to σ), and density function φ̃(z) = C exp{−(z − μ)²/(2σ²)}, z ∈ [μ − 3σ, μ + 3σ], with the constant C = 1.0028/(√(2π) σ). Expanding log p = log(μ + ξ) around μ we have

$$\log p = \log\mu + \frac{1}{\mu}(p - \mu) - \frac{1}{2\mu^2}(p - \mu)^2 + \cdots + \frac{(-1)^{k-1}}{k\mu^k}(p - \mu)^k + \cdots \qquad (13)$$

The expected value of log p can be obtained by substituting Eq. (13) into

$$E(\log p) = C \int_{\mu-3\sigma}^{\mu+3\sigma} \log z \, \exp\left\{-\frac{(z-\mu)^2}{2\sigma^2}\right\} dz = \log\mu - \frac{\tilde{\sigma}^2}{2\mu^2} - \cdots - \frac{\tilde{\sigma}^{2k} M_{2k}}{2k\mu^{2k}} - \cdots, \qquad (14)$$
where M_{2k} is the 2kth moment of Ñ, since the odd moments are 0. Let r be even; then the rth moment of the standard normal distribution is 2^{r/2} Γ((r + 1)/2)/√π [15], which would give a divergent sum for E(log p), but the truncated normal distribution Ñ has bounded moments. This can be shown as follows. Taking the expectation of the (r + 1)st term in Eq. (13) and substituting σ for σ̃ and 1/(√(2π) σ) for C = 1.0028/(√(2π) σ) we have

$$I_{r+1} = \frac{C}{r\mu^r} \int_{\mu-3\sigma}^{\mu+3\sigma} (z-\mu)^r \exp\left\{-\frac{(z-\mu)^2}{2\sigma^2}\right\} dz = \frac{\sigma^r}{r\mu^r\sqrt{2\pi}} \int_{-3}^{3} y^r \exp\left(-\frac{y^2}{2}\right) dy. \qquad (15)$$

An upper bound for this last integral can be obtained by replacing the exponential function by 1, so we have

$$I_{r+1} \le \frac{2\sigma^r}{r\mu^r\sqrt{2\pi}} \int_{0}^{3} y^r \, dy = \frac{2\sigma^r}{r\mu^r\sqrt{2\pi}} \cdot \frac{3^{r+1}}{r+1} < \left(\frac{3\sigma}{\mu}\right)^r \frac{3}{r(r+1)}, \qquad (16)$$

which is decreasing with increasing r and is negligible for computational purposes if μ is sufficiently larger than 3σ; this holds in most practical cases. (In cases when 3σ > μ the reverse-logarithmic estimator can be used.) According to Eq. (14), since the terms are decreasing, only the first two terms are retained and we have an approximate expression for the expected value of log p as follows:

$$E(\log p) \approx \log\mu - \frac{\sigma^2}{2\mu^2}, \qquad (17)$$

where the second term on the right-hand side will be called the correcting term. Essentially this approximate expression means that although (1/n) Σ_i [f(x_i) + ξ_i] is an unbiased estimator of f(x), log(f(x) + ξ) is not an unbiased estimator of log f(x); generally we get smaller values, and the difference can be approximated by σ²/(2μ²). This correction is quite small, and should be used only in cases when the required precision of the approximation to be computed is near the value of the correcting term. Anyway, we suggest the use of this correction only in the final stages of the computation. For the reverse-logarithmic estimator the correction term can be derived similarly and is equal to σ²/(2(1 − μ)²).
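A quick numerical check of Eq. (17), using the μ = 0.9 and σ = 0.05 values mentioned above (an illustration only), compares the simulated mean of log p with log μ − σ²/(2μ²).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.9, 0.05

# Sample the noise from the +/- 3*sigma truncated normal used in the paper.
xi = rng.standard_normal(2_000_000) * sigma
xi = xi[np.abs(xi) <= 3 * sigma]
p = mu + xi

print("simulated E[log p]:        ", np.log(p).mean())
print("log(mu) - sigma^2/(2 mu^2):", np.log(mu) - sigma**2 / (2 * mu**2))
```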
4. Root finding – quantile determination – algorithm

Consider the problem of finding an approximate root x_r* of the solution x_r of the equation f(x_r) = p_rel, which is a quantile determination problem; it will be called root finding. The following heuristic algorithm uses the regression estimator t to find approximate roots, where t is any of the estimators t_1(x), t_2(x), t_3(x). The convergence of the algorithm cannot be proven yet, but a heuristic explanation for its successful application is the following. If the approximating function is "close" to the function f, then the approximate root is also close to the true root. The new sample points x_i will be near x_r, that is, the next approximating function will have a smaller error around the true root, so the next approximate root will hopefully be closer to the true root.
Our strategy for finding an approximation to x_r is realized by an iterative procedure, where the estimator t(x) is successively made more and more accurate in the neighbourhood of the approximate root. This is achieved by constructing a sequence of shrinking intervals, the so-called confidence intervals [a_i, b_i] around the value p, and the corresponding intervals [α_i, β_i] around the approximate roots x_r* of t_i(x_r*) = p_rel (here t_i(α_i) = a_i, t_i(β_i) = b_i and t_i is the actual estimator in the ith step of the algorithm). The length of the intervals is decreased in each iteration of the procedure. Considerations on the details of the algorithm are given in [10]; only the final algorithm is presented here. To keep notations simple, the algorithm is described for t_1(x) = g_1(x) = a_1 x² + b_1 x + c_1 (after the necessary small modifications the estimators t_2 and t_3 can equally well be used), and to further simplify notations the indices of t_1, g_1, a_1, b_1, c_1 are also dropped. To apply the same algorithm for the logarithmic and reverse-logarithmic estimators only some minor changes are to be made (p_i is replaced by q_i or r_i, the formula for the ratio o is changed, conditions on the roots being smaller than the truncation point should be reversed for reverse-logarithmic estimators, etc.). The reduction factor ϱ = 0.6 and the number of points to be evaluated in each iteration K = 10 were used. The initial interval [a_0, b_0] should be determined symmetrically around the root x_r, but not knowing this, a good guess will suffice.

0. (Initialization): Assume that we have an initial interval [a_0, b_0] and K points x_0j in it (given maybe at equal distances) and the "noisy" function values p_0j of f(x_0j); furthermore the estimator g(x) using the points S_0 = {x_0j, p_0j}_{j=1}^K is already computed. Set the initial values r_0 = σ/ϱ, iteration counter i = 0, steplength stl = (b_0 − a_0)/10.
1. (Computation of the length of the confidence interval): Increase the iteration counter i = i + 1, compute r_i = ϱ r_{i−1} and the truncation point x_t = −b/(2a).
2. (Computation of the preliminary interval [a_i, b_i]): Determine the approximate roots x_r* from the equation g(x_r*) = p_rel. If a < 0, then accept the root which is smaller than x_t (or set x_r* = x_t if there is no solution to the equation). If a > 0, then accept the root which is greater than x_t (or set x_r* = x_t if there is no solution to the equation). Compute d = 2 stl and let x⁻ = x_r* − d, x⁺ = x_r* + d. The ratio o = (p_rel − g(x⁻))/(g(x⁺) − p_rel) indicates approximately the relation of the left- and right-hand side derivatives. Let a_i = max(p_rel − o r_i, 0.0001), b_i = min(p_rel + r_i, 0.9999).
3. (Computation of the interval [α_i, β_i]): Compute the roots α_i = g⁻¹(a_i), β_i = g⁻¹(b_i). If a < 0, select those roots that are smaller than x_t; if a > 0, then select the greater roots (if no solution exists, then take α_i = x_t, β_i = α_i + 0.01).
4. (Updating g(x)): Select K = 10 new points in [α_i, β_i] by letting x_ij = α_i + (β_i − α_i)(j − 1)/(K − 1), j = 1, ..., K, and compute by a Monte Carlo method the noisy function values p_ij, E(p_ij) = f(x_ij). Let the set of new points be S_i = {x_ij, p_ij}_{j=1}^K, and S = ∪_{l=0}^{i} S_l. Compute the new approximation g(x) using all points in S, and determine the new approximate root x_r* = g⁻¹(p_rel) (if a < 0, then choose the smaller root, otherwise the greater one, and set x_r* = x_t if no solution exists).
5. (Test of convergence): If σ/√((i + 1)K) < ε_0, where ε_0 is a prescribed error tolerance, then stop. Otherwise set x_r = x_r*, stl = 2(β_i − α_i)/K and go back to Step 1.
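The sketch below is a simplified, hypothetical rendering of this iteration for the basic estimator: it omits the confidence-interval bookkeeping of Steps 2–3 and simply resamples around the current root, and it reuses the assumed fit_basic_estimator helper from the earlier sketch.

```python
import numpy as np

def find_root(noisy_f, p_rel, a0, b0, K=10, rho=0.6, n_iter=9):
    """Iteratively refit g1 around the current approximate root of f(x) = p_rel.

    noisy_f: callable returning a Monte Carlo estimate of f at a point.
    Returns the approximate root after n_iter refinements.
    """
    xs = list(np.linspace(a0, b0, K))
    ps = [noisy_f(x) for x in xs]
    half_width = (b0 - a0) / 2.0

    for _ in range(n_iter):
        a1, b1, c1 = fit_basic_estimator(xs, ps)     # quadratic fit to all points so far
        roots = np.roots([a1, b1, c1 - p_rel])        # solve g1(x) = p_rel
        roots = roots[np.isreal(roots)].real
        xt = -b1 / (2.0 * a1)
        cand = roots[roots <= xt] if a1 < 0 else roots[roots >= xt]
        x_root = cand[0] if cand.size else xt          # fall back to the truncation point
        half_width *= rho                              # shrink the sampling interval
        new_x = np.linspace(x_root - half_width, x_root + half_width, K)
        xs.extend(new_x)
        ps.extend(noisy_f(x) for x in new_x)
    return x_root
```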
From a computational point of view there seem to be two advantages of this algorithm, resulting in good performance:
1. the error of the approximation – according to computational experiences (given in [10]) – seems to decrease with the square root of the number of points in S = {x_ij, p_ij}, that is, |f(x_r) − f(x_r*)| ≤ σ/√((N + 1)K), where N is the number of iterations (this feature has been used in Step 5 of the algorithm),
2. updating the estimator g(x) in Step 4 can be done simply by adding the quantities (moments) related to the new points in S_i to the already computed constants of g(x), with no need for a total recomputation (for details see [10]). This implies that the work associated with the evaluation (and updating) of the regression estimator depends only linearly on the number of function evaluations, that is, on the number of points in S.
For high values of p_rel (around 0.95–0.99) the linear truncation of g(x) is to be used, since the estimator construction will generate approximate roots very near the truncation point, the true root being frequently greater than the truncation point x_t. Instead of the test in Step 5, other stopping rules can be used as well; the simplest one is to stop the algorithm after a prescribed number of iterations. Another way to control the iteration number is to compute the experimental error in each iteration and stop when the required precision is attained.

The m-dimensional estimators are also easy to use for the root finding problem, but the computation of an estimator, since it involves many more variables, requires correspondingly more computational work. Nevertheless we give the formal description: for the estimator t_4(x) = −x'Qx + b'x + c the equation to be solved for the unknown value x_r is the following:

$$-(z + x_r(d - z))'\, Q\, (z + x_r(d - z)) + b'(z + x_r(d - z)) + c = p, \qquad (18)$$

which is a quadratic equation in the unknown x_r, so an approximation to x_r can be expressed.
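Writing u = d − z, Eq. (18) reduces to an ordinary scalar quadratic; the expansion (a routine computation, added here for convenience) is

$$\alpha x_r^2 + \beta x_r + \gamma = 0, \qquad \alpha = -u'Qu, \quad \beta = b'u - 2\,u'Qz, \quad \gamma = c + b'z - z'Qz - p,$$

and x_r is obtained from the quadratic formula applied to these coefficients.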
Returning to the one-dimensional estimators t_1, t_2, t_3, we present some computational results that give an indication of how the root finding algorithm works (a more detailed set of computer results is given in [10]; the numbers of the examples refer to the numerical examples in that paper). For these results the constants of the algorithm were the following: K = 10, N = 9, σ = 0.05, so altogether in each example we used 100 function evaluations, and the final results had errors less than σ/10 = 0.005 in most cases. Example 1 is a two-dimensional problem with correlated components; Example 3 is a 10-dimensional distribution where the components are pairwise correlated (that is, the first and the second components are correlated with each other but independent of the other components, the third and fourth components are correlated but independent of the others, etc.); Example 4 has all components independent.
No. of exam.   Dimens. m   Reliab. p_rel   "Exact" root x_r   Estimator   Root x_r*   Error |f(x_r) − f(x_r*)|
1.             2           0.80            1.1114             Orig        1.099       0.0025
                                                              Log         1.100       0.0035
                                                              R-log       1.131       0.0048
3.             10          0.95            5.895              Orig        5.724       0.0024
                                                              Log         5.827       0.0010
                                                              R-log       5.999       0.0012
4.             50          0.90            4.276              Orig        4.201       0.0038
                                                              Log         4.164       0.0050
                                                              R-log       4.216       0.0029
5. The gradient of the distribution function

Prékopa proposed the following formula for computing the gradient of the multidimensional normal distribution function [21]:

$$\nabla\Phi(h) = \left\{ \Phi(h_1, \ldots, h_{i-1}, h_{i+1}, \ldots, h_m \mid h_i)\, \varphi(h_i) \right\}_{i=1}^{m},$$

where the distribution function on the right-hand side is an (m − 1)-dimensional conditional distribution function, whose parameters, including its correlation matrix, can be computed from R and h. Using a Monte Carlo method for evaluating the right-hand side of this expression means that the gradient ∇Φ(h) can be computed at the cost of m (m − 1)-dimensional distribution function values (which is about the same as computing m m-dimensional distribution function values).
Another natural idea is to make use of the numerical differential [f(x + δ) − f(x − δ)]/(2δ), but using a Monte Carlo method with error σ to evaluate the function values f(·) and determine the differential in the neighbourhood of the point can result in great variations of the value of the gradient, making this idea impractical.

There are two possibilities for computing ∇Φ(h) using the regression estimators. The first one is called componentwise, since it means defining a function f_i(x) = Φ(h + x e_i), i = 1, ..., m, e_i being the ith unit vector, computing n noisy function values f_i(x_ij), j = 1, ..., n, in the neighbourhood of the point h where the gradient is required (x_ij ∈ [−δ, δ], for some δ > 0), determining an estimator of the function, and then computing the analytical derivative of this function as an approximate value of the directional derivative. For the derivatives of the one-dimensional estimators we have

$$t_1'(x) = 2a_1 x + b_1, \qquad t_2'(x) = (2a_2 x + b_2)\exp\{a_2 x^2 + b_2 x + c_2\}, \qquad t_3'(x) = -(2a_3 x + b_3)\exp\{a_3 x^2 + b_3 x + c_3\}. \qquad (19)$$
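A sketch of the componentwise approach for one component follows (illustrative only; it again assumes the mc_normal_cdf and fit_basic_estimator helpers introduced in the earlier sketches and uses the basic estimator's derivative t_1'(0) = b_1 at the point h).

```python
import numpy as np

def partial_derivative(h, R, i, delta=0.5, n_points=25, n_mc=5000):
    """Estimate the i-th component of grad Phi(h) by the componentwise approach.

    Noisy values of f_i(x) = Phi(h + x * e_i) are fitted by g1(x); the analytic
    derivative 2*a1*x + b1 evaluated at x = 0 approximates dPhi/dh_i.
    """
    h = np.asarray(h, dtype=float)
    e_i = np.zeros_like(h)
    e_i[i] = 1.0
    xs = np.linspace(-delta, delta, n_points)
    ps = [mc_normal_cdf(h + x * e_i, R, n=n_mc)[0] for x in xs]
    a1, b1, c1 = fit_basic_estimator(xs, ps)
    return b1          # t1'(0) = b1
```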
In the second approach one of the m-dimensional estimators can be used. Here N points are determined in the neighbourhood of h, that is, let x_j ∈ {x | Σ_{i=1}^m (x_i − h_i)² ≤ δ}, j = 1, ..., N, for some δ > 0; then evaluate the function values Φ(x_j) ≈ p_j by a Monte Carlo method, determine the m-dimensional estimator and compute the analytical gradient of the estimator as an approximate value of the real gradient. For the estimator t_4 we have ∇t_4(x) = −2Qx + b, for the m-logarithmic estimator ∇t_5(x) = (−2Qx + b) exp{−x'Qx + b'x + c}, and for the m-reverse-logarithmic estimator ∇t_6(x) = −(−2Qx + b) exp{−x'Qx + b'x + c} are the corresponding analytical expressions.

Computer experiences with the componentwise approach suggest that good results can generally be obtained if the interval [a, b], where the function f_i(x) is sampled, is:
1. asymmetrical about h, e.g. 1/3 of the points produce values smaller than at h (that is x < 0), and the other 2/3 of the points lie above 0 (x ≥ 0),
2. wide enough to make the estimator stable, that is, f(b) − f(a) ≥ σ holds,
3. such that the points x_j ∈ [a, b] are uniformly distributed.

The numbering of the examples below refers to those in [10]; some of them were used to produce computer experiences regarding the performance of the componentwise approach, and they are presented below. The standard deviation of one function evaluation f(x_i) ≈ p_i is σ = 0.05; the sample size was n = 25. In the case of Example 4 (here m = 50, the vector h given, Φ(h) = 0.95, f_1(x) = Φ(h + x e_1)) the "exact" value of the first component of ∇Φ(h) was 0.0883. (Here and in the sequel exact gradient values were obtained by the same method, decreasing the sampling error to σ/10 and increasing the sample size to 1000.) The following table shows how the results for the first component of the gradient vary depending on the interval [a, b]. The first estimator, called linear, is the regression function of the form t_0(x) = a_0 x + b_0.
[a, b]           f_1(a)   f_1(b)   Linear   Basic   Logarithmic   Reverse-Log
[−0.8, 0.8]      0.85     0.98     0.091    0.118   0.121         0.075
[−0.5, 0.5]      0.88     0.97     0.097    0.097   0.099         0.076
[−0.3, 0.3]      0.91     0.97     0.091    0.090   0.090         0.068
[−0.45, 0.75]    0.89     0.98     0.062    0.088   0.088         0.071
Example 1 is a two-dimensional normal distribution function with h = (1.1114, 1.5), Φ(h) = 0.8; the correlation coefficient is ρ = −0.9 and the exact value of the first component of the gradient is 0.215. The following table shows the results of different runs for the same parameters.
No. run   Linear   Basic   Logarithmic   Reverse-Log
1         0.202    0.202   0.200         0.198
2         0.252    0.251   0.252         0.253
3         0.236    0.236   0.246         0.205
For the same example at the point h = (1.8358, 1.5), Φ(h) = 0.9, the exact gradient was (0.074, 0.129), and a computer run produced the following set of results for the gradient.

Linear           Original         Logarithmic      Reverse-Log
(0.086, 0.120)   (0.086, 0.120)   (0.085, 0.120)   (0.091, 0.118)
In the case of Example 3 (m = 10, h given, Φ(h) = 0.8, f_1(x) = Φ(h + x e_1)) the exact value of the first component of the gradient is 0.1405; computer results are given for two different runs.

Run   Linear   Basic   Logarithmic   Reverse-Log
1     0.139    0.139   0.140         0.137
2     0.164    0.164   0.171         0.163
From these and other computer runs the following conjecture can be made. The error of the gradient evaluation seems to be about 3σ/√n, so the method proposed by Prékopa (despite the necessity of computing new correlation matrices) seems to be superior. The real advantage of using one-dimensional regression estimators for computing the gradient arises when directional derivatives are required (for example in selecting the best descent direction in a random search pattern), because in this case we do not need to compute the whole gradient, as would be needed with Prékopa's approach. Certainly this componentwise method is much superior to evaluating the numerical differential, since the latter can very easily change even its sign.
6. Remarks

The root finding algorithm produces an approximate root whose accuracy appears to be the same as if all function evaluations had been done at the same point, so compared to some other kinds of line search techniques our root finding technique seems to be faster by an order of magnitude, notwithstanding other benefits. Computing the gradient by the componentwise approach is not better than the method proposed by Prékopa, but it can be used for efficiently evaluating directional derivatives. For the m-dimensional estimators no numerical evaluation is available yet.

Finally some propositions are made for other possible uses of the regression estimators in the optimization procedures of STABIL type stochastic programming problems. Consider a parallel model of optimization, where N processors are working in an asynchronous mode. Each processor solves the same problem, with different methods on the processors, or with the same method and different parameters. Communication between the processors is restricted to broadcasting the actual feasible solution at the end of each iteration (for a complete description of this model of optimization and its convergence see [11]). Assume furthermore that the values {x^(i), p_i}_{i=1}^n computed at any of the processors are collected at a special processor, which, having received a certain amount of data, computes an m-dimensional regression estimator of the distribution function of the multidimensional normal distribution that is broadcast to all processors. The processors can use this approximation to speed up the next iteration. Obviously the
m-dimensional estimators can be used in other parallel computational procedures, where the function values Φ are evaluated at more than one processor in the same manner. The greatest promise of the use of m-dimensional regression estimators lies in the possibility that constructing an estimator essentially gives us a quadratic m-dimensional approximation to the distribution function of the multidimensional normal distribution. For example a steepest descent method, a SUMT method with logarithmic barrier, or a gradient projection method can use these quadratic approximations to facilitate the numerical optimization.

Appendix A

Here we give the detailed formulas for the constants of the regression estimators described in Section 2. These are given in order to help the work of a user who might actually want to compute an estimator.

A.1. Constants for the quadratic regression of the original function

From problem (2), using the first order optimality conditions, the following set of linear equations is arrived at:

$$\begin{aligned}
\sum_{i=1}^{n} \left[ p_i - (a_1 x_i^2 + b_1 x_i + c_1) \right] x_i^2 &= 0, \\
\sum_{i=1}^{n} \left[ p_i - (a_1 x_i^2 + b_1 x_i + c_1) \right] x_i &= 0, \qquad (A.1) \\
\sum_{i=1}^{n} \left[ p_i - (a_1 x_i^2 + b_1 x_i + c_1) \right] &= 0.
\end{aligned}$$
Using the notations

$$\bar p_0 = \frac{1}{n}\sum_i p_i, \quad \bar p_1 = \frac{1}{n}\sum_i x_i p_i, \quad \bar p_2 = \frac{1}{n}\sum_i p_i x_i^2, \quad \bar x_1 = \frac{1}{n}\sum_i x_i, \quad \bar x_2 = \frac{1}{n}\sum_i x_i^2, \quad \bar x_3 = \frac{1}{n}\sum_i x_i^3, \quad \bar x_4 = \frac{1}{n}\sum_i x_i^4,$$

the solution of the system of equations can be given by

$$a_1 = \frac{(\bar p_2 - \bar p_0 \bar x_2)(\bar x_2 - \bar x_1^2) - (\bar p_1 - \bar p_0 \bar x_1)(\bar x_3 - \bar x_2 \bar x_1)}{(\bar x_4 - \bar x_2^2)(\bar x_2 - \bar x_1^2) - (\bar x_3 - \bar x_2 \bar x_1)^2}, \qquad
b_1 = \frac{\bar p_1 - \bar p_0 \bar x_1 - a_1(\bar x_3 - \bar x_2 \bar x_1)}{\bar x_2 - \bar x_1^2}, \qquad (A.2)$$

$$c_1 = \bar p_0 - a_1 \bar x_2 - b_1 \bar x_1.$$
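These moment formulas also make the incremental updating mentioned in Section 4 easy: keeping the running sums of p_i, x_i p_i, etc. allows new points to be added without refitting from scratch. A small sketch of the direct computation (illustrative only, with an assumed function name) is:

```python
import numpy as np

def basic_constants(x, p):
    """Compute a1, b1, c1 of the basic estimator from the moments of Eq. (A.2)."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    p0, p1, p2 = p.mean(), (x * p).mean(), (x**2 * p).mean()
    x1, x2, x3, x4 = x.mean(), (x**2).mean(), (x**3).mean(), (x**4).mean()
    a1 = ((p2 - p0 * x2) * (x2 - x1**2) - (p1 - p0 * x1) * (x3 - x2 * x1)) / \
         ((x4 - x2**2) * (x2 - x1**2) - (x3 - x2 * x1)**2)
    b1 = (p1 - p0 * x1 - a1 * (x3 - x2 * x1)) / (x2 - x1**2)
    c1 = p0 - a1 * x2 - b1 * x1
    return a1, b1, c1
```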
A.2. Constants for the quadratic regression of log f(x)

Introducing the notations (note that they now have a slightly different content than previously)

$$\bar q_0 = \frac{1}{n}\sum_{i=1}^{n} q_i w_i, \quad \bar q_1 = \frac{1}{n}\sum_{i=1}^{n} q_i x_i w_i, \quad \bar q_2 = \frac{1}{n}\sum_{i=1}^{n} q_i x_i^2 w_i, \quad \bar x_1 = \frac{1}{n}\sum_{i=1}^{n} x_i w_i, \quad \bar x_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 w_i, \quad \bar x_3 = \frac{1}{n}\sum_{i=1}^{n} x_i^3 w_i, \quad \bar x_4 = \frac{1}{n}\sum_{i=1}^{n} x_i^4 w_i,$$
after differentiating and solving the linear system of equations arising from Eq. (5), the parameters of the estimator can be obtained as

$$a_2 = \frac{(\bar q_2 - \bar q_0 \bar x_2)(\bar x_2 - \bar x_1^2) - (\bar q_1 - \bar q_0 \bar x_1)(\bar x_3 - \bar x_2 \bar x_1)}{(\bar x_4 - \bar x_2^2)(\bar x_2 - \bar x_1^2) - (\bar x_3 - \bar x_2 \bar x_1)^2}, \qquad
b_2 = \frac{\bar q_1 - \bar q_0 \bar x_1 - a_2(\bar x_3 - \bar x_2 \bar x_1)}{\bar x_2 - \bar x_1^2}, \qquad (A.3)$$

$$c_2 = \bar q_0 - a_2 \bar x_2 - b_2 \bar x_1.$$
References

[1] P. Bjerager, On computation methods for structural reliability analysis, Structural Safety 9 (1990) 79–96.
[2] J.A. Breslaw, Evaluation of multivariate normal probability integrals using a low variance simulator, Review of Economics and Statistics 76 (1994) 673–682.
[3] I. Deák, Three digit accurate multiple normal probabilities, Numerische Mathematik 35 (1980) 369–380.
[4] I. Deák, Computation of multiple normal probabilities, in: P. Kall, A. Prékopa (Eds.), Symposium on Stochastic Programming, Oberwolfach, Springer Lecture Notes in Economics and Mathematical Systems, vol. 179, Springer, Berlin, 1980, pp. 107–120.
[5] I. Deák, Computing probabilities of rectangles in case of multidimensional normal distribution, Journal of Statistical Computation and Simulation 26 (1986) 101–114.
[6] I. Deák, Procedures to solve STABIL on a parallel computer, Working Paper 89-10, University of Wisconsin, Department of Industrial Engineering; lecture at the Vth International Conference on Stochastic Programming, Ann Arbor, MI, 1989, p. 26.
[7] I. Deák, Random Number Generators and Simulation, in: A. Prékopa (Ed.), Mathematical Methods of Operations Research, vol. 4, Akadémiai Kiadó, Budapest, 1990, p. 342.
[8] I. Deák, Simple regression estimators for the multinormal distributions, Lecture at the XIII International Conference on Operations Research, Mátraháza, 1996.
[9] I. Deák, Approximations to the standard normal distribution function, Central European Journal of Operations Research and Economics (submitted).
[10] I. Deák, Regression estimators for multinormal distributions: computer experiences in root finding, in: K. Marti (Ed.), Proceedings of the Workshop on Stochastic Programming, Neubiberg, Springer, 1996 (submitted).
[11] I. Deák, Asynchronous parallel models for optimization, in: R. Trobec, M. Vajtersic, P. Zinterhof, J. Silc, B. Robic (Eds.), Proceedings of ParNum96, International Workshop on Parallel Numerics, Gozd Martuljek, Slovenia, 1996, pp. 195–205.
[12] O.J. Ditlevsen, P. Bjerager, Plastic reliability analysis by directional simulation, Journal of Engineering Mechanics 115 (1989) 1347–1362.
[13] V. Ducroucq, J.J. Colleau, Interest in quantitative genetics of Dutt's and Deák's methods for numerical computation of multivariate normal probability integrals, Genetique, Selection et Evolution 18 (1986) 447–474.
[14] H. Gassmann, Conditional probability and conditional expectation of a random vector, in: Y. Ermoliev, R. Wets (Eds.), Numerical Techniques for Stochastic Optimization, Springer Series in Computational Mathematics, Springer, Berlin, 1988, pp. 237–254.
[15] N.L. Johnson, S. Kotz, Distributions in Statistics, vols. I–IV, Wiley, New York, 1972.
[16] P. Kall, A. Ruszczyński, K. Frauendorfer, Approximation techniques in stochastic programming, in: Y. Ermoliev, R. Wets (Eds.), Numerical Techniques for Stochastic Optimization, Springer Series in Computational Mathematics, 1988, pp. 33–64.
[17] J. Mayer, Computational techniques for probabilistic constrained optimization problems, in: K. Marti (Ed.), Lecture Notes in Economics and Mathematical Systems, vol. 379, 1992, pp. 141–164.
[18] J. Mayer, Personal communication, 1996.
[19] A. Prékopa, Boole–Bonferroni inequalities and linear programming, Operations Research 36 (1988) 145–162.
[20] A. Prékopa, Stochastic Programming, Mathematics and its Applications, vol. 324, Kluwer Academic Publishers, Dordrecht, 1995.
[21] A. Prékopa, S. Ganczer, I. Deák, K. Patyi, The STABIL stochastic programming model and its experimental application to the electrical energy sector of the Hungarian economy, in: M. Dempster (Ed.), Proceedings of the International Symposium on Stochastic Programming, Academic Press, New York, 1980, pp. 369–385.
[22] A. Sen, M. Srivastava, Regression Analysis, Springer, Berlin, 1990.
[23] T. Szántai, Evaluation of a special multivariate gamma distribution function, Mathematical Programming Study 27 (1986) 1–16.