Computers & Operations Research 31 (2004) 181 – 199
An implementation of Newton-like methods on nonlinearly constrained networks

E. Mijangos∗

Department of Applied Mathematics and Statistics and Operations Research, (UPV/EHU), P.O. Box 644, 48080 Bilbao, Spain

Received 1 November 2001; received in revised form 1 March 2002
Abstract

The minimization of a nonlinear function with linear and nonlinear constraints and simple bounds can be performed by minimizing an augmented Lagrangian function, including only the nonlinear constraints. This procedure is particularly interesting when the linear constraints are flow conservation equations, as there exist efficient techniques for solving nonlinear network problems. It is then necessary to estimate their multipliers, and variable reduction techniques can be used to carry out the successive minimizations. This work analyzes the possibility of estimating the multipliers of the nonlinear constraints using Newton-like methods. Also, an algorithm is designed to solve nonlinear network problems with nonlinear inequality side constraints, which combines first- and superlinear-order multiplier methods. The computational performance of this method is compared with that of MINOS 5.5.
Scope and purpose

Real problems with the structure of a nonlinear network flow problem with nonlinear side constraints exist and have a high dimensionality (e.g. the short-term hydrothermal coordination of electric power generation). They often need to be solved and it is important to find the procedure that will solve them with the highest efficiency. In previous works this author has analyzed the viability of estimating the multipliers of the nonlinear constraints from the Kuhn–Tucker system in order to improve the inexpensive Hestenes–Powell update. However, the conclusion was that there is no significant advantage for multipliers estimated through the Kuhn–Tucker system. The purpose of this paper is mainly to use the convergence properties of the Newton-like methods in order to develop a new estimator of the nonlinear constraint multipliers that improves the efficiency and the robustness of this augmented Lagrangian method.
© 2003 Elsevier Ltd. All rights reserved.

Keywords: Nonlinear programming; Network optimization; Augmented Lagrangian methods; Superlinear-order methods
∗ Tel.: +34-946012653; fax: +34-944648500. E-mail address: [email protected] (E. Mijangos).
0305-0548/04/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. PII: S0305-0548(02)00156-9
1. Introduction

Many network flow problems have, in addition to the flow conservation constraints and the interval constraints on the flows, additional constraints, which are named side constraints. Consider the network problem with side constraints (NPSC)

   minimize    f(x)                                        (1)
   subject to  x ∈ F,                                      (2)
               c(x) = 0,                                   (3)
where

• The set F is

   F = {x ∈ R^n | Ax = b, 0 ≤ x ≤ x̄},

where the m × n matrix A is a node–arc incidence matrix, b is the production/demand m-vector, x are the flows on the arcs of the network represented by A, and x̄ are the capacity bounds imposed on the flows of each arc.
• The side constraints c(x) = 0 are defined by c : R^n → R^r, such that c = [c_1, ..., c_r]^t, where c_i(x) is nonlinear and twice continuously differentiable on the feasible set F for all i = 1, ..., r.
• f : R^n → R: f(x) is nonlinear and twice continuously differentiable on F.

Throughout this work the gradient of f at x is defined as the column vector ∇f(x), and the matrix ∇c(x) = [∇c_1(x), ..., ∇c_r(x)] is the transpose of the Jacobian of c at x, though here, for convenience, it will simply be called the Jacobian.

The idea is to solve NPSC by solving a series of approximate problems whose solutions "converge" on the solution of the original problem in a well-defined sense. In practice, it is only necessary to solve a finite number of approximate problems in order to reach an acceptable approximate solution to the original problem.

Real problems with this structure exist and have a high dimensionality. They often need to be solved and it is important to find the procedure that will solve them with the highest efficiency. The short-term hydroelectric power generation optimization problem [1,2] and the short-term hydrothermal coordination of electric power generation [3] are examples of this type.

One possible way to address problem NPSC would be to use projected Lagrangian methods [4]. Assume that x_k is the current iterate. This method would lead to the solution of a series of subproblems of the type

   minimize    φ_k(x)
   subject to  x ∈ F,
               ∇c(x_k)^t x = −c(x_k) + ∇c(x_k)^t x_k,

where φ_k(x) is either a quadratic approximation to the partial Lagrangian function about x_k, or a nonlinear function based on the partial augmented Lagrangian, as in the well-known MINOS 5.5 package (see [5]). This subproblem is a nonlinear network flow problem with linear side constraints,
which could be solved by specialized techniques of primal partitioning for network flow problems [6]. This possibility has already been explored [7].

In this work, NPSC is solved using partial augmented Lagrangian techniques, see [12]. They are based on the solution of a series of nonlinear network subproblems (NNS) of the form

   minimize_{x ∈ F}  L_ρ(x, μ),                            (4)

where ρ ∈ R, such that ρ > 0, and μ ∈ R^r are fixed, and L_ρ(x, μ) stands for the following augmented Lagrangian function:

   L_ρ(x, μ) = f(x) + μ^t c(x) + (ρ/2) c(x)^t c(x).        (5)
Thus, the only variables in NNS are the flows x on the arcs of the network represented by F (i.e. NNS is a pure nonlinear network flow problem), so the efficiency of network flow techniques can be exploited, as there exist well-known techniques for solving nonlinear network problems like NNS, which combine a data structure of the type proposed by Bradley et al. in [10] for the linear network flow problem, with the variation of the active set method with superbasic variables introduced by Murtagh and Saunders in [11]. Examples of the use and application of this strategy can be found in [8,9].

A rough draft of the method used in this paper to solve NPSC is

1. Solve NNS for ρ > 0 and μ fixed. Should the solution x̃ obtained by solving NNS be infeasible with respect to constraints (3), go to step 2. Should x̃ be feasible, the procedure ends.
2. Update the estimate μ of the Lagrange multipliers of constraints (3) and (if necessary) the penalty coefficient ρ; then return to step 1.

Since step 1 is efficiently solved, the key of this algorithm lies in step 2 and especially in the updating of μ. As is well known, in multiplier methods it is of paramount importance that the estimation of the multipliers be as accurate as possible; otherwise the convergence of the algorithm can be severely handicapped, as is shown in [12]. In practice, there are two first-order procedures for estimating μ: on the one hand, the method put forward by Hestenes [13] and Powell [14], symbolized by U_HP in this work (and in [15]),

   μ̃ = μ + ρ c(x̃),

and, on the other hand, the estimate obtained through the classical solution to the system of Kuhn–Tucker necessary conditions (symbolized by U_KT in [15]):

   ∇f(x̃) + ∇c(x̃)μ + A^t π + σ = 0                         (6)

using least squares, as is suggested in [4], where (μ, π, σ) are the unknowns.

A study of the viability of using these multiplier estimation techniques together with the solution of problem NNS (4) by Murtagh and Saunders's active set technique [11] was carried out in [16], using the method suggested by Hestenes and Powell, and in [15], by solving the Kuhn–Tucker necessary conditions using least squares. In this paper superlinear-order techniques (see [12,17]) are considered in order to improve the results that this author obtained over large-scale nonlinear network problems with side constraints using these first-order estimates, see [15,16].
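As an illustration of the quantities just introduced, the following sketch (not the author's Fortran code; the callables f, c, grad_f and jac_c are names assumed by this sketch to be supplied by the user) evaluates the partial augmented Lagrangian (5), its gradient, and the Hestenes–Powell update U_HP.

```python
import numpy as np

def augmented_lagrangian(x, mu, rho, f, c):
    # L_rho(x, mu) = f(x) + mu^t c(x) + (rho/2) c(x)^t c(x)   -- see (5)
    cx = c(x)
    return f(x) + mu @ cx + 0.5 * rho * (cx @ cx)

def augmented_lagrangian_grad(x, mu, rho, grad_f, c, jac_c):
    # grad_x L_rho(x, mu) = grad f(x) + Jc(x)^t [mu + rho c(x)];
    # jac_c(x) returns the r x n Jacobian of c, whose transpose is the paper's "Jacobian" grad c(x)
    return grad_f(x) + jac_c(x).T @ (mu + rho * c(x))

def update_hp(x_tilde, mu, rho, c):
    # first-order Hestenes-Powell estimate: mu_tilde = mu + rho * c(x_tilde)
    return mu + rho * c(x_tilde)
```

With these pieces, step 1 of the rough draft above amounts to minimizing augmented_lagrangian over F for fixed (μ, ρ), and step 2 to calling update_hp.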
Likewise, an algorithm is designed to solve NPSC, which combines first- and superlinear-order multiplier methods. Its implementation gives rise to the code PFNRN03. Computational results for PFNRN03 on test problems are compared with those obtained using first-order estimates and with those obtained by MINOS 5.5.

The paper is organized as follows. Section 2 analyzes the applicability of these techniques to solve problem NPSC when subproblem (4) is solved by Murtagh and Saunders's procedure [11]. Also, in Section 2, an algorithm combining superlinear- and first-order multiplier estimate techniques is put forward. Section 3 offers the algorithm for solving problem NPSC together with implementation details. Section 4 presents computational tests. Finally, Section 5 offers the conclusions. There is also an appendix in which a practical example is put forward.
2. Newton-like methods

In this section, the applicability of the inexact Newton-like methods in combination with variable reduction techniques is analyzed. We introduce the concept of "reduced Newton system" in order to obtain an efficient superlinear-order multiplier estimate, which is computed by means of techniques of numerical linear algebra. In these techniques two devices are used: the first one (activation test) allows us to save computational costs, and the second one (validation test) is used to check the validity of the multiplier estimate, in order to avoid convergence problems. Finally, we consider the case in which there exists a quasi-Newton approximation of the reduced Hessian of the augmented Lagrangian function.

Let us consider any update formula U of μ in the solution of NPSC by means of multiplier methods, where the subproblem to solve is (4). The first-order optimality conditions to be fulfilled by the optimizer (x^*, μ^*, π^*, σ^*) at the end of this sequential procedure, for all ρ > 0 large enough, give rise to the following system

   ∇_x L_ρ(x, μ) + A^t π + σ = 0;
(7)
c(x) = 0;
(8)
Ax = b;
(9)
where ∇_x L_ρ(x, μ) = ∇f(x) + ∇c(x)[μ + ρ c(x)].

Throughout this work we assume that (x^*, μ^*, π^*, σ^*) is an optimizer at the end of the sequential procedure satisfying the strict complementarity condition. Likewise, we assume that, given a pair (μ, ρ), ρ is greater than a certain threshold (see Lemma 1.25 in [12]) in order to make sure of the full rank of the reduced Hessian of the augmented Lagrangian in the subproblem NNS (see (4)–(5)).
Newton's method for solving the system consisting of (7)–(9) and the equations of the active variables at x̃ involves solving the linear system
(10)
where t̂ stands for the number of variables fixed at their bounds in the solution of subproblem (4) and (Δx, Δμ, Δπ, Δσ) is the Newton step used in the solution of system (7)–(9). Let ∇²L denote the matrix of coefficients of (10), which should be nonsingular. Let us assume that the matrix ∇²_xx L_ρ(x̃, μ) is nonsingular and A has full row rank; then, the necessary and sufficient condition for the matrix ∇²L to be nonsingular is that the matrix
(11)
has full row rank, where the second row section contains the matrix A partitioned into three submatrices: B_A and S_A, corresponding to the last basic and superbasic submatrices after using Murtagh and Saunders's method to solve subproblem NNS given by (4)–(5), and N_A being the columns of A corresponding to the t̂ active variables in x̃. The partition of ∇c(x̃)^t is the same as that of A. The variables associated with the columns N_A are precisely those variables whose bound is predicted to be active at the optimizer. However, it could happen that the submatrix
   B_S = [∇c_B(x̃)^t  ∇c_S(x̃)^t]

does not have full row rank, which could be due to one of two reasons:
• either ∇c(x̃)^t does not have full row rank though ∇c(x^*)^t does;
• or ∇c(x̃)^t has full row rank, but [∇c_B(x̃)^t  ∇c_S(x̃)^t] does not, because the prediction about the variables whose bound will be active at the optimizer is not totally correct.

Both cases can occur if x̃ is not sufficiently close to x^*.
2.1. Reduced Newton system

Let us assume that B_S has full row rank. In practice, one does not need to solve system (10). Let Z_C be the following matrix:

   Z_C = [ Z_A^t   0 ]
         [  0      I ],                                    (12)

where the unit matrix I has dimension r + m + t̂ and Z_A is the variable reduction matrix:

   Z_A = [ −B_A^{-1} S_A ]
         [       I       ]
         [       0       ].                                (13)
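A minimal sketch of the variable reduction matrix Z_A in (13) is given below; the partition into basic, superbasic and active (bound-fixed) columns is a hypothetical small example, not data from the paper. Any step of the form Z_A Δx_S satisfies A (Z_A Δx_S) = 0, so the network equations Ax = b are preserved.

```python
import numpy as np

def reduction_matrix(A, basic, superbasic):
    # Z_A has rows ordered like the columns of A: -B^{-1}S on the basic rows,
    # the identity on the superbasic rows, and zeros on the active (bound-fixed) rows.
    B = A[:, basic]                      # m x m basis (a spanning tree for network constraints)
    S = A[:, superbasic]                 # m x s superbasic columns
    n, s = A.shape[1], len(superbasic)
    Z = np.zeros((n, s))
    Z[basic, :] = -np.linalg.solve(B, S)
    Z[superbasic, :] = np.eye(s)
    return Z

# tiny illustrative data (hypothetical): 2 equality constraints, 4 variables,
# columns 0 and 2 basic, column 1 superbasic, column 3 fixed at a bound
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0, 1.0]])
Z_A = reduction_matrix(A, basic=[0, 2], superbasic=[1])
assert np.allclose(A @ Z_A, 0.0)
```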
Premultiplying both sides of (10) by Z_C and, since there is one and only one vector Δx_S such that

   Δx = Z_A Δx_S,
(14)
using this last expression we are led from system (10) to the reduced Newton system:
   [ H     N ] [ Δx_S ]   [ −h     ]
   [ N^t   0 ] [ Δμ   ] = [ −c(x̃) ],                      (15)

where H = Z_A^t ∇²_xx L_ρ(x̃, μ) Z_A, N = Z_A^t ∇c(x̃) and h = Z_A^t ∇_x L_ρ(x̃, μ).
Since ∇²L is nonsingular and Z_A has full column rank, the coefficient matrix of system (15) is also nonsingular, so this system has a single solution (Δx_S, Δμ).

2.2. Superlinear-order iteration

The procedure contributed below by the author makes it possible to switch from the first-order iteration to a superlinear-order iteration. The final aim of this combination is to increase the convergence rate as far as possible, at the same time keeping the accuracy of the multiplier estimate, see [12].

Let x^k = x(μ_k, ρ_k) be the solution of the subproblem NNS_k, i.e. NNS for the pair (μ_k, ρ_k) (see (4)–(5)).
Therefore, for x̃ = x^k and (μ, ρ) = (μ_k, ρ_k) the submatrices of the coefficient matrix and the right-hand side of system (15) are denoted as follows:

   H_k = Z_A^t ∇²_xx L_{ρ_k}(x^k, μ_k) Z_A,   N_k = Z_A^t ∇c(x^k),   h_k = Z_A^t ∇_x L_{ρ_k}(x^k, μ_k),

and, as a consequence of Section 2.1, the solution of (15) at x^k is given by the initial calculation of

   Δμ = (N_k^t H_k^{-1} N_k)^{-1} (c(x^k) − N_k^t H_k^{-1} h_k),

μ_{k+1} being obtained by means of μ_{k+1} = μ_k + Δμ; then h̄_k = Z_A^t ∇_x L_{ρ_k}(x^k, μ_{k+1}) is computed, and finally

   Δx_S = −H_k^{-1} h̄_k,                                  (16)
through which Δx is obtained using expression (14). This Δx may be used in the first iteration of the solution of the new NNS subproblem as a search direction, together with a suitable line search procedure.

H_k can be either the genuine projected Hessian of L_{ρ_k}(x, μ_k) at x^k (as above), or a quasi-Newton approximation of it. In fact, in this work we use that obtained by means of the update formula termed Broyden–Fletcher–Goldfarb–Shanno (BFGS), which is given by

   H̄ = H + (1/(h^t p_z)) h h^t + (1/(y_z^t p_z)) y_z y_z^t,

where x̄ = x + p, p = Z_A p_z, p_z being obtained by solving H p_z = −h, for h = Z_A^t ∇_x L_ρ(x, μ). Furthermore, y_z = Z_A^t (∇_x L_ρ(x̄, μ) − ∇_x L_ρ(x, μ)). This update formula preserves the symmetry, i.e. H̄ is symmetric if H is symmetric, and if y_z^t p_z > 0, it also maintains the positive definiteness. For more details see Sections 4.5.2 and 5.1.2.4 in [4].
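The update above can be written in a few lines; the sketch below (an illustration, not the PFNRN03 code) applies it to dense arrays and, as an assumption of this sketch, simply keeps the old approximation when the curvature condition y_z^t p_z > 0 fails, so that positive definiteness is preserved.

```python
import numpy as np

def bfgs_update(H, h, p_z, y_z):
    # H_bar = H + h h^t / (h^t p_z) + y_z y_z^t / (y_z^t p_z),
    # valid under H p_z = -h; symmetric, and positive definite whenever y_z^t p_z > 0
    if y_z @ p_z <= 0.0:
        return H          # safeguard chosen for this sketch: skip the update
    return H + np.outer(h, h) / (h @ p_z) + np.outer(y_z, y_z) / (y_z @ p_z)
```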
2.3. Quasi-Newton multiplier update

A quasi-Newton method (here BFGS) is normally used to solve subproblem NNS_k when the number of superbasic variables is no greater than a previously given number s̄, which is chosen so that the computational effort of computing the search direction in this way will be smaller than that of computing it using another efficient method (e.g. the truncated Newton method [18]). As a result of the above and of the numerical experiments performed by this author using alternately the BFGS method and the truncated Newton method, s̄ = 500 has been selected as the default value. If s > s̄, it is clear that we will not have this factored approximation, hence we will necessarily have to either compute the reduced Hessian of the augmented Lagrangian directly (if possible), or approximate it by means of finite differences. Therefore, the possibility of s > s̄ is an issue to take into account when implementing quasi-Newton techniques for the superlinear-order iteration. Nevertheless, we must not forget that there are other algorithmic choices that help to solve this problem, such as the L-BFGS method of Nocedal [19] and the partitioned quasi-Newton method of Griewank and Toint [20].

In this work a factored BFGS update H_k is used in the solution of subproblem NNS_k (see (4)), and it is reused to compute a superlinear-order multiplier estimate U_INW (see [12,21]).

2.4. Activation and validation tests

An issue that is interesting to consider is whether it is worth attempting to compute the superlinear-order iteration when the current point is not close enough to an optimizer, given the high computational cost of obtaining a projected Jacobian N_k having full column rank in the first step of Algorithm 1. In order to try to avoid the risk that the current projected Jacobian N_k does not have full column rank, one can control the closeness to (x^*, μ^*) by means of the following test:

• Activation test: T_nw0 := T_x and T_μ (i.e. T_nw0 is true if and only if T_x and T_μ are true). For a previously fixed tolerance ε_x,

   T_x := ε̄_k ≤ ε_x,

where ε̄_k is an optimality tolerance for subproblem NNS_k such that lim_{k→∞} ε̄_k = τ_opt, where τ_opt is the level of accuracy of the optimizer chosen by the user. (For further details about ε̄_k see its definition in Step 1 of Algorithm 2 in Section 3.) Moreover,

   T_μ := ‖μ_k − μ_{k−1}‖ < (1 + ‖μ_k‖) ε_μ,

where ε_μ is a previously fixed tolerance.

The activation test T_nw0 is used to know the closeness of the current point to x^*. If T_nw0 is false, the projected Jacobian N_k is not computed, so the superlinear-order iteration is not performed and μ_{k+1} is the default estimate, i.e. the first-order update U_HP.

Another issue to bear in mind is whether the estimate computed by means of superlinear-order methods is indeed acceptable or not. Following Gill and Murray [22], we have attempted to ensure that there is some measure of agreement between first- and second-order multiplier estimates. As a result of this idea, when solving the problem NPSC we use the following test:

• Validation test: This test,

   T_nw1 := |μ̄_i − μ̃_i| ≤ |μ̃_i| ε_nw1
for i = 1, ..., r, is based on the fact that the superlinear-order iteration μ̄ is rarely correct if it is very far from the first-order iteration μ̃. Hence, if T_nw1 is true, we set μ_{k+1} = μ̄; otherwise, we set μ_{k+1} = μ̃.

The selection of the tolerance values used by the author for these tests is based on his computational experience. They are: ε_x = 0.01, ε_μ = 0.1 and ε_nw1 = 1.0. In particular, ε_nw1 = 1.0, among other things, makes sure that μ̄_i and μ̃_i agree in sign. Moreover, we have computational evidence
of the importance of the validation test for the convergence of the algorithm that solves NPSC, as is shown in Section 4.

2.5. Computation of the new kind of multiplier estimate

The following algorithm computes a superlinear-order multiplier estimate of μ^*, for which we assume that a Cholesky factorization H_k = R_k^t R_k is available. Let D_k = N_k^t H_k^{-1} N_k and the first-order estimate of μ^* be

   U_HP(x^k, μ_k, ρ_k) = μ_k + ρ_k c(x^k).                 (17)
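Before stating the algorithm, the sketch below summarizes the computation it performs, as described in Sections 2.2–2.5: with the Cholesky factor of H_k it forms D_k, obtains Δμ from the reduced Newton system, and accepts the resulting estimate μ̄ only if the validation test of Section 2.4 holds. The function and variable names are choices of this sketch, and the safeguards of Algorithm 1 (e.g. the treatment of a rank-deficient N_k) are not reproduced.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def superlinear_multiplier_estimate(H, N, h, c_k, mu, rho, eps_nw1=1.0):
    # first-order estimate (17): mu_tilde = U_HP(x^k, mu_k, rho_k)
    mu_tilde = mu + rho * c_k
    chol = cho_factor(H)                 # H_k = R_k^t R_k, assumed positive definite
    HinvN = cho_solve(chol, N)           # H_k^{-1} N_k
    Hinvh = cho_solve(chol, h)           # H_k^{-1} h_k
    D = N.T @ HinvN                      # D_k = N_k^t H_k^{-1} N_k
    dmu = np.linalg.solve(D, c_k - N.T @ Hinvh)
    mu_bar = mu + dmu                    # superlinear-order estimate
    # validation test T_nw1: |mu_bar_i - mu_tilde_i| <= eps_nw1 |mu_tilde_i| for all i
    if np.all(np.abs(mu_bar - mu_tilde) <= eps_nw1 * np.abs(mu_tilde)):
        return mu_bar
    return mu_tilde
```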
Algorithm 1. Step 1: Calculate the
for i = 1, ..., r
is true (see Section 2.4) the algorithm finishes with the update μ_{k+1} = μ̄. Otherwise, it ends with μ_{k+1} = μ̃ (i.e. the first-order estimate).

3. Solution of the problem NPSC

An algorithm is given below to solve problem NPSC (see (1)–(3)). This algorithm uses the techniques outlined in Section 1 for the elimination of the side constraints (3), see [15]. The kth iteration of this algorithm is called the kth major iteration, in contrast to the minor iterations of the solution of subproblem NNS. As in Section 2.2, x^k = x(μ_k, ρ_k) is the solution to the subproblem NNS_k for the pair (μ_k, ρ_k). The violation of the jth nonlinear side constraint is denoted by V_j^k = c_j(x^k). Consequently V^k = c(x^k). The minimization procedure for NNS_k will be interrupted if the vector x^k = x(μ_k, ρ_k) satisfies a predetermined stopping criterion which becomes more rigorous as k increases, so that the minimization will be asymptotically exact, as indicated in Section 2.5 of [12].

Algorithm 2 (Multiplier method for NPSC).

Step 0: Initialization. Compute an initial feasible point x^0 for the subproblem NNS_1, and select μ_1, ρ_1, δ_1, η_1 appropriately. Also set V^0 = c(x^0), which is used only in the first iteration of step 4; set k to 1 and the tolerances τ_opt and ε_c to sufficiently small values.

Step 1: Solution of the nonlinear network subproblem. Subproblem NNS_k is solved by Murtagh and Saunders's active set technique [11] specialized for networks [8,9]. The subproblem is considered solved when the algorithm associated with that active set technique stops for an optimality tolerance ε_k, which is defined as ε_k = ε̄_k · r(π^k), where

   ε̄_k = max{δ̄_k, τ_opt},
   δ̄_k = min{δ_k, η_k ‖V^k‖},
   r(π^k) = max{1, ‖π^k‖_1/√m},

π^k being the multiplier vector associated with the constraints Ax = b at the local minimum x^k obtained for NNS_k.

Step 2: Feasibility of x^k with respect to the general constraints. If ε̄_k > τ_opt, set δ_{k+1} = α_1 δ_k and η_{k+1} = α_2 η_k, such that 0 < α_i < 1 for i = 1, 2; go to step 3. Otherwise, if ‖V^k‖_∞ < ε_c, go to step 7; otherwise go to step 3.

Step 3: Update of the multiplier estimate. The vector μ is updated following the expression

   μ_{k+1} = U_INW(x^k, μ_k, ρ_k)   if T_nw1 is true,
   μ_{k+1} = U_HP(x^k, μ_k, ρ_k)    otherwise,

where U_INW is the update formula of μ obtained by means of Algorithm 1 and T_nw1 is its validation test (step 6).
Step 4: Update of the penalty coefficient. If the optimizer of NNS_k, x^k, did not reduce V^k sufficiently, in the sense that there exists some j such that

   |V_j^k| > (1/γ̄) |V_j^{k−1}|,

where γ̄ is a predetermined tolerance, ρ_k is updated as ρ_{k+1} = β ρ_k, where β > 1.

Step 5: Set k = k + 1.

Step 6: Return to step 1 with x^{k−1} as the initial feasible point for the subproblem NNS_k.

Step 7: The algorithm is stopped. x^k satisfies the termination criteria and, hence, it is a good approximate solution of the problem NPSC.

In step 1, r(π^k) is a function used by MINOS 5.5. It gives a measure of the size of π^k with ‖π^k‖_1 = Σ_{i=1}^m |π_i^k|. This measure is included to make the optimality tests independent of a scale factor on the objective function. For more details see [5].

In step 3 of this algorithm we reuse a factored BFGS update H_k obtained in the solution of subproblem NNS_k to compute a superlinear-order multiplier estimate U_INW, see Section 2.3.

3.1. Implementation

The implementation in Fortran-77 of the previous algorithm, called PFNRN03, was designed to solve large-scale nonlinear network problems with nonlinear side constraints (i.e. NPSC problems, see (1)–(3)), where the general constraints c(x) = 0 are replaced with the two-sided inequality constraints c ≤ c(x) ≤ c̄, where c and c̄ are bounds on the vector function c(x). Hence, the problem NPSC is the particular case in which c = c̄ = 0. This extension of the U_INW procedure can be found in [23].

Step 1 of this algorithm is carried out by the code PFNL, corresponding to the algorithm described in [16], which is based on the specific procedures for nonlinear network flows [8,9] and Murtagh and Saunders's active set procedure [11] using a spanning tree as the basis matrix of the network constraints. In this code, the user can set the level of accuracy of the optimizer by means of the parameter τ_opt, whose default value is 10^−6. Note that {ε̄_k} → τ_opt as {δ_k} → 0, i.e. an asymptotically exact minimization is used, as Bertsekas suggests in [12,24].

For Step 2 the user can also set a level of feasibility accuracy for the side constraints by giving a value to the parameter ε_c. This value is 10^−5 by default. If the side constraints are of the form c ≤ c(x) ≤ c̄, we consider that the feasibility of x^k with respect to them is satisfied when

   ‖V^k‖_∞ ≤ ε_c max{1, ‖c(x^k)‖},

where V^k denotes the violation vector, such that

   V_i^k = c_i − c_i(x^k)    if c_i(x^k) < c_i,
   V_i^k = c_i(x^k) − c̄_i    if c_i(x^k) > c̄_i,
   V_i^k = 0                 otherwise.
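For two-sided side constraints c ≤ c(x) ≤ c̄, the violation vector and the tests of Steps 2 and 4 can be sketched as follows; this is a simplified illustration using the default tolerances quoted in the text, not the Fortran implementation.

```python
import numpy as np

def violation(cx, c_low, c_up):
    # V_i = c_low_i - c_i(x) if c_i(x) < c_low_i, c_i(x) - c_up_i if c_i(x) > c_up_i, 0 otherwise
    return np.maximum(c_low - cx, 0.0) + np.maximum(cx - c_up, 0.0)

def side_constraints_feasible(cx, c_low, c_up, eps_c=1e-5):
    # Step 2 feasibility test: ||V||_inf <= eps_c * max{1, ||c(x)||}
    V = violation(cx, c_low, c_up)
    return np.linalg.norm(V, np.inf) <= eps_c * max(1.0, np.linalg.norm(cx))

def update_penalty(V_new, V_old, rho, beta=2.0, gamma_bar=1.8):
    # Step 4: enlarge rho when some violation was not reduced by the factor 1/gamma_bar
    # (beta = 2.0 and gamma_bar = 0.9*beta are the defaults quoted in Section 3.1)
    if np.any(np.abs(V_new) > np.abs(V_old) / gamma_bar):
        return beta * rho
    return rho
```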
The values of δ_1 and η_1 can be modified, δ_1 = 0.01 and η_1 = 0.25 being the defaults (α_1 = 0.5 and α_2 = 0.25 are fixed). Computational experience has shown that insisting on accurate minimization of NNS_k can be computationally wasteful. A more efficient scheme results if this minimization is terminated and the multiplier is updated as soon as the stopping criterion is satisfied, which becomes little by little more stringent after every multiplier iteration so that the minimization is asymptotically exact. This is the fundamental reason for selecting these values, together with the results of many numerical trials.

In relation to Step 3, the initial multiplier estimate vector μ_1 is the null vector by default, although it is worth using information on this vector when it is available, given the substantial improvement that it may bring to bear on the convergence of the algorithm, as was found by means of numerical experiments.

In Step 4 the default values are ρ_1 = 1.0 and β = 2.0. These values can also be modified by the user. However, ρ_1 must not be so large as to generate ill-conditioning from the first minimization. On the other hand, ρ_k must not be increased too slowly, at least in the initial iterations, so as to avoid a poor convergence rate. These default values were selected taking into account the author's computational experience. Likewise, γ̄ = 0.9β, as Bertsekas suggests in [12]. Furthermore, by default, the constraints are initially multiplied by scale factors that make the norm of ∇c_j(x^0) equal to 1, for j = 1, ..., r. The user may eliminate this scaling.

PFNRN03 exploits the sparsity of the Jacobian matrix and of the corresponding reduced Jacobian, which are stored and dealt with in sparse mode. The matrix H_k is, by default, a quasi-Newton approximation (with BFGS update) of the reduced Hessian of the augmented Lagrangian function at x^k. This method is only used when the number of superbasic variables is no greater than a previously given number s̄, which is chosen so that the computational effort of computing the search direction in this way will be smaller than that of computing it using another efficient method. In this implementation the alternative method to BFGS is the truncated Newton method [18] and s̄ = 500. If s > s̄, the first-order multiplier estimator is used, i.e. U_HP (see (17)). In this package there are other parameters that can also be modified by means of a specification file (see [25]).

Code PFNRN03 contains the solver PFNL for nonlinear network flow problems, which is used in Step 1 (subproblem solution) of Algorithm 2. Any other nonlinear network flow code could have been employed instead. Moreover, the PFNRN03 user can solve NPSC problems using alternatively the first-order multiplier estimator U_HP (see [16]), the first-order multiplier estimator U_KT (solving the Kuhn–Tucker system by least squares, see [15]), or the superlinear-order multiplier estimator U_INW.

The code PFNRN03 and its user's guide [25] can be accessed for academic purposes via anonymous ftp at ftp-eio.upc.es in directory pub/onl/codes/pfnrn. There is also a web site, http://picasso.lc.ehu.es/∼eugenio, where one can access these files and the author's publications. (Comments, suggestions, or descriptions of any trouble or abnormal situations experienced when using PFNRN03 reported to [email protected] will be appreciated.)

4. Computational results

In order to have an evaluation of code PFNRN03 with U_INW some computational tests are performed, which consist in solving nonlinear network flow problems with nonlinear side
constraints with this code and comparing the results with those obtained using U_HP and those obtained by MINOS 5.5. All these tests were performed on a Sun Sparc 10/41 workstation under UNIX.

Two types of problems are considered: real and artificial. The real problems solved are of Short-Term Hydrothermal Coordination of Electricity Generation [3,16]. They correspond to problems whose name starts with "P", and have two-sided inequality nonlinear side constraints. The objective function and the side constraints are strongly nonlinear, with a high computational cost. Problems created from DIMACS random network generators [26] are used as artificial problems. Their objective functions form a family built from the linear cost c_i assigned by the DIMACS network generators to the ith arc; the scalars a_1, a_2 and a_3 define the different functions of this family. For "e1" we have a_1 = a_2 = 10^−2 and a_3 = 10^−3; for "e2", a_1 = a_2 = 10^−2 and a_3 = 0.

Table 1 shows the characteristics of the problems resulting from these models. The names of the test problems are given in the "Problem" column. The "arcs" and "nodes" columns give respectively the number of arcs and nodes in the network, whilst "sc" and "je" give respectively the number of side constraints of each problem and the number of non-null entries in their Jacobian. Finally, "asc" and "nsb" give respectively the number of side constraints that are active and the number of arcs that are superbasic at the optimum (in relation to the subproblem NNS). According to this table the number of superbasic arcs is always lower than s̄ (see Section 3.1). Therefore, the BFGS method was used for all the tests performed and H_k was a factored approximation of the reduced Hessian of the augmented Lagrangian function at x^k.

Table 2 shows the compared results of the MINOS 5.5 package and of PFNRN03 for the updates U_HP and U_INW. Nevertheless, in spite of the heading U_INW, because of the activation and validation tests, in some major iterations U_HP was used as a multiplier estimator. For example, when solving problem "D15e2" the activation test is true only in the last major iteration, so U_INW is used only in this iteration. On the other hand, in the solution of "P3202", in the 4th major iteration of the algorithm, as we have |μ̄_1 − μ̃_1| = 4682385.748 and |μ̃_1| = 1.524, the validation test is false, so PFNRN uses U_HP to update μ. If this device is not used, PFNRN does not converge for this problem.

Under the heading "MIN" the column "t_M" shows the CPU seconds spent by MINOS to solve each problem. Moreover, under the headings U_HP and U_INW the columns "Mit", "mit", "ρ", and
Table 1
Test problems

Problem    arcs    nodes    sc    asc    je     nsb

D12e2      1524     360    180     5     183     75
D13e2      1524     360    360    10     364     89
D14e2      1524     360     36    31     538    151
D15e2      1524     360     36     3    5387    107
D16e2      1524     360     36    20      96     83
D21e2      5420    1200    120     2     135     30
D23e2      5420    1200    120    18     135    238
D31e1      4008     501      5     1      20     63
D31e2      4008     501      5     1      20     60
D32e1      4008     501     50     5     199     65
D32e2      4008     501     50     5     199     65
D41e2     12008    1501     15     9     182    172
D51e1     18000    3000     30     8     526     58

P3201      3204     721     36    12    3112     89
P3202      3204     721     36    29    3112     96
P3301      4005     901     45    12    3895     86
P4201      5187    1548     91    32    4443     65
P4203      5187    1548     91    10    4443     17
P4403      7581    2262    133    10    6501     20
"t" show respectively the number of major iterations, the number of minor iterations, the value of the penalty parameter ρ in the last major iteration, and the CPU seconds spent by PFNRN03 in the execution for each kind of U estimator. The last columns, "t_M/t_I" and "t_H/t_I", contain the ratios of the total CPU times spent using these procedures to solve each problem. These ratios are used as an indicator of the level of efficiency; for example, t_M/t_I indicates the extent to which PFNRN with the estimator U_INW is faster than MINOS at solving the corresponding test problem (PFNRN solves the problem "P3202" 10.6 times faster than MINOS). The number of minor iterations given for PFNRN does not include the iterations spent to obtain an initial feasible point, x^0, with respect to the set F (i.e., the network constraints).

Superscript a is used where MINOS aborted (often because it was unable to find a feasible point for its set of linear constraints, either in phase 1 or after the linearization of the nonlinear constraints). Superscript b indicates that MINOS achieves a slightly worse objective function value, f^*, than PFNRN. For example, in problem "P4201" PFNRN with U_INW obtains f^* = −185906.38 and with U_HP f^* = −185907.07, whereas MINOS obtains f^* = −185868.16. Superscript c indicates that MINOS achieves a slightly better objective function value than PFNRN, for an optimality tolerance of 10^−6 (the default value). For example, in problem "P3201" PFNRN with U_INW obtains f^* = −208126.33 and with U_HP f^* = −208176.81, whereas MINOS obtains f^* = −208460.23. However, if we set the optimality tolerance to 10^−8 for both codes, when solving the same problem we obtain with U_INW f^* = −208486.38 (t = 271.47) and with U_HP f^* = −208487.96 (t = 275.85), whereas MINOS obtains f^* = −208461.05 (t = 373.41).
Table 2
Computational results

           MIN        U_HP                                     U_INW
Problem    t_M        Mit    mit      ρ        t_H        Mit    mit      ρ        t_I      t_M/t_I   t_H/t_I

D12e2        2.4       37     891    2×10^7      1.1       18     867    16.0        1.0      2.4       1.1
D13e2        2.8       36    1231    2×10^7      2.3       18    1002    32.0        1.9      1.5       1.2
D14e2        a         43   10965    2×10^7     34.8      157    2684    32.0       14.4      a         2.4
D15e2       42.2        7     777    2.0          0.60      7     778     2.0        0.9     46.9       0.7
D16e2       18.0     1383    2405   64.0          7.0      12    1010    32.0        1.0     18.0       7.0
D21e2       12.8       38    1804    3×10^7       1.8      19    1701    32.0        2.2      5.8       0.8
D23e2      939.1       35    2534    1×10^7      16.7      18    2142    32.0       12.2     77.0       1.4
D31e1       37.5       13    1114    2×10^5       2.1      11    1022     3.0        2.0     18.7       1.0
D31e2       23.1       12    5425    6×10^4       6.0      10     424     9.0        1.3     17.8       4.6
D32e1        a         36    7719    3×10^4      15.8      15    4067     4.0        7.4      a         2.1
D32e2        a         36    5512    3×10^4       9.6      18    4105     8.0        6.4      a         1.5
D41e2      564.3       35    1566    8×10^3      18.4      12    1257   256.0       15.7     35.9       1.2
D51e1     2900.1        8    1495    8.0          8.3       7    1497     8.0        8.6    337.2       1.0
P3201      441.0^c      8   21149   32.0        155.9       8   21685    32.0      168.1      2.6       0.9
P3202      880.9^b      7    9714    4.0         74.7       7   10203     4.0       82.8     10.6       0.9
P3301      614.2^b     23   11661    0.8        106.8       7   10989     0.8       99.1      6.2       1.1
P4201      202.8^b      9   18296    2.0        124.5       9   18419     2.0      128.1      1.6       1.0
P4203      109.0^c      9    4268    8.0         26.1      10    3653     4.0       25.8      4.2       1.0
P4403      198.8^c      7    6282   64.0         67.6       8    6094    64.0       67.7      2.9       1.0
In Table 2 it should be noted that of the 16 test problems in which MINOS converges, in 14 the efficiency of PFNRN with respect to that of MINOS is higher than 2, and in 8 it is higher than 7. Note that for four problems the efficiency of the update U_INW in comparison with U_HP is higher than 2, and in the rest of the problems both estimators have a very similar efficiency. Hence, in spite of the theoretical arguments, the improvement in the results when using this new algorithm is not very significant. There are some reasons that help to explain this fact:

• Newton's method is not globally convergent, unless the starting point is close enough to the optimum. We must take into account that although we use a positive definite matrix to approximate the reduced Hessian of the augmented Lagrangian function, Algorithm 1 does not carry out a line search in Step 5 to compute a suitable stepsize that ensures global convergence. Therefore, our method only works reasonably well when the current iteration is close enough to the optimum.
• Furthermore, in order to make sure that the projected Jacobian has full rank, the algorithm employs the activation test, which allows one to use the superlinear-order update only if it is true. In our numerical experiments this has happened mostly when the execution was very advanced (i.e. when the current iteration was very close to the optimal solution), which reduced the theoretical effect
of this update. For example, at: "D21e2" after 1618 minor iterations and 7 major iterations, "D31e1" after 867 minor iterations and 2 major iterations, "D41e2" after 1132 minor iterations and 5 major iterations, "D51e1" after 1489 minor iterations and 6 major iterations, "P4403" after 4255 minor iterations and 5 major iterations.
• The objective function and the side constraints of the "P" problems are strongly nonlinear, which makes the superlinear-order update work well only when the current iteration is very close to the optimum. For example, in problem "P3202" the superlinear-order update starts to work at the 3rd major iteration, but it does not give a very good estimation of the multipliers (though the validation test is true), which leads to the use of the default multiplier update, U_HP, again in the following major iteration.
• The cost of computing the superlinear-order multiplier update is considerably higher than that of the default update. For example, in problem "D13e2" 0.5 of the 1.9 CPU seconds used to solve it are employed to compute the superlinear-order update, whereas that of the default update is negligible.

Finally, the result of this comparison points out that the estimator given by Algorithm 1 (which combines the superlinear-order iteration with the first-order iteration) is more robust, as it does not need to increase the penalty parameter so much to converge. U_INW, in comparison with U_HP, reduces the number of minor iterations in 14 of the 19 tests. Consequently, this method helps to improve the convergence of Algorithm 2. See the results for "D14e2" or "D31e2" in Table 2.
5. Conclusions

In this paper the author has put forward techniques of numerical linear algebra for implementing superlinear-order multiplier estimations with the least computational effort. The activation test is aimed at avoiding wasting this effort when the current point is not sufficiently close to the optimum, and the validation test is useful to give some indication of the accuracy of the superlinear-order estimate. Moreover, the factored BFGS method makes it possible to reduce this effort and ensures the positive definiteness of the reduced Hessian of the augmented Lagrangian for any penalty parameter.

An algorithm is designed that allows us to solve nonlinear problems with network constraints and nonlinear side constraints. This algorithm uses variable reduction techniques together with a combination of superlinear- and first-order methods to estimate the multipliers of the side constraints. This procedure exploits the existing efficient techniques for solving nonlinear network problems.

An implementation in Fortran-77 of the designed algorithm is put forward. As a result the code PFNRN03 is obtained. According to the preliminary computational results, PFNRN03 was more efficient than MINOS 5.5 in solving these nonlinear network flow problems with nonlinear side constraints, and slightly improved the results obtained with the first-order multiplier estimator. Another important practical result of performing the multiplier estimation by means of superlinear-order updates is the reduction of the difficulties associated with ill-conditioning, as it becomes unnecessary to increase the penalty parameter to the same extent as when the first-order update is used, and this makes the
code with the combined estimate of U_INW and U_HP more robust than with U_HP alone. Therefore, it seems that it is worth using the superlinear-order estimator when solving nonlinearly constrained network flow problems.

The developed code PFNRN03 for the solution of problem NPSC is publicly available (see Section 3.1).

Acknowledgements

This research was partially supported by CICYT grant TAP1999-1075-C0201.

Appendix A. Practical example of convergence throughout the successive major iterations

The solution of a very simple example is presented below. Let us consider a 4-node and 5-arc network as shown in Fig. 1. Eighteen units must be transported from node N1 to node N4, through the arcs a1 to a5. Let x_i be the flow on each arc, for i = 1, ..., 5. We will assume that the capacity of all the arcs is 10 units (x_i ≤ 10, for i = 1, ..., 5). Two different kinds of constraint functions will be considered: linear and nonlinear side constraints. The flow balance equations at each node are the linear constraints, and the nonlinear side constraints are

   x_1^2 + 2x_3^2 = 88,
   304 ≤ 3x_1^2 + 4x_2^2 ≤ 500.

The nonlinear objective function is the quadratic function

   f(x_1, x_2, x_3, x_4, x_5) = Σ_{i=1}^{5} x_i^2.
The optimum is x_1^* = 4, x_2^* = 8, x_3^* = 6, x_4^* = 4, x_5^* = 6.

Fig. 1. Network example problem.
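The stated solution can be checked numerically. In the sketch below the arc orientations (a1: N1→N2, a2: N1→N4, a3: N1→N3, a4: N2→N4, a5: N3→N4) are an assumption inferred from Fig. 1 and from the flow balance at the stated optimum, and the node–arc incidence matrix is built accordingly.

```python
import numpy as np

x = np.array([4.0, 8.0, 6.0, 4.0, 6.0])      # stated optimum (x1, ..., x5)

# node-arc incidence matrix (rows N1..N4, columns a1..a5) and supply/demand vector
A = np.array([[ 1,  1,  1,  0,  0],
              [-1,  0,  0,  1,  0],
              [ 0,  0, -1,  0,  1],
              [ 0, -1,  0, -1, -1]], dtype=float)
b = np.array([18.0, 0.0, 0.0, -18.0])

assert np.allclose(A @ x, b)                             # flow conservation
assert np.all((0.0 <= x) & (x <= 10.0))                  # capacity bounds
assert np.isclose(x[0]**2 + 2.0 * x[2]**2, 88.0)         # equality side constraint
assert np.isclose(3.0 * x[0]**2 + 4.0 * x[1]**2, 304.0)  # inequality active at its lower bound

print("f(x*) =", float(np.sum(x**2)))                    # objective value at the stated optimum: 168
```

Under this arc orientation, the first-order optimality conditions at this point yield side-constraint multipliers −5/7 ≈ −0.71429 and −1/7 ≈ −0.14286, which agree with the values toward which the U_HP estimates in Table 3 converge.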
Table 3
Multiplier estimates of the active side constraints at the end of each major iteration for the example problem

Mit   mit   nsb   ε̄_k         elm_k        μ_1^HP      μ_1^INW     μ_2^HP      μ_2^INW

1      10    2    10^−2        0.38         −0.67536                −0.12932
2      20    2    5×10^−3      1.5×10^−2    −0.71288    −0.71241    −0.14255    −0.14383
3      25    2    3.1×10^−4    5.9×10^−4    −0.71425    −0.71427    −0.14285    −0.14383
4      28    2    4.1×10^−6    2.2×10^−4    −0.71429    −0.71429    −0.14286    −0.14383
5*     30    2    10^−6                                 −0.71429                −0.14383
Table 3 shows the two different multiplier estimates for each of the components i = 1, 2 of μ obtained in each major iteration (Mit = 1, ..., 4) of the solution of the problem "example". 5* points out that the optimum x^* has been obtained in this major iteration. Column "mit" shows the number of minor iterations accumulated up to the corresponding major iteration. Column "nsb" contains the number of superbasic variables at the optimizer of the subproblems NNS. Columns ε̄_k and elm_k show respectively the values of ε̄_k and of the ratio

   elm_k = ‖μ^k − μ^{k−1}‖/(1 + ‖μ^k‖),

for each kth major iteration in the activation test for ε_x = 0.01 and ε_μ = 0.1. The multiplier estimate employed (U_INW or U_HP) in each major iteration appears in boldface. As can be noted, at the first major iteration the logical variable T_μ in the activation test is false, as elm_1 = 0.38 > ε_μ = 0.1, i.e. x^1 is not sufficiently close to x^* = x^5 to use the superlinear-order multiplier estimator. If we set ε_μ = 1.0, PFNRN needs 11 major iterations to obtain the optimum.

References

[1] Brännlund H, Sjelvgren D, Bubenko JA. Short term generation scheduling with security constraints. IEEE Transactions on Power Systems 1988;3(1):310–6.
[2] Wang SJ, Shahidehpour SM, Kirschen DS, Mokhtari S, Irisarri GD. Short-term generation scheduling with transmission and environment constraints using an augmented Lagrangian relaxation. IEEE/PES Summer Meeting, San Francisco, CA, USA, 94 SM 572-8 PWRS, 1994.
[3] Heredia FJ, Nabona N. Optimum short-term hydrothermal scheduling with spinning reserve through network flows. IEEE Transactions on Power Systems 1995;10(3):1642–51.
[4] Gill PE, Murray W, Wright MH. Practical optimization. New York: Academic Press, 1981.
[5] Murtagh BA, Saunders MA. MINOS 5.5 user's guide. Report SOL 83-20R, Department of Operations Research, Stanford University, Stanford, CA, USA, revised July 1998.
[6] Kennington JL, Helgason RV. Algorithms for network programming. New York: Wiley, 1980.
[7] Heredia FJ, Nabona N. Large scale nonlinear network optimization with linear side constraints. EURO XI, European Congress on Operational Research, Aachen, Germany, July 1991.
[8] Dembo RS. A primal truncated Newton algorithm with application to large-scale nonlinear network optimization. Mathematical Programming Studies 1987;31:43–71.
[9] Toint PhL, Tuyttens D. On large scale nonlinear network optimization. Mathematical Programming 1990;48:125–59.
[10] Bradley GH, Brown GG, Graves GW. Design and implementation of large scale primal transshipment algorithms. Management Science 1977;24(1):1–34.
[11] Murtagh BA, Saunders MA. Large-scale linearly constrained optimization. Mathematical Programming 1978;14:41–72.
[12] Bertsekas DP. Constrained optimization and Lagrange multiplier methods. New York: Academic Press, 1982.
[13] Hestenes MR. Multiplier and gradient methods. Journal of Optimization Theory and Applications 1969;4:303–20.
[14] Powell MJD. A method for nonlinear constraints in minimization problems. In: Fletcher R, editor. Optimization. New York: Academic Press, 1969. p. 283–98.
[15] Mijangos E, Nabona N. On the first-order estimation of multipliers from Kuhn–Tucker systems. Computers and Operations Research 2001;28(3):243–70.
[16] Mijangos E, Nabona N. The application of the multipliers method in nonlinear network flows with side constraints. Technical Report 96/10, Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, Barcelona, Spain, 1996.
[17] Tapia RA. Diagonalized multiplier methods and quasi-Newton methods for constrained optimization. Journal of Optimization Theory and Applications 1977;22(2):135–94.
[18] Dembo RS, Eisenstat SC, Steihaug T. Inexact Newton methods. SIAM Journal on Numerical Analysis 1982;19:400–8.
[19] Nocedal J. Updating quasi-Newton matrices with limited storage. Mathematics of Computation 1980;35:773–82.
[20] Griewank A, Toint PhL. Partitioned variable metric updates for large structured optimization problems. Numerische Mathematik 1982;39:119–37.
[21] Fletcher R. An ideal penalty function for constrained optimization. In: Mangasarian OL, Meyer RR, Robinson SM, editors. Nonlinear programming 2. London: Academic Press, 1975.
[22] Gill PE, Murray W. The computation of Lagrange-multiplier estimates for constrained minimization. Mathematical Programming 1979;17:32–60.
[23] Mijangos E. On superlinear multiplier update methods for partial augmented Lagrangian techniques. Technical Report 00/01, Department of Applied Mathematics and Statistics and Operations Research, Euskal Herriko Unibertsitatea, P.O. Box 644, 48080 Bilbao, Spain, 2000.
[24] Bertsekas DP. Network optimization: continuous and discrete models. Belmont, MA: Athena Scientific, 1998.
[25] Mijangos E. PFNRN03 user's guide. Technical Report 97/05, Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, Barcelona, Spain, 1997.
[26] DIMACS. The first DIMACS international algorithm implementation challenge: the benchmark experiments. Technical Report, DIMACS, New Brunswick, NJ, USA, 1991.
[27] Heredia FJ. Optimització de Fluxos No Lineals en Xarxes amb Constriccions a Banda. Aplicació a Models Acoblats de Coordinació Hidro-Tèrmica a Curt Termini. PhD thesis, Statistics and Operations Research Department, Universitat Politècnica de Catalunya, Barcelona, Spain, 1995.
Eugenio Mijangos is an Associate Professor of Mathematics in the Department of Applied Mathematics and Statistics and Operations Research at Euskal Herriko Unibertsitatea (UPV/EHU), Leioa, Bizkaia, Spain. He received his degree in Science from UPV/EHU and his PhD in Science from Universitat Politècnica de Catalunya, Barcelona, Spain. His research interests are in the field of continuous optimization.