European Journal of Operational Research 159 (2004) 35–51 www.elsevier.com/locate/dsw
Continuous Optimization
A modified augmented Lagrangian method for a class of monotone variational inequalities

Bing-sheng He a,*,1, Hai Yang b, Chen-song Zhang c

a Graduate School of Management Science and Engineering, Department of Mathematics, Nanjing University, Nanjing 210093, PR China
b Department of Civil Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, PR China
c Department of Mathematics, Nanjing University, Nanjing 210093, PR China

Received 13 March 2002; accepted 24 February 2003
Available online 24 November 2003
Abstract

The augmented Lagrangian method is an attractive approach for solving linearly constrained convex optimization problems (CCOP). The method is iterative, and the task in each iteration is to solve a (convex optimization) sub-problem whose dimension is equal to the number of variables. For large problems, solving such a sub-problem can still be very expensive. In this paper, we propose a modified augmented Lagrangian method for a class of monotone variational inequalities which includes CCOP. The computational load in each iteration of the proposed method is quite small. To demonstrate the ability and efficiency of the proposed method, we present some numerical results for convex nonlinear programming and traffic network equilibrium problems.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Nonlinear programming; Augmented Lagrangian method; Variational inequality
1. Introduction

Let S be a nonempty closed convex subset of R^n and let f be a continuous mapping from R^n into itself. Consider the variational inequality problem VI(S, f): Find x* ∈ S such that

    (x − x*)^T f(x*) ≥ 0,  ∀x ∈ S.    (1.1)

In many practical problems, the set S has the following structure:

    S = {x ∈ R^n | Ax = b, x ∈ X},    (1.2)

---
* Corresponding author. E-mail address: [email protected] (B.-s. He).
1 The author was supported by NSFC grant 10271054, MOEC grant 20020284027 and Jiangsu NSF grant BK2002075.

0377-2217/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0377-2217(03)00385-0
where A ∈ R^{m×n}, b ∈ R^m, and X is a closed convex subset of R^n; for example, X may be the non-negative orthant {x ∈ R^n | x ≥ 0} or the 'box' {x ∈ R^n | l ≤ x ≤ h}. If f(x) is the gradient of a differentiable function h : R^n → R, then problem (1.1) is equivalent to the minimization problem min{h(x) | x ∈ S}. When f is differentiable, a necessary and sufficient condition for f to be such a gradient is that the Jacobian matrix ∇f(x) is symmetric. Unfortunately, this symmetry condition does not hold in many practical equilibrium problems. In this paper, we consider problems for which the symmetry condition is not necessary. Since VI(S, f) is clearly equivalent to VI(S, sf) for any s > 0, by attaching a Lagrange multiplier vector y ∈ R^m to the linear constraint Ax = b, the problem under consideration can be expressed as a variational inequality in R^{m+n} of the following form: Find u* ∈ Ω such that

    (u − u*)^T F_s(u*) ≥ 0,  ∀u ∈ Ω,    (1.3)

where

    u = ( x ; y ),  F_s(u) = ( s f(x) − A^T y ; Ax − b ),  Ω = X × R^m.    (1.4)
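The mapping F_s in (1.4) is easy to assemble from f, A and b. The sketch below does so in NumPy; the data M, q, A, b are assumed purely for illustration (f(x) = Mx + q with M positive semidefinite, hence monotone). Because the multiplier coupling in F_s is skew-symmetric, F_s inherits monotonicity from f, which the last lines check numerically.

```python
import numpy as np

def F_s(u, f, A, b, s=1.0):
    """F_s(u) from (1.4): for u = (x, y), F_s(u) = (s f(x) - A^T y ; A x - b)."""
    n = A.shape[1]
    x, y = u[:n], u[n:]
    return np.concatenate([s * f(x) - A.T @ y, A @ x - b])

# toy data, assumed for illustration only
M = np.array([[2.0, 0.0], [0.0, 1.0]])
q = np.array([-1.0, 0.5])
f = lambda x: M @ x + q
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

u1 = np.array([0.3, 0.7, 0.0])
u2 = np.array([0.9, 0.1, 0.5])
# monotonicity: (u1 - u2)^T (F_s(u1) - F_s(u2)) >= 0
gap = (u1 - u2) @ (F_s(u1, f, A, b) - F_s(u2, f, A, b))
assert gap >= 0.0
```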
We denote problem (1.3)-(1.4) by VI(Ω, F_s). Note that F_s(u) is monotone whenever f(x) is monotone. When the problem is a large-scale one, it is often solved by alternating direction methods. An alternating direction method, proposed originally by Uzawa [1], is used frequently in the literature [7,9,11,18]. At each iteration of the classical alternating direction method, the new iterate ũ = (x̃, ỹ) ∈ X × R^m is generated from a given u = (x, y) ∈ X × R^m by the following procedure. First, x̃ is obtained (with y held fixed) by solving the sub-variational inequality

    x̃ ∈ X,  (x′ − x̃)^T (s f(x̃) − A^T[y − β(Ax̃ − b)]) ≥ 0,  ∀x′ ∈ X,    (1.5)

and then the multiplier is updated by

    ỹ = y − β(Ax̃ − b),    (1.6)

where β > 0 is a given constant. This method is referred to as an augmented Lagrangian method in the literature [6,7]. However, for large-scale problems, solving sub-problem (1.5) is still expensive. Our objective in this paper is to develop a modified augmented Lagrangian method in which the computational amount in each iteration is very small. For given u^k = (x^k, y^k) ∈ Ω, we get an auxiliary point ũ^k = (x̃^k, ỹ^k) by

    x̃^k = P_X{x^k − β_k(s f(x^k) − A^T[y^k − β_k(Ax^k − b)])}    (1.7)

and then

    ỹ^k = y^k − β_k(Ax̃^k − b).    (1.8)

Here β_k is the penalty parameter for the restriction Ax = b, and s can be viewed as a scaling parameter. The new iterate u^{k+1} is given by

    u^{k+1} = P_Ω[u^k − α_k β_k F_s(ũ^k)].    (1.9)

The convergence of the proposed augmented Lagrangian method is guaranteed by choosing suitable sequences {β_k} and {α_k}.

The paper is organized as follows. In Section 2, we summarize some preliminaries for the considered problem. In Section 3, we state our modified augmented Lagrangian algorithm and prove its main properties. In Section 4, we establish the convergence of the proposed method. In Section 5, we give implementation details of the proposed method and use some examples to test its efficiency. We apply our method to traffic network equilibrium problems in Section 6. Finally, we give some concluding remarks.

A few words about our notation are in order. All vectors are column vectors, with superscript T denoting transpose. The notation R^n is used for the real n-dimensional space. Superscripts such as in u^k refer to specific vectors and are usually iteration indices. The Euclidean norm is denoted by ‖·‖.
2. Preliminaries

In this section, we summarize some basic properties and related definitions which will be used in the following sections. We use the projection under the Euclidean norm, denoted by P_Ω(·), i.e.,

    P_Ω(v) = argmin{‖v − u‖ | u ∈ Ω}.

A basic property of the projection mapping is

    (v − P_Ω(v))^T (P_Ω(v) − u) ≥ 0,  ∀v ∈ R^n, ∀u ∈ Ω.    (2.1)

It follows from (2.1) that

    ‖P_Ω(v) − u‖² ≤ ‖v − u‖² − ‖v − P_Ω(v)‖²,  ∀v ∈ R^n, ∀u ∈ Ω,    (2.2)

and

    ‖P_Ω(v) − P_Ω(w)‖ ≤ ‖v − w‖,  ∀v, w ∈ R^n.    (2.3)
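The three properties (2.1)-(2.3) are easy to verify numerically for a concrete set; the sketch below uses a box in R^5 (an assumed example, where the projection is a componentwise clip) and checks each inequality at random points.

```python
import numpy as np

def proj_box(v, lo, hi):
    """Euclidean projection onto the box {u : lo <= u <= hi}."""
    return np.clip(v, lo, hi)

rng = np.random.default_rng(0)
lo, hi = -np.ones(5), np.ones(5)
v = 3.0 * rng.normal(size=5)
w = 3.0 * rng.normal(size=5)
u = proj_box(3.0 * rng.normal(size=5), lo, hi)   # an arbitrary point of the box

Pv, Pw = proj_box(v, lo, hi), proj_box(w, lo, hi)
# (2.1): basic variational property of the projection
assert (v - Pv) @ (Pv - u) >= -1e-12
# (2.2): ||P(v) - u||^2 <= ||v - u||^2 - ||v - P(v)||^2
assert np.sum((Pv - u) ** 2) <= np.sum((v - u) ** 2) - np.sum((v - Pv) ** 2) + 1e-12
# (2.3): the projection is nonexpansive
assert np.linalg.norm(Pv - Pw) <= np.linalg.norm(v - w) + 1e-12
```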
Lemma 1. Let β > 0. Then u* solves VI(Ω, F_s) if and only if

    u* = P_Ω[u* − β F_s(u*)].

Proof. See Eaves [5] or Bertsekas and Tsitsiklis [2, p. 267]. □

Hence, solving VI(Ω, F_s) is equivalent to finding a zero point of the residue function

    e(u, β) := u − P_Ω[u − β F_s(u)].    (2.4)

Usually, instead of e(u, 1), we write e(u) for u − P_Ω[u − F_s(u)]. In the papers [13-15] on solving VI(Ω, F), ‖e(u)‖ is often regarded as a measure of the discrepancy between the current iterate and the solution.

Lemma 2. For all u ∈ R^n, ‖e(u, β)‖ is non-decreasing with respect to β > 0. Moreover, for β̃ ≥ β > 0, it holds that

    ‖e(u, β̃)‖ / β̃ ≤ ‖e(u, β)‖ / β.    (2.5)

Proof. See Gafni and Bertsekas [10, Lemma 1], Calamai and Moré [3, Lemma 2.2] or Peng and Fukushima [20, Lemma 3.3]. □
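Lemma 2 can be illustrated on a one-dimensional example, assumed for illustration: with the feasible set [0, ∞) and F(u) = u + 1, the residue norm ‖e(u, β)‖ is nondecreasing in β while the ratio ‖e(u, β)‖/β is nonincreasing, which is exactly (2.5).

```python
# 1-D illustration of Lemma 2: set = [0, inf), F(u) = u + 1
F = lambda u: u + 1.0
e = lambda u, beta: u - max(u - beta * F(u), 0.0)

u = 1.0
betas = [0.25, 0.5, 1.0, 2.0, 4.0]
vals = [abs(e(u, b)) for b in betas]
# ||e(u, beta)|| is nondecreasing in beta ...
assert all(v1 <= v2 + 1e-12 for v1, v2 in zip(vals, vals[1:]))
# ... while ||e(u, beta)|| / beta is nonincreasing, i.e. (2.5)
ratios = [v / b for v, b in zip(vals, betas)]
assert all(r1 >= r2 - 1e-12 for r1, r2 in zip(ratios, ratios[1:]))
```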
For problem (1.3)-(1.4) of this paper, we have

    e(u, β) = ( x − P_X[x − β(s f(x) − A^T y)] ; β(Ax − b) ).    (2.6)

Now we make the following standard assumptions.

Assumption A.1. X is a nonempty closed convex set; f(x) is monotone with respect to X and Lipschitz continuous.

Assumption A.2. The solution set of VI(Ω, F), denoted by Ω*, is nonempty.

Because f is monotone and Ω is closed and convex, the solution set Ω* of VI(Ω, F) is closed and convex. It is trivial to see that the solution set of VI(Ω, F_s) is Ω* too. For any u ∈ Ω, we let

    dist(u, Ω*) := min{‖u − u*‖ | u* ∈ Ω*}

denote the Euclidean distance from u to Ω*. It is clear that dist(u, Ω*) = 0 ⟺ e(u) = 0.

3. The modified method and its main properties

In the original augmented Lagrangian method, the main task of each iteration is to solve the sub-variational inequality (1.5). For given (x^k, y^k) ∈ Ω, by the equivalence between a variational inequality and the related projection equation, solving (1.5) is equivalent to finding an x̃^k which satisfies

    x̃^k = P_X{x̃^k − β_k(s f(x̃^k) − A^T[y^k − β_k(Ax̃^k − b)])}.    (3.1)
It is not trivial to obtain such an x̃^k because x̃^k occurs on both sides of Eq. (3.1). In order to simplify this process, a natural idea is to substitute x^k for x̃^k on the right-hand side of (3.1), i.e., to take

    x̃^k = P_X{x^k − β_k(s f(x^k) − A^T[y^k − β_k(Ax^k − b)])}.

Notice that this is just formula (1.7); the proposed method is therefore called a modified augmented Lagrangian method. We now consider the following framework of the modified augmented Lagrangian method. A variant of this algorithm that can be implemented will be described in Section 5.

Algorithm 3.1 (A framework of the modified augmented Lagrangian method). Given ε > 0, s > 0 and ν ∈ (0, 1), start with an arbitrary u^0 = (x^0, y^0) ∈ Ω. For k ≥ 0 do:

Step 1 (Prediction). Find β_k > 0 such that

    x̃^k = P_X{x^k − β_k(s f(x^k) − A^T[y^k − β_k(Ax^k − b)])},    (3.2)

    ỹ^k = y^k − β_k(Ax̃^k − b),    (3.3)

satisfy

    β_k ‖s(f(x^k) − f(x̃^k)) + β_k A^T A(x^k − x̃^k)‖ ≤ ν ‖u^k − ũ^k‖.    (3.4)

Step 2 (Correction). Update u^{k+1} = (x^{k+1}, y^{k+1}) via

    u^{k+1} = P_Ω{u^k − α_k β_k F_s(ũ^k)},    (3.5)

where

    α_k = γ (u^k − ũ^k)^T d(u^k, ũ^k, β_k) / ‖d(u^k, ũ^k, β_k)‖²,  γ ∈ (0, 2),    (3.6)

    d(u^k, ũ^k, β_k) := u^k − ũ^k − β_k[q(u^k, ũ^k, β_k) − F_s(ũ^k)]    (3.7)

and

    q(u^k, ũ^k, β_k) = ( s f(x^k) − A^T[y^k − β_k(Ax^k − b)] ; Ax̃^k − b ).    (3.8)

Step 3 (Stopping criterion). If ‖u^k − ũ^k‖ ≤ ε, stop; otherwise set k := k + 1 and go to Step 1.

Remark 1. From the method, one can easily see that the computational amounts in Step 1 and Step 2 are very small. Since f is Lipschitz continuous, it is not hard to find a β satisfying condition (3.4). In fact, if L is the Lipschitz constant, then for all

    β ∈ ( 0, (√(s²L² + 4ν‖A^T A‖) − sL) / (2‖A^T A‖) ),    (3.9)

we have βsL + β²‖A^T A‖ ≤ ν. Because

    β_k ‖s(f(x^k) − f(x̃^k)) + β_k A^T A(x^k − x̃^k)‖ ≤ (β_k sL + β_k² ‖A^T A‖) ‖x^k − x̃^k‖

and ‖x^k − x̃^k‖ ≤ ‖u^k − ũ^k‖, condition (3.4) is satisfied whenever (3.9) holds. The detailed technique for finding such a β will be illustrated in Section 5. It is clear that the method is also applicable in the case A = 0 and b = 0.

Lemma 3. If ũ^k = u^k, then (x^k, y^k) is a solution of VI(Ω, F_s).

Proof. From (3.2) and (3.3), it follows that

    x^k = P_X{x^k − β_k[s f(x^k) − A^T y^k]}  and  Ax^k = b.

This means that (x^k, y^k) is a solution of VI(Ω, F_s) of form (1.3)-(1.4). □
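By Remark 1, a constant β inside the interval (3.9) makes condition (3.4) hold automatically, so the framework of Algorithm 3.1 can be sketched with a fixed β. The toy problem below (minimize ‖x − (1,1)‖² over x₁ + x₂ = 1, x ≥ 0, i.e. f(x) = 2(x − (1,1))) is an assumption for illustration, with known KKT solution x* = (0.5, 0.5), y* = −1; here L = 2, ‖AᵀA‖ = 2 and s = 1, so β = 0.3 gives βsL + β²‖AᵀA‖ = 0.78 ≤ ν.

```python
import numpy as np

def modified_alm(f, A, b, x0, y0, s=1.0, beta=0.3, gamma=1.8,
                 tol=1e-9, itmax=20000):
    """Sketch of Algorithm 3.1 with a constant beta; X is the nonnegative
    orthant, so P_X is a componentwise max with 0."""
    proj = lambda z: np.maximum(z, 0.0)
    n = A.shape[1]
    x, y = x0.astype(float).copy(), y0.astype(float).copy()
    for _ in range(itmax):
        # prediction (3.2)-(3.3)
        g = s * f(x) - A.T @ (y - beta * (A @ x - b))
        xt = proj(x - beta * g)
        yt = y - beta * (A @ xt - b)
        du = np.concatenate([x - xt, y - yt])
        if np.linalg.norm(du) < tol:
            break
        # q(u, u~, beta) from (3.8) and F_s(u~) from (1.4)
        q_vec = np.concatenate([g, A @ xt - b])
        F_t = np.concatenate([s * f(xt) - A.T @ yt, A @ xt - b])
        d = du - beta * (q_vec - F_t)            # direction (3.7)
        alpha = gamma * (du @ d) / (d @ d)       # step length (3.6)
        # correction (3.5): on Omega = X x R^m the projection acts on x only
        u_new = np.concatenate([x, y]) - alpha * beta * F_t
        x, y = proj(u_new[:n]), u_new[n:]
    return x, y

# assumed toy problem: min ||x - (1,1)||^2  s.t.  x1 + x2 = 1, x >= 0
f = lambda x: 2.0 * (x - 1.0)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, y = modified_alm(f, A, b, np.zeros(2), np.zeros(1))
assert np.allclose(x, [0.5, 0.5], atol=1e-5)
assert np.allclose(y, [-1.0], atol=1e-5)
```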
Lemma 3 indicates that the proposed method terminates when ‖x^k − x̃^k‖² + ‖y^k − ỹ^k‖² = 0. Recalling that solving problem VI(Ω, F_s) is equivalent to finding a zero point of e(u, β), the following lemma implies that, in order to show the convergence of the proposed method, we need only verify that

    lim_{k→∞} (‖x^k − x̃^k‖² + ‖y^k − ỹ^k‖²) = 0.

Lemma 4. If inf{β_k} = β_min > 0, then there is a constant μ > 0 such that

    μ ‖e(ũ^k)‖² ≤ ‖u^k − ũ^k‖²,  ∀k ≥ 0.    (3.10)

Proof. Note that

    e(ũ^k, β_k) = ( x̃^k − P_X{x̃^k − β_k[s f(x̃^k) − A^T ỹ^k]} ; β_k(Ax̃^k − b) ).    (3.11)
Replacing the first x̃^k in (3.11) by P_X{x^k − β_k(s f(x^k) − A^T[y^k − β_k(Ax^k − b)])} and using ‖P_X(z) − P_X(z′)‖ ≤ ‖z − z′‖, we get

    ‖e(ũ^k, β_k)‖ ≤ ‖( (x^k − x̃^k) − β_k s(f(x^k) − f(x̃^k)) − β_k² A^T A(x^k − x̃^k) ; β_k(Ax̃^k − b) )‖
                  = ‖( x^k − x̃^k ; y^k − ỹ^k ) − ( β_k s(f(x^k) − f(x̃^k)) + β_k² A^T A(x^k − x̃^k) ; 0 )‖.

It follows from the monotonicity of f (the cross term in the squared norm is nonnegative) and from (3.4) (the subtracted vector has norm at most ν‖u^k − ũ^k‖ < ‖u^k − ũ^k‖) that

    ‖e(ũ^k, β_k)‖ ≤ √2 ‖u^k − ũ^k‖.    (3.12)

Notice that ‖e(u, β)‖ is nondecreasing with respect to β. According to Lemma 2 (see (2.5)) we have

    ‖e(ũ^k, β_k)‖ ≥ min{1, β_k} ‖e(ũ^k)‖.    (3.13)

Since inf{β_k} = β_min > 0, the assertion of this lemma follows directly from (3.12) and (3.13), with μ = min{1, β_min}²/2. □
4. The convergence analysis

For given u^k ∈ X × R^m, let ũ^k be the point generated by (3.2) and (3.3). It follows from (3.2), (3.3) and (3.8) that

    ũ^k = P_Ω[u^k − β_k q(u^k, ũ^k, β_k)].    (4.1)

For convenience in the sequel analysis, we write u, ũ and β for u^k, ũ^k and β_k, respectively. Note that

    q(u, ũ, β) − F_s(ũ) = ( s(f(x) − f(x̃)) + β A^T A(x − x̃) ; 0 ).    (4.2)

It follows from (3.7), (4.2) and (3.4) that

    (u − ũ)^T d(u, ũ, β) = ‖u − ũ‖² − β(u − ũ)^T (q(u, ũ, β) − F_s(ũ))
                         = ‖u − ũ‖² − {βs(x − x̃)^T (f(x) − f(x̃)) + ‖βA(x − x̃)‖²}
                         ≥ (1 − ν)‖u − ũ‖².    (4.3)

The new iterate of the modified augmented Lagrangian method can be written as

    u(α) := P_Ω[u − αβ F_s(ũ)].    (4.4)

Now let us observe the difference between ‖u − u*‖² and ‖u(α) − u*‖².

Theorem 1. For given u ∈ Ω, let u(α) be the new iterate generated by the proposed method. Then we have

    Θ(α) ≥ 2α(u − ũ)^T d(u, ũ, β) − α²‖d(u, ũ, β)‖²,    (4.5)

where

    Θ(α) := ‖u − u*‖² − ‖u(α) − u*‖²    (4.6)

and u* is any solution point of VI(Ω, F_s).

Proof. First, it follows from (2.2) and (4.4) that

    ‖u(α) − u*‖² ≤ ‖u − αβF_s(ũ) − u*‖² − ‖u − αβF_s(ũ) − u(α)‖².
Substituting this into (4.6) we obtain

    Θ(α) ≥ ‖u − u(α)‖² + 2αβ(u − u*)^T F_s(ũ) + 2αβ(u(α) − u)^T F_s(ũ).    (4.7)

Now we consider the second term on the right-hand side of the above inequality. It follows from the monotonicity of F_s that

    (ũ − u*)^T F_s(ũ) ≥ (ũ − u*)^T F_s(u*) ≥ 0,    (4.8)

where the second inequality of (4.8) follows from (1.3) and ũ ∈ Ω. Consequently, from (4.8) we have

    (u − u*)^T F_s(ũ) ≥ (u − ũ)^T F_s(ũ).    (4.9)

Substituting (4.9) into (4.7) we get

    Θ(α) ≥ ‖u − u(α)‖² + 2αβ(u − ũ)^T F_s(ũ) + 2αβ(u(α) − u)^T F_s(ũ)
         = ‖u − u(α)‖² − 2αβ(ũ − u(α))^T F_s(ũ).    (4.10)

From (4.10) we obtain

    Θ(α) ≥ ‖(u − u(α)) − α d(u, ũ, β)‖² + 2α(u − u(α))^T d(u, ũ, β) − α²‖d(u, ũ, β)‖² − 2αβ(ũ − u(α))^T F_s(ũ)
         ≥ ‖(u − u(α)) − α d(u, ũ, β)‖² + 2α(u − ũ)^T d(u, ũ, β) − α²‖d(u, ũ, β)‖²
           + 2α(ũ − u(α))^T {d(u, ũ, β) − βF_s(ũ)}.    (4.11)

Now we consider the last term on the right-hand side of (4.11). Notice that

    d(u, ũ, β) − βF_s(ũ) = u − βq(u, ũ, β) − ũ.    (4.12)

Since ũ = P_Ω[u − βq(u, ũ, β)], by taking v := u − βq(u, ũ, β) and u := u(α) ∈ Ω in the basic inequality (2.1) of the projection mapping, we get

    (ũ − u(α))^T (u − βq(u, ũ, β) − ũ) ≥ 0.    (4.13)

Therefore, from (4.11) we get

    Θ(α) ≥ 2α(u − ũ)^T d(u, ũ, β) − α²‖d(u, ũ, β)‖²

and the theorem is proved. □

Theorem 1 is the foundation for choosing a suitable α in (4.4). Notice that the right-hand side of (4.5) is a quadratic function of α which reaches its maximum at

    α* = (u − ũ)^T d(u, ũ, β) / ‖d(u, ũ, β)‖².    (4.14)

Thus from (4.5) and (4.14) we have

    Θ(α*) ≥ α* (u − ũ)^T d(u, ũ, β).    (4.15)

Let γ ∈ (0, 2) be a relaxation factor and α = γα*. Using (4.5), a simple manipulation gives

    Θ(γα*) ≥ γ(2 − γ) α* (u − ũ)^T d(u, ũ, β).    (4.16)

Notice that it follows from (4.3) that

    α* (u − ũ)^T d(u, ũ, β) ≥ (1 − ν) α* ‖u − ũ‖².
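The step (4.14)-(4.16) is nothing but the maximization of the quadratic lower bound in (4.5); writing c := (u − ũ)^T d(u, ũ, β) and D := ‖d(u, ũ, β)‖², the algebra reads:

```latex
q(\alpha) := 2\alpha c - \alpha^2 D,
\qquad q'(\alpha) = 2c - 2\alpha D = 0
\;\Longrightarrow\; \alpha^* = c/D,
\qquad q(\alpha^*) = c^2/D = \alpha^* c,
\qquad q(\gamma\alpha^*) = 2\gamma\,\frac{c^2}{D} - \gamma^2\,\frac{c^2}{D}
  = \gamma(2-\gamma)\,\alpha^* c .
```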
42
B.-s. He et al. / European Journal of Operational Research 159 (2004) 35–51
This and (4.16) imply

    Θ(γα*) ≥ γ(2 − γ)(1 − ν) α* ‖u − ũ‖²,  γ ∈ (0, 2).    (4.17)

It follows from (3.4) and (4.2) that

    ‖β(q(u, ũ, β) − F_s(ũ))‖ ≤ ν‖u − ũ‖,  ν ∈ (0, 1),

and consequently

    (u − ũ)^T d(u, ũ, β) = ‖u − ũ‖² − β(u − ũ)^T (q(u, ũ, β) − F_s(ũ)) ≥ (1 − ν)‖u − ũ‖².    (4.18)

Moreover,

    (u − ũ)^T d(u, ũ, β) = ‖u − ũ‖² − β(u − ũ)^T (q(u, ũ, β) − F_s(ũ))
                         > (1/2)‖u − ũ‖² − β(u − ũ)^T (q(u, ũ, β) − F_s(ũ)) + (1/2)‖β(q(u, ũ, β) − F_s(ũ))‖²
                         = (1/2)‖d(u, ũ, β)‖²,    (4.19)

and it follows from (4.14) that

    α* > 1/2.

Combining this with (4.17), we get

    Θ(γα*) > [γ(2 − γ)(1 − ν)/2] ‖u − ũ‖²,  γ ∈ (0, 2).    (4.20)

We summarize the analytical results of this section in the following theorem.

Theorem 2. For given u = (x, y) ∈ Ω, let β be chosen such that

    x̃ = P_X{x − β(s f(x) − A^T[y − β(Ax − b)])},  ỹ = y − β(Ax̃ − b)

satisfy

    ‖βs(f(x) − f(x̃)) + β² A^T A(x − x̃)‖ ≤ ν‖u − ũ‖,  ν ∈ (0, 1).

Let

    d(u, ũ, β) := ( x − x̃ ; y − ỹ ) − β( s(f(x) − f(x̃)) + βA^T A(x − x̃) ; 0 )

and

    α* = (u − ũ)^T d(u, ũ, β) / ‖d(u, ũ, β)‖².

Then the modified augmented Lagrangian method

    u(α) = P_Ω{u − αβF_s(ũ)}  with  α = γα*,  γ ∈ (0, 2),

produces a new iterate which satisfies

    ‖u(α) − u*‖² ≤ ‖u − u*‖² − [γ(2 − γ)(1 − ν)/2] ‖u − ũ‖²,  ∀u* ∈ Ω*.    (4.21)
Remark 2. Since γ(2 − γ)(1 − ν) is a positive constant, the last inequality shows that when ‖u − ũ‖, which can be viewed as a measure of the error, is not very small, the progress made in each iteration of the proposed method is not very small either. Consequently, we have the following global convergence theorem.

Theorem 3. The sequence {u^k} generated by Algorithm 3.1 converges to some u^∞ which is a solution point of VI(Ω, F_s).

Proof. From Theorem 2, we have

    ‖u^{k+1} − u*‖² ≤ ‖u^k − u*‖² − [γ(2 − γ)(1 − ν)/2] ‖u^k − ũ^k‖²,  ∀u* ∈ Ω*,    (4.22)

and consequently

    lim_{k→∞} ‖u^k − ũ^k‖ = 0.    (4.23)

Therefore, both {u^k} and {ũ^k} are bounded sequences and have cluster points. Moreover, it follows from Lemma 4 and (4.23) that

    lim_{k→∞} e(ũ^k) = 0.    (4.24)

Let u^∞ be a cluster point of {ũ^k} and let the subsequence {ũ^{k_j}} converge to u^∞. Since e(u) is a continuous function of u, it follows from (4.24) that

    e(u^∞) = lim_{j→∞} e(ũ^{k_j}) = 0.

According to Lemma 1, u^∞ is a solution point of VI(Ω, F_s). Note that inequality (4.22) is true for any solution point of VI(Ω, F_s); hence we have

    ‖u^{k+1} − u^∞‖² ≤ ‖u^k − u^∞‖²,  ∀k ≥ 0.    (4.25)

Since {ũ^{k_j}} → u^∞ and ‖u^k − ũ^k‖ → 0, for any given ε > 0 there is an l > 0 such that

    ‖ũ^{k_l} − u^∞‖ < ε/2  and  ‖u^{k_l} − ũ^{k_l}‖ < ε/2.    (4.26)

Therefore, for any k ≥ k_l, it follows from (4.25) and (4.26) that

    ‖u^k − u^∞‖ ≤ ‖u^{k_l} − u^∞‖ ≤ ‖u^{k_l} − ũ^{k_l}‖ + ‖ũ^{k_l} − u^∞‖ < ε,

and hence the sequence {u^k} converges to u^∞. □
5. Implementation details and some numerical examples

To ensure faster convergence, we take the relaxation factor γ ∈ (0, 2) close to 2, say 1.8. The 'ideal' α* is given by (4.14). Based on the above analysis, for given u, we use an Armijo-like search to find a β satisfying condition (3.4). On the other hand, a too small β_k may cause slow convergence; to avoid that situation, it is sometimes necessary to increase the probe β. We prefer to use the following technique to adjust β automatically in the proposed method:

    β_{k+1} = { (2/3)β_k, if r_k > 0.9;
                (3/2)β_k, if r_k < 0.1;
                β_k,      otherwise. }
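The rule above is a three-branch update driven by the probe ratio r_k; a direct transcription (the thresholds 0.9 and 0.1 are those displayed):

```python
def update_beta(beta, r):
    """Adaptive rule above: shrink beta when the probe ratio r is large,
    enlarge it when r is small, keep it otherwise."""
    if r > 0.9:
        return 2.0 * beta / 3.0
    if r < 0.1:
        return 1.5 * beta
    return beta

assert update_beta(1.0, 0.95) == 2.0 / 3.0
assert update_beta(1.0, 0.05) == 1.5
assert update_beta(1.0, 0.50) == 1.0
```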
Now we arrive at the following practical modified augmented Lagrangian method.

Algorithm 5.1 (A practical modified augmented Lagrangian method).

Step 0. Let β_0 = 1, 0 < μ = 0.1 < ν = 0.9 < 1, s_0 = 1, u^0 ∈ Ω, γ = 1.8 and k = 0.

Step 1. Compute

    x̃^k = P_X{x^k − β_k(s_k f(x^k) − A^T[y^k − β_k(Ax^k − b)])},
    ỹ^k = y^k − β_k(Ax̃^k − b),
    q(u^k, ũ^k, β_k) = ( s_k f(x^k) − A^T[y^k − β_k(Ax^k − b)] ; Ax̃^k − b ).

Step 2. If

    r_k := β_k ‖q(u^k, ũ^k, β_k) − F_{s_k}(ũ^k)‖ / ‖u^k − ũ^k‖ > ν,

then go to Step 3; else set

    d(u^k, ũ^k, β_k) = (u^k − ũ^k) − β_k(q(u^k, ũ^k, β_k) − F_{s_k}(ũ^k)),
    u^{k+1} = P_Ω[u^k − γ α_k β_k F_{s_k}(ũ^k)],  where  α_k = (u^k − ũ^k)^T d(u^k, ũ^k, β_k) / ‖d(u^k, ũ^k, β_k)‖²,

    s_{k+1} = { s_k/2, if s_k ‖f(x^k) − f(x̃^k)‖ > 2β_k ‖A^T A(x^k − x̃^k)‖;
                2s_k,  if s_k ‖f(x^k) − f(x̃^k)‖ < (1/2)β_k ‖A^T A(x^k − x̃^k)‖;
                s_k,   otherwise; }

    β_{k+1} = { (3/2)β_k, if r_k ≤ μ;
                β_k,      otherwise; }

then k := k + 1 and go to Step 1.

Step 3. Reduce the value of β_k by β_k := (2/3)β_k and go to Step 1.

Remark 3. From our computational experience, the choice of s plays a significant role in the implementation. For the sake of balancing s_k‖f(x^k) − f(x̃^k)‖ and β_k‖A^T A(x^k − x̃^k)‖, we present a self-adaptive technique to adjust s automatically in each iteration, as illustrated in Step 2 of Algorithm 5.1.

In our implementation, all codes were written in MATLAB and run on an AMD 1.5 GHz personal computer. Throughout the computational experiments, we took γ = 1.8, β_0 = 1, s_0 = 1, μ = 0.1 and ν = 0.9. The stopping criterion for the algorithm was that the max-norm satisfies

    ‖u^k − ũ^k‖_∞ ≤ ε = 10^{−7}.

5.1. The first set of test examples

To test the ability of the proposed method, we take some examples from the literature. The iteration numbers and CPU times for various starting points are reported in Table 1, followed by a remark.

Example 5.1. The first example was used in [21,22]: find x* ∈ S such that

    (x − x*)^T f(x*) ≥ 0,  ∀x ∈ S,
Table 1
Numerical results for Examples 5.1–5.7

Example | Starting point x^0      | k   | ‖x^k − x*‖_∞ | CPU (s)
5.1     | (25, 0, 0, 0, 0)        | 24  | 8.07e−9      | 0.02
5.1     | (10, 0, 10, 0, 10)      | 30  | 1.01e−8      | 0.03
5.1     | (10, 0, 0, 0, 0)        | 24  | 7.09e−9      | 0.02
5.1     | (0, 2.5, 2.5, 2.5, 2.5) | 23  | 1.07e−8      | 0.02
5.1     | (0, 0, 0, 0, 0)         | 25  | 7.39e−9      | 0.02
5.2     | (0, 0, 100, 0, 0)       | 152 | 1.61e−7      | 0.08
5.3     | (70, 70, 70, 60, 60)    | 161 | 1.64e−7      | 0.06
5.4     | (3, 5, 3, 2, 2)         | 178 | 7.06e−8      | 0.06
5.5     | (35, 31, 11, 5, 5)      | 143 | 1.56e−7      | 0.05
5.6     | (2.5, 0.5, 2, 1, 0.5)   | 81  | 1.33e−7      | 0.04
5.7     | (10, ..., 10)           | 188 | –            | 1.34
where

    f(x) = M x + 10 D(x) + q,  D(x) = (arctan(x_1 − 2), ..., arctan(x_5 − 2))^T,

with

    M = [ 0.726  0.949  0.266  1.193  0.504
          1.645  0.678  0.333  0.217  1.443
          1.016  0.225  0.769  0.934  1.007
          0.567  1.144  0.550  0.548  1.024
          0.259  1.453  1.073  0.509  1.026 ],   q = (5.308, 0.008, 0.938, 1.063, 1.312)^T,

and

    S = { x ∈ R^5 | Σ_{i=1}^{5} x_i = 10, x_i ≥ 0, i = 1, 2, ..., 5 }.

Therefore, in the form of VI(S, f) (1.1), A = [1, 1, 1, 1, 1] and b = 10. The solution point of this problem is x* = (2, 2, 2, 2, 2)^T.

Example 5.2. This example is altered from Example 2 in [21], which contains an inequality constraint. Find x* ∈ S such that

    (x − x*)^T f(x*) ≥ 0,  ∀x ∈ S,
where

    f(x) = M x + D(x) + q,  D(x) = (4x_1^4, 7x_2^4, 5x_3^4, 9x_4^4, 8x_5^4)^T,

with

    M = [ 3   4  16  15  15
          4   1   5  10  11
          16  5   2  11   7
          15 10  11   3  10
          4  11   7  10   1 ],   q = (41, 10, 50, 30, 25)^T,

and

    S = { x ∈ R^5 | Ax = d, x_i ≥ 0, i = 1, 2, ..., 5 },

in which

    A = [ 0   0  0.5  0    2
          2   2  0    0.5  2
          2   2  4    2   −3
          5  −3  2    0   −2 ],   d = (10.00, 37.84, 12.84, 20.88)^T.

The solution point of this problem is x* = (9.08, 4.84, 0.00, 0.00, 5.00)^T.
B.-s. He et al. / European Journal of Operational Research 159 (2004) 35–51
Example 5.3. This problem is taken from [4,8], in which 0 1 10x1 þ 5x4 þ 1000 B 15x2 þ 5x5 þ 950 C B C C f ðxÞ ¼ B B 20x3 þ 3000 C @ 2x1 þ 20x4 þ 1000 A x2 þ 25x5 þ 1300 and S ¼ fx 2 R5 jx1 þ x2 þ x3 210 ¼ 0; x4 þ x5 120 ¼ 0; xi P 0; i ¼ 1; . . . ; 5g: T
The solution of this problem is known to be x ¼ ð120; 90; 0; 70; 50Þ . Example 5.4. This example is derived from problem 48 in [16]. In this problem, 1 10 1 0 0 x1 1 1 0 0 0 0 C CB C B B 0 C B x2 C B 0 C B 0 1 1 0 C CB C B B C B C B f ðxÞ ¼ B 0 0 C C B x 3 C þ B 0 C; B 0 1 1 C CB C B B 0 1 1 A@ x4 A @ 0 A @0 0 0 A¼
1 1 0 0
0 1 1
0
1
1 1 ; 2 2
1
x5
b¼
0
5 : 3
The solution point of this problem is x ¼ ð1; 1; 1; 1; 1ÞT . Example 5.5. The following example is derived from problem 50 in 10 1 0 x1 1 1 0 0 0 0 CB x2 C B 1 2 1 0 1 2 3 0 CB C B CB x3 C; A ¼ @ 0 1 2 f ðxÞ ¼ B 0 1 2 1 0 CB C B @ 0 0 0 1 0 1 2 1 A@ x4 A 0 0 0 1 1 x5
[16]. In this problem, 1 0 0 3 0 A; 2 3
0 1 6 b ¼ @ 6 A: 6
The solution point of this problem is x ¼ ð1; 1; 1; 1; 1ÞT . Example 5.6. This example can 0 1 1 0 0 B 1 2 1 0 B f ðxÞ ¼ B 1 1 0 B 0 @ 0 0 0 1 0 0 0 0 0
1 A ¼ @0 0
3 0 1
0 1 0
also be found in [16]. It is derived from problem 51 in it. In this problem, 1 10 1 0 0 0 x1 B C B C 0C CB x2 C B 2 C B C B C 0 CB x3 C þ B 2 C C; 0 A@ x4 A @ 1 A 1 1 x5
1 0 0 1 2 A; 0 1
0 1 4 b ¼ @ 0 A: 0 T
The solution point of this problem is x ¼ ð1; 1; 1; 1; 1Þ .
Example 5.7. The last example is problem 119 in [16], which is to find a minimum point of f(x) in S = { x ∈ R^16 | Σ_{j=1}^{16} b_ij x_j − c_i = 0, i = 1, ..., 8, and 0 ≤ x_i ≤ 5, i = 1, ..., 16 }, in which X is a 'box' constraint. Because this is a special case of VI(S, ∇f), we can solve it by the proposed method too. In [16], the author gave an approximate solution x* for it. In our experiments, however, we found that this solution point is less accurate than the one we obtained; we therefore leave the column ‖x^k − x*‖_∞ blank for this example.

Remark 4. From Tables 1 and 2, we see that the iteration numbers are moderate and that the computational amount in each iteration of the proposed method is quite small. In Step 2 of our method, we must determine a proper parameter β to ensure r_k ≤ ν, which is an important property in the convergence analysis. From our computing experience, it is not hard to find such a β. In fact, in our examples, 2–4 probes were needed to satisfy this condition in the very first iteration. Thereafter, no more than two probes were needed in each step, and typically (in fact, in more than 98% of the iterations) only one. So this task is not a heavy burden for the method.

5.2. The second set of test examples and comparison

This set of test examples is the nonlinear complementarity problem

    x ≥ 0,  f(x) ≥ 0,  x^T f(x) = 0,

where f(x) = D(x) + Mx + q; here D(x) and Mx + q are the nonlinear part and the linear part of f(x), respectively. This problem belongs to problem class (1.3)-(1.4) with A = 0 and b = 0. When the proposed method is applied to this problem, without loss of generality we can take s_k ≡ 1. In this case, if we set the step length α_k ≡ 1 in Algorithm 5.1, the method becomes a variant of the Korpelevich method [17]. In other words, we can compare the efficiency of the proposed method and the Korpelevich method by setting α_k = γα*_k (see (4.14)) and α_k ≡ 1, respectively. We construct the test problems as follows. The linear part is similar to that in Harker and Pang [12], and the nonlinear part of f(x) is similar to that in [21].² The matrix M = A^T A + B, where A is an n × n matrix whose entries are randomly generated in the interval (−5, +5), and the skew-symmetric matrix B is generated in the same way. The vector q is generated from a uniform distribution in the interval (−500, 500) (easy problems) or (−500, 0) (hard problems), respectively. In D(x), the nonlinear part of f(x), the components are D_j(x) = d_j · arctan(x_j), where d_j is a random variable in (0, 1). We test the problems with dimensions n = 100, 200 and 500. In all these problems, we take the starting vector x^0 = 0. The test results for the easy problems (q ∈ (−500, 500)) and the hard problems (q ∈ (−500, 0)) are reported in Tables 2 and 3, respectively. In the tables, k is the number of iterations and l denotes the number of evaluations of the mapping f, which is the main computational load in this special method. For the same problems, the proposed method needs less than 50% of the computational cost of the extragradient method.
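The construction above can be reproduced in a few lines. The sketch below generates a (small, n = 10 rather than the paper's 100–500) random monotone NCP and solves it with Algorithm 5.1 specialized to A = 0, b = 0 and s_k ≡ 1; the parameters γ = 1.8, ν = 0.9 and the β updates follow the algorithm, while the problem size, seed and tolerances are assumptions for illustration.

```python
import numpy as np

def make_ncp(n, hard=False, seed=0):
    """Random monotone NCP  f(x) = D(x) + Mx + q  as described above."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-5.0, 5.0, (n, n))
    C = rng.uniform(-5.0, 5.0, (n, n))
    M = A.T @ A + (C - C.T)              # PSD part + skew part => monotone
    q = rng.uniform(-500.0, 0.0, n) if hard else rng.uniform(-500.0, 500.0, n)
    d = rng.uniform(0.0, 1.0, n)
    return lambda x: d * np.arctan(x) + M @ x + q

def solve_ncp(f, n, gamma=1.8, nu=0.9, tol=1e-8, itmax=50000):
    """Algorithm 5.1 specialized to A = 0, b = 0, s_k = 1; X = R^n_+."""
    x, beta = np.zeros(n), 1.0
    for _ in range(itmax):
        while True:                       # probe for a beta with r <= nu
            xt = np.maximum(x - beta * f(x), 0.0)
            dx = x - xt
            nrm = np.linalg.norm(dx)
            if nrm < tol:
                return x
            df = f(x) - f(xt)
            r = beta * np.linalg.norm(df) / nrm
            if r <= nu:
                break
            beta *= 2.0 / 3.0            # Step 3: shrink beta
        d_vec = dx - beta * df           # direction d(u, u~, beta)
        alpha = (dx @ d_vec) / (d_vec @ d_vec)
        x = np.maximum(x - gamma * alpha * beta * f(xt), 0.0)
        if r <= 0.1:
            beta *= 1.5                  # enlarge beta when the probe is easy
    return x

f = make_ncp(10)
x = solve_ncp(f, 10)
# approximate complementarity: x >= 0, f(x) >= 0, x^T f(x) = 0
assert np.all(x >= 0.0)
assert np.min(f(x)) > -1e-2
assert abs(x @ f(x)) < 1e-2
```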
² In the paper by Harker and Pang [12], the matrix M = A^T A + B + D, where A and B are the same matrices as those used here, and D is a diagonal matrix with uniformly distributed random entries d_jj ∈ (0.0, 0.3). The nonlinear part D(x) of the test problems in [21] is D_j(x) = constant · arctan(x_j − 2).
Table 2
Numerical results for the easy problems

      | The proposed method     | The extragradient method
n     | k    | l    | CPU (s)  | k    | l    | CPU (s)
100   | 274  | 589  | 0.44     | 566  | 1173 | 0.72
200   | 362  | 777  | 2.14     | 810  | 1680 | 4.50
500   | 402  | 862  | 11.20    | 867  | 1860 | 23.89
Table 3
Numerical results for the hard problems

      | The proposed method     | The extragradient method
n     | k    | l    | CPU (s)  | k    | l    | CPU (s)
100   | 592  | 1273 | 0.82     | 1278 | 2651 | 1.87
200   | 778  | 1673 | 4.72     | 1651 | 3419 | 9.50
500   | 958  | 2060 | 27.46    | 2049 | 4252 | 56.24
6. Applications in traffic network equilibrium problems

In this section, we apply the proposed method to a class of traffic network equilibrium problems with fixed demands and take an example from [19] to demonstrate the efficiency of the proposed method. Consider a network [N, L] of nodes N and directed links L; a path consists of a finite sequence of connecting links with a certain orientation. Let a, b, etc., denote the links, and let p, q, etc., denote the paths. We let ω denote an origin/destination (O/D) pair of nodes of the network, and P_ω denotes the set of all paths connecting O/D pair ω. Let x_p represent the traffic flow on path p and let d_ω denote the traffic demand between O/D pair ω, which must satisfy

    d_ω = Σ_{p∈P_ω} x_p,    (6.1)

where x_p ≥ 0 for all p. Let f_a denote the link load on link a, which must satisfy the following conservation-of-flow equation:

    f_a = Σ_{p∈P} δ_ap x_p,    (6.2)

where δ_ap = 1 if link a is contained in path p, and δ_ap = 0 otherwise. Let c = {c_a, a ∈ L} be the row vector of link costs, with c_a denoting the user cost of traversing link a. In general, we assume that the link cost may depend on the flows on every link, that is, c = c(f), where c is a given function and f denotes the column vector of link loads. A user traveling on path p incurs a (path) travel cost h_p satisfying

    h_p = Σ_{a∈L} δ_ap c_a.

Let M be the path-arc incidence matrix of the given problem. Since x is the path-flow vector, the arc-flow f is given by

    f = M^T x.
For a given link travel cost vector c, the path travel cost vector h is given by h = Mc. Hence, the path travel cost h is a mapping of the path-flow x, and its mathematical form is

    h(x) = M c(M^T x).

According to [19], the problem is a variational inequality in the space of path-flow patterns x:

    Find x* ∈ K such that (x − x*)^T h(x*) ≥ 0,  ∀x ∈ K,    (6.3)

where

    K = { x | Σ_{p∈P_ω} x_p = d_ω, x ≥ 0 }.    (6.4)

We can translate (6.4) into the standard form K = {x | Ax = b, x ≥ 0}, where A is an O/D pair-path incidence matrix and b is a demand vector. The method was implemented for solving Example 8.4 described in [19, Chapter 8, Section 8.4.2]. The network is depicted in Fig. 1, which can also be found in [19, Chapter 7, Section 7.4.2, Figure 7.4]. There the problem was solved by a projected dynamic system; that method needs a suitable parameter sequence {a_s}, and according to the results reported in [19], such a suitable sequence is greatly problem dependent, so it is very difficult to specify those parameters when facing a practical problem with the method in [19]. The details of the link cost functions can be found in [19, pp. 232–235]. The user link cost functions are given here for completeness:

    c_1(f) = 5·10⁻⁵ f_1⁴ + 5f_1 + 2f_2 + 500        c_2(f) = 3·10⁻⁵ f_2⁴ + 4f_2 + 4f_1 + 200
    c_3(f) = 5·10⁻⁵ f_3⁴ + 3f_3 + f_4 + 350         c_4(f) = 3·10⁻⁵ f_4⁴ + 6f_4 + 3f_5 + 400
    c_5(f) = 6·10⁻⁵ f_5⁴ + 6f_5 + 4f_6 + 600        c_6(f) = 7f_6 + 3f_7 + 500
    c_7(f) = 8·10⁻⁵ f_7⁴ + 8f_7 + 2f_8 + 400        c_8(f) = 4·10⁻⁵ f_8⁴ + 5f_8 + 2f_9 + 650
    c_9(f) = 10⁻⁵ f_9⁴ + 6f_9 + 2f_10 + 700         c_10(f) = 4f_10 + f_12 + 800
    c_11(f) = 7·10⁻⁵ f_11⁴ + 7f_11 + 4f_12 + 650    c_12(f) = 8f_12 + 2f_13 + 700
    c_13(f) = 10⁻⁵ f_13⁴ + 7f_13 + 3f_18 + 600      c_14(f) = 8f_14 + 3f_15 + 500
    c_15(f) = 3·10⁻⁵ f_15⁴ + 9f_15 + 2f_14 + 200    c_16(f) = 8f_16 + 5f_12 + 300
    c_17(f) = 3·10⁻⁵ f_17⁴ + 7f_17 + 2f_15 + 450    c_18(f) = 5f_18 + f_16 + 300
    c_19(f) = 8f_19 + 3f_17 + 600                   c_20(f) = 3·10⁻⁵ f_20⁴ + 6f_20 + f_21 + 300
    c_21(f) = 4·10⁻⁵ f_21⁴ + 4f_21 + f_22 + 400     c_22(f) = 2·10⁻⁵ f_22⁴ + 6f_22 + f_23 + 500
    c_23(f) = 3·10⁻⁵ f_23⁴ + 9f_23 + 2f_24 + 350    c_24(f) = 2·10⁻⁵ f_24⁴ + 8f_24 + f_25 + 400
    c_25(f) = 3·10⁻⁵ f_25⁴ + 9f_25 + 3f_26 + 450    c_26(f) = 6·10⁻⁵ f_26⁴ + 7f_26 + 8f_27 + 300
    c_27(f) = 3·10⁻⁵ f_27⁴ + 8f_27 + 3f_28 + 500    c_28(f) = 3·10⁻⁵ f_22⁴ + 7f_28 + 3f_29 + 650
    c_29(f) = 3·10⁻⁵ f_29⁴ + 3f_29 + f_30 + 450     c_30(f) = 4·10⁻⁵ f_30⁴ + 7f_30 + 2f_31 + 600
    c_31(f) = 3·10⁻⁵ f_31⁴ + 8f_31 + f_32 + 750     c_32(f) = 6·10⁻⁵ f_32⁴ + 8f_32 + 3f_33 + 650
    c_33(f) = 4·10⁻⁵ f_33⁴ + 9f_33 + 2f_31 + 750    c_34(f) = 6·10⁻⁵ f_34⁴ + 7f_34 + 3f_30 + 550
    c_35(f) = 3·10⁻⁵ f_35⁴ + 8f_35 + 3f_32 + 600    c_36(f) = 2·10⁻⁵ f_36⁴ + 8f_36 + 4f_31 + 750
    c_37(f) = 6·10⁻⁵ f_37⁴ + 5f_37 + f_36 + 350
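The mapping h(x) = M c(Mᵀx) used in (6.3) can be sketched on a tiny assumed network (3 links, 2 paths sharing link 1; not the 37-link example above), which shows how path flows become link loads, link costs and finally path costs:

```python
import numpy as np

# toy network, assumed for illustration only
M = np.array([[1.0, 1.0, 0.0],    # path 1 traverses links 1 and 2
              [1.0, 0.0, 1.0]])   # path 2 traverses links 1 and 3

def c(flow):                      # separable link costs c_a(f) = f_a + const_a
    return flow + np.array([1.0, 2.0, 3.0])

h = lambda x: M @ c(M.T @ x)      # path travel cost h(x) = M c(M^T x)

x = np.array([2.0, 1.0])          # path flows
assert np.allclose(M.T @ x, [3.0, 2.0, 1.0])   # link loads f = M^T x
assert np.allclose(h(x), [8.0, 8.0])           # both paths cost 4 + 4
```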
The O/D pairs of this problem were ω_1 = (1, 20), ω_2 = (1, 25), ω_3 = (2, 20), ω_4 = (3, 25), ω_5 = (1, 24) and ω_6 = (11, 25). The total number of paths in this problem was 55. The demands of the O/D pairs were d_ω1 = 100, d_ω2 = 100, d_ω3 = 100, d_ω4 = 65, d_ω5 = 125 and d_ω6 = 50. The iteration counts for the starting point x^0 = (50, 50, ..., 50)^T and y^0 = (0, 0, ..., 0)^T are given in Table 4. Note that the initial error ‖e(u^0)‖_∞ is about 970 in this example. The results of our computational experience are reported in Table 5.
Fig. 1. A transportation network with 25 nodes and 37 links.
Table 4
Numerical results for Algorithm 5.1 when s is self-adaptive

Iteration number    ‖uᵏ − ũᵏ‖∞     ‖Axᵏ − d‖∞     ‖xᵏ − P_X[xᵏ − (f(xᵏ) − Aᵀyᵏ)]‖∞
100                 1.2153          10.3886         5.4309e+004
200                 0.0497          0.3206          1.6532e+003
300                 0.0260          0.0146          612.6471
400                 0.0025          0.0146          160.3179
498                 9.8386e−006     4.6254e−005     0.3768
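The feasibility and optimality measures reported in Table 4 can be computed as sketched below. The toy data (f(x) = x, a single constraint x₁ + x₂ = 1, X the nonnegative orthant so that P_X is a componentwise maximum with zero) are our own illustration, not the network example, and the predictor-based measure ‖uᵏ − ũᵏ‖∞ is omitted since it requires the method's intermediate iterate.

```python
# Sketch: the last two error measures of Table 4 on a toy problem.
# At a solution pair (x*, y*) with f(x*) = A^T y* and x* >= 0, the
# projection residual x - P_X[x - (f(x) - A^T y)] vanishes.

def residuals(x, y, f, A, b):
    """Return (||Ax - b||_inf, ||x - P_X[x - (f(x) - A^T y)]||_inf)."""
    fx = f(x)
    feas = max(abs(sum(a * xi for a, xi in zip(row, x)) - bi)
               for row, bi in zip(A, b))
    # g = f(x) - A^T y; project x - g onto the nonnegative orthant X.
    g = [fxi - sum(A[j][i] * y[j] for j in range(len(y)))
         for i, fxi in enumerate(fx)]
    proj = [max(xi - gi, 0.0) for xi, gi in zip(x, g)]
    opt = max(abs(xi - pi) for xi, pi in zip(x, proj))
    return feas, opt

f = lambda x: list(x)            # toy monotone mapping f(x) = x
A, b = [[1.0, 1.0]], [1.0]       # single demand-type constraint

# (x*, y*) = ((0.5, 0.5), 0.5) solves the toy problem: f(x*) = A^T y*.
print(residuals([0.5, 0.5], [0.5], f, A, b))   # (0.0, 0.0)
```

Both measures tending to zero, as in the last row of Table 4, certifies an approximate solution of the variational inequality.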
Table 5
Numerical results for Algorithm 5.1 when s is fixed: iteration numbers required to reach ‖uᵏ − ũᵏ‖∞ < 10⁻⁵

s                   10²     10³     10⁴     10⁵     10⁶
Iteration number    /       879     460     3169    /
As noted in Section 5, the number of iterations depends greatly on the choice of s. To demonstrate this, Table 5 reports the iteration numbers obtained by the proposed method with fixed s; there, '/' means that the iteration number exceeds 5000. Table 5 shows clearly that the choice of s plays a key role in the proposed method. Different problems generally require different values of s, and in practice it is hard to determine in advance which value is appropriate. Our self-adaptive technique is therefore efficient and indispensable when applying the method.

7. Concluding remarks

Instead of solving a sub-variational inequality (1.5) in each iteration, the proposed modified scaling augmented Lagrangian method obtains the new iterate via two simple projections (1.7) and (1.9). In addition, it extends the original method [6] by allowing the penalty and scaling parameters to vary from iteration to iteration. The preliminary numerical results indicate that these improvements are necessary and significant. The proposed method can be viewed as a kind of prediction–correction projection-type method. Based on
the quantitative convergence analysis in this paper, it is possible to develop more general adjusting rules for the variable parameters; this is one of our ongoing research topics.
Acknowledgements

The authors thank the referees for their valuable comments and suggestions, which helped us improve the presentation of this paper.
References

[1] K.J. Arrow, L. Hurwicz, H. Uzawa, Studies in Linear and Nonlinear Programming, Stanford University Press, Stanford, CA, 1958.
[2] D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ, 1989.
[3] P.H. Calamai, J.J. Moré, Projected gradient methods for linearly constrained problems, Mathematical Programming 39 (1987) 93–116.
[4] S. Dafermos, Traffic equilibrium and variational inequalities, Transportation Science 14 (1980) 42–54.
[5] B.C. Eaves, On the basic theorem of complementarity, Mathematical Programming 1 (1971) 68–75.
[6] J. Eckstein, Some saddle-function splitting methods for convex programming, Optimization Methods and Software 4 (1994) 75–83.
[7] M. Fortin, R. Glowinski (Eds.), Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, North-Holland, Amsterdam, 1983.
[8] M. Fukushima, A relaxed projection method for variational inequalities, Mathematical Programming 35 (1986) 58–70.
[9] D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Computers and Mathematics with Applications 2 (1976) 17–40.
[10] E.M. Gafni, D.P. Bertsekas, Two-metric projection methods for constrained optimization, SIAM Journal on Control and Optimization 22 (1984) 936–964.
[11] R. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer-Verlag, New York, 1984.
[12] P.T. Harker, J.S. Pang, A damped-Newton method for the linear complementarity problem, Lectures in Applied Mathematics 26 (1990) 265–284.
[13] B.S. He, A class of new methods for monotone variational inequalities, Applied Mathematics and Optimization 35 (1997) 69–76.
[14] B.S. He, Inexact implicit methods for monotone general variational inequalities, Mathematical Programming 86 (1999) 199–217.
[15] B.S. He, H. Yang, Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities, Operations Research Letters 23 (1998) 151–161.
[16] W. Hock, K. Schittkowski, Test Examples for Nonlinear Programming Codes, Springer, Berlin, 1981.
[17] P. Marcotte, Application of Khobotov's algorithm to variational inequalities and network equilibrium problems, INFOR: Information Systems and Operational Research 29 (1991) 258–270.
[18] A. Nagurney, Network Economics: A Variational Inequality Approach, Kluwer Academic Publishers, Dordrecht, 1993.
[19] A. Nagurney, D. Zhang, Projected Dynamical Systems and Variational Inequalities with Applications, Kluwer Academic Publishers, Boston, 1996.
[20] J.M. Peng, M. Fukushima, A hybrid Newton method for solving the variational inequality problem via the D-gap function, Mathematical Programming 86 (1999) 367–386.
[21] K. Taji, M. Fukushima, T. Ibaraki, A globally convergent Newton method for solving strongly monotone variational inequalities, Mathematical Programming 58 (1993) 369–383.
[22] S. Wang, H. Yang, B.S. He, Solving a class of asymmetric variational inequalities by a new alternating direction method, Computers and Mathematics with Applications 40 (2000) 927–937.