Accepted Manuscript

A numerical study of applying spectral-step subgradient method for solving nonsmooth unconstrained optimization problems
M. Loreto, Y. Xu, D. Kotval

PII: S0305-0548(18)30310-1
DOI: https://doi.org/10.1016/j.cor.2018.12.006
Reference: CAOR 4605
To appear in: Computers and Operations Research
Received date: 19 March 2018
Revised date: 4 December 2018
Accepted date: 4 December 2018

Please cite this article as: M. Loreto, Y. Xu, D. Kotval, A numerical study of applying spectral-step subgradient method for solving nonsmooth unconstrained optimization problems, Computers and Operations Research (2018), doi: https://doi.org/10.1016/j.cor.2018.12.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

• Novel adaptation of the spectral-step approach to solve nonsmooth unconstrained problems.
• The spectral-step subgradient method is superior to classical subgradient methods.
• The spectral-step subgradient method is efficient and easy to implement.
• Comprehensive numerical experimentation based on two sets of nonsmooth problems.
• Comparison based on performance profiles weighing precision and computational cost.
A numerical study of applying spectral-step subgradient method for solving nonsmooth unconstrained optimization problems✩

M. Loreto^{a,1,∗}, Y. Xu^{a,1}, D. Kotval^{b,2}

a University of Washington Bothell, 18115 Campus Way NE, Bothell, WA, 98011-8246, USA
b Middle Tennessee State University, 1301 East Main Street, Murfreesboro, TN, 37132-0001, USA
Abstract

The purpose of this work is two-fold. First, we present the spectral-step subgradient method to solve nonsmooth unconstrained optimization problems. It combines the classical subgradient approach and a nonmonotone line search with the spectral step length, which does not require any previous knowledge of the optimal value. We focus on the interesting case in which the objective function is convex and continuously differentiable almost everywhere, and it is often non-differentiable at minimizers. Second, we use performance profiles to compare the spectral-step subgradient method with other subgradient methods. This comparison will allow us to place the spectral-step subgradient algorithm among other subgradient algorithms.

Keywords: Nonsmooth Optimization, Subgradient Methods, Spectral Gradient Methods, Nonmonotone Line Search.
2000 MSC: 49M05, 90C26, 65K10
1. Introduction
Subgradient methods are a popular approach to solve nonsmooth optimization problems and have been widely studied. They generalize gradient-based methods used to solve smooth optimization problems, where the gradient is replaced by an arbitrary subgradient. A detailed explanation can be found in [18]. Classical subgradient methods show slow convergence to the optimal solution, and some of their modifications converge faster because an accurate approximation to the solution is provided. Therefore, a first goal of this work is to propose a reliable subgradient method with attributes such as being easy to implement, being more efficient than classical subgradient methods, and requiring no previous knowledge of the solution. We adapted the spectral-step subgradient method to nonsmooth unconstrained minimization problems for the first time. We consider the case in which the objective function is convex and continuously differentiable almost everywhere, but it is often not differentiable at minimizers.

✩ This work was supported by the National Science Foundation (grant DMS-1460699)
∗ Corresponding author
Email addresses: [email protected] (M. Loreto), [email protected] (Y. Xu), [email protected] (D. Kotval)
1 School of Science, Technology, Engineering, and Mathematics (STEM)
2 Department of Mathematical Sciences
An iteration of the spectral-step subgradient method requires a subgradient and the spectral step. The spectral step was originally proposed by Barzilai and Borwein [1], and it has been widely studied and extended for solving smooth problems [2], [3], [6], [10], [15], [16], and [17]. Since the subgradient is not necessarily a descent direction and the spectral step presents a nonmonotone behavior, the spectral subgradient method is embedded in a nonmonotone line search strategy. Indeed, we enforce the line search scheme with the nonmonotone globalization strategy of Grippo et al. [7] combined with the globalization scheme proposed by La Cruz et al. [10]. The convergence of the proposed method is established based on results presented by Loreto et al. in [11]. The second goal of this work is to show a fair and comprehensive comparison between the spectral-step subgradient method and classical subgradient methods. These classical subgradient methods differ in their step sizes, and they are presented in section 3. For a fair comparison, we use the performance profiles proposed by Dolan et al. [19]. Therefore, we present comprehensive numerical experimentation based on two sets of nonsmooth problems.
Our main motivation for extending the use of the spectral step to nonsmooth problems is the wide variety of practical problems for which smooth spectral methods have been successfully applied; see [2, 3]. We also want to develop a nonsmooth algorithm that is easy to implement and more efficient than classical subgradient methods. The success obtained by combining the spectral step with the subgradient approach was initially achieved for a particular nonsmooth case by Loreto et al. [4], when the so-called spectral projected subgradient method was developed to solve specifically the Lagrangean dual problem that appears when solving integer programming problems. Moreover, the spectral projected subgradient method was further analyzed and enhanced in [11] and [12].

Furthermore, nonsmooth spectral gradient methods were developed by Loreto et al. [13] for solving nonsmooth problems in which the objective functions are locally Lipschitz continuous. In that work, subdifferentials were computed through gradient sampling or the simplex gradient.

This paper is organized as follows. In Sect. 2, we describe the spectral-step subgradient method. In Sect. 3, we present numerical results to provide further insights into the method and to place the spectral-step subgradient algorithm among other subgradient algorithms. Finally, in Sect. 4 we present some final remarks.
2. Spectral-Step Subgradient Method
In this section, we introduce the Spectral-Step Subgradient (SS) method to solve the problem:

    \min_{x \in \mathbb{R}^n} f(x),    (1)

where the objective function f : R^n → R is convex and continuously differentiable almost everywhere, but it is often not differentiable at minimizers. We assume that at every iterate xk a subgradient gk = g(xk) can be computed, where a subgradient is any vector g that satisfies the inequality f(y) ≥ f(x) + g^T(y − x) for all y. At each iteration the method moves in the negative direction of gk with a step length determined by the spectral step αk. The spectral step is the least-squares solution of minimizing ‖(1/αk) sk−1 − yk−1‖²₂, where sk−1 = xk − xk−1 and yk−1 = ∇f(xk) − ∇f(xk−1). Hence,

    \alpha_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}    (2)
The spectral step was originally proposed by Barzilai and Borwein and then was further developed by Raydan et al. (see [1], [15], and [16]).
The formula for αk is used provided that for all iterations the steps are bounded away from zero on the positive side and bounded away from infinity. In other words, we use safeguard fixed parameters 0 < αmin < αmax < ∞. Moreover, when the spectral step calculated using (2) would be negative or undefined, the method tries to leverage 1/‖sk−1‖ as an alternative step if possible. Hence, the step length is defined as follows:

    \alpha_k =
    \begin{cases}
      \min\{\alpha_{\max},\ 1/\|s_{k-1}\|\} & \text{if } s_{k-1}^T y_{k-1} \le 0 \\
      \min\{\alpha_{\max},\ \max\{\alpha_{\min},\ \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}\}\} & \text{if } s_{k-1}^T y_{k-1} > 0
    \end{cases}    (3)
The usual convergence analysis of subgradient schemes is based on the fact that the sequence {‖xk+1 − x∗‖} is strictly decreasing, where x∗ is the optimal point. Therefore, condition (4) is imposed on αk to guarantee decrease of the distance to the optimal point, as suggested in [14]:

    \alpha_k > 0 \ \forall k, \qquad \lim_{k \to \infty} \alpha_k = 0, \qquad \sum_{k=1}^{\infty} \alpha_k = \infty.    (4)

We bound the spectral step by the sequence {C/log(k)} with C > 0, which satisfies (4), using the following procedure:

Procedure (B):
1. if αk ≥ 10^8/log(k), then αk = 10^8/log(k);
2. if αk ≤ 10^−8/log(k), then αk = 10^−8/log(k).
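For concreteness, the following Python sketch (our own illustration, not part of the original algorithm statement) shows one way to implement the safeguarded step-length rule (3) together with Procedure (B); the function names and the safeguard defaults for αmin and αmax are hypothetical placeholders.

```python
import numpy as np

def spectral_step(s, y, alpha_min=1e-10, alpha_max=1e10):
    """Safeguarded spectral step (3); s = x_k - x_{k-1}, y = g_k - g_{k-1}."""
    sy = s @ y
    if sy <= 0:
        # Step (2) would be negative or undefined: fall back to 1/||s||.
        return min(alpha_max, 1.0 / np.linalg.norm(s))
    return min(alpha_max, max(alpha_min, (s @ s) / sy))

def procedure_b(alpha, k):
    """Procedure (B): keep alpha inside [1e-8/log(k), 1e8/log(k)] so that (4) holds.
    Assumes k >= 2 so that log(k) > 0."""
    upper = 1e8 / np.log(k)
    lower = 1e-8 / np.log(k)
    return min(max(alpha, lower), upper)
```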
As we stated before, the method moves in the negative direction of gk with a step length αk. Such a move is not guaranteed to decrease the objective function, hence the method incorporates a globalization scheme. We enforce the line search scheme with the nonmonotone globalization strategy of Grippo et al. [7] combined with the globalization scheme proposed by La Cruz et al. [10]:

    f(x_k - \tau g_k) \le \max_{0 \le j \le \min\{k, M\}} f(x_{k-j}) - \gamma \tau\, g_k^T g_k + \eta_k,    (5)

where M ≥ 0 is a fixed integer, 0 < γ < 1 is a small number, τ > 0 is obtained after a backtracking process that starts at αk, and ηk > 0 is chosen such that:

    0 < \sum_{k} \eta_k = \eta < \infty.    (6)

Roughly speaking, to get an optimal point x∗ that minimizes f, the method computes a trial point x+ in the negative direction of gk multiplied by αk. The point x+ is tested until the nonmonotone globalization condition (5) is satisfied and a new iterate is obtained.

Spectral-Step Subgradient Algorithm (SS):
Given x0, an integer M ≥ 0, g0 a subgradient of f at x0, a parameter MAXITER as the maximum number of iterations allowed, 0 < αmin < αmax < ∞, α0 ∈ [αmin, αmax], η0 = max(f(x0), ‖g0‖), and γ = 10^−4.
For k = 0, ..., MAXITER
  1. τ = αk
  2. x+ = xk − τ gk
  3. ηk = η0 / k^{1.1}
  4. While f(x+) > max_{0≤j≤min{k,M}} f(xk−j) − γ τ gk^T gk + ηk:
       Reduce τ
       x+ = xk − τ gk
  5. Set xk+1 = x+
  6. Set sk = xk+1 − xk and yk = gk+1 − gk, where gk+1 is a subgradient at xk+1
  7. Compute αk+1:
     7.1 Calculate bk = sk^T yk
     7.2 If bk ≤ 0, set αk+1 = min{αmax, 1/‖sk‖}; else, compute ak = sk^T sk and set αk+1 = min{αmax, max{αmin, ak/bk}}
  8. Verify Procedure (B)
End For

Remarks:
1. Different strategies could be used to reduce τ. For example, Loreto et al. [4] use τ = τ/2.
2. For the parameter ηk = η0/k^r, condition (6) holds whenever r > 1. In particular, we chose r = 1.1, hence ηk = η0/k^{1.1} and condition (6) is satisfied. The value r = 1.1 is suitable for the sufficiently nonmonotone behavior desired of the method.
It is worth mentioning that when f is differentiable, gk is ∇f(xk) and the spectral-step subgradient method becomes a version of the Barzilai and Borwein gradient method for the large-scale unconstrained minimization problem [16].
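To illustrate how the pieces above fit together, here is a minimal, self-contained Python sketch of the SS iteration: the nonmonotone test (5), the safeguarded spectral step (3) with Procedure (B), and the choice ηk = η0/k^{1.1}. It assumes user-supplied callables f and subgrad returning f(x) and one subgradient as a NumPy array, it halves τ in the backtracking step as in Remark 1, and all names and defaults are illustrative rather than the authors' Matlab implementation.

```python
import numpy as np

def ss(f, subgrad, x0, max_iter=500, M=10, gamma=1e-4,
       alpha_min=1e-10, alpha_max=1e10, alpha0=1.0):
    """Sketch of the Spectral-Step Subgradient (SS) method."""
    x = np.asarray(x0, dtype=float)
    g = subgrad(x)
    eta0 = max(f(x), np.linalg.norm(g))
    alpha = alpha0
    f_hist = [f(x)]                          # past values for the nonmonotone test (5)
    for k in range(1, max_iter + 1):
        eta_k = eta0 / k**1.1                # summable sequence satisfying (6)
        f_ref = max(f_hist[-(M + 1):])       # max over the last min(k, M)+1 values
        tau = alpha
        x_trial = x - tau * g
        # Nonmonotone line search (5): backtrack until the test is satisfied.
        while f(x_trial) > f_ref - gamma * tau * (g @ g) + eta_k:
            tau *= 0.5                       # Remark 1: one possible way to reduce tau
            x_trial = x - tau * g
        g_new = subgrad(x_trial)
        s, y = x_trial - x, g_new - g
        sy = s @ y
        if sy <= 0:                          # safeguarded spectral step (3)
            alpha = min(alpha_max, 1.0 / np.linalg.norm(s))
        else:
            alpha = min(alpha_max, max(alpha_min, (s @ s) / sy))
        log_k = np.log(k + 1)                # Procedure (B) safeguard
        alpha = min(max(alpha, 1e-8 / log_k), 1e8 / log_k)
        x, g = x_trial, g_new
        f_hist.append(f(x))
    return x, min(f_hist)                    # final iterate and f_k^best
```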
2.1. The Quadratic Case
We believe SS is an ideal candidate to minimize nonsmooth quadratic functions, since the spectral step has been successfully applied to minimize smooth quadratic functions, as can be seen in [6], [15] and [17]. Hence, we included minimizing maxima of quadratic functions in our experimentation; these are defined as the point-wise maximum of a finite collection of quadratic functions. These kinds of functions are relevant examples of nonsmooth functions that appear in many applications.

Let us consider the unconstrained minimization problem for a differentiable quadratic function f(x) = (1/2) x^T A x + b^T x + c, where A is a real symmetric and positive definite (SPD) n × n matrix. It is well known that in the quadratic case the step size can be computed directly by solving the subproblem min_{α≥0} f(xk − α gk). Therefore, the spectral step in the quadratic case is obtained by solving the problem min_{α≥0} f(xk − (1/α) gk):

    \alpha_{k+1} = \frac{s_k^T s_k}{s_k^T A s_k}    (7)
Additionally, αk+1 is the inverse of the Rayleigh quotient of A evaluated at sk. Since A is SPD, that Rayleigh quotient lies between the minimum and the maximum eigenvalues of A, so αk+1 is positive and bounded; hence, there is no risk of dividing by zero in equation (7). When considering nonsmooth quadratic functions defined as the maximum of quadratic functions, the spectral step is also defined by equation (7). Furthermore, the algorithm is simplified to the following version:

Spectral-Step Subgradient Algorithm (Quadratic-Case):
Given x0, g0 a subgradient of f at x0, a parameter MAXITER as the maximum number of iterations allowed, and α0 > 0.

For k = 0, ..., MAXITER
  1. xk+1 = xk − αk gk
  2. Set sk = xk+1 − xk
  3. Compute αk+1 using equation (7) and verify Procedure (B)
End For
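As an aside, here is a minimal Python sketch of the quadratic-case iteration above, written for a single differentiable quadratic with an SPD matrix A (for a maximum of quadratics, A and b would be those of the active piece, as discussed in section 3.1); the function name and the Procedure (B) constants follow the text but are otherwise illustrative.

```python
import numpy as np

def ss_quadratic(A, b, c, x0, max_iter=500, alpha0=1.0):
    """Quadratic-case SS: x_{k+1} = x_k - alpha_k g_k with alpha_{k+1} from (7)."""
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    for k in range(max_iter):
        g = A @ x + b                        # gradient of 1/2 x'Ax + b'x + c
        if not np.any(g):                    # stationary point of the quadratic
            break
        x_new = x - alpha * g
        s = x_new - x
        alpha = (s @ s) / (s @ (A @ s))      # spectral step (7), inverse Rayleigh quotient
        log_k = np.log(k + 2)                # Procedure (B) safeguard (k + 2 >= 2)
        alpha = min(max(alpha, 1e-8 / log_k), 1e8 / log_k)
        x = x_new
    return x, 0.5 * x @ A @ x + b @ x + c
```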
2.2. Convergence Results
We close this section by commenting on the theoretical properties of the proposed method. The following theorem, established by Loreto et al. in [11] [section 3, pages 922-926], is applicable to SS, since SS can be seen as a particular case of the Modified Spectral Projected Subgradient method (MSPS) presented in that work. Consider

    f_k^{best} = \min\{f(x_0), f(x_1), \ldots, f(x_{k-1}), f(x_k)\},

where f_k^best is the best objective value found in k iterations. As {f_k^best} is a non-increasing sequence, it has a limit.

Theorem 1: Consider the nonsmooth convex minimization problem min_{x∈Ω} f(x), where Ω is a convex set. Suppose an optimal solution x∗ exists and let the sequence {αk} satisfy condition (4). Suppose further that MSPS is applied with the additional assumption that there exists G > 0 such that ‖g‖₂ ≤ G for all g ∈ ∂f(x), the set of subgradient vectors of f(x). Then:

    \lim_{k \to \infty} f_k^{best} = f(x^*)

Briefly, we describe the MSPS and its connection to SS. MSPS solves the Lagrangean dual problem min_{x∈Ω} f(x), where Ω is a convex set. To get an optimal point x∗, MSPS computes a trial point x+ in the negative direction of m+, which is a trial direction based on the subgradient gk, the spectral step αk, and the momentum parameter µ. The following steps are used to obtain m+ and x+, with m0 = 0:

1. m+ = αk gk + µ mk
2. x+ = PΩ(xk − m+), where PΩ is the projection of x onto Ω

The point x+ is tested until it satisfies the globalization condition (5) and a new iterate is obtained. When Ω = R^n and the momentum term is zero, an iteration of MSPS becomes an iteration of SS. Hence, Theorem 1 applies to SS, proving its convergence.
3. Numerical Results

3.1. Sets of Problems
We present comprehensive experimentation results based on two sets of problems. The first set of problems comprises 10 nonsmooth minimization problems taken from [8] and described in Table 1, which includes the optimal value f∗ = f (x∗ ), the dimension n, and an upper bound fb for f∗ . This upper bound will be used in the experimentation to establish a fair comparison with methods requiring previous knowledge about f∗ , as described later in this section. Additional information about this set of problems, including the suggested initial point, can be found in [8].
The second set of problems was generated as described in [9], and it comprises 40 functions. Each function is a point-wise maximum of a finite collection of quadratic functions. Specifically, these functions are defined as the maximum of 20 quadratic functions of 100 variables each. The matrices Aj of the quadratic forms were chosen to be positive definite, the optimal values f∗ were chosen to be zero while fb was set to 10^−3 for all the problems, and the starting points were calculated as suggested in [9]. The general form of these functions is given by equation (8):

    f(x) = \max\left\{ f_j(x) = \tfrac{1}{2} x^T A_j x + b_j^T x + c_j \ \middle|\ j = 1, \ldots, n_f \right\}    (8)

where the Aj are n × n symmetric matrices, bj ∈ R^n, cj ∈ R, and nf is the number of functions fj(x) over which the maximum is taken. The subdifferential of a maximum of quadratics is given by:

    \partial f(x) = \mathrm{conv}\{ \partial f_i(x) \mid f_i(x) = f(x) \}

Hence, at any given x and any active index i ≤ nf at which the maximum in (8) is attained, we have a subgradient Ai x + bi ∈ ∂f(x) available.
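As an illustration of (8) and of the subgradient Ai x + bi of an active piece, the Python sketch below builds a maximum-of-quadratics oracle from given data; the random generation shown is only a placeholder and does not reproduce the construction of [9], where the data are chosen so that f∗ = 0.

```python
import numpy as np

def max_of_quadratics(mats, vecs, consts):
    """Oracle for f(x) = max_j { 1/2 x'A_j x + b_j'x + c_j } as in (8)."""
    def oracle(x):
        vals = [0.5 * x @ A @ x + b @ x + c
                for A, b, c in zip(mats, vecs, consts)]
        i = int(np.argmax(vals))               # an active index attaining the maximum
        return vals[i], mats[i] @ x + vecs[i]  # f(x) and the subgradient A_i x + b_i
    return oracle

# Illustrative data: nf = 20 quadratics in n = 100 variables with SPD matrices.
rng = np.random.default_rng(0)
n, nf = 100, 20
mats = [M @ M.T + n * np.eye(n) for M in (rng.standard_normal((n, n)) for _ in range(nf))]
vecs = [rng.standard_normal(n) for _ in range(nf)]
consts = [float(rng.standard_normal()) for _ in range(nf)]
oracle = max_of_quadratics(mats, vecs, consts)
value, subgradient = oracle(rng.standard_normal(n))
```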
Table 1: First Problem Set

Prob                  f∗               n    fb
P1: MAXQ              0.0              20   10^−3
P2: MXHILB            0.0              50   10^−3
P3: Chained LQ        −(n−1)·2^{1/2}   2    −1.4128
P4: Chained CB3 I     2(n−1)           2    2.002
P5: Chained CB3 II    2(n−1)           2    2.002
P6: Activefaces       0.0              2    10^−3
P7: Brown 2           0.0              2    10^−3
P8: Mifflin 2         −34.795          50   −34.7602
P9: Crescent I        0.0              2    10^−3
P10: Crescent II      0.0              2    10^−3
3.2. Algorithms to Benchmark

We compare the spectral-step subgradient method with six other subgradient methods, each of them using a different step size, as described in Table 2. Detailed information about these steps can be found in [18, 21, 5], and implementations of the subgradient methods using these steps are available at [20]. Some of these subgradient methods require previous knowledge of the optimal value f∗ to guarantee convergence, since these steps are the most valuable from the theoretical standpoint. Hence, these methods typically outperform the other subgradient methods, which do not require knowledge of the optimal value f∗. In order to present a fair comparison, our experimentation includes two cases: one in which the optimal value f∗ is provided to those methods, and one in which an upper bound fb of f∗ is provided instead.

Table 2: Comparison Methods

Abbr.  Subgradient Method                      Step size
CS     Constant Step                           αk = γ/‖gk‖, γ = 1
NSD    Non-Summable Diminishing step           αk = a/√k, a = 1
SSNS   Square Summable but Not Summable step   αk = a/k, a = 0.1
CFM    Camerini, Fratta, and Maffioli          αk = (f(xk) − f∗)/‖sk‖², where sk = gk + β sk−1 and β = max{0, −1.5 sk−1^T gk / ‖sk−1‖²}
OP     Optimal Polyak's method                 αk = (f(xk) − f∗)/‖gk‖²
FP     Filtered Polyak's method                αk = (f(xk) − f∗)/‖sk‖², where sk = (1 − β)gk + β sk−1, β = 0.25
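For reference, the sketch below codes two of the step-size rules of Table 2, Polyak's optimal step (OP) and the CFM step, as reconstructed above from [5, 21] and the implementations referenced in [20]; it is only an illustration, and all function names are hypothetical.

```python
import numpy as np

def polyak_step(fx, f_star, g):
    """OP: alpha_k = (f(x_k) - f*) / ||g_k||^2."""
    return (fx - f_star) / (g @ g)

def cfm_direction(g, s_prev):
    """CFM direction: s_k = g_k + beta * s_{k-1},
    with beta = max{0, -1.5 * s_{k-1}'g_k / ||s_{k-1}||^2} (s_0 = g_0)."""
    if s_prev is None:
        return np.array(g, dtype=float)
    beta = max(0.0, -1.5 * (s_prev @ g) / (s_prev @ s_prev))
    return g + beta * s_prev

def cfm_step(fx, f_star, s):
    """CFM step length: alpha_k = (f(x_k) - f*) / ||s_k||^2."""
    return (fx - f_star) / (s @ s)
```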
Since the solutions of the test problems are known, we expand the typical stopping criterion based on iterations to also consider whether the current function value f(xk) is close enough to the optimal value f∗, as determined by a predefined tolerance ε. Hence, all algorithms stop when either MAXITER is reached or the following condition is satisfied:

    \frac{|f(x_k) - f_*|}{|f_*|} \le \epsilon, \quad \text{when } f_* \ne 0.

If f∗ is zero, then the condition becomes just |f(xk) − f∗| ≤ ε. Moreover, we define fmin as the best value found by the algorithms.
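In code, this stopping test can be written as the small sketch below (eps stands for the tolerance ε; the function name is illustrative):

```python
def close_enough(f_xk, f_star, eps=1e-8):
    """Stopping test: relative error when f* != 0, absolute error otherwise."""
    if f_star != 0:
        return abs(f_xk - f_star) / abs(f_star) <= eps
    return abs(f_xk - f_star) <= eps
```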
All algorithms were implemented using Matlab 2016. The experiments were executed on a desktop computer with an Intel Core i7 3.60 GHz CPU and 16 GB of RAM, running Windows 10 Enterprise 64-bit.

3.3. Benchmark Approach

We analyzed the results for all algorithms using the performance profiles introduced by Dolan et al. in [19]. Hence, we describe these performance profiles below, based on [19], adapting their concepts to our experimentation. Given a set of solvers S, which in our case represents the algorithms to be compared, and a set of problems P, one can define a performance ratio for each solver as follows:

    r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}
Here t_{p,s} represents a performance measure, e.g., the computation time required to solve problem p by solver s. In our case, we are interested in three different performance measures for solver s ∈ S on problem p ∈ P: the relative error between fmin and f∗ (absolute error when f∗ is zero), the number of function evaluations performed, and the CPU time consumed up to finding fmin. Performance profiles are included in our results for each of these performance measures.

We say an algorithm solves a problem if the relative error between fmin and f∗ (absolute error when f∗ is zero) is less than δ. If solver s does not solve problem p, then we set r_{p,s} = r_M, where r_M is chosen such that r_M ≥ r_{p,s} for all p ∈ P and s ∈ S. This sets a minimum bar in terms of the precision of the solutions found by each method, and hence produces a fairer comparison, especially for performance metrics such as function evaluations and CPU time. Although the performance of a solver s on any given problem p may be of interest, we are more interested in the overall assessment of the performance of solver s over all the problems, defined as:

    \rho_s(\tau) = \frac{1}{n_p}\, \mathrm{size}\{ p \in P : r_{p,s} \le \tau \}

where n_p is the number of problems in P, and τ ∈ [1, r_M]. Therefore, ρ_s(τ) is the probability for solver s that its performance ratio r_{p,s} is within a factor τ ∈ R of the best possible ratio over all the problems. The performance profile ρ_s : R → [0, 1] for a solver s is nondecreasing and piecewise continuous from the right at each breakpoint. The value ρ_s(1) is the probability that solver s will win over the rest of the solvers over all problems. The value of ρ_s(τ) for a given τ ≥ 1 shows the percentage of problems solved by solver s with a performance measure within a factor τ of the best solver.
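The following Python sketch shows one way to compute the ratios r_{p,s} and the profile ρ_s(τ) just defined from a matrix of performance measures, assigning failed runs a common worst ratio r_M as done in our experiments; function and variable names are illustrative.

```python
import numpy as np

def performance_profile(T, failed):
    """T[p, s]: measure (e.g. CPU time) for solver s on problem p;
    failed[p, s]: True if solver s did not solve problem p (error >= delta).
    Returns the ratio matrix r and the profile function rho(s, tau)."""
    T = np.asarray(T, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    best = np.where(failed, np.inf, T).min(axis=1, keepdims=True)  # min over solvers
    r = T / best                                   # r_{p,s} = t_{p,s} / min_s t_{p,s}
    r_M = 2.0 * r[~failed].max()                   # worst ratio, assigned to failures
    r[failed] = r_M
    n_p = T.shape[0]
    def rho(s, tau):
        """rho_s(tau): fraction of problems with r_{p,s} <= tau."""
        return np.count_nonzero(r[:, s] <= tau) / n_p
    return r, rho
```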
For our sets of problems, the performance measures analyzed allow us to compare in terms of precision (error between fmin and f∗) and computational cost (number of function evaluations and CPU time consumed). We set δ = 10^−3 as the maximum acceptable error to determine when a solver failed, and in that case we set its ratio r_{p,s} = r_M = 2 × max_s{r_{p,s}}. For our first set of problems we set MAXITER = 15,000, and for the second set, MAXITER = 500. We set ε = 10^−8 for both problem sets.
Results of the numerical experiments are presented in figs. 1-6. The charts show the ratios scaled using a logarithm (log2), as suggested in [19], to make sure all the activity over τ < r_M is captured. Therefore, the value of ρ_s(τ) at τ = 0 gives the percentage of test problems for which the corresponding solver is the best. In these charts, the ordinate ρ(τ) shows the percentage of problems solved by a particular solver. In order to better understand the behavior of the algorithms whose step-size computation depends on previous knowledge of f∗, we also present results when fb (a bound on the solution value) is provided instead of f∗.
3.4. Results for First Set of Problems

In figs. 1, 2 and 3, we present performance profiles based on the error between fmin and f∗, the number of function evaluations required to find fmin, and the CPU time consumed, for both cases: providing f∗ to the solvers OP, CFM, and FP, and providing fb instead.
Concerning the error when providing f∗, the performance profile in fig. 1 shows CFM as the most precise solver overall. Not only does CFM provide the most precise solution for 60% of the problems, but it also solves more problems as slightly more error is tolerated (9 out of 10 when τ is close to 10). Our algorithm, denoted as SS, can be considered second in terms of precision, closely followed by OP. It is worth mentioning that OP is the only solver solving 100% of the problems when the error tolerance is large enough, while CFM and SS solve only 90% of them. Nevertheless, CFM and OP require the solution to be provided while SS does not. Furthermore, SS can be considered the most precise solver among the ones not requiring previous knowledge of the solution f∗. Moreover, in spite of FP also requiring access to the solution, it does not compete in terms of precision with CFM, SS and OP.
On the other hand, fig. 1 also illustrates that SS is the most precise solver when fb is provided instead of the solution f∗ to the solvers OP, CFM, and FP. This demonstrates SS's superiority when previous knowledge about the solution is inaccurate or nonexistent. SS is followed by SSNS and NSD. Moreover, the precision of CFM clearly worsened now that f∗ is not provided, and it only beats OP and CS. Furthermore, CS failed for all the problems and OP for almost all of them.

Regarding the number of function evaluations performed to find fmin when f∗ is provided, fig. 2 shows CFM was the most successful for 80% of the problems, followed by OP (20%) and FP (10%). These algorithms were tied on some of the problems, hence the percentages sum to more than 100% at τ = 0. Observing ρ_s(τ) at the rightmost abscissa, OP solves 100% of the problems while SS solves 90%, when more function evaluations are allowed. Again, SS showed the best performance in terms of function evaluations among the algorithms not requiring previous knowledge of f∗.
When fb is provided instead of f∗, fig. 2 still shows CFM as the most efficient solver in terms of function evaluations for 60% of the problems. It is followed by FP, SS and OP. However, observing ρ_s(τ) at the rightmost abscissa, SS solves 90% of the problems when more function evaluations are allowed, while CFM and FP solve at most 60% of the problems. It is worth mentioning that SS may execute a line search if condition (5) is not satisfied, hence more function evaluations may be required per iteration. Nevertheless, SS seems to be the best option considering all factors. In relation to CPU time, fig. 3 shows almost exactly the same behavior seen for function evaluations, which was expected since evaluating the function drives the computational effort of these algorithms.
After analyzing all performance profiles, not only does SS show strong performance for all three criteria when previous knowledge about the solution f∗ is nonexistent or inaccurate, but it is also capable of solving more problems. As a reminder, solving a problem means finding a solution with error less than δ = 10^−3.

3.5. Results for Second Set of Problems
In figs. 4, 5, and 6, we present performance profiles for the second set of problems in the same fashion as in section 3.4. In this numerical experimentation, the solvers CS, NSD, and SSNS failed for all the problems. Therefore, we excluded these solvers from all performance profiles, keeping only the solvers CFM, OP, FP, and SS. This clearly shows SS's superiority when previous knowledge about the solution is nonexistent.
Concerning the error when providing f∗, fig. 4 shows that CFM and SS found the most precise solution for roughly 40% of the problems each, showing similar performance and clearly outperforming the rest of the algorithms. However, as more error is tolerated, SS starts outperforming CFM around τ = 1. Moreover, SS is capable of solving 100% of the problems around midway between τ = 1 and τ = 2, while CFM reaches that point around midway between τ = 2 and τ = 3. On the other hand, the superiority of SS over all other solvers is clear when fb is provided instead of f∗, since SS finds the most precise solution for 100% of the problems. Furthermore, all other solvers reach 100% only when τ is quite high, almost at τ = 20, which means they solve all problems only when the tolerated error is several times the error yielded by SS. As stated in section 2.1, we expected SS to perform successfully on this specific type of problem.

Regarding the number of function evaluations required to find fmin, in both cases, i.e., when either f∗ or fb is provided, fig. 5 shows that CFM is the most efficient solver, followed by OP, while SS is in third place. The same observation is valid for the performance profiles on CPU time shown in fig. 6. However, notice that the range of τ on the abscissa is relatively short, which means the factor of difference is quite low.
Nevertheless, although CFM and OP find their best solution sooner, if previous knowledge about the solution is inaccurate then the solution found by SS is much more precise, as shown in fig. 4; notice the large range of τ on the abscissa in that chart.

[Figure 1: Error for first set of problems. Panels: "Performance Profile: Error using f*" and "Performance Profile: Error using fb".]

[Figure 2: # of function evaluations for first set of problems. Panels: "Performance Profile: Function Evaluations using f*" and "Performance Profile: Function Evaluations using fb".]
[Figure 3: CPU time for first set of problems. Panels: "Performance Profile: CPU time using f*" and "Performance Profile: CPU time using fb".]
[Figure 4: Error for the second set of problems. Panels: "Performance Profile: Error using f*" and "Performance Profile: Error using fb".]
[Figure 5: # of function evaluations for the second set of problems. Panels: "Performance Profile: Function Evaluations using f*" and "Performance Profile: Function Evaluations using fb".]

[Figure 6: CPU time for the second set of problems. Panels: "Performance Profile: CPU time using f*" and "Performance Profile: CPU time using fb".]

4. Final Remarks

In this paper, we developed the Spectral-Step Subgradient (SS) method, whose step size, the spectral step, does not require previous knowledge of the optimal value. The structure of each iteration of SS is very attractive because of its simplicity and low memory requirements. Furthermore, the method inherits the convergence results of the MSPS, as explained in section 2.2.
We presented numerical results for both recognized nonsmooth test problems and large-scale quadratic test problems. Experimental results were analyzed using performance profiles to facilitate the comparison, weighing criteria such as precision and computational cost for both sets of problems. The numerical experimentation showed the superiority of SS on both sets of problems when compared to classical subgradient methods that do not require previous knowledge of the optimal value.
When compared to methods requiring previous knowledge of the optimal value, SS competed closely and showed its superiority when such knowledge is imprecise and a certain level of precision is required of the solution found. Nevertheless, SS may require slightly more computational effort in that case.
Acknowledgments. The authors thank Hugo Aponte and Marcos Raydan for their constructive and comprehensive suggestions. We also thank Prof. Napsu Karmitsa for making the test problems available on her website, Prof. Stephen Boyd for making the Matlab implementation of subgradient methods publicly available, and the two anonymous referees whose comments helped us improve the quality of this paper.
References
[1] J. Barzilai and J. M. Borwein, Two point step size gradient methods, IMA Journal of Numerical Analysis, 8, pp. 141-148, 1988.
[2] E. G. Birgin, J. M. Martínez, and M. Raydan, Spectral Projected Gradient Methods, in Encyclopedia of Optimization, Second Ed., Editors: C. A. Floudas and P. M. Pardalos, Part 19, pp. 3652-3659, 2009.

[3] E. G. Birgin, J. M. Martínez, and M. Raydan, Spectral Projected Gradient Methods: Review and Perspectives, Journal of Statistical Software, 60(3), pp. 1-21, 2014.

[4] A. Crema, M. Loreto, and M. Raydan, Spectral projected subgradient with a momentum term for the Lagrangean dual approach, Computers and Operations Research, 34, pp. 3174-3186, 2007.
[5] P. Camerini, L. Fratta, and F. Maffioli, On improving relaxation methods by modifying gradient techniques, Math. Programming Study, 3, pp. 26-34, 1975.

[6] R. Fletcher, On the Barzilai-Borwein method, in L. Qi, K. Teo, and X. Yang (eds.), Optimization and Control with Applications, Series in Applied Optimization, vol. 96, pp. 235-256, 2005.

[7] L. Grippo, F. Lampariello, and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis, 23, pp. 707-716, 1986.
[8] N. Karmitsa, Test Problems for Large-Scale Nonsmooth Minimization, Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing, No. B. 4, University of Jyväskylä, 2007.

[9] N. Karmitsa, http://napsu.karmitsa.fi/testproblems/

[10] W. La Cruz, J. M. Martínez, and M. Raydan, Spectral residual method without gradient information for solving large-scale nonlinear systems of equations, Mathematics of Computation, 75, pp. 1429-1448, 2006.

[11] M. Loreto and A. Crema, Convergence analysis for the modified spectral projected subgradient method, Optimization Letters, 9, pp. 915-929, 2015.
[12] M. Loreto, S. Clapp, Ch. Cratty, and B. Page, Modified Spectral Projected Subgradient Method: Convergence Analysis and Momentum Parameter Heuristics, Bulletin of Computational Applied Mathematics (CompAMa), 4, no. 2, pp. 28-54, 2016.

[13] M. Loreto, H. Aponte, D. Cores, and M. Raydan, Nonsmooth Spectral Gradient Methods for Unconstrained Optimization, EURO Journal on Computational Optimization, 5, pp. 529-553, 2017.

[14] B. T. Polyak, A general method of solving extremum problems, Soviet Mathematics Doklady, 8, pp. 593-597, 1967.

[15] M. Raydan, On the Barzilai and Borwein choice of steplength for the gradient method, IMA Journal of Numerical Analysis, 13, pp. 321-326, 1993.

[16] M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem, SIAM Journal on Optimization, 7, pp. 26-33, 1997.

[17] A. Friedlander, J. M. Martínez, B. Molina, and M. Raydan, Gradient Method with Retards and Generalizations, SIAM Journal on Numerical Analysis, 36, pp. 275-289, 1999.

[18] N. Z. Shor, Minimization Methods for Non-differentiable Functions, Springer Series in Computational Mathematics, 1985.

[19] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 91, pp. 201-213, 2002.

[20] S. Boyd, http://stanford.edu/class/ee364b/lectures.html
[21] B. Polyak, Introduction to Optimization, Optimization Software, Inc., 1987.