Almost Robust Discrete Optimization

European Journal of Operational Research 276 (2019) 451–465

Opher Baron a, Oded Berman a, Mohammad M. Fazel-Zarandi b, Vahid Roshanaei a,∗

a Rotman School of Management, University of Toronto, Toronto, Ontario M5S 3E6, Canada
b Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

∗ Corresponding author. https://doi.org/10.1016/j.ejor.2019.01.043

Article history: Received 15 May 2018; Accepted 18 January 2019; Available online 22 January 2019

Keywords: Combinatorial optimization; Decision making under uncertainty; Robust/stochastic discrete optimization; Decomposition; Almost robust optimization; Binary variables

Abstract

The main goal of this paper is to present a simple and tractable methodology for incorporating data uncertainty into optimization models with binary variables. We introduce Almost Robust Discrete Optimization (ARDO), which extends the Integrated Chance-Constrained approach, developed for linear programs, to binary integer variables. Both models trade off the objective function value against robustness and find optimal solutions that are almost robust (feasible under most realizations). These models are attractive due to their simplicity, their ability to capture dependency among uncertain parameters, and their incorporation of the decision maker's attitude towards risk by controlling the degree of conservatism of the optimal solution. To solve the ARDO model efficiently, we decompose it into a deterministic master problem and a single subproblem that checks the master problem solution under different realizations and generates cuts if needed. In contrast to other robust optimization models that are less tractable with binary decision variables, we demonstrate that with these cuts ARDO remains tractable. Computational experiments for the capacitated single-source facility location problem with uncertain demand at each node demonstrate the effectiveness of our approach.

1. Introduction

Traditional optimization models often assume that the parameters and data are known and certain. In practice, however, such assumptions are usually unrealistic, e.g., the data may be noisy or incomplete. A main reason for ignoring uncertainty is that incorporating it into optimization models often makes them intractable. The problem of intractability is amplified when the decision variables are binary; such requirements on decision variables are ubiquitous in practice, especially in operational and planning models, e.g., logistics problems including facility location and vehicle routing, and manufacturing problems such as production planning and scheduling.

A popular technique to capture data uncertainty in linear and mixed-integer linear programs is to use scenarios (Dembo, 1992; Escudero, Kamesam, King, & Wets, 1993). Two of the main approaches used to incorporate uncertainty are stochastic programming (see Ruszczyński & Shapiro, 2003 and references therein) and chance-constrained programming (Charnes & Cooper, 1959). In stochastic programming (usually modeled in two stages), the decision maker optimizes the expected value of an objective function that involves random parameters. Ahmed and Shapiro (2002) proposed a sample average approximation method for stochastic programs where the recourse variables are integers. In chance-constrained programming, the decision maker wants to ensure that the constraints hold with at least a specified probability. Both approaches are typically very challenging to solve, especially when the decision variables are discrete.

Another method to incorporate uncertainty into optimization problems is robust optimization (RO). Unlike the two methods mentioned above, which use probabilistic information, in RO the uncertainty is set-based, requiring no specific information about the distribution functions. In such models, the decision maker seeks solutions that are feasible for any realization of the uncertainty in a given set (see Ben-Tal, Ghaoui, & Nemirovski, 2009 and Bertsimas & Sim, 2004a). The majority of the RO literature focuses on convex optimization, and limited work has been done on discrete optimization problems; see Atamtürk (2006), Averbakh (2001), Baron, Milner, and Naseraldin (2011), Bertsimas and Sim (2003, 2004b), and Kouvelis and Yu (1997) for exceptions.

Soft-constrained RO allows violations of a subset of uncertain constraints at a penalty. The application of soft-constrained RO to linear programming models can be traced back to Mulvey, Vanderbei, and Zenios (1995). Their approach integrates goal programming with a scenario-based description of the problem data. It seeks to generate solutions that are less sensitive to the specific realization of the data.

Table 1
Comparison of ARDO with other approaches.

Model                     Constraint type   Uncertainty type          Risk type
Stochastic programming    Soft              Probabilistic             Quantitative
Chance constraints        Soft/Hard         Probabilistic             Qualitative
Robust optimization       Hard              Set-based                 Qualitative
ARO (ICC)                 Soft/Hard         Probabilistic/set-based   Quantitative
ARDO                      Soft/Hard         Probabilistic/set-based   Quantitative

They show that by varying the penalty coefficient in the objective function, different degrees of model robustness (feasibility under most scenarios) can be achieved. The main drawback of their approach is the requirement that the random variables be distributed symmetrically around the mean. Ben-Tal and Nemirovski (1999) compared the solutions of their hard-constrained RO model with those of the soft-constrained RO of Mulvey et al. (1995) on a portfolio selection problem; their results show a higher average portfolio yield than that of Mulvey et al. (1995). Takriti and Ahmed (2004) consider a non-decreasing piecewise-quadratic penalty function for the RO model of Mulvey et al. (1995) that only penalizes violations exceeding the right-hand side of constraints, and Pınar (2007) uses a piecewise-linear penalty function and applies it to a multi-period portfolio selection problem.

Integrated chance constraints (ICC) (Haneveld & van der Vlerk, 2006; Klein Haneveld, 1986) are another soft-constrained approach to optimization under uncertainty. The ICC integrates the concept of stochastic programming with that of chance-constrained optimization for problems with continuous decision variables. Depending on the risk tolerance of the decision maker, the ICC allows violation of some scenarios at no cost. We prefer to name this approach Almost Robust Optimization (ARO) because (i) it relaxes the standard notion of robustness and permits a controlled violation of constraints based on a measure of their violation (such as the expected amount of violation), and (ii) its stochastic modeling concept resembles soft-constrained robust approaches (Ben-Tal, Bertsimas, & Brown, 2010; Ben-Tal, Boyd, & Nemirovski, 2006).

Our Almost Robust Discrete Optimization (ARDO) shows that the stochastic concept used in the ARO can be applied in the presence of discrete (binary) decision variables, i.e., that ARDO is tractable. We also introduce the Robustness Index (RI) to incorporate the decision maker's risk tolerance into the decision-making process. ARDO combines ideas from stochastic programming, chance-constrained programming, and robust optimization. Table 1 compares the existing stochastic optimization approaches in the literature according to the types of constraint (soft/hard), uncertainty (probabilistic/set-based), and risk (quantitative/qualitative). Quantitative risk measures the amount of constraint violation, whereas qualitative risk measures the probability of violation.

ARDO is attractive for several reasons: (i) it is simple; (ii) it permits dependence among uncertain parameters; (iii) it incorporates the decision maker's attitude towards risk by controlling the degree of conservatism of the optimal solution; (iv) it suits a wide spectrum of available probabilistic information: full, partial, or none; and (v) with the cuts we develop, it is computationally tractable in the presence of binary variables.

To solve ARDO efficiently, we develop an approach that decomposes it into a deterministic master problem and a simple subproblem that checks the master problem solution under different realizations and generates cuts if needed. An important upside of our approach is that, since the master problem is a deterministic discrete optimization problem, existing approaches for solving such problems can be incorporated into the decomposition model to speed up the solution procedure.

The decomposition approach also overcomes the main drawbacks of the integer L-shaped method (Laporte & Louveaux, 1993), which is the main tool used to solve stochastic integer programs (IPs). Specifically, unlike the L-shaped method, in which the subproblems are IPs and are hard to solve, the subproblems in our model are very simple; furthermore, unlike the L-shaped method, where the cuts are generated from the relaxed LP dual of the subproblems (which can drastically decrease efficiency), the cuts in our approach are effective for IPs. We prove the finite convergence of our decomposition method.

We illustrate both the modeling and the decomposition approach on the capacitated facility location problem with uncertain demands. We show that our approach is very effective: we find optimal solutions up to three orders of magnitude faster than the original IP formulation. Furthermore, when the data uncertainty is represented by a set of scenarios, unlike other existing approaches that scale exponentially with the number of scenarios, this number does not affect the computability of the decomposition approach. (Note that Fazel-Zarandi, Berman, and Beck (2013) applied a few concepts of the proposed approach to a specific stochastic location-allocation problem, which motivated us to develop the general methodology presented here.)

Contributions. The contributions of this paper are threefold. First, we extend the concept of ARO to include binary variables and demonstrate that the ARDO framework remains tractable in the presence of binary decision variables. We also discuss how the ARDO model can be extended to distributionally robust optimization. Second, we define the Robustness Index, which allows the decision maker to adjust ARDO to her risk preferences. We further demonstrate the potential advantages of being almost robust as opposed to fully robust. Third, we develop a tractable decomposition-based approach to solve the ARDO model with discrete (binary) decision variables. We show, by proving the validity of our integer cuts, that the decomposition converges to global optimality in finitely many steps.

The paper is organized as follows. In Section 2, we review prior hard- and soft-constrained robust optimization models, present the ARDO framework and formulation, and compare ARDO with the existing models in the literature. The details of the decomposition approach are described in Section 3. Section 4 extends the model to a general uncertainty structure. An application of ARDO and our decomposition methodology is provided in Section 5. Computational results and discussion are presented in Section 6, and Section 7 concludes the paper. Proofs and extensions are included in the Appendices.

2. Relevant optimization models

In Section 2.1, we present prior hard-constrained robust formulations in which uncertain data is incorporated into a deterministic optimization model to make its solution robust against uncertainty, and we discuss their advantages and disadvantages. Then, in Section 2.2, we present the two soft-constrained models most relevant to this study: Globalized Robust Optimization (GRO) and ARO. We discuss the Robustness Index used in the ARDO model to investigate the trade-off between the changes in the objective function under various allowable risk values determined by the decision maker.


In Section 2.3, we elaborate on the differences between the existing stochastic approaches and ARDO. We first present the deterministic model and then provide its robust and worst-case counterparts. Consider the deterministic optimization problem:

minimize    z = c^T x                                   (deterministic model)
subject to  Ax ≤ b
            x ∈ X,

where c and x are N × 1 vectors, b is an (M + J) × 1 vector, A is an (M + J) × N matrix, and X is {R+}^N if the decision variables are continuous and {0, 1}^N if they are discrete.

The classical paradigm in mathematical programming assumes that the input data c, A, and b are known. In practice, however, portions of this data may be uncertain. Without loss of generality, assume that the data uncertainty only affects A or a portion of it, and that both c and b are deterministic (see Ben-Tal et al., 2009); this can be achieved by incorporating c and b into the uncertain partition of A, adding the constraint z − c^T x ≥ 0, introducing an auxiliary variable x_{N+1} with 1 ≤ x_{N+1} ≤ 1 (i.e., fixed at 1), and transforming the constraints to Ax − b x_{N+1} ≤ 0. We partition A into a deterministic M × N matrix D and an uncertain J × N matrix U. Similarly, b is partitioned into an M × 1 vector b^D corresponding to D and a J × 1 vector b^U corresponding to U (note that b^U remains deterministic without loss of generality).

For ease of exposition, throughout most of the paper we assume that the uncertain data is represented by a set of scenarios S = {1, 2, ..., |S|}, with probability p_s for each s ∈ S, where the decision maker may or may not have full information about the probabilities (in Section 4, we demonstrate our approach for cases where the uncertainty has other structures). Each scenario is associated with a particular realization of the uncertainty matrix, U^s, with corresponding entries u_{ij}^s. Scenarios are used since they are powerful, convenient, and practical in representing uncertainty, and they allow the decision maker flexibility in the specification of uncertainty (e.g., uncertain data may be highly correlated, sampled from simulation, and/or obtained from historical data).

2.1. Hard-constrained robust optimization models

For completeness, we present robust optimization and worst-case hard-constrained optimization and discuss their advantages and disadvantages.

2.1.1. Robust optimization model

Consider the following set-based robust formulation proposed by Ben-Tal and Nemirovski (1998), where the uncertainty set is the convex hull of the scenario matrices U^s:

minimize    z = c^T x                                   (robust model)
subject to  Dx ≤ b^D
            U^s x ≤ b^U        ∀ s ∈ S
            x ∈ X.

The robust model ensures that the optimal solution does not violate any of the constraints under any possible scenario realization. This formulation has three main drawbacks. First, since the constraints must hold under all scenarios, it may no longer have a feasible solution. Second, even when feasible solutions exist, they may be too conservative, i.e., the optimal solution may be very costly. This conservatism is magnified when a scenario with a low probability has a large effect on the optimal outcome.


Specifically, a major flaw of the robust model is that it completely ignores any available information regarding the probabilities of the different scenarios (note, though, that robust optimization was developed to be effective when no specific probabilistic knowledge is available). Finally, the size of the robust formulation grows with the number of scenarios, and therefore this formulation typically becomes less tractable as the number of scenarios increases. This RO model is suitable for cases where the decision maker cannot tolerate any amount of risk.

2.1.2. Worst-case model

Let each element of the matrix Û equal the maximum value of the corresponding element over all scenarios, i.e., û_{ij} = max_{s∈S} u_{ij}^s. The worst-case formulation, first proposed by Soyster (1973), is

minimize    z = c^T x                                   (worst-case model)
subject to  Dx ≤ b^D
            Û x ≤ b^U
            x ∈ X.

This formulation has the first two drawbacks of the robust model (its solution is even more conservative). Its main upside is its smaller size, which makes it more tractable (similar to the deterministic problem, its complexity is independent of the number of scenarios). This hard-constrained robust model is conceivably advantageous for very high-risk applications and for decision makers who are absolutely unwilling to take any risk.

2.2. Soft-constrained optimization models

In this section, we discuss two soft-constrained RO models that have been developed for linear optimization problems: Globalized Robust Optimization (GRO) (Ben-Tal et al., 2006) and the ICC, which we call ARO (Haneveld & van der Vlerk, 2006). We also provide a new interpretation of the ARO, demonstrating that it has a flexible and intuitive structure that is accessible to researchers and practitioners. We then extend the ARO to the ARDO, i.e., to include binary integer variables.

2.2.1. Globalized Robust Optimization (GRO)

The GRO model extends RO by also considering realizations that are outside of the "regular" uncertainty set. For such realizations in the "irregular" uncertainty set, the GRO relaxes the constraints for each realization. We note that Ben-Tal et al. (2010) studied an application of GRO with distribution ambiguity. The GRO paradigm was initially called the "comprehensive robust counterpart", and Ben-Tal et al. (2010) use the expression "soft robust model" for an analogous model based on GRO.

Let Z and Z̄ denote the sets of realizations within the convex hulls of the regular and irregular uncertainty sets, respectively; let s ∈ S be the scenarios in Z (as in the RO formulation), and s̄ ∈ S̄ be the scenarios in Z̄ (that is, U^s̄ ∈ Z̄ for any s̄ ∈ S̄). As in RO, for all possible realizations s ∈ S, GRO requires a solution that satisfies all constraints. However, for all possible realizations s̄ ∈ S̄, GRO allows the constraints to be violated in proportion to the distance of each realization from the regular uncertainty set Z. Let dist(U^s̄, Z) be a distance metric between a realization U^s̄ ∈ Z̄ and the regular uncertainty set Z, and let α ≥ 0 be the global sensitivity (proportion) coefficient.


Then, the GRO formulation adds the following constraints to the robust model:

U^s̄ x ≤ b^U + α · dist(U^s̄, Z)        ∀ s̄ ∈ S̄.

The GRO formulation allows the decision maker to specify different attitudes towards risk through three joint decisions: the extent of the regular and irregular uncertainty sets, the global sensitivity parameter α, and the distance metric. Its main drawback is that, similar to RO, it ignores probabilistic information. Moreover, while it relaxes the constraints for realizations within the irregular uncertainty set, these relaxed constraints are still "hard", so GRO's feasibility and conservatism face criticism similar to that of RO. Finally, controlling the risk by a combination of three decisions may be non-intuitive for managers.

2.2.2. The almost robust optimization model

We focus our discussion on what Haneveld and van der Vlerk (2006) refer to as the individual ICC of the first type (see their Section 2.2); recall that we refer to this model as ARO. The ARO trades off the solution value with robustness to find solutions that are almost robust (feasible under most realizations). Unlike the basic set-based robust formulation, which does not allow the optimal solution to violate any of the constraints under any realization, in ARO infeasibility of the uncertain constraints under some of the scenarios may be allowed at a penalty. Consider the jth uncertain constraint, U_j^s x ≤ b_j^U. For every j = 1, ..., J, the amount of infeasibility of solution x in scenario s can be measured by

Q_j^s(x) = max{0, U_j^s x − b_j^U} ≡ (U_j^s x − b_j^U)^+.        (1)

ARO allows other measures of infeasibility, such as the absolute value of infeasibility, Q_j^s(x) = |U_j^s x − b_j^U|, which is applicable to equality constraints. For ease of exposition, we focus on the infeasibility measure (1).

We now introduce penalty functions to penalize uncertain constraint violations. Three alternative penalty functions, based on the available probabilistic information, are the maximum penalty, the expected penalty, and the distributionally robust penalty. We use the first two penalty functions to explain ARO:

1. Max penalty function: this penalty is applicable when the decision maker does not have knowledge of the probabilities p_s (a central assumption in RO):

Q_j(x) = max_{s∈S} α_j^s Q_j^s(x),        (2)

where α_j^s is the per-unit constraint violation penalty under scenario s. This penalty function determines the maximum (worst-case) penalty over all scenarios. It is most suitable for applications where high risks are involved (Mulvey et al., 1995), but is unsuitable for low- and medium-risk applications as it is very conservative.

2. Expected penalty function: this penalty is suitable when the decision maker has full probabilistic knowledge of the scenarios:

Q_j(x) = Σ_{s∈S} p_s α_j^s Q_j^s(x).        (3)

The primary advantage of the expected penalty function over the quadratic mean-variance approach of Mulvey et al. (1995) is its linearity and its ability to capture asymmetries in the distribution of the random variable. The Robustness Index we define in (5) makes this penalty function appropriate for all levels of risk. The ARO (Haneveld & van der Vlerk, 2006) uses this penalty function.
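To make these definitions concrete, the following minimal Python sketch (our illustration, not the authors' code; the data and names U_s, p, and alpha are made up) evaluates the infeasibility measure (1) and the penalty functions (2) and (3) for a single uncertain constraint j:

```python
import numpy as np

# Three scenarios of the uncertain row U_j (N = 2 variables), with probabilities.
U_s = np.array([[4.0, 2.0],
                [5.0, 1.0],
                [6.0, 3.0]])      # one row per scenario s
p = np.array([0.5, 0.3, 0.2])     # scenario probabilities p_s
alpha = np.ones(3)                # per-unit violation penalties alpha_j^s
b_U = 6.0                         # right-hand side b_j^U

def violation(x):
    """Q_j^s(x) = (U_j^s x - b_j^U)^+ for every scenario s, as in (1)."""
    return np.maximum(U_s @ x - b_U, 0.0)

def max_penalty(x):
    """Worst-case penalty (2): max over scenarios of alpha_j^s Q_j^s(x)."""
    return float(np.max(alpha * violation(x)))

def expected_penalty(x):
    """Expected penalty (3): sum over scenarios of p_s alpha_j^s Q_j^s(x)."""
    return float(p @ (alpha * violation(x)))

x = np.array([1.0, 1.0])          # a candidate binary solution
print(violation(x))               # [0. 0. 3.]
print(max_penalty(x))             # 3.0
print(expected_penalty(x))        # 0.6
```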

We further define Q(x) = [Q_1(x), Q_2(x), ..., Q_J(x)]^T. Let λ_j denote the maximum allowable risk specified by the decision maker for violating constraint j, so that the J × 1 vector λ is an upper bound on the expected amount of violation of each constraint. The vector λ incorporates the decision maker's attitude towards risk: a decrease in the values of λ implies the decision maker is less tolerant and more conservative. Later in this section, we introduce the Robustness Index, which the decision maker can use to determine an appropriate vector λ based on his/her risk attitude. Using the above notation, the ARO formulation is

minimize    c^T x                                   (ARO)
subject to  Dx ≤ b^D
            Q(x) ≤ λ
            x ≥ 0.

As can be seen, this model allows the decision maker to control the magnitude of the penalties (violations) by changing λ. Note that the robust optimization model is a special instance of the ARO with λ_j = 0 ∀ j, i.e., when infeasibility of any constraint under any scenario is not allowed. In general, other objective functions can also be considered for ARO, e.g., the mean-value objective, min Σ_{s∈S} p_s c^{sT} x ≡ min c̄^T x, where c̄ = Σ_{s∈S} p_s c^s, and the distributionally robust objective, min max_{f∈F} Σ_{s∈S} p_s^f c^{sT} x.

2.2.3. Almost robust discrete optimization

The ARDO model computes violations using the same procedure as the ARO; the only difference is that in ARDO all the decision variables are binary. For the sake of completeness, the ARDO model is

minimize    c^T x                                   (ARDO)
subject to  Dx ≤ b^D
            Q(x) ≤ λ
            x ∈ {0, 1}^N.

We discuss in Section 3 how a tractable decomposition-based methodology can be designed for ARDO. In addition to changing the decision variables, we discuss how ARDO can be extended to a more general penalty function. To this end, we describe the adaptations required for ARDO to handle the distributionally robust penalty function.

Distributionally robust penalty function. This penalty is applicable when the decision maker has limited information regarding the probability distribution of the scenarios, i.e., s/he knows only that the probability distribution of the scenarios belongs to a known set of distributions, F. Let p_s^f be the probability of scenario s under distribution f ∈ F, and let Q_j^f(x) = Σ_{s∈S} p_s^f α_j^s Q_j^s(x) be the expected penalty of uncertain constraint j under distribution f. This penalty function has the form

Q_j(x) = max_{f∈F} Q_j^f(x).        (4)

This penalty function determines the worst-case expected penalty over the distributions in F and thus enforces robustness with respect to a set of possible distributions. It is suited for applications with extremely ambiguous sources of uncertainty and for highly risk-averse decision makers.

For ease of exposition, we focus on the expected penalty function (3) and assume α_j^s = 1; different values of α_j^s can easily be incorporated into the solution methodology and the proofs. In Appendix B, we extend the results to the


max and distributionally robust penalty functions. Furthermore, in Section 4, we extend the summation used in (3) to an expectation with respect to any well-defined random variable.

Remark 1. Note that we can also construct penalty functions that mix the three functions above. For example, consider a setting in which the decision maker knows the probabilities of only some of the scenarios S' ⊂ S (where Σ_{s∈S'} p_s < 1), but does not know the probabilities of the remaining scenarios S\S'. A potential penalty function in this case is a mixture of (2) and (3):

Q_j(x) = Σ_{s∈S'} p_s Q_j^s(x) + (1 − Σ_{s∈S'} p_s) max_{s∈S\S'} Q_j^s(x).

We now discuss how to determine the best value for the penalty limit λ. We address this question by introducing the Robustness Index (RI), which can be used to investigate the trade-off between the changes in the objective function and in the penalty function as the penalty limits λ change. The RI enables the decision maker to determine appropriate penalty limits given his/her risk preference. To formally introduce the RI, let x*_λ denote the optimal solution of ARDO given the penalty limits λ, and let x*_0 be the optimal solution when no violation is allowed. Recall that the objective function value of the problem where infeasibility is not allowed equals the objective function value of the robust optimization model. Let 1_{J×1} denote a J × 1 vector of 1s. Then:

Robustness Index = improvement in objective function value / increase in total penalty
                 = (c^T x*_0 − c^T x*_λ) / (Q(x*_λ)^T 1_{J×1}).        (5)

The RI measures the relative improvement in the objective function value, c^T x*_0 − c^T x*_λ (the difference between the objective value under penalty limits λ and the objective value where infeasibility is not allowed, i.e., λ = 0), over the increase in the total penalty, Q(x*_λ)^T 1_{J×1}. It is important to point out that when different constraints have different units or scales, prior to using the RI the decision maker must first normalize the constraints to the same magnitude; this can be done by adjusting the values of α_j^s.

Similar to "the price of robustness" (Bertsimas & Sim, 2004a), we expect that allowing some infeasibility while paying a penalty should improve the objective function. Indeed, our computational results in Section 6 show that the benefit of allowing a higher deviation, alongside the corresponding increase in penalty, becomes less pronounced as the allowed penalty increases; that is, we expect decreasing returns from allowing a higher penalty. As the RI in (5) measures the benefit of allowing a higher expected penalty, we expect it to be a convex decreasing function of the penalty limit λ. Our computational results in Section 6.2 agree with this intuition. Moreover, our results suggest that a small increase in the penalty limit is often all that is required to substantially improve the objective function.
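As a small illustration (our sketch; the array names and numbers are ours), the RI in (5) is a one-line computation once the two optimal solutions are known:

```python
import numpy as np

def robustness_index(c, x_star_0, x_star_lam, Q_at_x_star_lam):
    """RI (5): objective improvement over x*_0 divided by the total penalty at x*_lambda."""
    improvement = float(c @ x_star_0 - c @ x_star_lam)   # c^T x*_0 - c^T x*_lambda
    total_penalty = float(np.sum(Q_at_x_star_lam))       # Q(x*_lambda)^T 1
    return improvement / total_penalty

# Made-up values: relaxing the limits saves 50 in cost at a total expected
# violation of 2.5, giving RI = 20.
print(robustness_index(np.array([10.0, 40.0]),
                       np.array([1.0, 1.0]),             # x*_0
                       np.array([0.0, 0.0]),             # x*_lambda (illustrative only)
                       np.array([1.5, 1.0])))            # Q(x*_lambda)
```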


2.3. Comparison between ARDO and other stochastic approaches

We now elaborate on the differences between RO, stochastic programming, chance-constrained programming, ARO, and ARDO. This comparison serves as a guideline for decision makers as to which stochastic approach is better suited to different uncertain decision-making circumstances.

RO vs. ARDO: Both RO and ARDO seek solutions that are robust given an uncertainty set and can incorporate the decision maker's risk attitude (Ben-Tal et al., 2010; Bertsimas & Sim, 2004a; Natarajan, Pachamanova, & Sim, 2009). There are two main differences between the two approaches (e.g., Ben-Tal & Nemirovski, 1998; Soyster, 1973). First, unlike RO, ARDO can directly use probabilistic knowledge of the uncertain data (full, partial, or none). Second, in RO the constraints are hard, i.e., they must be satisfied for all possible realizations (an exception is Ben-Tal et al., 2006), while ARDO accommodates both hard and soft constraints, i.e., the decision maker can decide whether to allow infeasibility under some realizations.

GRO vs. ARDO: The GRO model (Ben-Tal et al., 2006) and the soft robust model (Ben-Tal et al., 2010) are extensions of RO. The main differences between ARDO and GRO are: (i) unlike in GRO, in ARDO the decision maker has full or partial probabilistic knowledge of the uncertain data; (ii) the reformulated constraints in GRO are still hard; and (iii) ARDO uses a single parameter to control the robustness of the solution, whereas GRO uses three such decisions. Thus, ARDO is simpler to operate and to communicate to managers.

Stochastic programming and chance-constrained programming vs. ARDO: The main similarity between these three models is that they use probabilistic knowledge of the uncertain parameters, but ARDO also applies in situations with limited probabilistic information. The main difference between stochastic programming and ARDO is that the former seeks to minimize the expected cost (disutility) of a decision made in the first stage, while the latter seeks solutions that are almost robust (feasible under most realizations). That is, while stochastic programming penalizes any constraint violation in the objective function via a recourse function, we seek solutions that are less sensitive to the uncertain data while allowing the decision maker to adjust the amount of constraint violation. Conceptually, there are three main differences between chance constraints and ARDO: (i) ARDO is more suitable for cases where the decision maker faces quantitative risk (where measuring the amount of constraint violation is important), whereas chance constraints suit cases with qualitative risk (where the probability of violation, rather than its amount, matters); (ii) as mentioned above, while chance constraints require probabilistic knowledge of the uncertain parameters, ARDO can be applied to settings with limited (or no) probabilistic information; and (iii) unlike chance constraints, which are in general intractable (one of the main motivations for the development of robust optimization is the intractability of chance constraints; Ben-Tal et al., 2009), ARDO preserves the convexity of the constraints and is thus much more tractable.

Remark 2. Note that while our framework is conceptually different from chance constraints, we can modify the penalty function so that the ARDO formulation coincides with chance-constraint models (Charnes & Cooper, 1959; Klein Haneveld, 1986). Specifically, we can define a penalty function Q'_j(x) = Σ_{s∈S} p_s I{Q_j^s(x) > 0}, where I{Q_j^s(x) > 0} = 1 if Q_j^s(x) > 0 and zero otherwise. Given the penalty function Q'_j(x), the penalty constraint is Q'_j(x) ≤ ε, where ε is a vector of probability limits. Note that such a penalty function is not convex, which, as mentioned above, leads to intractability.

ARO (ICC) vs. ARDO: The differences between ARDO and ARO are as follows. While ARO was developed for continuous variables (x ≥ 0), its tractability in the presence of binary variables had not been investigated. In contrast, as we demonstrate in Section 6, ARDO is tractable for optimization problems with binary variables, x ∈ {0, 1}^N. Given an upper bound λ_j on the penalty, ARO treats the amount of violation of each constraint j under each scenario s as a decision variable (y_s in Eq. (13) of Haneveld & van der Vlerk, 2006); ARDO computes these violations using the existing constraints of the model without adding new variables. To summarize, ARDO has three benefits: (i) its formulation is simple and easy to explain to managers; (ii) the application of ARDO to binary programs (BPs) remains a BP and hence is tractable; and (iii) it allows managers to use the Robustness Index (5) to determine the best trade-off between infeasibility and over-conservatism.


Fig. 1. Pseudo code of the decomposition approach.

3. The decomposition approach

The size of ARDO grows not only with the number of binary variables but also with the number of scenarios; the latter mostly inflates the number of constraints. Thus, we develop a solution methodology to efficiently solve ARDO to optimality or near-optimality. To this end, we decompose ARDO into a deterministic master problem and simple stochastic subproblems. As in most decomposition methods (see, for example, Benders, 1962; Blankenship & Falk, 1976; Hooker & Ottosson, 2003), we start by solving a relaxation of the original problem (the master problem) and iteratively add cuts (derived from solving the subproblem) until a feasible (optimal) solution is found. Specifically, in the master problem the uncertain parameters are replaced with their expected values, so the master problem is similar to the deterministic problem. (For ARDO with the max penalty function (2), the minimum values of the uncertain parameters over all scenarios must be used in the master problem instead of the expected values; see Appendix B for details.) The subproblem calculates the penalty of the different scenarios given the master problem solution and generates cuts when this solution is infeasible for ARDO (i.e., when the master problem solution violates at least one penalty constraint). The cuts, when added to the master problem, eliminate at least the current infeasible master solution (as we show later, each cut can remove many infeasible solutions). The decomposition approach iterates between solving the master problem and the subproblem(s) until the optimal solution of ARDO is found. The implementation details of our decomposition approach are presented in Algorithm 1.

To generate an optimal solution for ARDO, our decomposition must satisfy the following two conditions:

1. Condition 1 (Lower Bound): In any iteration, the master problem solution must be a lower bound on that of ARDO.
2. Condition 2 (Validity of the Cuts): A cut is valid if it (i) excludes at least the current master problem solution, which is not feasible for ARDO, and (ii) does not remove any feasible solution of ARDO.

Condition 1 ensures that any feasible solution of ARDO is also feasible for the master problem. Condition 2 ensures that the decomposition does not get stuck in a loop; since the decision variables have the finite domain {0, 1}, this condition guarantees that the decomposition converges in a finite number of steps. Together, the two conditions guarantee optimality: because the cuts remove infeasible solutions of ARDO (condition 2(i)) but never remove feasible solutions (condition 2(ii)), and since all feasible solutions of ARDO are also feasible for the master problem (condition 1), the decomposition is guaranteed to find an optimal solution of ARDO if one exists. A sketch of the resulting loop is given below.
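The following self-contained Python sketch (a toy illustration with made-up data, not the paper's implementation; a brute-force enumeration stands in for the IP solver) runs this loop on a one-constraint ARDO instance, applying cut (6) from Section 3.2:

```python
import itertools
import numpy as np

c = np.array([-3.0, -2.0, -4.0])                   # objective coefficients (minimize c^T x)
U_s = [np.array([[2.0, 3.0, 1.0]]),                # one uncertain constraint (J = 1);
       np.array([[4.0, 1.0, 2.0]]),                # one 1 x N matrix U^s per scenario
       np.array([[3.0, 2.0, 5.0]])]
p = np.array([0.5, 0.3, 0.2])                      # scenario probabilities
b_U = np.array([4.0])
lam = np.array([0.5])                              # penalty limit lambda_1
U_bar = sum(ps * Us for ps, Us in zip(p, U_s))     # expected matrix used by the master

def expected_penalty(x):
    """Q(x) under the expected penalty function (3)."""
    return sum(ps * np.maximum(Us @ x - b_U, 0.0) for ps, Us in zip(p, U_s))

def solve_master(cuts):
    """Brute-force master problem: min c^T x s.t. U_bar x <= lambda + b^U and all cuts."""
    best = None
    for bits in itertools.product([0, 1], repeat=len(c)):
        x = np.array(bits, dtype=float)
        if np.any(U_bar @ x > b_U + lam) or any(not cut(x) for cut in cuts):
            continue
        if best is None or c @ x < c @ best:
            best = x
    return best

cuts = []
while True:
    xt = solve_master(cuts)
    if xt is None:
        print("ARDO is infeasible"); break
    Qt = expected_penalty(xt)
    if np.all(Qt <= lam):
        print("optimal:", xt, "expected penalty:", Qt); break
    # Cut (6): Q(x^t) - sum over violating scenarios of p_s U^s (x^t - x) <= lambda
    S_t = [s for s in range(len(p)) if (U_s[s] @ xt - b_U)[0] > 0]
    cuts.append(lambda x, Qt=Qt, S_t=S_t, xt=xt.copy():
                bool((Qt - sum(p[s] * (U_s[s] @ (xt - x)) for s in S_t) <= lam)[0]))
```

On this instance the master first proposes x = (0, 1, 1), the subproblem finds an expected penalty of 0.6 > 0.5, one cut is added, and the second master solve returns the ARDO optimum x = (0, 0, 1).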

3.1. Master problem

Let Ū = E_s[U^s] = [Σ_{s∈S} p_s U_1^s, ..., Σ_{s∈S} p_s U_J^s]^T be the J × N matrix in which each element equals the expectation of the corresponding element over all possible realizations, and let b̂ = λ + b^U be a J × 1 vector. The deterministic master problem formulation is:

minimize    c^T x                                   (master problem)
subject to  Dx ≤ b^D
            Ū x ≤ b̂
            Cuts
            x ∈ {0, 1}^N.

As stated earlier, cuts are constraints that are added to the master problem when its current solution is not feasible for ARDO. As can be seen, the master problem is equivalent to a deterministic formulation with an extra set of cuts; the cuts we introduce later are both linear and deterministic. As we will show in Theorem 1, the master problem solution is a lower bound on the solution of ARDO. This implies that if the master problem in iteration t has no feasible solution, then ARDO has no feasible solution either, and the decomposition terminates.
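Assembling the master problem's data is a short computation; the sketch below (illustrative arrays and names of our choosing) builds the expected matrix Ū and the relaxed right-hand side b̂:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])                       # scenario probabilities
U_scen = np.array([[[2.0, 3.0]],                    # U^s, shape (|S|, J, N) with J = 1
                   [[4.0, 1.0]],
                   [[3.0, 2.0]]])
b_U = np.array([4.0])
lam = np.array([0.5])                               # penalty limits lambda

U_bar = np.tensordot(p, U_scen, axes=1)             # E_s[U^s], a J x N matrix
b_hat = lam + b_U                                   # relaxed RHS of U_bar x <= b_hat
print(U_bar)                                        # [[2.8 2.2]]
print(b_hat)                                        # [4.5]
```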


3.2. Subproblem

The subproblem checks the master problem solution (if one exists) under the different realizations and creates cuts for the master problem when the current solution is not feasible for ARDO. These cuts prevent (at least) the current infeasible master problem solution from being considered in future iterations, without removing any feasible integer solution of the original ARDO. The subproblem comprises three steps: (i) penalty calculation, (ii) feasibility check, and (iii) cut generation. To discuss these steps, let x^t denote the optimal master problem solution in iteration t, and let x denote the solution in any iteration later than t.

Step 1 - Penalty calculation: For every scenario s ∈ S and uncertain constraint j = 1, ..., J, calculate Q_j^s(x^t). Then calculate the penalty of each uncertain constraint given the master solution, Q_j(x^t) = Σ_{s∈S} p_s Q_j^s(x^t), j = 1, ..., J.

Step 2 - Feasibility check: Check whether Q_j(x^t) ≤ λ_j for all j. The feasibility check has two possible outcomes:

• No violation: If the optimal solution of the master problem does not violate any penalty constraint, the decomposition terminates with the current solution, which is optimal.
• Violation: If the optimal master solution violates at least one penalty constraint, determine the set of penalty constraints violated in iteration t, denoted by V^t, and, for each uncertain constraint j ∈ V^t, the set of scenarios with a non-zero penalty, denoted by S_j^t = {s | Q_j^s(x^t) > 0}, and continue to Step 3.

Step 3 - Cut generation: Generate and add to the master problem the cuts

Q_j(x^t) − Σ_{s∈S_j^t} p_s U_j^s (x^t − x) ≤ λ_j,        j ∈ V^t.        (6)

As stated earlier, the cuts ensure that at least the current infeasible master solution is eliminated: if x = x^t, the second term on the left-hand side equals zero, forcing the cut to be violated (as Q_j(x^t) > λ_j). This cut was introduced in Haneveld and van der Vlerk (2006) for continuous optimization problems; a main goal of our paper is to demonstrate that it is effective for binary decision variables as well.

Theorem 1 (Finite convergence). If ARDO has an optimal solution, the decomposition approach finds this solution in a finite number of iterations.

Theorem 1 follows from (i) the decomposition satisfying Conditions 1 and 2, and (ii) the finiteness of the number of constraints. Note that, as in other decomposition models (see, e.g., Hooker & Ottosson, 2003 and Balas, Ceria, & Cornuéjols, 1993), an upper bound on the number of iterations is 2^N; in practice, the number of iterations is much smaller. Intuitively, the decomposition is effective because, rather than solving ARDO with all of its constraints (many of which are non-binding at optimality), it iteratively identifies a subset of binding constraints and solves the corresponding smaller optimization problem. This intuition is confirmed by our computational experiments, where the average number of iterations is about 3: a cut removes not only the current infeasible solution but possibly a whole subset of infeasible solutions.
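As a quick numeric illustration of the cut-validity argument above (our made-up data): at x = x^t the correction term in (6) vanishes and the cut fails, while penalty-feasible points satisfy it.

```python
import numpy as np

p = np.array([0.6, 0.4])                           # two scenarios
U = [np.array([3.0, 1.0]), np.array([1.0, 4.0])]   # rows U_j^s of one uncertain constraint
b, lam_j = 3.0, 0.2
xt = np.array([1.0, 1.0])                          # current (infeasible) master solution

Q = lambda x: sum(ps * max(float(Us @ x) - b, 0.0) for ps, Us in zip(p, U))
S_t = [s for s in range(2) if float(U[s] @ xt) - b > 0]   # scenarios with positive violation

def cut(x):                                        # cut (6) for this constraint
    return Q(xt) - sum(p[s] * float(U[s] @ (xt - x)) for s in S_t) <= lam_j

print(Q(xt) > lam_j)                  # True: x^t violates the penalty constraint (Q = 1.4)
print(cut(xt))                        # False: the cut excludes x^t
print(cut(np.array([1.0, 0.0])))      # True: a penalty-feasible point (Q = 0) survives
```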

4. General uncertainty structure

In this section, we extend ARDO to allow a more general uncertainty structure. Assume that the uncertainty matrix U has a multivariate cumulative distribution f (representing the joint distribution of the u_{ij}, ∀ i, j) with mean μ, covariance matrix Σ, and support Ω(U). Under such uncertainty, we extend the penalty function (3) to

Q_j(x) = E[(U_j x − b_j^U)^+] = ∫_{Ω(U)} (U_j x − b_j^U)^+ dF,

while penalty function (4) is extended to Q_j(x) = max_{f∈F} E_f[(U_j x − b_j^U)^+], where F denotes the set of probability distributions with mean μ and covariance matrix Σ, a set that is assumed to include the true distribution f. Given the appropriate penalty function, ARDO with a general uncertainty structure is given by:

minimize    c^T x
subject to  Dx ≤ b^D
            Q_j(x) ≤ λ_j        j = 1, ..., J
            x ∈ {0, 1}^N.

Note that when λ = 0 and Ω(U) is bounded, the generalized model is equivalent to the robust counterpart (Ben-Tal et al., 2009) in which the uncertain parameter belongs to the uncertainty set Ω(U). Due to the multidimensional integrals, which are computationally prohibitive when the dimension exceeds four (see, e.g., Chen & Sim, 2009), the generalized ARDO (even with continuous variables) may be computationally intractable. Therefore, it may be essential to find a tractable approximation for the penalty function. For each constraint j, let the vector Ū_j denote the mean of U_j and Σ_j denote the corresponding covariance matrix. The next lemma provides a distribution-free bound, B_j(x), on the penalty function.

Lemma 1 (Distribution-free upper bound). For any distribution, the following upper bound always holds (Popescu, 2007):

E_f[(U_j x − b_j^U)^+] ≤ (1/2)(Ū_j x − b_j^U) + (1/2)√(x^T Σ_j x + (Ū_j x − b_j^U)^2) = B_j(x),        ∀ f ∈ F.        (7)

The next result provides a tractable approximation of the generalized ARDO model.

Proposition 1. If the following second-order cone programming model has an optimal solution, it is a feasible solution of the generalized ARDO model with a general uncertainty structure:

minimize    c^T x                                   (Generalized ARDO)
subject to  Dx ≤ b^D
            B_j(x) ≤ λ_j        j = 1, ..., J
            x ∈ {0, 1}^N.

The result follows directly from Lemma 1: since Q_j(x) = E_f[(U_j x − b_j^U)^+] ≤ B_j(x), any solution x that satisfies B_j(x) ≤ λ_j also satisfies Q_j(x) ≤ λ_j, ∀ j. Since B_j(x) is an upper bound on Q_j(x), the optimal solution of the formulation in Proposition 1 is more conservative than that of ARDO. Note that replacing uncertain constraints with a computationally tractable safe approximation is at the heart of robust optimization (Ben-Tal et al., 2009).

Decomposition approach. The decomposition approach presented in Section 3 can be modified to solve the generalized ARDO with discrete variables. As before, we use the first-moment matrix Ū = E[U] in the master problem. Furthermore, the subproblem comprises the same steps as in Section 3: (i) penalty calculation: given the master problem solution x^t in iteration t, calculate Q_j(x^t) = E[(U_j x^t − b_j^U)^+]; (ii) feasibility check: check whether any penalty constraint is violated; (iii) cut generation: the following modified

cuts are generated and added to the master problem:

Q_j(x^t) − Ū_j (x^t − x) ≤ λ_j        ∀ j ∈ V^t.

Corollary 1. If the ARDO with general uncertainty structure has an optimal solution, the decomposition approach produces an optimal solution for it in a finite number of iterations.

Remark 3. Note that in the penalty calculation step of the decomposition, when calculating Q_j(x^t) is difficult, B_j(x^t) can be calculated instead. In this case, while the solution generated by the decomposition model is a feasible solution of the generalized ARDO, it may not be optimal. This follows from Lemma 1: while B_j(x^t) ≤ λ_j implies Q_j(x^t) ≤ λ_j, the reverse may not be true.
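Following Remark 3, the bound (7) is straightforward to evaluate; the sketch below uses illustrative first and second moments (our numbers, not from the paper):

```python
import numpy as np

U_bar_j = np.array([2.0, 3.0])                     # mean of the uncertain row U_j
Sigma_j = np.array([[0.5, 0.1],
                    [0.1, 0.8]])                   # covariance matrix Sigma_j
b_j = 4.0                                          # right-hand side b_j^U

def B(x):
    """Distribution-free bound (7) on E[(U_j x - b_j^U)^+]."""
    m = float(U_bar_j @ x - b_j)                   # mean violation U_bar_j x - b_j^U
    return 0.5 * m + 0.5 * np.sqrt(float(x @ Sigma_j @ x) + m ** 2)

x = np.array([1.0, 1.0])
print(B(x))   # ~1.29; holds for every distribution with these first two moments
```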

5. The capacitated facility location problem with uncertain demand

We present the deterministic model for the single-source capacitated facility location problem and develop its ARDO counterpart, which incorporates demand uncertainty into the deterministic model. We then present our decomposition method.

5.1. The deterministic model

Consider the single-source capacitated facility location problem. There is a set of potential facilities, H, and a set of customers, G. Each facility h has an opening cost, f_h, and a capacity on the total demand that can be served, b_h. Each customer g has demand d_g, and providing service to customer g from facility h has an associated cost c_gh. The goal is to select the set of facilities to open and to assign customers to facilities so as to minimize the total cost, under the condition that facility capacities are not exceeded. Let the binary variable y_h indicate whether facility h is selected and x_gh indicate whether customer g is assigned to facility h. The deterministic IP formulation of the single-source capacitated facility location (FL) problem is

minimize    Σ_{h∈H} f_h y_h + Σ_{g∈G} Σ_{h∈H} c_gh x_gh        (IP-FL)
subject to  Σ_{h∈H} x_gh = 1             ∀ g ∈ G               (8)
            x_gh ≤ y_h                   ∀ g ∈ G, h ∈ H        (9)
            Σ_{g∈G} d_g x_gh ≤ b_h y_h   ∀ h ∈ H               (10)
            x_gh, y_h ∈ {0, 1}           ∀ g ∈ G, h ∈ H.       (11)

Constraints (8) ensure that each demand node g ∈ G is assigned to exactly one facility h ∈ H. Constraints (9) limit the allocation of demand nodes to open facilities, and Constraints (10) ensure that the capacity of open facilities is not violated. Constraints (11) define the binary decision variables.

5.2. The ARDO model

The IP-FL model assumes that the customer demand data is perfectly known. In practice, however, this assumption may not hold. We use ARDO to handle such uncertainty. Assume that the customer demands are uncertain and that the different demand conditions are represented as scenarios d_g^s with known probabilities p_s. Due to this uncertainty, the capacity constraints (10) may no longer hold under all scenarios. We allow violations of these constraints for h ∈ H under some scenarios, Q_h^s(x) = (Σ_{g∈G} d_g^s x_gh − b_h y_h)^+, at a penalty Q_h(x) = Σ_{s∈S} p_s Q_h^s(x). The objective is to minimize the total cost such that the penalty of each facility, Q_h(x), does not exceed a given threshold value, λ_h. The threshold value for each facility reflects the decision maker's willingness to have less than the required capacity at the potential facility. Note that if no facility is located at h, then y_h = x_gh = 0 ∀ g ∈ G, and therefore Q_h(x) = 0. In this problem, the deterministic part of the formulation (Dx ≤ b^D) is captured by Constraints (8) and (9), and the uncertain part (Ux ≤ b^U) is represented by Constraints (10). The ARDO formulation of the single-source capacitated facility location problem with uncertain demand is:

minimize    Σ_{h∈H} f_h y_h + Σ_{g∈G} Σ_{h∈H} c_gh x_gh        (ARDO-FL)
subject to  Constraints (8), (9), and (11)
            Q_h(x) ≤ λ_h        ∀ h ∈ H.

5.3. Decomposition method

To efficiently solve the ARDO-FL model, we develop a decomposition approach consisting of a deterministic master problem and feasibility-checking subproblems, one for each open facility.

5.3.1. Master problem

Replacing the stochastic demand scenarios d_g^s at each demand node g with the expected demand d̄_g = Σ_{s∈S} p_s d_g^s leads to the master problem:

minimize    Σ_{h∈H} f_h y_h + Σ_{g∈G} Σ_{h∈H} c_gh x_gh        (master problem)
subject to  Constraints (8), (9), and (11)
            Σ_{g∈G} d̄_g x_gh ≤ (b_h + λ_h) y_h        ∀ h ∈ H
            Cuts.

5.3.2. Subproblem

The subproblems check the deterministic solution of the master problem against all demand scenarios. If the amount of capacity violation at any open facility exceeds the maximum allowable risk, the subproblems generate cuts (12) that remove these infeasible solutions.

• Step 1 - Penalty calculation: Given the set of facilities opened in the master problem solution at iteration t, H^t = {h | y_h^t = 1}, and the set of customers assigned to facility h ∈ H^t, G_h^t = {g | x_gh^t = 1}, calculate the penalty cost of each facility h ∈ H^t in scenario s, Q_h^s(x^t) = (Σ_{g∈G_h^t} d_g^s − b_h)^+, and the expected penalty cost of each facility over all scenarios, Q_h(x^t) = Σ_{s∈S} p_s Q_h^s(x^t), h ∈ H^t.

• Step 2 - Feasibility check: If Q_h(x^t) ≤ λ_h for all h ∈ H^t, terminate. Otherwise, determine the set of facilities that violate the penalty constraints, V^t, and go to Step 3.

• Step 3 - Cut generation: Generate and add the following cuts to the master problem:

Q_h(x^t) − Σ_{s | Q_h^s(x^t) > 0} p_s ( Σ_{g∈G_h^t} d_g^s (1 − x_gh) − Σ_{g∈G\G_h^t} d_g^s x_gh ) ≤ λ_h,        h ∈ V^t.        (12)


The inner summation term is the decrease/increase in the penalty cost of the corresponding facility when some of the customers are reassigned. As can be seen, the cut prevents the customers in G_h^t, or a superset of them, from being assigned to facility h in future iterations. Since demand is non-negative, the cut remains valid even if the term Σ_{g∈G\G_h^t} d_g^s x_gh is removed.
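The subproblem reduces to a few vectorized operations; the sketch below (our illustration with made-up demands, not the paper's code) performs Steps 1-3 for a single open facility:

```python
import numpy as np

d = np.array([[20.0, 35.0, 15.0],        # d_g^s: rows = scenarios s, columns = customers g
              [30.0, 25.0, 40.0]])
p = np.array([0.6, 0.4])                 # scenario probabilities
b_h, lam_h = 60.0, 3.0                   # capacity and allowable risk of facility h
G_h = [0, 1, 2]                          # G_h^t: customers assigned to h by the master

# Step 1: scenario loads, overloads Q_h^s(x^t), and expected penalty Q_h(x^t).
loads = d[:, G_h].sum(axis=1)
overloads = np.maximum(loads - b_h, 0.0)
Q_h = float(p @ overloads)

# Step 2: feasibility check against lambda_h.
if Q_h <= lam_h:
    print("master solution feasible for ARDO-FL")
else:
    # Step 3: scenarios entering cut (12), i.e., those with a positive overload.
    S_pos = [s for s in range(len(p)) if overloads[s] > 0]
    print("violated: Q_h =", Q_h, "; cut (12) built from scenarios", S_pos)
```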

6. Computational experiments

In this section, we present computational results for the single-source capacitated facility location problem with uncertain demands. We conduct the following three experiments:

1. Efficiency: Our first experiment addresses the efficiency of the decomposition approach by comparing its performance with that of the original IP formulation of ARDO (in what follows, the abbreviation ARDO denotes the original IP formulation).
2. Value of almost robustness: In the second experiment, we use the Robustness Index (RI, defined in (5)) to show the connection between the decision maker's risk preference and the penalty limit. Using this index, we also highlight the potential benefits of being almost robust as opposed to fully robust.
3. An industrial-size problem: In the third experiment, to showcase our methodology, we apply it to an industrial-size problem concerning the location of facilities across the City of Toronto.

Problem instances: The problem instances were randomly generated as follows. The facility capacities are uniformly drawn from the integer interval b_h ∈ [50, 200]. The fixed facility opening costs f_h are generated from the capacities b_h as f_h = b_h(10 + rand[1, 15]), where 10 is the per-unit capacity cost and the random integer multiplier from [1, 15] accounts for differences in property costs. The assignment costs are uniformly generated from the interval c_gh ∈ [1, 100]. The demands for each scenario are uniformly generated from the interval d_g^s ∈ [1, 30]. To generate the scenario probabilities, we assigned each scenario a uniform random number in the range [1, 100] and normalized these numbers by their sum. Overall, the problem instances span three sizes: 30 × 15 (i.e., 30 customers, 15 possible facility sites), 40 × 20, and 50 × 25; five scenario counts: 5, 15, 30, 60, and 240; and four penalty limits (λ): 10, 20, 30, and 40. Six instances of each size-scenario-limit combination are generated, for a total of 3 × 5 × 4 × 6 = 360 instances. With such a high number of demand scenarios, it is possible to evaluate how the algorithms perform when there are many scenarios.

The tests were performed on a Dual Core AMD 270 CPU with 1 megabyte of cache and 4 gigabytes of main memory, running Red Hat Enterprise Linux 4. The models are implemented in ILOG CPLEX. A time limit of 3 hours was used; for unsolved instances (those that timed out), the time limit is used in the calculations (we also report the percentage of unsolved instances).

6.1. Efficiency of the decomposition approach

Figs. 2 and 3 illustrate the effect of the number of scenarios on the run-times of ARDO and the decomposition approach. As can be seen, the mean and median run-times of ARDO increase drastically with the number of scenarios, whereas the number of scenarios has only a minor effect, if any, on the run-time of the decomposition approach. Note that, because of scaling, the two figures do not include the results for the problems with 240 scenarios.


As can be seen in Table 2, the above trend holds for those problems as well. Table 2 reports the mean and median CPU time in seconds required to solve the problem instances for each scenario count, for both ARDO and the decomposition approach. The column labeled "% Uns." indicates the percentage of problems for which the models were terminated because of the 3-hour time limit. For the decomposition approach, the column labeled "Iter." indicates the average number of iterations needed to converge to optimality. The "Time ratio" for a given instance is the ARDO run-time divided by the decomposition run-time; the table reports the mean over the instances in each subset.

Table 2 indicates that the decomposition model is very efficient. Specifically, the decomposition approach is unable to find the optimal solution within the time limit in only 1.1% of the instances, whereas ARDO fails in 31.1% of the instances, with an increasing trend in unsolved instances as the number of scenarios grows (up to 77.8%). Furthermore, as the time ratios indicate, the decomposition approach is on average 2539 times faster than ARDO, with the ratio again increasing in the number of scenarios. Finally, the table shows that the decomposition approach converges to optimality in about 3 iterations on average.

Fig. 4 shows a scatter plot of the run-times of ARDO and the decomposition approach. Both axes are log-scaled, and points below the x = y line indicate a lower run-time for the decomposition approach. On all but 8 of the 360 instances (over 97%), the decomposition achieves an equivalent or better run-time (6 of the 8 dominated instances have 5 scenarios).

Observation 1 (Computational efficiency). The decomposition is much more efficient than ARDO: it is up to three orders of magnitude faster and finds optimal solutions for many more problems within the time limit.

Why does the decomposition converge in few iterations? In the first iteration, using the deterministic demands d̄_g, the decomposition algorithm establishes cost-effective locations based on f_h and determines the best demand nodes for each open facility based on c_gh. Still, the deterministic master problem (MP) solution y^t, x^t may violate the maximum capacity assigned to some facility (the regular plus budgeted capacity, b_h + λ_h). The deterministic MP uses λ_h > 0 as additional capacity, producing a solution that is more realistic (less conservative) than when λ_h = 0 and leading to fewer infeasibilities. Upon detecting infeasibility at any open facility, the feasibility cut (12) permanently removes the MP solution of the previous iteration, producing a new solution y^{t+1}, x^{t+1} that requires at least one change in the set of open facilities and/or the allocation of demand nodes to them. The decomposition reaches global optimality once the deterministic MP optimum does not violate the maximum capacity of any open facility.

Example. We provide an example to clarify the expeditious convergence of our decomposition algorithm to almost-robust solutions. Consider six demand nodes with facility setup costs of 700, 750, 800, 850, 900, and 950, respectively.
For simplicity, assume an identical expected demand of d̄_g = 100 for all six nodes, a fixed assignment cost of c_gh = 1, an identical initial capacity of b_h = 600, and a maximum allowable risk of λ = 5. Assume that the demands in the first five nodes are deterministic and that the demand in node 6 takes the two scenario values 130 and 30 with probabilities 70% and 30%, respectively. The optimal solution of the master problem is to open a single facility at demand node 1, with setup cost 700, and to assign all the other demand nodes to this open facility with initial capacity 600. This deterministic solution implies 100% utilization of the established facility's capacity.

Fig. 2. Average run-time of ARDO and the decomposition approach as a function of the number of scenarios.

Table 2
The mean and median CPU time (in seconds) and percentage of unsolved problem instances ("% Uns.") for the ARDO and decomposition approaches, and the mean number of iterations ("Iter.") for the decomposition approach. "Overall" indicates the mean results over all problem instances.

                 ARDO                                Decomposition
# of scenarios   Mean time   Median time   % Uns.    Mean time   Median time   % Uns.   Iter.   Time ratio
5                655         13.2          4.2       258         2.1           1.3      3.2     18
15               1188        98.9          6.9       133         4.9           0        2.4     91
30               3238        557.4         23.6      248         3.2           1.3      3.4     240
60               5524        5883.7        43.1      286         5.1           1.3      3.1     566
240              9173        10,800.0      77.8      241         4.8           1.3      2.9     11,779
Overall          3955        534.3         31.1      233         4.1           1.1      3.0     2539

We know that 500 units of the initial capacity of the open facility are not subject to uncertainty, so we only need to check the feasibility of the master problem with respect to the 100 units of capacity set aside for node 6. The feasibility check shows that the required capacity for node 6 is 121 units, i.e., $(600 - 500) + (30 - 100)^+ \times 30\% + (130 - 100)^+ \times 70\% = 100 + 0 + 21 = 121$, which exceeds the 100 available units by 21, more than the maximum allowable risk of $\Gamma_h = 5$. We note that if the expected violation were at most 5, the master problem solution would still be considered optimal. Since the aggregate amount of demand ($500 + 100 + 21 = 621$) exceeds the 605 units of regular plus budgeted (maximum allowable risk) capacity of the open facility ($b_h + \Gamma_h = 600 + 5$), the master solution is infeasible and a cut is needed to change it.
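To make the check concrete, the following minimal sketch reproduces the computation for this example (the variable names are illustrative, not from the paper):

```python
# A minimal sketch of the subproblem feasibility check in the example above.
# It evaluates the expected demand of node 6 in excess of the capacity reserved
# for it, Q = sum_s p_s * (d_s - reserved)^+, and compares it with Gamma.
probs = [0.70, 0.30]          # scenario probabilities for node 6
demands = [130, 30]           # node-6 demand under each scenario
reserved = 600 - 5 * 100      # capacity left after the five deterministic nodes
gamma = 5                     # maximum allowable risk (penalty limit)

expected_excess = sum(p * max(d - reserved, 0) for p, d in zip(probs, demands))
print(expected_excess > gamma)  # True: 21.0 > 5, so a feasibility cut is generated
```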

In the second iteration, the cut ensures that the master problem opens one more facility, at node 2, and re-optimizes the allocation of demand nodes; any master problem solution that removes a demand node from the first facility is considered feasible in the next iteration. Suppose that, in the second iteration of our decomposition method, demand nodes 1, 3, and 4 are assigned to open facility 1 and demand nodes 2, 5, and 6 are assigned to open facility 2. With this feasible solution, the initial capacity of the first facility is utilized at only 50% (300/600), and the utilization of the second facility is 53.5% (321/600). Given these utilizations, even more drastic fluctuations in demand would not affect the feasibility of the second-iteration solution. This example is in line with the results for the industrial-sized dataset summarized in Table 4, where the robust optimization (RO) model requires opening more facilities than ARDO.
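The loop the example walks through can be summarized by the following sketch; solve_master, expected_penalties, and add_cut are hypothetical stand-ins for the corresponding steps of our method, not an actual implementation:

```python
# A high-level sketch of the decomposition loop: solve the deterministic master
# problem, check each penalty constraint against the scenarios, and add
# feasibility cuts until no penalty limit is violated.
def decompose(solve_master, expected_penalties, add_cut, gammas, max_iters=1000):
    for _ in range(max_iters):
        x = solve_master()                 # deterministic MP with expected data
        Q = expected_penalties(x)          # Q_j(x) for each uncertain constraint j
        violated = [j for j, q in enumerate(Q) if q > gammas[j]]
        if not violated:
            return x                       # almost robust: MP solution is optimal
        for j in violated:
            add_cut(j, x)                  # feasibility cut removes x for constraint j
    raise RuntimeError("iteration limit reached")
```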


Fig. 3. Median Run-time of ARDO and decomposition approach as a function of the number of scenarios.

Table 3
The mean, median, and maximum of the Robustness Index as a function of the penalty limit, $\Gamma$.

Penalty limit ($\Gamma$)   Mean    Median   Max
5                          105.4   100.2    1397.8
15                         43.3    40.8     146.3
20                         30.6    27.7     103.1
30                         23.1    22.8     103.1
40                         21.7    19.3     103.1

6.2. Value of almost robustness

In this section, we show how the Robustness Index in (5) can be used by decision makers to choose a penalty limit that suits their risk preferences. We also highlight the benefits of being almost robust as opposed to fully robust. To study the RI, we calculate it for all the problem instances discussed earlier for the capacitated facility location problem with uncertain demand. We summarize the results in Table 3 and Fig. 5. In Table 3, we report the mean, median, and maximum of the RI over all problem instances with a particular $\Gamma$ value. Fig. 5 depicts the changes in the average RI (the average is taken over all

the instances with a given $\Gamma$) and the average improvement in the objective function value ($c^T(x_0^* - x^*) \times 10^{-1}$) as a function of the penalty limit. As expected from the discussion of the RI in (5), as the penalty limit increases, the average incremental improvement in the objective function value grows more slowly than the average incremental increase in the penalty. Therefore, it may not be necessary for the decision maker to choose a very high penalty limit, since her relative gain (in terms of improvement in the solution value) may not grow as fast as the loss (in terms of the penalty). This experiment also demonstrates that allowing a small amount of infeasibility can substantially improve the solution value. Specifically, from Table 3 we can see that as we go from $\Gamma = 0$ to $\Gamma = 5$, the RI can be as high as 1397, which indicates that the improvement in the objective function value can be up to 1397 times larger than the expected penalty. This suggests that the solution of ARDO can be substantially less conservative than that of the robust model, and therefore highlights the potential benefits of being almost robust (instead of fully robust). Moreover, Table 3 emphasizes that only a small increase in the expected penalty is required to substantially improve the objective function value. We also compared the optimal solutions of the worst-case, robust, and ARDO models. As expected, the worst-case model has the most conservative solution: its objective value is on average 36.5% higher than that of the robust model, and 73.6% higher than the optimal objective function value of ARDO plus the total expected penalty, $Q(x^*)^T \mathbf{1}$.
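In practice, the decision maker can scan candidate penalty limits with a small routine such as the sketch below; solve_ardo is a hypothetical stand-in that returns the optimal objective value and total expected penalty of ARDO for a given limit, and the RI is taken, following the discussion around (5), as the objective improvement over the fully robust solution divided by the expected penalty:

```python
# A hypothetical helper for choosing a penalty limit via the Robustness Index.
def scan_penalty_limits(solve_ardo, gammas):
    obj_robust, _ = solve_ardo(0)          # fully robust benchmark (Gamma = 0)
    for g in gammas:
        obj, penalty = solve_ardo(g)       # ARDO objective and expected penalty
        ri = (obj_robust - obj) / penalty if penalty > 0 else float("inf")
        print(f"Gamma={g}: objective={obj}, expected penalty={penalty}, RI={ri:.1f}")
```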


Fig. 4. Run-time of ARDO (x axis, log-scale) vs. the decomposition model (y axis, log-scale).

Table 4
The locations of facilities for the robust and ARDO models and the percentage of customers allocated to each facility. For ARDO, "% Penalty/Obj" indicates (expected penalty / objective function value) × 100, and "% Feasible" indicates the percentage of the scenarios in which the capacity constraint of the corresponding facility is satisfied.

Robust model                    ARDO
Nodal location   % Allocation   Nodal location   % Allocation   % Penalty/Obj   % Feasible
7                8.3            7                18.1           0.81            93
28               8.3            28               19.1           0.95            95
50               5.2            50               12.8           0.95            95
87               8.3            87               17.7           0.98            93
10               12.5           14               11.7           0.96            93
17               8.3            63               20.2           0.93            94
44               12.5
55               12.5
62               12.5
91               12.5

By comparing the solutions of the robust model and ARDO, we see that the solution of the robust model is still too conservative: on average, it is 24.6% higher than the ARDO solution plus the total expected penalty. Moreover, the optimal solution of ARDO is feasible in more than 80% of the scenarios (i.e., in more than 80% of the scenarios, the penalty is zero).

Observation 2. (The Robustness Index). The decision maker can use the Robustness Index to determine the appropriate penalty limit based on her risk attitude. The index demonstrates that a (possibly) small penalty can substantially improve the objective function value.

6.3. An industrial size example

To further highlight the applicability of our methodology, we apply it to an industrial-size problem concerning the location of

facilities (e.g., supermarkets or a restaurant chain) across the City of Toronto, one of North America's largest metropolitan centers, with an area of 641 square kilometers and over 2,700,000 residents.

Experimental setup: The data used in this experiment are adopted from Berman, Krass, and Menezes (2007). In this dataset, the city is divided into 96 "FSA" areas (defined by the first three characters of the Canadian postal code). For each FSA, the dataset provides the total number of residential and business dwellings, which we treat as a proxy for the expected demand, $\bar{d}_g$. For each FSA, we use the coordinates of the centroid as both the demand point and a potential facility location. That is, we construct a 96-node network with FSA centroids as nodes, each with a given expected demand. We use the normalized (between 1 and 100) distance between nodes as a proxy for the assignment costs $c_{gh}$. For each node $g$, we generate 1000 random demand scenarios $d_g^s$ by randomly drawing numbers around its expected demand, i.e., $d_g^s = \bar{d}_g \times \text{rand}[0.5, 1.5]$.
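The scenario generation just described can be sketched as follows (the function name and seed handling are illustrative assumptions, not part of the original experiment):

```python
import random

# A minimal sketch of the demand-scenario generation: each scenario draws every
# node's demand uniformly within +/-50% of its expected value.
def generate_demand_scenarios(expected_demands, n_scenarios=1000, seed=0):
    rng = random.Random(seed)
    return [[d * rng.uniform(0.5, 1.5) for d in expected_demands]
            for _ in range(n_scenarios)]
```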


Fig. 5. The mean Robustness Index and the improvement in the objective function value ($c^T(x_0^* - x^*) \times 10^{-1}$) as a function of the penalty limit.

As we do not have data on facility capacities $b_h$, we randomly generate them by multiplying each node's expected demand by a random number between 6 and 10 (under such capacities, each facility serves on average 8 customers (nodes), which seems reasonable; see Lu, Ran, & Shen, 2015). Finally, for each facility $h$, we set the penalty limit to $0.1 \times b_h$.

Evaluation: We attempted to run the robust model, the ARDO IP, and the decomposition method on this problem. Given the large size of the ARDO IP (a result of the linearization of the penalty constraints, which scales with the number of scenarios), the computer could not load the problem and ran out of memory. Fortunately, we can solve the problem with the decomposition method (since its master problem does not scale with the number of scenarios). Similar to before, the decomposition method converges to the optimal solution in 4 iterations. The results of the robust model and ARDO are summarized in Table 4. As can be seen, the robust model opens 10 facilities, whereas ARDO opens 6; the two approaches share four facility locations. In this problem, the total objective function value (cost) of the robust model is 40% higher than the objective function value plus the total expected penalty of ARDO. Note that the column "% Penalty/Obj" shows that the expected penalty of each opened facility is very small: it is less than 1% of the optimal objective value (cost). Moreover, the last column in Table 4 shows that in

the ARDO solution, the total demand assigned to each facility is below its total capacity in over 90% of the scenarios, i.e., for each facility, ARDO is robust in more than 90% of the scenarios. This implies that in the fully robust model, the decision maker incurs 40% extra cost in order to be feasible in an additional 10% of the scenarios. This result further highlights the benefit of the ARDO model: it can achieve substantial savings by allowing a small amount of infeasibility.

6.4. Discussion

In this section, we discuss the applicability of our cuts to optimization problems with binary variables and present additional possible applications of ARDO.

What are the possible applications of ARDO? ARDO is relevant for all applications of robust optimization with either continuous (where ARO or ICC can be used) and/or binary decision variables. We highlight just a few. For instance, ARDO can be applied to distributed (multi-hospital) operating room (OR) planning and scheduling (Roshanaei, Luong, Aleman, & Urbach, 2017), in which the opening and closing of surgical suites and the number of ORs therein must be decided. Each operating room is usually in use for eight regular hours each day, and managers try to strictly avoid long OR overtime. Given the large variability in surgical durations, extending OR regular hours by some additional minutes under some scenarios may significantly increase the number of surgeries that can be performed in each hospital-day with little


extra overtime cost. The Robustness Index can help the executives responsible for OR scheduling appropriately trade off the number of surgeries against overtime cost.

ARDO can also be applied to the robust multi-period facility location and demand allocation problem of Baron et al. (2011). They determine the optimal location of facilities, the initial capacity of open facilities, the allocation of demand nodes to open facilities, and the capacity allocated to each demand node in each period, where the required capacity of a demand node in a period depends on the demand realized in that period. Incorporating scenarios of uncertainty into their framework can be done effectively using the ARDO approach.

Another application of ARDO is the installation of wind and solar generators in energy-intensive facilities to attain low-carbon manufacturing operations. This problem has recently been formulated as a multi-period production-inventory planning model in a multi-plant manufacturing system powered with onsite and grid renewable energy (Golari, Fan, & Jin, 2017). The goal is to optimize the production quantity, the stock level, and the renewable energy supply in each period so as to minimize the aggregate production cost. Given the uncertainty in the demand for each product in each period, the ARDO approach can be useful.

7. Conclusions

We presented the Almost Robust Discrete Optimization model, which is simple, intuitive, flexible, and easy to communicate to managers. The model allows uncertainty in the decision-making process and bridges stochastic programming and robust optimization. For ARDO problems with binary decision variables, we developed a decomposition approach and demonstrated its effectiveness. We also defined and studied the Robustness Index, which decision makers can use to adjust ARDO in accordance with their risk preferences. Using this index, we demonstrated the potential advantages of being almost robust as opposed to fully robust.

We believe that the Almost Robust Optimization framework is applicable in many settings and for many problems and that it may be used in future studies. One direction for future research is to develop other soft-constrained robust optimization methods for integer optimization problems. It would also be interesting to compare ARDO across applications in low-, medium-, and high-risk environments using the penalty functions (max, expected, and distributionally-robust) discussed in this study.

Appendix A. Proof of Theorem 1

As explained in Section 3, conditions 1 and 2 are sufficient to establish Theorem 1. We first prove condition 2, that the cuts (6) are valid. We start by establishing 2(i), i.e., that the cut excludes the current infeasible master problem solution from all subsequent solutions. Suppose the optimal master problem solution is infeasible in iteration $t$. Then there exists at least one penalty constraint, $i$, such that:

$$Q_i(x^t) > \Gamma_i. \quad (13)$$

The generated cut for this constraint, from (6), is

$$Q_i(x^t) - \sum_{s \in S_i^t} p_s \left(U_i^s (x^t - x)\right) \le \Gamma_i. \quad (14)$$

If the same solution $x^t$ is found in future iterations, then $\sum_{s \in S_i^t} p_s \left(U_i^s (x^t - x)\right) = 0$, so that the left-hand side of inequality (14) equals $Q_i(x^t)$, which by inequality (13) is greater than $\Gamma_i$, creating a contradiction. Thus, $x^t$ is infeasible in future iterations of the master problem. We now prove 2(ii), that is, that the proposed

cut does not remove any feasible solution of ARDO. Let $x^w$ be a feasible solution of ARDO found in iteration $w > t$, i.e.,

$$Q_j(x^w) \le \Gamma_j, \quad j = 1, \ldots, J. \quad (15)$$

We present a proof by contradiction. Assume that $x^w$ does not satisfy a cut formed in iteration $t$ for constraint $i$. Thus,

$$Q_i(x^t) - \sum_{s \in S_i^t} p_s \left(U_i^s x^t - U_i^s x^w\right) > \Gamma_i. \quad (16)$$

Since $Q_i(x^t) = \sum_{s \in S_i^t} p_s \left(U_i^s x^t - b_i^U\right)$, (16) can be simplified to

$$\sum_{s \in S_i^t} p_s \left(U_i^s x^w - b_i^U\right) > \Gamma_i, \quad (17)$$

which contradicts (15) and thus our assumption that the feasible solution $x^w$ does not satisfy the cut. This establishes that the cut does not remove any feasible solution of ARDO. Since both conditions 2(i) and 2(ii) are satisfied, we conclude that the cut is valid.

We now prove condition 1, that the optimal objective value of the master problem is a lower bound on that of ARDO. We first note that because the cuts are valid, they do not remove any feasible solution of ARDO at any iteration of the decomposition. Moreover, since the objective function and the deterministic constraints of ARDO also appear in the master problem, we need only consider the penalty constraints of ARDO, $Q_j(x) = \sum_s p_s \left(U_j^s x - b_j^U\right)^+ \le \Gamma_j, \ \forall j$, and the corresponding expected-value constraints of the master problem, $\bar{U}_j x \le \beta_j \equiv \sum_s p_s \left(U_j^s x - b_j^U\right) \le \Gamma_j, \ \forall j$. In particular, we must show that any feasible solution of ARDO is also feasible for the master problem, i.e., that if $Q_j(x) \le \Gamma_j$ then $\bar{U}_j x \le \beta_j = \Gamma_j + b_j^U$. Indeed,

$$\bar{U}_j x - b_j^U = \sum_{s \in S} p_s \left(U_j^s x - b_j^U\right) \le \sum_{s \in S} p_s \left(U_j^s x - b_j^U\right)^+ = Q_j(x), \quad \forall j. \quad (18)$$

Thus, we conclude that any feasible solution of ARDO is also feasible for the master problem. Therefore, the optimal objective value of the master problem is a lower bound on that of ARDO.

Appendix B. Extension

The exposition of the paper focuses on the expected penalty function. In this section, we modify the decomposition approach to capture both the max and the distributionally-robust penalty functions.

Max penalty function: The following modifications to the decomposition approach are required: (i) in the master problem, replace the expected-value matrix $\bar{U} = E_s[U^s]$ by a minimum matrix $\underline{U}$ whose elements equal the minimum of the corresponding element over all scenarios; (ii) in Step 1 of the subproblem, calculate the maximum penalty $Q_j(x^t) = \max_s Q_j^s$ instead of the expected penalty; (iii) in Step 2 of the subproblem, let $s_j^t$ denote the scenario with the maximum penalty; and (iv) in Step 3, use the modified cut:

$$Q_j(x^t) - U_j^{s_j^t} \left(x^t - x\right) \le \Gamma_j, \quad j \in V^t. \quad (19)$$
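As an illustration, the modified subproblem step can be sketched as follows (the names and data layout are assumptions for exposition, not the authors' code):

```python
# A minimal sketch of the max-penalty subproblem: for each uncertain constraint j,
# find the worst scenario; if its penalty exceeds the limit, record the scenario
# index s_j^t that defines cut (19). Q[j][s] is assumed to hold the penalty of
# constraint j under scenario s at the current solution x^t.
def max_penalty_check(Q, gammas):
    cuts = []
    for j, row in enumerate(Q):
        worst_s = max(range(len(row)), key=lambda s: row[s])
        if row[worst_s] > gammas[j]:
            cuts.append((j, worst_s))   # generate cut (19) from scenario s_j^t
    return cuts
```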

Corollary 2. Cuts of the form (19) are valid. Thus, the modified decomposition approach produces an optimal solution to ARDO with the max penalty function in a finite number of iterations.

The proof of Theorem 1 applies directly here; we thus omit it.

Distributionally-robust penalty function: This penalty function is applicable when the decision maker has limited probabilistic information. As is common in the literature (see, for example, Bertsimas & Popescu, 2005; Popescu, 2007), we assume that the


decision maker knows that the probability distribution of the scenarios belongs to a known set of probability distributions, $F$, and has an estimate of the first moment of the uncertainty matrix, $\bar{U}$. The following modifications to the decomposition approach are required: (i) as before, use the first-moment matrix $\bar{U}$ in the master problem; (ii) in Step 1 of the subproblem, for each uncertain constraint $j$, calculate the expected penalty $Q_j^f(x^t)$ under each distribution $f \in F$, determine the distribution with the maximum expected penalty, $f_j^t = \arg\max_{f \in F} Q_j^f(x^t)$, and let $Q_j = Q_j^{f_j^t}(x^t)$; and (iii) in Step 3, use the modified cuts:

$$Q_j(x^t) - \sum_{s \in S_j^t} p_s^{f_j^t} \left(U_j^s \left(x^t - x\right)\right) \le \Gamma_j, \quad j \in V^t. \quad (20)$$
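A similar sketch applies here, with $F$ represented as a finite list of candidate probability vectors (again, illustrative names and data layout only):

```python
# A minimal sketch of the distributionally-robust subproblem: for each uncertain
# constraint j, evaluate the expected penalty under every distribution f in F and
# keep the worst one, which defines f_j^t in cut (20). pen[j][s] is assumed to
# hold (U_j^s x^t - b_j^U)^+ at the current solution x^t.
def dr_penalty_check(pen, F, gammas):
    cuts = []
    for j, row in enumerate(pen):
        worst_f = max(F, key=lambda probs: sum(p * q for p, q in zip(probs, row)))
        Q_j = sum(p * q for p, q in zip(worst_f, row))
        if Q_j > gammas[j]:
            cuts.append((j, worst_f))   # cut (20) uses the probabilities of f_j^t
    return cuts
```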

Corollary 3. Cuts of the form (20) are valid. Thus, the modified decomposition approach produces an optimal solution to ARDO with the distributionally-robust penalty function in a finite number of iterations.

Proof. The proof of condition 2(i) is similar to that in Theorem 1 and is omitted. Similar to Theorem 1, we use contradiction to prove condition 2(ii): we assume $x^w$ is a feasible solution of the distributionally-robust model, i.e.,

$$\max_{f \in F} Q_j^f(x^w) \le \Gamma_j, \quad j = 1, \ldots, J, \quad (21)$$

that does not satisfy the cut (20) of constraint $i$. Therefore, the cut when $x = x^w$ is (after simplification)

$$\sum_{s \in S_i^t} p_s^{f_i^t} \left(U_i^s x^w - b_i^U\right) > \Gamma_i. \quad (22)$$

It follows that, under both possible outcomes, $f_i^w = \arg\max_{f \in F} Q_i^f(x^w) \ne f_i^t$ or $f_i^w = f_i^t$, (22) contradicts (21). Thus, the cut is valid. Furthermore,

$$E_f\left[U_j^s x - b_j^U\right] = \bar{U}_j x - b_j^U \le E_f\left[\left(U_j^s x - b_j^U\right)^+\right], \quad \forall f \in F, \ j \in J. \quad (23)$$

From (23), and using the same argument as in the proof of condition 1 of Theorem 1, we conclude that the optimal objective value of the master problem is a lower bound on that of the distributionally-robust model.

References

Ahmed, S., & Shapiro, A. (2002). The sample average approximation method for stochastic programs with integer recourse. https://www.researchgate.net/profile/Shabbir_Ahmed6/publication/200035231_The_sample_average_approximation_method_for_stochastic_programs_with_integer_recourse_Submitted_for_publication/links/00b4952c302ca74f55000000.pdf.
Atamtürk, A. (2006). Strong formulations of robust mixed 0–1 programming. Mathematical Programming, 108(2), 235–250.
Averbakh, I. (2001). On the complexity of a class of combinatorial optimization problems with uncertainty. Mathematical Programming, 90(2), 263–272.
Balas, E., Ceria, S., & Cornuéjols, G. (1993). A lift-and-project cutting plane algorithm for mixed 0–1 programs. Mathematical Programming, 58(1), 295–324.
Baron, O., Milner, J., & Naseraldin, H. (2011). Facility location: A robust optimization approach. Production and Operations Management, 20(5), 772–785.
Ben-Tal, A., Bertsimas, D., & Brown, D. B. (2010). A soft robust model for optimization under ambiguity. Operations Research, 58(4-part-2), 1220–1234.


Ben-Tal, A., Boyd, S., & Nemirovski, A. (2006). Extending scope of robust optimization: Comprehensive robust counterparts of uncertain problems. Mathematical Programming, 107(1), 63–89.
Ben-Tal, A., El Ghaoui, L., & Nemirovski, A. (2009). Robust optimization. Princeton University Press.
Ben-Tal, A., & Nemirovski, A. (1998). Robust convex optimization. Mathematics of Operations Research, 23(4), 769–805.
Ben-Tal, A., & Nemirovski, A. (1999). Robust solutions of uncertain linear programs. Operations Research Letters, 25(1), 1–13.
Benders, J. F. (1962). Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik, 4, 238–252.
Berman, O., Krass, D., & Menezes, M. B. C. (2007). Facility reliability issues in network p-median problems: Strategic centralization and co-location effects. Operations Research, 55(2), 332–350.
Bertsimas, D., & Popescu, I. (2005). Optimal inequalities in probability theory: A convex optimization approach. SIAM Journal on Optimization, 15(3), 780–804.
Bertsimas, D., & Sim, M. (2003). Robust discrete optimization and network flows. Mathematical Programming, 98(1), 49–71.
Bertsimas, D., & Sim, M. (2004a). The price of robustness. Operations Research, 52(1), 35–53.
Bertsimas, D., & Sim, M. (2004b). Robust discrete optimization under ellipsoidal uncertainty sets. Working paper, 1–23.
Blankenship, J. W., & Falk, J. E. (1976). Infinitely constrained optimization problems. Journal of Optimization Theory and Applications, 19(2), 261–281.
Charnes, A., & Cooper, W. W. (1959). Chance-constrained programming. Management Science, 6(1), 73–79.
Chen, W., & Sim, M. (2009). Goal-driven optimization. Operations Research, 57(2), 342–357.
Demb, R. S. (1992). Scenario immunization. In S. A. Zenios (Ed.), Financial optimization (pp. 290–308). Cambridge University Press.
Escudero, L. F., Kamesam, P. V., King, A. J., & Wets, R. J.-B. (1993). Production planning via scenario modelling. Annals of Operations Research, 43(6), 309–335.
Fazel-Zarandi, M. M., Berman, O., & Beck, J. C. (2013). Solving a stochastic facility location/fleet management problem with logic-based Benders decomposition. IIE Transactions, 45(8), 896–911.
Golari, M., Fan, N., & Jin, T. (2017). Multistage stochastic optimization for production-inventory planning with intermittent renewable energy. Production and Operations Management, 26(3), 409–425.
Haneveld, W. K. K., & van der Vlerk, M. H. (2006). Integrated chance constraints: Reduced forms and an algorithm. Computational Management Science, 3, 245–269.
Hooker, J. N., & Ottosson, G. (2003). Logic-based Benders decomposition. Mathematical Programming, 96(1), 33–60.
Klein Haneveld, W. (1986). On integrated chance constraints. In Duality in stochastic linear and dynamic programming (pp. 113–138). Springer.
Kouvelis, P. S., & Yu, G. (1997). Robust discrete optimization and its applications. Norwell, MA: Kluwer Academic Publishers.
Laporte, G., & Louveaux, F. V. (1993). The integer L-shaped method for stochastic integer programs with complete recourse. Operations Research Letters, 13(3), 133–142.
Lu, M., Ran, L., & Shen, Z.-J. M. (2015). Reliable facility location design under uncertain correlated disruptions. Manufacturing & Service Operations Management, 17(4), 445–455.
Mulvey, J. M., Vanderbei, R. J., & Zenios, S. A. (1995). Robust optimization of large-scale systems. Operations Research, 43(2), 264–281.
Natarajan, K., Pachamanova, D., & Sim, M. (2009). Constructing risk measures from uncertainty sets. Operations Research, 57(5), 1129–1141.
Pınar, M. Ç. (2007). Robust scenario optimization based on downside-risk measure for multi-period portfolio selection. OR Spectrum, 29(2), 295–309.
Popescu, I. (2007). Robust mean-covariance solutions for stochastic optimization. Operations Research, 55(1), 98–112.
Roshanaei, V., Luong, C., Aleman, D., & Urbach, D. (2017). Propagating logic-based Benders' decomposition approaches for distributed operating room scheduling. European Journal of Operational Research, 257(2), 439–455.
Ruszczyński, A., & Shapiro, A. (2003). Stochastic programming models. In Stochastic programming. Handbooks in operations research and management science: 10 (pp. 1–64). Elsevier.
Soyster, A. L. (1973). Convex programming with set-inclusive constraints and applications to inexact linear programming. Operations Research, 21(5), 1154–1157.
Takriti, S., & Ahmed, S. (2004). On robust optimization of two-stage systems. Mathematical Programming, 99(1), 109–126.