Computers and Chemical Engineering 27 (2003) 781–801. www.elsevier.com/locate/compchemeng
Design and planning under uncertainty: issues on problem formulation and solution

L. Cheng, E. Subrahmanian, A.W. Westerberg *

Department of Chemical Engineering, Institute for Complex Engineered Systems, Carnegie Mellon University, Doherty Hall 2311, Pittsburgh, PA 15213-3890, USA

Received 30 November 2001; received in revised form 14 November 2002; accepted 14 November 2002
Abstract

Firms operate today in a rapidly changing and risky environment, where such factors as market and technology are inevitably shrouded in uncertainties. They must make design and operating decisions to satisfy several conflicting goals, such as maximizing expected profit, minimizing risk, and sustaining long-term viability and competitiveness. Proper formulation is both essential and critical for finding appropriate solutions to such problems. We show how one can formulate this problem as a Markov decision process with recourse that considers decision making throughout the process life cycle and at different hierarchical levels. This formulation incorporates multiple kinds of uncertainties, such as market conditions and technology evolution. It allows decision-makers to provide multiple criteria, such as expected profit, expected downside risk, and process lifetime, that reflect various conflicting or incommensurable goals. The formulation integrates design decisions and future planning by constructing a multiperiod decision process in which one makes decisions sequentially at each period. The decision process explicitly incorporates both the upper-level investment decisions and the lower-level production decisions as a two-stage optimization problem. This problem formulation leads to a multi-objective Markov decision problem, searching for Pareto optimal design strategies that prescribe design decisions for each state the environment and process could occupy. We can often recast this class of problem in order to exploit a rigorous multi-objective stochastic dynamic programming algorithm. This approach decomposes the problem into a sequence of single-period subproblems, each of which is a two-stage stochastic program with recourse. We show how one can solve these subproblems to obtain and propagate the Pareto optimal solution set recursively backward in time.
A small illustrative example appears throughout the paper to demonstrate the formulation and solution issues. The scalability of the rigorous algorithm is limited by the "curse of dimensionality", suggesting the need for approximate approaches to solve realistic problems. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Design and planning; Investment decisions; Uncertainties; Multi-criteria decision-making; Markov decision process; Dynamic programming; Two-stage stochastic program with recourse
1. Introduction

Consider a chemical company that is designing a chemical reaction process. The process operates in a dynamic and uncertain environment throughout its life cycle, one in which market conditions like product demand and price are uncertain and in which the reaction technology is evolving rapidly. In such an uncertain environment, the designers have to
* Corresponding author. Tel.: +1-412-268-2344; fax: +1-412-268-7139. E-mail address: [email protected] (A.W. Westerberg).
consider some important issues such as: (1) the capacity of the process should be able to satisfy a changing, uncertain demand, (2) future technological advances may occur, e.g., a new catalyst could lead to rapid obsolescence of existing technology, and (3) in response to changes in the environment, existing reactors could be expanded, replaced, or salvaged in the future. All these issues will affect current design decisions. For instance, due to technology breakthroughs, the loss in value of existing equipment could render the exploitation of economies of scale unattractive, and even favor the option of "wait and see" without investing. Given such a complicated situation, decision-makers must seek design solutions to satisfy several goals, including maximizing
0098-1354/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0098-1354(02)00264-8
profit, minimizing risk, and staying viable and competitive in business. The difficulty here lies in the conflicts among the various objectives. For example, a more aggressive investment strategy may result in a higher expected profit while increasing the risk that the actual profit could slump in a recession-like scenario; an exclusively profit-driven criterion may lead to a solution that would shut down the process in a short time, which contradicts the decision-makers' desire for sustainability and social responsibility.

In general, this class of problems is called "design/investment under uncertainty", whose importance has been widely recognized in both academia and practice. This paper intends to make the following contributions in this area:

1) We allow decision-makers to provide multiple objectives, such as expected profit, expected downside risk, and process lifetime. These objectives are usually conflicting or incommensurable (e.g., in different units).
2) We formulate the problem as a discrete-time Markov decision process. We generalize the conventional "design" to a dynamic design process and integrate the design decisions and future planning by creating a decision process in which one makes decisions sequentially throughout the process life cycle.
3) The formulation explicitly incorporates upper-level design decisions and lower-level production decisions at each period in the decision process. One typically makes design decisions based on an assessment of future uncertainty and production decisions after resolving some of the uncertainties.
4) The formulation incorporates different kinds of uncertainties, including market conditions (product demand, price, etc.) and technology improvements (timing and magnitude of future technology breakthroughs). We characterize the behavior of these uncertainties using either independent random variables or random processes evolving over time. The formulation can also accommodate "uncertainty" in the process itself, e.g., issues of process flexibility.
5) We develop a multi-objective stochastic dynamic programming solution algorithm based on the ε-constraint method, developed specifically to discover the set of Pareto optimal solutions. We show that under certain reasonable assumptions, we can decompose the original problem into a sequence of single-period optimizations using this algorithm. By solving these single-period subproblems, we can calculate and propagate the Pareto optimal frontiers recursively backward in time.

The remainder of the paper is organized in the following manner. In Section 2, we briefly review related
work on "design/investment under uncertainty" coming from the disciplines of process systems engineering and operations research/management science. We also provide a review of the theoretical background and methodology of "sequential decision making under uncertainty" and "multi-criteria decision making" in operations research, which we believe provide the basis for formulating and solving this class of problems. We present a small motivating example of design under uncertainty in Section 3, providing a preliminary yet concrete feel for the problem and facilitating the understanding of the problem formulation and solution. Section 4 briefly analyzes the problem of dynamic design and investment in the context of a process life cycle and presents a generic framework that incorporates decision making at different hierarchical levels (design and production) throughout the process life cycle. We model the decision making as a Markov decision process with recourse, and formulate it as a multi-objective Markov decision problem. Section 5 discusses how to recast the original problem so that we may employ a multi-objective dynamic programming approach to decompose it into a sequence of subproblems; the Pareto optimal solution set is found by solving those subproblems recursively backward in time. We discuss the computational results in Section 6 and conclude the paper with a summary and future directions in Section 7.
2. Review of the literature

2.1. Literature on applications

There is a rich literature in the area of "process design under uncertainty", with many works contributing to the solution of process design/planning problems having uncertain parameters. Ierapetritou, Acevedo, and Pistikopoulos (1994) present design/planning models involving stochastic parameters and propose a decomposition-based approach for their solution. Pistikopoulos and Ierapetritou (1995) formulate a two-stage stochastic programming model to determine the design that maximizes the expected revenue while simultaneously measuring design feasibility. Bhatia and Biegler (1999) develop an efficient decomposition algorithm for solving the multi-period design problems that result from incorporating multiple scenarios into the problem formulation. This area has found many applications in batch plant design under product demand uncertainty (Subrahmanyam, Pekny, & Reklaitis, 1994; Ierapetritou & Pistikopoulos, 1995; Epperly, Ierapetritou, & Pistikopoulos, 1997; Petkov & Maranas, 1998). A vast economics literature labels this type of problem "investment under uncertainty" (Dixit & Pindyck, 1994), in which there are research topics such
as capacity planning, equipment replacement, and technology adoption. In capacity planning, Eppen, Martin, and Schrage (1989) model a multi-period, multi-product capacity planning problem in the face of demand uncertainty and construct a mixed-integer programming model based on a scenario planning approach. Eberly and Van Mieghem (1997) present a framework to study dynamic investment under uncertainty in multiple resources, representing various types of capital and labor, and show that the optimal investment strategy follows a control limit policy at each point in time. Harrison and Van Mieghem (1999) present a model to determine the optimal investment strategies for a manufacturing firm that employs multiple resources to market several products to meet an uncertain demand. The model incorporates explicitly both the capacity investment decisions and the production decisions. In the area of technology adoption, Balcer and Lippman (1984) present a rich model of technological innovation and replacement under uncertainty and provide valuable insights on the impact of future technological expectations on replacement policies. Nair and Hopp (1992) and Nair (1995, 1997) present a nonstationary model for technology decisions in the face of anticipated breakthroughs and use a forecast horizon approach to find the optimal decision over an infinite horizon. Recently, some work attempts to unify the different areas of capacity planning and technology replacement. Rajagopalan (1998) develops a general model that considers replacement of capacity as well as expansion or disposal, together with economies of scale. Rajagopalan, Singh, and Morton (1998) consider capacity expansion issues simultaneously with replacement, in the context of a sequence of random technological developments, and develop a regeneration-point-based dynamic programming algorithm.
As the above review illustrates, widespread recognition of the importance of this problem exists, with its extensive study in the literature on "design/investment under uncertainty". However, we believe more emphasis must be put on problem formulation. We first contend that there are no design problems that are truly single-objective. Design always involves trade-offs among different goals. Thus, we suggest one must formulate and solve these problems as multi-objective problems. Without this recognition, one would discover "optimal" solutions that are unacceptable when examined. For example, the solution that maximizes present worth may well be to shut down the current operation, irrespective of the social consequences. We further contend that this class of problem must involve decisions of different types (capacity, technology, etc.), at different levels (investment, production, etc.) and at different times (now and in the future). There are usually significant uncertainties in the decision context, which includes market conditions, the technological environment,
and regulatory institutions. There is a great need for a generic unifying framework that can systematically assist designers or managers in making design/investment decisions in the presence of these uncertainties throughout the process life cycle. While there is a paucity of prior work attempting to bridge the gap, two research streams in the discipline of operations research, "sequential decision making under uncertainty" and "multiple criteria decision making", have developed rapidly. These two areas provide a fundamental theoretical basis for the formulation and solution of this class of problems.

2.2. Literature on theory and methodology

Generally speaking, there are two modeling approaches in the operations research discipline to address the problem of "sequential decision making under uncertainty". The first is "multistage stochastic programming with recourse" and the second is "Markov decision processes". A multistage stochastic program results from problems that involve sequences of decisions over time, where the decisions can respond to realizations of outcomes that are not known a priori (Birge & Louveaux, 1997). The stochastic programming problem is usually reformulated into a large-scale deterministic equivalent problem that is further decomposed into smaller, manageable pieces (Caroe, 1998). A Markov decision process is a model for sequential decision making under uncertainty that takes into account both the outcomes of current decisions and future decision-making opportunities. One uses the qualifier "Markov" because the transition probability and reward function depend on the past only through the current state of the system and the action selected by the decision-maker in that state (Puterman, 1994). Usually, finite horizon models are solved using a stochastic dynamic programming approach, and infinite horizon models are solved by such algorithms as value iteration and policy iteration (Bertsekas, 1995).
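To make the finite-horizon case concrete, the following is a minimal backward-induction (stochastic dynamic programming) sketch, not taken from the paper; the two-state "technology" example, its transition probabilities, and its rewards are entirely hypothetical illustration data.

```python
# Minimal backward-induction sketch for a finite-horizon Markov decision
# process. P[s][a] is a list of (probability, next_state) pairs and r[s][a]
# is the immediate reward; all numbers below are hypothetical.

def backward_induction(states, actions, horizon, P, r, gamma=0.9):
    """Return the value function at t=0 and a per-period optimal policy."""
    V = {s: 0.0 for s in states}          # terminal values
    policy = []
    for t in reversed(range(horizon)):
        V_new, pi_t = {}, {}
        for s in states:
            best_a, best_v = None, float("-inf")
            for a in actions(s):
                v = r[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                if v > best_v:
                    best_a, best_v = a, v
            V_new[s], pi_t[s] = best_v, best_a
        V, policy = V_new, [pi_t] + policy  # prepend: policy[t] maps state -> action
    return V, policy

# Toy two-state technology environment: innovation absent vs. arrived.
states = ["no_tech", "tech"]
actions = lambda s: ["wait", "invest"]
P = {"no_tech": {"wait":   [(0.7, "no_tech"), (0.3, "tech")],
                 "invest": [(0.7, "no_tech"), (0.3, "tech")]},
     "tech":    {"wait":   [(1.0, "tech")],
                 "invest": [(1.0, "tech")]}}
r = {"no_tech": {"wait": 0.0, "invest": 1.0},
     "tech":    {"wait": 0.0, "invest": 2.0}}
V, policy = backward_induction(states, actions, 3, P, r)
```

The recursion mirrors the textbook principle of optimality: each period's value is the best immediate reward plus the discounted expected value of the successor state.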
In our work, we model design and planning under uncertainty in the framework of a Markov decision process and model the design/production decisions at each decision epoch of the process as a two-stage stochastic program with recourse.

The area of multiple criteria decision making has also developed rapidly. As the statistics collected in Steuer, Gardiner, and Gray (1996) demonstrate, by 1994, 144 conferences had been held and over 200 books and proceedings volumes had appeared on this topic. Ringuest (1992) aims at unifying the concepts of multi-criteria decision theory with those of multi-objective optimization; the computational and behavioral issues related to multi-objective optimization are examined and discussed. Miettinen (1999) provides a self-contained and consistent survey and review of the literature and the state of the art in deterministic nonlinear multi-objective optimization. The contents are divided into
three parts: theoretical background, methodology, and related issues. Four classes of methods are presented according to the role of decision-makers in the solution process: no-preference methods, a posteriori methods, a priori methods, and interactive methods. A few works reported in the literature address multi-objective multi-stage decision problems. Research extended to multi-objective cases has enabled the development of decomposition methodologies for separable multi-objective dynamic optimization problems. Li and Haimes (1989) review the major concepts used in multi-objective dynamic programming and examine current progress in the development of its theory and methodology. Li (1990) considers a general separable class of stochastic multi-objective optimization problems with perfect state information; a generating approach based on a stochastic multi-objective dynamic programming method finds the set of Pareto optimal solutions. Abo-Sinna and Hussein (1995) present an algorithm for generating efficient solutions of multi-objective mathematical programming models, using the principle of optimality in dynamic programming.
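Among the a posteriori methods, the ε-constraint scalarization (used later in this paper's algorithm) can be illustrated with a minimal sketch: one objective is optimized while the others are bounded, and sweeping the bound traces out Pareto optimal points. The discrete decisions and objective values below are hypothetical, not from the paper.

```python
# Sketch of the epsilon-constraint method for a discrete two-objective
# problem: maximize expected profit subject to a bound (eps) on risk.
# Decisions and their objective values are hypothetical illustration data.

def epsilon_constraint(decisions, profit, risk, eps_grid):
    """For each bound eps on risk, maximize profit over feasible decisions."""
    pareto = []
    for eps in eps_grid:
        feasible = [d for d in decisions if risk[d] <= eps]
        if not feasible:
            continue
        best = max(feasible, key=lambda d: profit[d])
        point = (profit[best], risk[best], best)
        if point not in pareto:        # keep each Pareto point once
            pareto.append(point)
    return pareto

decisions = ["no_invest", "one_reactor", "two_reactors"]
profit = {"no_invest": 0.0, "one_reactor": 5.0, "two_reactors": 8.0}
risk   = {"no_invest": 0.0, "one_reactor": 2.0, "two_reactors": 6.0}
frontier = epsilon_constraint(decisions, profit, risk, eps_grid=[0.0, 2.0, 6.0])
# Each entry is (profit, risk, decision); all three points are mutually
# non-dominated here, so the sweep recovers the whole Pareto frontier.
```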
3. An illustrative example

In this section, we provide a small design problem that we will carry from formulation through solution to illustrate our model for a process life cycle. Consider a chemical company that is designing a reaction process operating over a time horizon of 4 years. The company knows that a catalyst a is already available, while a better catalyst b is still under development and is expected to become available in the near future. We assume that reactors with either catalyst are of identical capacity. Therefore, the design decision on capacity planning and technology adoption is to determine how many reactors to purchase and what type of catalyst to use. We consider the internal process (reactors and catalyst) and the exogenous environment (available catalyst) together as a system, which is a collection of interacting objects defined by our scope of study or interest. At each period, the system could occupy one of the eight different states shown in Fig. 1.

Fig. 1. Possible states of the system.

Starting from any of the states, when the designer chooses a design decision, the state of the system will transit to another, possibly different, state, indicated by the arrows between different states or the loop within the same state. For example, starting from the initial state, if we decide to design a process with one reactor using a, then at the next period the system will evolve to the state in which b is available with probability p or to the state in which b is not available with probability 1 − p. Fig. 2 depicts the "stochastic decision tree" representation of the state transition (Kall & Wallace, 1994).

Fig. 2. State transition of the system.

There are two types of uncertainties with which the company is concerned: the future product demand and the timing of the arrival of the new catalyst b over the next 4 years. The company predicts that the product demand follows a growing trend, as shown in Fig. 3. Each year, the product demand could vary around the mean value and take a "high", "medium", or "low" level according to the "optimistic", "moderate", and "pessimistic" scenarios, which occur with specified probabilities.

Fig. 3. Product demand distribution.

According to the company's assessment, the time when the new catalyst b becomes available follows a discrete distribution, represented by the histogram in Fig. 4. From the histogram, we can see that the probability that b will never become available within the planning horizon is 0.5.

Fig. 4. Distribution of the arrival time of b.

The advantage of the new catalyst b over catalyst a is measured by a decrease in the marginal operating cost as well as an increase in production throughput, due to a higher reaction rate and conversion with catalyst b. As the new catalyst is expected to spread quickly throughout the industry, the sales price for the product is very likely to drop once b becomes available, which would reduce the cash flow of any existing process using a. We model the downward pricing pressure and investment losses due to the new catalyst b by letting the purchase price and salvage value of a reactor depend on the current best catalyst. Even though it is possible to replace catalyst a with b by modifying the reactor to accommodate catalyst b, the high cost and penalty associated with upgrading to the new catalyst may not justify the change. Therefore, the reduction in profit as well as the depreciation of the investment would render an existing reactor with catalyst a uneconomic to operate in the presence of catalyst b, and initial investment in a reactor using a is also strongly discouraged by these concerns. However, to keep up with growing product demand, enough capacity is required to avoid a penalty reflecting customer dissatisfaction when demand goes unsatisfied.

Operating in such an uncertain and changing environment, the company must seek design and planning solutions that satisfy various objectives. For example, the discounted cash flow as well as the risk exposure is of great concern to the company. Therefore, the company would like to maximize expected profit while minimizing the risk involved. The company also realizes that the sustainability of the process, or the process economic lifetime, should also be taken into account. In other words, it is averse to shutting down because of the consequences to the employees.

4. Problem formulation

In this section, we generalize the example problem into a class of problems called "design and planning under uncertainty". We develop a unifying modeling framework to capture the essence of decision making under uncertainty in the process life cycle. We formulate the decision making as a Markov decision process with recourse, which, along with the multiple optimality criteria, leads to a multi-objective Markov decision problem.
4.1. Problem definition: a unifying framework

Consider a firm that deploys several types of resources and technology to manufacture several products in periods t ∈ {1, …, N}. Fig. 5 presents the framework that we develop to address the problem of dynamic design in the context of the life cycle of a process. In our perspective, the process life cycle is composed of a sequence of snapshots taken at different time periods throughout the lifetime. At the beginning of each period t, the firm investigates both the internal process and the external operating environment, which includes the product market, technology, and institutions. The decision-makers gather information, such as the available technology and institutional regulations, and assess the uncertain future of factors such as product demand and technology developments. Then the firm makes a design decision to create a process from scratch or to modify the existing plant configuration by expanding capacity, replacing technology, etc., given the firm's knowledge of present information and assessment of future information. The firm then starts receiving more information and resolving some of the uncertainties. For example, the actual product demand for the current time period becomes known as orders are received. The firm then chooses its production decisions, constrained by the earlier design decisions for capacity and technology, and contingent on the revealed information. As the process and the environment evolve, more information arrives and more uncertainty is resolved. At the beginning of the next period, t + 1, the firm faces a similar problem with a different internal process and external environment and a possibly different set of decisions from which to choose. The sequential decision process described above is a typical Markov decision process.
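The period-by-period loop just described (design decision, demand realization, production recourse, state update) can be sketched as a simple simulation. Everything below is hypothetical illustration data: the naive one-step design heuristic, the growing mean demand, and all prices and costs are ours, not the paper's.

```python
import random

# Sketch of one pass through the sequential decision process described above:
# at each period the firm makes a design decision before demand is known,
# then observes demand, then makes a production (recourse) decision.
# The heuristic policy and all numbers are hypothetical.

random.seed(0)

def simulate_life_cycle(periods=4, price=10.0, unit_cost=4.0, reactor_cost=15.0):
    capacity, total_profit = 0, 0.0
    for t in range(periods):
        # Design decision (before demand is known): buy up to the expected demand.
        expected_demand = 2 + t                 # hypothetical growing mean demand
        buy = max(0, expected_demand - capacity)
        capacity += buy
        # Uncertainty resolved: actual demand is realized.
        demand = max(0, expected_demand + random.choice([-1, 0, 1]))
        # Production (recourse) decision, limited by capacity and demand.
        production = min(capacity, demand)
        total_profit += price * production - unit_cost * production - reactor_cost * buy
    return total_profit

profit = simulate_life_cycle()
```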
At each decision epoch (or time period), the system, which includes the process and its operating environment, occupies some state that represents the plant configuration and environment condition. The firm, as the "controller", observes the "state" of the system and chooses a "control" from the set of admissible controls for that state. As a consequence, the firm receives a net profit, which is the operating profit less the capital cost involved in capacity expansion and technology acquisition, and the system evolves to a different state at the next decision epoch. The idea is similar to "feedback control" or "closed-loop control" in control terminology, where the measurement of the state is used to adjust the control
Fig. 5. Process life cycle modeling framework.
variable, i.e., the optimal control is a function of the current state. The closed-loop controlled system can achieve performance better than, or at least as good as, that of the open-loop controlled system, in which the controller chooses the control without information about the current state (Bertsekas, 1995).

This two-level decision problem at each period is characteristic of real option models: first invest in capabilities, then receive some additional information, and finally exploit the capabilities optimally, contingent on the revealed information (Harrison & Van Mieghem, 1999; Bollen, 1999). In operations research, it is also known as a two-stage stochastic programming model with recourse. In this model, the decision-maker makes the first-stage decision, taking uncertainty into consideration. Then, after a random event has taken place, the decision-maker takes some "corrective" actions in the second stage to hedge against the uncertainties (Kall & Wallace, 1994). This approach provides qualitative insight into real-world design/planning practices, which typically involve two levels of decision making: lower-level production planners propose capacity adjustments necessary for the deterministic production plan, and senior-level managers approve an investment plan by "perturbing" the deterministic reasoning (Harrison & Van Mieghem, 1999).

Fig. 6 illustrates an example of two-level decision making at period t during the decision process. At the beginning of period t, the firm decides to purchase another reactor with catalyst a, in addition to the one that already exists. After the actual demand is realized (in this case, fairly low), the firm makes its production decision in response to the demand realization: the firm decides to operate only one reactor in order to meet the demand.
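In the spirit of the two-level decision making just described, a toy two-stage stochastic program with recourse can be solved by enumeration: the first stage fixes the number of reactors, the second stage produces the best amount after demand is observed. The margin, reactor cost, and demand scenarios below are hypothetical, not the paper's data.

```python
# Toy two-stage stochastic program with recourse. First stage: how many
# reactors to buy; second stage (recourse): production after demand is
# observed. All prices, costs, and demand scenarios are hypothetical.

scenarios = [(0.25, 1), (0.5, 2), (0.25, 3)]   # (probability, demand)
margin, reactor_cost, per_reactor_capacity = 6.0, 4.0, 1

def expected_profit(n_reactors):
    capacity = n_reactors * per_reactor_capacity
    # Second-stage value: in each scenario produce min(capacity, demand).
    value = sum(p * margin * min(capacity, d) for p, d in scenarios)
    return value - reactor_cost * n_reactors

# First-stage decision by enumeration over a small candidate set.
best_n = max(range(0, 4), key=expected_profit)
```

Here buying two reactors hedges best: a third reactor's cost exceeds the expected margin from the rare high-demand scenario it would serve.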
However, no matter what the actual demand turns out to be, the design decision is irreversible during period t; that is, the design decisions cannot anticipate the outcomes of future events. At the beginning of the next period, t + 1, the system will move to the same or a different state with the corresponding probabilities.

As a result of choosing and implementing design/investment decisions at each period, the decision-makers receive the outcomes of those decisions, such as the profit from each period. The decision-makers need to choose a strategy such that the corresponding outcome is as "good" as possible, or in other words, optimal. The optimality criterion most commonly used is the expected total discounted reward, or the expected net present value of profit in our problem, because a very complete theory and numerous computational algorithms are available for its solution (Ringuest, 1992). The discounting naturally reflects the decision-makers' preference for returns over time. However, in many situations of decision under risk, expected profit is not the exclusive objective. Decision-makers are also concerned about other performance measures such as the risk involved. Therefore, the decision problem is naturally a multiple criteria decision-making problem, in which decision-makers must handle several conflicting or incommensurable objectives. In this problem, the objectives we consider include, but are not limited to, expected profit, expected downside risk, and process lifetime.

4.2. Model formulation: a formal presentation

Based on the framework we propose to model the life cycle of a process, we formulate the dynamic design process under uncertainty as a Markov decision process with recourse. A formal presentation of the mathematical model follows.

4.2.1. State representation and transition

At the beginning of each time period t, the firm observes the system (process and environment) in state s_t ∈ S_t, and then selects a control u_t from the set of allowable controls for the system in state s_t, i.e., u_t ∈ U_t(s_t).
Fig. 6. Two-level decision making at period t .
The state of the system s_t includes the current plant configuration and the environment condition. We represent the capacity and technology level of the process by K_{t-1} and T_{t-1}, respectively. Let TM_{t-1} denote the highest technology level available in the market for equipment. Then the state of the system at the beginning of period t can be described as

\[ s_t = (K_{t-1}, T_{t-1}, TM_{t-1}), \quad t = 1, 2, \ldots, N+1. \tag{1} \]
Correspondingly, the control u_t ∈ U_t(s_t) can be partitioned into two parts in terms of capacity change and technology adoption:

\[ u_t = (\Delta K_t, \Delta T_t), \quad t = 1, 2, \ldots, N. \tag{2} \]
As a result of choosing action u_t in state s_t at period t, the state of the system evolves according to the discrete-time system equation

\[ s_{t+1} = f_t(s_t, u_t, \omega_t), \tag{3} \]
where ω_t ∈ Ω is called the state of the world, which represents the uncertainty in the external environment and/or the internal process. ω_t is characterized by a probability distribution P_t(· | s_t, u_t) that may depend explicitly on the current state s_t and control u_t but not on the prior values ω_{t-1}, ω_{t-2}, …, ω_1. When the disturbances ω_t are correlated over time, we must, and always can, introduce additional state variables so that this assumption is still satisfied (Bertsekas, 1995). The evolution of the latest technology TM_t is modeled by a Markov chain with a one-step transition matrix P_t = [p_{mn}(t)], where p_{mn}(t) is the probability that technology TM^n will appear during time period t, given that TM^m is the last technology that appeared before. For instance, for a technology environment with two states, state 0 in which the innovation has not yet been realized and state 1 in which the innovation has already appeared, the one-step transition can be represented by the following 2 × 2 probability matrix:

\[ P_t = \begin{pmatrix} 1-p & p \\ 0 & 1 \end{pmatrix}, \]
where p is the probability that the innovation will be available in this period given that it has not appeared up to now. Therefore, the state of the system transits according to

\[ K_t = K_{t-1} + \Delta K_t, \tag{4} \]
\[ T_t = T_{t-1} + \Delta T_t, \tag{5} \]
\[ p_{mn}(t) = P(TM_t = TM^n \mid TM_{t-1} = TM^m). \tag{6} \]
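A minimal sketch of these transition equations: capacity and technology update deterministically with the chosen controls, while the best market technology evolves by the two-state Markov chain with an absorbing "innovation arrived" state. The transition probability p and the sample state are hypothetical.

```python
import random

# Sketch of the state transition: K and T follow the controls directly,
# while TM in {0, 1} (innovation absent/arrived) evolves by the two-state
# Markov chain with state 1 absorbing. The value p = 0.3 is hypothetical.

random.seed(1)

def transition(state, control, p=0.3):
    """state = (K, T, TM); control = (dK, dT)."""
    K, T, TM = state
    dK, dT = control
    K_next, T_next = K + dK, T + dT            # capacity and technology updates
    if TM == 0 and random.random() < p:        # innovation may arrive this period
        TM_next = 1
    else:
        TM_next = TM                           # state 1 is absorbing
    return (K_next, T_next, TM_next)

s = (1, 0, 0)                  # one reactor, old technology, no innovation yet
s = transition(s, (1, 0))      # buy one more reactor, keep the technology
```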
The model above is based on the following assumptions:

1) The firm has no control over the technology evolution. We could make the model more complicated by assuming that the firm itself is also an active technology developer, so that it may speed up the pace of technology development by investing in R&D, which we could account for by introducing another control variable.
2) Once a design decision is made at the beginning of a time period, it is implemented instantaneously, and the new plant configuration is known with certainty. The model can be generalized to handle delay or uncertainty in the implementation of the design decision.

4.2.2. Production decision model

Our approach to modeling the production decision problem is based on the premise that a flexible manufacturing system will exploit its available resources and employ its flexibility toward maximizing profit, contingent on the information available in that time period. We model the firm's manufacturing process and production decisions as follows. In each period t, having chosen a capacity level K_t and a technology level T_t, and observed a product demand D_t and a product price p_t, the firm decides period t production according to the
environment. In the current context, 8t is assumed to depend on vt only through the demand Dt and price pt in period t.
following mathematical program:

    max_{x_t, y_t}  f(x_t, y_t, p_t),
    s.t.  g(x_t, y_t, K_t, T_t) ≤ 0,
          h(x_t, K_t, T_t) = 0,
          x̄_t ≤ D_t,
          x_t ∈ X,  y_t ∈ {0, 1}^m,  (7)
where f is the objective function for operating profit; x_t are the process continuous variables, including the production level x̄_t; y_t are the binary variables that denote process structure and existence of units; g are the process inequality constraints; and h are the process equality constraints. In the model described above, the plant configuration (K_t, T_t) is determined by the design decision, and the environment scenario (D_t, p_t) is an actual realization of the probabilistic outcome observed from the external environment. The production decision problem is actually a process flowsheet optimization problem, in which the optimal operating conditions and process structure are determined to maximize the operating profit. The decision variables include the continuous variables x_t and the binary variables y_t, where x_t represents process operating conditions, such as flows and temperatures, as well as the production volume, while y_t denotes the process structure and the utilization of units. The objective function f(·) evaluates the operating profit, which is the sales revenue less the operating cost. The equations h(x_t, K_t, T_t) = 0 are generally nonlinear and correspond to material and energy balances, while the inequalities g(x_t, y_t, K_t, T_t) ≤ 0 represent process specifications, capacity limitations, and logical constraints. The constraint on the production level, x̄_t ≤ D_t, is based on the assumption that the firm will adjust production to meet the observed demand, but production will not exceed demand. Note that in this problem the demand vector D_t is the total demand occurring during that period. As production will not exceed the total demand, no inventory is carried over from one period to the next.
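As an illustration of the recourse problem (7), the following sketch solves a deliberately simplified single-product instance in which the binary structure decisions are fixed and the only constraints are capacity and demand; all numbers are hypothetical. For a linear margin the optimum is to produce min(K_t, D_t) whenever the margin is positive, and the optimal value then plays the role of the operating-profit function φ_t.

```python
# Simplified single-product instance of program (7): choose a production
# level x under a capacity bound K_t and a demand bound D_t.  With a linear
# margin, the optimum is min(K_t, D_t) when the margin is positive.
def operating_profit(K_t, D_t, price_t, unit_cost):
    """Optimal value and solution of the simplified period-t program (7)."""
    margin = price_t - unit_cost
    x_opt = min(K_t, D_t) if margin > 0 else 0.0
    return margin * x_opt, x_opt

profit, x = operating_profit(K_t=100.0, D_t=80.0, price_t=5.0, unit_cost=3.0)
# demand-limited case: x = 80.0, profit = 160.0
```

A capacity-limited instance (D_t larger than K_t) yields x = K_t instead, mirroring the roles of the constraints x̄_t ≤ D_t and the capacity bound in (7).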
This assumption is consistent with what is observed in practice where ‘‘each time period is of sufficient length (e.g., 1 year) that production levels can be altered within the time period in order to satisfy as closely as possible the demand that is actually experienced’’ (Eppen, Martin & Schrage, 1989). We do not consider the integration of production and inventory control in this work. The optimal objective value of the process flowsheet optimization problem (7) is the maximal operating profit function
    φ_t(s_t, u_t, ω_t) = f(x*_t, y*_t, p_t),  (8)
where x*_t and y*_t are the optimal solutions to the optimization problem and usually take different values for different K_t, T_t and ω_t. As introduced earlier, ω_t is a random vector that represents uncertainties in the external environment. In the current context, φ_t is assumed to depend on ω_t only through the demand D_t and price p_t in period t.
4.2.3. Multiple optimality criteria When making the design decision u_t = (ΔK_t, ΔT_t), the firm incurs a capital cost for capacity change and technology acquisition

    c_t(s_t, u_t) = c_t(ΔK_t, ΔT_t, TM_{t−1}).  (9)
We assume that the capacity change ΔK_t is composed of three components: (1) capacity expansion, in which we add new equipment with an available technology; (2) capacity disposal, in which we decrease capacity by disposing of existing equipment; and (3) technology replacement, in which capacity changes as we replace the technology on existing equipment with newer technology, which is equivalent to a disposal followed by an immediate expansion. Correspondingly, the capital cost c_t(s_t, u_t) consists of the costs for technology adoption and replacement, as well as capacity expansion and disposal, which generally are functions of the capacity and technology change and of the current best technology available. The net profit in period t, defined as the operating profit minus the capital cost, is given by

    ν_t(s_t, u_t, ω_t) = φ_t(s_t, u_t, ω_t) − c_t(s_t, u_t).  (10)
Denote by π a design strategy, which is a sequence of design policies π = {μ_1, μ_2, …, μ_N}. Each policy prescribes the design decisions on capacity modification and technology acquisition for each possible state that the system could occupy, i.e., u_t = μ_t(s_t). After choosing and implementing the design strategy π, the decision-maker receives the outcomes for the system performance, e.g., a sequence of rewards or profits for each period. It is in the decision-maker's interest to choose a design strategy such that the outcomes are as ''good'' as possible, in other words, optimal. This creates the need for performance measures on those outcomes, or optimality criteria, with which to compare alternative decisions. We consider three different optimality criteria in this paper, namely expected profit, downside risk, and process lifetime; the results obtained here can be generalized to situations with more objectives.
4.2.3.1. Expected profit. The optimality criterion most commonly used in research and practice is the expected total discounted reward. The process expected net present value (ENPV), evaluated at the beginning of period 1 under design strategy π starting with initial capacity and technology (K_0, T_0, TM_0), is
    I_1 = E_ω[ν] = E_ω[ Σ_{t=1}^{N} λ^{t−1} ν_t(s_t, μ_t(s_t), ω_t) + λ^N ν_{N+1}(s_{N+1}) ],  (11)

where the expectation is with respect to the joint distribution of the random variables involved, which is called the state of the world ω = (ω_1, …, ω_N) ∈ Ω. ν is the present value of the profits of each period, discounted to the beginning of the first period. ν_{N+1}(s_{N+1}) is the salvage value function for the process with final configuration (K_N, T_N), given that the latest available technology is TM_N. Future value is discounted to the beginning of period 1 according to a risk-free discount factor λ, with 0 < λ < 1.

4.2.3.2. Downside risk. We assume that decision-makers are risk-averse, i.e., they are unwilling to make risky decisions. Several measures of risk are considered in research and practice. Markowitz (1959) employs a mean-variance approach for portfolio selection, in which the standard deviation is the measure of risk. The mean-variance approach has been widely used since Markowitz introduced modern portfolio theory in the 1950s. Eppen, Martin and Schrage (1989), however, reject the notion of a variance measure in a capacity planning problem, motivated by the fact that points on the mean-variance efficient frontier may correspond to stochastically dominated solutions when the distribution of profit is skewed or asymmetric. They propose an alternative risk measure, called downside risk, that includes risk aversion in a linear utility model in the form of a linear constraint. The risk associated with a decision is measured by the failure to meet a target profit ν̂, or the variability of profit below the target, defined by max(ν̂ − ν, 0). The expected downside risk associated with profit ν, if a target value ν̂ is chosen, is then defined by

    I_2 = E_ω[max(ν̂ − ν, 0)].  (12)

4.2.3.3. Process lifetime. We consider the option of the decision-makers to shut down the plant and terminate the process life cycle. A natural concern is then to sustain the process' viability, i.e., its capability to develop continuously and competitively in a risky environment. A simple measure of the process sustainability is the length of the process life cycle, or process economic lifetime:

    I_3 = length of the process life cycle.  (13)

4.3. A multi-objective Markov decision problem

We assume that the firm's objective is to maximize its expected profit I_1 and process lifetime I_3 and to minimize the downside risk I_2 (or, equivalently, maximize −I_2). Therefore, the decision problem becomes one of determining a design strategy defined by a sequence of decision policies π = {μ_1, μ_2, …, μ_N}, where each policy μ_t maps state s_t into control action u_t, that solves the following multi-objective optimization:

    max_π  I = [I_1, −I_2, I_3],
    s.t.  s_{t+1} = f_t(s_t, μ_t(s_t), ω_t)  (state transition equations), t = 1, 2, …, N,
          μ_t(s_t) ∈ U_t(s_t), s_t ∈ S_t  (state and control sets), t = 1, 2, …, N.  (14)

The optimization problem (14) is a multi-objective Markov decision problem, which has the following characteristics: 1) Multiple objectives are considered in the decisions. They are usually conflicting (e.g., return vs. risk) or incommensurable (e.g., in different units). A vector-valued optimization is involved, where the objective is a K-dimensional vector (K = 3 in this case). 2) Decisions are made sequentially at each period and are based on the state of the system. The solution to the problem is a sequence of functions {μ_1, μ_2, …, μ_N} that prescribe the design selection for each state at each period. 3) The optimal operating profit φ_t is the optimal objective value of the second-stage production decision problem. Therefore, a sequence of inner recourse optimizations is embedded in the optimization problem (14).

5. Solution strategies

In this section, we develop a rigorous solution strategy for the multi-objective Markov decision problem. We first recast the original problem so that the backward separability and monotonicity conditions required for a dynamic programming strategy are satisfied. We then develop a multi-objective stochastic dynamic programming approach to decompose the original problem into a sequence of recursive single-period optimization problems. Each subproblem is a two-stage stochastic program with recourse. We recursively solve these subproblems backward in time and eventually find the Pareto optimal frontier, as well as the Pareto optimal strategies, for the multi-objective Markov decision problem.
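The two monetary criteria can be illustrated by Monte Carlo estimation. The sketch below samples discounted profit paths from a hypothetical i.i.d. profit distribution (in the formulation, the paths would come from the Markov decision process under a fixed strategy π) and estimates the ENPV (11) and the expected downside risk (12); the distribution, horizon, and target are made-up values.

```python
import random

def sample_path_npv(lam, n_periods, rng):
    """Discounted profit of one sampled path (hypothetical i.i.d. profits)."""
    return sum(lam ** (t - 1) * rng.uniform(-1.0, 3.0)
               for t in range(1, n_periods + 1))

def criteria(lam, n_periods, target, n_samples=5000, seed=1):
    """Monte Carlo estimates of I1 (Eq. (11)) and I2 (Eq. (12))."""
    rng = random.Random(seed)
    npvs = [sample_path_npv(lam, n_periods, rng) for _ in range(n_samples)]
    enpv = sum(npvs) / n_samples                                  # I1
    risk = sum(max(target - v, 0.0) for v in npvs) / n_samples    # I2
    return enpv, risk

enpv, risk = criteria(lam=0.9, n_periods=10, target=3.0)
```

Raising the target ν̂ can only increase the downside-risk estimate while leaving the ENPV untouched, which is the sense in which the two criteria are incommensurable rather than redundant.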
5.1. Multi-objective optimization: an introduction Consider an optimization problem with two objectives, maximizing profit I_1 and minimizing risk I_2, both of which are functions of the decision x, i.e.,

    max_{x∈X} [I_1(x), −I_2(x)].  (15)
Fig. 7 shows an example of the objective space of the problem, in which the feasible objective region is illustrated by the shaded area. The feasible objective region comprises the two-dimensional objective vectors [I_1, I_2] corresponding to all feasible decisions x ∈ X. Because of the conflict and possible incommensurability of the objective functions, it is not possible to find a single solution that would be optimal for all the objectives simultaneously. For instance, if we consider only profit as the objective, while disregarding the risk involved, we will find a solution corresponding to point A, with the highest profit but also an unacceptable risk. The goal of multi-objective optimization is to find the set of Pareto optimal solutions to the multi-objective decision problem. Pareto optimality of an optimization with K objectives, i.e., max_{x∈X} [I_1, I_2, …, I_K], is given by the following definition. Definition 1 (Pareto optimality). A decision x* ∈ X is Pareto optimal if there does not exist another decision x ∈ X such that I_i(x) ≥ I_i(x*) for all i = 1, 2, …, K and I_j(x) > I_j(x*) for at least one index j. A decision x* ∈ X is weakly Pareto optimal if there does not exist another decision x ∈ X such that I_i(x) > I_i(x*) for all i = 1, 2, …, K. As shown in Fig. 7, all points on the curve connecting points A and B correspond to solutions in the Pareto optimal set by definition. This curve is defined as the Pareto optimal frontier. Any solution point inside the region, such as point C, is dominated, since it can be improved by moving toward the curve, with increasing profit and decreasing risk. According to the definition of Pareto optimality, moving from one Pareto optimal solution to another necessitates trading off between different objectives. Mathematically, every point in the Pareto optimal set is an equally acceptable solution to the multi-objective optimization problem.
There are a variety of methods in multi-objective optimization for finding the Pareto optimal solutions. One approach is to take the decision-maker's prior preference into consideration and thus transform the multi-objective optimization problem into a scalar-objective optimization problem. Another approach is to generate the set of Pareto optimal solutions for the decision-makers, who in turn make a post-preference assessment among the members of this solution set to select their preferred solution (Miettinen, 1999). We choose generating methods over scalarization methods in this work, as generating methods provide decision-makers with the maximum information about the potential best outcomes of the system, of which they often have too little knowledge to make an a priori preference assessment. The weighting method and the ε-constraint method are the basic methods commonly used in solving multi-objective optimizations. Li (1990) presents a stochastic multi-objective dynamic programming algorithm in which the Pareto optimal set at each stage is identified by the weighting method. Since the problems we are dealing with often involve non-convexity (in minimizations), in which case the weighting method cannot find all the Pareto optimal solutions, we develop a decomposition algorithm using the ε-constraint method instead. In the ε-constraint method, we select one of the objective functions I_l to be optimized and convert the others I_j, j = 1, 2, …, K, j ≠ l, into constraints by setting a bound on each of them, i.e.,

    max_{x∈X}  I_l(x),
    s.t.  I_j ≥ ε_j,  j = 1, 2, …, K, j ≠ l.  (16)

The Pareto optimality of the solutions obtained from the ε-constraint method can be proved according to the following theorem given by Miettinen (1999). Theorem 1. All solutions of the ε-constraint problem are weakly Pareto optimal. The solution x* ∈ X of the ε-constraint problem is Pareto optimal if one of the following conditions is satisfied: 1) x* solves the ε-constraint problem for every l = 1, 2, …, K with ε_j = I_j(x*) for j = 1, 2, …, K, j ≠ l; 2) x* is a unique solution of the ε-constraint problem for some l with ε_j = I_j(x*) for j = 1, 2, …, K, j ≠ l. Therefore, the Pareto optimal solutions to problem (15) can be found by solving the following problem:

    max_{x∈X} {I_1(x) | I_2(x) ≤ ε}.  (17)

Fig. 7. Pareto optimal set in the objective space.
As illustrated in Fig. 7, all points on the Pareto optimal frontier, and their corresponding Pareto optimal solutions, can be generated by successively tightening the risk bound ε.
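The mechanics of tracing a frontier this way can be sketched on a small discrete set of hypothetical decisions, each with a (profit, risk) outcome; tightening the bound ε and re-maximizing profit recovers exactly the non-dominated decisions.

```python
# Hypothetical decision set: name -> (profit, risk).
decisions = {"A": (10.0, 8.0), "B": (4.0, 1.0), "C": (6.0, 6.0), "D": (7.0, 3.0)}

def eps_constraint(eps):
    """Solve (17): max profit s.t. risk <= eps; None if infeasible."""
    feasible = {d: pr for d, pr in decisions.items() if pr[1] <= eps}
    if not feasible:
        return None
    return max(feasible, key=lambda d: feasible[d][0])

frontier = []
for eps in (8.0, 6.0, 3.0, 1.0):
    d = eps_constraint(eps)
    if d is not None and d not in frontier:
        frontier.append(d)
# frontier == ["A", "D", "B"]; C, dominated by D, is never selected
```

Note that C = (6, 6) never appears: for every ε at which C is feasible, D offers more profit at less risk, which is the dominance relation of Definition 1 in action.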
5.2. Multi-objective dynamic programming Dynamic programming provides a framework for studying Markov decision problems, as well as for devising algorithms to compute optimal decision policies. The dynamic programming technique rests on a simple idea, the principle of optimality. An early verbal statement of the principle of optimality appeared in the book ''Dynamic Programming'' by Bellman (1957, p. 83). Research has extended dynamic programming to multi-objective sequential decision problems and enabled the development of decomposition methodologies for separable multi-objective optimization problems (Li & Haimes, 1989). In this work, we develop a novel multi-objective stochastic dynamic programming algorithm based on the ε-constraint method to solve the multi-objective Markov decision problem. 5.2.1. Dynamic programming decomposition Dynamic programming solves a sequential decision problem based on a decomposition scheme. We start by illustrating the decomposition scheme with a single-objective Markov decision problem, assuming that expected profit is the only objective in the case study we introduced earlier. Based on the principle of optimality, the maximum expected profit, as well as the optimal design policy, can be determined in a backward fashion. Suppose that the maximal expected profit from period t+1, starting with each state s_{t+1}, to the end of the horizon has been found and is denoted by J_{t+1}(s_{t+1}). In other words, at the beginning of period t+1, given the current plant configuration and available technology, we know the maximal expected profit we can make from now until the end of the horizon, which is often called the ''profit-to-go''.
We start from the last period N+1, where the expected future profit for a state equals the process salvage value, which itself depends on the best technology available in the last period, i.e., J_{N+1}(s_{N+1}) = ν_{N+1}(s_{N+1}). Now suppose we are at period t with a given state s_t, e.g., one reactor with catalyst α while only α is available, and we select a design decision u_t to purchase another reactor with α, as shown in Fig. 8. We receive an immediate net profit ν_t(s_t, u_t, ω_t), which equals the maximal operating profit minus the capital cost. On the other hand, at the next period t+1, we could find ourselves in two different states, from each of which we already know the maximal expected profit-to-go. Therefore, the maximal expected profit-to-go starting from period t with s_t, given that decision u_t is selected, equals the expected profit of period t plus the discounted maximal expected profit of the remaining periods. We must therefore select a design decision u_t such that the expected profit-to-go from period t, starting with state s_t, is maximized, i.e.,
Fig. 8. An example of dynamic programming decomposition.
    J_t(s_t) = max_{u_t ∈ U_t(s_t)} E_{ω_t}[ν_t(s_t, u_t, ω_t) + λ J_{t+1}(s_{t+1}) | s_t, u_t].  (18)
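On a small hypothetical instance (two states, two actions, made-up transition probabilities, rewards, and salvage values), the backward recursion (18) can be sketched as follows:

```python
lam = 0.9          # discount factor
N = 3              # planning horizon
states, actions = [0, 1], [0, 1]
# Hypothetical data: trans[s][u][s2] = P(s2 | s, u); reward[s][u] = expected
# one-period net profit; salvage[s] = value at the end of the horizon.
trans = {0: {0: [1.0, 0.0], 1: [0.4, 0.6]},
         1: {0: [0.0, 1.0], 1: [0.0, 1.0]}}
reward = {0: {0: 1.0, 1: 0.5}, 1: {0: 2.0, 1: 2.0}}
salvage = {0: 0.0, 1: 1.0}

J = dict(salvage)             # J_{N+1}(s) = salvage value
policy = {}
for t in range(N, 0, -1):     # proceed backward in time
    J_next, J = J, {}
    for s in states:
        value, u_opt = max(
            (reward[s][u] + lam * sum(pr * J_next[s2]
                                      for s2, pr in zip(states, trans[s][u])), u)
            for u in actions)
        J[s], policy[(t, s)] = value, u_opt
# After the loop, J[s] is the expected profit-to-go from period 1 in state s,
# and policy[(t, s)] is the optimal design decision for each period and state.
```

The dictionary `policy` is exactly the sequence of functions {μ_1, …, μ_N} discussed in the text, tabulated state by state.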
By induction, the maximal expected profit-to-go at each period t, for all states s_t, can be obtained by computing the ''Bellman equations'' (18) recursively, starting from the last period and proceeding backward in time. At the same time, the optimal policy, i.e., the optimal design decision for each given state, is obtained. The whole problem is therefore solved when we reach the first period, where the expected profit-to-go is in fact the objective of the original problem, i.e., the expected net present value of the profits over the whole time horizon. The procedure described above is the dynamic programming algorithm, which decomposes the original multi-stage problem into a sequence of single-stage subproblems. While dynamic programming can be applied to a variety of sequential decision problems, it is applicable only to problems that satisfy the conditions of separability and monotonicity, which are formally defined as follows. Definition 2 (Separability and monotonicity). The ith performance index I_i is said to be backward separable if there exist elementary functions f_t^i such that

    I_i = f_1^i(s_1, u_1, f_2^i(s_2, u_2, f_3^i(…, f_N^i(s_N, u_N, f_{N+1}^i(s_{N+1})) …))).  (19)
The separable performance index I_i is backward monotonic if every function f_t^i is strictly increasing with respect to its last argument for each fixed pair of first two arguments s_t and u_t; i.e., if at period t, ũ_t and û_t satisfy

    f_t^i(s_t, ũ_t, f̃_{t+1}) ≥ f_t^i(s_t, û_t, f̂_{t+1}),  (20)

then it follows that

    f_{t−1}^i[s_{t−1}, u_{t−1}, f_t^i(s_t, ũ_t, f̃_{t+1})] ≥ f_{t−1}^i[s_{t−1}, u_{t−1}, f_t^i(s_t, û_t, f̂_{t+1})].  (21)
We denote the function f_t^i as the ''objective-to-go'', which is the value of the ith objective evaluated over the remaining periods, from period t to period N+1. For example, the ''profit-to-go'', as introduced earlier, represents the
profit for the remaining periods from period t to the end of the horizon. As shown in the example with expected profit as the only objective, the problem is decomposable by the dynamic programming algorithm because of its underlying Markovian nature. Lemma 1 (Separability and monotonicity of expected profit). The performance index for the expected net present value I_1 satisfies the separability and monotonicity conditions. A proof is provided in Appendix A. We note, however, that the performance index associated with the expected downside risk,

    I_2 = E_ω[ max( ν̂ − Σ_{t=1}^{N+1} λ^{t−1} ν_t , 0 ) ],  (22)
cannot be decomposed into stage-wise separable functions, which presents an intrinsic problem for the dynamic programming algorithm. In order to apply dynamic programming decomposition techniques, we reformulate the problem into the basic form by introducing a new state, w_t, which denotes the total profit that has been realized before period t, discounted to the beginning of period 1:

    w_t = Σ_{τ=1}^{t−1} λ^{τ−1} ν_τ.  (23)

This enlargement of the state space is called state augmentation, which is to include in the enlarged state at time t all the information that is known to the controller at time t and can be used with advantage in selecting control u_t (Bertsekas, 1995). The enlarged state and control vectors are defined as

    s̃_t = (s_t, w_t),  ũ_t = u_t.  (24)

The augmented state transits according to the new system of equations s̃_{t+1} = f̃_t(s̃_t, ũ_t, ω_t):

    s_{t+1} = f_t(s_t, u_t, ω_t),
    w_{t+1} = w_t + λ^{t−1} ν_t.  (25)

By introducing the new state, the expected downside risk criterion can then be expressed in a Markovian fashion with the desired separability and monotonicity. Lemma 2 (Separability and monotonicity of expected downside risk). By augmenting the state space with w_t, the performance index for the expected downside risk I_2 satisfies the separability and monotonicity conditions. A proof of Lemma 2 is given in Appendix B. We can also incorporate the option to terminate the process, and the measure of viability, into the decision process by introducing a new state z_t and a new control action q_t that are defined by

    z_t = { 1, process is operating,
          { 0, process has been terminated,
    q_t = { 1, continue operating,
          { 0, shut down the process.  (26)

Therefore, the augmented state vector and control vector are given by

    s̄_t = (s_t, w_t, z_t),  ū_t = (u_t, q_t).  (27)

Correspondingly, the new system of equations s̄_{t+1} = f̄_t(s̄_t, ū_t, ω_t) can be written as

    ( s_{t+1} )   ( f_t(s_t, u_t, ω_t) )
    ( w_{t+1} ) = ( w_t + λ^{t−1} ν_t  ),  (28)
    ( z_{t+1} )   ( c_t(z_t, q_t)      )

where c_t(z_t, q_t) represents the transition of the process on/off status and is defined by

    c_t(z_t, q_t) = { 1, if z_t = 1 and q_t = 1,
                    { 0, otherwise.  (29)

The third performance index, the length of the process life cycle, then becomes

    I_3 = Σ_{t=1}^{N} z_t.  (30)

Therefore, the original problem is reformulated into a new Markov decision problem, with an enlarged state space, an enlarged control space, and a new system of equations. This new Markov decision problem satisfies the conditions of separability and monotonicity required by the dynamic programming decomposition. To simplify notation, we drop the bar on the new state s̄_t, control ū_t and system equations f̄_t(·) in the rest of the paper.
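The effect of the augmentation can be seen on a single hypothetical sample path: carrying the component w_t along the path reduces the terminal-only downside-risk term to a stage-wise update followed by one evaluation at the end. The profit values, discount factor, and target below are made up.

```python
# One hypothetical sample path: profits nu_1..nu_3, discount factor lam,
# target profit 3.  The augmented component w_t accumulates the realized
# discounted profit (Eq. (23)); the path's downside risk is then a single
# terminal evaluation (cf. Eq. (22)).
lam, target = 0.9, 3.0
profits = [1.5, -0.5, 2.0]       # nu_1, nu_2, nu_3 (made-up values)

w = 0.0                          # nothing realized before period 1
for t, nu in enumerate(profits, start=1):
    w += lam ** (t - 1) * nu     # transition w_{t+1} = w_t + lam^(t-1) nu_t
downside = max(target - w, 0.0)  # this path's contribution to I2
```

Averaging `downside` over sampled paths recovers the expectation in (22), but, crucially, each path's computation is now stage-wise separable in the augmented state.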
5.2.2. A multi-objective dynamic programming algorithm The research problem is to determine a design strategy, i.e., a sequence of decision policies π = {μ_1, μ_2, …, μ_N}, that maximizes the vector-valued objective function I = [I_1, −I_2, I_3]. In other words, the Pareto optimal solutions to the multi-objective Markov decision problem must be obtained. However, problems with multiple objectives are beyond the scope of classical dynamic programming, which commonly considers only a single objective of stage-wise additive form (Bertsekas, 1995). We must extend dynamic programming to handle sequential decision problems with multiple objectives. Given the assumptions of separability and monotonicity, we will show that a multi-objective stochastic dynamic programming algorithm can be developed to decompose the original problem into a family of single-period (or single-stage) subproblems. It is based on the principle of optimality in multi-objective
dynamic programming, which was first presented by Li and Haimes (1987). Theorem 2 (Principle of optimality). A Pareto optimal strategy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute a Pareto optimal strategy with respect to the state resulting from the first decision. Based on the principle of optimality, we can decompose the multi-objective Markov decision problem using a decomposition scheme similar to the one used in the single-objective problem. In single-objective dynamic programming, we introduced a profit-to-go function J_t(s_t) to represent the optimal expected profit for the remaining periods, starting from each period t with each state s_t, assuming that optimal policies are applied in those periods. Similarly, in multi-objective dynamic programming, we need to calculate the Pareto optimal frontier for the vector-valued objectives-to-go, such as profit-to-go and risk-to-go, at each period, and propagate it recursively backward in time. We introduce J_t(s_t, θ_t) to represent the Pareto optimal frontier in state s_t at time period t, where θ_t denotes the (K−1)-dimensional parameter vector of the set of Pareto optimal solutions. In the ε-constraint method, θ_t represents the upper bounds (or lower bounds, for objectives being maximized) on the K−1 constrained objectives-to-go, and J_t(s_t, θ_t) denotes the optimal value of the primary objective-to-go. In the case with two objectives, expected profit and expected downside risk, J_t(s_t, θ_t) becomes the maximal expected profit-to-go given that the expected risk-to-go for the remaining future is constrained by an upper bound θ_t. In the rest of this section, we show how to decompose this two-objective Markov decision problem using dynamic programming techniques. The results obtained here, however, can be generalized to situations with more than two objectives.
Assume that we have calculated the maximal expected profit-to-go from period t+1, starting from each state s_{t+1} and constrained by each upper bound θ_{t+1} on the risk-to-go. In other words, the parametric Pareto optimal frontiers J_{t+1}(s_{t+1}, θ_{t+1}), starting at period t+1 with every state s_{t+1}, have been found. In order to find the Pareto optimal frontier for a given state s_t, we need to calculate the maximal expected profit-to-go J_t(s_t, θ_t) for all possible values of the risk-to-go θ_t. Fig. 9 depicts an example that illustrates the procedure for finding the maximal expected profit-to-go for a given expected risk-to-go θ_t. To keep the example simple, we assume we know the demand for period t. We start from state s_t, where we have one reactor with catalyst α while only α is available, and the total profit discounted to period 1 is 10 (not shown). After choosing a design decision u_t to purchase another reactor with α, we receive a net profit of 1 for period t by producing to meet the known
demand for that period. At the next period t+1, the system could occupy the state s¹_{t+1}, where β has become available, with probability p, or the state s²_{t+1}, where β has not appeared, with probability (1−p). The Pareto optimal frontiers of those two states, i.e., the maximal profit-to-go for different levels of risk-to-go, have already been calculated and are shown in Fig. 9. Denote by θ¹_{t+1} and θ²_{t+1} the expected risk-to-go starting from those two states at period t+1. Then the expected risk-to-go at period t is given by pθ¹_{t+1} + (1−p)θ²_{t+1} and must not exceed the upper bound θ_t, i.e.,

    pθ¹_{t+1} + (1−p)θ²_{t+1} ≤ θ_t.  (31)
Having chosen design decision u_t and expected future risk-to-go (θ¹_{t+1}, θ²_{t+1}), we can calculate the expected profit-to-go from s_t as the sum of the expected net profit in period t and the expected profit-to-go at period t+1, given the choice of expected risk-to-go at period t+1, i.e.,

    J_t(s_t, θ_t | u_t, θ_{t+1}) = ν_t + λ[p J_{t+1}(s¹_{t+1}, θ¹_{t+1}) + (1−p) J_{t+1}(s²_{t+1}, θ²_{t+1})].  (32)
For example, assume that the probability that β will appear is p = 0.5, that the bound on expected risk-to-go at period t is θ_t = 3, and that the discount factor is λ = 0.9. Two candidate pairs of expected risk-to-go, (θ¹_{t+1}, θ²_{t+1})_I = (2, 4) and (θ¹_{t+1}, θ²_{t+1})_II = (4, 2), satisfy the risk constraint (31). The expected profit-to-go at period t, given that pair I is chosen, can be calculated as 1 + 0.9(0.5×10 + 0.5×16) = 12.7. Similarly, the expected profit-to-go when choosing pair II is 11.8, which is inferior to pair I, given that the expected risk-to-go of both choices is equal. The choice of pair I is thus preferred to that of pair II. We continue searching until we find a pair of expected risk-to-go values such that the expected profit-to-go in (32) is maximized while the risk constraint (31) is satisfied. We thereby find the maximal profit-to-go at period t when the expected risk-to-go at period t is θ_t, given that design decision u_t is selected. We repeat the above procedure for different values of the design decision u_t, searching for the design that results in the maximal profit-to-go while satisfying the risk constraint, which leads to the optimal design selection u_t* for the given state s_t and risk-to-go θ_t. Thus we find the Pareto optimal solution point corresponding to the given risk-to-go θ_t. As we successively tighten the constraint on risk-to-go θ_t, we eventually find all points on the Pareto optimal frontier and the corresponding solutions. The computational procedure described above leads to a recurrence relation for calculating the Pareto optimal frontier and obtaining the Pareto optimal policy
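The arithmetic of this example can be reproduced directly; the next-period frontier values J1 and J2 below are assumptions chosen to be consistent with the two pair evaluations quoted in the text, not data taken from Fig. 9.

```python
# Worked-example data: nu_t = 1, lam = 0.9, p = 0.5, risk bound theta_t = 3.
lam, p, nu_t, theta_t = 0.9, 0.5, 1.0, 3.0
J1 = {2: 10.0, 4: 12.0}   # assumed profit-to-go at s1_{t+1} for bounds 2, 4
J2 = {2: 12.0, 4: 16.0}   # assumed profit-to-go at s2_{t+1}

def profit_to_go(th1, th2):
    assert p * th1 + (1 - p) * th2 <= theta_t              # constraint (31)
    return nu_t + lam * (p * J1[th1] + (1 - p) * J2[th2])  # Eq. (32)

pair_I = profit_to_go(2, 4)    # 1 + 0.9 * 13 = 12.7
pair_II = profit_to_go(4, 2)   # 1 + 0.9 * 12 = 11.8
```

Both pairs exhaust the risk budget θ_t = 3 exactly, so the comparison isolates how the allocation of risk between the two successor states changes the expected profit-to-go.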
Fig. 9. An example of the procedure to find the Pareto optimal frontier.
backward in time. It is summarized by the following two-step algorithm: 1) For each θ_t, select θ_{t+1}: For a given state s_t and risk-to-go θ_t, having selected design decision u_t, we choose θ_{t+1} for each possible next state s_{t+1} such that the expected future profit-to-go is optimized subject to the constraint on expected risk-to-go, i.e.,

    Q_t(s_t, u_t, θ_t) = max_{θ_{t+1}} { E_{ω_t}[J_{t+1}(s_{t+1}, θ_{t+1}) | s_t, u_t]  |  E_{ω_t}[θ_{t+1} | s_t, u_t] ≤ θ_t,  s_{t+1} = f_t(s_t, u_t, ω_t) }.  (33)

In general, for given s_t, u_t and θ_t, the optimal future risk-to-go θ_{t+1} takes a different value for each next state s_{t+1}. 2) Select the optimal u_t: The Pareto optimal frontier point corresponding to θ_t at state s_t, J_t(s_t, θ_t), is obtained by selecting the design decision u_t ∈ U_t(s_t) that solves the following optimization:

    J_t(s_t, θ_t) = max_{u_t ∈ U_t(s_t)} { E_{ω_t}[φ_t(s_t, u_t, ω_t)] − c_t(s_t, u_t) + λ Q_t(s_t, u_t, θ_t) }.  (34)
Therefore, the optimal design for given s_t and θ_t is u_t* = μ_t*(s_t, θ_t), where μ_t* is defined as the Pareto optimal policy. In summary, a multi-objective stochastic dynamic programming algorithm can be developed and employed to decompose the multi-objective Markov decision problem, with backward separability and monotonicity, into a sequence of single-objective, single-period subproblems (34). As we solve those subproblems recursively by proceeding backward in time, from the last period N+1 to the first period 1, we calculate and propagate the Pareto optimal frontier J_t(s_t, θ_t) recursively backward in time and obtain the Pareto optimal policies μ_t* at the same time. When we reach the beginning of the first period, we have solved the multi-objective Markov decision problem, with a Pareto optimal frontier for the whole horizon as well as the Pareto optimal design strategies.
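The two-step recursion can be sketched on a hypothetical single period with two equally likely successor states: step 1 enumerates combinations of next-period frontier points for each candidate design, and step 2 prunes dominated (risk, profit) pairs to form the period-t frontier. All values are made up, and no discounting is applied (λ = 1) so that the arithmetic stays exact.

```python
lam, p = 1.0, 0.5   # no discounting; two equally likely successor states

def prune(points):
    """Keep the best (highest-profit) point at each risk level, dropping any
    point that another point beats on both risk and profit."""
    pts = sorted(set(points), key=lambda q: (q[0], -q[1]))
    frontier, best = [], float("-inf")
    for risk, profit in pts:
        if profit > best:
            frontier.append((risk, profit))
            best = profit
    return frontier

F1 = [(2.0, 10.0), (4.0, 12.0)]        # frontier at successor state s1
F2 = [(2.0, 12.0), (4.0, 16.0)]        # frontier at successor state s2
designs = {"expand": 1.0, "hold": 0.5}  # immediate expected net profit nu_t

candidates = []
for u, nu in designs.items():           # step 1: combine frontier points
    for r1, j1 in F1:
        for r2, j2 in F2:
            risk = p * r1 + (1 - p) * r2                 # expected risk-to-go
            profit = nu + lam * (p * j1 + (1 - p) * j2)  # cf. Eqs. (33)-(34)
            candidates.append((risk, profit))
F_t = prune(candidates)                 # step 2: period-t Pareto frontier
# F_t == [(2.0, 12.0), (3.0, 14.0), (4.0, 15.0)]
```

Propagating `F_t` one period further back, in place of F1 and F2, is exactly the recursive frontier propagation described above.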
5.3. Computational considerations A rigorous solution procedure can be developed based on the decomposition of the original problem into a sequence of single-period subproblems using multi-objective dynamic programming techniques and the solution of the stochastic programming problems at each period. To determine the Pareto optimal frontier J_t(s_t, θ_t) and the Pareto optimal policy μ_t*, the optimality equations (34) have to be solved for each s_t and θ_t, recursively backward in time, starting at period N+1 and ending at period 1. In general, the solution to this type of problem requires the following major computational tasks to be performed: 1) Estimation of the Pareto optimal frontier J_t(s_t, θ_t): The maximal expected profit for each possible state s_t and bound θ_t has to be calculated and stored, which is practical only if the number of states is small. In problems with a large-scale state space, the computational requirements become overwhelming and prohibitive. For example, in the case where only one capacity level is available, the number of states associated with the plant configuration is 3^2 = 9, because the number of pieces of equipment using either technology could be 0, 1 or 2. When five capacity levels are available for both types of equipment, the number of states becomes 3^(2×5) = 9^5 = 59049; it grows exponentially with the number of capacity levels. Other factors, such as the augmentation of the state space for non-separable objectives and the bounds on objectives, may also cause an explosive increase in the number of states. This phenomenon, called the ''curse of dimensionality'' by Bellman (1957), renders rigorous algorithms practically inapplicable to many realistic problems. 2) Solution of the two-stage stochastic program with recourse in (34): A stochastic program with recourse is usually reformulated into its deterministic equivalent so that it can be solved by conventional mathematical programming techniques. Two challenges are encountered in the solution process.
Firstly, in the case when the underlying probability distributions are continuous, the calculation of
expected value requires multivariate integration. Either discretization techniques are used to approximate the continuous distributions by discrete distributions with a finite number of realizations, or a random sample of independent replications of the random variables is generated and the expected value is approximated by the sample average (Shapiro & Homem-de-Mello, 1998; Diwekar & Kalagnanam, 1997). Secondly, the deterministic equivalent usually constitutes a large-scale optimization problem, because a second-stage decision vector is introduced for each realization of the random variables; both the dimension of the decision space and the number of constraints therefore grow with the number of scenarios. In addition, function nonlinearity and integrality requirements in many practical situations complicate the solution process even further. Consequently, the rigorous solution procedure is practical only if the computational requirements discussed above are reasonably easy to satisfy. Efficient approximation methods need to be developed for solving large-scale realistic problems.
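The sample-average route mentioned above can be sketched as follows. The second-stage "recourse" function here is a hypothetical newsvendor-style profit with a closed-form optimum, standing in for a real second-stage program; all names and parameter values are illustrative.

```python
import random

def second_stage_profit(capacity, demand, price=1.0, penalty=0.3):
    """Hypothetical recourse rule with a closed-form optimum: sell what the
    installed capacity allows; pay a penalty on unmet demand."""
    sold = min(capacity, demand)
    return price * sold - penalty * max(demand - capacity, 0.0)

def saa_expected_profit(capacity, demand_sampler, n_samples=10_000, seed=0):
    """Approximate E[Q(capacity, demand)] by the average over i.i.d. draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += second_stage_profit(capacity, demand_sampler(rng))
    return total / n_samples

# Demand uniform on [0, 10]; for capacity 6 the exact expectation is
# 1.0 * 4.2 - 0.3 * 0.8 = 3.96, so the estimate should land nearby.
est = saa_expected_profit(6.0, lambda rng: rng.uniform(0.0, 10.0))
```

In a real two-stage stochastic program, second_stage_profit would itself be an optimization over the recourse decisions; the sample-average wrapper is unchanged.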
6. Results and discussion

Having developed a rigorous multi-objective dynamic programming algorithm, we can apply it to the example problem introduced in Section 3 and conduct experiments to investigate the solutions. We start with the same problem as in Section 3 as the benchmark problem, after which we change the probability distribution of the arrival time of b and assess the impact of this environmental change on the design decisions. Finally, we demonstrate the difference between a traditional single-objective optimization problem that maximizes expected profit exclusively and a two-objective optimization problem that maximizes profit while minimizing risk.

6.1. Solutions of the benchmark problem

Given the original settings of the example problem in Section 3, and with the target profit n̂ selected as 3, the multi-objective dynamic programming algorithm is applied to calculate the Pareto optimal frontiers for each state and propagate them recursively backward in time. Owing to the simple structure of the example problem (a simple linear model and a small number of alternatives), exact analytical solutions can be found with the rigorous algorithm. Fig. 10 shows the Pareto optimal frontier at the first period, starting from the initial state in which we have no reactor and the only catalyst available is a.
Fig. 10. Pareto optimal frontier of the benchmark problem.
The Pareto optimal frontier represents the maximal expected profit for every possible tolerance on the expected downside risk. For example, if the risk is allowed to be greater than or equal to 0.54, then the current design decision in the optimal design strategy, alternative (1), is to purchase two reactors with a, which yields the highest expected profit of 5.28. If the acceptable risk is lower, say, below 0.54 but above 0.52, then the optimal initial design choice changes to alternative (2), purchasing only one reactor with a, and the expected profit drops to 2.98. When the risk tolerance is smaller than 0.52, no feasible solution exists, which means 0.52 is the minimal risk involved in the problem. Design strategies (1) and (2) are the two Pareto optimal solutions to the problem. Design strategy (1) prescribes that two reactors be installed initially in the first period. In the second period, if b has become available, upgrade the two reactors to b; otherwise, keep the same configuration. Then stay in the same configuration until the fourth period, when all remaining equipment is sold for its salvage value. Design strategy (2) differs from (1) by having only one reactor at the beginning. Fig. 11(A) and (B) depict the two decision trees that represent the controlled Markov chains obtained when these two strategies are implemented. After the Pareto optimal set is generated, it is the decision-makers' responsibility to select the final solution according to their own preference structure. For example, the decision-makers may ask themselves, "How much are we willing to give up in terms of expected profit in order to reduce the risk exposure?" In this case, we expect that strategy (1) is likely to be preferred to (2), because for less than a 4% increase in the expected downside risk, we can improve the expected profit by 77%.
For many of these problems, the final selection becomes evident when the trade-off curve is available.
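The selection step described above amounts to reading the frontier at a given risk tolerance. A minimal sketch, using the profit and risk values quoted in the text for the benchmark frontier (the function and data layout are ours):

```python
def best_under_risk(frontier, risk_tolerance):
    """frontier: (expected_profit, expected_risk, label) triples.
    Return the highest-profit point whose risk is within tolerance, else None."""
    feasible = [p for p in frontier if p[1] <= risk_tolerance]
    return max(feasible, key=lambda p: p[0]) if feasible else None

# Benchmark frontier, using the values quoted in the text
frontier = [(5.28, 0.54, "strategy (1)"), (2.98, 0.52, "strategy (2)")]
choice_loose = best_under_risk(frontier, 0.60)  # strategy (1), profit 5.28
choice_tight = best_under_risk(frontier, 0.53)  # strategy (2), profit 2.98
infeasible = best_under_risk(frontier, 0.50)    # None; 0.52 is the minimal risk
```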
Fig. 11. Decision tree representations of design strategies (1) and (2).
6.2. Impacts of uncertainty in the environment

Now we consider a second case, in which the new catalyst b is expected to become available sooner than in the benchmark case. As shown in Fig. 12, the probability that b will be available in the first year changes from 0.1 to 0.3. As in the benchmark case, we solve this problem and obtain the maximal expected profit for all risk levels, shown as the Pareto optimal frontier in Fig. 13. As can be seen from the frontier, the expected profit is lower and the expected risk is higher than for the benchmark problem (Fig. 10) because of the more imminent appearance of the new catalyst. In addition, design strategy (3), which was dominated in the previous case, now becomes one of the Pareto optimal solutions. Design strategy (3) is almost the same as strategy (2), except that in the second year, if a is still the best catalyst, one more reactor should be purchased to expand capacity. Design strategy (3) is more aggressive than strategy (2) and more conservative than strategy (1); it therefore yields a modest expected profit (3.14) and a modest expected downside risk (0.96) compared with strategy (1) (higher profit, 3.84, and higher risk, 1.06) and strategy (2) (lower profit, 2.54, and lower risk, 0.76). The final selection among those Pareto optimal solutions is much more difficult than that in the
previous case, as all the solutions are similar. In this case, suppose the decision-makers cannot accept a risk level higher than 1.00; then design strategy (3) becomes the final selection, which suggests that only one reactor should be purchased at the beginning, even though one reactor is not enough to meet the expected product demand in the first year. This case study shows that factors in the environment, such as technology evolution, can have a considerable effect on one's design strategy as well as on the current design decision. We see clearly from these examples that different environmental conditions can significantly alter our design decisions and the process performance. The changes can be explained intuitively: as the arrival time of catalyst b moves earlier, reactors with a are more likely to become obsolete early, so it becomes better to sacrifice some demand opportunity and wait for the better catalyst b. It is therefore important to incorporate these environmental uncertainties when we make design and planning decisions.

6.3. Single objective vs. multiple objectives

Consider the second case, where catalyst b is expected to appear sooner than in the benchmark problem. If we formulate this problem as a single-objective optimization in which maximizing expected profit is our only
Fig. 15. The histogram of the total profit for strategy (1).
Fig. 12. Change of technology evolution.
Fig. 13. Pareto optimal frontier of the second case.
objective, this is equivalent to eliminating the constraint on risk. The solution to the single-objective problem is then the same as design strategy (1), with the same maximal expected profit of 3.84. After finding the optimal design strategy, we apply it to the Markov decision process, which leads to the controlled Markov chain depicted in Fig. 11(A). We "simulate" the controlled process by enumerating all possible values of the random variables, such as product demand and the availability of b, at each period. Each combination of those input variables constructs a "scenario path" that could physically occur in the future. Fig. 14 shows an example of a scenario path that represents one possible realization of the future: catalyst b appears in the second year, and the product demands for years 1, 2 and 3 are 2, 3 and 4, respectively. For each of those scenario paths, we can evaluate the actual present value of the total profit, which is one possible outcome of design strategy (1). After generating all
possible outcomes, we can draw the histogram of profit, which reflects the possible values of the actual profit and their relative likelihoods, as shown in Fig. 15. Let n_i denote the actual total profit in the i-th scenario path and p_i the corresponding probability. The profit histogram shows that the mean of the total profit is Σ_i p_i n_i = 3.84 and the expected downside risk is Σ_i p_i max(3 − n_i, 0) = 1.06, consistent with the results obtained from the optimization in the previous section. As the profit histogram shows, although the expected profit is around 4.0, the actual profits that could occur are scattered mostly on both sides of the expected value. There is only a very small chance that the actual profit will turn out to be close to the expected value; it is more likely to be much higher or much lower than the mean. Therefore, using the expected value as the only performance measure fails to capture the uncertainty involved in this problem. For example, the variability of the profits cannot be discovered by looking only at the expected value. Another intrinsic problem with expected profit is that it assumes decision-makers are risk neutral, which is misleading in this case because decision-makers are typically risk averse, i.e., they prefer the decision with lower risk given the same expected profit. For example, a profit of 4 with certainty is preferred to profits of 0 or 8 with a 50% chance each. Therefore, a problem formulation with expected profit as the exclusive objective can lead to an "optimal" solution that carries an unacceptable risk. In this particular case, as can be seen from Fig. 15, with probability 0.50 we could fail to meet the target profit, and with probability 0.06 we could end up with a negative profit. We now formulate this same problem as an optimization with two objectives: maximizing expected profit and minimizing expected downside risk.
We add a constraint on the expected downside risk, say risk cannot exceed
Fig. 14. An example of one scenario path.
Fig. 16. The histogram of the total profit for strategy (2).
0.8, and solve the "constrained" optimization problem. The optimal solution can be read directly off the Pareto optimal frontier in Fig. 13: the maximal expected profit is 2.54 at a risk level of 0.8, and the corresponding design strategy is (2). Again, we simulate the process under strategy (2) and obtain the histogram of profit shown in Fig. 16. By appending the risk constraint, we take our attitude toward risk into consideration. The risk constraint cuts off the optimal solution obtained in the single-objective problem because of the high risk it carries. The profits generated by design strategy (2) are distributed closely around the mean value of 2.54. There is only a 29% chance that the total profit will fall below the target, and there is no chance that we will incur a loss. Compared with design strategy (1), strategy (2) yields a lower expected profit but also a lower expected downside risk. If we are strongly risk averse, in other words, if we are willing to give up expected profit to reduce risk exposure, we will likely choose strategy (2) instead of strategy (1); the two strategies lead to significantly different outcomes.
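The profit statistics used throughout this section, the mean Σ_i p_i n_i and the expected downside risk Σ_i p_i max(n̂ − n_i, 0), are straightforward to compute once the scenario outcomes have been enumerated. The scenario set below is illustrative, not the paper's:

```python
def expected_profit(scenarios):
    """scenarios: (probability, total_profit) pairs whose probabilities sum to 1."""
    return sum(p * v for p, v in scenarios)

def expected_downside_risk(scenarios, target):
    """Expected shortfall below the target profit: sum_i p_i * max(target - v_i, 0)."""
    return sum(p * max(target - v, 0.0) for p, v in scenarios)

# Illustrative scenario outcomes for some design strategy (not the paper's data):
# a mean near 4 can hide a wide spread, including an outright loss.
scenarios = [(0.25, -1.0), (0.25, 2.0), (0.25, 6.0), (0.25, 9.0)]
mean = expected_profit(scenarios)               # 4.0
risk = expected_downside_risk(scenarios, 3.0)   # 0.25*4 + 0.25*1 = 1.25
```

Note how the downside-risk measure penalizes only the shortfall scenarios, which is what makes it a useful complement to the mean for a risk-averse decision-maker.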
7. Conclusions

In this work, we study process design and planning in the presence of uncertainties from a life-cycle perspective. We formulate design and planning under uncertainty as a multi-objective Markov decision process with a two-level decision at each period. We strongly believe that design problems are always multi-objective, i.e., they always involve trade-offs, and we have demonstrated with a simple problem that one will obtain unacceptable answers if one does not take this issue into account. The current design and future planning are integrated into a sequential decision process in which design decisions are made at the upper level periodically throughout the process life cycle. At each period, production decisions are made at the lower level, upon the actual realization of some random variables, e.g., product demand, and are constrained by earlier design decisions. The decision-makers thus seek a
design strategy, defined as a sequence of policies that prescribe the design decisions given the state information observed at each period, such that various objectives can be satisfied. The objectives considered in this paper include, but are not limited to, expected profit, expected downside risk, and process lifetime. Our formulation explicitly incorporates into the decision process such uncertainties as technology evolution, e.g., a sequence of technology breakthroughs whose timings and magnitudes are uncertain, and market conditions such as product demand and price. The formulation results in a multi-objective Markov decision problem with recourse. We develop a rigorous solution strategy for the multi-objective Markov decision problem by finding the Pareto optimal design strategy. The original problem is first recast so that the backward separability and monotonicity conditions are satisfied. A multi-objective stochastic dynamic programming approach is then developed to decompose the original problem into a sequence of single-period subproblems. These subproblems are solved recursively backward in time to calculate and propagate the Pareto optimal frontier of the vector-valued objectives and to find the Pareto optimal policies for each period, which together define the Pareto optimal design strategy. The optimization at each period is a two-stage stochastic program with recourse that can be reformulated into a deterministic equivalent problem and solved by conventional optimization techniques. The computational tasks required by the rigorous solution procedure are so intensive that a rigorous solution strategy is practical only if the dimension of the state space and the number of realizations of the random variables are relatively small. In order to solve realistic problems with large dimensions, further effort needs to be devoted to solving large-scale Markov decision problems and stochastic programming problems.
Recent research developments in these two fields, such as those discussed below, have narrowed the gap between theoretical results and practical applications and provide the foundation for further investigation into solving realistic problems. 1) The recent emergence of neuro-dynamic programming (NDP) offers a possibility of averting the "curse of dimensionality". Neuro-dynamic programming focuses primarily on suboptimal methods that center on the evaluation and approximation of the optimal "cost-to-go" function, possibly through the use of neural networks and/or simulation (Bertsekas & Tsitsiklis, 1996). Tsitsiklis and Van Roy (1996) develop a methodological framework and present several ways in which dynamic programming and compact representations can be combined to solve large-scale stochastic
control problems that are intractable with classical dynamic programming. They design an approximation architecture involving two stages: feature extraction and function approximation. Over the past few years, NDP has generated several notable successes, the most spectacular being the development of a world-class computer backgammon player by Tesauro (1992). Recently, several Ph.D. theses have been devoted to solving large-scale Markov decision problems (Gordon, 1999; Marbach, 1998). 2) Two-stage stochastic programming problems are well known to be challenging from both theoretical and computational points of view, and many research efforts have been devoted to their solution, most focusing on two-stage stochastic linear programming. Recently, much attention has been directed to stochastic (mixed-) integer programming, where integrality constraints are imposed on (some of) the first-stage and/or second-stage decision variables. A survey of results in stochastic integer programming is given by Schultz, Stougie and van der Vlerk (1996). Laporte and Louveaux (1993) develop a branch-and-cut algorithm, the integer L-shaped method, for problems with binary first-stage variables and arbitrary second-stage variables. Caroe and Tind (1998) propose a general framework for L-shaped decomposition of stochastic programs with integer recourse, based on generalized Benders decomposition and general duality theory. Caroe and Schultz (1999) present an algorithm for stochastic integer programs with recourse based on a dual decomposition scheme and Lagrangian relaxation.
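The feature-extraction-plus-function-approximation idea in item 1) can be sketched minimally as follows. The linear architecture, the feature map, and the target function are illustrative stand-ins, not the cited authors' methods; in NDP the target would be a simulated or Bellman-backup estimate of the cost-to-go rather than a known function.

```python
def features(s):
    """Hypothetical feature extraction: map a raw state to a small vector."""
    return [1.0, s, s * s]

def fitted_sweeps(states, target_value, weights, lr=0.1, sweeps=5000):
    """Fit a compact linear value approximation: nudge the weights so that
    features(s) . weights tracks target_value(s) over repeated sweeps,
    instead of storing one table entry per state."""
    for _ in range(sweeps):
        for s in states:
            phi = features(s)
            err = target_value(s) - sum(w * f for w, f in zip(weights, phi))
            for i, f in enumerate(phi):
                weights[i] += lr * err * f   # LMS / stochastic-gradient step
    return weights

# Fit the compact representation to a known "cost-to-go" on a few states.
states = [0.0, 0.25, 0.5, 0.75, 1.0]
w = fitted_sweeps(states, lambda s: 1.0 + 2.0 * s, [0.0, 0.0, 0.0])
```

With three weights standing in for a table, the approximation reproduces the target on the sampled states; in a large problem the same weights would generalize to states never visited, which is the point of a compact representation.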
In summary, the contributions of this work are to formulate "design and planning under uncertainty" appropriately as a multi-objective sequential decision problem, or multi-objective Markov decision process, which integrates decisions at different hierarchical levels, made sequentially throughout the process life cycle, and incorporates a variety of uncertainties in the operating environment, such as market and technology. We develop a rigorous multi-objective stochastic dynamic programming algorithm to decompose the multi-objective Markov decision problem and obtain the Pareto optimal design strategy. The intensive computational requirements of the rigorous algorithm, which arise from the "curse of dimensionality", plague the solution of realistic problems and necessitate efficient approximation methods. In conclusion, we provide directions for further investigation of solution strategies based on recent research developments in the solution of
large-scale Markov decision problems and stochastic programs with recourse.
Acknowledgements

The authors are grateful to the Department of Energy for support of this research under grant DE-FG02-98ER14875.
Appendix A: Proof of Lemma 1

Proof 1. By definition, the performance index for the expected net present value, I_1, satisfies the separability condition, as it can be written in the following stage-wise additive form:

$$
I_1 = E\Big[\sum_{t=1}^{N+1} \lambda^{t-1} v_t\Big]
    = E_{v_1}\Big[v_1 + \lambda\, E_{v_2}\Big[v_2 + \lambda\, E_{v_3}\Big[\cdots + \lambda\, E_{v_N}\big[v_N + \lambda v_{N+1} \,\big|\, s_N, u_N\big] \cdots \,\Big|\, s_3, u_3\Big] \,\Big|\, s_2, u_2\Big] \,\Big|\, s_1, u_1\Big],
\tag{A.1}
$$

where $E_{v_t}[\,\cdot \mid s_t, u_t]$ is the conditional expectation with respect to $v_t$, given the system state $s_t$ and design selection $u_t$, i.e.,

$$
E_{v_t}[\,\cdot \mid s_t, u_t] = \int_{v_t} (\cdot)\, p(v_t \mid s_t, u_t)\, dv_t.
\tag{A.2}
$$

Therefore, the profit-to-go at period t is equal to the sum of the expected profit in period t and the expected profit-to-go at period t+1 discounted by $\lambda$:

$$
f_t^1 = E_{v_t}\Big[v_t + \lambda\, E_{v_{t+1},\ldots,v_N}\Big[\sum_{\tau=t+1}^{N+1} \lambda^{\tau-t-1} v_\tau\Big] \,\Big|\, s_t, u_t\Big]
      = E_{v_t}\big[v_t + \lambda f_{t+1}^1 \,\big|\, s_t, u_t\big].
\tag{A.3}
$$

Since $\lambda > 0$, it can be seen from Eq. (A.3) that the profit-to-go at period t, $f_t^1$, increases monotonically as the profit-to-go at period t+1, $f_{t+1}^1$, increases; i.e., the profit index $I_1$ is backward monotonic.
Appendix B: Proof of Lemma 2

Proof 2. By augmenting the new state $w_t$, the downside risk criterion can be written as

$$
I_2 = E_{v_1}\Big[E_{v_2}\Big[E_{v_3}\Big[\cdots E_{v_N}\big[\max(\hat{n} - w_{N+1} - \lambda^{N} v_{N+1},\, 0) \,\big|\, \tilde{s}_N, \tilde{u}_N\big] \cdots \,\Big|\, \tilde{s}_3, \tilde{u}_3\Big] \,\Big|\, \tilde{s}_2, \tilde{u}_2\Big] \,\Big|\, \tilde{s}_1, \tilde{u}_1\Big].
\tag{B.1}
$$

In other words, the "risk-to-go" at period t, defined as the expected downside risk for the remaining periods starting from period t, can be expressed as the expected value of the risk-to-go at period t+1:

$$
f_t^2 = E_{v_t}\Big[E_{v_{t+1},\ldots,v_N}\Big[\max\Big(\hat{n} - w_t - \sum_{\tau=t}^{N+1} \lambda^{\tau-1} v_\tau,\; 0\Big)\Big] \,\Big|\, \tilde{s}_t, \tilde{u}_t\Big]
      = E_{v_t}\big[f_{t+1}^2 \,\big|\, \tilde{s}_t, \tilde{u}_t\big].
\tag{B.2}
$$

Therefore, the index for downside risk is backward separable and monotonic by definition.
References

Abo-Sinna, M. A., & Hussein, M. L. (1995). An algorithm for generating efficient solutions of multiobjective dynamic programming problems. European Journal of Operational Research, 80, 156–165.
Balcer, Y., & Lippman, S. A. (1984). Technological expectation and adoption of improved technology. Journal of Economic Theory, 34, 292–318.
Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
Bertsekas, D. P. (1995). Dynamic programming and optimal control, vols. I and II. Athena Scientific.
Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Athena Scientific.
Bhatia, T. K., & Biegler, L. T. (1999). Multiperiod design and planning with interior point methods. Computers and Chemical Engineering, 23(7), 98–1354.
Birge, J. R., & Louveaux, F. (1997). Introduction to stochastic programming. Berlin: Springer.
Bollen, N. P. B. (1999). Real options and product life cycles. Management Science, 45(5).
Caroe, C. C. (1998). Decomposition in stochastic integer programming. Ph.D. Thesis, Department of Operations Research, University of Copenhagen.
Caroe, C. C., & Schultz, R. (1999). Dual decomposition in stochastic integer programming. Operations Research Letters, 24, 37–45.
Caroe, C. C., & Tind, J. (1998). L-shaped decomposition of two-stage stochastic programs with integer recourse. Mathematical Programming, 83, 451–464.
Diwekar, U. M., & Kalagnanam, J. R. (1997). Efficient sampling technique for optimization under uncertainty. AIChE Journal, 43(2), 440–447.
Dixit, A. K., & Pindyck, R. S. (1994). Investment under uncertainty. Princeton, NJ: Princeton University Press.
Eberly, J. C., & Van Mieghem, J. A. (1997). Multi-factor dynamic investment under uncertainty. Journal of Economic Theory, 75, 345–387.
Eppen, G. D., Martin, R. K., & Schrage, L. (1989). A scenario approach to capacity planning. Operations Research, 37(4), 517–527.
Epperly, T. G. W., Ierapetritou, M. G., & Pistikopoulos, E. N. (1997). On the global and efficient solution of stochastic batch plant design problems. Computers and Chemical Engineering, 21(12), 1411–1431.
Gordon, G. J. (1999). Approximate solutions to Markov decision processes. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University.
Harrison, J. M., & Van Mieghem, J. A. (1999). Multi-resource investment strategies: operational hedging under demand uncertainty. European Journal of Operational Research, 113, 17–29.
Ierapetritou, M. G., & Pistikopoulos, E. N. (1995). Design of multiproduct batch plants with uncertain demands. Computers and Chemical Engineering, 19(Suppl.), S627–S632.
Ierapetritou, M. G., Acevedo, J., & Pistikopoulos, E. N. (1994). Optimization approach for process engineering problems under uncertainty. Computers and Chemical Engineering (Proceedings of the Fifth International Symposium on Process Systems Engineering), 20(6–7), 703–709.
Kall, P., & Wallace, S. W. (1994). Stochastic programming. New York: Wiley.
Laporte, G., & Louveaux, F. V. (1993). The integer L-shaped method for stochastic integer programs with complete recourse. Operations Research Letters, 13, 133–142.
Li, D. (1990). Multiple objectives and non-separability in stochastic dynamic programming. International Journal of Systems Science, 21(5), 933–950.
Li, D., & Haimes, Y. Y. (1987). The envelope approach for multiobjective optimization problems. IEEE Transactions on Systems, Man and Cybernetics, SMC-17(6), 1026–1038.
Li, D., & Haimes, Y. Y. (1989). Multiobjective dynamic programming: the state of the art. Control Theory and Advanced Technology, 5(4), 471–483.
Marbach, P. (1998). Simulation-based methods for Markov decision processes. Ph.D. Thesis, Department of Electrical Engineering and Computer Science, MIT, Cambridge.
Markowitz, H. (1959). Portfolio selection. New Haven, CT: Yale University Press.
Miettinen, K. (1999). Nonlinear multiobjective optimization. Dordrecht: Kluwer Academic Publishers.
Nair, S. K. (1995). Modeling strategic investment decisions under sequential technological change. Management Science, 41, 282–287.
Nair, S. K. (1997). Identifying technology horizons for strategic investment decisions. IEEE Transactions on Engineering Management, 44(3), 227–236.
Nair, S. K., & Hopp, W. J. (1992). A model for equipment replacement due to technological obsolescence. European Journal of Operational Research, 63, 207–221.
Petkov, S. B., & Maranas, C. D. (1998). Design of single-product campaign batch plants under demand uncertainty. AIChE Journal, 44(4), 896–910.
Pistikopoulos, E. N., & Ierapetritou, M. G. (1995). Novel approach for optimal process design under uncertainty. Computers and Chemical Engineering, 19(10), 1089–1110.
Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.
Rajagopalan, S. (1998). Capacity expansion and equipment replacement: a unified approach. Operations Research, 46(6), 846–857.
Rajagopalan, S., Singh, M. R., & Morton, T. E. (1998). Capacity expansion and replacement in growing markets with uncertain technological breakthroughs. Management Science, 44(1), 12–30.
Ringuest, J. L. (1992). Multiobjective optimization: behavioral and computational considerations. Dordrecht: Kluwer Academic Publishers.
Schultz, R., Stougie, L., & van der Vlerk, M. H. (1996). Two-stage stochastic integer programming: a survey. Statistica Neerlandica (Journal of the Netherlands Society for Statistics and Operations Research), 50(3), 401–416.
Shapiro, A., & Homem-de-Mello, T. (1998). A simulation-based approach to two-stage stochastic programming with recourse. Mathematical Programming, 81, 301–325.
Steuer, R. E., Gardiner, L. R., & Gray, J. (1996). A bibliographic survey of the activities and international nature of multiple criteria decision making. Journal of Multi-Criteria Decision Analysis, 5(3), 195–217.
Subrahmanyam, S., Pekny, J. F., & Reklaitis, G. V. (1994). Design of batch chemical plants under market uncertainty. Industrial and Engineering Chemistry Research, 33(11), 2688–2701.
Tesauro, G. J. (1992). Practical issues in temporal-difference learning. Machine Learning, 8, 257–277.
Tsitsiklis, J. N., & Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59–94.