Capturing incomplete information in resource allocation problems through numerical patterns


European Journal of Operational Research 197 (2009) 50–58


Discrete Optimization

Arun Marar *, Warren B. Powell

Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, United States

Article history: Received 3 April 2007; Accepted 4 June 2008; Available online 13 June 2008

Keywords: Combinatorial optimization; Decision support systems; Knowledge-based systems; Large scale optimization; Logistics

Abstract

We look at the problem of optimizing complex operations with incomplete information, where the missing information is revealed indirectly and imperfectly through historical decisions. Incomplete information is characterized by missing data elements governing operational behavior and unknown cost parameters. We assume some of this information may be indirectly captured in historical databases through flows characterizing resource movements. We can use these flows, or other quantities derived from them, as "numerical patterns" in our optimization model to reflect some of the incomplete information. We develop our methodology for representing information in resource allocation models using the concept of pattern regression. We use a popular goodness-of-fit measure known as the Cramér–von Mises metric as the foundation of our approach. We then use a hybrid approach of solving a cost model with a term known as the "pattern metric" that minimizes the deviations of model decisions from observed quantities in a historical database. We present a novel iterative method to solve this problem. Results with real-world data from a large freight railroad are presented.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

A challenge in modeling a real-world system such as a complex freight logistics operation is optimizing the system under incomplete information. Normally we solve a cost function intended to reflect actual behavior. For complex problems such a methodology is inadequate because some elements of data are not observable to the modeler. In some cases, however, this incomplete information is revealed, although imperfectly, through past decisions made by humans, since in actual operations humans typically make decisions with more information than is available to the modeler. As most freight logistics operations are modeled as resource allocation problems, actual decisions are represented in a historical database as flows characterizing resource movements. Based on a desired observation statistic, these flows may be aggregated or used to derive quantities that capture some of the missing information. Such quantities are known as "numerical patterns", the name implying there is a natural ordering among them. The following examples illustrate real-world situations where the modeler does not have complete information:

• In a trucking operation the average length of haul for sleepers is high, but that does not mean sleepers are never assigned to short loads. We can observe from history the actual distribution of the length of the load assigned to a particular sleeper. In this case, an observation statistic is a particular sleeper and the numerical pattern is the length of its load.

• In a locomotive scheduling model every train requires a certain amount of locomotive power based on the tonnage of the train and the terrain over which the track is laid. There may be a steep grade on the train route that warrants more locomotive power than would be calculated from tonnage considerations alone. The modeler does not have this information in his database, but can observe in a historical database a pattern of over-powered trains (labeled so relative to the power calculated by the modeler) over certain segments of the train route through a statistic known as the "horsepower per trailing ton": the ratio of the amount of locomotive power (expressed in horsepower) to the tonnage of the train. Here, the observation statistic is a train and the numerical pattern is the "horsepower per trailing ton" assigned to the train in history.

* Corresponding author: A. Marar. Tel.: +1 609 575 9641. E-mail: [email protected]. doi:10.1016/j.ejor.2008.06.002

A method that takes missing elements of data into account is to solve the optimization model under uncertainty using stochastic programming techniques (Birge, 1997). These techniques are useful when we know what the missing elements of data are and can observe them after the fact. In our problem we are dealing with data elements that are not directly observable, even after the fact. There is past research in the realm of inverse optimization (Ahuja and Orlin, 2001) that uses observable decisions from a real-world system in an optimization model. Inverse optimization algorithms aim to perturb the cost parameters of a model, while minimizing this perturbation, such that the observed decisions are optimal with respect to the perturbed cost parameters. Inverse optimization techniques cannot be used to solve our problem, since these methods assume the observed solutions from history to be optimal, whereas in applications such as ours decisions may be suboptimal even when made with complete information, owing to the inability of humans to process large amounts of data. To capture incomplete information in an optimization model we look to pattern recognition systems to represent observed decisions from a database in a model, since a pattern is simply a set of data elements together with the decision pertaining to that set.

The focus of this paper is to present a modeling framework that combines the use of cost functions (a bottom-up representation of operations that can be quantified as costs) with a representation of historical decisions (a top-down representation of operations that is captured through patterns). The main contribution of this paper is that we introduce and study, for the first time, the integration of an engineering cost model with historical patterns that represent quantities or flows as a way of capturing incomplete information.

The paper is organized as follows: in Section 2, we present the principles of pattern recognition and the basis of our approach. In Section 3, we introduce the empirical distribution function (EDF) and its relation to a popular goodness-of-fit metric for continuous distributions. Using this we present the model formulation for representing information using a function known as the "pattern metric" that penalizes deviations of model decisions from observed quantities in a historical database. In Section 4, we present our hybrid approach of solving a cost function along with a pattern metric. In Section 5, we present the experimental design and the measures of model performance that can be used to evaluate our methodology, and report results using real-world data from a large freight railroad. In Section 6, we present our conclusions.
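Before developing the methodology, the horsepower-per-trailing-ton statistic from the locomotive example above can be illustrated with a short sketch. This is our own illustration, not code from the paper; the record fields and numbers are hypothetical.

```python
# Illustrative sketch (not from the paper): computing the "horsepower per
# trailing ton" (HPT) numerical pattern for each historical observation of
# a train. Field names and values are hypothetical.

def hpt(total_horsepower, trailing_tons):
    """Horsepower per trailing ton: assigned locomotive power / train tonnage."""
    return total_horsepower / trailing_tons

# Each record: (train id, total horsepower of assigned locomotives, tonnage).
history = [
    ("LA-KC-01", 12000, 6000),   # 2.0 HPT: a typical assignment
    ("LA-KC-01", 16000, 4000),   # 4.0 HPT: over-powered relative to tonnage
]

patterns = [hpt(hp, tons) for _, hp, tons in history]
print(patterns)  # [2.0, 4.0]
```

Here the observation statistic is the train "LA-KC-01" and the list `patterns` is the sample of numerical patterns observed for it in history.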

2. Pattern regression for resource allocation models

The historical database for a typical logistics operation consists of a sequence of decisions or actions in time and the elements of data governing these decisions. It is natural to look to pattern recognition systems to mimic human behavior, since they aim to identify a decision with a particular set of data elements. Pattern recognition is a mature area, and an extensive treatment of these techniques is found in Devroye et al. (1996). When the pattern is a number such as a flow or a quantity (for instance, the horsepower per trailing ton), the term "pattern regression" is used instead of pattern recognition.

The traditional method of optimization using cost functions has the ability to handle vectors of large dimensionality, since optimization models are useful in capturing the global effects of a decision. However, we are limited in our ability to observe data elements from a historical database at the same level of detail. For this reason we cannot adopt a pure pattern recognition approach to solve our problem, but need to retain the cost function approach and complement it with pattern recognition techniques.

Consider the case of observing the locomotive power assigned to a particular train from history, as depicted in Fig. 1a. Such a distribution is very common in history, since we work with simple states that do not convey all the information underlying the decision. For instance, in Fig. 1a we may not have all the information regarding why a particular train was assigned a horsepower per trailing ton of 4.0 on a certain day instead of the more likely value of 2.0. It is possible that a typical optimization model might produce the distribution shown in Fig. 1b. This distribution not only fails to match the mean, it does not match any of the higher moments. We could try to force the optimization model to match the mean, but the result might be the distribution shown in Fig. 1c. The problem with just matching the mean is that we are trying to force every observation to match the mean. Since the actual distribution is much wider, clearly it is acceptable to deviate

[Fig. 1 appears here: three density functions of the horsepower per trailing ton, plotted over the range 1.0–4.0. (a) Density function from history. (b) Density function from the optimization model. (c) Density function from the optimization model incorporating the historical mean.]

Fig. 1. (a) and (b) Observed horsepower in history and model. (c) Using the historical mean as information in the optimization model.


from the mean, but we want the overall average to match the historical average. While regression in its strictest sense identifies a number with a state, the missing information defining the state requires us to consider matching an entire distribution of a flow or quantity.

We apply the principles of pattern regression to the class of resource allocation problems. We use the superscript "h" whenever we are working with a historical database or data pertaining to past events in general. Thus, we have:

a: vector of attributes for a resource;
A^h: attribute space of a in history;
D_a: set of decisions d that can be applied to a resource with attribute vector a;
x^h_{ad}: number of times decision d is applied to a resource with attribute vector a, where a ∈ A^h and d ∈ D_a.

For instance, in a locomotive scheduling model a locomotive state could be

a = {Location, Time, Locomotive ID, Locomotive Type, Horsepower}.  (1)

We define a family of observation statistics denoted by the set S. An example of an observation statistic is a train departing Los Angeles for Kansas City. For each instance of the observation statistic there are, in general, multiple pairs (a, d) associated with this instance. If we are concerned with the total flow of locomotives, any locomotive assigned to the train from Los Angeles to Kansas City will pertain to that instance of the observation statistic. We also define:

N^h_s: number of observations of the observation statistic s ∈ S in a historical database;
L^h_{sk}: set of all pairs (a, d) in a historical database pertaining to the kth observation of the observation statistic s ∈ S, k ∈ {1, 2, ..., N^h_s}, d ∈ D_a, a ∈ A^h.

We assume that for each observation statistic s ∈ S and for each observation k of the statistic s we generate a quantity q̂^h_{sk} based on a desired objective. We use the notation "q̂" to indicate that sometimes the quantities may be derived from aggregations of state–decision pairs. For example, for a particular train leaving Los Angeles for Kansas City a derived quantity could be the total horsepower assigned to the train. In our problem the quantities that we are typically interested in are derived from observed flows of resources in a historical database:

q̂^h_{sk} = Σ_{(a,d) ∈ L^h_{sk}} q_{sad} x^h_{ad},  ∀ k ∈ {1, 2, ..., N^h_s}, ∀ s ∈ S,  (2)

where q_{sad} is the contribution of the state–action pair (a, d) to the quantity for the observation statistic s ∈ S. To illustrate, consider a locomotive state as shown in (1). A typical decision d can be characterized by a train:

d = {Origin, Destination, Train ID, Train Type, Tonnage}.

The decision variable x^h_{ad} takes the value 1 in Eq. (2) if we assign this locomotive to the train, and 0 otherwise. For each observation statistic s ∈ S, the cumulative probability function underlying the vector of quantities derived using Eq. (2) for all k ∈ {1, 2, ..., N^h_s} characterizes the pattern regression distribution, denoted ψ_s, for the observation statistic s ∈ S. The goal of this research is to incorporate the pattern regression distribution ψ_s in an optimization framework, since it reveals some of the missing information not captured in a cost function. In the next section, we show how to define a distance metric for ψ_s based on popular goodness-of-fit metrics in statistics.

3. The pattern metric

In this section, we develop a framework for representing information pertaining to the pattern regression distribution for a quantity as derived in (2). To develop this methodology we introduce the concept of an empirical distribution function (EDF) (Read and Cressie, 1988) and show how it relates to the decision variables in the optimization model. We define:

N_s: number of times the observation statistic s ∈ S is observed in the model;
q̂_{sk}(x): the quantity corresponding to the kth instance of the observation statistic s ∈ S, k ∈ {1, 2, ..., N_s}, in the optimization model, as a function of the model decision variable x.

It should be noted that the number of times the observation statistic s ∈ S is observed in the model is in general different from the number of times it is observed in history. For example, we may observe a train leaving Los Angeles for Kansas City 20 times in our historical database, but we may see the same train only five times in the model. We can generate the order statistics of the quantities evaluated over all the observations:

q̂_{s(1)}(x) ≤ q̂_{s(2)}(x) ≤ ... ≤ q̂_{s(N)}(x), where N = N_s.

The notation q̂_{s(k)}, which is standard in the statistics literature, denotes the kth smallest quantity observed among all the instances of the observation statistic s ∈ S in the optimization model. The empirical distribution function (EDF) for the order statistics q̂_{s(k)} in the optimization model is given by:

F_s(q̂_{s(k)}(x)) = k / N_s,  ∀ k ∈ {1, 2, ..., N_s}, ∀ s ∈ S.  (3)

One of the popular goodness-of-fit measures based on the EDF is the Cramér–von Mises test statistic, given in functional form by:

D^CV_s(x) = Σ_{k=1}^{N_s} ( ψ_s(q̂_{s(k)}(x)) − (2k − 1) / (2N_s) )²,  ∀ s ∈ S.  (4)

We are interested in a function that represents the pattern regression distribution ψ_s exactly in an optimization framework. Clearly, such a function is closely related to a goodness-of-fit statistic on the model decision variables that attempts to infer whether a sample of data (here, the quantities assigned in the model to all instances of the observation statistic s ∈ S) is derived from a historical distribution (here, the pattern regression distribution ψ_s). Alternatively, we can optimize a function similar to a goodness-of-fit statistic to capture information from a historical distribution. This function, which penalizes deviations of the corresponding quantities in the model from the observed historical distribution, is referred to as the pattern metric.

The Cramér–von Mises goodness-of-fit measure in (4) is a function of the decision variables in the optimization model. The discontinuous nature of the pattern regression distribution ψ_s in the decision variables poses problems if the exact form of the test statistic is used in an optimization model. Even if we approximate ψ_s by a continuous distribution and fix the order statistics a priori, we must deal with nonconvexity issues, since cumulative probability functions are in general nonconvex. We note that ψ_s is a cumulative probability function and hence is monotonically increasing in the order statistics of the decision variables, that is,

ψ_s(q̂_{s(1)}(x)) ≤ ψ_s(q̂_{s(2)}(x)) ≤ ... ≤ ψ_s(q̂_{s(N)}(x)), where N = N_s, ∀ s ∈ S.
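The EDF and the Cramér–von Mises statistic just introduced can be made concrete with a minimal sketch. This is our own illustration, not the authors' code: we approximate the pattern regression distribution ψ_s by the EDF of hypothetical historical horsepower-per-trailing-ton quantities, and all names and data are ours.

```python
# Minimal sketch (our illustration): evaluating a Cramer-von Mises-style
# statistic for one observation statistic s, with the pattern regression
# distribution psi_s approximated by the EDF of historical quantities.
from bisect import bisect_right

def edf(sample):
    """Return the empirical distribution function of a sample as a callable."""
    pts = sorted(sample)
    n = len(pts)
    return lambda q: bisect_right(pts, q) / n    # fraction of sample <= q

def cramer_von_mises(model_quantities, psi):
    """Sum of (psi(q_(k)) - (2k-1)/(2N))^2 over the ascending order statistics."""
    q_sorted = sorted(model_quantities)          # q_(1) <= ... <= q_(N)
    n = len(q_sorted)
    return sum((psi(q) - (2 * k - 1) / (2 * n)) ** 2
               for k, q in enumerate(q_sorted, start=1))

psi = edf([2.0, 2.0, 2.5, 3.0, 4.0])             # hypothetical historical HPT sample
d = cramer_von_mises([2.0, 2.5, 3.0], psi)       # model quantities for statistic s
```

A small value of `d` indicates the model's quantities for this observation statistic are plausibly drawn from the historical distribution; the `(2k - 1) / (2N)` plotting positions follow the functional form of the test statistic.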

Instead of using a function that measures the difference between the cumulative distribution function and the empirical distribution function, we propose to work with the following function:

D̃^CV_s(x) = Σ_{k=1}^{N_s} ( q̂_{s(k)}(x) − ψ_s^{−1}((2k − 1) / (2N_s)) )²,  ∀ s ∈ S,  (5)

where ψ_s^{−1} denotes the inverse cumulative function. Since the order statistics themselves have a monotone property, we can use a distance metric that measures deviations of the order statistics of the quantities in the model from the corresponding order statistics derived from a historical database, given by:

q̂^h_{s(k)} = ψ_s^{−1}((2k − 1) / (2N_s)),  ∀ k ∈ {1, 2, ..., N_s}, ∀ s ∈ S.  (6)

If we fix the order statistics a priori, the function indicated in Eq. (5) is convex. We assume that the modeler solves a functional approximation of the real-world objective by an engineering cost function C(x) that is convex in x. We present the optimization model with a pattern metric as shown below:

x*(θ) = arg min C(x) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{s(k)}(x) − q̂^h_{s(k)})²  (7)

subject to:

Ax = b, x ≥ 0,

where θ is a positive scaling factor that balances the tradeoff between the cost function and the pattern metric. We use x to denote the complete vector of decision variables {x_{ad}}, a ∈ A, d ∈ D_a. The concept behind the optimization in Eq. (7) is that it combines engineering costs with prior decisions, which capture information that is not available to the modeler. In the next section, we present our methodology to solve the problem in (7).

4. Optimizing the model with a pattern metric

In this section, we present our methodology to solve the model with a pattern metric indicated in Eq. (7). We introduce the notation defining the order statistics of the quantities and then present an algorithm to solve the optimization model with a pattern metric. We also present some theoretical properties of the algorithm.

4.1. The algorithm

The order statistics of the quantities cannot be fixed in the model in (7), since we do not know the values of the decision variables a priori. This implies that our problem of representing information pertaining to the numerical patterns is a combinatorial problem. For each observation statistic s ∈ S we define:

U_s: the set of permutations of the index vector {1, 2, ..., N_s};
φ_s: a generic permutation of the index vector {1, 2, ..., N_s}, φ_s ∈ U_s;
φ_{sk}: the kth element of the generic permutation φ_s, k ∈ {1, 2, ..., N_s}.

For example, if we observe a train from Los Angeles to Kansas City five times, the cardinality of the index set corresponding to this observation statistic is five, and a particular permutation of this index vector could be {2, 5, 4, 1, 3}. In this case the third element of this permutation is the fourth instance of the train observed in the optimization model. We denote φ = {φ_s}, s ∈ S, as a generic permutation vector over the set of observation statistics S, and define the set U = ×_{s∈S} U_s with element φ ∈ U. We then define the quantities under the permutation vector φ:

q̂_{sφ(k)}(x): the quantity corresponding to the φ_{sk}th instance of the observation statistic s ∈ S in the model, k ∈ {1, 2, ..., N_s}, φ ∈ U.

Solving the model as a combinatorial problem is extremely difficult for two reasons. The first is the size of the set U (for an observation statistic s ∈ S, |U_s| = (N_s)!), which can explode for large-scale resource allocation problems. The second is that each index φ ∈ U requires solving a constrained optimization problem, further complicated by the order statistics constraints. There is a technique suited for highly nonconvex problems introduced in Pardalos (1989). In this method, a number of "starting points" are evaluated separately using a modified version of the Frank–Wolfe method (Frank and Wolfe, 1956) that ensures convergence to a local optimum. By evaluating a number of starting points the algorithm attempts to arrive at a local optimum that is close to the global optimum.

We propose an iterative method that solves a sequence of simpler models that do not require the constraints defining the order statistics. At iteration m of this sequence we solve:

x^m(θ) = arg min C(x) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ(k)}(x) − q̂^h_{s(k)})²  (8)

subject to:

φ = φ^{m−1} ∈ U, Ax = b, x ≥ 0.

The permutation φ^{m−1} defines the order statistics of the vector of quantities

q̂^{m−1} = {q̂^{m−1}_{sk}}, s ∈ S, k ∈ {1, 2, ..., N_s},

which is derived from the solution vector x^{m−1}. We assume we have an initial feasible solution (an obvious choice is the solution to the model without the pattern metric: min C(x) subject to Ax = b, x ≥ 0). We show in the next section that by solving the sequence of models indicated in (8), we obtain convergence to a stationary point in a finite number of iterations.

4.2. Properties

The practical applicability of our algorithm is established in the following theorem, which states that even though the order statistics are not specified as constraints, the pattern metric evaluated over the new order statistics (since the ordering may change if the constraints are not specified) is guaranteed to be no greater than the pattern metric evaluated for the order statistics of the current solution. We prove the following theorem:

Theorem 4.1. Given a vector of strictly positive solutions {q̂_{sk}}, k ∈ {1, 2, ..., N_s}, we have:

min_{φ'∈U_s} Σ_{k=1}^{N_s} (q̂_{sφ'(k)} − q̂^h_{s(k)})² = Σ_{k=1}^{N_s} (q̂_{s(k)} − q̂^h_{s(k)})².

Proof. We first note that:

Σ_{k=1}^{N_s} (q̂_{sφ'(k)})² = Σ_{k=1}^{N_s} (q̂_{s(k)})²,  ∀ φ' ∈ U_s,

since the vectors {q̂_{s(k)}} and {q̂_{sφ'(k)}} are permutations of each other for all φ' ∈ U_s. Thus, we have:


Σ_{k=1}^{N_s} (q̂_{sφ'(k)} − q̂^h_{s(k)})² − Σ_{k=1}^{N_s} (q̂_{s(k)} − q̂^h_{s(k)})² = −2 Σ_{k=1}^{N_s} q̂^h_{s(k)} (q̂_{sφ'(k)} − q̂_{s(k)}),  ∀ φ' ∈ U_s.  (9)

Using Lemma 6.1, proven in the appendix, where we set n = N_s, x = q̂_s, y = q̂^h_s and U_n = U_s, we see that the right-hand side of Eq. (9) is nonnegative. Thus, it follows that:

Σ_{k=1}^{N_s} (q̂_{sφ'(k)} − q̂^h_{s(k)})² ≥ Σ_{k=1}^{N_s} (q̂_{s(k)} − q̂^h_{s(k)})²,  ∀ φ' ∈ U_s.

Since {q̂_{s(k)}} is a valid permutation of the decision variables {q̂_{sk}}, we have the relation

min_{φ'∈U_s} Σ_{k=1}^{N_s} (q̂_{sφ'(k)} − q̂^h_{s(k)})² = Σ_{k=1}^{N_s} (q̂_{s(k)} − q̂^h_{s(k)})²,

which is the statement required to be proved. □

Next, we state and prove convergence of our algorithm to a stationary point.

Proposition 4.2. If the feasible region of the model indicated in (8) is compact and nonempty, the sequence of optimal solutions obtained from solving Eq. (8) converges to a stationary point in a finite number of iterations.

Proof. The feasible region of the model indicated in (8) is denoted by X. Since X is compact and nonempty, there exists a minimum for any convex, continuous function minimized over X. At any iteration m of our methodology we solve the following problem:

x^m = arg min_{x∈X, φ=φ^{m−1}} C(x) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ(k)} − q̂^h_{s(k)})².  (10)

Using Theorem 4.1 we have the following inequality:

C(x^m) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂^m_{sφ^m(k)} − q̂^h_{s(k)})² ≤ C(x^m) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂^m_{sφ^{m−1}(k)} − q̂^h_{s(k)})².  (11)

The left-hand side of Eq. (11) gives the objective for the model whose solution is x^m, since the pattern metric reflects the order statistics of the quantities derived from this solution. Now consider the model solved at iteration m + 1. We have:

x^{m+1} = arg min C(x) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ(k)} − q̂^h_{s(k)})²,

where φ = φ^m. Since x^m is a feasible solution to the above problem, we have:

C(x^{m+1}) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ^m(k)} − q̂^h_{s(k)})² ≤ C(x^m) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ^m(k)} − q̂^h_{s(k)})².  (12)

Again using Theorem 4.1 for the order statistics pertaining to the quantities derived from solution vector x^{m+1}, we have:

C(x^{m+1}) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ^{m+1}(k)} − q̂^h_{s(k)})² ≤ C(x^{m+1}) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ^m(k)} − q̂^h_{s(k)})².

From Eq. (12) it follows that:

C(x^{m+1}) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ^{m+1}(k)} − q̂^h_{s(k)})² ≤ C(x^m) + θ Σ_{s∈S} Σ_{k=1}^{N_s} (q̂_{sφ^m(k)} − q̂^h_{s(k)})².

We can see that the objective function corresponding to x^m monotonically decreases with iteration m, m ≥ 1. As we are minimizing a convex objective function over the compact set X, we are guaranteed convergence of the sequence of objective functions from solving Eq. (8). Note that from Eq. (10) the only difference between solving the model at two different iterations is that the permutation vector defining the order statistics of the quantities may differ. The possible number of permutations is clearly finite, which implies that the sequence of permutation vectors generated from solving (8) converges in a finite number of iterations. Thus, there exists m̃ < ∞ such that φ^{m̃} = φ^{m̃+1}. From Eq. (10) this implies x^{m̃} = x^{m̃+1}, and we have the statement in our proof. □

In the next section, we study an experiment that tests this methodology.

5. Experimental design and results

Three questions arise in our research that need to be tested experimentally:

• What is the improvement gained by using historical patterns?
• What is the effect of aggregation of historical patterns on the benefit from using historical patterns?
• How effectively does our methodology in (8) use information derived from historical patterns?

The experiment we present utilizes real-world data representing the locomotive management problem of a large freight railroad. In our laboratory experiment, we create our "history" by solving a function C̃(x) optimally as a deterministic, flow-balancing network model with complete information:

x^h = arg min_{x∈X} C̃(x),

where the feasible region X is assumed to be compact. We do have access to an actual database of historical decisions, but due to the complexity of actual operations we are unable to measure the quality of those solutions, and hence we create our "history" artificially.

Since the feasible region is identical when solving C̃ and C, we have N^h_s = N_s for each observation statistic s ∈ S. We construct our pattern regression distribution ψ_s for each s ∈ S from the historical quantities generated using the methodology detailed in Section 2. We then use:

q̂^h_{s(k)} = ψ_s^{−1}((2k − 1) / (2N_s)),  ∀ k ∈ {1, 2, ..., N_s}, ∀ s ∈ S

to obtain the desired vector of quantities q̂^h = {q̂^h_{s(k)}}, s ∈ S, k ∈ {1, 2, ..., N_s}, that represents our incomplete information in the pattern metric. The set of observation statistics is chosen such that the numerical patterns are represented at a higher level of aggregation of the state and decision than is represented in either C or C̃. This makes our experiment realistic, since we are unable to observe patterns from a historical database at the same level of detail that is captured in a cost function. The optimization model with the pattern metric is given by:


x ðhÞ ¼ arg min CðxÞ þ h x2X

Ns XX 2 ^sðkÞ ðxÞ  q ^h ðq sðkÞ Þ :

ð13Þ

s2S k¼1

Solving the model in (13) using the traditional approach (that is e since the objech ¼ 0) yields a suboptimal solution to the model C e consists of cost parameters that are not considered in C. We tive C are interested in evaluating the improvement over this solution when the model is solved with a pattern metric (that is h > 0 in (13)). Using the traditional method with just the cost function (that is setting h ¼ 0 in Eq. (13)) we have

x ð0Þ ¼ arg min CðxÞ: x2X

We solve a sequence of deterministic, flow-balancing models where at each index m of the sequence we solve:

xm ðhÞ ¼ arg

min x2X;/¼/

m1

CðxÞ þ h

Ns XX 2 ^s/ðkÞ ðxÞ  q ^h ðq sðkÞ Þ ;

ð14Þ

We set /0 to the permutation pertaining to the order statistics of the solution x ð0Þ. As we showed in Proposition 4 the above sequence of optimal solutions converges to a stationary point. We denote the convergent solution to the sequence in (14) by x ðhÞ. We define our measure of model performance by the improvement ratio which is given by:

e ðx ðhÞÞ e ðx ð0ÞÞ  C C : e ðxh Þ e ðx ð0ÞÞ  C C

An improvement ratio equal to 1 means that for a particular value of h we have obtained the optimal solution using our methodology of (14). We claim that by solving the optimization model with a pattern metric as indicated in (14) we are able to get a positive improvement for a particular h, that is we have gIMP ðhÞ > 0. In the following sections we present the experimental setup detailing the generation of the vector xh and consequently the quantities that are represented in a pattern metric and the results of our research. 5.1. Experimental setup Our experimental setting is a locomotive management problem for a large railroad. The resource attribute state at for the train is as shown:

$$a^t = \{\text{Train ID}, \text{Origin}, \text{Destination}, \text{Tonnage}, \text{Train Type}\}, \quad a^t \in \mathcal{A}^t,$$

where $\mathcal{A}^t$ is the space of possible attribute vectors for a train resource. The resource attribute state of the locomotive, denoted by $a^l$, is as shown below:

$$a^l = \{\text{Locomotive Type}, \text{Location}, \text{Horsepower (HP)}\}, \quad a^l \in \mathcal{A}^l,$$

where we use $\mathcal{A}^l$ to denote the attribute space of the locomotive attribute vector $a^l$. The set of decisions pertaining to a train resource with attribute vector $a \in \mathcal{A}^t$ is as shown below:

$$\mathcal{D}^t_a = \bigcup_{\{a^l \in \mathcal{A}^l : a^l_{Location} = a_{Origin}\}} \{\text{Assign the train to a locomotive with attribute vector } a^l\},$$

where $a^l_{Location}$ denotes the "Location" component of the attribute vector $a^l$ and $a_{Origin}$ denotes the attribute "Origin" of the train. Thus, the decision set $\mathcal{D}^t_a$ consists of decisions where each decision assigns to the train a locomotive with a specific attribute vector $a^l \in \mathcal{A}^l$ such that the location of the locomotive is the same as the origin of the train. Similarly, the set of decisions pertaining to a locomotive resource with attribute vector $a \in \mathcal{A}^l$ is:

$$\mathcal{D}^l_a = \bigcup_{\{a^t \in \mathcal{A}^t : a_{Location} = a^t_{Origin}\}} \{\text{Assign the locomotive to a train with attribute vector } a^t\}.$$

The experimental data is obtained from a major railroad and represents operations over a time horizon of a week. The data consists of 10,557 train segments (each segment is a unique "Train ID") connecting 178 locations in the network, 12 train types and 8 locomotive types. We consider two types of engineering costs: a positive cost denoted by the vector $c^a = \{c^a_{ad}\}_{a \in \mathcal{A}^l, d \in \mathcal{D}^l_a}$ (the cost of assigning locomotives to trains) and a negative cost denoted by the vector $c^r = \{c^r_{ad}\}_{a \in \mathcal{A}^t, d \in \mathcal{D}^t_a}$ (the reward of assigning power to a train). The objective function characterizing $C$ has a quadratic form as shown below:

$$C(x) = \sum_{a \in \mathcal{A}^l} \sum_{d \in \mathcal{D}^l_a} c^a_{ad}\, x_{ad} - \sum_{a \in \mathcal{A}^t} \left[ 2 \left( \sum_{d \in \mathcal{D}^t_a} d_{HP}\, x_{ad} \right) c^r_{ad} - \left( \sum_{d \in \mathcal{D}^t_a} d_{HP}\, x_{ad} \right)^{2} \frac{c^r_{ad}}{a_{Tonnage}\, \beta} \right]. \tag{15}$$

Please note that the decision set $\mathcal{D}^t_a$ is simply a set of locomotive states and hence we use $d_{HP}$ to denote the horsepower of the locomotive. $\beta$ is a positive parameter characterizing the "horsepower per trailing ton" ratio. The quadratic form of the revenue function for assigning power to a train ensures that the function achieves its maximum when the total horsepower assigned equals the product of the tonnage on the train and the horsepower per trailing ton.

The corresponding data $\tilde{c}^a$, $\tilde{c}^r$ and $\tilde{\beta}$ for the objective function $\tilde{C}$ characterizing the real world are derived by perturbing the vectors $c^a$, $c^r$ and $\beta$ by random perturbations $\delta^a$, $\delta^r$ and $\delta^\beta_a$ in a biased sense, that is, the expectations of these random perturbations are not equal to 0. We also assume that the random vectors $\delta^a$, $\delta^r$ and $\delta^\beta_a$ are not observable to the modeler even after the fact. We solve $\tilde{C}$ to generate our history $x^h$. In our experiment the positive cost parameters characterizing $\tilde{C}$ are allowed to deviate uniformly between 100% and 200% of their respective values in $C$. The negative costs are allowed to deviate uniformly between 75% and 150% of their corresponding values in $C$. $\delta^\beta_a$ is a random multiplier, uniform in the range [0.7, 1.6]. The difference between the parameters characterizing $\tilde{C}$ and the parameters characterizing $C$ is assumed to be completely unknown to the modeler.

We consider aggregated train attribute states in our experiment to capture the loss of information when representing the numerical patterns. The state with the most detail, represented by a train state $a^t \in \mathcal{A}^t$, is denoted by the level O–D–T–ID and consists of the attributes "Origin" (O), "Destination" (D), "Train Type" (T) and "Train ID" (ID). The aggregated states that we consider in our experiment are the O–D–T and O–D levels. The modeler's estimate of the "horsepower per trailing ton" (HPT) for each train is given by $\beta$.
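To make the shape of the revenue term in (15) concrete, the following sketch evaluates $2zc^r - z^2 c^r/(a_{Tonnage}\beta)$ as a function of the total assigned horsepower $z$ and confirms that it peaks at $z = a_{Tonnage}\,\beta$. All numbers here are made up for illustration; they are not taken from the railroad data set.

```python
# Sketch of the quadratic revenue term in (15) for a single train.
# The reward coefficient, tonnage and HPT ratio are hypothetical values.

def revenue(z, c_r, tonnage, beta):
    """Revenue of assigning total horsepower z to a train with the
    given tonnage, reward coefficient c_r and HPT ratio beta."""
    return 2.0 * z * c_r - (z ** 2) * c_r / (tonnage * beta)

c_r, tonnage, beta = 10.0, 5000.0, 1.5   # illustrative parameters
target = tonnage * beta                  # horsepower that maximizes revenue

# Evaluate the revenue on a grid of candidate horsepower assignments.
grid = [500.0 * k for k in range(31)]    # 0, 500, ..., 15000
best = max(grid, key=lambda z: revenue(z, c_r, tonnage, beta))

# The grid maximizer coincides with tonnage * beta = 7500.
assert best == target == 7500.0
```

This mirrors the observation in the text: the concave quadratic penalizes both under-powering and over-powering a train relative to its tonnage times the HPT ratio.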
However, in the real world the actual HPT may be different from $\beta$, and this difference may be a function of the level of aggregation. Thus, when we generate our "history," if the HPT information is a function of all the attributes at the O–D–T level, the numerical patterns of interest are represented only at the O–D–T level (complete information) and the O–D level (loss of information). In the former case each observation statistic refers to an O–D–T level state, and in the latter each observation statistic is at the O–D level. Thus, in our methodology for capturing the incomplete information contained in the real world using numerical patterns, we represent the patterns at the same level or a higher level of aggregation than the level at which we generated the HPT data. This approach is summarized in Table 1.


A. Marar, W.B. Powell / European Journal of Operational Research 197 (2009) 50–58

The quantity that we observe from our history and represent as numerical patterns is the total horsepower assigned to a train, given by $\sum_{d \in \mathcal{D}^t_a} d_{HP}\, x^h_{ad}$. We derive the order statistics for these quantities based on the set of observation statistics, which are generated at the same or a higher level of aggregation relative to the level at which the HPT data is generated.
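As a small sketch of how such observation statistics can be derived, the fragment below builds the total assigned horsepower per train from hypothetical history records, aggregates to the O–D level, and sorts each group to obtain its order statistics. The locations, train types and horsepower figures are invented for illustration.

```python
# Sketch: order statistics of total assigned horsepower per aggregated
# train state, derived from a (hypothetical) history of assignments.
from collections import defaultdict

# Each record: (origin, destination, train_type, train_id, locomotive_hp)
history = [
    ("CHI", "ATL", "Intermodal", "T1", 4400),
    ("CHI", "ATL", "Intermodal", "T1", 3000),  # second unit on the same train
    ("CHI", "ATL", "Coal",       "T2", 4400),
    ("CHI", "DEN", "Coal",       "T3", 6000),
]

# Total horsepower assigned to each train (O-D-T-ID level).
hp_per_train = defaultdict(float)
for o, d, t, tid, hp in history:
    hp_per_train[(o, d, t, tid)] += hp

# Aggregate to the O-D level and sort to obtain the order statistics.
obs = defaultdict(list)
for (o, d, t, tid), hp in hp_per_train.items():
    obs[(o, d)].append(hp)
order_stats = {key: sorted(vals) for key, vals in obs.items()}

assert order_stats[("CHI", "ATL")] == [4400.0, 7400.0]
assert order_stats[("CHI", "DEN")] == [6000.0]
```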

5.2. Summary of results

All the models solved are quadratic programming problems and are solved on a SunOS 5.8 platform using LOQO, which utilizes interior-point methods to solve linear and quadratic programming problems (Vanderbei, 1999). Each model involves 95,109 variables and 12,077 constraints. In the experiment testing our methodology, convergence is achieved if the iteration-specific pattern metric, given by

$$\sum_{s \in S} \sum_{k=1}^{N_s} \left( \hat{\rho}^m_{s\phi(k)} - \hat{\rho}^h_{s(k)} \right)^2, \quad m \in \{1, 2, \ldots\},$$

where $\phi \in \Phi$ is the permutation vector that defines the order statistics of the solution vector $\{\hat{\rho}^m_{sk}\}_{s \in S, k \in \{1,2,\ldots,N_s\}}$, converges to within two significant digits.

We present the results of our experimental scenarios in Table 2. Even in the case where there is loss of information due to aggregation of patterns we obtain a positive improvement ratio. This clearly shows that our methodology offers practical benefits in capturing loss of information due both to missing data and to aggregation of patterns. We also performed an experiment where we assume the model is able to capture the engineering costs reflected in $\tilde{C}$ perfectly, that is, $c^a = \tilde{c}^a$ and $c^r = \tilde{c}^r$, and the only difference between $C$ and $\tilde{C}$ is the unknown vector of beta multipliers $\{\delta^\beta_a\}_{a \in \mathcal{A}^t}$. These results are shown in Table 3. The tables report the best improvement ratio, the scaling factor corresponding to this ratio, and the number of iterations to convergence of the algorithm for this scaling factor. In the case where numerical patterns are generated at the O–D level and represented with no aggregation, the maximum improvement ratio is 0.981. Since our algorithm is not guaranteed to converge to the global optimum, these results speak well of the quality of the initial starting solution. They also show that our methodology, as applied with our choice of initial feasible solution, is successful in capturing the combinatorial effects of the order statistics.

In Fig. 2, we plot the improvement ratio as a function of the different data scenarios when the numerical patterns are represented at the O–D level but are generated at the O–D–T–ID level. For the case of perfect engineering costs we plot this in Fig. 3. We see that the improvement ratio is not monotone in the scaling factor, as indicated by its maximum at $\theta = 50$. This is because the loss of information in aggregating patterns does not justify increasing $\theta$ indefinitely to obtain a better improvement ratio.
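The convergence test on the iteration-specific pattern metric can be sketched as follows. This is our own minimal illustration with hypothetical numbers: sorting plays the role of the permutation $\phi$ that produces the order statistics, and the quantities $\hat{\rho}$ stand for the sorted horsepower observations per state $s$.

```python
# Sketch: iteration-specific pattern metric comparing the order
# statistics of a model solution against those of the history.
def pattern_metric(rho_model, rho_hist):
    """Sum over states s of squared deviations between the sorted
    model quantities and the sorted historical quantities."""
    total = 0.0
    for s in rho_hist:
        model_sorted = sorted(rho_model[s])   # order statistics of the solution
        hist_sorted = sorted(rho_hist[s])     # order statistics of the history
        total += sum((m - h) ** 2 for m, h in zip(model_sorted, hist_sorted))
    return total

# Hypothetical history and one model iterate for a single O-D state.
hist = {"OD1": [4400.0, 7400.0]}
iter_m = {"OD1": [7000.0, 4500.0]}
metric = pattern_metric(iter_m, hist)

# (4500-4400)^2 + (7000-7400)^2 = 10000 + 160000
assert metric == 170000.0
```

In the actual algorithm this metric is tracked across successive iterates and iteration stops once it stabilizes to within two significant digits.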

Table 1
Experimental scenarios

Data behavior   Numerical patterns
                O–D–T–ID   O–D–T   O–D
O–D–T–ID        Yes        Yes     Yes
O–D–T           –          Yes     Yes
O–D             –          –       Yes

Table 2
Imperfect engineering costs: (improvement ratio, scaling factor, iterations) for different data scenarios and levels of aggregation

Data behavior   Numerical patterns
                O–D–T–ID           O–D–T              O–D
O–D–T–ID        (0.957, 5000, 1)   (0.178, 50, 2)     (0.131, 50, 11)
O–D–T           –                  (0.940, 5000, 1)   (0.713, 500, 5)
O–D             –                  –                  (0.876, 5000, 1)

Table 3
Perfect engineering costs: (improvement ratio, scaling factor, iterations) for different data scenarios and levels of aggregation

Data behavior   Numerical patterns
                O–D–T–ID            O–D–T               O–D
O–D–T–ID        (1.00, 10,000, 1)   (0.144, 50, 1)      (0.126, 50, 12)
O–D–T           –                   (1.00, 10,000, 1)   (0.808, 1000, 6)
O–D             –                   –                   (0.981, 5000, 2)

[Figure omitted: plot of the improvement ratio (0–1) against the scaling factor (0, 1, 5, 10, 50, 100, 500, 1000, 5000, 10,000) for data behavior at the O–D–T–ID, O–D–T and O–D levels.]

Fig. 2. Improvement ratio where numerical patterns are represented at O–D level: imperfect engineering costs.


[Figure omitted: plot of the improvement ratio (0–1) against the scaling factor (0, 1, 5, 10, 50, 100, 500, 1000, 5000, 10,000) for data behavior at the O–D–T–ID, O–D–T and O–D levels.]

Fig. 3. Improvement ratio where numerical patterns are represented at O–D level: perfect engineering costs.

6. Conclusion

We present a new methodology for representing information in resource allocation models using the concept of pattern regression. Such problems arise in freight logistics operations, for instance locomotive scheduling, where locomotive power is assigned to trains. The problem of representing the information in a historical database pertaining to observed flows, or quantities derived from these flows, is closely related to the empirical distribution function (EDF) of the decision variables. We adopt the Cramér–von Mises goodness-of-fit test statistic for continuous distributions to develop our methodology for representing information in resource allocation models.

The functional form known as the pattern metric that we use to represent information is combinatorial in nature due to the influence of the order statistics of the quantities. We solve this global optimization problem as a sequence of models without any explicit use of constraints specifying the order statistics, and we prove convergence of this algorithm to a stationary point in a finite number of iterations.

We tested our methodology on a real-world data set representing the operations of a large freight railroad. Preliminary results show that the algorithm has potential for representing information pertaining to numerical patterns. A direction for future work is the application of this research within a real-world optimization model, such as a locomotive scheduling model for a large freight railroad.

Appendix

This appendix relates to the content in Section 4 of this paper. We state and prove the following lemma.

Lemma 6.1. Given a vector $\{x_k\}_{k \in \{1,2,\ldots,n\}}$ whose elements are all strictly positive, a set $\Phi_n$ of all possible permutations of the vector $\{1, 2, \ldots, n\}$, and a vector of order statistics $\{y_{(k)}\}_{k \in \{1,2,\ldots,n\}}$ whose elements are also strictly positive, we have

$$\max_{\phi \in \Phi_n} \sum_{k=1}^{n} y_{(k)}\, x_{\phi(k)} = \sum_{k=1}^{n} y_{(k)}\, x_{(k)},$$

where $\phi(k)$ is the $k$th element of the permutation vector $\phi \in \Phi_n$ and $\{x_{(k)}\}_{k \in \{1,2,\ldots,n\}}$ represents the vector of order statistics of $\{x_k\}_{k \in \{1,2,\ldots,n\}}$.

Proof. Let $\phi \in \Phi_n$ be a generic permutation vector of $\{1, 2, \ldots, n\}$. We proceed by induction. For the case $m = 1$ the statement of the lemma is trivial. Consider the case where the number of instances is 2. In this case there are only two permutations possible for the vector $x$, namely $\{x_1, x_2\}$ and $\{x_2, x_1\}$. Without loss of generality we can set the vector of order statistics $\{x_{(1)}, x_{(2)}\}$ to $\{x_1, x_2\}$. We have

$$(y_{(1)} x_{(1)} + y_{(2)} x_{(2)}) - (y_{(1)} x_2 + y_{(2)} x_1) = y_{(1)} x_1 + y_{(2)} x_2 - y_{(1)} x_2 - y_{(2)} x_1 = y_{(1)}(x_1 - x_2) - y_{(2)}(x_1 - x_2) = (y_{(1)} - y_{(2)})(x_1 - x_2) \geq 0,$$

since $(y_{(1)} - y_{(2)}) \leq 0$ by definition of the order statistics and $(x_1 - x_2) \leq 0$ by our assumption. Thus the lemma is true for the case $m = 2$, that is, we have

$$\max_{\phi \in \Phi_2} \sum_{k=1}^{2} y_{(k)}\, x_{\phi(k)} = \sum_{k=1}^{2} y_{(k)}\, x_{(k)}.$$

Using the induction hypothesis we assume that the lemma is true for $m = n$, that is,

$$\max_{\phi \in \Phi_n} \sum_{k=1}^{n} y_{(k)}\, x_{\phi(k)} = \sum_{k=1}^{n} y_{(k)}\, x_{(k)}.$$

Consider the case $m = n + 1$. Without loss of generality we assume $y_{n+1} \geq y_j$ and $x_{n+1} \geq x_j$ for all $j \in \{1, 2, \ldots, n\}$, so that $x_{(n+1)} = x_{n+1}$ and $y_{(n+1)} = y_{n+1}$. Consider the following quantity:

$$Z_{n+1}(\phi) = \sum_{k=1}^{n+1} y_{(k)}\, x_{\phi(k)}, \quad \forall \phi \in \Phi_{n+1}.$$

We define the following function:

$$Z_{n+1,1}(\phi) = \sum_{k=1}^{n} y_{(k)}\, x_{\phi(k)} + y_{(n+1)}\, x_{(n+1)}, \quad \forall \phi \in \Phi_n. \tag{16}$$

Consider swapping $x_{n+1}$ with any $x_j$ (or $x_{\phi(j)}$), $j \in \{1, 2, \ldots, n\}$, in Eq. (16). This procedure can be expressed using a function as shown below:

$$Z_{n+1,2}(\phi, j) = \left( \sum_{k=1}^{n} y_{(k)}\, x_{\phi(k)} \right) + y_{(n+1)}\, x_{\phi(j)} - y_{(j)}\, x_{\phi(j)} + y_{(j)}\, x_{(n+1)}, \quad \forall j \in \{1, 2, \ldots, n\}, \ \forall \phi \in \Phi_n.$$

$Z_{n+1,2}(\phi, j)$ evaluates $Z_{n+1}$ for all permutations of the set $\Phi_{n+1}$ such that $Z_{n+1}$ does not contain the term $y_{(n+1)} x_{(n+1)}$, while $Z_{n+1,1}(\phi)$ evaluates $Z_{n+1}$ for all permutations of the set $\Phi_{n+1}$ such that $Z_{n+1}$ contains the term $y_{(n+1)} x_{(n+1)}$. We have

$$Z_{n+1,1}(\phi) - Z_{n+1,2}(\phi, j) = y_{(n+1)}(x_{(n+1)} - x_{\phi(j)}) - y_{(j)}(x_{(n+1)} - x_{\phi(j)}) = (y_{(n+1)} - y_{(j)})(x_{(n+1)} - x_{\phi(j)}) \geq 0, \quad \forall j \in \{1, 2, \ldots, n\}, \ \forall \phi \in \Phi_n,$$

by definition of the order statistics. From the above equation we have

$$\max_{\phi \in \Phi_{n+1}} Z_{n+1}(\phi) = \max\Big\{ \max_{\phi \in \Phi_n} Z_{n+1,1}(\phi),\ \max_{\phi \in \Phi_n,\, j \in \{1,\ldots,n\}} Z_{n+1,2}(\phi, j) \Big\} = \max_{\phi \in \Phi_n} Z_{n+1,1}(\phi) = \max_{\phi \in \Phi_n} \left( \sum_{k=1}^{n} y_{(k)}\, x_{\phi(k)} \right) + y_{(n+1)}\, x_{(n+1)} = \sum_{k=1}^{n+1} y_{(k)}\, x_{(k)}.$$

Thus the lemma is true for $m = n + 1$, and by induction it is true for all $n \in \{1, 2, \ldots\}$. □
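Lemma 6.1 is the classical rearrangement inequality. The brute-force check below (our own illustration, not part of the paper's algorithm) verifies on small random positive vectors that pairing the order statistics of $y$ with the order statistics of $x$ attains the maximum over all permutations $\phi$:

```python
# Brute-force check of Lemma 6.1: sorted-with-sorted pairing maximizes
# the sum of products y_(k) * x_phi(k) over all permutations phi.
from itertools import permutations
import random

random.seed(7)
for _ in range(20):
    n = random.randint(1, 5)
    x = [random.uniform(0.1, 10.0) for _ in range(n)]  # strictly positive
    y = [random.uniform(0.1, 10.0) for _ in range(n)]
    y_sorted = sorted(y)

    # Maximum of the sum over every permutation phi of {0, ..., n-1}.
    best = max(
        sum(yk * x[phi[k]] for k, yk in enumerate(y_sorted))
        for phi in permutations(range(n))
    )

    # The lemma's claimed maximizer: pair order statistics with order statistics.
    sorted_pairing = sum(yk * xk for yk, xk in zip(y_sorted, sorted(x)))
    assert abs(best - sorted_pairing) < 1e-9
```

The exhaustive search is only feasible for small $n$, which is exactly why the paper avoids explicit permutation constraints and instead exploits this structural result.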

References

Ahuja, R., Orlin, J., 2001. Inverse optimization. Operations Research 49 (5), 771–783.
Birge, J., 1997. Stochastic programming computation and applications. INFORMS Journal on Computing 9 (2), 111–133.
Devroye, L., Gyorfi, L., Lugosi, G., 1996. A Probabilistic Theory of Pattern Recognition. Springer, New York.
Frank, M., Wolfe, P., 1956. An algorithm for quadratic programming. Naval Research Logistics Quarterly 3, 95–110.
Pardalos, P., 1989. Parallel search algorithm in global optimization. Applied Mathematics and Computation 29, 219–229.
Read, T., Cressie, N., 1988. Goodness-of-Fit Statistics for Discrete, Multivariate Data. Springer-Verlag, New York.
Vanderbei, R., 1999. LOQO: An interior point code for quadratic programming. Optimization Methods and Software 11 (2), 451–484.