Volume 2, Number 5
OPERATIONS RESEARCH LETTERS
December 1983
ON THE NUMBER OF ITERATIONS OF LOCAL IMPROVEMENT ALGORITHMS

Craig A. TOVEY
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA

Received June 1983
Revised September 1983
A general model of local improvement algorithms in combinatorial optimization accurately confirms performance characteristics often observed in individual cases. The model predicts exponentially bad worst case and low order polynomial average run times for single optimum problems, including some linear complementarity problems and linear programming. For problems with multiple local optima, most notably those that are NP-complete, average speed is linearly bounded but accuracy is poor.

Keywords: local improvement, heuristic, linear programming, complementarity, average performance, combinatorial optimization
1. Introduction

Local improvement is a method widely used in combinatorial optimization problems. Well-known examples include hill climbing in numerous artificial intelligence applications [28], principal pivoting methods for the LCP [4,20], r-opt procedures in the travelling salesman problem [14,19], the simplex method for linear programming [7], etc. Experience with and analysis of these methods reveal some curiously strong consistencies of behavior from problem to problem. Chief among these is a very fast average running time coupled with an exponentially bad worst case performance [5,11,7,15,28]. The most celebrated example of this is the simplex method, and considerable recent progress has been made in analyzing its average case performance [1,2,8,12,18,23].

The author was supported by the New Faculty Research Development Program of the Georgia Institute of Technology. This work is based on the author's Ph.D. thesis, performed under George Dantzig at Stanford, 1978-81, at the Systems Optimization Laboratory. While at Stanford, research was supported in part by Department of Energy Contract AM03-76SF00326, PA #DE-AT03-76ER?2018; Office of Naval Research Contract N00014-75-C-0267; National Science Foundation Grants MCS76-81259, MCS-7926009 and ECS-8012974; and Army Research Office Contract DAS29-79-C-0110. Reproduction in whole or in part is permitted for any purpose of the U.S. government.
Our aim here is to introduce a general abstract model of local improvement which may contribute considerably to our understanding of the entire class of local improvement algorithms. The model, presented in Section 2, is novel in that we study distributions of problem structure, rather than distributions of problem data. This seems natural and intuitive, and, moreover, frees our model from questions concerning precision of the data, roundoff error, etc. Our model proceeds from an arbitrary problem instance and extracts the structural properties related to the performance of local improvement algorithms. The principal result is that the structures which correspond to 'bad' performance are combinatorially so rare as to be pathological; furthermore, a variety of natural probabilistic assumptions lead to the same average (roughly linear) efficiency. The analytical results and the simulation evidence that support this are described in Section 3. In Section 4, we discuss the advantages and disadvantages of our model.
2. A model of local improvement algorithms
2.1. Optimal adjacency algorithm

As our basic model we consider the problem of maximizing a real valued function f whose domain
0167-6377/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)
is the set of vertices of the n-cube. We assume here for simplicity that all the values of f are distinct. The domain of the function can be thought of as a set of boolean decision variables: many optimization problems may be cast in this form [3]. There is a natural notion of distance between two vertices of the n-cube: the number of components in which they differ. This distance is a metric and is known as the Hamming distance. If x and y are at a distance of zero, then x = y; if x and y are at a distance of one, they share an edge and are said to be adjacent or neighbors. A vertex whose function value is greater than any of its n neighbors is called a local maximum. If f has the property that a local maximum is a global maximum, we say that f is Local-Global, or LG for short. The LG property is of course reminiscent of the property that a local minimum of a convex function is a global one; see also [9]. If f is LG, a local improvement algorithm will solve the problem of maximizing f over the stated domain. A natural implementation of local improvement is the Optimal Adjacency (OA) algorithm:
0. Start with any vertex x.
1. If x is locally optimal, stop with x the solution. Otherwise proceed to 2.
2. Let y be the optimal vertex adjacent to x. Set x equal to y and go to 1.
Since the domain is finite and has only one local optimum, the algorithm must terminate after finitely many steps with the globally optimal solution. We next define some graphical structures that will be useful in analyzing the performance of the algorithm.

2.2. OATs

If we are given a particular LG function f, we can construct a directed tree to show how many iterations the optimal adjacency algorithm will require:
(i) Each vertex of the n-cube corresponds to a node of the tree.
(ii) The father of a vertex is its optimal adjacent vertex; if a vertex is a local optimum, it has no father.
The tree is called an Optimal Adjacency Tree, or OAT. Its root is the local (hence globally unique) optimum.
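As a concrete illustration, the OA algorithm above can be sketched in a few lines of Python, with vertices represented as 0/1 tuples. The linear objective used in the example is our own illustrative choice of an LG function (distinct weights give distinct values and a unique local optimum), not one taken from the paper.

```python
def neighbors(x):
    """All vertices of the n-cube at Hamming distance 1 from x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def optimal_adjacency(f, x):
    """Run the OA algorithm from start vertex x, maximizing f.
    Returns the local maximum reached and the number of iterations."""
    iterations = 0
    while True:
        best = max(neighbors(x), key=f)   # the optimal adjacent vertex
        if f(best) <= f(x):               # x is locally optimal: stop
            return x, iterations
        x = best                          # move to the optimal neighbor
        iterations += 1

# Illustrative LG function: a linear objective with distinct weights,
# whose unique local (hence global) maximum is (1, 1, 1, 1).
f = lambda x: sum(w * b for w, b in zip((1, 2, 4, 8), x))
opt, iters = optimal_adjacency(f, (0, 0, 0, 0))
```

Starting from the origin, the algorithm flips the highest-weight coordinate first and climbs to the all-ones vertex in n = 4 iterations.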
The OAT displays the path followed by the algorithm by going from son to father on the
Fig. 2A. (Figure not reproduced: two OATs on the 2-cube, and an OAT on the 3-cube with root 000, vertex 100 at depth 1, vertices 010 and 001 at depth 2, vertices 110, 101 and 011 at depth 3, and vertex 111 at depth 4.)
tree (note that OATs are a special class of trees: not all trees are OATs). We emphasize that for any instance of any local-global problem there is a unique OAT which describes the action of the OA algorithm on that instance. Figure 2A shows two possible OATs for n = 2, and a possible OAT for n = 3.
The set of OATs of order n has a 2^n-fold symmetry. Any of the 2^n vertices could be the root; we assume for notational convenience that the origin, 00...0, is the root. Suppose that we were going to run the OA algorithm on a particular instance of a problem, and we knew that this instance had the structure for n = 3 shown in Figure 2A. How many iterations would we expect the algorithm to take? If the starting vertex is chosen at random, there would be an equal probability of starting at each of the 8 vertices. In general, for each starting vertex, the path to the root in the OAT is by definition the path the OA algorithm will follow. Thus the height or pathlength of each vertex in the tree is the number of iterations the algorithm would need to reach the optimum from that vertex. The mean pathlength of the tree (the mean of the pathlengths of all the nodes in the tree) is precisely equal to the expected number of iterations of the OA algorithm. Thus for a problem which has the structure of that tree, the OA algorithm would be expected to take (1×1 + 2×2 + 3×3 + 1×4)/8 = 2¼ iterations. If f is not LG, the rules for producing the OAT will instead produce an OAF, or Optimal Adjacency Forest, with one tree per local optimum. The height of a tree is the maximum height of its vertices. In [24] we show that there exist OATs that are exponentially high, so any strict bound on the worst-case performance of the OA algorithm must be exponential in n. It is therefore of interest to ask, what is the expected performance of the optimal adjacency algorithm? Or, equivalently,
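The construction of an OAT from an ordering, and the mean-pathlength computation just described, can be sketched as follows. The example ordering {00, 10, 01, 11} is the one given in Section 2.3 as producing the leftmost OAT of Figure 2A; everything else is a straightforward transcription of the definitions.

```python
def build_oat(order):
    """Given an LG ordering (list of 0/1 tuples, best vertex first),
    return the father map of the Optimal Adjacency Tree: father[v] is
    v's optimal adjacent vertex; the root (the optimum) maps to None."""
    rank = {v: i for i, v in enumerate(order)}   # lower rank = higher f value
    father = {}
    for v in order:
        nbrs = [v[:i] + (1 - v[i],) + v[i + 1:] for i in range(len(v))]
        best = min(nbrs, key=lambda u: rank[u])  # highest-valued neighbor
        father[v] = best if rank[best] < rank[v] else None
    return father

def mean_pathlength(father):
    """Mean number of OA iterations over all starting vertices."""
    def depth(v):
        return 0 if father[v] is None else 1 + depth(father[v])
    return sum(depth(v) for v in father) / len(father)

# The ordering {00, 10, 01, 11} of Section 2.3:
order = [(0, 0), (1, 0), (0, 1), (1, 1)]
father = build_oat(order)
```

Here the root is 00, the fathers of 10 and 01 are 00, the father of 11 is 10, and the mean pathlength is (0 + 1 + 1 + 2)/4 = 1.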
what is the expected mean pathlength of an OAT of order n? We shall formalize this question under each of several reasonable probability distributions of OATs. The remarkable fact, which is the main point of this paper, is that the 'bad' cases are combinatorially exceedingly rare. That the expected mean pathlength is polynomial in n turns out to be quite insensitive to the distributional assumptions.

2.3. Three distributions on problem structure

The LG distribution. When we construct an OAT from the function f, we are not interested in the specific numeric values of f, but in the ordering of values of f on the vertices. Since functional values are distinct, the vertices can be uniquely ordered from high to low function value. Such a list of vertices is called an ordering. Since we are interested in structural properties and not specific numeric values, for our purposes the ordering of the vertices defines f. If the ordering were {00, 10, 01, 11} or {00, 10, 11, 01} then it would produce the leftmost OAT in Figure 2A. The ordering {00, 11, 01, 10} produces an OAF. If an ordering produces an OAT it is said to be an LG ordering. The first distribution we shall consider is that all LG orderings are equally likely to occur. We call this the LG distribution.

The boundary distribution. An obvious necessary and sufficient condition that an ordering be LG is that every vertex except the first has an adjacent vertex that is located higher up in the ordering. We restate this condition in an equivalent but more useful way. If S is a subset of the vertices of the n-cube, we define the boundary of S to consist of all vertices x such that x is not in S and such that for some y in S, x and y are adjacent. An ordering is LG iff for all i, the (i+1)st vertex is in the boundary of the first i vertices. The following procedure therefore recursively enumerates all LG orderings.
PROCEDURE ENUMERATE (when this procedure is called it is passed the values of n, i, and the first i entries of an array A[1,...,2^n] of vertices):
0. Begin
1. If i = 2^n, output A and go to 5.
2. Compute the boundary, B, of A[1],...,A[i].
3. Set i := i + 1.
4. For each member x of B, Do
   Begin
     Set A[i] := x
     Call PROCEDURE ENUMERATE
   End
5. End
If we want to produce an LG ordering randomly, we change Step 4 to 'For some member x of B'. Then at each step of the random generation process, a vertex x is selected from the boundary, assigned a father, and attached to the tree. How we select x determines the underlying distribution. We could let each member of the boundary have an equal chance of being chosen: this distribution is called the 'boundary' distribution.

The ideas of this section are best visualized in terms of a receding flood. Picture the problem as a mountain, which for the moment is covered by the flood waters of randomization. As the flood waters recede, the first point revealed is the optimum. The next point uncovered must be a neighbor of the optimum (else it would be a local optimum). After i points have been revealed, we know that the next point must be in the boundary of the first i points. Under the boundary distribution, these boundary members have an equal chance of being uncovered next. Note: the boundary and LG distributions differ because the size of the boundary at stage i varies depending on what the previous choices have been. Thus, an ordering that has an unusually large boundary in the early stages would be more likely to occur under the LG distribution than under the boundary distribution.

The coboundary distribution. Every boundary member has at least one neighbor that has already been selected, but some have more such neighbors than others (n at most, since each vertex has n neighbors). An alternative criterion would be to give each boundary member a weight proportional to the number of chosen neighbors it has. Thus, vertices with more chosen neighbors are more likely to be chosen themselves. This distribution is called the coboundary distribution; it may seem preferable to the boundary distribution if one can judge a vertex by its neighbors (not an unreasonable supposition for an LG problem).
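The random-generation variant of ENUMERATE, under the boundary distribution, can be sketched directly from the receding-flood description above. This is a minimal transcription of the definitions; the function and variable names are our own.

```python
import random

def boundary(S, n):
    """The boundary of S: vertices not in S adjacent to some member of S."""
    b = set()
    for v in S:
        for i in range(n):
            u = v[:i] + (1 - v[i],) + v[i + 1:]
            if u not in S:
                b.add(u)
    return b

def random_lg_ordering(n, rng=random):
    """Generate an LG ordering under the boundary distribution: at each
    stage the next vertex is chosen uniformly from the current boundary."""
    root = (0,) * n                 # origin as root, by the 2^n-fold symmetry
    order = [root]
    chosen = {root}
    while len(order) < 2 ** n:
        b = boundary(chosen, n)
        v = rng.choice(sorted(b))   # equal chance for each boundary member
        order.append(v)
        chosen.add(v)
    return order

order = random_lg_ordering(3)
```

For the coboundary distribution one would instead weight each boundary member by its number of already-chosen neighbors (e.g. via `random.choices` with a `weights` argument).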
BATs. The OA algorithm always chooses the best neighbor to go to. If we relax this condition, and only require that the algorithm proceed from
a vertex to a better adjacent vertex, we get the Better Adjacency (BA) algorithm:
1. Start at a random vertex x.
2. Search through x's neighbors until a better one, y, is found or all neighbors have been tried. In the former case set x equal to y and repeat step 2; in the latter case stop with x optimal.
This algorithm is valid for the same reasons as the OA algorithm; its representation is the Better Adjacency Tree, or BAT. Given any LG ordering, we can pick a higher valued neighbor for each vertex (except the origin) and make that neighbor its father. The resulting tree (it is a tree since it has no cycles and there is a path connecting all nodes to the origin) is a BAT. BATs are a less restricted class than OATs (for instance, the tree in Figure 2B is a BAT but not an OAT), and are therefore somewhat easier to analyze. Since only one OAT can be generated from an ordering, the expected mean pathlength of an OAT from, say, the boundary distribution is well defined. This is not true for BATs, because usually many BATs can be generated from a single ordering. To resolve the ambiguity, we adopt the convention that a BAT is to be randomly generated from an ordering by choosing, for each vertex, among the possible fathers with equal probability. We can now discuss the expected mean pathlengths of OATs and BATs from the coboundary and other distributions.

Fig. 2B. (Figure not reproduced: a BAT on the 2-cube, the chain 10 → 11 → 01 → 00.)

3. Results of the model

3.1. Average mean pathlength

On the suggestion of Donald Knuth [17], simulations of the three distributions defined in Section 2 were performed. The significant and surprising result is an apparently linear mean pathlength of both OATs and BATs. Moreover, the three distributions are virtually indistinguishable. The 'bad' cases evidently are so rare that the expected mean pathlength is quite insensitive to distributional assumptions.
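The convention for generating a random BAT from an ordering (each vertex receives a father chosen uniformly among its higher-valued neighbors) can be sketched as follows; this is our own transcription of the definition, not code from the paper.

```python
import random

def random_bat(order, rng=random):
    """Generate a BAT from an LG ordering: each vertex (except the
    optimum) gets a father chosen uniformly among its better neighbors,
    following the convention adopted in Section 2."""
    rank = {v: i for i, v in enumerate(order)}   # lower rank = higher value
    father = {order[0]: None}                    # the optimum is the root
    for v in order[1:]:
        n = len(v)
        better = [u for u in (v[:i] + (1 - v[i],) + v[i + 1:] for i in range(n))
                  if rank[u] < rank[v]]          # the possible fathers
        father[v] = rng.choice(better)           # any better neighbor may serve
    return father

order = [(0, 0), (1, 0), (0, 1), (1, 1)]
bat = random_bat(order)
```

For this ordering, 10 and 01 each have 00 as their only better neighbor, while 11 may receive either 10 or 01 as father, so two distinct BATs can arise from the single ordering.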
Conjecture 1. The expected mean pathlengths of both OATs and BATs are Θ(n) and less than n under all three distributions: boundary, coboundary, and LG.

Theorem 2. The expected number of iterations of the OA or the BA algorithm with respect to the boundary distribution is less than en². (Here e is the logarithmic constant.)
Theorem 3. The expected number of iterations of the BA algorithm with respect to the coboundary distribution is less than 2en log n. For OATs the expected mean pathlength is less than 2en² log n. For a proof, see [24,25]. These bounds may be generalized to a class of distributions.

Theorem 4. Suppose that for some polynomial p(n) the distribution on LG orderings satisfies

Prob[V_{i+1} = x | V^i = v] ≤ p(n)/|B(v)|   for all i, all v, and all x ∈ B(v),

where B(v) denotes the boundary of the set of vertices in v. Then the expected mean pathlength of the BAT is less than 2enp(n) log n. Here V_j is a random variable that equals the jth vertex in the ordering, V^i denotes {V_1,...,V_i}, and v denotes a list of the first i vertices in the ordering. Note that Theorem 4 reduces to the first part of Theorem 3 when p(n) = 1. The other bounds in Theorems 2 and 3 may be generalized similarly.
3.2. The simplex method

The simplex method of Dantzig [7] proceeds from basic solution to neighboring basic solution by adding one column to the basis and removing another at each iteration. Let n be the number of variables and let m be the number of constraints. Then we can model the simplex method by restricting the domain to the subset of hypercube vertices that have m components equal to 1, and defining two vertices as adjacent if they differ in exactly two components. As before, simulation
results indicate an average performance that is logarithmic in the number of points in the space. The model continues to accurately predict performance (see Section 4). A proof of results similar to Theorems 2-4 requires a lower bound on the cardinality of the boundary or coboundary. No such bound is known, though Kleitman and West [16] have conjectured the construction of the minimal coboundary sets and verified their conjecture for small values of m and n. Employing their construction leads to the following result [24,25]:
Theorem 5. Suppose m < n/2 and the minimal coboundary conjecture holds. Then the expected height of a BAT under the coboundary distribution is less than 2em log n. For m > n/2 the bound is the same as for n − m.

Corollary 6. Under the hypotheses of Theorem 5, the expected number of iterations of a Simplex-type BA algorithm is less than 2em log n.

3.3. Problems with multiple local optima

Specific examples of this type include local search in integer programming, r-opting for the travelling salesman problem and hill climbing algorithms in several artificial intelligence applications. In general, these algorithms display two characteristics: speed and inaccuracy. Our model correctly predicts this behavior. First we consider the speed of the method. Proofs of the results in the remainder of this section are found in [26].

Theorem 7. Suppose the ratio of probabilities of occurrence satisfies Prob[v]/Prob[v'] ≤ 2^(an) for all orderings v, v'. Then the expected number of iterations of any local improvement algorithm is less than (a + 2)en.

These results apply to any local improvement algorithm. Even a stupid rule such as least positive improvement will run quickly. Note that a need not be a constant but can be any polynomial in n. We can also give a very general result which applies to any data-independent local improvement scheme where the number of neighbors is polynomially bounded.

Definition. If, for a local improvement algorithm, the vertices of the hypercube are assigned neighbors in a way that is independent of individual instances, then we say the algorithm involves a data-independent adjacency scheme. If, in addition, the maximum number of neighbors a vertex can have is polynomially bounded, the adjacency scheme is said to be reasonable.

Theorem 8. For any reasonable adjacency scheme and any probability distribution satisfying Prob[v]/Prob[v'] ≤ 2^(an) for all orderings v, v', the expected number of iterations of any local improvement algorithm is less than e(a + 2)p(n), where p(n) bounds the number of neighbors.
We remark that Theorems 7 and 8 may easily be adjusted to other domains such as the space of all permutations on n objects. Also, the bounds in Theorems 2-4 apply to large classes of distributions on orderings related to the boundary and coboundary distributions [26]. We now consider the question of how many local optima exist.

Theorem 9. Suppose that for all orderings w, w' the ratio of probabilities satisfies P[w]/P[w'] ≤ k. Then the expected number of local optima is at least 2^n/(kn + 1).
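Counting local optima of a given ordering is easy to do directly, which makes Theorem 9 simple to probe empirically. The sketch below is our own illustration: it counts the local optima of an ordering of the n-cube, and the uniformly random ordering at the end corresponds to k = 1 in Theorem 9 (expected count at least 2^n/(n + 1)).

```python
import random
from itertools import product

def count_local_optima(rank, n):
    """Number of vertices whose value exceeds that of every neighbor.
    rank[v] is v's position in the ordering; lower rank = higher value."""
    count = 0
    for v in rank:
        nbrs = (v[:i] + (1 - v[i],) + v[i + 1:] for i in range(n))
        if all(rank[v] < rank[u] for u in nbrs):
            count += 1
    return count

# A uniformly random ordering of the 4-cube (the k = 1 case of Theorem 9):
n = 4
vertices = list(product((0, 1), repeat=n))
random.shuffle(vertices)
rank = {v: i for i, v in enumerate(vertices)}
```

An LG ordering yields exactly one local optimum; a random ordering typically yields several, in line with the theorem.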
Local improvement does not appear to be a likely method to solve NP-hard optimization problems exactly. For instance, Papadimitriou and Steiglitz [21] have shown 'exact' search for the travelling salesman problem to be NP-complete, and have constructed instances of the travelling salesman problem with exponentially many 3-optimal tours that are not globally optimal [22]. This is not surprising, since LG problems are in NP ∩ co-NP:

Proposition 10. Suppose that a discrete optimization problem, {max f(x) | x ∈ X}, is LG with respect to a reasonable definition of adjacency. Then its recognition version (given an instance and a number k, does there exist an x ∈ X such that f(x) is at least k?) is in NP ∩ co-NP.

We show that a particular NP-complete problem is not LG.
Volume2, Number5
OPERATIONSRESEARCHLETTERS
Theorem 11. The clique problem is not LG under any reasonable definition of adjacency.

Theorem 12. In the clique problem, for any data-independent adjacency scheme where each element has n or fewer neighbors, there exists a class of instances with exponentially many local optima.

Note. Theorem 11 is proved directly by playing the 'adversary' against the adjacency rule; Theorem 12 is proved nonconstructively by employing the probabilistic method of Erdős and Spencer [10]. We believe that all NP-complete problems can have exponentially many local optima; unfortunately we cannot expect an easy proof of this conjecture, because it would imply that P ≠ NP. However, we have found that the standard transformations between known NP-complete problems usually preserve enough of the 'topology' that results carry over quite easily from one problem to another. For instance, the following are easy corollaries to Theorems 11 and 12:

Corollary 13. The Boolean maximization problem is not LG under any reasonable adjacency rule.

Corollary 14. For any data-independent adjacency rule that assigns n neighbors to each point, there exist instances of the Boolean maximization problem with exponentially many local optima.
4. Discussion of the model
A good model should be simple and consistent. It should conform with experience of reality and be useful in making predictions about unknowns. Our model has all of these characteristics. Despite its generality it is surprisingly accurate. The main points are, of course, its predictions of worst-case and average-case performance, and of the large number of local optima in non-LG problems. In addition, a comparison of the OAT and BAT simulation data suggests an average cost of about 15% more iterations with better instead of optimal adjacency. This is consistent with experimental results in integer programming [13]. When the model is modified to apply to the Simplex Method for linear programming, the difference between OATs and BATs increases to about fifty percent
[24], which is consistent with experimental results of 38 to 67 percent found in [6]. The model is also consistent with reality in that it permits degeneracy. Although the assumption of distinct function values is used in Section 2 for convenience, all that is technically necessary is that there exist some way of breaking ties (between function values of adjacent points) which prevents cycling, e.g. lexicographic ordering [7]. Other models do not permit degeneracy assumptions, yet most real LPs are degenerate. The model has been used to analyze an experimental pivot rule for the Simplex Method. The predictions are within a few percent of actual experimental performance [27], thus providing further verification.

In abstracting features shared by local improvement algorithms in general, some specific characteristics are lost. For example, in a primal simplex approach to LP, any ordering that occurs must be LG in reverse as well as LG, since maximizing -f is also an LG problem. Another possible inconsistency is that there exists a class of orderings that produce only trees with exponentially large mean pathlength. In Simplex Method terms, no choice of pivots with improving objective value can reach the optimum of such a problem in polynomially many steps. We doubt that any such class of instances exists for the simplex method. These difficulties, it appears to us, are minor.

A stronger objection to the model might be that the distributions are not based on direct probabilistic assumptions on raw problem data. This is true, but it is not a valid objection. There are three points that should be made. First, we are addressing a generally occurring phenomenon which calls for a general explanation. It is our feeling that what makes the simplex method work well is not intrinsically different from what makes general local improvement work well. We feel therefore that these issues are best addressed in a general framework.
The second point is that this analysis is not inconsistent with analysis based on data assumptions. For example, for a particular problem and assumption about the data, it may be easier to derive a bound on the ratio of probabilities of occurrence of two orderings than to derive a bound on algorithm performance directly. Theorem 4 or 7 would then apply to yield an average performance bound for the problem.
Third, there are some real difficulties with distributional assumptions on input data. Our understanding of what a random problem is is incomplete and unsatisfactory. Plausible assumptions often have implications frustratingly at odds with observed fact. For instance, all of the papers about the expected performance of the Simplex Method mentioned in Section 1 make assumptions that imply that the probability of degeneracy is zero. Yet the observed incidence of degeneracy in LP problems is close to 100% [7]. The same difficulty often occurs with respect to special structure in the constraint matrix, where again it has been observed that the great majority of problems occurring in practice exhibit considerable structure. We fit real optimization problems into our models so as to solve them, but the problems do not arise as random instances of a model. Real problems have limited numerical precision; they have their own correlations and internal structure. A possible resolution of this difficult situation is indicated by the results of this paper. We suggest that there is considerable significance in the apparent insensitivity of the average mean pathlength with respect to the randomness assumptions. The plain fact is that the bad case structures are so exceedingly rare that, almost independent of how the instances arise, hill climbing tends to be fast. We think this is the true underlying explanation of its effectiveness.
Acknowledgements
This work is based on and extends my thesis on local improvement algorithms. I would like to thank my advisor, George B. Dantzig, for his invaluable guidance and encouragement. The proof of Theorems 4.1-4.3 comes from an idea by Jeffrey Ullman. I thank him and the other member of my reading committee, Richard Cottle, for their help and comments. I am particularly indebted to Frank Heartney for many ideas and discussions, and to Persi Diaconis for stimulating interest in the e/k conjecture. I benefited also from discussions with David Aldous, John Bartholdi, L.A. Brown, Charles Fay, Frederick Hillier, A.J. Hoffman, Joseph Keller, D.J. Kleitman, Donald Knuth, Christos Papadimitriou, Jim Pitman, Richard Stone, and Doug West. I would also like to acknowledge the very
helpful suggestions of the referee, which improved the presentation and focus of the paper.
References

[1] I. Adler, "The expected number of pivots needed to solve parametric linear programs and the efficiency of the self-dual simplex method", Department of Industrial Engineering, University of California, Berkeley, CA, June 1983.
[2] K.-H. Borgwardt, "Some distribution-independent results about the asymptotic order of the average number of pivot steps of the simplex method", Mathematics of Operations Research 7, 441-462 (1982).
[3] S.A. Cook, "The complexity of theorem proving procedures", Proc. 3rd Annual ACM Symposium on Theory of Computing, 151-158 (1971).
[4] R.W. Cottle and G.B. Dantzig, "Complementary pivot theory of mathematical programming", Linear Algebra Appl. 1, 103-125 (1968).
[5] R.W. Cottle, "Observations on a class of nasty linear complementarity problems", Technical Report 78-34, Department of Operations Research, Stanford University, 1978.
[6] L. Cutler and P. Wolfe, "Experiments in linear programming", Report No. RM-3402, The Rand Corporation, Santa Monica, CA, 1963.
[7] G. Dantzig, Linear Programming and Extensions, Princeton University Press, Princeton, NJ, 1963.
[8] G. Dantzig, "Expected number of steps of the simplex method for a linear program with a convexity constraint", Department of Operations Research, Stanford University, 1980.
[9] P.M. Dearing, R.L. Francis and T.J. Lowe, "Convex location problems on tree networks", Operations Research 24, 628-642 (1976).
[10] P. Erdős and J. Spencer, Probabilistic Methods in Combinatorics, Academic Press, New York, 1974.
[11] Y. Fathi, "Computational complexity of LCPs associated with positive definite symmetric matrices", Mathematical Programming 17, 335-344 (1979).
[12] M. Haimovich, "The simplex method is very good: On the expected number of pivots and related properties of random linear programs", 415 Uris Hall, Columbia University, New York, NY 10027, April 1983.
[13] F.S. Hillier, "Efficient heuristic procedures for integer linear programming with an interior", Operations Research 17, 600-637 (1969).
[14] B. Kernighan and S. Lin, "A heuristic algorithm for the travelling salesman problem", Bell Labs CSTR #1, 1972.
[15] V. Klee and G. Minty, "How good is the simplex algorithm?", in: O. Shisha, ed., Inequalities III, Academic Press, New York, 1971.
[16] D.J. Kleitman and D. West, "Hypergraphic extremal properties", 1979.
[17] D. Knuth, Class project for CS144b, "Algorithms and data structures", 1979.
[18] T. Liebling, "On the number of iterations of the simplex method", Methods of Operations Research XVII, 248-264 (1973).
[19] S. Lin, "Computer solutions of the travelling salesman problem", Bell System Technical Journal 44, 2245-2269 (1965).
[20] K. Murty, "A note on a Bard-type scheme for solving the linear complementarity problem", Opsearch 11, 123-130 (1974).
[21] C.H. Papadimitriou and K. Steiglitz, "On the complexity of local search for the travelling salesman problem", SIAM J. Comput. 6, 76-83 (1977).
[22] C.H. Papadimitriou and K. Steiglitz, "Some examples of difficult travelling salesman problems", Operations Research 26 (3), 434-443 (1978).
[23] S. Smale, "On the average speed of the simplex method of linear programming", University of California, Berkeley, 1982.
[24] C. Tovey, "Polynomial local improvement algorithms in combinatorial optimization", Systems Optimization Laboratory Technical Report SOL 81-21, Department of Operations Research, Stanford University, 1981.
[25] C. Tovey, "Low order bounds on the expected number of iterations of local improvement algorithms", Georgia Institute of Technology Technical Report Series No. J-82-6, 1982.
[26] C. Tovey, "Hill climbing with multiple local optima", to appear in SIAM J. Algebraic Discrete Methods.
[27] C. Tovey, Research performed for George Dantzig at the Systems Optimization Laboratory, Stanford University, 1979.
[28] P. Winston, Artificial Intelligence, Addison-Wesley, Reading, MA, 1977.