Anatomy of the fitness landscape for dense graph-colouring problem


Contents lists available at ScienceDirect

Swarm and Evolutionary Computation journal homepage: www.elsevier.com/locate/swevo

Regular Paper

Anatomy of the fitness landscape for dense graph-colouring problem

M.-H. Tayarani-N., Adam Prügel-Bennett

University of Southampton, SO17 1BJ Southampton, UK

Article info

Article history: Received 3 April 2014; Received in revised form 26 November 2014; Accepted 14 January 2015

Keywords: Optimization problems; Fitness landscape analysis; Graph-colouring problem

Abstract

Graph-colouring is one of the best-known combinatorial optimisation problems. This paper provides a systematic analysis of many properties of the fitness landscape for random instances as a function of both the problem size and the number of colours used. The properties studied include statistical properties of the bulk of the states, such as the distribution of fitnesses and the auto-correlation, as well as properties related to the local optima of the problem. These include the mean time to reach a local optimum, the number of local optima, the probability of reaching a local optimum of a given cost, the average distance between global optima and between local optima of a given cost and the closest local optimum, the expected cost as a function of the distance from a configuration, and the fitness–distance correlation. Finally, an analysis of how a successful algorithm exploits the fitness–distance correlation is carried out.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

This paper investigates the fitness landscape of graph-colouring for dense random graphs, concentrating in particular on the structure of local and global optima as a function of the problem size and the number of colours. This allows us to characterise the behaviour of the problem as we move through the chromatic phase transition, which is widely regarded as marking the transition between easy and hard problem instances. To undertake this investigation we have concentrated on problem instances of up to around 100 vertices, where it is possible to find the majority of low-cost solutions. We have tended to concentrate on properties that we believe may be important for deciding between search algorithms and for designing new search strategies. There is a large number of such properties, and as a result this paper is long; we believe this reflects the difficulty of combinatorial search, since so many factors may be influential. In a previous paper, we performed a similar analysis of the maximum-satisfiability (MAX-SAT) problem [1]. The current paper is intended to be independent of the MAX-SAT paper, although interestingly there is great similarity in many of the properties (as well as important differences), which we comment on in the conclusion. In another of our previous works we studied the landscapes of different problems [2] and pointed out their differences and similarities; this paper is an extension of that work, describing the graph-colouring problem in considerably more detail.

The landscape of the graph-colouring problem has attracted the attention of many researchers. In the first major effort to understand the landscape of the graph-colouring problem, [3] uses a branch and bound search algorithm to find the local optima and studies some properties of the solutions (note that their notion of a local optimum differs from ours); the major obstacle to this method is the problem size, which limits them to graphs with up to 20 vertices. Hamiez et al. [4] have studied some properties of the solutions of the graph-colouring problem, including the diversity of the configurations in a population of solutions. Culberson et al. [5] study a property of the landscape called the frozen set: a particular set of vertices that has the same colour class in every globally optimal configuration. Bouziri et al. [6] study the statistical properties of some benchmark graph-colouring problems. Several properties of the fitness landscape can be studied, such as the distribution of the fitness function, the number and distribution of local optima, the structure of the basins of attraction, the presence and structure of neutral networks, the correlation between the quality of a solution and its distance to a local or global optimum, and the landscape ruggedness. Porumbel et al. [7] provide some evidence for the existence of clustering of good solutions in some graph-colouring problems, and describe an algorithm that might exploit this. Researchers have also proposed measures intended to represent the hardness of problems; measures of the local ruggedness of a problem are believed to give good indications of problem difficulty. Examples of such measures include the auto-correlation [8] and the fitness–distance correlation [9,10], which describes the relationship between the fitness of solutions and their distance to an optimum. These characteristics have been studied and several methods have been proposed to measure them [11,9]. However, it was quickly realised that it is easy to build problems where these measures do not reflect the problem difficulty [12]. Another active line of research has been to look at algebraic properties of landscapes, particularly elementary landscapes [13,14], which are shared by many well-known NP-hard problems. Graph-colouring is an elementary landscape although MAX-SAT is not. Unfortunately these properties do not correlate with problem difficulty and are therefore not studied in this paper.

The rest of this paper is organised as follows. In the next section we introduce the graph-colouring problem and describe some of the features required for interpreting the results in later sections; we also describe the local search algorithm we use to find the optima in the landscape. Section 3 describes some statistical properties of random configurations, the auto-correlation, and some properties of global and local optima, including the number of steps a local-search algorithm takes to reach a local optimum, the number of local and global optima, and the distances between the optima. In Section 4, we examine the expected fitness of configurations in Hamming spheres of different radii around a local optimum; we also consider the probability of returning to a local optimum starting from a randomly chosen configuration in the Hamming sphere. We draw conclusions, including making comparisons with MAX-SAT, in Section 5.

http://dx.doi.org/10.1016/j.swevo.2015.01.005 2210-6502/© 2015 Elsevier Ltd. All rights reserved.

Please cite this article as: M.-H. Tayarani-N., A. Prügel-Bennett, Anatomy of the fitness landscape for dense graph-colouring problem, Swarm and Evolutionary Computation (2015), http://dx.doi.org/10.1016/j.swevo.2015.01.005

2. Graph-colouring problem

In this section, we describe the graph-colouring problem and how we generate the problem instances. We finish the section with a discussion of the local-search algorithm we use to find the local optima, and of the way we distinguish different local optima from each other.

2.1. Problem definition

The graph-colouring problem is a combinatorial optimisation problem belonging to the class of NP-hard problems. Given an undirected graph G(V, E), with a vertex (node) set V and edge set E, and k different colours, the graph-colouring problem is defined as finding a colouring of the vertices that minimises the number of edges whose vertices share the same colour. We denote a configuration of the graph-colouring problem with k colours as a vector x of size n = |V|, with elements x_i ∈ {1, 2, …, k} representing the colour of the ith node. The cost of a configuration x is defined as the number of colour conflicts in the graph, i.e., the number of edges whose vertices have identical colours. That is,

$$c(G, \mathbf{x}) = \sum_{(i,j) \in E} \mathbb{1}[x_i = x_j], \qquad (1)$$

where 1[predicate] denotes the indicator function, equal to 1 if the predicate is true and 0 otherwise. The chromatic number χ(G) of a graph G is defined to be the smallest number of colours k such that a configuration exists with no colour conflicts (i.e. a cost of zero). We consider the problem of finding low-cost configurations (i.e. we try to minimise the number of colour conflicts). In this paper, we concentrate on instances drawn from the ensemble of random graphs G(n, p), consisting of graphs with n vertices where each edge is present with probability p. A graph is represented by an n × n adjacency matrix, G. We generate these problem instances randomly, with the probability of two nodes being connected (the probability of an edge existing) equal to p. Thus a graph is generated as G(i, j) = [R(0, 1) < p], for i, j = 1…n, i ≠ j, where R(·,·) is a uniform random number generator and [predicate] equals 1 if the predicate is true and 0 otherwise. We focus on the case p = 0.5, so that the graphs are dense. (A dense graph is one where p remains fixed as n increases, so that the number of edges grows as order n²; in contrast, in a sparse graph p = 1/n, so that the number of edges per vertex remains fixed.) From preliminary investigations, we found that the greatest determiner of the structure of the fitness landscape is its proximity to the phase-transition. By investigating the landscape as a function of n and the number of colours, k, we are

able to characterise the behaviour around the phase-transition. Thus, we believe that the behaviour we report is typical of dense graphs at other values of p.

2.2. Colour symmetry and distance measures

An important feature of graph-colouring is that if we permute all the colours, the cost is unchanged. As there are k! permutations of the colours, there is a k!-fold symmetry in the search space. Graph-colouring can also be viewed as a partitioning problem, in which we try to partition the vertices into k partitions so as to minimise the number of edges with both vertices in the same partition. In this partition view of the problem the k! symmetry is eliminated. Although it is more logical to view graph-colouring as a partitioning problem, most algorithms treat it as a colouring problem. This reflects the fact that partitions are difficult to work with (e.g. it is non-trivial to determine whether two partitions are identical). In this paper, we have tried to accommodate both views of the problem: either as a colouring problem with a k!-fold symmetry or as a partition problem. As a consequence of these two views, we consider two distance measures between configurations. The first is the Hamming distance, defined as

$$D_h(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} \mathbb{1}[x_i \neq y_i]. \qquad (2)$$

The second is the 'partition distance', defined as

$$D_p(\mathbf{x}, \mathbf{y}) = \min_{\pi} D_h(\mathbf{x}, \pi(\mathbf{y})), \qquad (3)$$

where π(·) is a permutation operator that permutes the colours, and the minimisation is over all possible permutations of the k colours. The partition distance measures the smallest number of reallocations of partition membership needed to turn the partition represented by x into the partition represented by y. When the Hamming distance is small, it is often equal to the partition distance. In practice, we can compute the partition distance in O(k³) by representing the colour matching as a linear assignment problem and using the Hungarian algorithm [15].

It is useful to understand the distribution of distances between random configurations. Note that this property depends only on the number of vertices, n, and the number of colours, k, and is otherwise independent of the problem instance. For two randomly generated configurations, the probability that the Hamming distance equals h is given by a binomial distribution,

$$P(D_h(\mathbf{x}, \mathbf{y}) = h) = \binom{n}{h} \left(1 - \frac{1}{k}\right)^{h} \left(\frac{1}{k}\right)^{n-h}, \qquad (4)$$

so that the expected Hamming distance between randomly chosen configurations is

$$E(D_h(\mathbf{x}, \mathbf{y})) = n\left(1 - \frac{1}{k}\right). \qquad (5)$$

We are not aware of an analytic formula for the probability distribution of partition distances between randomly drawn configurations. In Fig. 1a we show the probability distribution of distances between randomly chosen pairs of configurations; the partition distance is measured empirically by sampling. The average partition distances between solutions in the search space for different values of n and k are shown in Fig. 1b. In this figure, the horizontal axis is k and the vertical axis is the expected partition distance divided by n. Note that the partition distance can be viewed as the minimum distance between a configuration x and a set of configurations π(y). The average partition distance is the result of two competing effects. As we increase k, the average Hamming distance between random configurations increases as n(1 − 1/k), so initially the average partition distance increases. However, as k increases, the size of the permutation symmetry grows as k!; thus the likelihood of there being a closer configuration in the set of permuted configurations increases, and the average partition distance begins to fall again. We can obtain very similar qualitative behaviour by considering the expected minimum Hamming distance between a configuration and a set of k! random configurations.

Fig. 1. (a) Probability of the distance between two randomly drawn configurations with length n = 100 and n = 1000 and with k = 3 and k = 10 colours. The partition distance is found by randomly sampling 10^6 pairs of configurations. (b) The averaged partition distance between random configurations for different n and k.

2.3. Chromatic number

The chromatic number of a graph is the smallest number of colours needed to colour the graph with no colour conflicts (i.e. such that there exists a zero-cost colouring). Random graphs drawn from the ensemble G(n, p) can have different chromatic numbers; however, the chromatic number is found to be highly concentrated. This is illustrated in Fig. 2a, where we show the proportion of graphs that are colourable with k colours, for graphs drawn from G(n, p) with n = 10, 50 and 100. These results were found empirically, by randomly drawing samples and repeatedly running a heuristic search algorithm to find the best colouring. Although the heuristic is not guaranteed to find the optimum, by running it many times we can be fairly certain that the optimum has been found (see Section 3.5 for a justification of this assertion). Fig. 2b shows the proportion of colourable graphs as a function of n and k. We observe two clear regions divided by a fairly steep transition region. This change in behaviour marks a "phase-transition", which is typical of many constraint satisfaction problems. As we increase the number of constraints, these problems


Fig. 2. (a) The proportion of colourable graphs for different values of problem size n and number of colours k for p = 0.5. The data are averaged over 1000 random problem instances. (b) The proportion of colourable graphs as a function of system size n and number of colours k. The data are averaged over 1000 random problem instances.

undergo a transition from a satisfiable phase to an unsatisfiable phase, and the sharpness of the transition typically grows with the instance size. In the context of graph-colouring, each edge represents a constraint. For a fixed number of vertices and colours, graphs with sufficiently few edges are colourable. As we increase the number of edges there are fewer ways to colour the graph, until eventually it is no longer possible to colour the graph without introducing a colour conflict. It is more common in graph-colouring to consider the graph (i.e. the number of constraints) as fixed and to vary the number of colours. In this case, the unsatisfiable phase corresponds to few colours, while the satisfiable phase corresponds to many colours. For an ensemble of graphs the transition between colourable and uncolourable graphs is less well defined, although we could define the location of the transition as the point where at least half the graphs are colourable. The location of the phase-transition as a function of n, p and k is unknown, although a lower-bound on the probability that a graph drawn from G(n, p) is colourable is given by

$$\sum_{\mathbf{n} \in \Lambda(n,k)} n! \prod_{i=1}^{k} \frac{(1-p)^{n_i(n_i-1)/2}}{n_i!}, \qquad (6)$$

where n = (n_1, n_2, …, n_k) and Λ(n, k) is the set of vectors such that Σ_{i=1}^{k} n_i = n and n_i ≥ 0 [16]. In Fig. 3, we show the lower-bound on the chromatic number versus n. We also show the minimum number of colours needed to colour random graphs drawn from G(n, 0.5), found by a modified version of hybrid-GA, which has long been seen as one of the most powerful algorithms for solving dense random graph-colouring problems [17].

2.4. Methodology

In this paper we study the landscape with respect to a Hamming neighbourhood; that is, we consider two configurations to be neighbours if they differ in colour at one and only one vertex. The major analysis we undertake is to study the structure of the local minima with respect to cost. Note that we define a local minimum to be a connected set of configurations with no lower-cost neighbours. The local minimum (or minima) with the lowest cost is the global minimum (or minima). Our notion of local and global minima is illustrated schematically in Fig. 4. Note that in our definition each local minimum can consist of several configurations. One way to study the topology of a fitness landscape is to search for local optima by exhaustive search of the landscape; however, this would confine us to studying small problems with up to 20 or 30 vertices (see, for example, [18,19]). Instead we use a local-search algorithm, which allows us to study larger instances. We run the local-search algorithm multiple times to find many local optima. To know whether we have reached a local optimum, we exhaustively search the set of configurations at the current cost to see if there are any lower-cost neighbours. If there is a lower-cost neighbour we move to it; otherwise we have explored all the neighbours at the current cost, so we can be sure that we have reached a local minimum. We can accomplish this cheaply by maintaining two sets: a set of configurations, X, at the current cost, and a set of configurations, U ⊆ X, whose neighbours we have still to check. The details of the exhaustive hill-climbing algorithm are given in Algorithm 1.

Algorithm 1. Exhaustive hill climbing algorithm.

1.  x ← randomColouring()
2.  X ← {x}
3.  U ← {x}
4.  c ← cost(x)
5.  while (|U| > 0)
6.    y ← chooseMember(U)
7.    U ← U \ {y}
8.    forall z ∈ Neighbours(y)
9.      if (z ∉ X)
10.       if (cost(z) < c)
11.         X ← {z}; U ← {z}; c ← cost(z)
12.       if (cost(z) = c)
13.         X ← X ∪ {z}; U ← U ∪ {z}
14. return (c, X)
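Algorithm 1 can be realised in a few lines of Python. The sketch below is ours, not the authors' code: the helper names are invented, chooseMember is implemented as popping from a stack, and the G(n, p) instance generation and the cost of Eq. (1) are included so the sketch is self-contained.

```python
import random

def random_graph(n, p, seed=0):
    """Edge list of a G(n, p) sample: each possible edge appears with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def cost(edges, x):
    """Eq. (1): number of edges whose endpoints share a colour."""
    return sum(1 for i, j in edges if x[i] == x[j])

def neighbours(x, k):
    """Hamming neighbours: recolour exactly one vertex."""
    for i in range(len(x)):
        for colour in range(k):
            if colour != x[i]:
                yield x[:i] + (colour,) + x[i + 1:]

def exhaustive_hill_climb(edges, n, k, seed=0):
    """Descend until a connected plateau with no lower-cost neighbour has been
    fully explored; returns its cost and the set of configurations in it."""
    rng = random.Random(seed)
    x = tuple(rng.randrange(k) for _ in range(n))
    X = {x}                  # configurations found at the current cost
    U = [x]                  # members of X whose neighbours are unchecked
    c = cost(edges, x)
    while U:
        y = U.pop()          # chooseMember(U) realised as a stack
        for z in neighbours(y, k):
            if z in X:
                continue
            cz = cost(edges, z)
            if cz < c:       # strictly better neighbour: restart from z
                X, U, c = {z}, [z], cz
                break
            if cz == c:      # equal cost: grow the plateau
                X.add(z)
                U.append(z)
    return c, X
```

On termination every configuration in X has had all of its neighbours examined, so the returned set is a local minimum in the paper's plateau sense.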

Fig. 3. Lower-bound on the chromatic number for graphs drawn from G(n, 0.5) (horizontal axis: number of vertices n, up to 1000; vertical axis: number of colours). Also shown are the best colourings found by a state-of-the-art heuristic search.
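For small n and k the expression of Eq. (6) can be evaluated directly by enumerating colour-class size vectors. The sketch below is ours (function names invented):

```python
from math import factorial

def compositions(n, k):
    """All vectors (n_1, ..., n_k) of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def colourability_bound(n, k, p):
    """Eq. (6): sum over colour-class size vectors of the multinomial count
    n!/(n_1!...n_k!) weighted by the probability that every pair of vertices
    inside a class is a non-edge."""
    total = 0.0
    for parts in compositions(n, k):
        term = float(factorial(n))
        for ni in parts:
            term *= (1 - p) ** (ni * (ni - 1) // 2) / factorial(ni)
        total += term
    return total
```

As a sanity check, at p = 0 every colouring is conflict-free and the sum collapses to the multinomial identity, giving k^n.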

In Algorithm 1 the function randomColouring() returns a random configuration, cost(x) returns the cost of a configuration x, chooseMember(U) chooses a member of the set U, and Neighbours(y) returns the set of neighbours of configuration y. The set U can be efficiently implemented as either a stack or a queue. Note that the algorithm returns the cost of the local optimum and the set of configurations in the local minimum. As discussed in Section 2.2, a feature of graph-colouring is that a permutation of the colour labels leads to the same partitioning of the vertices and thus has an identical cost; with k colours there is a k!-fold permutation symmetry. In our analysis of local optima we treat all configurations that lead to the same partitioning as equivalent. In practice, we accomplish this by permuting the colours of a solution into a canonical form (easily achieved by giving vertex 1's colour the first label, giving the next vertex coloured differently from vertex 1 the second label, and so on). Rather than storing all the configurations of a partition, we store only the least configuration according to some ordering. This allows us to check quickly whether we have visited a local optimum previously.
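The canonical form just described, together with the partition distance of Eq. (3) that it relates to, can be sketched as follows. The names are ours, and the brute-force minimisation over all k! permutations is reasonable only for small k; the O(k³) Hungarian-algorithm route via linear assignment is the scalable alternative the paper uses.

```python
from itertools import permutations

def canonical(x):
    """Relabel colours in order of first appearance, so that all k! colour
    permutations of a configuration map to the same representative."""
    mapping = {}
    out = []
    for colour in x:
        if colour not in mapping:
            mapping[colour] = len(mapping)
        out.append(mapping[colour])
    return tuple(out)

def hamming(x, y):
    """Eq. (2): number of vertices on which two colourings disagree."""
    return sum(a != b for a, b in zip(x, y))

def partition_distance(x, y, k):
    """Eq. (3) by brute force: minimum Hamming distance over relabellings of y."""
    return min(hamming(x, tuple(p[c] for c in y))
               for p in permutations(range(k)))
```

Two colourings induce the same partition exactly when their canonical forms are equal, which is what makes the "least configuration" bookkeeping of visited optima cheap.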

3. Landscape analysis

In this section we study properties of the fitness landscape of the graph-colouring problem that are believed to be representative of problem hardness, and show the effect of the problem size on the properties of the landscape.

Fig. 4. Schematic of the search space where each node represents a configuration (i.e. a colouring). The edges show the neighbours connected to each configuration. The cost of each configuration is shown by the number. In this illustration there is one global minimum of cost 2 consisting of 4 configurations, and two local minima of cost 3 consisting of 1 and 2 configurations. Of course, the true search space has a very different topology.

3.1. Density of configurations

We start by considering the density of configurations as a function of the cost. This has been studied elsewhere (e.g. [6]) and we include it here more for the sake of completeness than because of its intrinsic interest. We also give formulae for the cumulants, which, as far as we know, have not been given elsewhere.


The expected cost of a graph G with k colours is given by

$$\bar{c}(G) = E(c(G,\mathbf{x})) = E\!\left(\sum_{(i,j)\in E} \mathbb{1}[x_i = x_j]\right) = \sum_{(i,j)\in E} E\big(\mathbb{1}[x_i = x_j]\big) = \frac{|E|}{k}, \qquad (7)$$

where k is the number of colours and |E| is the number of edges in the graph. For graphs drawn from G(n, p),

$$|E| = p\,\frac{n(n-1)}{2}. \qquad (8)$$
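Eqs. (7) and (8) are easy to check by sampling random colourings of a single instance. The sketch below is ours (names invented):

```python
import random

def random_graph(n, p, rng):
    """Edge list of a G(n, p) sample."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def cost(edges, x):
    """Eq. (1): number of monochromatic edges."""
    return sum(1 for i, j in edges if x[i] == x[j])

def mean_cost(n, k, p, samples=5000, seed=0):
    """Average cost of uniform random colourings on one instance, returned
    alongside the prediction |E|/k of Eq. (7)."""
    rng = random.Random(seed)
    edges = random_graph(n, p, rng)
    total = sum(cost(edges, [rng.randrange(k) for _ in range(n)])
                for _ in range(samples))
    return total / samples, len(edges) / k
```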

To find the density of configurations, we generate random colourings and compute a histogram of the costs. Fig. 5a shows the natural logarithm of the probability of a cost around the average cost, scaled by √|E|/k, for k = 5 and for n = 100, 1000 and 10 000. Regardless of the size of the problem, the results are very similar for each randomly drawn problem instance. The costs of random configurations are approximately bell-shaped around the average cost, with a variance of Ω(n²), although we observe that the distributions are quite skewed. The distribution hardly varies with n. Fig. 5b shows the same distribution, now for fixed n = 100 and different values of k; this shows that the distribution does vary with the number of colours k.

3.1.1. Cumulants

To get a better understanding of the density of configurations, we can compute the moments or cumulants of the distribution. The ith central moment is defined as

$$\mu_i(G) = E\big((c(G,\mathbf{x}) - \bar{c}(G))^i\big). \qquad (9)$$

The cumulants are more natural measures of the distribution and are closely related to the moments. The first cumulant is equal to the mean cost, and the second cumulant is equal to the second central moment, or variance. The third cumulant is equal to the third central moment, while the fourth cumulant is equal to the fourth central moment minus three times the variance squared. (The cumulants arise as Taylor coefficients of the natural logarithm of the moment-generating function.) One property of the cumulants is that for a normal distribution all cumulants above the second are zero; thus the cumulants can be viewed as measures of the deviation from a normal distribution. To make these measures invariant to the cost scale, it is common to scale the nth cumulant by K_2^{n/2} (rescaling the cost leaves these scaled cumulants unchanged). The scaled third and fourth cumulants are known as the skewness and kurtosis, respectively:

$$\gamma_1 = \frac{K_3}{K_2^{3/2}}, \qquad \gamma_2 = \frac{K_4}{K_2^{2}}. \qquad (10)$$

The skewness measures the asymmetry in the tails of a distribution: a positive skewness indicates a longer tail on the right of the distribution, and a negative skewness a longer tail on the left. A positive kurtosis indicates that the tails are more pronounced than a Gaussian, and a negative kurtosis indicates less probability mass in the tails. To compute the cumulants we note that the cost of a configuration (colouring) x for a graph G, minus the average cost, can be written as

$$c(G,\mathbf{x}) - \bar{c}(G) = \sum_{(i,j)\in E} \left(\mathbb{1}[x_i = x_j] - \frac{1}{k}\right) = \sum_{e \in E} S_e, \qquad (11)$$

where S_{(i,j)} = 1[x_i = x_j] − 1/k. On averaging over all colourings we find E(S_e) = 0. Furthermore, if we have a set of edges F ⊆ E such that there exists a vertex that belongs to one and only one edge in F, then E(∏_{e∈F} S_e) = 0. Thus, the variance in the cost for a graph G is given by

$$K_2(G) = E\big((c(G,\mathbf{x}) - \bar{c}(G))^2\big) = E\!\left(\Big(\sum_{e\in E} S_e\Big)^{2}\right) = E\!\left(\sum_{e\in E} S_e^2\right) + E\!\left(\sum_{e\in E}\sum_{e'\neq e} S_e S_{e'}\right) = \sum_{e\in E} E\big(S_e^2\big) = |E|\,E\big(S_e^2\big). \qquad (12)$$

We can straightforwardly evaluate E(S_e²):

$$E\big(S_{(i,j)}^2\big) = E\!\left(\Big(\mathbb{1}[x_i = x_j] - \frac{1}{k}\Big)^{2}\right) = E\!\left(\mathbb{1}[x_i = x_j]^2 - \frac{2}{k}\,\mathbb{1}[x_i = x_j] + \frac{1}{k^2}\right) = \frac{1}{k} - \frac{2}{k^2} + \frac{1}{k^2} = \frac{1}{k}\left(1 - \frac{1}{k}\right), \qquad (13)$$

where we have used the fact that, because 1[x_i = x_j] ∈ {0, 1}, we have 1[x_i = x_j]² = 1[x_i = x_j]. Thus, the variance is given by K_2 = |E|(k − 1)/k². For a graph drawn from the ensemble G(n, p), the expected number of edges is pn(n − 1)/2, so the expected variance for a graph drawn from G(n, p) is

$$K_2 = \frac{p\,n(n-1)(k-1)}{2k^2}. \qquad (14)$$

We can use the variance to define a measure of the cost diversity,

$$v = \frac{\sqrt{K_2(G)}}{\bar{c}(G)} = \sqrt{\frac{k-1}{|E|}}. \qquad (15)$$

Fig. 5. (a) Natural logarithm of the histogram of costs for random configurations in particular instances with k = 5 and p = 0.5 for n = 100, 1000, 10 000. (b) Natural logarithm of the histogram of costs for random configurations in particular instances with n = 100 and k = 3, 5, 10 and 15.
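The per-instance variance K_2(G) = |E|(k − 1)/k² implied by Eqs. (12) and (13) can likewise be checked by sampling. The sketch below is ours (names invented):

```python
import random
import statistics

def random_graph(n, p, rng):
    """Edge list of a G(n, p) sample."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def cost(edges, x):
    """Eq. (1): number of monochromatic edges."""
    return sum(1 for i, j in edges if x[i] == x[j])

def cost_variance(n, k, p, samples=20000, seed=0):
    """Sample variance of random-colouring costs on one instance, paired
    with the exact value |E|(k-1)/k^2 from Eqs. (12)-(13)."""
    rng = random.Random(seed)
    edges = random_graph(n, p, rng)
    costs = [cost(edges, [rng.randrange(k) for _ in range(n)])
             for _ in range(samples)]
    return statistics.pvariance(costs), len(edges) * (k - 1) / k ** 2
```

Note that the formula is exact for any fixed graph, not just on average over the ensemble, because E(S_e S_{e'}) = 0 for every pair of distinct edges.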


correlation is reduced by the same factor at each step so that

Skewness

τ

n=50 n=100 n=500

0.8

RðτÞ ¼ Rð1Þ ¼ e

0.6 0.4

0

Rð1Þ ¼

0

20

40

60

80

2 1.5

ð19Þ

1 ðEÞððcðG; xðt þ 1ÞÞ  cðGÞÞ K 2 ðGÞ 1 ðcðG; xðtÞÞ  cðGÞÞÞ ¼ ðK 2 ðGÞ þCðGÞÞ; K 2 ðGÞ

ð20Þ

i

n=50 n=50 n=200 n=200 n=1000 n=1000

2.5

;

where we have separated out the term that differs from K 2 ðGÞ 0 1   X  1 ð21Þ CðGÞ ¼ E@ 1x0i ¼ xj U  1xi ¼ xj U 1xi ¼ xj U A: k jAN

100

k 3

 τ=l

The higher-order cumulants can be found in a similar fashion to the second-order cumulant. The details are given in Appendix A. The expected third-order cumulant is given by

$$K_3 = \frac{p\,n(n-1)(k^2-3k+2)}{2k^3} + \frac{p^3\,n(n-1)(n-2)(k-1)}{k^3}. \qquad (16)$$

Fig. 6a shows the analytical curve and the empirical skewness for different values of k and n for p = 0.5. The expected fourth cumulant is given by

$$K_4 = \frac{p\,n(n-1)(k-1)}{2k^4}\left(6 - 6k + k^2 + 48p^2 - 24kp^2 - 24np^2 + 12knp^2 + 24p^3 - 24np^3 + 6n^2p^3\right) \qquad (17)$$

(see Appendix A for details). The analytical curve and the empirically measured kurtosis of the distribution for different values of n and k are shown in Fig. 6b. The results show that as n and k grow the kurtosis decreases gradually, although it is more sensitive to the number of colours than to the size of the problem.

Fig. 6. (a) The skewness of the distribution of the cost of the configurations for different n and k for p = 0.5. (b) The kurtosis of the distribution of the cost of the configurations for different n and k for p = 0.5.

3.2. Auto-correlation

Auto-correlation measures the local ruggedness of the fitness landscape through the expected correlation between the cost of a random configuration and that of a configuration reached from it by τ randomly chosen steps [8]. For a graph, G, the auto-correlation is given by

$$R(\tau) = \frac{1}{K_2(G)}\,\mathbf{E}\bigl[\bigl(c(G, x(t+\tau)) - \bar{c}(G)\bigr)\bigl(c(G, x(t)) - \bar{c}(G)\bigr)\bigr], \qquad (18)$$

where x(t) is the configuration at step t of a random walk, and K_2(G) is the variance in the cost over random configurations of the problem instance, G. As each step is chosen independently, the auto-correlation falls off exponentially, R(τ) = R(1)^τ = e^{-τ/ℓ}, where ℓ = -1/ln(R(1)) is the correlation length. We suppose that in one step we change the colouring of vertex i from x_i to x'_i, and denote the neighbours of vertex i by N_i = {j | (i,j) ∈ E}. Using the properties of indicator functions,

$$\mathbf{1}\!\left[x'_i = x_j\right]\mathbf{1}\!\left[x_i = x_j\right] = 0, \qquad \mathbf{1}\!\left[x_i = x_j\right]^2 = \mathbf{1}\!\left[x_i = x_j\right], \qquad (22)$$

and the fact that on average |N_i| is equal to 2|E|/n, we find

$$R(1) = \frac{1}{K_2(G)}\left(K_2(G) - \frac{2|E|}{nk}\right) = 1 - \frac{2k}{n(k-1)} \qquad (23)$$

(recall that K_2(G) = |E|(k-1)/k^2). Consequently the correlation length is approximately equal to n(1 - 1/k)/2. (Note that this result, although not given explicitly, was also proved in [20]. In that paper it is shown that for any elementary landscape the auto-correlation function falls off exponentially with a correlation length ℓ = -1/ln(1 - λ/D), where λ is the eigenvalue of the Laplacian associated with the cost function and D is the number of neighbours of a configuration. Ref. [13] showed that for graph-colouring λ = 2k, while D = n(k-1), which reproduces the result given above.) The correlation length thus grows linearly with n and is smaller for smaller k. Larger problems thus appear smoother, although this just reflects the fact that changing the colour of a single vertex makes a smaller relative change to the cost.

The correlation length describes only local properties of the fitness landscape. As we will see later, the hardness of the problem is related to long-ranged properties of the fitness landscape. For example, the character of the problem changes dramatically around the chromatic number. This is not reflected at all in the auto-correlation function, which changes smoothly over the phase-transition between satisfiable and unsatisfiable instances.

3.3. Time to local optimum

Having described the average behaviour of configurations, we now move on to study properties of local and global optima. These properties are much more relevant to problem difficulty. Our analysis begins with the time taken by the local-search algorithm to reach a local optimum. To check whether a local optimum has been reached, the exhaustive local-search algorithm is used, which knows when an optimum is reached. We measure the time in terms of the number of steps taken. Note that in a highly optimised implementation it is usually making a move that dominates the run time of a local-search algorithm.
We start by studying the relationship between the time taken by a local-search algorithm to reach a local optimum and the number of colours; the two are strongly related. This relationship is illustrated in Fig. 7a, where the mean number of steps needed to reach a local optimum is plotted against k, for n = 30, 40, 50 and k = 1, …, 40. The curves consist of a phase-transition and three major parts, 'A', 'B' and 'C' (see Fig. 8a). In the first part, from k = 2 up to k = 9, there is a rapid increase in the

Please cite this article as: M.-H. Tayarani-N., A. Prügel-Bennett, Anatomy of the fitness landscape for dense graph-colouring problem, Swarm and Evolutionary Computation (2015), http://dx.doi.org/10.1016/j.swevo.2015.01.005i


M.-H. Tayarani-N., A. Prügel-Bennett / Swarm and Evolutionary Computation ∎ (∎∎∎∎) ∎∎∎–∎∎∎


Fig. 7. (a) Time to reach a local optimum versus the number of colours for n = 30, 40 and 50. Each data point represents the mean over 100 instances and 10^5 hill-climbs per instance. (b) Natural logarithm of the time to reach a local optimum versus the number of colours. The size of the problem is n = 100, with k = 2 up to the chromatic number around k = 14. Each data point represents the mean or median over 100 instances and 10^5 hill-climbs per instance.

number of steps to a local optimum. For problems of this size the chromatic number is around 9 (see Fig. 2a); this marks the phase transition between unsatisfiable and satisfiable instances. After the phase-transition, in part 'B', there is a sharp decrease in the time to reach a local optimum. During part 'C', the number of steps decays at a much slower rate. For all the instances we have studied, the phase-transition occurs at the chromatic number, i.e., the minimum k such that the global optima have a cost of zero. Each part of the graph shows different behaviour. On a logarithmic scale, in part 'A' the data fit a straight line, meaning that for k up to the chromatic number the number of steps grows exponentially with the number of colours. This is shown in Fig. 7b, where, on a logarithmic scale, the mean and the median of the time to reach a local optimum are plotted against k for n = 100 and k up to the chromatic number. The measured data fit a straight line on the logarithmic scale; the data therefore suggest that the average time to reach a local optimum for n = 100 is

$$T = 22.64\,e^{0.40k} \qquad (24)$$

for k = 2 up to the chromatic number. The reason for this behaviour is that, as k grows towards the chromatic number, the size of the plateaux in the landscape increases exponentially, causing the time to reach a local optimum to increase exponentially. We note that the dramatic change in behaviour of the time to reach a local optimum is not reflected in the auto-correlation function; all that happens to the auto-correlation as we increase k is that the correlation length slowly increases. This demonstrates clearly that the time to reach a local optimum is a property of the large-scale behaviour of the landscape, and is unrelated to the local smoothness of the landscape. The second and third parts of the graph show different behaviours. The plot of the time to local optima as a function of k for n = 50 is shown in Fig. 8a on a log–log scale. On this rescaling, the data for parts 'B' and 'C' fit straight lines, suggesting that the time to local optima decays as k^g, where g < 0 is


Fig. 8. (a) Time to reach a local optimum versus the number of colours k, on a log–log plot. The size of the problem is n = 50. Each data point represents the mean over 100 instances and 10^5 hill-climbs per instance. (b) Time to reach a local optimum versus the problem size. The number of colours is k = 4. Each data point represents the mean or median over 100 instances and 10^5 hill-climbs per instance.

Table 1
Time to local optima as a function of k for different n. The results are averaged over 100 instances, each with 10^4 hill-climbs.

n     T_A                T_B             T_C
30    2.84 e^{0.77k}     ∝ k^{-16.08}    ∝ k^{-0.62}
40    5.53 e^{0.62k}     ∝ k^{-11.80}    ∝ k^{-0.61}
50    8.14 e^{0.55k}     ∝ k^{-12.94}    ∝ k^{-0.60}
60    10.72 e^{0.50k}    ∝ k^{-14.56}    ∝ k^{-0.59}
70    13.94 e^{0.46k}    ∝ k^{-14.42}    ∝ k^{-0.59}
80    14.77 e^{0.46k}    ∝ k^{-18.76}    ∝ k^{-0.57}
90    15.73 e^{0.46k}    ∝ k^{-20.31}    ∝ k^{-0.54}

the gradient of the fitted line. The fitted curve for part 'B' is T ∝ k^{-14.76} and for part 'C' is T ∝ k^{-0.62}. The same experiment is performed for different problem sizes, and the formulae describing the behaviour of the time to local optima as a function of k are summarised in Table 1. For part 'A' (k up to the chromatic number), as n grows the growth rate reduces, although the constant increases, so that for fixed k the expected number of steps to reach a minimum increases slowly with n in this regime. However, as the chromatic number increases with n, the expected number of steps to reach a minimum increases substantially with n for k at the chromatic number. The interesting feature of part 'B' is that for n = 30 up to 100 it covers just three or four values of k, i.e., k = χ(n) to k = χ(n) + 3. This is why in Table 1 the decay rate of the time to local optima grows with the system size; for bigger n it has to fall off from a larger value within just three or four steps. In part 'C' the decay rate decreases with the system size. The time to local optima also depends on the system size, n. Fig. 8b shows the mean and the median time to reach a local optimum versus the problem size n for k = 4. The graph indicates that





below the phase-transition the time apparently increases linearly with n. The best straight-line fit to the data is also shown in this figure. The same method is used to find the time to local optima as a function of n for different k, and the results are shown in Table 2. The table is divided into three parts: below the chromatic number, where the number of steps grows linearly with n; at the chromatic number, where it grows exponentially with n; and above the chromatic number, where the time grows polynomially with n. Using this extrapolation, the mean time a local-search algorithm takes to reach a local optimum for a problem of size n = 1000 and k = 4 is about 1130 steps. Although some caution is needed in extrapolating from small instances to large instances, for a small number of colours a local-search algorithm evidently does not need much time to find a local optimum. As we saw, the time to reach a local optimum depends on both the size of the system and the number of colours. In Fig. 9, the log time to local optima is plotted against k and n. The three regions are seen in this figure, and the relationship between the size of the system and the time to local optima is different in each region. The phase transition in Fig. 2b, which occurs at the chromatic number, is also shown in Fig. 9 with a dashed line. The graph clearly shows that the peak in the time to local optima coincides with the chromatic number. Note that at the phase transition, on a logarithmic scale, the time grows linearly with n, meaning the time increases exponentially with the system size. It is relatively easy to understand this intuitively. If we consider a random colouring, then the probability of a colour conflict at any edge is 1/k. Consider a vertex, i, with |N_i| neighbours. On average, |N_i| is equal to p(n-1). The probability that c_i of these edges are unsatisfied is given by a binomial distribution

$$P(c_i) = \binom{|N_i|}{c_i}\left(\frac{1}{k}\right)^{c_i}\left(1 - \frac{1}{k}\right)^{|N_i| - c_i}. \qquad (25)$$
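Eq. (25) can be evaluated directly. Below is a small sketch (the function name is ours) under the same assumption as the text, namely that each neighbour's colour is independent and uniform over the k colours; it shows how P(c_i = 0) grows with k:

```python
from math import comb

def local_cost_dist(deg, k):
    """Eq. (25): P(c_i) for a vertex of degree deg when each neighbour's
    colour is independent and uniform over k colours."""
    return [comb(deg, c) * (1 / k) ** c * (1 - 1 / k) ** (deg - c)
            for c in range(deg + 1)]

n, p = 50, 0.5
deg = int(p * (n - 1))            # average degree p(n-1), here 24
for k in (4, 8, 16):
    P = local_cost_dist(deg, k)
    print(k, round(P[0], 4))      # P(c_i = 0) = (1 - 1/k)^deg
```

For fixed degree, P(c_i = 0) increases rapidly with k, which is the mechanism behind the regimes discussed below.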

Table 2
Time to local optima as a function of n for different k. The results are averaged over 100 instances, each with 10^4 hill-climbs, for n = 30, 40, …, 90.

k       T_Mean                T_Median
2       3.26 + 0.46n          1.27 + 0.46n
3       4.53 + 0.82n          0.64 + 0.80n
4       17.44 + 1.11n         5.76 + 1.06n
5       53.90 + 1.18n         16.48 + 1.35n
χ(n)    114.48 e^{0.053n}     29.52 e^{0.059n}
30      -3.45 + n^{1.56}      -3.55 + n^{1.58}
40      17.44 + n^{1.59}      -3.88 + n^{1.62}
50      53.90 + n^{1.60}      -4.29 + n^{1.68}
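As a quick check of the extrapolation discussed in the text, the k = 4 row of Table 2 gives T_Mean = 17.44 + 1.11n, which at n = 1000 reproduces the figure of roughly 1130 steps:

```python
def t_mean_k4(n):
    # T_Mean for k = 4 from Table 2 (fitted on n = 30, ..., 90).
    return 17.44 + 1.11 * n

print(round(t_mean_k4(1000), 2))  # 1127.44, i.e. roughly the 1130 steps quoted
```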


Fig. 9. Natural logarithm of the time to reach a local optimum as a function of the number of colours k and the problem size n. The dashed line marks the chromatic number.

Fig. 10. Schematic of the probability distribution of the number of unsatisfied edges connected to an arbitrarily chosen vertex.

We can think of c_i as the local colour cost for vertex i. The total cost is equal to half the sum of the local colour costs (the half occurs because we double-count each edge). The distribution of local colour costs is shown schematically in Fig. 10 for k_1 ≪ χ ≪ k_2. The probability of a vertex, i, having no colour conflicts is thus P(c_i = 0) = (1 - 1/k)^{|N_i|}. If k is well above the phase-transition (e.g. k_2), then it is highly probable that we can choose a colour for vertex i so that it has no colour conflicts. If we do this for all vertices, then we will have found a configuration with zero cost. Furthermore, there will typically be some fraction of the vertices where we have a choice of several colours, all of which give no colour conflicts. As a result, we are likely to have an exponentially large global optimum (e.g. if some proportion, f, of vertices have zero local cost for two colours, we might expect around 2^{fn} configurations in the global optimum; of course, by choosing a vertex colour we modify the neighbourhood of the other vertices, so this argument should not be taken too literally). In this regime we can quickly find a solution with cost 0, at which point we can stop our search. As we decrease k (or increase the probability of an edge), P(c_i = 0) decreases and it becomes harder to find a colour for a vertex that does not produce a colour conflict. Around the phase-transition we would expect that we can (at least, after some search) find a colouring with local colour cost zero for most vertices. Furthermore, there are likely to be multiple alternatives for some fraction of the vertices, leading to an exponentially large connected set at the same cost. As we cannot stop our search unless we have found a configuration of cost 0, our search is likely to take exponentially longer to exhaustively search all neighbours at a low cost. For a smaller number of colours (e.g. k_1 in Fig. 10), it becomes unlikely that we can find a colour with no conflicting edges for most vertices. A local-search algorithm will therefore settle on a colour whose local cost c_i has very small probability P(c_i). Because of this, it is unlikely that there are alternative colours with the same cost c_i. (This is in direct contrast to the case of large k, where many vertices have a local colour cost of 0; since it is impossible to have a lower local colour cost, there are likely to be many neighbouring configurations with local colour cost 0 at the same vertices.) Thus, in this regime the optima tend to be relatively small, and the time to reach them short, as the search does not spend a long time on a flat region. However, as we will see, another difficulty arises for these instances: there tends to be an exponential number of local optima.

As observed in Fig. 7a, the mean time to reach a local optimum is higher than the median, indicating that the distribution of the number of steps has a long tail. Up to the phase-transition, the gap between the mean and the median increases as k grows, meaning that for larger k the tail of the distribution becomes longer. Fig. 11 shows the distribution of times to reach a local optimum plotted on a semi-log scale to show the rare events. These data were gathered on a particular randomly constructed problem instance. The data show that occasionally it takes a very long time for a hill-climbing algorithm to reach a local optimum. The analysis presented so far does not show the size of the flat regions and the height of the cliffs the algorithm sees at each stage of






Fig. 11. Logarithm of the histogram of the number of steps to reach a local optimum for a particular instance with k = 5 and n = 100. The data are collected over 10^5 hill-climbs.

the search process. In order to study this, a record of the steps taken by the local-search algorithm to reach the most visited global optimum, for five different descents from random configurations with n = 50 and k = 5, is shown in Fig. 12a. The graph shows the cost of the solutions at each step against the natural logarithm of the number of steps. We observe that the shape of the landscape changes as the local-search algorithm gets closer to the local optima. At the beginning, when the search starts from a random configuration, the algorithm easily finds a lower-cost neighbour. Furthermore, the algorithm initially makes a large improvement in cost at each step (note that the algorithm always moves to the best neighbour). However, as the search progresses, not only does the probability of finding a better configuration decrease, but the improvement in cost also decreases. Note that we stop the local-search algorithm when it reaches a local optimum; the steps the algorithm takes at a local optimum to verify that it has reached one are not counted in Fig. 12a. In order to show the general behaviour, a density plot of the steps taken by the local-search algorithm to reach the local optima, for 10^5 different searches starting from random configurations, is shown in Fig. 12b. The horizontal axis shows the natural logarithm of the number of steps to reach a local optimum. Most runs reach a local optimum in 30–100 steps, but in some rare cases it takes many more steps. Although differing in detail, all the descents show a similar shape: at the beginning of the search it is easy to find a better solution, but as the search approaches the local optima the probability of a neighbour being better becomes exponentially smaller. The graph can be divided into two parts. In the first part all the runs make steady progress, although the probability of an improving move decreases exponentially as the optimum is approached. The second part (starting after around 50 steps in this case) is due to a small number of runs, where a very large flat region is reached with a cost just above the optimum. When this happens, it can take over an order of magnitude longer to search this flat region and find the optimum. The time it takes to find a local optimum can be a problem when the number of colours is close to the chromatic number. Another feature, however, which makes graph-colouring hard for local search is the sheer number of local optima. This is a problem both around the chromatic number and in the over-constrained region (when k is below the chromatic number). We explore this next.

3.4. Number of local optima

To find the number of local optima in the landscape we use the exhaustive local-search algorithm and store the local optimum it returns. The exhaustive local-search algorithm is repeated a large

Fig. 12. (a) The record of the steps taken by the local-search algorithm to reach the most visited global optimum, for five different search processes starting from random configurations. The cost of the configurations at each step is plotted against the number of steps on a logarithmic scale. The size of the problem is n = 50 and the number of colours is k = 5. (b) The density plot of the record of the steps taken by the local-search algorithm to reach the local optima for 10^5 different search processes starting from random configurations. The cost of the configurations at each step is plotted against the natural logarithm of the number of steps. The size of the problem is n = 50 and the number of colours is k = 5.

number of times from randomly chosen starting configurations. Each local optimum is stored along with the number of times it is hit. As mentioned earlier, we treat every local optimum obtained by permuting the colours as a single local optimum. Of course, there is no guarantee that all the local optima in the landscape are found, particularly if a local optimum has a small basin of attraction. Typically there are many local optima that have a very small probability of being visited. For example, in one randomly constructed instance with n = 50 and k = 8, after 10^6 hill-climbs there were 36 300 local optima found, of which 31 154 were discovered fewer than 10 times, and 18 040 were found just once. For another typical instance with n = 70 and k = 10, again after 10^6 hill-climbs, among the 451 789 local optima found, 442 147 were found fewer than 10 times, and 358 045 were hit only once. However, the local optima that were observed only once were all high-cost local optima (with a cost of 22, where the optimal solutions have a cost of 8). The mean, minimum and maximum number of times the local optima at a given cost were found, for a particular problem instance with n = 50 and k = 7, are shown in Fig. 13. Of course there are likely to be local optima that we have not found. However, note that for costs smaller than 12 we have visited each local optimum at least 30 times. If any local optima exist with a cost less than 12 that we have not observed, then they would have to have unusually small basins of attraction. Following [1], we call such fit (low-cost) local optima with unusually small basins of attraction elves. We cannot rule out the existence of elves empirically, as, by definition, we would be unlikely to observe them. However, we will see in Section 4.1 that low-cost minima tend to be surrounded by low-cost configurations, so that it is





difficult for the landscape to hide fit optima with small basins of attraction. In addition, in the many runs we have performed we have never found any evidence of fit solutions with abnormally small basins of attraction (although there is some variation in the size of the basins of attraction, this variation is never huge). Thus, although we cannot rule out the existence of elves, we deem their existence unlikely. Although for small instances we can be confident of reaching all the fit local optima, we are almost surely missing local optima with high cost (greater than 12 for the instance discussed above). There exist a number of imputation techniques for estimating the true number of local optima at a given cost level (see for example [21,22]). In our previous paper [1], we used an imputation technique to estimate the number of local optima at low cost; however, for high-cost local optima we found the imputation very difficult, as most local optima were only visited once, providing insufficient data to make a sensible inference. The data for graph-colouring suffer from a similar problem, so we have not attempted a similar inference. Fitting the mean count of the local optima at each cost level in Fig. 13 to a straight line gives another way to estimate the probability of finding a local optimum. For the case of Fig. 13, where n = 50 and k = 7, the gradient of the fitted line for costs c = 10, …, 14 is -0.67, which suggests that the probability of reaching a particular local optimum at cost c decays exponentially as P_v(c) ∝ 10^{-0.67c}. (Note that because there are many more local minima at high cost, this does not imply that the search is likely to reach the global optimum.) We can give no strong justification for believing that this exponential decay should continue at high cost, although it seems to be a fairly good approximation at low cost for every instance we have examined.
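The decay rate can be estimated with an ordinary least-squares fit of log10 visit counts against cost. The sketch below uses synthetic counts constructed to fall off at the rate quoted for Fig. 13 (the function name and data are ours, for illustration only):

```python
import math

def decay_rate(costs, log10_counts):
    """Least-squares slope of log10(count) against cost; the probability of
    reaching a cost-c optimum then decays as P(c) proportional to 10^(slope*c)."""
    m = len(costs)
    xm = sum(costs) / m
    ym = sum(log10_counts) / m
    num = sum((x - xm) * (y - ym) for x, y in zip(costs, log10_counts))
    den = sum((x - xm) ** 2 for x in costs)
    return num / den

# Synthetic counts falling off by a factor of 10^0.67 per unit of cost,
# mimicking the n = 50, k = 7 instance of Fig. 13.
costs = [10, 11, 12, 13, 14]
counts = [10 ** (5 - 0.67 * (c - 10)) for c in costs]
print(round(decay_rate(costs, [math.log10(v) for v in counts]), 2))  # -0.67
```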
A faster decay rate means that the inferior local optima have much smaller basins of attraction and the local-search algorithm


has a better chance of reaching a fit local optimum. The decay rate of the probability of finding a local optimum at cost c, for different problem sizes and numbers of colours, is summarised in Table 3. The data are averaged over 100 problem instances and 10^5 hill-climbs. For practical reasons, we have only gathered data for k up to the chromatic number. The data suggest a linear relationship between the decay rate and both the system size and the number of colours. As the system size grows, the decay rate decreases linearly. This implies that as the system size grows, the probability of a local-search algorithm reaching a fitter local optimum shrinks exponentially; for larger systems, an inferior local optimum has a relatively greater chance of being found. In the case of the number of colours, there is a positive linear relationship between k and the decay rate: as the number of colours increases, the chance of the local-search algorithm finding a particular fit local optimum increases. Interestingly, such behaviour is seen for n up to 80; for larger systems, however, an increase in the number of colours leads to a decrease in the decay rate. This means that for larger systems, as we get closer to the chromatic number, a local-search algorithm has a smaller chance of finding a fit local optimum. The relationship between the size of the problem and the number of local optima is shown in Fig. 14a, where the logarithm of the number of observed local optima divided by √n is plotted versus the cost divided by the average cost, c̄ = |E|/k. The results are averaged over 100 instances of size 30, 45, 60 and 75. We have performed 10^5 hill-climbs to find the local optima. For n = 60 and n = 75 we significantly underestimate the actual number of local optima at higher costs (inferior local optima). Since the number of colours for all problem sizes is taken to be k = 5, as n grows the cost grows and the curves are shifted to the right.
Furthermore, the plot shows that this curve is roughly similar for different n. This indicates that the number of local optima grows approximately exponentially with the square root of the number of vertices. Note also that we count all local optima that are equivalent up to a colour


Fig. 13. Logarithm of the number of times local optima at a given cost are hit in an experiment with 10^7 hill-climbs, for a particular randomly drawn instance with n = 50 and k = 7.

Table 3
The probability of finding a local optimum at cost c as a function of n for different values of k is equal to 10^{-xc}; the values of x for different n and k are shown in the table.

n      k=2   k=3   k=4   k=5   k=6   k=7   k=8   k=9   k=10  k=11  k=12
20     0.27  0.49  0.69  –     –     –     –     –     –     –     –
30     0.24  0.42  0.60  0.68  –     –     –     –     –     –     –
40     0.24  0.33  0.44  0.56  0.72  0.80  –     –     –     –     –
50     0.21  0.30  0.40  0.48  0.54  0.67  0.72  –     –     –     –
60     0.22  0.30  0.35  0.40  0.44  0.53  0.55  0.56  –     –     –
70     0.20  0.26  0.31  0.34  0.39  0.41  0.44  0.47  0.40  –     –
80     0.22  0.26  0.28  0.31  0.33  0.33  0.31  0.30  0.30  0.37  –
90     0.21  0.23  0.24  0.20  0.22  0.21  0.20  0.17  0.17  0.13  0.10
100    0.21  0.24  0.25  0.18  0.14  0.13  0.12  0.11  0.08  0.08  0.07

Fig. 14. (a) Logarithm of the number of observed local optima at each cost for different n and k = 5. The results are averaged over 100 instances with 10^5 hill-climbs. (b) Mean number of local optima found in 10^5 hill-climbs, averaged over 100 instances, against the number of colours k.




permutation as a single optimum. The total number of Hamming-space optima is thus k! times greater than that shown in the figure. The dependence of the number of local optima on the number of colours is shown in Fig. 14b, where the mean number of local optima found in 10^5 hill-climbs, averaged over 100 instances, is plotted against k. Since the time to reach a local optimum grows very rapidly close to the chromatic number (see Section 3.3), it is difficult to count the number of local optima for large k; in Fig. 14b we could only gather data for k up to 8. The reason this limitation on the number of colours arises in counting the local optima, but not in counting the number of steps to reach them, is that in counting steps we can stop the search as soon as we reach a solution with cost zero (such a solution is clearly a global optimum), whereas in counting the local optima we have to explore all the solutions on a local optimum in order to label it. For k slightly above the chromatic number, the number of solutions on the local optima is so large that it is impossible to store them all. Fig. 15 shows the proportion of local optima at a particular cost, and the probability of finding a local optimum with the local-search algorithm, for a particular problem instance with n = 50 and k = 5. Note that Fig. 15 shows the empirically measured proportion of local optima at each cost level, and thus underestimates the true proportion of local optima at high cost.

3.5. Reaching the global optima

Fig. 16a shows the minimum cost and the cost found by a local-search algorithm plotted against 1/√n for k = 5. Empirically, we observe that the minimum cost for a problem with k = 5 as a function of n is approximately given by

$$c_{\min} \approx \left(1 - \frac{5}{\sqrt{n}}\right)\bar{c}, \qquad (26)$$

where c̄ is the average cost of a configuration. The √n scaling is not obvious (certainly as far as the authors are concerned). In Fig.
16b we show the same data, but now we plot √n(1 - c_min/c̄). We observe that the gap between the minimum cost of a problem and the cost found by a local-search algorithm appears to be roughly constant on this plot, suggesting that the actual gap grows with √n. On each run of a local-search algorithm, typically a different global optimum will be found, with a different cost. However, the variance in c_min/c̄ for the cost of the minima found by local search falls off approximately as 0.39 n^{-3/2}. Thus, as n grows, the cost-gap between the local optima found by local search and the global minimum, measured in standard deviations, grows approximately as n^{1/5}. That is, it becomes increasingly unlikely to find the global optimum as we increase the size of the system.
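Eq. (26) is easy to evaluate numerically. The sketch below (function name ours) takes c̄ = |E|/k, the mean cost of a random colouring, with the expected edge count |E| = p n(n-1)/2 for G(n, p); for n = 50 and k = 5 it predicts a minimum cost of about 36, consistent with the low end of the cost range in Fig. 15:

```python
from math import sqrt

def predicted_cmin(n, p=0.5, k=5):
    """Eq. (26): c_min ~ (1 - 5/sqrt(n)) * c_bar, taking c_bar = |E|/k
    with the expected edge count |E| = p n (n-1) / 2."""
    c_bar = p * n * (n - 1) / (2 * k)
    return (1 - 5 / sqrt(n)) * c_bar

print(round(predicted_cmin(50), 1))  # 35.9, i.e. about 36
```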

Fig. 16. (a) Plot of the expected minimum cost and the average cost found by the local-search algorithm versus 1/√n for k = 5. The number of hill-climbs is 10^5 on 500 randomly generated instances. (b) The same data plotted with √n(1 - c_min/c̄) on the y-axis. The number of hill-climbs is 10^5 on 500 randomly generated instances.

The gap between the expected cost and the cost of the global optima also depends on the number of colours, k. Fig. 17a shows the expected cost and the minimum cost of the local optima for n = 50, plotted against k. Based on the scaling, it is clear that as the number of colours grows, the gap shrinks. It is easy to understand why this happens: the greater number of colours means that, on average, the local optima have smaller costs. In Fig. 17b, the logarithm of the probability of finding a global optimum for k = 5 is plotted against n. The straight-line fit is consistent with the hypothesis that finding a global optimum using the local-search algorithm becomes exponentially unlikely as the system size grows. Using the straight-line fit in Fig. 17b to extrapolate to large n, the probability of the local-search algorithm finding a global optimum for an instance of size 1000 would be 1.24 × 10^{−36}. Although such an extrapolation is unlikely to provide a precise value, it nevertheless gives a strong indication that for larger problem instances, using multiple runs of local-search algorithms becomes futile. Employing the same method used in Fig. 17b, the probability of finding the global optima as a function of n is found for different k, and is presented in Table 4. The data show that for all values of k, the chance of the local-search algorithm finding a global optimum decreases exponentially. Interestingly, the decay rate grows for larger k.

3.6. Number of global optima


Fig. 15. Histogram showing the proportion of local optima at a particular cost, and the probability of finding a local optimum of a particular cost, for one instance with n = 50 and k = 5.

Fig. 18a shows a histogram of the number of global optima for 10 000 random graph-colouring instances drawn from G(50, 1/2) with k = 5. We observe that there is quite a large spread (from instance to instance) in the number of global optima.

Please cite this article as: M.-H. Tayarani-N., A. Prügel-Bennett, Anatomy of the fitness landscape for dense graph-colouring problem, Swarm and Evolutionary Computation (2015), http://dx.doi.org/10.1016/j.swevo.2015.01.005i


M.-H. Tayarani-N., A. Prügel-Bennett / Swarm and Evolutionary Computation ∎ (∎∎∎∎) ∎∎∎–∎∎∎


Fig. 17. (a) Plot of the expected minimum cost and the average cost found by the local-search algorithm versus k for n = 50. The number of hill-climbs is 10^5 on 500 randomly generated instances. (b) Natural log-probability of finding the global optima versus n for k = 5; the number of hill-climbs is 10^5 and the results are averaged over 500 randomly generated instances.

Fig. 18. (a) Histogram of the number of global optima for 10 000 random graph-colouring instances with n = 50 and k = 5. (b) Histogram of the number of configurations at the global optima for 10 000 random graph-colouring problem instances with n = 50 and k = 5.

Table 4
Probability of finding global optima as a function of n for different values of k.

k    Probability of finding global optima
2    1.17 × e^{−0.038n}
3    2.17 × e^{−0.060n}
4    3.21 × e^{−0.072n}
5    5.59 × e^{−0.084n}
6    12.18 × e^{−0.097n}
7    22.01 × e^{−0.106n}
8    27.11 × e^{−0.104n}

Table 5
Expected number of global optima for different n and k averaged over 100 problem instances.

k    n = 20    n = 30    n = 40    n = 50    n = 60
2    1.95      2.05      2.01      1.84      1.87
3    2.72      2.57      2.82      2.62      3.07
4    2.95      4.04      4.89      3.18      3.45
5    5.16      5.86      5.64      4.69      4.45
6    –         8.81      6.51      5.93      5.31
7    –         –         10.83     6.68      6.01
8    –         –         –         11.66     7.68

The expected number of global optima for different n and k is summarised in Table 5. It is clear that as k grows, the number of global optima increases, but it does not seem to bear any particular relation to the size of the system. Here again, for practical reasons, there is a limitation on the number of colours for which we can compute these quantities. Note that we define each optimum to be a connected set of configurations at the same cost with no fitter neighbours. The size of the global optima also varies. Fig. 18b shows the histogram of the number of configurations at each global optimum for n = 50 and k = 5 for 10 000 randomly generated problem instances. The data are consistent with the number of configurations, n_g, being distributed according to a geometric distribution

P(n_g) = (1 − p) p^{n_g − 1},    (27)

with p ≈ 0.75. The expected number of configurations in a global optimum is shown as a function of k and n in Table 6. This does not seem to change substantially with the number of vertices, n, but it increases with k. This is not surprising, as when we increase k,

we move towards the phase-transition, where the sizes of all minima increase substantially. We are unable to measure the size of the global optima at or beyond the phase-transition, as the number of states becomes too large.

Another important property of the fitness landscape of the graph-colouring problem is the distance between the global optima, which shows how the global optima are correlated. Fig. 19 shows the histogram of the distances between global optima, averaged over 1000 instances for n = 50 and k = 5, both in partition and in Hamming space. For these parameters the expected Hamming distance between random configurations is 40, and the expected partition distance between random configurations is around 33. We see that the distances between global optima are widely distributed. Although there is a tendency for globally optimal solutions to be more correlated (in terms of partition distance) than random solutions, there exist cases where global solutions are very different from each other. Note that a fit solution is achieved through a choice of variables such that fewer constraints than average are violated. Because of the non-linear interactions between variables, this can be achieved in many different ways. The fact that the


Table 6
Expected number of configurations in global optima for different n and k averaged over 100 problem instances.

k    n = 20    n = 30    n = 40    n = 50    n = 60
2    1.60      1.51      1.73      1.63      1.68
3    2.69      2.32      2.25      2.10      1.89
4    5.39      3.54      2.74      3.65      3.08
5    17.38     5.56      4.39      4.37      4.89
6    –         9.26      6.70      4.90      5.41
7    –         –         8.88      7.73      7.78
8    –         –         –         9.34      8.47

Fig. 19. Histogram of the Hamming distances and partition distances between the global optima for 10 000 random graph-colouring problem instances with n = 50 and k = 5.

instances are drawn at random means that there are no strong statistical differences between the neighbourhoods of the vertices. Thus it is not too surprising that there exist very different ways to obtain fit configurations. A consequence of the fact that there can be very fit configurations lying a long way apart is that the position of the global optimum can be very sensitive to the exact distribution of edges. That is, by adding or removing an edge, the position of the global optimum can change to a completely different part of the search space. This makes it very challenging for a heuristic search algorithm to find a globally optimal configuration. We have seen in Table 6 that the number of configurations at the global optima increases as the number of colours grows. A further question is the nature of the global optima. To address this we have studied the mean diameter of the global optima, that is, the maximum distance between configurations in the same global optimum. We found that the results for Hamming and partition distance were identical, which suggests that the global optima are not being distorted by the presence of the colour permutation symmetry. The diameter is shown as a function of n and k in Fig. 20a together with the linear best-fit extrapolation. This fits the data well and shows that the diameter increases linearly with both n and k. To get an idea of the shape of the global optima, we measure the logarithm of the number of configurations in a global optimum divided by the logarithm of its diameter. This is plotted in Fig. 20b as a function of n and k together with the linear best-fit extrapolation.
This shows that the number of configurations grows as a small power of the diameter, indicating that the global optima are far from compact (note that the number of configurations in a Hamming ball of diameter D grows as k^D). Furthermore, this exponent seems to be independent of the system size n and grows only slowly with k.
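The partition distances used throughout this section can be computed exactly for small k by minimising the Hamming distance over all colour relabellings. A minimal brute-force sketch follows; for large k one would instead solve a maximum-weight bipartite matching on the class-overlap matrix.

```python
from itertools import permutations

def partition_distance(x, y, k):
    """Minimum Hamming distance between x and any colour relabelling of y.
    Brute force over all k! relabellings, so only suitable for small k."""
    n = len(x)
    # overlap[a][b] = number of vertices coloured a in x and b in y
    overlap = [[0] * k for _ in range(k)]
    for xa, yb in zip(x, y):
        overlap[xa][yb] += 1
    # number of vertices kept under the best relabelling of y's colours
    kept = max(sum(overlap[a][perm[a]] for a in range(k))
               for perm in permutations(range(k)))
    return n - kept

# two colourings that induce the same partition of the vertices
x = [0, 0, 1, 1, 2]
y = [2, 2, 0, 0, 1]
```

Here `partition_distance(x, y, 3)` is 0 although the plain Hamming distance is 5, which is exactly why Hamming distance is uninformative in the presence of the colour permutation symmetry.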

Fig. 20. (a) Average diameter of global optima versus k and n. The results are averaged over 1000 different random problem instances. (b) Average compactness of global optima versus k and n. Also shown for the two graphs is the best linear fit. The results are averaged over 1000 different random problem instances.

3.7. Distance between optima

As we show in Table 4, for large instances of the graph-colouring problem, the local-search algorithm will, with overwhelming probability, reach a local optimum with a cost considerably higher than the minimum cost. This need not prevent an algorithm from finding a globally optimal solution if a global optimum were close to most good local optima. We show here that this hope proves fruitless. Fig. 21a shows the mean minimum partition distance from a configuration in a local optimum of cost c to the nearest global optimum. These results are averaged over all the local optima found in 100 instances. By rescaling the axes and fitting a single correction-to-scaling parameter, we can collapse the curves in Fig. 21a onto a universal curve. This is shown in Fig. 21b, and demonstrates that the distances from a local optimum to a global optimum scale linearly with the problem size. The mean distance grows approximately as

d ≈ n ((c − c_min)/n)^a,    (28)

with a ≈ 0.33; thus the mean distance from an optimum of cost c_min + 1 to a global optimum grows approximately as n^{0.67}. The number of configurations in a Hamming ball of radius n^{0.67} is greater than n^{0.33 n^{0.67}}. Thus for large n it would appear to take super-polynomial time to discover a global optimum by exploring the neighbourhood of a local optimum with cost c_min + 1.
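The super-polynomial claim rests on the standard binomial lower bound C(n, r) ≥ (n/r)^r. A quick numerical sanity check of the counting argument for n = 1000, taking the radius exponent 0.67 from the fit in Eq. (28):

```python
import math

n = 1000
r = round(n ** 0.67)            # distance scale n^0.67, i.e. an optimum of cost c_min + 1
# standard lower bound: C(n, r) >= (n / r)**r, roughly n^(0.33 * n^0.67) configurations
log10_bound = r * math.log10(n / r)
log10_ball = math.log10(math.comb(n, r))
```

With these numbers the Hamming ball already contains more than 10^100 configurations (and this ignores the extra factor (k − 1)^r from the choice of new colours), so exhaustively searching the neighbourhood of a cost-(c_min + 1) optimum is hopeless.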

4. Landscape correlation The experiments we have presented so far provide good reason for believing that the graph-colouring problem is difficult for a hill-climber to navigate. However, as we show in Section 3.2, there exist significant long-range correlations in the landscape of the


Fig. 22. (a) Expected cost of configurations in a Hamming sphere of radius h around a global optimum for n = 50 and k = 5. The dotted lines show one standard deviation around the mean. (b) Expected cost in a Hamming sphere for different sizes of the problem and for k = 5. Note that the minimum of the curves is at h = n(1 − 1/k) = 0.8n. The results are averaged over 10^4 solutions at Hamming distance h.

Fig. 21. (a) Measure of the mean partition distance to the closest global optimum from a local optimum of fitness c for k = 2 and different values of n. (b) Rescaled version of (a), showing that the curves collapse onto each other.

graph-colouring problem. This is a consequence of the structure of the fitness function. On average, each vertex is connected to p(n − 1) other vertices. The change in the cost of a configuration due to changing the colour of a vertex equals the number of neighbouring vertices with the newly assigned colour, minus the number of neighbouring vertices with the previous colour. Typically, this change in fitness is rather small. As a consequence, the configurations around a fit configuration will also tend to be fit. This property is also reflected in Fig. 12b, where the density plot of the record of the steps to local optima is shown. Close to the optima there are huge flat regions where any improving neighbour is likely to improve the cost by at most 1, meaning that close to a local optimum there are typically a large number of configurations with a cost similar to that of the local optimum.

4.1. Expected cost in Hamming sphere

In Fig. 22a, the expected cost of a configuration in a Hamming sphere of radius h from a global optimum is shown. The qualitative shape of the curve is quite similar for different n, although the standard deviation is of order n². The expected cost of a configuration in a Hamming sphere of radius h around a configuration x is equal to

c̄_h(x) = (1/|H_h(x)|) Σ_{x′ ∈ H_h(x)} Σ_{(i,j) ∈ E} 1[x′_i = x′_j],    (29)

where H_h(x) is the set of configurations at Hamming distance h from x, and 1[·] is the indicator function. We can compute a good approximation to this by assuming that we independently mutate each vertex colour with probability h/n. There are then three types of edges to consider:

1. Edges where the colours of both vertices are unchanged (i.e. x′_i = x_i and x′_j = x_j), which occur with probability (1 − h/n)². In this case the contribution of the edge is the same as in the initial configuration c(x).
2. Edges where the colour of exactly one vertex is changed, which occur with probability 2(1 − h/n)(h/n). In this case there is a probability of 1/(k − 1) that an edge which was not a colour conflict becomes one (and an edge that was in conflict is certainly resolved).
3. Edges where the colours of both vertices are changed, which occur with probability (h/n)². In this case, if the edge was in conflict, there is a probability of 1/(k − 1) that it remains a conflict; if it was not in conflict, there is a probability of (k − 2)/(k − 1)² that it becomes a conflicting edge.

Taking these three types of edges into account, we find

c̄_h(x) = (1 − h/n)² c(x) + 2(1 − h/n)(h/n) (|E| − c(x))/(k − 1)
        + (h/n)² [c(x)/(k − 1) + (k − 2)(|E| − c(x))/(k − 1)²]
        = c(x) + (kh/(n(k − 1)²)) (|E|/k − c(x)) (2(k − 1) − kh/n).    (30)

Since the average cost is equal to c̄ = |E|/k, we find

(c̄_h(x) − c(x)) / (c̄ − c(x)) = (kh/(n(k − 1)²)) (2(k − 1) − kh/n).    (31)

This function reaches a maximum of 1 at h = n(1 − 1/k). The rescaled cost for different n and for k = 5 is shown in Fig. 22b. This is computed empirically by sampling the cost 10^4 times at each Hamming distance h. The theoretical approximation is indistinguishable from the empirical curves on this graph. The same experiments are performed for different k and n = 100, and are shown in Fig. 23. The curve for k = 2 has a completely

Please cite this article as: M.-H. Tayarani-N., A. Prügel-Bennett, Anatomy of the fitness landscape for dense graph-colouring problem, Swarm and Evolutionary Computation (2015), http://dx.doi.org/10.1016/j.swevo.2015.01.005i

67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132

M.-H. Tayarani-N., A. Prügel-Bennett / Swarm and Evolutionary Computation ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

symmetrical shape. The reason for this behaviour is the symmetry of the search space: for k = 2, fully mutating a solution swaps the two colours and so yields the same partition. Mutating all the variables of a configuration at a global optimum therefore produces a configuration at the same global optimum under the colour symmetry. These curves show the existence of large-scale correlations in the fitness landscape. That is, the presence of a local optimum changes the expected fitness of configurations away from the mean fitness at all Hamming distances. Despite this large-scale structure, local-search algorithms still fail to reliably find a global optimum. The reason is that the landscape is sufficiently rugged that it is not possible to exploit the long-range correlation using local fitness information alone. In Fig. 24a, a density plot of the local optima as a function of their fitness and their Hamming distance from the most frequently visited global optimum is shown. There is apparently little correlation between the quality of the local optima and their closeness to this global optimum. Note that the local optima close to the global optimum lie slightly below the average cost of local optima. The average Hamming distance for n = 100 and k = 5 is 80, and the peak of the density plot clearly lies to the left of the line showing this average Hamming distance. For the case of Fig. 24a, 64% of the local optima have a smaller-than-average Hamming distance to the global optimum. The same data are plotted against partition distance in Fig. 24b. The distances between the local optima in partition space show different behaviour from those in Hamming space: the correlation between the quality of the local optima and their closeness to this global optimum is stronger using partition distance.

In order to show how this correlation depends on n and k, we have computed the gradient of the best-fitting line to the density of the local optima as a function of their fitness and their partition distance. Fig. 25a shows these gradients for the most frequently visited global optimum. Each point represents the average over 10 random problem instances with 10^5 hill-climbs each. Another way to measure the relationship between the fitness of a local optimum and its partition distance to the most visited global optimum is the (Pearson) correlation

Corr(c, D) = Σ_{i ∈ L} (c_i − c̄(L))(D_i − D̄(L)) / √( Σ_{i ∈ L} (c_i − c̄(L))² Σ_{i ∈ L} (D_i − D̄(L))² ),    (32)

where L is the set of local optima, c_i and D_i are the cost and the distance to the global optima for local optimum i, and c̄(L) and D̄(L) are the average cost and distance over the local optima. Note that this measure is closely related to the fitness–distance correlation [9] (closely related because we consider the partition distance to the most frequently visited global optimum). The correlation for different values of n and k is shown in Fig. 25b. The data suggest a

Fig. 23. Expected cost in a Hamming sphere for different numbers of colours and for n = 100. Note that the minimum of the curves is at h = n(1 − 1/k). The results are averaged over 10^4 solutions at Hamming distance h.
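Equation (31) can be checked numerically. The sketch below, written under the same independent-mutation approximation used in the derivation (each vertex recoloured with probability h/n), compares the predicted and sampled expected cost around a greedily coloured, hence below-average-cost, configuration:

```python
import random
from itertools import combinations

rng = random.Random(0)
n, k = 60, 5
edges = [(i, j) for i, j in combinations(range(n), 2) if rng.random() < 0.5]

# greedy sequential colouring, just to obtain a configuration with below-average cost
x = []
for v in range(n):
    conf = [0] * k
    for i, j in edges:
        if j == v:                # i < j == v, so vertex i is already coloured
            conf[x[i]] += 1
    x.append(min(range(k), key=conf.__getitem__))

def cost(col):
    return sum(col[i] == col[j] for i, j in edges)

def mutate(col, q):
    # independently recolour each vertex with probability q = h/n
    out = col[:]
    for v in range(n):
        if rng.random() < q:
            out[v] = rng.choice([c for c in range(k) if c != col[v]])
    return out

c0, cbar, h = cost(x), len(edges) / k, 30
samples = 5000
emp = sum(cost(mutate(x, h / n)) for _ in range(samples)) / samples
# Eq. (31): (c_h - c0) / (cbar - c0) = k h / (n (k-1)^2) * (2 (k-1) - k h / n)
pred = c0 + (k * h / (n * (k - 1) ** 2)) * (2 * (k - 1) - k * h / n) * (cbar - c0)
```

Under this mutation model the prediction is exact in expectation, so `emp` and `pred` agree up to sampling noise; sampling on the exact Hamming sphere instead introduces only O(1/n) corrections.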


Fig. 24. (a) Density plot of local optima as a function of their costs and Hamming distance from the most frequently visited global optimum. We also plot the mean cost in a Hamming sphere around the same global optimum. The shading of a point shows the number of configurations at that cost and Hamming distance. The size of the problem is n = 100 and the number of colours is k = 5. The number of hill-climbs is 10^7 and the expected cost at Hamming distance h is found as an average over 10^5 configurations. (b) Density plot of local optima as a function of their costs and partition distance from the most frequently visited global optimum. The shading of a point shows the number of configurations at that cost and partition distance. The size of the problem is n = 50 and the number of colours is k = 5. The number of hill-climbs is 10^7.

positive correlation in partition space, except when we reach the phase-transition and the global optima become very large. Below the phase-transition there appears to be no particular relationship between the correlation and the system size or the number of colours.
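The correlation (32) is simply the sample Pearson coefficient over the set of local optima. A minimal sketch, using illustrative made-up cost/distance pairs rather than real hill-climb data:

```python
import math

def fitness_distance_corr(costs, dists):
    """Sample Pearson correlation between local-optimum costs and their
    partition distances to the reference (most visited) global optimum."""
    m = len(costs)
    cm, dm = sum(costs) / m, sum(dists) / m
    cov = sum((c - cm) * (d - dm) for c, d in zip(costs, dists))
    var_c = sum((c - cm) ** 2 for c in costs)
    var_d = sum((d - dm) ** 2 for d in dists)
    return cov / math.sqrt(var_c * var_d)

# hypothetical data: fitter (lower-cost) optima sit closer to the global optimum
costs = [1, 2, 3, 4, 5, 6, 7, 8]
dists = [6, 9, 10, 12, 14, 15, 18, 22]
r = fitness_distance_corr(costs, dists)
```

A value of r near +1 corresponds to the positive fitness–distance correlation reported in partition space; near the phase-transition, where the global optima become very large, this signal disappears.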

4.1.1. Exploiting fitness–distance correlation

An important question when designing a heuristic search algorithm is whether the correlation between fitness and distance of local optima can be exploited in some way. This is particularly relevant for evolutionary algorithms, where different members of the population may have found, or at least be close to, different local optima. One scenario where we might expect the fitness–distance correlation to be exploitable is in a hybrid generational genetic algorithm. In each generation we apply local search, selection and crossover. The local search will find good solutions (presumably near to local optima). As fitter solutions are closer (in partition distance) to global optima, this will move the population towards a global optimum. However, since there is an enormous number of local optima, the local-search algorithm is unlikely to actually find the global optimum and will tend to get stuck. In selection we choose the fitter solutions, which should reduce the average distance to a global optimum because of the fitness–distance correlation. It does this at the expense of losing diversity. We then perform crossover, which helps restore diversity.


Fig. 25. (a) Gradient of the best fitting line to the cost of local optima as a function of their partition distance from the most frequently visited global optimum (see Fig. 24b). The data are averaged over 105 hill-climbs over 10 different problems at each problem size. (b) Correlation between the cost of local optima as a function of their partition distance from the most frequently visited global optimum (see Fig. 24b). The data are averaged over 105 hill-climbs and over 10 different problems at each problem size.

Hopefully, the crossover will not (substantially) increase the distance to a global optimum, but it reinvigorates the search by moving the members of the population to a new part of the search space, away from a local optimum. Thus at the end of each generation we should be closer to a global optimum than in the previous generation. As we lose diversity due to selection, the search can "run out of steam" before it reaches a global optimum, although this can be mitigated to some extent, for example by increasing the population size. A critical part of the argument for the success of such a hybrid algorithm is that crossover does not substantially increase the distance to a global optimum. In most problems represented by strings, this is guaranteed by simple crossovers such as uniform crossover or k-point crossover, provided that the final population has essentially the same set of genetic variables after crossover (i.e. crossover just shuffles the variables between individuals). In this case, the average Hamming distance to any point in the search space is unchanged by crossover. However, in problems such as graph-colouring, where there is an explicit symmetry, it is not sufficient that the average Hamming distance is unchanged, as there is no correlation between fitness and Hamming distance. As a consequence, it has long been known that using a genetic algorithm with a traditional crossover operator is inefficient for graph-colouring. What we require is a crossover operator that minimises the change in the partition distance. Such an operator was proposed in [17] over 10 years ago. Using a hybrid GA with this operator, Galinier and Hao obtained substantially better results on a large set of standard benchmark problems, including large dense random problems. This crossover is still used in many state-of-the-art algorithms.
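The Galinier and Hao operator (commonly called greedy partition crossover, GPX) can be sketched as follows. This is a simplified reconstruction from its usual description (alternately inherit the largest remaining colour class from each parent, then colour leftovers at random), not the authors' exact implementation, which for instance uses class sizes rather than arbitrary colours as labels:

```python
import random

def gpx(parent_a, parent_b, k, rng):
    """Greedy partition crossover (GPX), sketched from its common description:
    alternately copy the largest remaining colour class from each parent,
    then colour any leftover vertices at random."""
    n = len(parent_a)
    # classes[0][c] / classes[1][c]: vertices with colour c in parent A / B
    classes = [
        [set(v for v in range(n) if p[v] == c) for c in range(k)]
        for p in (parent_a, parent_b)
    ]
    child = [None] * n
    for i in range(k):
        src = classes[i % 2]                 # alternate between the parents
        largest = max(src, key=len)
        for v in largest:
            child[v] = i
        assigned = set(largest)
        for side in classes:                 # remove copied vertices from both parents
            for cls in side:
                cls -= assigned
    for v in range(n):                       # leftover vertices get random colours
        if child[v] is None:
            child[v] = rng.randrange(k)
    return child

rng = random.Random(0)
pa = [rng.randrange(3) for _ in range(12)]
pb = [rng.randrange(3) for _ in range(12)]
child = gpx(pa, pb, 3, rng)
```

Because whole colour classes are inherited as units, the child's partition stays close to its parents' partitions even when its Hamming distance to either parent is large, which is exactly the property the argument above requires.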

The description given above of how a hybrid genetic algorithm works appears plausible, but needs testing. We have attempted to do this by measuring the partition distance at each stage of a hybrid genetic algorithm using the Galinier and Hao crossover operator. More precisely, we have run an evolutionary algorithm with a population of size 10, performing three phases in each generation. The first is a local neighbourhood search, "L". In this phase, for each member of the population, we run a simple hill-climber for … iterations. We then perform selection, "S", using … Finally, we pair up all members of the population … and use the Galinier and Hao crossover, "C", as described in [17] (this uses the sizes of the colour classes as labels rather than arbitrary colours). We run this algorithm 100 times on random instances drawn from G(100, 1/2). We tested the algorithm with k = 3, 6 and 9 colours. The results are shown in Fig. 26, where we have measured the partition distance to the optimum and also the cost after each phase of the algorithm for five generations. We observe that local search improves both the cost and the partition distance to the optimum configuration. However, it runs out of steam as we approach a local optimum. Selection slightly improves both the average cost and the partition distance, although it does not add any new configurations to the population. Finally, crossover substantially increases the cost, but increases the partition distance only by a small amount. As we repeat this cycle over many generations, the quality of solutions improves. This confirms the explanation given above of how an evolutionary algorithm exploits the fitness–distance correlation. A landscape satisfies the big-valley hypothesis if the closer a local optimum is to the global optimum, the fitter it is [23]. In the graph-colouring problem we often have many global optima that are far from each other.
Furthermore, although there is a slight correlation between the fitness of a local optimum and its distance from a global optimum, there are plenty of fit local optima a considerable distance from a global optimum, and many unfit local optima close

Fig. 26. The evolution of a hybrid genetic algorithm, where "L" represents the local-search algorithm, "S" the selection operator and "C" the crossover operator. This is collected on a randomly drawn instance of G(100, 1/2) with k = 3, 6 and 9. The size of the population is 10. The results are averaged over 100 runs.


to a global optimum. Thus we do not see the classic big-valley structure in these graph-colouring problem instances.

4.2. Basins of attraction

The region of the search space that leads a local-search algorithm to a particular local optimum is usually called the "basin of attraction" of that optimum. We can empirically measure the basin of attraction of a local optimum by repeatedly jumping to a configuration at a given Hamming distance from the optimum, running the local-search algorithm until it reaches a local optimum, and measuring how often it ends up at the same local optimum we jumped from. This gives the return probability, starting from a configuration in a given Hamming sphere. Fig. 27a shows the return probability for the most frequently visited global optimum of a particular graph-colouring problem instance for different numbers of colours. As we move away from the minimum, the return probability initially decreases. Somewhat counter-intuitively, it then increases again; this is because when we change enough variables we finish closer to an optimum that is identical up to a permutation of the colours (in all our work we have treated local optima that are identical up to a colour permutation as the same optimum). This is most pronounced in the case k = 2, where by changing all the colours we end up in the optimum obtained by swapping the two colours. The return probability for the most frequently visited global optimum and for one of the least visited local optima is plotted in Fig. 27b. The most visited global optimum has been visited for

Fig. 27. (a) Return probability versus h, starting from a Hamming sphere of radius h. This is for the most visited global optimum in a particular graph of size n = 50 for different numbers of colours. The empirical return probability is computed by running 10^5 hill-climbs starting at randomly chosen configurations in each Hamming sphere. (b) Return probability versus h, starting from a Hamming sphere of radius h. This is for the most visited global optimum and the least visited local optimum in a particular graph with n = 50 and k = 5. The empirical return probability is computed by running 10^6 hill-climbs starting at randomly chosen configurations in each Hamming sphere.


0.033% of the time, and the least visited one was hit just once in 10^6 hill-climbs.
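The return-probability measurement can be sketched as follows. This is a minimal illustration on a small instance; for brevity it compares colourings exactly, whereas the experiments above fold permutation-equivalent optima together.

```python
import random
from itertools import combinations

rng = random.Random(3)
n, k = 15, 3
edges = [(i, j) for i, j in combinations(range(n), 2) if rng.random() < 0.5]

def hill_climb(col):
    # strict-improvement recolouring until no single move reduces the cost
    while True:
        moved = False
        for v in range(n):
            conf = [0] * k
            for i, j in edges:
                if i == v:
                    conf[col[j]] += 1
                elif j == v:
                    conf[col[i]] += 1
            best = min(range(k), key=conf.__getitem__)
            if conf[best] < conf[col[v]]:
                col[v] = best
                moved = True
        if not moved:
            return col

def jump(col, h):
    # random configuration at Hamming distance exactly h from col
    out = col[:]
    for v in rng.sample(range(n), h):
        out[v] = rng.choice([c for c in range(k) if c != col[v]])
    return out

opt = hill_climb([rng.randrange(k) for _ in range(n)])
trials = 200
ret = {h: sum(hill_climb(jump(opt, h)) == opt for _ in range(trials)) / trials
       for h in (1, 4, 8)}
```

Plotting `ret` over the full range of h (and counting permutation-equivalent returns as returns) reproduces the shape of Fig. 27, including the counter-intuitive rise at large h.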

5. Conclusion

This paper studies a wide variety of properties, and in consequence we cannot give a simple punchline. We would argue that this is a result of the intrinsic difficulty of optimisation. Almost all the properties we have studied could be relevant to developing an effective optimisation algorithm for graph-colouring. There is some simplification in the picture that emerges, in that many of the properties are found to be very similar to those found in maximum satisfiability (MAX-SAT) [1]. We briefly outline some of those similarities, and some of the important differences, at the end of this section. Furthermore, many of the properties are relatively easy to understand in hindsight. One of the distinctive properties of graph-colouring is its permutation symmetry. This symmetry can be huge for large problems; for example, for graphs from G(1000, 1/2), where the chromatic number is around 83, the permutation symmetry is truly vast (83! ≈ 4 × 10^124). The symmetry clearly has important consequences for any algorithm that tries to exploit long-range structure in the fitness landscape, such as an evolutionary algorithm using crossover. Although low-cost solutions are moderately correlated with respect to the partition distance, they are not with respect to Hamming distance. Thus, we would expect that any effective crossover operator should minimise the partition distance between the children and the parents. Another feature of graph-colouring problems involving a large number of colours is the sheer size of the search space. One consequence is that local optima tend to be a considerable distance apart. As a result, local optima have relatively large basins of attraction in terms of Hamming distance. The expected Hamming distance between random configurations is very large (precisely n − n/k), which means that even a moderate correlation between configurations can be quite significant. This is seen when we perform the Galinier and Hao crossover.
As the parents tend to be quite far apart, the children are far from the parents and, as a consequence, can have a much higher cost than their parents. Nevertheless, the children tend to be in the basins of attraction of local optima with a cost similar to that of their parents. As a consequence, on running a local search algorithm, the children very rapidly reach a cost similar to their parents'.

This paper examined instances both as a function of the problem size and of the number of colours. This allowed us to characterise the landscape in the satisfiable phase, in the unsatisfiable phase and at the phase transition. We showed how the time to reach local optima grows exponentially around the phase transition. This is due to the very large plateau regions that occur around the phase transition. Clearly, the choice of search algorithm is likely to be affected by how close the instances are to the phase transition. In particular, around this region it may not be an effective strategy to wait for a local-search algorithm to converge to a local optimum, but rather to use operators such as crossover to encourage large jumps.

At the beginning of the paper we considered statistical properties of average configurations, such as the density of configurations and the auto-correlation. These properties seem to be uncorrelated with many of the properties of the local minima, and in particular with the phase transition. This is perhaps not too surprising, as the mean cost is very considerably higher than the minimum cost (this distance grows as n times the standard deviation in the cost as the system size increases). These bulk properties, however, will be relevant to the efficiency of a local search algorithm, as they measure the local ruggedness and show how likely it is to find a fitter individual as a function of the cost.
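The Galinier and Hao crossover referred to above is the greedy partition crossover (GPX) of [17]. A minimal sketch of the operator (our own simplified reconstruction; function and variable names are ours, not the authors'):

```python
import random

def gpx(parent1, parent2, k):
    """Greedy partition crossover (after Galinier & Hao, 1999) - a sketch.

    Parents are lists mapping vertex -> colour. For each of the k colours in
    turn, the child inherits the largest remaining colour class, taken
    alternately from each parent; leftover vertices are coloured randomly.
    """
    n = len(parent1)
    classes = [
        [set(v for v in range(n) if p[v] == c) for c in range(k)]
        for p in (parent1, parent2)
    ]
    child = [None] * n
    assigned = set()
    for colour in range(k):
        parent = colour % 2  # alternate between the two parents
        # pick the colour class with the most still-unassigned vertices
        best = max(classes[parent], key=lambda cl: len(cl - assigned))
        for v in best - assigned:
            child[v] = colour
        assigned |= best
    for v in range(n):  # colour any leftover vertices at random
        if child[v] is None:
            child[v] = random.randrange(k)
    return child
```

Because the operator copies whole colour classes rather than vertex colours, it is insensitive to the colour-permutation symmetry, which is exactly the property the discussion above calls for.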

Please cite this article as: M.-H. Tayarani-N., A. Prügel-Bennett, Anatomy of the fitness landscape for dense graph-colouring problem, Swarm and Evolutionary Computation (2015), http://dx.doi.org/10.1016/j.swevo.2015.01.005i


M.-H. Tayarani-N., A. Prügel-Bennett / Swarm and Evolutionary Computation ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Fig. 28. This shows the two types of terms that contribute to the third cumulant.

Fig. 29. This shows the four types of terms that contribute to the fourth central moment.

It is beyond the scope of this paper to make a comprehensive comparison between graph-colouring and MAX-SAT. We intend, in future work, to write a comparison paper once we have studied some other problems; in our opinion graph-colouring is important enough and rich enough to deserve a full paper. Nevertheless, it is worthwhile making a few comments on the similarities between the properties studied in this paper and the same properties for MAX-SAT described in [1]. One of the striking features is how many of the figures in the two papers have almost exactly the same form. This is surprising given the very different neighbourhood structures of the two problems; MAX-SAT does not share the colour permutation symmetry. Indeed, it is often only when you look at the partition distance that this similarity is revealed. In particular, the distribution of local optima (Fig. 15), the probability of reaching local minima (Fig. 17b), the average distance between local minima at a given cost and their closest global minimum (Fig. 21b), the expected cost of a configuration at a fixed distance from an optimum (Fig. 22), the fitness–distance correlation (Fig. 24a) and the return probability (Fig. 27b) all behave very similarly in graph-colouring and MAX-SAT. These are all long-range properties of the fitness landscape, which suggests that, despite the many differences in the structure of the two problems, they nevertheless share a similar long-range structure. This provides some encouragement that algorithms capable of exploiting this type of long-range structure might be effective on a wide class of problems.

The main goal of this paper was to study the fitness landscape of the problem and report a wide range of its properties. How these analyses could be exploited to solve the problem remains for future work; however, the following comments can be made. First, when developing an evolutionary algorithm for the problem, the properties of the fitness landscape should be taken into account, so that a more successful algorithm is designed. For example, in Fig. 7 we showed that some problems have plateau regions and some do not. In particular, close to the phase transition, large plateaux appear in the landscape. This means that local search algorithms may get stuck on plateaux, wandering in search of an improving move. Therefore these algorithms are not recommended for problems close to the phase transition, or a mechanism should be employed to help the algorithm escape from the plateaux. Fig. 12 shows that, at the beginning of the search, local search algorithms are very successful at rapidly finding better solutions; as the search reaches the bottom of the basins, the plateaux appear. This suggests that using local search at the beginning, and switching to an algorithm that escapes plateaux more easily (such as tabu search) in the later stages, could be a better choice. Fig. 24 shows that the good local optima exhibit a pattern relative to the global optimum, which probably means that there are ways of exploiting this pattern to estimate the location of the global optimum. Finally, the analysis shows that different representations or neighbourhoods can result in different landscape behaviour. Thus a hybrid algorithm can be used that employs the more appropriate representation for each of its different components. Exploiting these landscape properties to design successful algorithms remains future work.
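The partition distance used throughout this discussion can be computed from the k × k class-overlap matrix: it is n minus the best overlap achievable under any relabelling of the colours. Real implementations use an optimal-assignment (Hungarian) solver for this; for small k, a brute-force sketch (ours) over all k! permutations suffices:

```python
from itertools import permutations

def partition_distance(x, y, k):
    """Partition distance between colourings x and y: the minimum number of
    vertices whose colour must change to make the two partitions identical,
    minimised over all relabellings (permutations) of the k colours.
    Brute force over k! permutations - only practical for small k."""
    n = len(x)
    # overlap[a][b] = number of vertices coloured a in x and b in y
    overlap = [[0] * k for _ in range(k)]
    for xa, yb in zip(x, y):
        overlap[xa][yb] += 1
    best = max(sum(overlap[a][perm[a]] for a in range(k))
               for perm in permutations(range(k)))
    return n - best

# Identical partitions under a colour swap have distance 0
x = [0, 0, 1, 1, 2]
y = [2, 2, 0, 0, 1]
print(partition_distance(x, y, 3))  # -> 0
```

This makes explicit why two colourings at maximal Hamming distance can still have partition distance zero, which is the phenomenon behind the Hamming-versus-partition contrast drawn above.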

Appendix A. Calculating the cumulants

The third cumulant for a graph G is given by

\[
K_3(G) = \mathbb{E}\bigl[(c(G,X) - \bar{c}(G))^3\bigr]
       = \mathbb{E}\Biggl[\Bigl(\sum_{e \in E} S_e\Bigr)^{3}\Biggr]. \tag{33}
\]

We can represent the expansion of this term diagrammatically. The variable S_e can be represented as a line between the endpoints of an edge of the graph. If we expand the cube of the sum we obtain terms consisting of three edge variables S_e S_{e'} S_{e''} (note that the edges need not be distinct). These terms are represented by three edges in G. If any edge touches a vertex not touched by the other edges then the term has zero expectation. Thus the only terms that contribute to K_3(G) are those shown in Fig. 28. Each triangle that occurs in the graph G occurs 3! times in the expansion of the cube of the sum. Thus the third cumulant is given by

\[
K_3(G) = |E|\,\mathbb{E}\bigl[S_e^3\bigr]
       + 3!\,|T|\,\mathbb{E}\bigl[S_{(i,j)} S_{(j,k)} S_{(k,i)}\bigr], \tag{34}
\]

where T is the set of three-vertex cliques (i.e. triangles, or loops of size 3) that exist in the graph G. It is straightforward, but tedious, to show that

\[
\mathbb{E}\bigl[S_e^3\bigr] = \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr)\Bigl(1 - \frac{2}{k}\Bigr), \qquad
\mathbb{E}\bigl[S_{(i,j)} S_{(j,k)} S_{(k,i)}\bigr] = \frac{1}{k^2}\Bigl(1 - \frac{1}{k}\Bigr). \tag{35}
\]

Thus,

\[
K_3(G) = \frac{k^2 - 3k + 2}{k^3}\,|E| + \frac{3!\,(k-1)}{k^3}\,|T|. \tag{36}
\]

For graphs G drawn from G(n,p) the expected number of triangles in a graph is

\[
\mathbb{E}(|T|) = \frac{p^3\, n(n-1)(n-2)}{3!}. \tag{37}
\]
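Since Eq. (36) is exact for any fixed graph, it can be checked by brute force: enumerate all k^n colourings of a small graph, compute the third central moment of the cost directly, and compare it with the closed form (a verification sketch of ours; the test graph is arbitrary):

```python
from itertools import combinations, product

def third_cumulant_exact(n, k, edges):
    """Third central moment of the cost, enumerating all k**n colourings."""
    mean = len(edges) / k  # each edge is monochromatic with probability 1/k
    total = sum(
        (sum(x[i] == x[j] for i, j in edges) - mean) ** 3
        for x in product(range(k), repeat=n)
    )
    return total / k ** n

def third_cumulant_formula(n, k, edges):
    """Closed form (36): K3(G) = (k^2-3k+2)/k^3 |E| + 3!(k-1)/k^3 |T|."""
    edge_set = set(map(frozenset, edges))
    triangles = sum(
        {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
        for a, b, c in combinations(range(n), 3)
    )
    return (k**2 - 3*k + 2) / k**3 * len(edge_set) \
        + 6 * (k - 1) / k**3 * triangles

# A 5-vertex graph containing one triangle {0,1,2}
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
for k in (2, 3, 4):
    assert abs(third_cumulant_exact(5, k, edges)
               - third_cumulant_formula(5, k, edges)) < 1e-9
```

The assertions pass for every k tried, confirming that only the single-edge and triangle diagrams contribute, as the diagrammatic argument claims.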

The diagrams that contribute to the fourth central moment are shown in Fig. 29. The terms that contribute to the fourth central moment are

\[
\mu_4(G) = \mathbb{E}\bigl[(c(G,X) - \bar{c}(G))^4\bigr]
         = \mathbb{E}\Biggl[\Bigl(\sum_{e \in E} S_e\Bigr)^{4}\Biggr]
         = |E|\Bigl(\mathbb{E}\bigl[S_e^4\bigr] - 3\,\mathbb{E}\bigl[S_e^2\bigr]^2\Bigr)
         + \frac{3 \cdot 4!}{2}\,|T|\,\mathbb{E}\bigl[S_{(i,j)}^2 S_{(j,k)} S_{(k,i)}\bigr]
         + 4!\,|S|\,\mathbb{E}\bigl[S_{(i,j)} S_{(j,k)} S_{(k,l)} S_{(l,i)}\bigr]
         + 3\,|E|^2\,\mathbb{E}\bigl[S_e^2\bigr]^2, \tag{38}
\]

where S is the set of all loops of length four in the graph G. The combinatorial factors count the number of times each term occurs in the expansion. We note that in the last diagram we have a term $3\,\mathbb{E}[S_e^2]\,\mathbb{E}[S_{e'}^2]$ for all pairs of edges e and e' such that $e \neq e'$. By adding a term $3|E|\,\mathbb{E}[S_e^2]^2$ to these diagrams we can write the total contribution as $3|E|^2\,\mathbb{E}[S_e^2]^2$. In doing so we have over-counted the terms of order $|E|$ by a factor $3\,\mathbb{E}[S_e^2]^2$, which we subtract from the first term. The term $3|E|^2\,\mathbb{E}[S_e^2]^2$ is equal to $3K_2^2$, so that the
fourth cumulant is given by

\[
K_4(G) = \mu_4(G) - 3K_2^2
       = |E|\Bigl(\mathbb{E}\bigl[S_e^4\bigr] - 3\,\mathbb{E}\bigl[S_e^2\bigr]^2\Bigr)
       + \frac{3 \cdot 4!}{2}\,|T|\,\mathbb{E}\bigl[S_{(i,j)}^2 S_{(j,k)} S_{(k,i)}\bigr]
       + 4!\,|S|\,\mathbb{E}\bigl[S_{(i,j)} S_{(j,k)} S_{(k,l)} S_{(l,i)}\bigr]. \tag{39}
\]

Evaluating the expectations we find

\[
\mathbb{E}\bigl[S_e^4\bigr] = \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr)\Bigl(1 - \frac{3}{k} + \frac{3}{k^2}\Bigr), \qquad
\mathbb{E}\bigl[S_{(i,j)}^2 S_{(j,k)} S_{(k,i)}\bigr] = \frac{1}{k^2}\Bigl(1 - \frac{1}{k}\Bigr)\Bigl(1 - \frac{2}{k}\Bigr), \qquad
\mathbb{E}\bigl[S_{(i,j)} S_{(j,k)} S_{(k,l)} S_{(l,i)}\bigr] = \frac{1}{k^3}\Bigl(1 - \frac{1}{k}\Bigr). \tag{40}
\]

Finally, the expected number of loops of four vertices for a graph drawn from G(n,p) is

\[
\mathbb{E}(|S|) = \frac{p^4\, n(n-1)(n-2)(n-3)}{8}. \tag{41}
\]
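As with the third cumulant, the per-graph fourth-cumulant formula (Eqs. (39) and (40)) is exact and can be verified by exhaustive enumeration on a small graph (our verification sketch; the test graph contains one triangle and one four-cycle so that every diagram contributes):

```python
from itertools import combinations, product

def fourth_cumulant_exact(n, k, edges):
    """K4 = mu4 - 3*K2^2 of the cost, enumerating all k**n colourings."""
    mean = len(edges) / k
    mu2 = mu4 = 0.0
    for x in product(range(k), repeat=n):
        d = sum(x[i] == x[j] for i, j in edges) - mean
        mu2 += d ** 2
        mu4 += d ** 4
    mu2 /= k ** n
    mu4 /= k ** n
    return mu4 - 3 * mu2 ** 2

def fourth_cumulant_formula(n, k, edges):
    """Closed form (39)-(40), with |T| and |S| counted in the given graph."""
    edge_set = set(map(frozenset, edges))
    triangles = sum(
        {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
        for a, b, c in combinations(range(n), 3)
    )
    four_cycles = 0  # each 4-subset of vertices admits 3 distinct cyclic orders
    for a, b, c, d in combinations(range(n), 4):
        for cyc in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            cyc_edges = {frozenset((cyc[i], cyc[(i + 1) % 4]))
                         for i in range(4)}
            four_cycles += cyc_edges <= edge_set
    e2 = (1 / k) * (1 - 1 / k)                            # E[S_e^2]
    e4 = (1 / k) * (1 - 1 / k) * (1 - 3 / k + 3 / k**2)   # E[S_e^4]
    tri = (1 / k**2) * (1 - 1 / k) * (1 - 2 / k)  # E[S^2_(i,j) S_(j,k) S_(k,i)]
    cyc = (1 / k**3) * (1 - 1 / k)        # E[S_(i,j) S_(j,k) S_(k,l) S_(l,i)]
    return (len(edge_set) * (e4 - 3 * e2**2)
            + (3 * 24 // 2) * triangles * tri
            + 24 * four_cycles * cyc)

# One triangle {0,1,2} and one 4-cycle 0-2-3-4
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
for k in (2, 3, 4):
    assert abs(fourth_cumulant_exact(5, k, edges)
               - fourth_cumulant_formula(5, k, edges)) < 1e-9
```

The combinatorial factors 3·4!/2 = 36 and 4! = 24 appear directly in the code, matching the multiplicity counts in Eq. (39).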

References

[1] A. Prügel-Bennett, M.-H. Tayarani-N., Maximum satisfiability: anatomy of the fitness landscape for a hard combinatorial optimisation problem, IEEE Trans. Evol. Comput. 16 (2012) 319–338.
[2] M.-H. Tayarani-N., A. Prügel-Bennett, On the landscape of combinatorial optimization problems, IEEE Trans. Evol. Comput. 18 (3) (2014) 420–434. http://dx.doi.org/10.1109/TEVC.2013.2281502.
[3] A. Hertz, B. Jaumard, M.P. de Aragão, Local optima topology for the k-coloring problem, Discrete Appl. Math. 49 (1994) 257–280.
[4] J.P. Hamiez, J.K. Hao, An analysis of solution properties of the graph coloring problem, in: Fourth Metaheuristics International Conference, Porto, Portugal, 2001.
[5] J. Culberson, I. Gent, Frozen development in graph coloring, Theor. Comput. Sci. 265 (2001) 227–264.
[6] H. Bouziri, K. Mellouli, E.G. Talbi, The k-coloring fitness landscape, J. Comb. Optim. 21 (3) (2011) 306–329.
[7] D.S. Porumbel, J.K. Hao, P. Kuntz, A search space "Cartography" for guiding graph coloring heuristics, Comput. Oper. Res. 37 (4) (2010) 769–778.
[8] E.D. Weinberger, Correlated and uncorrelated fitness landscapes and how to tell the difference, Biol. Cybern. 63 (1990) 325–336.
[9] T. Jones, Evolutionary algorithms, fitness landscapes and search (Ph.D. thesis), University of New Mexico, Albuquerque, 1995.
[10] P. Merz, B. Freisleben, Fitness landscape analysis and memetic algorithms for the quadratic assignment problem, IEEE Trans. Evol. Comput. 4 (4) (2000) 337–352. http://dx.doi.org/10.1109/4235.887234.
[11] B. Manderick, M. de Weger, P. Spiessens, The genetic algorithm and the structure of the fitness landscape, in: Proceedings of the Fourth International Conference on Genetic Algorithms, 1991, pp. 143–150.
[12] L. Altenberg, Fitness distance correlation analysis: an instructive counterexample, in: T. Bäck (Ed.), Proceedings of the Seventh International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1997, pp. 57–64.
[13] L.K. Grover, Local search and the local structure of NP-complete problems, Oper. Res. Lett. 12 (1992) 235–243.
[14] P. Stadler, Towards a theory of landscapes, Complex Syst. Binary Netw. (1995) 78–163.
[15] C.A. Glass, A. Prügel-Bennett, A polynomially searchable exponential neighbourhood for graph colouring, J. Oper. Res. Soc. 56 (3) (2005) 324–330.
[16] A. Prügel-Bennett, On the chromatic number of finite random graphs, in preparation.
[17] P. Galinier, J.K. Hao, Hybrid evolutionary algorithms for graph coloring, J. Comb. Optim. 3 (4) (1999) 379–397.
[18] J. Hallam, A. Prügel-Bennett, Large barrier trees for studying search, IEEE Trans. Evol. Comput. 9 (4) (2005) 385–397.
[19] W. Benfold, J. Hallam, A. Prügel-Bennett, Optimal parameters for search using a barrier tree Markov model, Theor. Comput. Sci. 386 (2007) 94–113.
[20] P. Stadler, Landscapes and their correlation functions, J. Math. Chem. 20 (1) (1996) 1–45.
[21] J. Garnier, L. Kallel, Efficiency of local search with multiple local optima, SIAM J. Discrete Math. 15 (2000) 122–141.
[22] C.R. Reeves, A.V. Eremeev, Statistical analysis of local search landscapes, J. Oper. Res. Soc. 55 (7) (2004) 687–693. URL: http://www.jstor.org/stable/4102015.
[23] K. Boese, A. Kahng, S. Muddu, On the big valley and adaptive multi-start for discrete global optimizations, Oper. Res. Lett. 16 (2).
