The generalized minimum spanning tree problem: An overview of formulations, solution procedures and latest advances


European Journal of Operational Research xxx (xxxx) xxx


European Journal of Operational Research journal homepage: www.elsevier.com/locate/ejor

Invited Review

Petrică C. Pop

Department of Mathematics and Computer Science, Technical University of Cluj-Napoca, North University Center of Baia Mare, Baia Mare, Romania

Article info

Article history: Received 16 December 2018; Accepted 13 May 2019; Available online xxx

Keywords: Combinatorial optimization; Generalized minimum spanning tree problem; Survey

Abstract

In this paper, some of the main known results on the generalized minimum spanning tree problem are surveyed. The principal feature of this problem is that the vertices of the graph are partitioned into a certain number of clusters, and we are interested in finding a minimum-cost tree spanning a subset of vertices with precisely one vertex taken from every cluster. The paper is structured around the following main headings: problem definition, variations and practical applications, complexity aspects, integer programming formulations, and exact and heuristic solution approaches developed for solving this problem. Furthermore, we also discuss some open problems and possible research directions.

1. Introduction

The minimum spanning tree (MST) problem has a long history in combinatorial optimization. Boruvka first considered it in 1926, when he came across it during the electrification of the rural areas of Southern Moravia. In this context, he provided a solution for the most economical layout of a power-line network; for further reference see Nesetril and Nesetrilova (2012). Several aspects underlie the fame and importance of the MST problem: it can be solved efficiently even on large graphs, since polynomial-time algorithms were developed by Kruskal (1956), Prim (1957), Dijkstra (1959) and Sollin (1961); and it has many real-life applications, having been applied with success to numerous combinatorial optimization problems in transportation, telecommunication network design, distribution systems, etc.

However, in the opinion of numerous researchers, the extensions of the MST problem are even more appealing. These include: the degree-constrained minimum spanning tree problem considered by Narula and Ho (1980), the probabilistic minimum spanning tree problem introduced by Bertsimas (1990), the stochastic minimum spanning tree problem studied by Ishii, Shiode, Nishida, and Namasuya (1981), the quadratic minimum spanning tree problem considered by Xu (1984), the minimum Steiner tree problem described by Maculan (1987), the capacitated minimum spanning tree problem presented by Chandy and Russell (1972), the generalized minimum spanning tree problem introduced by Myung, Lee, and Tcha (1995), etc. Since most extensions of the MST problem are NP-hard, no polynomial-time algorithms are known for solving them.

In this paper we focus on the generalized minimum spanning tree problem (GMSTP), described by Myung et al. (1995) and defined as follows: given an undirected graph whose vertices are divided into a given number of subsets (clusters), the GMSTP asks for a minimum-cost tree spanning a subset of vertices with exactly one vertex selected from every cluster. Consequently, the MST problem is the special case of the GMSTP in which every cluster consists of precisely one vertex.

The considered problem belongs to the category of generalized combinatorial optimization problems. Problems in this category naturally generalize classical optimization problems and share the following primary features: the vertices of the underlying graph are divided into a certain number of clusters, and the feasibility constraints of the initial problem are expressed in terms of the clusters rather than the individual vertices. Some interesting and intensively studied problems belonging to this category are: the generalized traveling salesman problem, see Fischetti, Salazar-Gonzales, and Toth (1995, 1997); the generalized vehicle routing problem, see Ghiani and Improta (2000) and Pop, Matei, and Sitar (2013); the partition graph coloring problem, see Demange, Monnot, Pop, and Ries (2014) and Demange, Ekim, Ries, and Tanasescu (2015); etc. For further reference on the category of generalized combinatorial optimization problems we refer to Dror and Haouari (2000), Feremans, Labbe, and Laporte (2003) and

https://doi.org/10.1016/j.ejor.2019.05.017 0377-2217/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: P.C. Pop, The generalized minimum spanning tree problem: An overview of formulations, solution procedures and latest advances, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.05.017


Pop (2012). Even though the term "generalized" is used in the literature to define this class of problems, we note that it is misleading, since there are many kinds of generalizations of the classical combinatorial optimization problems. We consider that the terms clusterized or selective are more appropriate to define this class of problems.

The rest of the paper is organized as follows: the next section provides a formal definition, four variants of the problem, and some important practical applications of the GMSTP reported in the specialized literature. Section 3 presents complexity aspects of the investigated problem. Integer programming formulations are described and analyzed in Section 4. Section 5 describes exact algorithms, and Section 6 describes the leading heuristic approaches for solving the GMSTP together with a comparative analysis of their performance. Section 7 presents some concluding results and further remarks.

2. Definition, variations and practical applications of the GMSTP

2.1. Definition of the GMSTP

In this section we give an explicit graph-theoretic definition of the generalized minimum spanning tree problem. We consider an undirected graph $G = (V, E)$ that consists of a set of vertices $V = \{v_1, v_2, \dots, v_n\}$ and a set of edges defined as follows:

\[ E = \{ e = \{v_i, v_j\} \mid v_i, v_j \in V,\ v_i \neq v_j \}. \]

The set of vertices is partitioned into $m$ mutually exclusive nonempty subsets, called clusters and denoted by $V_1, \dots, V_m$, which means that the following conditions hold:

1. $V = V_1 \cup V_2 \cup \dots \cup V_m$
2. $V_l \cap V_p = \emptyset$ for all $l, p \in \{1, \dots, m\}$ and $l \neq p$.

As in Pop, Kern, and Still (2006a), we suppose that the graph $G$ is connected. The edges of the graph are classified into two categories: intra-cluster edges, which connect vertices belonging to the same cluster, and inter-cluster edges, which connect vertices belonging to different clusters. We define a cost function $c : E \to \mathbb{R}_+$ which attaches to each edge $e \in E$ a nonnegative cost $c_e$.

The generalized minimum spanning tree problem consists in finding a minimum-cost tree spanning a subset of vertices that includes exactly one vertex from every cluster. Fig. 1 illustrates a GMSTP instance defined on an undirected graph with 21 vertices divided into 6 clusters, as well as a feasible solution of the problem. Trees that span a subset of vertices with one vertex selected from every cluster are called generalized spanning trees.

Fig. 1. A feasible solution to a GMSTP instance.

2.2. Variants of the GMSTP

Three variants of the generalized spanning tree problem have been considered in the scientific literature. In the first, costs (prizes) are associated with the vertices in addition to the costs attached to the edges; it is called the prize collecting generalized spanning tree problem and denoted by PC-GMSTP. The second asks for a minimum-cost tree spanning at least one vertex from every cluster and is denoted by L-GMSTP. The third is called the generalized hop-constrained minimum spanning tree problem, denoted GHMSTP, and incorporates additional restrictions that limit the communication delay of the resulting network in regional or metropolitan area networks. In addition, we propose a novel variant in which the clusters are not assumed to be mutually exclusive. Obviously, the prize collecting generalized minimum spanning tree problem reduces to the GMSTP when all the prizes are zero, or when the prizes are equal within each cluster of the partition. For this variant of the GMSTP, Pop (2007) described five integer programming formulations, and Golden, Raghavan, and Stanojevic (2008) presented some heuristic strategies, including local search and a genetic algorithm, as well as a branch-and-cut algorithm that provides optimal solutions for graphs with up to 200 vertices within two hours of CPU time. Recently, Pop, Matei, Sitar, and Danciulescu (2017) described a new method for solving the PC-GMSTP. This approach decomposes the problem into two smaller subproblems, a macro-level subproblem and a micro-level subproblem, which are solved individually. The purpose of the former is to provide trees spanning the clusters, called global trees, using a diploid genetic algorithm, while the purpose of the latter is to identify the minimum-cost tree corresponding to a given global tree and spanning exactly one vertex from each cluster.
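
The micro-level subproblem mentioned above can be illustrated with a short sketch. Given a global tree over the clusters, one common way to pick the best vertex in each cluster is dynamic programming from the leaves of the global tree towards its root. The code below is only an illustration under stated assumptions and is not the procedure of Pop et al. (2017): vertex prizes are omitted, and the names `best_tree_for_global_tree`, `clusters`, `cost` and `global_edges` are hypothetical.

```python
import math


def best_tree_for_global_tree(clusters, cost, global_edges, root=0):
    """Cheapest generalized spanning tree that follows a given global tree.

    clusters: list of lists of vertex ids (assumed to be a partition)
    cost: dict mapping frozenset({u, v}) to a nonnegative edge cost
    global_edges: list of (k, l) cluster-index pairs forming a tree
    Returns (total_cost, chosen), where chosen maps cluster index -> vertex.
    """
    m = len(clusters)
    neighbours = {k: [] for k in range(m)}
    for k, l in global_edges:
        neighbours[k].append(l)
        neighbours[l].append(k)

    # Breadth-first order: every cluster appears after its parent.
    parent, order = {root: None}, [root]
    for k in order:
        for l in neighbours[k]:
            if l not in parent:
                parent[l] = k
                order.append(l)

    def c(u, v):
        # Missing edges are treated as infinitely expensive.
        return cost.get(frozenset((u, v)), math.inf)

    # best[k][v]: cheapest connection of the subtree hanging below
    # cluster k when v is the vertex selected in cluster k.
    best = {k: {v: 0.0 for v in clusters[k]} for k in range(m)}
    pick = {}
    for k in reversed(order):  # children are handled before parents
        p = parent[k]
        if p is None:
            continue
        for u in clusters[p]:
            v_star = min(clusters[k], key=lambda v: c(u, v) + best[k][v])
            pick[(k, u)] = v_star
            best[p][u] += c(u, v_star) + best[k][v_star]

    # Recover the selected vertices from the root downwards.
    chosen = {root: min(clusters[root], key=lambda v: best[root][v])}
    for k in order[1:]:
        chosen[k] = pick[(k, chosen[parent[k]])]
    return best[root][chosen[root]], chosen
```

For a fixed global tree the sketch examines every pair of vertices in adjacent clusters once, so its running time is polynomial in the number of vertices; the hard part of the decomposition is the choice of the global tree itself.
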
Ihler, Reich, and Widmayer (1999) and Dror, Haouari, and Chaouachi (2000) independently introduced the L-GMSTP. Dror et al. (2000) showed that the L-GMSTP is NP-hard and suggested a genetic algorithm that achieved solutions that were on average within 6.53% of optimality. Ihler et al. (1999) showed that the decision version of the L-GMSTP is NP-complete even if the underlying graph is a tree. Shyu, Yin, Lin, and Haouari (2003) proposed an ant colony approach whose results are comparable to those reported by Dror et al. (2000). Duin and Vos (2004) showed that the L-GMSTP fits into the framework of the Group Steiner Problem (GSP), and therefore the transformation of the GSP into the well-known undirected Steiner problem on graphs (SPG) can be used. The same authors used an exact SPG solver that outperformed the genetic algorithm proposed by Dror et al. (2000). Haouari, Chaouachi, and Dror (2005) described an exact branch-and-bound algorithm combined with a specialized pre-processing algorithm. Pop, Marc, and Sitar (2006b) presented integer programming formulations for the L-GMSTP. When the cost matrix associated with the edges of the graph satisfies the triangle inequality, the optimal solution of the L-GMSTP contains exactly one vertex from each cluster. Thus, in this situation, the GMSTP and the L-GMSTP have the same optimal solution; we refer to Feremans, Labbe, and Laporte (2001) for further information. Leitner (2016) introduced a different variant of the GMSTP, whose main feature is that it incorporates specific restrictions that limit the communication delay of the resulting network in regional or metropolitan area networks. The same author


Fig. 2. An illustration of the selective minimum spanning tree problem and two feasible solutions.

described four integer linear programming models of the problem, gave a polyhedral comparison of the proposed formulations and developed branch-and-cut approaches based on these models.

A different variant worth investigating is obtained when the clusters are not assumed to be mutually exclusive. We call this variant the Selective Minimum Spanning Tree Problem, denoted by SMSTP. In this case the vertices are grouped into clusters such that each vertex may belong to one or more clusters, and the objective is to find a minimum-cost tree spanning a subset of vertices that includes exactly one vertex from each cluster. Obviously, this variant is an extension of the GMSTP. Fig. 2 illustrates an example of the SMSTP with 27 vertices grouped into eight clusters denoted by V1, . . . , V8. The clusters may be in different configurations: clusters included in larger clusters (see clusters V2 and V1), clusters that do not intersect any other cluster (see cluster V5) and clusters with non-empty intersection (see clusters V6, V7 and V8, or V4 and V7). We can observe that in the case of the SMSTP the feasible solutions may have different numbers of edges, while in the case of the GMSTP the number of edges is always m − 1.

2.3. Practical applications of the GMSTP

The generalized minimum spanning tree problem provides an attractive way of modeling various real-world applications, for example in telecommunications, in locating regional service centers such as stores, warehouses or distribution centers, in agricultural settings, in energy transportation, in physics, etc.

In the area of telecommunication networks, it is frequently necessary to design cost-effective backbone networks that link sets of local area networks (LANs). This can be realized with different network designs; tree structures are usually considered in order to decrease the number of telecommunication facilities and the equipment required to build the connections between all the LANs. Usually, there are one or more telecommunication centers which enable the connection of every LAN to the backbone network. These telecommunication centers serve as gateways between users and other LANs. Even though every LAN has many possible candidates that may serve as gateways, the telecommunication costs (facilities, equipment, etc.) motivate the design requirement that a single gateway be used for each LAN. The problem of designing the least expensive regional backbone network that has a tree structure and spans exactly one gateway from every LAN can be modeled as a GMSTP. In some cases, the impact of different potential gateways in a LAN has to be taken into consideration. In this case, the telecommunication centers within the same LAN compete to be chosen as gateways, and every telecommunication center offers a certain compensation (prize) if it is selected. This particular telecommunication network design problem can be modeled as a PC-GMSTP. For more information on these applications we refer to Gamvros, Golden, Raghavan, and Stanojevic (2005) and Myung et al. (1995).

A similar application occurs in the case of a metropolitan area network (MAN), where a metropolitan area is partitioned into districts for planning purposes and the aim is to design a tree backbone that connects the districts such that exactly one telecommunication center is used in each district. Obviously, this problem can be modeled as a GMSTP. Additionally, if the designed backbone network has to fulfill quality-of-service constraints that restrict the maximum number of intermediate routers along communication paths, then the problem can be modeled as the generalized hop-constrained minimum spanning tree problem; we refer to Leitner (2016) for more information.

Another interesting application of the GMSTP consists in determining the location of regional service centers such as public facilities, stores, warehouses or distribution centers, see Myung et al. (1995), Pop (2002) and Shyu et al. (2003). For example, suppose that a company wants to establish marketing centers, one for each market segment, and to connect the selected marketing centers by building links. The problem faced by the company is to find a minimum-cost tree spanning a subset of marketing centers which includes exactly one from every market segment.

For weighted multiple attributed graphs (WMAG), which are able to capture and represent the properties of large real data efficiently, Abilasha and Mohan (2016) addressed the problem of designing networks by specifying attribute constraints, which restrict the search to vertices whose attribute features lie within a defined range, and structural restrictions, which restrict how these vertices are linked. The problem considers m-constraint queries that remodel the large data graph into a graph with the vertices partitioned into m sets, and the problem of finding m minimally separated vertices of a WMAG that satisfy the considered k constraints reduces to the GMSTP.

He, Zhang, Huang, Shi, and Cao (2012) pointed out an interesting application in the area of cloud computing, a challenging and active research topic. Distributing replicas to multiple geographically dispersed clouds may bring risks either in storage or during transmission. The scheme provided by He et al. (2012), called Distributed Multiple Replicas Data Possession Checking, tries to identify an optimal spanning tree in order to define the partial order of scheduling multiple replicas data possession checking. This problem is closely related to the GMSTP.

Starting from the bi-objective insular traveling salesman problem with maritime and ground transportation costs defined recently by Miranda, Blazquez, Obreque, Maturana-Ross, and Gutierrez-Jarpa (2018), we define a new application of the GMSTP as follows. Given a set of islands, each of which has a specified number of ports or docks, we are interested in designing a communication network that connects all the islands such that from each island exactly one port or


dock is selected. The problem thus defined consists in finding a minimum-cost tree spanning a subset of ports or docks which includes exactly one from every island.

Dror et al. (2000) considered the field of agriculture and described a compelling application to irrigation in desert environments. They considered a set of polygon-shaped parcels that must be irrigated from a common source of water. Each parcel contains a given number of vertices, and the objective is to design an irrigation network of minimal length which links at least one vertex from each parcel to the source of water, assuming that the network does not cross any parcel and lies on the boundary lines of the parcels. This irrigation problem can be modeled on a graph whose vertices are grouped into a number of sets and whose edges are the boundary lines of the parcels. Parcels that share a boundary give rise to vertex sets with non-empty intersection; the problem can nevertheless be transformed easily into an L-GMSTP by substituting each vertex v ∈ Vl ∩ Vr, with l, r ∈ {1, . . . , m}, by two vertices vl ∈ Vl and vr ∈ Vr and by adding a new edge of cost 0 between vl and vr.

Kansal and Torquato (2001) extended the area of applications to physics, developing a general method through which the data obtained from a given network can be used to understand the physical process that has led to the formation of the specific structure.
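
The vertex-splitting transformation described for the irrigation problem can be sketched in a few lines. The sketch below rests on assumptions that the text does not state: every vertex belongs to at least one parcel, a vertex shared by more than two parcels receives one copy per parcel, and every copy inherits all edges of the original vertex. The function name `split_shared_vertices` and the data layout are hypothetical.

```python
from itertools import combinations


def split_shared_vertices(parcels, edges):
    """Turn overlapping vertex sets into disjoint clusters.

    parcels: list of sets of vertex ids, possibly overlapping
    edges: dict mapping (u, v) tuples to edge costs
    Returns (clusters, new_edges); the copy of vertex v that belongs to
    parcel l is named (v, l), and new_edges is keyed by frozensets.
    """
    # copies[v]: one copy of v for every parcel that contains v
    copies = {}
    for l, parcel in enumerate(parcels):
        for v in parcel:
            copies.setdefault(v, []).append((v, l))

    clusters = [{(v, l) for v in parcel} for l, parcel in enumerate(parcels)]

    new_edges = {}
    # Zero-cost edges between the copies of the same original vertex.
    for v, vertex_copies in copies.items():
        for a, b in combinations(vertex_copies, 2):
            new_edges[frozenset((a, b))] = 0.0
    # Assumption: each copy inherits every edge of the original vertex.
    for (u, v), c in edges.items():
        for a in copies[u]:
            for b in copies[v]:
                new_edges[frozenset((a, b))] = c
    return clusters, new_edges
```

After the transformation the clusters are pairwise disjoint, so the instance has the partition structure required by the L-GMSTP.
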

3. Complexity aspects of the GMSTP

Myung et al. (1995) proved that the GMSTP is an NP-hard problem, by reduction from the vertex cover problem. For some particular combinatorial optimization problems, Garey and Johnson (1979) showed that the simple structure of trees can provide algorithmic advantages for solving them efficiently. Indeed, a number of problems that are NP-complete on general graphs turn out to be polynomially solvable when the underlying graph is a tree. Unfortunately, this does not hold for the GMSTP. Pop (2002) proved, by reduction from the set cover problem, that the GMSTP is NP-hard even when defined on trees, which is a stronger result regarding the complexity of the investigated problem. Although the GMSTP is NP-hard, there are some situations in which the problem can be solved in polynomial time; for more details see Dror et al. (2000) and Pop (2012).

Case 1. If $|V_k| = 1$ for all $k = 1, \dots, m$, the GMSTP reduces trivially to the classical MST problem, which can be solved in polynomial time.
Case 2. If $m = 1$, then the solution consists of a single vertex.
Case 3. If $m = 2$, then the solution consists of a single minimum-cost edge connecting the two subsets of vertices.
Case 4. If the number of clusters $m$ is fixed, then the GMSTP can be solved in polynomial time (in the number of vertices $n$). For this case, Pop (2002) described a polynomial-time procedure based on dynamic programming.
Case 5. When the GMSTP is defined on trees and the number of leaves is bounded, the problem can be solved in polynomial time.

4. Integer programming formulations of the GMSTP

Various linear integer programming formulations of the GMSTP have been introduced in the scientific literature. Before describing, analyzing and comparing them, we present some technical definitions and notations.

Given the undirected graph $G = (V, E)$, for $S \subset V$ we denote by $E(S)$ the set of edges in $E$ with both endpoints in $S$ and by $\delta(S)$ the cutset, that is, the set of edges in $E$ with exactly one endpoint in $S$:

\[ E(S) = \{ e = \{i, j\} \in E \mid i \in S,\ j \in S \} \]

and

\[ \delta(S) = \{ e = \{i, j\} \in E \mid i \in S,\ j \notin S \}. \]

We also consider the directed graph $D = (V, A)$ associated with the undirected graph $G = (V, E)$, where the arc set $A$ contains the two arcs $(i, j)$ and $(j, i)$ for each edge $\{i, j\} \in E$. Similarly, for $S \subset V$, the corresponding notions for the directed graph are:

\[ A(S) = \{ (i, j) \in A \mid i, j \in S \}, \]

\[ \delta^+(S) = \{ (i, j) \in A \mid i \in S,\ j \notin S \} \quad \text{and} \quad \delta^-(S) = \{ (i, j) \in A \mid i \notin S,\ j \in S \}. \]

For simplicity, we write $\delta^+(i)$ and $\delta^-(i)$ instead of $\delta^+(\{i\})$ and $\delta^-(\{i\})$. In order to formulate the GMSTP as an integer program, we introduce the following binary variables:

\[ x_e = x_{ij} = \begin{cases} 1 & \text{if the edge } e = \{i, j\} \in E \text{ is used in the solution} \\ 0 & \text{otherwise} \end{cases} \]

\[ z_i = \begin{cases} 1 & \text{if the vertex } i \text{ is used in the solution} \\ 0 & \text{otherwise} \end{cases} \]

\[ w_{ij} = \begin{cases} 1 & \text{if the arc } (i, j) \in A \text{ is used in the solution} \\ 0 & \text{otherwise.} \end{cases} \]

The following vector notations are used: $x = (x_{ij})$, $z = (z_i)$, $w = (w_{ij})$, together with the notations $x(E') = \sum_{\{i, j\} \in E'} x_{ij}$ for $E' \subseteq E$, $z(V') = \sum_{i \in V'} z_i$ for $V' \subseteq V$ and $w(A') = \sum_{(i, j) \in A'} w_{ij}$ for $A' \subseteq A$.
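
To make the set notation concrete, the following sketch computes $E(S)$, $\delta(S)$, the arc set $A$, $\delta^+(S)$ and $\delta^-(S)$ for a small graph. It is an illustration only; the function names and the representation of edges as `frozenset` pairs are assumptions made for this example.

```python
def edges_inside(E, S):
    """E(S): edges with both endpoints in S."""
    return {e for e in E if e <= S}


def cutset(E, S):
    """delta(S): edges with exactly one endpoint in S."""
    return {e for e in E if len(e & S) == 1}


def arcs_of(E):
    """Arc set A: the two opposite arcs (i, j) and (j, i) for every edge."""
    return {(i, j) for e in E for i in e for j in e if i != j}


def delta_plus(A, S):
    """delta^+(S): arcs leaving S."""
    return {(i, j) for (i, j) in A if i in S and j not in S}


def delta_minus(A, S):
    """delta^-(S): arcs entering S."""
    return {(i, j) for (i, j) in A if i not in S and j in S}


# A 4-cycle on the vertices 1, 2, 3, 4 and the subset S = {1, 2}.
E = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (1, 4)]}
S = {1, 2}
A = arcs_of(E)
inside = edges_inside(E, S)   # only the edge {1, 2}
cut = cutset(E, S)            # the edges {2, 3} and {1, 4}
leaving = delta_plus(A, S)    # the arcs (2, 3) and (1, 4)
entering = delta_minus(A, S)  # the arcs (3, 2) and (4, 1)
```

With 0/1 values stored per edge, the sum $x(E(S))$ used in the formulations is then simply the number of selected edges in `edges_inside(E, S)`.
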

4.1. Formulations based on tree properties

Generalized subtour elimination formulation (Myung et al., 1995)

Given the undirected graph $G = (V, E)$, a feasible solution of the GMSTP can be seen as a graph without subtours (cycles), with $m - 1$ edges, with one vertex chosen from each cluster and linking all the clusters. Hence, we can express the GMSTP as the following linear integer programming problem:

\begin{align}
\min\ & \sum_{e \in E} c_e x_e \notag \\
\text{s.t. } & z(V_k) = 1, && \forall k \in K = \{1, \dots, m\} \tag{1} \\
& x(E(S)) \le z(S - i), && \forall i \in S \subset V,\ 2 \le |S| \le n - 1 \tag{2} \\
& x(E) = m - 1 \tag{3} \\
& x_e \in \{0, 1\}, && \forall e \in E \tag{4} \\
& z_i \in \{0, 1\}, && \forall i \in V. \tag{5}
\end{align}

For simplicity, the notation $S - i$ is used instead of $S \setminus \{i\}$. In the above formulation, constraints (1) ensure that exactly one vertex is chosen from each cluster, constraints (2) eliminate all the subtours, and constraint (3) guarantees that the selected subgraph has $m - 1$ edges. Constraints (2) are called generalized subtour elimination constraints. This integer programming formulation was introduced by


Myung et al. (1995) and is called the generalized subtour elimination formulation. We shall use $P_{sub}$ to denote the feasible set of the linear programming relaxation of this model, in which constraints (4) and (5) are replaced by $0 \le x_e, z_i \le 1$ for all $e \in E$ and $i \in V$.

Generalized cutset formulation (Myung et al., 1995)

The subtour elimination constraints (2) can be replaced by the connectivity constraints (6), which yields the following linear integer programming problem, introduced by Myung et al. (1995) and named the generalized cutset formulation:

\begin{align}
\min\ & \sum_{e \in E} c_e x_e \notag \\
\text{s.t. } & (1), (3), (4), (5) \text{ and} \notag \\
& x(\delta(S)) \ge z_i + z_j - 1, && \forall i \in S \subset V,\ j \notin S,\ 1 \le |S| \le n - 1. \tag{6}
\end{align}

We shall use $P_{cut}$ to denote the feasible set of the linear programming relaxation of this model. Myung et al. (1995) showed the following relation: $P_{sub} \subset P_{cut}$.

Generalized multicut formulation (Pop, 2002)

The following model, known as the generalized multicut formulation, is obtained by replacing the simple cutsets with multicuts. Given a partition of the vertices $V = C_0 \cup C_1 \cup \dots \cup C_k$, the multicut $\delta(C_0, C_1, \dots, C_k)$ is defined as the set of edges which connect different sets $C_i$ and $C_j$. The GMSTP can then be expressed as the following linear integer programming problem:

\begin{align}
\min\ & \sum_{e \in E} c_e x_e \notag \\
\text{s.t. } & (1), (3), (4), (5) \text{ and} \notag \\
& x(\delta(C_0, C_1, \dots, C_k)) \ge \sum_{j=0}^{k} z_{i_j} - 1, && \forall\, C_0, C_1, \dots, C_k \text{ node partitions of } V \text{ and } \forall\, i_j \in C_j \text{ for } j = 0, 1, \dots, k. \tag{7}
\end{align}

We shall use $P_{mcut}$ to denote the feasible set of the linear programming relaxation of this model. Evidently, $P_{mcut} \subseteq P_{cut}$; in addition, Pop (2002) proved that $P_{sub} = P_{mcut}$.

Cluster subpacking formulation (Feremans, Labbe, & Laporte, 2002)

We may strengthen the generalized subtour formulation of the GMSTP by replacing the subtour elimination constraints (2) with the cluster subpacking constraints introduced by Feremans et al. (2002):

\[ x(E(S)) \le z(S \setminus V_k), \qquad \forall S \subset V,\ 2 \le |S| \le n - 1,\ k \in K. \]

The same authors pointed out that these constraints are dominated by

\[ x(E(S')) \le z(S' \setminus V_k) = z(S') - 1, \qquad \text{where } S' = S \cup V_k, \]

and provided the cluster subpacking formulation of the GMSTP:

\begin{align}
\min\ & \sum_{e \in E} c_e x_e \notag \\
\text{s.t. } & (1), (3), (4), (5) \text{ and} \notag \\
& x(E(S)) \le z(S) - 1, && \forall S \subset V,\ 2 \le |S| \le n - 1,\ |\{k : V_k \subseteq S\}| \neq 0. \tag{8}
\end{align}

The cluster subpacking constraints (8) ensure that the number of edges chosen within any such subset of vertices $S$ cannot be higher than the number of vertices chosen from that set minus 1. We shall use $P_{spack}$ to denote the feasible set of the linear programming relaxation of the cluster subpacking formulation.

4.2. Formulations based on arborescence properties

Let us consider the directed graph $D = (V, A)$ obtained by replacing each edge $e = \{i, j\} \in E$ with the opposite arcs $(i, j)$ and $(j, i)$ in $A$, each with the same cost as the edge $\{i, j\} \in E$. The directed version of the GMSTP, introduced by Myung et al. (1995) and known as the Generalized Minimum Spanning Arborescence problem, is defined on the directed graph $D$ rooted at a certain cluster, say $V_1$ without loss of generality. The problem consists in finding a minimum-cost arborescence which contains exactly one vertex from each cluster. In this section we present two formulations, both described by Feremans et al. (2002).

Directed generalized cutset formulation (Feremans et al., 2002)

We first describe a directed generalized cutset formulation of the GMSTP. In this model we assume, without loss of generality, that the cluster $V_1$ is the root of the directed graph $D$, and we denote $K_1 = K \setminus \{1\}$.

\begin{align}
\min\ & \sum_{e \in E} c_e x_e \notag \\
\text{s.t. } & z(V_k) = 1, && \forall k \in K = \{1, \dots, m\} \notag \\
& x(E) = m - 1 \notag \\
& w(\delta^-(S)) \ge z_i, && \forall i \in S \subseteq V \setminus V_1 \tag{9} \\
& w_{ij} \le z_i, && \forall i \in V_1,\ j \notin V_1 \tag{10} \\
& w_{ij} + w_{ji} = x_e, && \forall e = \{i, j\} \in E \tag{11} \\
& x, z, w \in \{0, 1\}. \tag{12}
\end{align}

In this model, constraints (9) and (10) ensure the existence of a path from the chosen root vertex in $V_1$ to any other chosen vertex, passing exclusively through chosen vertices. We shall use $P_{dcut}$ to denote the projection of the feasible set of the linear programming relaxation of this formulation onto the $(x, z)$-space. Myung et al. (1995) considered another possible directed generalized cutset formulation, obtained by replacing constraint (3) with the following constraints:

\begin{align}
& w(\delta^-(V_1)) = 0 \tag{13} \\
& w(\delta^-(V_k)) \le 1, && \forall k \in K_1. \tag{14}
\end{align}

Directed subpacking formulation (Feremans et al., 2002)

We now present the formulation of the GMSTP based on branchings introduced by Feremans et al. (2002). As in the previous formulation, we consider the digraph $D = (V, A)$ with $V_1$ chosen as the root cluster. The GMSTP can then be expressed as the following linear integer programming problem, named the directed subpacking formulation:

\begin{align}
\min\ & \sum_{e \in E} c_e x_e \notag \\
\text{s.t. } & z(V_k) = 1, && \forall k \in K = \{1, \dots, m\} \notag \\
& x(E) = m - 1 \notag \\
& w(A(S)) \le z(S - i), && \forall i \in S \subset V,\ 2 \le |S| \le n - 1 \tag{15} \\
& w(\delta^-(j)) = z_j, && \forall j \in V \setminus V_1 \tag{16} \\
& w_{ij} + w_{ji} = x_e, && \forall e = \{i, j\} \in E \notag \\
& x, z, w \in \{0, 1\}. \notag
\end{align}


Let $P_{dspack}$ denote the projection of the feasible set of the linear programming relaxation of this formulation onto the $(x, z)$-space. Evidently, $P_{dspack} \subseteq P_{sub}$ and $P_{spack} = P_{dspack}$. This means that a simple undirected formulation can be as tight as a directed formulation, even though it uses fewer variables. The following result was proved by Feremans et al. (2002): $P_{dspack} = P_{dcut} \cap P_{sub}$. A different proof of this result was presented by Pop (2002).
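
The constraints of the generalized subtour elimination formulation can be checked numerically on a toy instance. The sketch below is a hedged illustration, not one of the solution methods surveyed in this paper: it enumerates one vertex per cluster, computes a minimum spanning tree on each selection with Kruskal's algorithm, and then verifies constraints (1), (2) and (3) by brute force. All function names are hypothetical, and the enumeration is exponential in the number of clusters, so it is only suitable for very small graphs.

```python
from itertools import combinations, product


def mst_on(vertices, cost):
    """Kruskal on the subgraph induced by `vertices`.

    cost maps frozenset({u, v}) to the edge cost.
    Returns (total_cost, edges), or None if the subgraph is disconnected.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    allowed = set(vertices)
    candidates = sorted((c, tuple(sorted(e))) for e, c in cost.items() if e <= allowed)
    total, tree = 0.0, []
    for c, (u, v) in candidates:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += c
            tree.append(frozenset((u, v)))
    return (total, tree) if len(tree) == len(vertices) - 1 else None


def solve_gmstp_by_enumeration(clusters, cost):
    """Try every choice of one vertex per cluster and keep the cheapest tree."""
    best = None
    for selection in product(*clusters):
        result = mst_on(selection, cost)
        if result is not None and (best is None or result[0] < best[0]):
            best = (result[0], result[1], set(selection))
    return best


def satisfies_subtour_formulation(clusters, tree, selected):
    """Check constraints (1), (2) and (3) for a 0/1 solution given as sets."""
    vertices = [v for Vk in clusters for v in Vk]
    # (1): exactly one selected vertex in every cluster
    if any(len(selected & set(Vk)) != 1 for Vk in clusters):
        return False
    # (3): exactly m - 1 selected edges
    if len(tree) != len(clusters) - 1:
        return False
    # (2): x(E(S)) <= z(S - i) for all i in S, 2 <= |S| <= n - 1
    for size in range(2, len(vertices)):
        for S in map(set, combinations(vertices, size)):
            x_in_S = sum(1 for e in tree if e <= S)
            if any(x_in_S > len((S - {i}) & selected) for i in S):
                return False
    return True
```

A possible use is to call `solve_gmstp_by_enumeration(clusters, cost)` on a graph with a handful of vertices and pass the returned edges and selected vertices to `satisfies_subtour_formulation`, which should accept any generalized spanning tree and reject edge sets that use unselected vertices.
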

4.3. Flow based formulations

All of the formulations presented above have an exponential number of constraints. The formulations considered hereafter have only a polynomial number of constraints, at the price of additional variables. To obtain compact formulations of the GMSTP, we introduce 'auxiliary' flow variables in addition to the natural binary variables attached to the edges and vertices of the graph. The idea is to send flow between the vertices of the graph and to let the edge variable $x_e$ indicate whether the edge $e \in E$ is able to carry any flow. We describe four such flow formulations: a single commodity model, a multicommodity formulation, a bidirectional flow model, and a flow cut formulation. In every model considered here, the flow variables are directed even though the edges are undirected: for every edge $\{i, j\} \in E$ there may be flow in both directions, from $i$ to $j$ and from $j$ to $i$.

Single commodity flow formulation (Pop, 2002)

In the single commodity formulation, the source cluster $V_1$ sends one unit of flow to each of the other clusters. If $f_{ij}$ denotes the flow on the edge $e = \{i, j\}$ in the direction from $i$ to $j$, we obtain the following compact mixed integer linear programming formulation:

\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & z(V_k) = 1, && \forall\, k \in K = \{1, \dots, m\} \\
& x(E) = m - 1 \\
& \sum_{e \in \delta^+(i)} f_e - \sum_{e \in \delta^-(i)} f_e =
  \begin{cases} (m - 1) z_i & \text{for } i \in V_1 \\ -z_i & \text{for } i \in V \setminus V_1 \end{cases} && \qquad (17) \\
& f_{ij} \le (m - 1) x_e, && \forall\, e = \{i, j\} \in E \qquad (18) \\
& f_{ji} \le (m - 1) x_e, && \forall\, e = \{i, j\} \in E \qquad (19) \\
& f_{ij}, f_{ji} \ge 0, && \forall\, e = \{i, j\} \in E \qquad (20) \\
& x, z \in \{0, 1\}.
\end{aligned}
\]

In this formulation, the mass balance constraints (17) imply that the subgraph defined by any solution $(x, z)$ has to be connected. Since constraints (1) and (3) ensure that this subgraph has $m - 1$ edges and exactly one vertex from each cluster, every feasible solution is a generalized spanning tree. Thus, projected onto the space of the $(x, z)$ variables, this formulation correctly models the GMSTP. We shall use $P_{flow}$ to denote the projection of the feasible set of the linear programming relaxation of this formulation onto the $(x, z)$-space.

Multicommodity flow formulation (Myung et al., 1995)

Using multicommodity flows yields a stronger relaxation. This directed multicommodity flow formulation was introduced by Myung et al. (1995). Its characteristic feature is that every cluster $k \in K_1$ defines a commodity: one unit of commodity $k$ originates from $V_1$ and has to be delivered to the cluster $V_k$. If $f^k_{ij}$ denotes the flow of commodity $k$ on the arc $(i, j)$, we obtain the following compact mixed integer linear programming model:

\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & z(V_k) = 1, && \forall\, k \in K = \{1, \dots, m\} \\
& x(E) = m - 1
\end{aligned}
\]



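\[
\begin{aligned}
& \sum_{a \in \delta^+(i)} f^k_a - \sum_{a \in \delta^-(i)} f^k_a =
  \begin{cases} z_i, & i \in V_1 \\ -z_i, & i \in V_k \\ 0, & i \notin V_1 \cup V_k \end{cases} && k \in K_1 \qquad (21) \\
& f^k_{ij} \le w_{ij}, && \forall\, a = (i, j) \in A,\ k \in K_1 \qquad (22) \\
& w_{ij} + w_{ji} = x_e, && \forall\, e = \{i, j\} \in E \\
& f^k_a \ge 0, && \forall\, a = (i, j) \in A,\ k \in K_1 \qquad (23)
\end{aligned}
\]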

$x, z \in \{0, 1\}$.

We shall use $P_{mcflow}$ to denote the projection of the feasible set of the linear programming relaxation of this formulation onto the $(x, z)$-space. Pop (2002) proved that the following relation holds between the feasible sets of the linear programming relaxations of these two flow formulations: $P_{mcflow} \subseteq P_{flow}$.

Bidirectional flow formulation (Pop, 2012)

Eliminating the variables $w_{ij}$ yields a closely related formulation, which consists of the constraints (1), (3), (21), (23) and the following:

\[
f^h_{ij} + f^k_{ji} \le x_e, \qquad \forall\, h, k \in K_1 \text{ and } \forall\, e = \{i, j\} \in E. \qquad (24)
\]

This model is called the bidirectional flow formulation of the GMSTP. We shall use $P_{bdflow}$ to denote its set of feasible solutions in the $(x, z)$-space. Once the variables $w_a$ are eliminated, the model is defined on the undirected graph $G = (V, E)$, even though for every commodity $k$ we allow the flows $f^k_{ij}$ and $f^k_{ji}$ in both directions on the edge $e = \{i, j\}$. Constraints (24), known as the bidirectional flow inequalities, link the flows of different commodities moving in opposite directions on the edge $\{i, j\}$. They capture the following observation, which is valid for any feasible generalized spanning tree: if the edge $\{i, j\}$ is removed, the vertices split into two sets, and all commodities whose destination lies in the set that does not contain the root cross the edge $\{i, j\}$ in the same direction. Consequently, whenever two commodities $h$ and $k$ both flow on the edge $\{i, j\}$, they do so in the same direction, and thus one of $f^h_{ij}$ and $f^k_{ji}$ equals zero. Pop (2012) proved that the following relation holds between the feasible sets of the linear programming relaxations of the multicommodity flow formulation and the bidirectional flow formulation: $P_{mcflow} = P_{bdflow}$.

Flow cut formulation (Feremans et al., 2002)

The flow cut formulation is closely related to the multicommodity flow formulation. The important difference is that the flow balance constraints (21) and constraints (22) are replaced by directed cutset constraints, obtained from Gale's (1957) necessary and sufficient condition for the existence of a feasible flow.

\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & z(V_k) = 1, && \forall\, k \in K = \{1, \dots, m\} \\
& w_{ij} + w_{ji} = x_e, && \forall\, e = \{i, j\} \in E \\
& w(\delta^-(V_1)) = 0 && \qquad (25) \\
& w(\delta^-(V_k)) \le 1, && \forall\, k \in K_1 \qquad (26) \\
& w(\delta^-(S)) \ge z(V_k \cap S) - z(V_1 \cap S), && \forall\, S \subset V,\ 1 \le |S| \le n - 1,\ k \in K_1 \qquad (27) \\
& x, z \in \{0, 1\}.
\end{aligned}
\]

The directed cutset constraints (27) together with constraints (1), (11), (25) and (26) ensure tree design and connectivity amongst all clusters. This model is known as the flow cut formulation of the GMSTP, and we shall use Pfcut to denote the set of feasible solutions in the (x, z)-space. The relationship between the polyhedron defined by the linear relaxation of the flow cut formulation of the GMSTP and some of the already defined polyhedrons is presented next:

\[
P_{spack} = P_{dspack} = P_{fcut} = P_{mcflow} = P_{bdflow}.
\]

We refer to Feremans et al. (2002) and Pop (2002) for more details.

Steiner tree formulation (Raghavan, 2002)

The Steiner tree formulation, introduced by Raghavan (2002), is a multicommodity flow formulation in which the GMSTP is transformed into a Steiner tree problem with degree constraints. The transformation is carried out as follows: an artificial root node $s_r$ and a set $T = \{t_1, t_2, \dots, t_m\}$ of artificial sink vertices, which must appear in the Steiner tree, are added to the graph. The artificial node $s_r$ is linked to each vertex $i \in V$ by a zero cost arc $(s_r, i)$, and every vertex $i \in V_k$ of a cluster $k = 1, \dots, m$ is linked by a zero cost arc $(i, t_k)$ to the matching artificial vertex $t_k$. In this new graph, a set of commodities $H$ is defined in such a way that, for each $k = 1, \dots, m$, there is exactly one commodity $h \in H$ originating at the vertex $s_r$ and ending at the vertex $t_k$. The Steiner tree model of the GMSTP is described as follows:

\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & \sum_{(i, j) \in A:\ i \notin V_k,\ j \in V_k} w_{ij} = 1, && \forall\, k \in K_1 \qquad (28) \\
& w_{ij} + w_{ji} \le 1, && \forall\, (i, j) \in A \qquad (29) \\
& \sum_{a \in \delta^+(i)} f^h_a - \sum_{a \in \delta^-(i)} f^h_a =
  \begin{cases} -1, & \text{if } i = O(h) \\ 1, & \text{if } i = D(h) \\ 0, & \text{otherwise} \end{cases} && \forall\, i \in V \text{ and } h \in H \qquad (30) \\
& 0 \le f^h_{ij} \le w_{ij}, && \forall\, (i, j) \in A,\ h \in H \qquad (31) \\
& w \in \{0, 1\}.
\end{aligned}
\]

The Steiner tree formulation of the GMSTP coincides with the standard multicommodity flow model for the Steiner tree problem, with the additional degree constraints (28), which guarantee an indegree of 1 for every cluster $V_k$ with $k \in K$. This formulation guarantees the construction of a minimum-cost directed Steiner tree containing the root vertex $s_r$, all the vertices in $T$, and exactly one vertex from every cluster. Raghavan (2002) proved that the Steiner tree model is equivalent to the multicommodity flow formulation.

4.4. The local-global formulation of the GMSTP (Pop, 2002)

An important and interesting formulation was introduced by Pop (2002). Its purpose is to distinguish between global connections, i.e., the connections between clusters, and local ones, i.e., the connections between vertices belonging to different clusters. This approach was named the local-global approach, and it decomposes the problem into two logical, natural subproblems: a macro-level (global) subproblem and a micro-level (local) subproblem. The method has been applied successfully to derive solution approaches for the GMSTP and for other generalized combinatorial optimization problems, e.g., Hintsch and Irnich (2018), Expósito-Izquierdo, Rossi, and Sevaux (2016), etc.

We denote by $G_{global} = (V_{global}, E_{global})$ the graph obtained from $G = (V, E)$ after replacing all the vertices of a cluster $V_i$ with a supervertex representing $V_i$. This graph is called the global graph and is defined as follows:

1. Each cluster $V_i$ of $G$ is considered as a vertex of $V_{global}$, with $|V_{global}| = m$.
2. Edges of the graph $G_{global}$ are defined between each pair of the graph vertices $V_1, \dots, V_m$.

For convenience, $V_i$ is identified with the supervertex that represents it. We introduce the binary variables $y_{ij}$, $i, j \in \{1, \dots, m\}$, in order to describe the global connections: $y_{ij} = 1$ if cluster $V_i$ is linked to cluster $V_j$ and $y_{ij} = 0$ otherwise, and we assume that $y$ describes a spanning tree. The convex hull of all these $y$-vectors is called the spanning tree polytope associated with the global graph with vertex set $\{V_1, \dots, V_m\}$, which we assume to be complete. Following Yannakakis (1991), this polytope, denoted by $P_{MST}$, can be described by the following polynomial number of constraints:

\[
\begin{aligned}
& \sum_{\{i, j\}} y_{ij} = m - 1 \\
& y_{ij} = \lambda_{kij} + \lambda_{kji}, && \text{for } 1 \le k, i, j \le m \text{ and } i \ne j \qquad (32) \\
& \sum_{j} \lambda_{kij} = 1, && \text{for } 1 \le k, i, j \le m \text{ and } i \ne k \qquad (33) \\
& \lambda_{kkj} = 0, && \text{for } 1 \le k, j \le m \qquad (34) \\
& y_{ij}, \lambda_{kij} \ge 0, && \text{for } 1 \le k, i, j \le m,
\end{aligned}
\]

where the binary variables $\lambda_{kij}$ are defined for every triple of nodes $k, i, j$ with $i \ne j \ne k$, and their value for a spanning tree is:

\[
\lambda_{kij} =
\begin{cases}
1 & \text{if } j \text{ is the parent of } i \text{ when we root the tree at } k \\
0 & \text{otherwise.}
\end{cases}
\]

Constraints (32) state that an edge $\{i, j\}$ is in the spanning tree if and only if either $i$ is the parent of $j$ or $j$ is the parent of $i$. Constraints (33) state that if the tree is rooted at $k$, then each vertex except $k$ has a parent. Constraints (34) state that the root vertex $k$ has no parent. If the vector $y$ describes a spanning tree of the global graph, the corresponding best (w.r.t. cost minimization) local solution $(x, z) \in \{0, 1\}^{|E| \times |V|}$ can be obtained in one of two ways: by dynamic programming (see Section 5.2 for further details) or by solving the following integer linear programming problem:

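\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & z(V_k) = 1, && \forall\, k \in K = \{1, \dots, m\} \\
& x(V_l, V_r) = y_{lr}, && \forall\, l, r \in K = \{1, \dots, m\},\ l \ne r \\
& x(i, V_r) \le z_i, && \forall\, r \in K,\ \forall\, i \in V \setminus V_r \\
& x_e, z_i \in \{0, 1\}, && \forall\, e = (i, j) \in E,\ \forall\, i \in V,
\end{aligned}
\]

where $x(V_l, V_r) = \sum_{i \in V_l,\, j \in V_r} x_{ij}$ and $x(i, V_r) = \sum_{j \in V_r} x_{ij}$.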
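For a fixed global spanning tree, the dynamic programming alternative mentioned above (and detailed in Section 5.2) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used by the cited authors: the function name `best_local_tree`, the list-of-lists cluster representation, and the cost dictionary keyed by `frozenset` pairs are hypothetical choices made for this example.

```python
from collections import defaultdict

INF = float("inf")


def best_local_tree(clusters, cost, global_edges, root=0):
    """Cheapest generalized spanning tree that follows a fixed global tree.

    clusters      list of vertex lists; clusters[k] is cluster k (assumed layout)
    cost          dict frozenset({u, v}) -> edge cost; a missing key means no edge
    global_edges  list of cluster index pairs (k, l) forming a spanning tree
    Returns (total cost, selected vertices, selected edges).
    """
    adjacent = defaultdict(list)
    for k, l in global_edges:
        adjacent[k].append(l)
        adjacent[l].append(k)

    # Orient the global tree away from the root cluster (breadth-first order).
    parent = {root: None}
    order = [root]
    for k in order:  # the list grows while we iterate over it
        for l in adjacent[k]:
            if l not in parent:
                parent[l] = k
                order.append(l)
    children = defaultdict(list)
    for l, k in parent.items():
        if k is not None:
            children[k].append(l)

    # W[v]: minimum cost of a generalized subtree rooted at vertex v.
    # Leaf clusters have no children, so their vertices get W[v] = 0.
    W, choice = {}, {}
    for k in reversed(order):
        for v in clusters[k]:
            total = 0.0
            for l in children[k]:
                best_u, best = None, INF
                for u in clusters[l]:
                    c = cost.get(frozenset((v, u)), INF) + W[u]
                    if c < best:
                        best_u, best = u, c
                choice[v, l] = best_u
                total += best
            W[v] = total

    r = min(clusters[root], key=W.__getitem__)
    if W[r] == INF:
        return INF, [], []  # no generalized tree follows this global tree

    # Walk back down the stored choices to recover the tree itself.
    vertices, edges, stack = [], [], [(r, root)]
    while stack:
        v, k = stack.pop()
        vertices.append(v)
        for l in children[k]:
            u = choice[v, l]
            edges.append((v, u))
            stack.append((u, l))
    return W[r], sorted(vertices), edges
```

Each vertex is compared with the vertices of its child clusters only, which is in line with the quadratic bound stated for this dynamic program in Section 5.2.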


For a given $y$, Pop et al. (2006a) showed the following result: if $y$ is the 0–1 incidence vector of a spanning tree of the global graph, then the polyhedron $P_{local}(y)$ is integral, where $P_{local}(y)$ denotes the feasible set of the linear programming relaxation of this model. To prove this, Pop et al. (2006a) showed that every solution of the linear programming relaxation can be expressed as a convex combination of solutions corresponding to spanning trees: $(x, z) = \sum_T \lambda_T (x^T, z^T)$. Taking all of the above observations into account, we arrive at a final formulation, known as the local-global formulation of the GMSTP, which is a 0–1 mixed integer programming problem where only the global variables $y$ are forced to take integer values:

\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & (x, z) \in P_{local}(y) \\
& y \in P_{MST} \\
& y_{lr} \in \{0, 1\}, && \forall\, 1 \le l, r \le m.
\end{aligned}
\]

This formulation of the GMSTP was obtained by incorporating the constraints characterizing $P_{MST}$, with $y \in \{0, 1\}$, into $P_{local}(y)$. The local-global formulation employs three types of variables to describe the links between the clusters of the graph. The first group, also used in the previous formulations, consists of the edge indicator variables $x_{ij}$, which describe the use of specific edges between vertices of the graph. The other two groups, $y_{lr}$ and $\lambda_{klr}$, describe links between clusters in the global graph. The variables $y_{lr}$ indicate whether the solution includes any of the edges directly linking the clusters $V_l$ and $V_r$. The variables $\lambda_{klr}$ are used to describe $m$ directed trees, one for every cluster of the global graph taken as a root.

4.5. The multigraph formulation of the GMSTP (de Sousa, de Andrade, & Santos, 2018)

The multigraph formulation of the GMSTP was introduced recently by de Sousa et al. (2018). It considers the multigraph $M(G) = (V', E')$ associated with the undirected graph $G = (V, E)$ and defined as follows:

1. each cluster of $G$ is considered as a vertex of $V'$, with $|V'| = m$,
2. each edge $e \in E$ corresponds to exactly one edge $e' \in E'$ with the same cost $c_{e'} = c_e$, so that $|E'| = |E|$.

The multigraph model describes arborescences, one for each vertex of $V'$ taken as a root, in such a manner that every root dictates an orientation of the edges in the solution, in accordance with the characterization theorem for forests in graphs of Adasme, Andrade, Letournel, and Lisser (2015). We use $\mu(v)$ to denote the cluster containing the vertex $v \in V$. Given an orientation away from the root cluster, we represent an edge $e = \{i, j\} \in E$ as an ordered pair $e = \langle i, j \rangle$ or, equivalently, $e' = \langle \mu(i), \mu(j) \rangle \in E'$. In addition to the binary variables $x_e$, $e \in E$, and $z_v$, $v \in V$, already introduced, de Sousa et al. (2018) considered for each vertex $k \in V'$ of the multigraph $M(G)$ and each edge $e \in E'$ the following binary variables:

\[
w^{k,0}_e =
\begin{cases}
1 & \text{if } x_e = 1 \text{ and the edge } e = \langle i, j \rangle \text{ is oriented from cluster } \mu(j) \text{ to cluster } \mu(i) \\
0 & \text{otherwise,}
\end{cases}
\]

\[
w^{k,1}_e =
\begin{cases}
1 & \text{if } x_e = 1 \text{ and the edge } e = \langle i, j \rangle \text{ is oriented from cluster } \mu(i) \text{ to cluster } \mu(j) \\
0 & \text{otherwise.}
\end{cases}
\]

The GMSTP can be formulated as the following compact integer linear programming problem, known as the multigraph formulation:

\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & (1), (3), (4), (5) \text{ and} \\
& x_e - w^{k,0}_e - w^{k,1}_e = 0, && \forall\, e \in E',\ \forall\, k \in V' \qquad (35) \\
& \sum_{e \in E',\, i = u} w^{k,0}_e + \sum_{e \in E',\, j = u} w^{k,1}_e = 1, && \forall\, k \in V',\ \forall\, u \in V' \setminus \{k\} \qquad (36) \\
& \sum_{e \in E,\, i = s} w^{k,0}_{\langle \mu(i), \mu(j) \rangle} + \sum_{e \in E,\, j = s} w^{k,1}_{\langle \mu(i), \mu(j) \rangle} = z_s, && \forall\, k \in V',\ \forall\, u \in V' \setminus \{k\},\ \forall\, s \in V_u \qquad (37) \\
& w^{k,0}_e = 0, && \forall\, k \in V' : k = i,\ \forall\, e \in E' \qquad (38) \\
& w^{k,1}_e = 0, && \forall\, k \in V' : k = j,\ \forall\, e \in E' \qquad (39) \\
& w^{k,0}_e, w^{k,1}_e \in \{0, 1\}, && \forall\, e \in E',\ \forall\, k \in V' \qquad (40)
\end{aligned}
\]

In this model, constraints (35) imply that, if the edge $e \in E'$ belongs to the solution, then either $w^{k,0}_e = 1$ or $w^{k,1}_e = 1$ describes its orientation with respect to every vertex $k \in V'$. Constraints (36) ensure that, for a given vertex (cluster) $k \in V'$ and for every vertex (cluster) $u \in V'$ with $u \ne k$, there is exactly one edge in $E'$ incident to $u$ whose orientation with respect to $k$ points to $u$. Constraints (37) ensure the same property as constraints (36) at the level of the vertex $s$ selected from the cluster $u \in V'$. These constraints also guarantee that if the vertex $s$ does not belong to the solution, then all the variables associated with the orientations of the edges incident to $s$ are zero. Finally, constraints (38) and (39) ensure that no edge is oriented from any other cluster towards $k$.

It is worth observing that, although the formulations above are stated for the GMSTP, they also hold for the PC-GMSTP: we only need to add the prize of selecting a vertex from a cluster to the objective function.

The GMSTP formulations described in this section are summarized in Table 1, which also reports the size of each formulation, i.e., the number of variables and constraints. Fig. 3 presents the known relationships between the polyhedrons defined by the linear relaxations of the corresponding GMSTP formulations. Some relationships are still missing, namely those relating the polyhedrons defined by the linear relaxations of the local-global formulation and of the multigraph formulation to the other polyhedrons. de Sousa et al. (2018) provided some results on the empirical quality of the LP bounds obtained by relaxing the multicommodity flow formulation, the local-global formulation and the multigraph formulation. For more information on this empirical comparative study we refer to de Sousa et al. (2018).

5. Exact algorithms

Exact algorithms are usually classified into three categories: direct methods, dynamic programming and integer linear programming. We next present three exact algorithms for solving the GMSTP, one from each of these categories.


Table 1. Summary of the GMSTP formulations.

| Formulations | Reference | Variables | Constraints |
| --- | --- | --- | --- |
| Generalized subtour elimination (SUB), generalized cutset (CUT) | Myung et al. (1995) | $O(n^2)$ | $O(2^n)$ |
| Generalized multicut (MCUT) | Pop (2002) | $O(n^2)$ | $O(2^n)$ |
| Cluster subpacking (CSUB), directed generalized cutset (DCUT), directed cluster subpacking (DCSUB) | Feremans et al. (2002) | $O(n^2)$ | $O(2^n)$ |
| Single commodity flow formulation (SFLOW) | Pop (2002) | $O(n^2)$ | $O(n^2)$ |
| Multicommodity flow formulation (MCFLOW) | Myung et al. (1995) | $O(n^3)$ | $O(n^3)$ |
| Bidirectional flow formulation (BDFLOW) | Pop (2002) | $O(n^3)$ | $O(n^3)$ |
| Flow cut formulation (FCUT) | Feremans et al. (2002) | $O(n^2)$ | $O(2^n)$ |
| Steiner tree formulation (StTREE) | Raghavan (2002) | $O(n^3)$ | $O(n^3)$ |
| Local-global formulation | Pop (2002) | $O(n^2 + m^3)$ | $O(nm + m^3)$ |
| Multigraph formulation | de Sousa et al. (2018) | $O(mn^2)$ | $O(mn^2)$ |

Fig. 3. Known relationship between the polyhedrons defined by the linear relaxations of corresponding GMSTP formulations.

5.1. The branch-and-cut algorithm (Feremans, Labbe, & Laporte, 2004)

Feremans et al. (2004) proposed a branch-and-cut algorithm for solving the GMSTP based on the undirected cluster subpacking formulation described in Section 4.1. Because the number of generalized subtour elimination constraints (8) is exponential, they are initially omitted and generated only when they are violated. Some further valid inequalities are also considered. The branch-and-cut algorithm described by Feremans et al. (2004) can be summarized as follows:

Step 1. (Initialization) The linear programming relaxation of the undirected cluster subpacking formulation, without the integrality constraints and the generalized subtour elimination constraints, is inserted in a list denoted by $L$. The best-known solution, denoted by $x_{best}$, is obtained using a Tabu Search heuristic.

Step 2. (Termination check and subproblem selection) If the list is empty, STOP; otherwise extract one subproblem from the list according to the best-first strategy.

Step 3. (Subproblem solution) Solve the subproblem with an LP solver and let $\bar{x}$ be its optimal solution. If the objective value of $\bar{x}$ is at least that of $x_{best}$, go to Step 2. Otherwise, if the solution is feasible, a local improvement procedure is called, and if the solution is fractional, a rounding procedure is called.

Step 4. (Separation procedure) Search for violated generalized subtour elimination constraints or other violated valid inequalities. If violations are identified, add the corresponding constraints to the subproblem and go to Step 3.

Step 5. (Branching) Construct two new subproblems by branching on a constraint $z(V_k) = 1$, where $V_k$ is a cluster containing at least one vertex $i$ for which $z_i$ is not integer, add these two subproblems to $L$ and go to Step 2.

Applying this method, Feremans et al. (2004) have solved to optimality all the considered Euclidean instances with up to 160 vertices, all the instances with random costs with up to 200 vertices and 150 out of 169 instances derived from TSPLIB given a limited time of two hours.


5.2. A dynamic programming based exact algorithm (Pop, 2002)

Given a spanning tree of the global graph $G_{global}$, referred to hereafter as a global spanning tree, Pop (2002) used dynamic programming to find the corresponding best (w.r.t. cost minimization) generalized spanning tree. We fix an arbitrary cluster $V_{root}$ as the root of the global spanning tree and direct all its edges away from $V_{root}$. A directed edge $\langle V_k, V_l \rangle$ of $G_{global}$ obtained in this way naturally defines an orientation $\langle i, j \rangle$ of every edge $\{i, j\} \in E$ with $i \in V_k$ and $j \in V_l$. Let $v$ be a vertex of the cluster $V_k$ for some $1 \le k \le m$. All such vertices are potential candidates to be incident to a global spanning tree edge. We use $T(v)$ to denote the subtree of $G$ rooted at such a vertex $v$ and $C(v)$ to denote the children of $v \in V_k$, i.e., those vertices $u \in V_l$ which are heads of the directed edges $\langle v, u \rangle$ in the orientation. Let $W(T(v))$ denote the minimum weight of a generalized subtree rooted at $v$. Our aim is to compute:

\[
\min_{r \in V_{root}} W(T(r)).
\]

The subproblems $W(T(v))$ are solved by means of a dynamic programming recursion. The initialization is:

\[
W(T(v)) = 0, \quad \text{if } v \in V_k \text{ and } V_k \text{ is a leaf of the global spanning tree.}
\]

To compute $W(T(v))$ for a vertex $v \in V$ belonging to an interior cluster, one needs to look at all the vertices of the clusters $V_l$ such that $C(v) \cap V_l \ne \emptyset$. If $u$ denotes a child of the interior vertex $v$, the recursion for $v$ is:

\[
W(T(v)) = \sum_{l:\ C(v) \cap V_l \ne \emptyset}\ \min_{u \in V_l}\, [\, c(v, u) + W(T(u)) \,].
\]

Thus, for a fixed $v$, at most $n$ vertices have to be examined. Consequently, for a given global spanning tree, the overall complexity of the dynamic programming algorithm is $O(n^2)$. Since, by Cayley's formula, the number of distinct global spanning trees is $m^{m-2}$, the following result holds: there exists a dynamic programming algorithm which provides an exact solution to the GMSTP in $O(m^{m-2} n^2)$ time, where $n$ is the number of vertices and $m$ is the number of clusters of the input graph.

5.3. The rooting procedure (Pop et al., 2006a)

Based on the local-global formulation, Pop et al. (2006a) proposed a solution procedure called the rooting procedure, which is described in what follows. Instead of the full 0–1 local-global mixed integer programming problem described in Section 4, we keep the constraints characterizing the polytope $P_{MST}$ only for a fixed $k$, $1 \le k \le m$, and obtain a relaxation denoted by $P^k$. In terms of Yannakakis' (1991) description of the global spanning tree polytope, this corresponds to randomly choosing a cluster $V_k$ and rooting the global tree only at $k$.

\[
(P^k) \qquad
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t. } & z(V_k) = 1, && \forall\, k \in K = \{1, \dots, m\} \\
& x(E) = m - 1 \\
& x(V_l, V_r) = y_{lr}, && \forall\, l, r \in K = \{1, \dots, m\},\ l \ne r \\
& x(i, V_r) \le z_i, && \forall\, r \in K,\ \forall\, i \in V \setminus V_r \\
& y_{ij} = \lambda_{kij} + \lambda_{kji}, && \forall\, 1 \le k, i, j \le m \text{ and } i \ne j,\ k \text{ fixed} \\
& \sum_{j} \lambda_{kij} = 1, && \forall\, 1 \le k, i, j \le m \text{ and } i \ne k,\ k \text{ fixed} \\
& \lambda_{kkj} = 0, && \forall\, 1 \le k, j \le m,\ k \text{ fixed} \\
& \lambda_{kij} \ge 0, && \forall\, 1 \le k, i, j \le m,\ k \text{ fixed} \\
& x_e, z_i \ge 0, && \forall\, e = (i, j) \in E,\ \forall\, i \in V \\
& y_{lr} \in \{0, 1\}, && \forall\, 1 \le l, r \le m.
\end{aligned}
\]

If the optimal solution of this relaxation (solved with CPLEX) is a generalized spanning tree, then it is an optimal solution of the GMSTP. Otherwise, the solution is a subgraph containing at least one cycle. In that case we add the corresponding constraints from the characterization of $P_{MST}$ in order to break that cycle (that is, the global tree is additionally rooted at a cluster belonging to the cycle), and we continue in this manner until the optimal solution of the GMSTP is obtained. With this methodology, Pop et al. (2006a) solved to optimality a set of 19 Euclidean instances with up to 160 vertices and a set of 29 non-Euclidean instances with random costs with up to 240 vertices.

6. Heuristic algorithms

Since the GMSTP is an NP-hard problem, a natural approach to solving it is by means of heuristic algorithms. One stream of research has consisted of developing heuristics that provide guarantees on the approximation. Most effort has, however, been devoted to the design of heuristics with good empirical performance. In this section we examine these two streams of heuristics.

6.1. Heuristic algorithms providing guarantees on the approximation

It is well known that some hard combinatorial optimization problems do not admit an approximation algorithm unless $P = NP$. To establish a result of this form, it suffices to show that the existence of an $\alpha$-approximation algorithm would allow some decision problem known to be NP-complete to be solved in polynomial time. By applying this scheme to the GMSTP, Pop (2012) obtained the following inapproximability result: assuming that $P \ne NP$, there exists no $\alpha$-approximation algorithm for the GMSTP. This is a different expression of a result that Myung et al. (1995) stated in terms of approximation algorithms, namely that even finding a near optimal solution for the GMSTP is NP-hard. Assuming that there is an $\alpha$-approximation algorithm APP for the GMSTP, where $\alpha \ge 1$, Pop (2012) showed that APP could also be used to solve the node cover problem for a given graph $G = (V, E)$ and an integer $k$ with $k < |V|$, which contradicts the assumption that $P \ne NP$. Nevertheless, under further assumptions, two heuristics providing guarantees on the approximation for the GMSTP are presented in this part of the paper.

6.1.1. An approximation algorithm for the GMSTP with bounded cluster size (Pop, 2002)

Under the following assumptions:

A1: the graph has bounded cluster size, i.e., $|V_k| \le \rho$ for all $k = 1, \dots, m$,
A2: the cost function is strictly positive and satisfies the triangle inequality, i.e., $c_{ij} + c_{jk} \ge c_{ik}$ for all $i, j, k \in V$,

Pop (2002) presented an approximation algorithm for the GMSTP with performance ratio $2\rho$. The approximation algorithm builds on the ideas of Slavik (1997), who addressed the Generalized Traveling Salesman Problem and the Group Steiner Tree Problem. The algorithm described by Pop (2002) for approximating the optimal solution of the GMSTP works as follows:

Input: A complete graph $G = (V, E)$ with a cost function on the edges which is strictly positive and satisfies the triangle inequality, and with the vertices partitioned into clusters $V_1, \dots, V_m$ of bounded size, $|V_k| \le \rho$.

Output: A tree $T \subset G$ spanning a set of vertices $W' \subset V$ which contains exactly one vertex from each cluster, and which approximates the optimal solution to the GMSTP.

1. Solve the linear programming relaxation of the generalized cutset formulation of the GMSTP and let $(x^*, z^*, Z_1^*) = ((x_e^*)_{e \in E}, (z_i^*)_{i=1}^{n}, Z_1^*)$ be the optimal solution.
2. Set $W^* = \{ i \in V \mid z_i^* \ge \frac{1}{\rho} \}$ and consider $W' \subset W^*$ with the property that $W'$ has exactly one vertex from each cluster, and find a minimum cost spanning tree $T \subset G'$ on the subgraph $G'$ generated by $W'$.
3. Output $APP = cost(T)$ and the generalized spanning tree $T$.
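Steps 2 and 3 of the algorithm above can be sketched in Python as follows, assuming that the LP optimum $z^*$ of Step 1 has already been computed by some solver. This is a hedged illustration rather than the author's code: the names `round_and_span`, `z_star` and the callable `cost` are hypothetical, and picking the vertex with the largest $z_i^*$ inside each cluster is just one admissible way of choosing $W' \subset W^*$. The minimum cost tree on the chosen vertices is built with Prim's algorithm.

```python
def round_and_span(clusters, cost, z_star, rho):
    """Threshold rounding of z* followed by a minimum spanning tree.

    clusters  list of vertex lists, each of size at most rho
    cost      callable cost(u, v) giving the cost of edge {u, v}
    z_star    dict vertex -> fractional LP value z_i* (assumed given)
    Returns (APP, chosen vertices, tree edges).
    """
    tolerance = 1e-9  # guards the comparison against round-off in z*
    chosen = []
    for cluster in clusters:
        # Since z*(V_k) = 1 and |V_k| <= rho, at least one vertex of the
        # cluster satisfies z_i* >= 1/rho, so W* meets every cluster.
        candidates = [v for v in cluster if z_star[v] >= 1.0 / rho - tolerance]
        if not candidates:
            raise ValueError("z_star violates z(V_k) = 1 for some cluster")
        # Assumption of this sketch: take the candidate with the largest z_i*.
        chosen.append(max(candidates, key=lambda v: z_star[v]))

    # Prim's algorithm on the complete subgraph generated by the chosen set.
    in_tree = {chosen[0]}
    edges, total = [], 0.0
    while len(in_tree) < len(chosen):
        u, v = min(
            ((a, b) for a in in_tree for b in chosen if b not in in_tree),
            key=lambda pair: cost(*pair),
        )
        in_tree.add(v)
        edges.append((u, v))
        total += cost(u, v)
    return total, chosen, edges
```

With Euclidean costs, `cost` can simply be `lambda u, v: math.dist(points[u], points[v])` for a hypothetical coordinate table `points`.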

Based on some auxiliary results, including the parsimonious property for the survivable network design problem, Pop (2002) proved that the performance ratio of the described approximation algorithm for approximating the optimum solution to the GMSTP satisfies:



\[
\frac{APP}{OPT} \le \left( 2 - \frac{2}{n} \right) \rho,
\]

where OPT denotes the cost of an optimal solution and APP the cost of the approximate solution.

6.1.2. A polynomial time approximation scheme for the GMSTP with grid clustering (Feremans, Grigoriev, & Sitters, 2006)

Feremans et al. (2006) considered a geometric version of the "at least" variant of the GMSTP in which the vertices of the graph $G$ are situated within a planar integer grid of dimension $p \times r$, the edge costs are the Euclidean distances between the vertices in the plane, and all the vertices belonging to the same grid cell form a cluster. For this case, Feremans et al. (2006) obtained the following results:

1. The problem is NP-hard even if all non-empty grid cells are connected and each grid cell contains at most two vertices. This result was obtained using a reduction from exact cover by 3-sets (X3C).
2. An exponential time exact algorithm based on dynamic programming solves the problem with connected non-empty grid cells. In addition, when one of the grid sizes, $p$ or $r$, is bounded by a constant, the dynamic programming algorithm solves the problem in polynomial time.
3. A polynomial time approximation algorithm, based on the dynamic programming algorithm, exists for the case in which all the non-empty grid cells are connected and the number of non-empty grid cells is superlinear in $p$ and $r$. Its performance ratio satisfies:

\[
\frac{APP - OPT}{OPT} \le \varepsilon,
\]

where $\varepsilon > 0$ is an accuracy parameter.

Regarding the approximation of the GMSTP, several related questions remain open and further research directions are worth investigating. The most challenging question is whether the geometric GMSTP admits a polynomial time approximation scheme when the non-intersecting squares have varying sizes. Further research directions are addressed in the section Concluding remarks.

6.2. Heuristic algorithms with good empirical performance

In this section, we focus on four heuristic algorithms known to provide the best generalized spanning tree solutions in an empirical sense: the Tabu Search (TS) heuristic developed by Oncan, Cordeau, and Laporte (2008), the GRASP solution approach proposed by Ferreira, Ochi, Parada, and Uchoa (2012), the multioperator genetic algorithm (MGA) described by Contreras-Bolton, Gatica, Barra, and Parada (2016a) and the two-level solution approach (TL-2GA) developed by Pop, Matei, Sabo, and Petrovan (2018), followed by a performance analysis of these algorithms.

Other heuristic algorithms described in the literature include: the Genetic Algorithm developed by Golden, Raghavan, and Stanojevic (2005), the Simulated Annealing based approach described by Pop (2002), the Memetic algorithm proposed by Pop (2012), the VNS combined with integer programming presented by Hu, Leitner, and Raidl (2008), the evolutionary algorithm with solution archives and bounding extensions described by Hu and Raidl (2012), and an automated technique that explores new heuristic combinations proposed by Contreras-Bolton et al. (2016b).

Corus, Lehre, Neumann, and Pourhassan (2016) provided a parameterized computational complexity analysis of the two bi-level approaches described by Hu et al. (2008) and Hu and Raidl (2012), examining which parameters of the GMSTP make it hard to solve by heuristic algorithms. They showed that using the global representation leads to fixed-parameter evolutionary algorithms with respect to the number of clusters.

6.2.1. The Tabu search solution approach (Oncan et al., 2008)

Tabu search is an iterative local search heuristic that starts from an initial solution and, at every iteration, moves from the current solution to the best one in a subset of its neighborhood, even if this move degrades the objective function value. In order to avoid cycling, solutions sharing certain characteristics with recently visited solutions are declared tabu for a certain number of iterations. One of the main features of the Tabu search approach proposed by Oncan et al. (2008) is an aspiration criterion that allows tabu solutions to be evaluated, some of which have never actually been considered during the search of the feasible solution space, thereby producing solution values better than those of the best known solutions. The proposed approach was tested on the existing GMSTP instances adapted from the TSPLIB instances with between 198 and 226 vertices, as well as on a new class of instances derived from TSPLIB using the Grid Clusterization procedure; see Oncan et al. (2008) for more details.

6.2.2. The GRASP solution approach (Ferreira et al., 2012)

Greedy Randomized Adaptive Search Procedure (GRASP) is a hybrid metaheuristic that involves two stages at each iteration: a first one in which a feasible solution is constructed and a second one in which the current solution is improved. The GRASP solution approach described by Ferreira et al. (2012) is based on five classes of constructive heuristics and three additional improvement mechanisms: local search, path relinking and iterative local search. The same authors described a pre-processing of the GMSTP instances which reduces the problem size significantly. The heuristic algorithms employed during the construction stage of the GRASP are: a random construction algorithm that

Please cite this article as: P.C. Pop, The generalized minimum spanning tree problem: An overview of formulations, solution procedures and latest advances, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.05.017

JID: EOR 12

ARTICLE IN PRESS

[m5G;June 10, 2019;23:28]

P.C. Pop / European Journal of Operational Research xxx (xxxx) xxx

simply chooses one vertex from every cluster and then determines the minimum tree spanning the selected vertices using Kruskal’s algorithm; two adaptations of Kruskal’s algorithm to the GMSTP; two adaptations of Prim’s algorithm to the GMSTP; two heuristics that select the vertices based on average distances; and a clustering method that initially groups the clusters into a given number of subsets considered as initial seeds and then associates each non-seeded cluster with the nearest seed cluster, so that every cluster belongs to a group.

The improvement mechanisms considered by Ferreira et al. (2012) are:

1. A local search procedure in which, beginning from an initial solution, a set of neighboring solutions is created and the best solution from this set is chosen according to the method described by Golden et al. (2005).
2. A path relinking mechanism whose aim is to identify intermediate solutions between two good solutions. The path relinking method uses an elite set which contains only the best solutions discovered during the execution and which is updated whenever the solution generated by the local search is better than the worst solution in the elite set (w.r.t. cost minimization). For more details concerning the path relinking method we refer to Yagiura, Ibaraki, and Glover (2006).
3. An iterated local search that has four components: an initial solution, a local search procedure, a perturbation strategy and a stopping criterion. The perturbation strategy plays a crucial role because, depending on its strength, the search trajectory may be driven to different attraction basins leading to diverse local optima. For more details concerning the iterated local search heuristic we refer to Stutzle (2006).

6.2.3. The multi-operator genetic algorithm (Contreras-Bolton et al., 2016a)

The multi-operator genetic algorithm (MGA) proposed by Contreras-Bolton et al.
(2016a) adopts a genotype-phenotype representation, where the genotype is a string of integers corresponding to the chosen vertices within the clusters and the phenotype is a minimum cost tree spanning the selected vertices, obtained using Kruskal’s algorithm. The main features of the proposed approach are the use of an efficient representation and the simultaneous use of two crossover operators and five mutation operators, which are randomly and dynamically selected based on their probabilities.

The genotype is based on the representation described by Haouari and Chaouachi (2006) and consists of a string of m integers. The kth position, 1 ≤ k ≤ m, may take an integer value from the interval [1, t], where t is the cardinality of the cluster V_k. The novelty of the proposed representation consists in applying the modulus with respect to the number of vertices of the cluster, so that the selected vertex index always falls within the range of vertices belonging to the cluster. To construct the phenotype from the set of selected vertices, Kruskal’s algorithm was implemented with the disjoint-set data structure.

The operators considered by Contreras-Bolton et al. (2016a) are two classical crossover operators (the order crossover and the two-point crossover) and five mutation operators: the classical Exchange Mutation, an ad hoc operator that randomly selects a cluster and exchanges the vertex belonging to that cluster with another one, and three local searches that carry out the same search procedure on the chromosome but differ in how exhaustive the search is. The genetic operators are selected randomly and dynamically and applied according to given probabilities.
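The genotype-to-phenotype step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of Contreras-Bolton et al. (2016a): the function names (`decode`, `kruskal`, `fitness`), the 0-based indexing, the toy instance and the assumption of a complete graph with a symmetric cost function are all hypothetical choices made here.

```python
from itertools import combinations


def decode(genotype, clusters):
    """Pick one vertex per cluster; the modulus keeps any integer gene in range."""
    return [cluster[gene % len(cluster)] for gene, cluster in zip(genotype, clusters)]


def kruskal(vertices, cost):
    """Return (total cost, edges) of a minimum spanning tree over `vertices`.

    Assumes a complete graph whose edge costs are given by cost(u, v).
    """
    parent = {v: v for v in vertices}  # disjoint-set forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total, tree = 0, []
    for u, v in sorted(combinations(vertices, 2), key=lambda e: cost(*e)):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            total += cost(u, v)
            tree.append((u, v))
    return total, tree


def fitness(genotype, clusters, cost):
    """Cost of the phenotype (minimum spanning tree) encoded by a genotype."""
    return kruskal(decode(genotype, clusters), cost)[0]


# Toy instance (hypothetical): five points in the plane, three clusters.
coords = {0: (0, 0), 1: (0, 5), 2: (1, 0), 3: (9, 9), 4: (2, 0)}
clusters = [[0, 1], [2, 3], [4]]


def manhattan(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return abs(x1 - x2) + abs(y1 - y2)


selected = decode([3, 2, 7], clusters)           # genes outside the range are wrapped
cheap = fitness([0, 0, 0], clusters, manhattan)  # selects vertices 0, 2 and 4
```

Because of the modulus, crossover and mutation may produce arbitrary non-negative integers without ever yielding an infeasible vertex selection, which is the practical benefit of this encoding.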

6.2.4. The two-level solution approach (Pop et al., 2018)

The two-level solution approach described by Pop et al. (2018) takes advantage of the particular structure of the GMSTP, namely that the vertices of the graph are split into a certain number of clusters, by decomposing the problem into two logical and natural smaller subproblems: a macro-level (global) subproblem and a micro-level (local) subproblem, which are solved individually. This solution approach provides computational advantages by employing efficient methods to solve the subproblems individually and by combining the results obtained into a solution of the GMSTP without any post-processing procedure.

The macro-level (global) subproblem aims at determining the global trees spanning the clusters (inter-cluster connections) and is defined on the global graph Gglobal introduced in Section 4.4. For the GMSTP, Pop et al. (2018) applied a genetic algorithm to the corresponding global graph in order to provide a collection of trees spanning the clusters. One of the major advantages of this approach is the considerable reduction of the solution space of the initial problem. The same authors proved that, given a global tree spanning the clusters (i.e., a tree in the global graph), finding the corresponding generalized spanning tree of minimum cost is easy. Several generalized spanning trees correspond to a given global spanning tree; among them there is one, called the best generalized spanning tree (w.r.t. cost minimization), which can be obtained either by dynamic programming or by solving the linear integer program also described in Section 4.4; see Pop et al. (2006a) and Pop (2012) for further details.
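To illustrate why a fixed global tree makes the remaining problem easy, the following Python sketch computes the cost of the best generalized spanning tree by a bottom-up dynamic program over the global tree. It is one common way to organise such a recursion and may differ in detail from the procedure given in the sections cited above; the function name `best_generalized_tree_cost`, the toy data and the assumption that every pair of vertices has a defined cost are hypothetical.

```python
def best_generalized_tree_cost(clusters, global_edges, cost, root=0):
    """Cheapest generalized spanning tree that follows a given global tree.

    clusters     : list of lists, clusters[k] holds the vertices of cluster k
    global_edges : (a, b) pairs of cluster indices forming a tree on the clusters
    cost         : function (u, v) -> cost of the edge between vertices u and v
    """
    adjacent = {k: [] for k in range(len(clusters))}
    for a, b in global_edges:
        adjacent[a].append(b)
        adjacent[b].append(a)

    def solve(k, parent):
        # table[v] = cheapest way to connect everything below cluster k,
        # given that v is the vertex selected from cluster k
        table = {v: 0 for v in clusters[k]}
        for child in adjacent[k]:
            if child == parent:
                continue
            child_table = solve(child, k)
            for v in table:
                table[v] += min(cost(v, u) + w for u, w in child_table.items())
        return table

    return min(solve(root, None).values())


# Toy instance (hypothetical): five points in the plane, three clusters.
coords = {0: (0, 0), 1: (0, 5), 2: (1, 0), 3: (9, 9), 4: (2, 0)}
clusters = [[0, 1], [2, 3], [4]]


def manhattan(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return abs(x1 - x2) + abs(y1 - y2)


# Two different global trees on the same three clusters.
path_cost = best_generalized_tree_cost(clusters, [(0, 1), (1, 2)], manhattan)
star_cost = best_generalized_tree_cost(clusters, [(0, 2), (1, 2)], manhattan)
```

In this sketch each global edge is processed once, comparing every vertex of the parent cluster with every vertex of the child cluster, so the work grows with the sum of the products of adjacent cluster sizes rather than with the number of possible vertex selections.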
The genetic algorithm that provides a collection of global trees spanning the clusters has two important features. First, it uses a representation at the level of the global graph in which the chromosome of every candidate solution (i.e., global spanning tree) is encoded as a unique Prüfer sequence of length m − 2. Second, unlike classical genetic algorithms, in which an individual is synonymous with a chromosome, Pop et al. (2018) considered individuals made up of a pair of chromosomes, thus ensuring a greater diversity of the potential feasible solutions.

The goal of the micro-level (local) subproblem is to determine, for each of the above-mentioned global spanning trees, the best tree (w.r.t. cost minimization) spanning a subset of vertices that includes exactly one vertex from every cluster, using the dynamic programming procedure described in Section 5.2.

6.2.5. Comparative analysis of performance

Table 2 sums up the computational results achieved by the aforementioned algorithms on a set of 101 benchmark instances generated by Oncan et al. (2008) and frequently used in the literature. These instances were adapted from TSPLIB using two procedures for partitioning the vertices into clusters: Cluster Centering and Grid Clusterization. In the former procedure, the vertices are split into m = [|V|/5] clusters, while in the latter the clusters include at least one vertex and at most [|V|/η] vertices, where η denotes the approximate number of vertices per cluster. The size of the instances varies from 229 to 783 vertices, and both procedures were applied: cluster centering, and grid clustering for η = 3, η = 5, η = 7 and η = 10.

Since the experiments were not carried out on the same machine, we explicitly mention the machines used to carry out the tests: the Tabu Search of Oncan et al. (2008) and the GRASP of Ferreira et al.
(2012) were tested on a Pentium IV PC with a 3 GHz CPU and 2 GB of RAM, while the multi-operator genetic algorithm was tested on an Intel Xeon E7-4830 CPU with 64 GB of RAM, and the two-level solution approach was executed on an Intel Pentium 4 computer at 1.8 GHz.


Table 2 Global results for the set of 101 instances often used in the literature.

| | TS Oncan et al. (2008) | GRASP Ferreira et al. (2012) | MGA Contreras-Bolton et al. (2016a) | TL-2GA Pop et al. (2018) |
|---|---|---|---|---|
| 21 instances – cluster centering procedure | | | | |
| Average gap | 0.0395 | 0.0291 | 0.0282 | 0.0263 |
| Number of hits | 12 | 15 | 16 | 17 |
| 80 instances – grid clustering procedure with η = 3, η = 5, η = 7 and η = 10 | | | | |
| Average gap | 0.0236 | 0.0095 | 0.0140 | 0.0065 |
| Number of hits | 15 | 16.5 | 16.75 | 17.75 |
| Average time in seconds | 736.85 | 59.13 | 49.43 | 948.26 |
| Scaled time in seconds | 94.52 | 7.28 | 49.43 | 59.26 |

It is important to mention that the Tabu Search and GRASP results were obtained in a single run of the algorithms with fixed parameters, as reported in Oncan et al. (2008) and Ferreira et al. (2012), while for the multi-operator genetic algorithm and the two-level solution approach the results are averages over 30 independent runs for each instance, as reported in Contreras-Bolton et al. (2016a) and Pop et al. (2018).

In Table 2, for each of the aforementioned heuristic approaches, we provide the average gap, the number of hits, and the originally reported average running time in seconds over the 101 instances. For a consistent performance analysis we also provide the average normalized running times, scaled as if all methods had been run on the same computer (Intel Pentium IV PC, 1.8 GHz).

An analysis of the results reported in Table 2 reveals that, in both classes of instances, the TL-2GA provides the best average gaps (0.0263 and 0.0065) and the highest number of hits (17 and 17.75), and is therefore the most effective of the four heuristics in producing high-quality solutions. Its average scaled computational time (59.26 s) is the third smallest, behind GRASP (7.28 s) and MGA (49.43 s).





7. Concluding remarks

The generalized minimum spanning tree problem is important for both theoretical and practical reasons. Different types of integer programming formulations have been proposed for the GMSTP, and two of these models, the local-global formulation and the multigraph formulation, are specially tailored to the problem. The best exact algorithms can only solve relatively small instances, and in practice heuristic algorithms are the preferred solution methodology. As regards the quality of the solutions achieved, the best method is the two-level solution approach, which decomposes the problem into two logical and natural smaller subproblems, a macro-level (global) one and a micro-level (local) one, which are solved individually. The two-level solution approach provides computational advantages by employing efficient methods for solving the subproblems and by combining the results obtained into a solution for the GMSTP without any post-processing procedure.

Next we present a number of research directions that we consider worth investigating:

• As we have seen in the case of the GMSTP, very different solution approaches have been suggested in the literature. In principle, they can be classified into exact algorithms, approximation algorithms and heuristic approaches. While exact algorithms in principle always yield proven optimal solutions, in practice they are typically applicable only to small instances because of the high runtimes caused by the complexity of the problem. Faster heuristic methods have been described for addressing larger instances. Nowadays, the leading methods for solving several complex optimization problems are hybrid approaches, which combine different solution techniques. We consider it worthwhile to investigate new hybrid algorithms that combine exact methods based on integer linear programming with local search based metaheuristics. Both streams of techniques have particular advantages and disadvantages that are to a large degree complementary, so it appears natural to combine them to obtain more effective algorithms.
• The local-global method, a bi-level approach that splits the problem into a macro-level problem and a micro-level problem which depend on each other, deserves increased attention, since it is an important source for developing efficient solution approaches for large-scale GMSTP instances. The same approach might also be applied to other problems belonging to the class of generalized combinatorial optimization problems.
• An interesting research direction concerns the parameterized computational complexity analysis of other bi-level approaches described in the literature. This would yield more information about the runtime behavior with respect to the size of the input and would allow examining the parameters of the GMSTP that make it hard to solve by heuristic algorithms.
• Regarding approximation algorithms for the GMSTP, it would be interesting to consider other assumptions that lead to polynomial time approximation schemes for the problem. Another research direction is to design fast constant-factor approximation algorithms for the geometric GMSTP with grid clustering. The most challenging open question is whether the geometric GMSTP admits a polynomial time approximation scheme when the non-intersecting squares have varying sizes.
• Concerning the two-level solution approach, the authors used a representation at the level of the global graph in which the chromosome of each candidate solution (i.e., global spanning tree) is represented as a unique Prüfer sequence of length m − 2, where m is the number of clusters. It would be interesting to analyze other representations of spanning trees for evolutionary search, such as the dandelion code, the happy code, the rainbow code, etc.
• Several integer and mixed integer programming formulations have been described to model the GMSTP; they are an important source for developing solution approaches, especially exact algorithms. Recently, a multigraph formulation was described which considers the multigraph M(G) associated with the undirected graph G. This novel model of the GMSTP opens new research directions in terms of its adaptation to other generalized combinatorial optimization problems as well as its use in decomposition and cutting plane methods.
• One should particularly investigate the generalized hop-constrained minimum spanning tree problem, a variant of the GMSTP that incorporates additional restrictions which limit the communication delay of the resulting network in regional or metropolitan area networks. The results on this variant are scarce and several research directions should be investigated: novel mathematical models, the design of approximation algorithms and heuristic approaches.
• Special attention should be paid to the novel variant proposed in Section 2.2, called the Selective Minimum Spanning Tree Problem (SMSTP), which is an extension of the GMSTP and assumes that the clusters are not mutually exclusive. The models and methods surveyed in this paper are not easily adaptable to the SMSTP, and therefore it would be interesting to investigate mathematical formulations and solution approaches for this new variant.
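For readers unfamiliar with the Prüfer-sequence representation mentioned in the list above, the following Python sketch decodes a sequence of length m − 2 into the m − 1 edges of the corresponding labelled tree on m cluster indices. It shows the standard decoding procedure only, not code from Pop et al. (2018); the function name `prufer_to_edges` and the 0-based labelling of clusters are assumptions made for illustration.

```python
import heapq


def prufer_to_edges(sequence):
    """Decode a Prüfer sequence over labels 0..m-1 into the edges of its tree."""
    m = len(sequence) + 2
    degree = [1] * m
    for label in sequence:
        degree[label] += 1  # each occurrence adds one incident edge

    # Labels that are currently leaves, smallest first.
    leaves = [label for label in range(m) if degree[label] == 1]
    heapq.heapify(leaves)

    edges = []
    for label in sequence:
        leaf = heapq.heappop(leaves)  # attach the smallest remaining leaf
        edges.append((leaf, label))
        degree[label] -= 1
        if degree[label] == 1:        # label has become a leaf itself
            heapq.heappush(leaves, label)

    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


# A sequence of length 4 encodes a global tree on m = 6 clusters.
tree = prufer_to_edges([3, 3, 3, 4])
```

Every sequence of length m − 2 over the m labels decodes to a valid spanning tree of the complete graph on the clusters, which is what makes the encoding convenient for crossover and mutation; note that a decoded tree may use an edge that is absent from a non-complete global graph, a case this sketch does not handle.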

As a concluding observation, the overall picture emerging from this review is that of a domain in which many contributions have been made in recent years, yet which still includes various topics worth further research. It is our hope that this study will encourage further research in this challenging field of combinatorial optimization.

Acknowledgments

The author is grateful to the anonymous referees for reading the manuscript very carefully and providing constructive comments which helped to substantially improve the paper.

References

Abilasha, S., & Mohan, A. (2016). A genetic algorithm based heuristic search on graphs with weighted multiple attributes. In Proceedings of ICNGIS Conference (pp. 1–6).
Adasme, P., Andrade, R., Letournel, M., & Lisser, A. (2015). Stochastic maximum weight forest problem. Networks, 65, 289–305.
Bertsimas, D. J. (1990). The probabilistic minimum spanning tree problem. Networks, 20, 245–275.
Chandy, K. M., & Russell, R. A. (1972). The design of multipoint linkages in a teleprocessing tree network. IEEE Transactions on Computers, 21, 1062–1066.
Contreras-Bolton, C., Gatica, G., Barra, C. R., & Parada, V. (2016a). A multi-operator genetic algorithm for generalized minimum spanning tree problem. Expert Systems with Applications, 50, 1–8.
Contreras-Bolton, C., Rey, C., Ramos-Cossio, S., Rodríguez, C., Gatica, F., & Parada, V. (2016b). Automatically produced algorithms for the generalized minimum spanning tree problem. Scientific Programming, 2016, 11. Article ID 1682925.
Corus, D., Lehre, P. K., Neumann, F., & Pourhassan, M. (2016). A parameterised complexity analysis of bi-level optimisation with evolutionary algorithms. Evolutionary Computation, 24(1), 183–203.
Demange, M., Ekim, T., Ries, B., & Tanasescu, C. (2015). On some applications of the selective graph coloring problem. European Journal of Operational Research, 240(2), 307–314.
Demange, M., Monnot, J., Pop, P. C., & Ries, B. (2014). On the complexity of the selective graph coloring problem in some special classes of graphs. Theoretical Computer Science, 540–541, 82–102.
Dijkstra, E. W. (1959). A note on the two problems in connection with graphs. Numerische Mathematik, 1, 269–271.
Dror, M., Haouari, H., & Chaouachi, J. (2000). Generalized spanning trees. European Journal of Operational Research, 120, 583–592.
Dror, M., & Haouari, M. (2000). Generalized Steiner problems and other variants. Journal of Combinatorial Optimization, 4, 415–436.
Duin, C., & Vos, S. (2004). Solving group Steiner problems as Steiner problems. European Journal of Operational Research, 154(1), 323–329.
Expósito-Izquierdo, C., Rossi, A., & Sevaux, M. (2016). A two-level solution approach to solve the clustered capacitated vehicle routing problem. Computers & Industrial Engineering, 91, 274–289.
Feremans, C., Grigoriev, A., & Sitters, R. (2006). The geometric generalized minimum spanning tree problem with grid clustering. 4OR, 4(4), 319–329.
Feremans, C., Labbe, M., & Laporte, G. (2001). On generalized minimum spanning trees. European Journal of Operational Research, 134(2), 457–458.
Feremans, C., Labbe, M., & Laporte, G. (2002). A comparative analysis of several formulations for the generalized minimum spanning tree problem. Networks, 39(1), 29–34.
Feremans, C., Labbe, M., & Laporte, G. (2003). Generalized network design problems. European Journal of Operational Research, 148(1), 1–13.
Feremans, C., Labbe, M., & Laporte, G. (2004). The generalized minimum spanning tree problem: Polyhedral analysis and branch-and-cut algorithm. Networks, 43(2), 71–86.

Ferreira, C. S., Ochi, L. S., Parada, V., & Uchoa, E. (2012). A GRASP-based approach to the generalized minimum spanning tree problem. Expert Systems with Applications, 39(3), 3526–3536.
Fischetti, M., Salazar-Gonzales, J. J., & Toth, P. (1995). The symmetric generalized traveling salesman polytope. Networks, 26(2), 113–123.
Fischetti, M., Salazar-Gonzales, J. J., & Toth, P. (1997). A branch-and-cut algorithm for the symmetric generalized traveling salesman problem. Operations Research, 45(3), 378–394.
Gale, D. (1957). A theorem on flows in networks. Pacific Journal of Mathematics, 7, 1073–1082.
Gamvros, I., Golden, B., Raghavan, S., & Stanojevic, D. (2005). Heuristic search for network design. In Tutorials on emerging methodologies and applications in operations research. International series in operations research & management science: vol. 76 (pp. 1–46).
Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. San Francisco, California: Freeman.
Ghiani, G., & Improta, G. (2000). An efficient transformation of the generalized vehicle routing problem. European Journal of Operational Research, 122(1), 11–17.
Golden, B., Raghavan, S., & Stanojevic, D. (2005). Heuristic search for the generalized minimum spanning tree problem. INFORMS Journal on Computing, 17(3), 290–304.
Golden, B., Raghavan, S., & Stanojevic, D. (2008). The prize-collecting generalized minimum spanning tree problem. Journal of Heuristics, 14, 69–93.
Haouari, M., Chaouachi, J., & Dror, M. (2005). Solving the generalized minimum spanning tree problem by a branch-and-bound algorithm. Journal of the Operational Research Society, 56(4), 382–389.
Haouari, M., & Chaouachi, J. S. (2006). Upper and lower bounding strategies for the generalized minimum spanning tree problem. European Journal of Operational Research, 172, 632–647.
He, J., Zhang, V., Huang, G., Shi, Y., & Cao, J. (2012). Distributed data possession checking for securing multiple replicas in geographically-dispersed clouds. Journal of Computer and System Sciences, 78, 1345–1358.
Hintsch, T., & Irnich, S. (2018). Large multiple neighborhood search for the clustered vehicle-routing problem. European Journal of Operational Research, 270(1), 118–131.
Hu, B., Leitner, M., & Raidl, G. R. (2008). Combining variable neighborhood search with integer linear programming for the generalized minimum spanning tree problem. Journal of Heuristics, 14(5), 473–499.
Hu, B., & Raidl, G. R. (2012). An evolutionary algorithm with solution archives and bounding extension for the generalized minimum spanning tree problem. In Proceedings of the fourteenth annual conference on genetic and evolutionary computation (pp. 393–400).
Ihler, E., Reich, G., & Widmayer, P. (1999). Class Steiner trees and VLSI-design. Discrete Applied Mathematics, 90, 173–194.
Ishii, H., Shiode, S., Nishida, T., & Namasuya, Y. (1981). Stochastic spanning tree problem. Discrete Applied Mathematics, 3, 263–273.
Kansal, A., & Torquato, S. (2001). Globally and locally minimal weight spanning tree networks. Physica A: Statistical Mechanics and its Applications, 301(1), 601–619.
Kruskal, J. B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7, 48–50.
Leitner, M. (2016). Layered graph models and exact algorithms for the generalized hop-constrained minimum spanning tree problem. Computers & Operations Research, 65, 1–18.
Maculan, N. (1987). The Steiner tree problem in graphs, surveys in combinatorial optimization. In S. Martello, G. Laporte, M. Minoux, & C. C. Ribeiro (Eds.), Annals of discrete mathematics: vol. 31 (pp. 185–212).
Miranda, P. A., Blazquez, C. A., Obreque, C., Maturana-Ross, J., & Gutierrez-Jarpa, G. (2018). The bi-objective insular traveling salesman problem with maritime and ground transportation costs. European Journal of Operational Research, 271(3), 1014–1036.
Myung, Y. S., Lee, C. H., & Tcha, D. W. (1995). On the generalized minimum spanning tree problem. Networks, 26, 231–241.
Narula, S. C., & Ho, C. A. (1980). Degree-constrained minimum spanning tree. Computers & Operations Research, 7, 239–249.
Nesetril, J., & Nesetrilova, H. (2012). The origins of minimal spanning tree algorithms – Boruvka and Jarnik. Documenta Mathematica, 127–141.
Oncan, T., Cordeau, J. F., & Laporte, G. (2008). A tabu search heuristic for the generalized minimum spanning tree problem. European Journal of Operational Research, 191, 306–319.
Pop, P. C. (2002). The generalized minimum spanning tree problem (Ph.D. thesis). Twente University Press, the Netherlands.
Pop, P. C. (2007). On the prize-collecting generalized minimum spanning tree problem. Annals of Operations Research, 150(1), 193–204.
Pop, P. C. (2012). Generalized network design problems, modelling and optimization. Germany: De Gruyter.
Pop, P. C., Kern, W., & Still, G. J. (2006a). A new relaxation method for the generalized minimum spanning tree problem. European Journal of Operational Research, 170, 900–908.
Pop, P. C., Marc, A. H., & Sitar, C. P. (2006b). The at least version of the generalized minimum spanning tree problem. Carpathian Journal of Mathematics, 22(1–2), 129–135.
Pop, P. C., Matei, O., Sabo, C., & Petrovan, A. (2018). A two-level solution approach for solving the generalized minimum spanning tree problem. European Journal of Operational Research, 265(2), 478–487.


Pop, P. C., Matei, O., & Sitar, C. P. (2013). An improved hybrid algorithm for solving the generalized vehicle routing problem. Neurocomputing, 109, 76–83.
Pop, P. C., Matei, O., Sitar, C. P., & Danciulescu, D. (2017). A genetic algorithm based solution approach to solve the prize-collecting generalized minimum spanning tree problem. In Proceedings of international conference on computers and industrial engineering (CIE). 11–13 October, Lisbon, Portugal, code 133146.
Prim, R. C. (1957). Shortest connection networks and some generalizations. Bell System Technical Journal, 36, 1389–1401.
Raghavan, S. (2002). On modeling the generalized minimum spanning tree. Technical report. University of Maryland.
Shyu, S. J., Yin, P. Y., Lin, B. M. T., & Haouari, M. (2003). Ant-tree: an ant colony optimization approach to the generalized minimum spanning tree problem. Journal of Experimental & Theoretical Artificial Intelligence, 15(1), 103–112.
Slavik, P. (1997). On the approximation of the generalized traveling salesman problem. University of Buffalo Working paper.


Sollin, G. (1961). Probleme de l’arbre minimum. Unpublished manuscript prepared for the C. Berge’s Paris seminar.
de Sousa, E. G., de Andrade, R. C., & Santos, A. C. (2018). A multigraph formulation for the generalized minimum spanning tree problem. In Proceedings of ISCO 2018. Lecture Notes in Computer Science, 10856, 133–143.
Stutzle, T. (2006). Iterated local search for the quadratic assignment problem. European Journal of Operational Research, 174(3), 1519–1539.
Xu, W. (1984). Quadratic minimum spanning tree problems and related topics. University of Maryland (Ph.D. dissertation).
Yagiura, M., Ibaraki, T., & Glover, F. (2006). A path relinking approach with ejection chains for the generalized assignment problem. European Journal of Operational Research, 169(2), 548–569.
Yannakakis, M. (1991). Expressing combinatorial optimization problems by linear programs. Journal of Computer and System Sciences, 43, 441–466.
