Information Processing Letters 52 (1994) 223-228

An optimal parallel algorithm for maximal matching

Pierre Kelsen ¹
Department of Computer Science, University of British Columbia, Vancouver, BC, Canada V6T 1Z4

Communicated by M.J. Atallah; received 18 August 1993

¹ The author was supported by the Natural Sciences and Engineering Research Council of Canada.

Abstract

We describe a new parallel algorithm for computing a maximal matching in a graph. The algorithm runs in time O(log^4 n) on (m + n)/log^4 n EREW PRAM processors. A variant of this algorithm computes a maximal matching in bipartite graphs in time O(log^3 n) on (m + n)/log^3 n EREW PRAM processors. This is the first deterministic algorithm for maximal matching that achieves a linear time-processor product. We also prove a general result linking the parallel complexity of maximal matching on bipartite graphs and general graphs.

Keywords: Maximal matching; Parallel algorithms
1. Introduction
A matching in a graph is a subset of edges in the graph no two of which have a common endpoint. A matching is maximal if it is not a proper subset of another matching. There is a trivial sequential algorithm that computes a maximal matching in linear time. Finding a fast parallel algorithm for this problem is more difficult. Since a maximal matching in a graph corresponds to a maximal independent set in its line graph, NC algorithms for the maximal independent set problem (e.g., [4]) yield NC algorithms for the maximal matching problem. A more efficient algorithm for computing a maximal matching in a general graph, based on a different approach, was provided by Israeli and Shiloach [5]. Their algorithm runs in O(log^3 n) time on n + m processors of a CRCW ARBITRARY PRAM and in O(log^4 n) time on n + m processors of an EREW PRAM. Israeli and Shiloach also have a randomized algorithm [6] that finds a maximal matching in expected time O(log n) on n + m CRCW COMMON PRAM processors.

We say that a parallel algorithm is optimal if its time-processor product is asymptotically equal to the fastest known worst-case running time of a sequential algorithm for the problem. The randomized algorithm mentioned above requires O(m + n) operations and can be made optimal using Brent's principle (Section 2). None of the deterministic parallel algorithms are optimal, however; they all require Ω(m log^2 n) operations. Up to this point the situation was very similar for the maximal independent set problem: while there is an optimal randomized algorithm [10], all deterministic NC algorithms require Ω((m + n) log^2 n) operations (bound matched by [4]).
We describe the first optimal deterministic parallel algorithm for computing a maximal matching in a graph. The time bound of our algorithm matches that of [5] on the EREW PRAM; for the special case of bipartite graphs we improve their time bound by a factor of log n. A tool in developing our algorithm is a decomposition of general graphs into bipartite subgraphs that is of independent interest. In the next section we introduce relevant notation and terminology. In Section 3 we describe a graph decomposition that is used to relate the complexity of maximal matching on bipartite graphs and general graphs. In Section 4 we describe an optimal parallel algorithm for maximal matching in bipartite graphs that generalizes to general graphs using the result of Section 3.
2. Preliminaries

We denote the vertex set and edge set of a graph G by V(G) and E(G), respectively, and we let n(G) = |V(G)| and m(G) = |E(G)|. If the graph G is understood we simplify the notation to V, E, n and m. Following [2] we consider all graphs to be directed. An undirected graph is a special case of a directed graph in which each undirected edge {u, v} is represented by two symmetric edges (u, v) and (v, u). The degree of a vertex v in an undirected graph is the number |{(v, w): (v, w) ∈ E}|. In a general directed graph the indegree (outdegree) of a vertex v is the number |{(w, v): (w, v) ∈ E}| (|{(v, w): (v, w) ∈ E}|). A path in a graph (V, E) is a sequence of vertices (v_0, v_1, ..., v_k) such that (v_i, v_{i+1}) ∈ E for 0 ≤ i < k. The path is open if v_0 ≠ v_k and closed otherwise; a closed path of the form (u, v, u) is called a trivial cycle. The path is simple if v_0, ..., v_{k-1} are distinct and v_1, ..., v_k are distinct.

We choose as model of computation the parallel random access machine (PRAM). In this model p synchronous processors have access to a common shared memory. We shall assume that no two processors access the same memory cell at the same time. This variant of the PRAM is also known as exclusive-read exclusive-write (EREW)
PRAM. For more information on this model the reader is referred to the survey paper [7]. We assume that the graphs are given by their adjacency lists. In this representation an array contains the concatenation of the lists of edges incident on the vertices, with each vertex pointing to the first entry of its list. If the graph is undirected then an edge (u, v) has a pointer to the symmetric edge (v, u). In this paper we shall use the fact that we can remove any subset of (marked) edges or vertices of a graph in time O(log n) on (m + n)/log n EREW processors. The following observation, known as Brent's scheduling principle (see [7]), is useful for developing optimal parallel algorithms: any asynchronous parallel algorithm taking time t that consists of a total of x elementary operations can be implemented by p processors within a time of ⌈x/p⌉ + t. We note that this result assumes that processor allocation is not a problem (see [7]). This will be true for all applications that we shall consider.
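Brent's principle is applied below in one recurring pattern; the following worked instance is an illustration of that pattern (the parameter choices mirror how the bound is used later, they are not taken from any specific lemma). An algorithm performing a total of x = O(m + n) operations in t = O(log n) parallel time fits on p = (m + n)/log n processors, since

\[
  \left\lceil \frac{x}{p} \right\rceil + t
  \;=\;
  \left\lceil \frac{O(m+n)}{(m+n)/\log n} \right\rceil + O(\log n)
  \;=\; O(\log n),
\]

so the time-processor product stays O(m + n), which is the sense in which the algorithms below are optimal.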
3. Bipartite decomposition

We describe a simple but useful decomposition of a general graph into bipartite subgraphs. We shall make use of this decomposition to relate the parallel complexity of maximal matching on general graphs to that on bipartite graphs. The decomposition described here may be generalized to hypergraphs [8].

A bipartite decomposition of a graph G is a collection {G_1, ..., G_k} of bipartite graphs such that V(G_i) = V(G) for all i and {E(G_1), ..., E(G_k)} is a partition of E(G). The integer k is the size of the decomposition. The following lemma states that every graph has a bipartite decomposition of small size that can be computed quickly in parallel.

Lemma 1. For any graph G a bipartite decomposition of size ⌈log n⌉ can be computed in constant time on n + m processors of an EREW PRAM.

Proof. Let the vertices of G be labeled 0 to n - 1. Thus each vertex label is a ⌈log n⌉-bit integer. Number the bits from 0 to ⌈log n⌉ - 1, starting at the rightmost (least significant) bit. With an edge e = (u, v) associate as label the index of the lowest bit position at which the labels of u and v differ. Denote by G_i the subgraph of G with vertex set V(G) whose edges are the edges labeled i in G. To see that the graph G_i is bipartite, let A (B) denote the set of vertices in G whose ith bit is 0 (1). Clearly, any edge labeled i joins a vertex in A with a vertex in B; hence G_i is bipartite. We conclude that the G_i's form a bipartite decomposition of G of size ⌈log n⌉. To compute the decomposition we associate a processor with each edge (u, v). A processor can compute the label of its edge with a constant number of standard PRAM operations (see [3, p. 36]). □
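The following sequential sketch illustrates the labeling used in the proof of Lemma 1; the function name and the edge-list representation are assumptions of this illustration (on the PRAM each edge is handled by its own processor in constant time).

```python
def bipartite_decomposition(n, edges):
    """Split the edge set of a simple graph on vertices 0..n-1 into
    ceil(log2 n) bipartite classes: edge (u, v) (with u != v) goes to the
    class indexed by the lowest bit position at which u and v differ."""
    k = max(1, (n - 1).bit_length())                 # number of bits in a vertex label
    classes = [[] for _ in range(k)]
    for (u, v) in edges:
        i = ((u ^ v) & -(u ^ v)).bit_length() - 1    # index of lowest set bit of u XOR v
        classes[i].append((u, v))
    # Each class G_i is bipartite: its edges join {w: bit i of w is 0}
    # to {w: bit i of w is 1}.
    return classes


# Example: the triangle on vertices 0, 1, 2 splits into two bipartite classes.
print(bipartite_decomposition(3, [(0, 1), (1, 2), (0, 2)]))
# [[(0, 1), (1, 2)], [(0, 2)]]
```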
The previous lemma yields the following relation between the parallel complexity of maximal matching in bipartite and general graphs.

Theorem 2. If there is a parallel algorithm that computes a maximal matching in bipartite graphs in time t(n) = Ω(log n), then there is a parallel algorithm that computes a maximal matching in a general graph in time O(t(n) log n). If the first algorithm is optimal, so is the second.

Proof. Let A be a parallel algorithm computing a maximal matching in a bipartite graph in time t(n). To compute a maximal matching in an arbitrary graph G we proceed as follows: first we compute a bipartite decomposition G_1, ..., G_k of size k = ⌈log n⌉ of G in constant time on n + m EREW processors as indicated in the proof of the last lemma. We can extract adjacency list representations for the graphs G_i from the adjacency list for G by sorting the edges by their labels: since the number of different labels is O(log n), the sorting can be done in O(log n) time on (m + n)/log n EREW processors using parallel bucket sort (see [3, p. 41]). We now compute a maximal matching X in G as follows:

X := Y := ∅;
for i := 1 to k do
  1. remove those vertices from G_i that are incident with an edge in X;
  2. compute a maximal matching Y in the resulting graph (algorithm A);
  3. X := X ∪ Y;

A straightforward inductive argument shows that after i iterations of the for-loop, X is a maximal matching in the graph (V, E(G_1) ∪ ... ∪ E(G_i)) and hence upon termination X is a maximal matching in G. Since t(n) = Ω(log n), the time required by one iteration of the for-loop is dominated by step 2, i.e., it is O(t(n)). Since k = O(log n), the algorithm runs in time O(t(n) · log n). If algorithm A is optimal, then the total number of operations of the general algorithm is O(m + n). By Brent's principle the general algorithm can be made to run in O(t(n) log n) time on (m + n)/(t(n) log n) processors. □
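A sequential sketch of the reduction in this proof, reusing bipartite_decomposition from the sketch above. The parameter bipartite_mm stands in for algorithm A and is an assumption of this illustration; any routine returning a maximal matching of a bipartite edge list will do.

```python
def maximal_matching_general(n, edges, bipartite_mm):
    """Sketch of the proof of Theorem 2: run a bipartite maximal-matching
    routine (algorithm A) on each class of the bipartite decomposition,
    skipping vertices that are already matched."""
    X = []                                     # the matching under construction
    matched = set()                            # vertices covered by X
    for G_i in bipartite_decomposition(n, edges):
        # Step 1: drop edges touching vertices already matched by X.
        residual = [(u, v) for (u, v) in G_i
                    if u not in matched and v not in matched]
        # Step 2: maximal matching of the residual bipartite graph.
        Y = bipartite_mm(residual)
        # Step 3: X := X ∪ Y.
        X.extend(Y)
        matched.update(x for e in Y for x in e)
    return X


# Usage with a trivial (sequential, greedy) stand-in for algorithm A:
def greedy_mm(edge_list):
    m, used = [], set()
    for (u, v) in edge_list:
        if u not in used and v not in used:
            m.append((u, v)); used.update((u, v))
    return m

print(maximal_matching_general(3, [(0, 1), (1, 2), (0, 2)], greedy_mm))
# [(0, 1)]
```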
4. Maximal matching in bipartite graphs
In this section we describe an optimal parallel algorithm for computing a maximal matching in a bipartite graph G. Another algorithm for this problem was proposed in [9]. That algorithm is slower than our algorithm by a log n factor and it is non-optimal, requiring Ω((m + n) log n) operations. The high-level structure of our algorithm is similar to that of previous algorithms and is described by the following pseudo-code:

X := ∅;
while G is nonempty do
  compute a matching Y in G and remove vertices incident on Y from G;
  X := X ∪ Y;

Below we describe a procedure match that computes a matching Y incident with at least m/6 edges in G. Thus, after O(log n) iterations of the above while-loop, G is empty and X is a maximal matching in G. Procedure match makes use of a procedure halve that halves the degree of each vertex. The procedure halve operates on a bipartite graph G'. The basic strategy is to compute an Euler partition of G', i.e., a decomposition of G' into
edge-disjoint paths with the property that each vertex of odd (even) degree is the endpoint of exactly 1 (0) open path. Two-coloring the edges on each path in the Euler partition and removing all edges of one color has the effect of halving the degree of each vertex in G' (see the following lemma). We assume that the edges (u, v) of G' incident on the same vertex u have been paired; we denote the partner edge of edge (u, v) by partner(u, v). If u has odd degree then exactly one edge (u, v) will have partner(u, v) = nil.
procedure halve;
1. for all edges (u, v) in G' in parallel do succ(u, v) := partner(v, u);
2. let V(G*) = {(u, v): (u, v) ∈ E(G')} and E(G*) = {((u, v), succ(u, v)): (u, v) ∈ E(G') ∧ succ(u, v) ≠ nil}. Color the vertices on each path of G* alternately blue and red in such a way that (u, v) and (v, u) receive the same color for all edges (u, v);
3. delete the red edges from G'.
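The following sequential sketch simulates the effect of halve on a simple bipartite graph given as a list of undirected edges; the function name and representation are assumptions of this illustration. The paper performs the pairing and the two-coloring of G* on an EREW PRAM (the coloring via list ranking, cf. Lemma 4 below); here the paths are simply walked one after another.

```python
from collections import defaultdict

def halve(edges):
    """Sketch of the effect of procedure halve: pair the edges incident on
    every vertex (an Euler partition into edge-disjoint paths), two-colour
    the edges alternately along each path, and delete one colour class, so
    every degree is (roughly) halved (cf. Lemma 3)."""
    incident = defaultdict(list)
    for e, (u, v) in enumerate(edges):
        incident[u].append(e)
        incident[v].append(e)

    # partner_at[e][x]: edge paired with edge e at endpoint x (None if unpaired).
    partner_at = [dict() for _ in edges]
    for x, lst in incident.items():
        for i in range(0, len(lst) - 1, 2):
            partner_at[lst[i]][x] = lst[i + 1]
            partner_at[lst[i + 1]][x] = lst[i]
        if len(lst) % 2 == 1:
            partner_at[lst[-1]][x] = None      # odd-degree vertex: one unpaired edge

    keep = [None] * len(edges)                 # True = blue (kept), False = red (deleted)

    def walk(start, enter, colour):
        """Follow the path of paired edges starting at `start`, entered
        through endpoint `enter`, alternating colours along the way."""
        e, x, c = start, enter, colour
        while e is not None and keep[e] is None:
            keep[e] = c
            u, v = edges[e]
            x = v if x == u else u             # leave e through its other endpoint
            e, c = partner_at[e][x], not c     # continue with the edge paired there

    for e, (u, v) in enumerate(edges):         # open paths: start at a loose end
        if keep[e] is None and partner_at[e][u] is None:
            walk(e, u, True)
        elif keep[e] is None and partner_at[e][v] is None:
            walk(e, v, True)
    for e, (u, v) in enumerate(edges):         # remaining edges lie on closed paths
        if keep[e] is None:
            walk(e, u, True)

    return [edges[e] for e in range(len(edges)) if keep[e]]
```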
Lemma 3. Let d(v) and d'(v) denote the degree of a vertex v in G' before and after running procedure halve on G'. If d(v) is even, then d'(v) = d(v)/2; else d'(v) ∈ {⌊d(v)/2⌋, ⌈d(v)/2⌉}.
Proof. Each vertex in G* has indegree and outdegree at most 1, i.e., G* is the vertex-disjoint union of open or closed simple paths. Since G is bipartite, G* is bipartite and each closed path in G* has even length. Thus, we can two-color each path of G* (with different colors for adjacent vertices). Note also that succ(u, v) = (v, w) implies partner(v, u) = (v, w) and hence succ(w, v) = (v, u). We conclude that there is a two-coloring of G* with the property that (u, v) and (v, u) receive the same color for (u, v) ∈ E, as required by step 2 of halve. If (u, w) = partner(u, v), then ((w, u), (u, v)) is an edge of G* and therefore (u, v) and (u, w) (same color as (w, u)) receive distinct colors. Thus, any two partner edges are colored differently. The claim of the lemma follows. □
Lemma 4. Procedure halve can be implemented to run in O(log n) time on (m + n)/log n EREW processors.

Proof. Steps 1 and 3 of halve are implemented within these resource bounds using standard methods. We assume that the vertices of G' have been labeled from 0 to n - 1 and the undirected edges in G' from 0 to m - 1. With each vertex in G* we associate as label the label of the corresponding undirected edge. We implement step 2 using the following substeps that are performed by each vertex x of G*: (1) compute the vertex of lowest label on x's path; (2) if x = (u, v) is a vertex of lowest label on its path and label(u) > label(v), mark x and all the vertices on its path; only unmarked vertices participate in the following two substeps; (3) if x lies on an open path it computes the distance from itself to the last vertex on its path, otherwise it computes the distance from itself to the vertex of lowest label on its path; (4) if the distance is even (odd), x = (u, v) colors itself blue (red) and gives the same color to the vertex in G* representing the symmetric edge (v, u).

Substep 4 can obviously be done in time O(log n) on (n + m)/log n EREW processors. For the other substeps we adapt an algorithm for optimal parallel list ranking given in [1]. The algorithm operates on a linked list of p elements. On a high level the algorithm consists of three stages: (1) splice out subsets of nonadjacent elements from the list so that at most O(p/log p) elements remain (or survive); (2) solve the list ranking problem on the reduced list using pointer doubling; (3) reconstruct the list by processing the elements that were spliced out. As pointed out in [1] it is straightforward to implement stages 2 and 3 to run optimally in O(log p) time. It is further shown in [1] that a judicious allocation of processors allows stage 1 to be done in O(log p) time on p/log p processors. The analysis in [1] in fact works for an arbitrary collection of (possibly cyclic) linked lists. To perform any of the first three substeps, we adapt each stage of the above three-stage algorithm to compute the information required by the particular substep. This is a routine exercise. The analysis of [1] implies that the resulting implementation of
step 2 of halve runs in O(log n) time on (m + n)/log n EREW processors. □

Algorithm match below computes a matching in a bipartite graph G that is incident with a large number of edges in G.

procedure match;
G' := G;
for all v ∈ V(G') in parallel do p[v] := nil;
while E(G') ≠ ∅ do
  1. for all v ∈ V(G') in parallel do: if v has a neighbor w of degree 1 in G' then p[v] := w;
  2. remove all degree 1 vertices from G';
  3. apply procedure halve to G';
Let S := {(v, p[v]): v ∈ V(G) ∧ p[v] ≠ nil}. The set S is the union of two matchings M_1 and M_2. Return the matching M (one of M_1 or M_2) incident with the largest number of edges in G.
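A sequential sketch of match, reusing the halve() sketch given after procedure halve above; the names and the edge-list representation are assumptions of this illustration. The split of S into the two matchings M_1 and M_2 is done here by walking the paths spanned by S, which is justified by Lemma 5 below.

```python
from collections import defaultdict

def match(edges):
    """Sketch of procedure match on a simple bipartite graph given as an
    edge list; returns a matching incident with at least m/6 of its edges."""
    p = {}                                     # p[v]: degree-1 neighbour recorded for v
    current = list(edges)
    while current:
        deg = defaultdict(int)
        for (u, v) in current:
            deg[u] += 1
            deg[v] += 1
        for (u, v) in current:                 # step 1: note degree-1 neighbours
            if deg[v] == 1:
                p[u] = v
            if deg[u] == 1:
                p[v] = u
        current = [(u, v) for (u, v) in current    # step 2: drop degree-1 vertices
                   if deg[u] > 1 and deg[v] > 1]
        current = halve(current)               # step 3: halve the remaining degrees

    # S = {{v, p[v]}} spans vertex-disjoint simple paths (Lemma 5); walk each
    # path and put its edges alternately into M_1 and M_2.
    s_adj = defaultdict(set)
    for v, w in p.items():
        s_adj[v].add(w)
        s_adj[w].add(v)
    M = [[], []]
    seen = set()
    for start in list(s_adj):
        if start in seen or len(s_adj[start]) != 1:
            continue                           # walk each path from one of its endpoints
        prev, cur, parity = None, start, 0
        seen.add(cur)
        while True:
            nxt = next((x for x in s_adj[cur] if x != prev), None)
            if nxt is None:
                break
            M[parity].append((cur, nxt))
            seen.add(nxt)
            prev, cur, parity = cur, nxt, 1 - parity

    # Return the matching incident with the larger number of original edges.
    def incident(matching):
        verts = {x for e in matching for x in e}
        return sum(1 for (u, v) in edges if u in verts or v in verts)
    return max(M, key=incident)
```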
for all u E V(G’) in parallel do p[ v] := nil; While E(G’) f @ do 1. for all u E V(G’) in parallel do: if v has a neighbor w of degree 1 in G’ then p[ v] := w; 2. remove all degree 1 vertices from G’; 3. apply procedure halve to G’; Let S := {(v, p[v]): v E V(G) ~p[v] # nil}. The set S is the union of two matchings M, and M2. Return the matching M (one of M, or M,) incident with the largest number of edges in G. Let G, be the graph consisting of the edges in S (defined in match). Lemma 5. The graph G, is the vertex disjoint union of open simple paths and trivial cycles. The number of edges of G incident on a vertex in G, is at least m/3. The matching M returned by match is incident with at least m/6 edges in G. Proof. Each vertex in G, has indegree and outdegree at most 1. Furthermore if p[u] = v and p[v] = w then either w = u, i.e., (u, v, u) is a trivial cycle, or p[vl := w was executed at an earlier iteration of the while-loop than p[u] := v.
Hence all cycles in G_S are trivial and the first statement of the lemma holds. Note that this implies that S is indeed the union of two matchings M_1 and M_2.

Let G_i denote the graph G' at the start of the ith iteration of the while-loop of match. Thus G_0 = G. Let d_i(v) denote the degree of v in G_i and let k_i(v) denote the number of degree 1 neighbors of v in G_i. We have d_{i+1}(v) ≤ (d_i(v) - k_i(v))/2 + 1/2 and hence d_i(v) ≥ 2d_{i+1}(v) + k_i(v) - 1. Let j be such that d_j(v) = 1. The last inequality implies that

  d_0(v) ≥ 2^j + Σ_{l=0}^{j-1} 2^l (k_l(v) - 1) ≥ Σ_{l=0}^{j-1} 2^l k_l(v).    (1)

Now suppose that v is an endpoint of an edge in G_S. Let P_v denote the set of vertices that become a degree 1 neighbor of v during the execution of match and that are not in G_S. We observe that these sets P_v form a partition of the vertices of positive degree in V(G) - V(G_S). Note that a vertex w ∈ P_v that has degree 1 in G_i has degree at most 2^{i+1} in G (by Lemma 3 and the fact that w has not seen any degree 1 neighbor). Thus Σ_{w∈P_v} d_0(w) ≤ Σ_{l=0}^{j-1} 2^{l+1} k_l(v). With Eq. (1) we see that d_0(v) ≥ (1/2) Σ_{w∈P_v} d_0(w). It follows that

  Σ_{v∈V(G_S)} d_0(v) ≥ (1/2) Σ_{v∈V(G)-V(G_S)} d_0(v)

and hence Σ_{v∈V(G_S)} d_0(v) ≥ 2m/3. The second claim of the lemma follows. Since S is the union of two matchings M_1 and M_2, one of these two matchings is incident with at least m/6 edges in G. The third claim of the lemma follows. □
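The first inequality of Eq. (1) follows by repeatedly substituting the bound d_i(v) ≥ 2d_{i+1}(v) + k_i(v) - 1 into itself; spelled out (this derivation is an illustration, not part of the original text), using d_j(v) = 1 and Σ_{l=0}^{j-1} 2^l = 2^j - 1:

\[
\begin{aligned}
d_0(v) &\ge 2 d_1(v) + k_0(v) - 1
        \ge 2\bigl(2 d_2(v) + k_1(v) - 1\bigr) + k_0(v) - 1
        \ge \dots \\
       &\ge 2^j d_j(v) + \sum_{l=0}^{j-1} 2^l \bigl(k_l(v) - 1\bigr)
        = 2^j + \sum_{l=0}^{j-1} 2^l \bigl(k_l(v) - 1\bigr) \\
       &= 1 + \sum_{l=0}^{j-1} 2^l k_l(v)
        \;\ge\; \sum_{l=0}^{j-1} 2^l k_l(v).
\end{aligned}
\]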
Lemma 6. Procedure match can be implemented to run in O(log^2 n) time on (m + n)/log^2 n EREW processors.

Proof. By Lemma 4 one iteration of the while-loop of match can be done optimally in time O(log n). As in the proof of the last lemma we denote by d_i(v) the degree of v at the start of the ith iteration of the while-loop. If d_i(v) = 1 then d_{i+1}(v) = 0 (by step 2) and if d_i(v) > 1 then d_{i+1}(v) ≤ (2/3)d_i(v) (by Lemma 3). Thus, the sum of all the degrees and hence the total number of edges in G' decreases by a factor of at least 3/2 during one iteration of the while-loop of match. Therefore, the total number of operations done in the while-loop by match is O(m + n). Finally the computation of M can be done optimally in time O(log n) by adapting the optimal list ranking algorithm of [1] (as discussed in the proof of Lemma 4). The claim of the lemma now follows with Brent's principle. □
Corollary 7. A maximal matching can be computed in a bipartite graph in O(log^3 n) time on (m + n)/log^3 n EREW processors.
Proof. To compute a maximal matching in G, repeat match O(log n) times as described at the beginning of this section. Since each iteration runs in O(log^2 n) time, the total time is O(log^3 n). The total number of operations is linear. An optimal algorithm running in O(log^3 n) time now follows with Brent's principle. □

Corollary 8. A maximal matching can be computed in a general graph in O(log^4 n) time on (m + n)/log^4 n EREW processors.

Proof. Combine Corollary 7 and Theorem 2. □
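A sequential sketch of the driver loop behind Corollary 7, reusing the match() sketch above (the function name is an assumption of this illustration): repeatedly take the matching returned by match and delete its vertices until no edges remain; each round removes a constant fraction of the remaining edges, so O(log n) rounds suffice.

```python
def maximal_matching_bipartite(edges):
    """Sketch of the high-level loop of Section 4: accumulate the matchings
    returned by match() until the bipartite graph is empty."""
    X = []
    remaining = list(edges)
    while remaining:
        Y = match(remaining)
        X.extend(Y)
        covered = {x for e in Y for x in e}
        remaining = [(u, v) for (u, v) in remaining
                     if u not in covered and v not in covered]
    return X


# Example on the complete bipartite graph K_{2,2}:
print(maximal_matching_bipartite([(0, 2), (0, 3), (1, 2), (1, 3)]))
# e.g. [(0, 2), (1, 3)]
```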
References

[1] R.J. Anderson and G.L. Miller, Deterministic parallel list ranking, in: VLSI Algorithms and Architectures, Proc. 3rd Aegean Workshop on Computing, Lecture Notes in Computer Science 319 (Springer, Berlin, 1988) 81-90.
[2] C. Berge, Graphs (North-Holland, Amsterdam, 2nd rev. ed., 1985).
[3] R. Cole and U. Vishkin, Deterministic coin tossing with applications to optimal parallel list ranking, Inform. and Control 70 (1986) 32-53.
[4] M. Goldberg and T. Spencer, Constructing a maximal independent set in parallel, SIAM J. Discrete Math. 2 (1989) 322-328.
[5] A. Israeli and Y. Shiloach, An improved parallel algorithm for maximal matching, Inform. Process. Lett. 22 (1986) 57-60.
[6] A. Israeli and Y. Shiloach, A fast and simple parallel algorithm for maximal matching, Inform. Process. Lett. 22 (1986) 77-80.
[7] R.M. Karp and V. Ramachandran, Parallel algorithms for shared memory machines, in: J. van Leeuwen, Ed., Handbook of Theoretical Computer Science, Vol. A (Elsevier, Amsterdam and MIT Press, Cambridge, MA, 1990) 869-941.
[8] P. Kelsen, On the parallel complexity of computing a maximal independent set in a hypergraph, in: Proc. 24th Ann. ACM Symp. on Theory of Computing (1992) 339-350.
[9] G. Lev, Size bounds and parallel algorithms for networks, Tech. Rept. CST-8-80, Dept. of Computer Science, University of Edinburgh, 1980.
[10] M. Luby, A simple parallel algorithm for the maximal independent set problem, SIAM J. Comput. 15 (1986) 1036-1053.