J. Parallel Distrib. Comput. 69 (2009) 221–229
Parallel bioinspired algorithms for NP complete graph problems
Israel Marck Martínez-Pérez ∗, Karl-Heinz Zimmermann
Computer Engineering, Hamburg University of Technology, Hamburg 21073, Germany
Article info
Article history: Received 26 September 2006; received in revised form 6 February 2008; accepted 30 June 2008; available online 19 July 2008.
Keywords: Parallel bioinspired algorithms; DNA-based computing; Sticker systems; NP complete problems
Abstract
It is no longer believed that DNA computing will outperform digital computers when it comes to the computation of intractable problems. In this paper, we emphasise the in silico implementation of DNA-inspired algorithms as the only way to compete with other algorithms for solving NP-complete problems. For this, we provide sticker algorithms for some of the most representative NP-complete graph problems. Their simple data structures and bit-vertical operations make them suitable for some parallel architectures. The parallel algorithms might solve either moderate-size problems in an exact manner or, when combined with a heuristic, large problems in polynomial time. © 2008 Elsevier Inc. All rights reserved.
1. Introduction
Although theoretically discussed by Head in 1987 [16], the concept of DNA Computing was not adopted until 1994, when Adleman solved a small instance of the Hamiltonian path problem with DNA [1]. Adleman's experiment attracted considerable interest from the scientific community, which hoped that the massive parallelism of DNA molecules would one day allow DNA computers to outperform electronic computers on complex combinatorial problems [3,8,18,23,25–28]. This vision was rapidly discarded when researchers realized the drawbacks of this incipient technology: a growing number of error-prone, time-consuming operations and exponential growth of the DNA volume with problem size [15]. Although some attempts were made to tackle these difficulties [5,22,24,29,33], no satisfactory solution was found, leaving the field too impractical to be considered a viable technology for solving intractable problems.
In this paper, rather than in vitro, we emphasise the in silico implementation of DNA-based algorithms as the only way to compete with other algorithms in combinatorics. For this, we provide sticker algorithms for solving classical NP-complete graph problems, i.e., the vertex cover, k-clique, independent set, 3-coloring, bipartite subgraph, k-matching, perfect matching, and edge dominating k-set problems. Such algorithms could make use of the parallelism offered by some computational architectures,
∗ Corresponding author. E-mail address: [email protected] (I.M. Martínez-Pérez).
0743-7315/$ – see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2008.06.014
eliminating the noisy, time-consuming operations of the in vitro approaches. We mitigate the combinatorial complexity by using small combinatorial input libraries [33], a strategy that might enable us to solve either moderate-size problems in an exact manner or, when combined with a heuristic, large problems in polynomial time.
The rest of the paper is organized as follows. In Section 2, we introduce the theoretical fundamentals of DNA Computing and the sticker model. Section 3 describes the data structures and the basic operations of the sticker algorithms. In Section 4, we describe some useful procedures for the construction of the algorithms. In Section 5, we introduce the algorithms for the vertex cover, k-clique, 3-coloring, independent set, bipartite subgraph, k-matching, perfect matching, and edge dominating k-set problems. Finally, in Section 6 we discuss some directions for the in silico implementation of the algorithms.
2. DNA and the sticker model
Deoxyribonucleic acid (DNA) is the molecule that encodes the information required to build a cell or an organism. DNA consists of two associated polynucleotide strands that wind together in the form of a double helix. It is composed of four nucleotides: adenine (A), guanine (G), cytosine (C), and thymine (T). Nucleotides are covalently bonded end-to-end in the 5′ to 3′ direction to form a single-stranded (ss) DNA molecule. To this end, each sugar is linked to the next via the phosphate group, creating a polymer chain composed of a repetitive sugar–phosphate backbone with its respective bases protruding from it. The associated antiparallel strand is bonded due to the complementary structures of the bases, in which A is paired with T by two hydrogen bonds, and G is paired with C by
I.M. Martínez-Pérez, K.-H. Zimmermann / J. Parallel Distrib. Comput. 69 (2009) 221–229
three hydrogen bonds. DNA molecules can be thought of as base strings allowing a number of useful operations such as matching, concatenation, and insertion. These operations are supported by modern techniques of DNA manipulation and serve as building blocks for universal computational models [19,20].
The sticker model is one of these models. It belongs to the so-called filtering models of classical DNA Computing, in which a separation or filtering operation is the central mechanism of computation [28]. In particular, the sticker model implements an idealized computing machine better known as a register machine, whose registers consist of single-stranded DNA molecules of fixed length intended to represent binary information. To this end, each data register is divided into several substrands (bits) of a fixed number of nucleotides. Furthermore, there is a set of sticker strands, each of which is complementary to exactly one substrand of the data register. A substrand with its sticker annealed represents a bit ‘on’; otherwise, it represents a bit ‘off’. The complex composed of a single-stranded DNA molecule and its associated stickers is called a memory complex. In this way, a memory complex may represent any binary number (of fixed length) by annealing the corresponding stickers at the required bit positions of the data register. A collection of memory complexes is called a tube, which can contain multiple copies of the same strand.
Fig. 1. (A) 4-bit memory complexes and (B) their associated stickers.
The sticker model (see Fig. 1) employs a set of operations to manipulate tubes: the combination of two tubes into a new tube (merge), the separation of a tube into two new tubes (separate), and the setting (set) and clearing (clear) of a given bit of every register in a tube. These operations are bit-vertical, as they manipulate one bit position of all memory complexes at a time.
Any finite sequence of these operations is termed a sticker algorithm, whose complexity is given by the total number of laboratory steps. Moreover, this set of operations is robust enough to guarantee computational completeness [20]. A sticker algorithm takes a set of bit strings, called the initial test tube, as input. Although the overall strategy of most sticker algorithms is to generate an initial test tube containing a large set of potential solutions (and then to remove all non-solutions), one could also generate an initial test tube with approximate solutions and construct the admissible solutions using a sticker program. At the end of the algorithm, one or more final test tubes contain the solutions, if any.
Originally, the sticker model introduced the concept of the [n, k] library as initial test tube, where n stands for the number of bits of each strand. The first k substrands encode the inputs by turning them on or off. The last n − k substrands are turned off; they are employed for internal calculations. In order to cover the complete combinatorial space of solutions, an [n, k] library must provide $2^k$ different memory strands. The main disadvantage of this library is that the number of strands necessary to represent all solutions grows exponentially with the size of the problem. Zimmermann [33] introduced small combinatorial input libraries, called $[m, \binom{n}{k}]$ libraries, for solving NP complete graph problems. Such a library consists of m-bit long strands, each of them formed so that the first n substrands have k substrands turned on and n − k substrands turned off. So this library provides an encoding of all subsets of k elements (k-subsets) of an n-set. The last m − n substrands are turned off; as in the previous library, they are employed for internal calculations.
3. Data structures and sticker operations
The sticker model is suitable for in silico implementation because of its simple data structures and operations. In the sticker model, the basic unit of information is a bit or substrand; in this document, both terms are used interchangeably. A memory complex is a bit string. A tube is an array of memory complexes. An initial $[m, \binom{n}{k}]$-tube contains $\binom{n}{k}$ memory complexes, where each memory complex is an m-bit long string in which k of the first n bits are turned on and the other n − k are turned off. The last m − n bits are initially turned off. We can manipulate a tube with the following operations:
(i) Merge. The merge(N1, N2, N) operation combines the content of two input test tubes (N1, N2) to produce a new test tube (N) containing the memory complexes of the previous tubes. Alternatively, we will use the instruction merge(N1, N) to indicate that tube N1 empties its content into tube N.
(ii) Separate. The separate(N, N+, N−, i) operation divides the content of test tube N into two test tubes N+ and N−. Those memory complexes whose ith substrand is set are placed into test tube N+; the rest are placed into test tube N−.
(iii) Set. The set(N, i) operation turns on the ith substrand of each memory complex of a test tube N.
(iv) Clear. The clear(N, i) operation turns off the ith substrand of each memory complex of a test tube N.
(v) Discard. The discard(N) operation empties the content of a test tube N.
4. Useful subroutines
We describe some useful procedures which serve as building blocks for the construction of more complex sticker algorithms. In the following, let G = (V, E) be a finite undirected graph with vertex set $V = \{v_1, \ldots, v_n\}$ and edge set $E = \{e_1, \ldots, e_m\}$. The ith edge is denoted by $e_i = \{v_{i_1}, v_{i_2}\}$. The graph information can be represented by either an adjacency or an incidence matrix.
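The data structures and operations of Section 3 can be sketched in a few lines of Python. This is only an illustrative sketch and not the StickerSim API: the names `library`, `merge`, `separate`, `set_bit`, and `clear_bit` are hypothetical, a memory complex is assumed to be a tuple of 0/1 values, and a tube is assumed to be a plain list (discard then amounts to replacing a tube by the empty list).

```python
from itertools import combinations

def library(m, n, k):
    """[m, C(n,k)] library: m-bit complexes with exactly k of the first n bits on."""
    tube = []
    for ones in combinations(range(n), k):
        bits = [0] * m
        for p in ones:
            bits[p] = 1
        tube.append(tuple(bits))
    return tube

def merge(*tubes):
    """Pour the contents of several tubes into one new tube."""
    return [c for t in tubes for c in t]

def separate(tube, i):
    """Split on the i-th substrand (1-indexed); returns the pair (N+, N-)."""
    plus = [c for c in tube if c[i - 1] == 1]
    minus = [c for c in tube if c[i - 1] == 0]
    return plus, minus

def set_bit(tube, i):
    """Turn on the i-th substrand of every complex in the tube."""
    return [c[:i - 1] + (1,) + c[i:] for c in tube]

def clear_bit(tube, i):
    """Turn off the i-th substrand of every complex in the tube."""
    return [c[:i - 1] + (0,) + c[i:] for c in tube]

# Usage: mark (in substrand 5) every 2-subset of a 4-set that contains element 1.
n0 = library(6, 4, 2)
plus, minus = separate(n0, 1)
n0 = merge(set_bit(plus, 5), minus)
```

Each operation touches one bit position of every complex, which is what makes the model bit-vertical and a natural fit for data-parallel hardware.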
All algorithms presented in this paper were programmed using the StickerSim library [30], a programming environment for the development of sticker algorithms. Although this library does not model the probabilistic nature of DNA computations (i.e., the kinetics and thermodynamics of DNA hybridization), it allows the logical behavior of sticker algorithms to be tested.
4.1. The EdgeInducedGraphs procedure
The algorithm EdgeInducedGraphs provides all subgraphs of a graph G which are induced by the k-subsets of edges [33]. Let F ⊆ E be a subset of edges of the graph G. The graph induced by F is a subgraph $G_F = (U, F)$ of G with vertex set U = {v ∈ V | v end point of e ∈ F}. We provide a sticker algorithm which determines all subgraphs of G induced by the k-subsets of the edge set E, where 1 ≤ k ≤ m. The input of the algorithm is an $[m + n, \binom{m}{k}]$ library N0 providing the encodings of all k-subsets of edges. The algorithm operates in a bit-vertical fashion, as it considers in parallel the ith substrand of the memory complexes, where 1 ≤ i ≤ m. For those memory complexes whose ith substrand is on, the edge ei
occurs in the corresponding k-set of edges. If the ith substrand is on, the substrands $m + i_1$ and $m + i_2$ are turned on, indicating the vertices of the edge ei. In the final test tube, the memory complexes correspond to the subgraphs of G which are induced by the k-sets of edges. The algorithm requires 4m steps.

Algorithm 1 EdgeInducedGraphs(N0, m)
Require: $[m + n, \binom{m}{k}]$ library N0
1: for i ← 1 to m do
2:   separate(N0, N+, N−, i)
3:   set(N+, m + i_1)
4:   set(N+, m + i_2)
5:   merge(N+, N−, N0)
6: end for
7: return N0

Fig. 2. A graph G.

For instance, in view of the graph in Fig. 2 and k = 2, the algorithm yields the following types of memory complexes.

e1 e2 e3 e4 v1 v2 v3 v4
1  1  0  0  1  1  1  0
1  0  1  0  1  1  1  0
1  0  0  1  1  1  1  1
0  1  1  0  1  1  1  0
0  1  0  1  1  0  1  1
0  0  1  1  0  1  1  1

Table 1 Computation of Weightening
Step              N0                       N1           N2                 N3           N4     N5
Initial           00000 10101 01111 11010  -            -                  -            -      -
i=1 (Sep. on 1)   00000 01111              10101 11010  -                  -            -      -
i=2 (Sep. on 2)   00000                    10101 01111  11010              -            -      -
i=3 (Sep. on 3)   00000                    -            11010 10101 01111  -            -      -
i=4 (Sep. on 4)   00000                    -            10101              11010 01111  -      -
i=5 (Sep. on 5)   00000                    -            -                  11010 10101  01111  -

Algorithm 3 Complement(N0, n)
Require: $[2n, \binom{n}{k}]$ library N0
1: for i ← 1 to n do
2:   separate(N0, N+, N−, i)
3:   set(N−, i + n)
4:   merge(N+, N−, N0)
5: end for
6: return N0

Table 2 Computation of Complement with n = 4 and k = 2
n1 n2 n3 n4 c1 c2 c3 c4
1  1  0  0  0  0  1  1
1  0  1  0  0  1  0  1
1  0  0  1  0  1  1  0
0  1  1  0  1  0  0  1
0  1  0  1  1  0  1  0
0  0  1  1  1  1  0  0
4.2. The Weightening procedure
The algorithm Weightening extracts from an input test tube N0 those memory complexes in which exactly k of the substrands m + 1, . . . , m + n are turned on [20,33]. At the end of the loop (1–7), the data test tube Ni, 0 ≤ i ≤ n, contains all memory complexes in which exactly i of the substrands m + 1, . . . , m + n are turned on. The data test tube Nk provides the output of the algorithm. The sticker algorithm requires $2\binom{n+1}{2} = n^2 + n$ steps.

Algorithm 2 Weightening(N0, m, n, k)
Require: input test tube N0
1: for i ← 0 to n − 1 do
2:   for j ← i down to 0 do
3:     separate(Nj, N+, N−, m + i + 1)
4:     merge(N+, Nj+1)
5:     merge(N−, Nj)
6:   end for
7: end for
8: return Nk

For instance, consider an input test tube N0 providing the encodings of the memory complexes 00000, 10101, 01111, 11010. Table 1 shows the computation of Weightening with m = 0 and n = 5. The output of the algorithm is the test tube Nk so that
the test tubes $N_{k+1}, \ldots, N_n$ are not required. Therefore, the second statement can be altered to
2: for j ← min{i, k} down to 0 do.
The correspondingly modified algorithm needs only $2(1 + 2 + \cdots + k + (n - k)(k + 1)) = 2n(k + 1) - k^2 - k$ steps.
4.3. The Complement procedure
The algorithm Complement yields the complements of all k-subsets of vertices in a graph G. Let S ⊆ V be a k-subset of the vertices of the graph G. The algorithm finds the complementary subset $\bar{S}$ of S in V. The input of the algorithm is a $[2n, \binom{n}{k}]$ library N0 providing the encodings of all k-subsets of vertices, where 1 ≤ k ≤ n. The algorithm turns on substrand i + n for those memory complexes whose ith substrand is turned off. As a result, at the end of the loop the second n substrands hold the complement of the subset encoded by the first n substrands. The algorithm requires 3n steps. For example, consider the library N0 with n = 4 and k = 2. The output of the algorithm is shown in Table 2.
4.4. The IncidenceRelation procedure
The algorithm IncidenceRelation provides the incidence relation between vertices and edges in a graph G.
Algorithm 4 IncidenceRelation(N0, l, u, m)
Require: $[(c + c')n + m, \binom{n}{k}]$ library N0, c > 0 and c′ ≥ 0 integers
1: for i ← l to u do
2:   separate(N0, N+, N−, i)
3:   for j ← 1 to m do
4:     if incident(i, j) then
5:       set(N+, u + j)
6:     end if
7:   end for
8:   merge(N+, N−, N0)
9: end for
10: return N0

Algorithm 5 IndependentSubset(N0, l, u)
Require: $[2n, \binom{n}{k}]$ library N0
1: for i ← l to u − 1 do
2:   for j ← i + 1 to u do
3:     separate(N0, N+, N−, i)
4:     separate(N+, N++, N+−, j)
5:     merge(N−, N+−, N0)
6:     if adjacent(i, j) then
7:       set(N++, (u + 1) + (j − l))
8:       clear(N++, j)
9:     end if
10:    merge(N++, N0)
11:  end for
12: end for
13: return N0

Table 3 Computation of IncidenceRelation
v1 v2 v3 v4 e1 e2 e3 e4
1  1  0  0  1  1  1  0
1  0  1  0  1  1  1  1
1  0  0  1  1  1  0  1
0  1  1  0  1  1  1  1
0  1  0  1  1  0  1  1
0  0  1  1  0  1  1  1
The input of the algorithm is an $[n + m, \binom{n}{k}]$ library N0 providing the encoding of all k-subsets of vertices (1 ≤ k ≤ n). For practical reasons, we introduce two additional parameters, i.e., the lower (l) and upper (u) bounds of the set of substrands of size n, with (c − 1)n + 1 ≤ l < u ≤ cn. This generalizes the procedure to n consecutive substrands of a $[(c + c')n + m, \binom{n}{k}]$ library. For those memory complexes whose ith substrand is turned on (tube N+), the algorithm verifies in parallel whether the vertex-edge pair (vi, ej) is incident and, in this case, turns on the (u + j)th substrand, corresponding to the incident edge (statements 4–5). At the end of the loop, the last m substrands provide the incidence relation (vi, ej) between vertices and edges. The algorithm requires at most n(m + 2) steps.
For example, in view of the graph G in Fig. 2 and the library N0 (with n = 4 and k = 2), the output of the algorithm is given in Table 3. It indicates that in the first memory complex, the set of incident edges for v1 and v2 is e1, e2, and e3 (e1 and e2 are incident with v1, while e1 and e3 are incident with v2).
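As a hedged illustration of IncidenceRelation, the following Python sketch reproduces Table 3. The edge list `EDGES` is an assumption: Fig. 2 is not reproduced here, so the edges were inferred from the worked examples (e1 = {v1, v2}, e2 = {v1, v3}, e3 = {v2, v3}, e4 = {v3, v4}). The function names and the tuple-based tube representation are hypothetical, not part of StickerSim.

```python
from itertools import combinations

# Assumed edge list of the graph in Fig. 2, inferred from the worked examples.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4)]  # e1, e2, e3, e4 as vertex pairs

def library(m, n, k):
    """[m, C(n,k)] library: m-bit complexes with exactly k of the first n bits on."""
    tube = []
    for ones in combinations(range(n), k):
        bits = [0] * m
        for p in ones:
            bits[p] = 1
        tube.append(tuple(bits))
    return tube

def incidence_relation(tube, l, u, edges):
    """For every vertex bit i in l..u that is on, switch on the bits u + j
    of all edges e_j incident with that vertex."""
    for i in range(l, u + 1):
        plus = [c for c in tube if c[i - 1] == 1]    # separate on bit i
        minus = [c for c in tube if c[i - 1] == 0]
        vertex = i - l + 1                           # vertex encoded by bit i
        for j, edge in enumerate(edges, start=1):
            if vertex in edge:                       # incident(i, j)
                pos = u + j                          # 1-indexed target bit
                plus = [c[:pos - 1] + (1,) + c[pos:] for c in plus]
        tube = plus + minus                          # merge
    return tube

# Usage: the [n + m, C(n,k)] library with n = 4, m = 4, k = 2.
for row in incidence_relation(library(8, 4, 2), 1, 4, EDGES):
    print("".join(map(str, row)))
```

The order of the complexes in the output tube may differ from Table 3, since a tube is an unordered collection.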
4.5. The IndependentSubset procedure
The algorithm IndependentSubset constructs an independent set in each subgraph with k vertices of a graph. An independent set in a graph G is a subset of vertices of G such that no two vertices in the set are connected. The input of the algorithm is a $[2n, \binom{n}{k}]$ library N0 providing the encodings of all k-subsets of vertices of G with 1 ≤ k ≤ n. Moreover, lower (l) and upper (u) bounds of the set of substrands of size n are given (1 ≤ l < u ≤ n). The algorithm collects the memory complexes whose ith and jth substrands are both turned on (tube N++) and checks whether the vertices vi and vj, with i < j, are adjacent. In the affirmative case, the jth substrand is cleared and the ((u + 1) + (j − l))th substrand is set, i.e., the vertex vj is removed from the k-subset and stored in the second set of substrands of size n. At the end of the algorithm, the first n substrands of each memory complex provide an independent subset of the initially given k-subset. The algorithm requires at most $n^2 + 6n$ steps. Notice that the constructed independent subsets depend on the ordering of the vertices.
For example, in view of the graph G in Fig. 2 and the library N0 (with n = 4 and k = 3), the algorithm yields the output tube.
v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  0  0  1  1  0
1  0  0  1  0  1  0  0
1  0  0  1  0  0  1  0
0  1  0  1  0  0  1  0
For instance, the second memory complex encodes the independent subset {v1, v4} of the subgraph spanned by the vertex set {v1, v2, v4}.
4.6. The MutuallyDisjointSets procedure
The algorithm MutuallyDisjointSets verifies whether the vertex set of a graph G is partitioned into two disjoint subsets.

Algorithm 6 MutuallyDisjointSets(N0, l, u)
Require: $[2n, \binom{2n}{k}]$ library N0
1: for i ← l to u do
2:   separate(N0, N+, N−, i)
3:   separate(N+, N++, N+−, i + u)
4:   separate(N−, N−+, N−−, i + u)
5:   merge(N+−, N−+, N0)
6:   discard(N++)
7:   discard(N−−)
8: end for
9: return N0
The input of the algorithm is a $[2n, \binom{2n}{k}]$ library N0 such that the first n substrands provide an encoding of the k-subsets of V. Likewise, the second n substrands give another encoding of subsets of V. Moreover, bounds l and u on the set of substrands of size n are given (1 ≤ l < u ≤ n). The algorithm separates the library N0 into four tubes N++, N+−, N−+ and N−−, which indicate the presence (+) or absence (−) of the vertices $v_i$ and $v_{i+u}$, respectively. The next statement merges the tubes that contain either vertex $v_i$ or vertex $v_{i+u}$, while the tubes N++ and N−− are discarded. This algorithm requires at most 6n steps.
For example, assume that an already processed N0 library with n = 4 and k = 2 exhibits the following contents.
v1 v2 v3 v4 v1 v2 v3 v4
1  1  0  0  1  1  0  0
1  0  1  0  0  1  0  1
1  0  0  1  1  0  0  1
0  1  1  0  1  0  0  1
0  1  0  1  0  1  0  1
0  0  1  1  1  1  0  0
The algorithm eliminates those memory complexes whose ith and (i + u)th substrands have the same value (l = 1 and u = 4). The algorithm produces the output tube.
v1 v2 v3 v4 v1 v2 v3 v4
1  0  1  0  0  1  0  1
0  1  1  0  1  0  0  1
0  0  1  1  1  1  0  0
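The filtering performed by MutuallyDisjointSets can be sketched as follows. This is a minimal illustration under the assumption that a tube is a list of 0/1 tuples; the helper `parse` and the function name are hypothetical. It keeps a complex only if, for every i, exactly one of the substrands i and i + u is on.

```python
def parse(rows):
    """Turn bit strings such as '10100101' into tuples of ints."""
    return [tuple(int(b) for b in r) for r in rows]

def mutually_disjoint_sets(tube, l, u):
    """Keep the complexes in which substrands i and i + u differ for all i in l..u."""
    for i in range(l, u + 1):
        plus = [c for c in tube if c[i - 1] == 1]            # separate on i
        minus = [c for c in tube if c[i - 1] == 0]
        plus_minus = [c for c in plus if c[i + u - 1] == 0]  # N+-
        minus_plus = [c for c in minus if c[i + u - 1] == 1] # N-+
        tube = plus_minus + minus_plus   # N++ and N-- are discarded
    return tube

# Usage: the example tube of Section 4.6 with l = 1 and u = 4.
n0 = parse(["11001100", "10100101", "10011001",
            "01101001", "01010101", "00111100"])
result = mutually_disjoint_sets(n0, 1, 4)
```

Because both N++ and N−− are dropped, the surviving complexes encode two subsets that are disjoint and together cover all vertices in the range l..u.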
5. NP complete graph problems
5.1. The vertex cover problem
The algorithm VertexCover determines if a graph G exhibits a vertex cover of size k. A vertex cover is a set of vertices in G that meets every edge of G. The problem of finding a vertex cover of size k or less in a graph G is NP-complete [21].

Algorithm 7 VertexCover(N0, m, n)
Require: $[n + m, \binom{n}{k}]$ library N0
1: IncidenceRelation(N0, 1, n, m)
2: Weightening(N0, n, m, m)
3: if ¬empty(N0) then
4:   return N0
5: else
6:   report "no vertex cover"
7: end if

The input of the algorithm is an $[n + m, \binom{n}{k}]$ library N0 providing the encodings of all k-subsets of vertices, where 1 ≤ k ≤ n. The algorithm first constructs for each k-subset of vertices the set of incident edges by using the procedure IncidenceRelation. These edges are stored in the last m substrands of the memory complexes. Those memory complexes in which all m substrands are turned on provide vertex covers. The memory complexes with this property are extracted by Weightening. The algorithm requires $nm + 2n + m^2 + m$ steps, O(n) tubes, and an initial library with $O\left(\binom{n}{k}\right)$ bit strings. Guo et al. [13] proposed an algorithm which solves the problem using an exponential initial library of $O(2^n)$ strands. The minimum vertex cover problem can be solved by invoking this algorithm for increasing parameters k = 1, . . . , n (resp. input test tubes) until the corresponding output test tube is nonempty.
For instance, a library N0 (with k = 2, n = 4, and m = 4) contains the memory complexes.

v1 v2 v3 v4 e1 e2 e3 e4
1  1  0  0  0  0  0  0
1  0  1  0  0  0  0  0
1  0  0  1  0  0  0  0
0  1  1  0  0  0  0  0
0  1  0  1  0  0  0  0
0  0  1  1  0  0  0  0

In view of the graph G in Fig. 2, IncidenceRelation produces the following tube.

v1 v2 v3 v4 e1 e2 e3 e4
1  1  0  0  1  1  1  0
1  0  1  0  1  1  1  1
1  0  0  1  1  1  0  1
0  1  1  0  1  1  1  1
0  1  0  1  1  0  1  1
0  0  1  1  0  1  1  1

From this tube, Weightening generates the tube.

v1 v2 v3 v4 e1 e2 e3 e4
1  0  1  0  1  1  1  1
0  1  1  0  1  1  1  1

So {v1, v3} and {v2, v3} are the vertex covers of G.
5.2. The k-clique problem
The algorithm k-Clique provides all cliques of size k in a graph G. A clique of G is a subgraph of G in which all vertices are connected. Such a subgraph is called complete. The problem of finding a clique of size k in a graph G is NP-complete [21].

Algorithm 8 k-Clique(N0, l, u)
Require: $[n, \binom{n}{k}]$ library N0
1: for i ← l to u − 1 do
2:   for j ← i + 1 to u do
3:     separate(N0, N+, N−, i)
4:     separate(N+, N++, N+−, j)
5:     merge(N−, N+−, N0)
6:     if adjacent(i, j) then
7:       merge(N++, N0)
8:     end if
9:     discard(N++)
10:  end for
11: end for
12: if ¬empty(N0) then
13:   return N0
14: else
15:   report "no k-Clique"
16: end if

The input of the algorithm is an $[n, \binom{n}{k}]$ library N0 providing the encodings of all k-subsets of vertices (1 ≤ k ≤ n), together with the lower (l) and upper (u) bounds of the set of substrands of size n (1 ≤ l < u ≤ n). The algorithm first calculates the tube N++ containing those memory complexes in which the ith and jth substrands are turned on. Then the algorithm checks if the corresponding vertices are adjacent. If not, the memory complexes are discarded (6–9). This algorithm requires at most $n^2 + 5n$ steps, O(1) tubes, and $O\left(\binom{n}{k}\right)$ bit strings. Previous approaches solved the problem using an exponential initial library [2,17,25]. The maximum clique problem can be solved by invoking k-Clique for decreasing parameters k = n, n − 1, . . . (resp. input test tubes) until the corresponding output test tube is nonempty. A different implementation is given by Zimmermann [33], requiring $4m + 2n(k + 1) - k^2 - k$ steps and $O\left(\binom{m}{\binom{k}{2}}\right)$ bit strings.
For instance, consider the graph G in Fig. 2 and the library N0 (with n = 4 and k = 3) providing the following memory complexes.

v1 v2 v3 v4
1  1  1  0
1  1  0  1
1  0  1  1
0  1  1  1

The final tube contains the memory strand.

v1 v2 v3 v4
1  1  1  0

Therefore, G contains a single 3-clique given by {v1, v2, v3}.
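The pairwise filtering of k-Clique can be sketched in Python as follows. The edge set `EDGES` is an assumption inferred from the worked examples, since Fig. 2 is not reproduced here, and all function names are hypothetical. The flag `keep_adjacent` is an illustrative addition: with `False` the adjacency test is negated, so that only vertex subsets without adjacent pairs survive.

```python
from itertools import combinations

# Assumed edge set of the graph in Fig. 2, inferred from the worked examples.
EDGES = {(1, 2), (1, 3), (2, 3), (3, 4)}

def adjacent(i, j):
    """True if vertices v_i and v_j are joined by an edge."""
    return (min(i, j), max(i, j)) in EDGES

def library(n, k):
    """[n, C(n,k)] library: all n-bit complexes with exactly k bits on."""
    return [tuple(1 if p in ones else 0 for p in range(n))
            for ones in combinations(range(n), k)]

def k_clique(tube, l, u, keep_adjacent=True):
    """For every pair i < j in l..u, drop the complexes containing both
    v_i and v_j unless adjacent(i, j) equals keep_adjacent."""
    for i in range(l, u):
        for j in range(i + 1, u + 1):
            both = [c for c in tube if c[i - 1] == 1 and c[j - 1] == 1]         # N++
            rest = [c for c in tube if not (c[i - 1] == 1 and c[j - 1] == 1)]   # N- and N+-
            if adjacent(i, j) == keep_adjacent:
                rest = rest + both   # merge N++ back into N0
            tube = rest              # otherwise N++ is discarded
    return tube

cliques = k_clique(library(4, 3), 1, 4)
independent = k_clique(library(4, 2), 1, 4, keep_adjacent=False)
```

The loop visits each vertex pair once, so the number of operations is quadratic in n while the tube never grows beyond the initial library.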
5.3. The independent vertex set problem
The algorithm k-Clique can be modified to provide the independent k-sets in a given graph G. An independent k-set of a graph G is a k-subset of the vertex set of G such that no two vertices in the k-subset are connected. The problem of finding an independent vertex set of size k in a graph G is NP-complete [21].
The input of the algorithm is an $[n, \binom{n}{k}]$ library N0 providing the encodings of all k-subsets of vertices (1 ≤ k ≤ n), together with the lower (l) and upper (u) bounds of the set of substrands of size n. In the algorithm k-Clique, line 6 of the code is replaced as follows,
6: if not(adjacent(i, j)) then
So the tube N++ yields all memory complexes in which the ith and jth vertex are not adjacent. This algorithm requires at most $n^2 + 5n$ steps, O(1) tubes, and $O\left(\binom{n}{k}\right)$ bit strings. Bach et al. [3] proposed an algorithm which solves the problem using a $1.51^n$ initial library and $O(n^2 m^2)$ laboratory steps. Using Stirling's approximation, it can be shown that our algorithm is better than that of Bach et al. if k ≤ n/7 or k ≥ n − n/7, that is, in 2 out of 7 cases. Fu [9] proposed an approximation algorithm for the independent set problem using a $1.23^n$ initial library and an additional operation, called append. This algorithm has a low error ratio for a sufficiently large number of vertices, but it is technically difficult to implement [4]. The maximum independent set problem can be solved by invoking our algorithm for decreasing parameters k = n, n − 1, . . . (resp. input test tubes) until the corresponding output test tube is nonempty.
For instance, consider the graph G in Fig. 2 and the library N0 (with n = 4 and k = 2). The output tube, providing the independent 2-subsets {v1, v4} and {v2, v4} of G, is given by

v1 v2 v3 v4
1  0  0  1
0  1  0  1

5.4. The 3-coloring problem
The algorithm 3-Coloring solves the 3-coloring problem in a graph G. A vertex coloring of G is an assignment of labels or colors to each vertex in G such that no edge connects two identically colored vertices. A 3-coloring of G is a vertex coloring of G with at most three colors. The problem of finding a 3-coloring of a graph is NP-complete [21].

Algorithm 9 3-Coloring(N0, n)
Require: $[n + 2n, \sum_{i=1}^{k} \binom{n}{i}]$ library N0
1: IndependentSet(N0, 1, n)
2: Complement(N0, n)
3: IndependentSubset(N0, n + 1, 2n)
4: IndependentSet(N0, 2n + 1, 3n)
5: if ¬empty(N0) then
6:   return N0
7: else
8:   report "no 3-coloring subgraph"
9: end if

The input of the algorithm is an $[n + 2n, \sum_{i=1}^{k} \binom{n}{i}]$ library N0, where k = ⌊n/3⌋, providing the encodings of all k-subsets of vertices. The algorithm decomposes the vertex set of G into three disjoint independent subsets such that each subset can be given a different color. Firstly, IndependentSet selects those k-subsets of vertices which are independent. Secondly, Complement provides the complement of each independent subset, storing the complementary set in the second n substrands. Thirdly, IndependentSubset constructs an independent subset of the subgraphs spanned by those vertices which are given by the second n substrands. Fourthly, IndependentSet eliminates those memory complexes in which vertices given by the third n substrands are adjacent. The algorithm requires $3n^2 + 15n$ steps, O(1) tubes, and $O\left(\sum_{i=1}^{k} \binom{n}{i}\right)$ bit strings. The best DNA-based algorithm proposed so far for the 3-coloring problem is due to Bach et al. [3]. Both algorithms use the same initial library, but that of Bach et al. uses an enhanced DNA model including an additional operation, called append, requiring in total $O(n^2 + m^2)$ steps. Fu [9] developed an approximation algorithm for the 3-coloring problem (via dynamic programming) using a $1.35^n$ initial library. This algorithm also has a low error ratio for a sufficiently large number of vertices, but its implementation is technically difficult [4].
For example, pick a library N0 with n = 4 and k = 1 having the memory complexes.

v1 v2 v3 v4 v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  0  0  0  0  0  0  0  0  0
0  1  0  0  0  0  0  0  0  0  0  0
0  0  1  0  0  0  0  0  0  0  0  0
0  0  0  1  0  0  0  0  0  0  0  0

In view of the graph G in Fig. 2, IndependentSet leaves the tube invariant. Then Complement yields the tube.

v1 v2 v3 v4 v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  0  0  1  1  1  0  0  0  0
0  1  0  0  1  0  1  1  0  0  0  0
0  0  1  0  1  1  0  1  0  0  0  0
0  0  0  1  1  1  1  0  0  0  0  0

After this, IndependentSubset provides independent subsets given by the second set of substrands,

v1 v2 v3 v4 v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  0  0  1  0  1  0  0  1  0
0  1  0  0  1  0  0  1  0  0  1  0
0  0  1  0  1  0  0  1  0  1  0  0
0  0  0  1  1  0  0  0  0  1  1  0

Finally, IndependentSet applied to the last n substrands yields the tube.

v1 v2 v3 v4 v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  0  0  1  0  1  0  0  1  0
0  1  0  0  1  0  0  1  0  0  1  0
0  0  1  0  1  0  0  1  0  1  0  0

For instance, the first memory complex encodes a decomposition of the vertex set of G into three independent subsets: {v1}, {v2, v4}, and {v3}.
5.5. The bipartite subgraph problem
The algorithm BipartiteSubgraph determines whether a graph G has a k-bipartite subgraph. A k-bipartite subgraph of G is a bipartite subgraph of G containing k vertices. A bipartite graph is a graph in which the vertex set can be partitioned into two disjoint subsets such that no two vertices of the same subset are adjacent.
Algorithm 10 BipartiteSubgraph(N0, n)
Require: $[2n, \binom{2n}{k}]$ library N0
1: IndependentSet(N0, 1, n)
2: IndependentSet(N0, n + 1, 2n)
3: MutuallyDisjointSets(N0, 1, n)
4: if ¬empty(N0) then
5:   return N0
6: else
7:   report "no bipartite subgraph"
8: end if

Fig. 3. A bipartite graph G.

The problem of finding a k-bipartite subgraph in a graph is NP-complete [11].
The input of the algorithm is a $[2n, \binom{2n}{k}]$ library N0. Each memory complex in N0 encodes two subsets of the vertices in G, the first and second of which are given by the first and second n substrands, respectively. So each memory complex corresponds to a decomposition of a k-subset of the vertex set into two sets. Firstly, both subsets of such a k-subset are subjected to IndependentSet, which retains those subsets that are independent in G. Secondly, memory complexes in which both subsets are independent in G serve as input of MutuallyDisjointSets, providing those memory complexes in which the independent subsets are disjoint. So the final memory complexes encode k-bipartite subgraphs of G. The algorithm requires $2n^2 + 10n + 6n = 2n^2 + 16n$ steps, O(1) tubes, and $O\left(\binom{2n}{k}\right)$ bit strings. This appears to be the first DNA-based solution of the bipartite subgraph problem.
For instance, take a library N0 with n = 4 and k = 4 having $\binom{8}{4}$ different types of memory complexes. To simplify the example, assume that N0 consists of the following memory complexes,
v1 v2 v3 v4 v1 v2 v3 v4
1  1  1  1  0  0  0  0
1  1  0  0  1  1  0  0
0  0  1  1  0  0  1  1
1  0  1  0  1  0  1  0
0  1  0  1  0  1  0  1
1  0  0  1  0  1  1  0
0  1  1  0  1  0  0  1
0  0  0  0  1  1  1  1
1  1  0  0  0  0  1  1
0  0  1  1  1  1  0  0

In terms of the graph G in Fig. 3, the first IndependentSet statement yields the tube.

v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  1  0  1  1  0
0  1  1  0  1  0  0  1
0  0  0  0  1  1  1  1

The second IndependentSet statement provides the tube.

v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  1  0  1  1  0
0  1  1  0  1  0  0  1

MutuallyDisjointSets gives the final tube.

v1 v2 v3 v4 v1 v2 v3 v4
1  0  0  1  0  1  1  0
0  1  1  0  1  0  0  1
So the graph G is a 4-bipartite graph with vertex subsets {v1, v4} and {v2, v3}.
5.6. The k-matching problem
The algorithm kMatching solves the k-matching problem in a graph G. A k-matching in G is a k-subset of edges in G such that no two of them share a common vertex. The problem of finding a k-matching in a graph is NP-complete [21].

Algorithm 11 kMatching(N0, m, n, k)
Require: $[m + n, \binom{m}{k}]$ library N0
1: EdgeInducedGraphs(N0, m)
2: Weightening(N0, m, n, 2k)
3: if ¬empty(N0) then
4:   return N0
5: else
6:   report "no kMatching"
7: end if
The input of the algorithm is an $[m + n, \binom{m}{k}]$ library $N_0$ providing the encodings of all $k$-subsets of edges, where $1 \le k \le n/2$. Firstly, the induced subgraphs from all $k$-subsets of edges are calculated. Such a subgraph provides a $k$-matching if and only if it has $2k$ vertices. Thus Weightening retains exactly those subgraphs with $2k$ vertices. The algorithm requires $4m + 2n(2k + 1) - 4k^2 - 2k$ steps, $O(n)$ tubes, and $O\binom{m}{k}$ bit strings. A somewhat different approach in vivo was proposed for the matching problem [10]. Although interesting from the biotechnological point of view, this technique requires an $O(2^n)$ initial library.

For example, take the library $N_0$ (with $k = 2$, $m = 4$, and $n = 4$) having the following memory complexes.
e1  e2  e3  e4  v1  v2  v3  v4
1   1   0   0   0   0   0   0
1   0   1   0   0   0   0   0
1   0   0   1   0   0   0   0
0   1   1   0   0   0   0   0
0   1   0   1   0   0   0   0
0   0   1   1   0   0   0   0
In view of the graph G in Fig. 2, EdgeInducedGraphs yields the following memory complexes.

e1  e2  e3  e4  v1  v2  v3  v4
1   1   0   0   1   1   1   0
1   0   1   0   1   1   1   0
1   0   0   1   1   1   1   1
0   1   1   0   1   1   1   0
0   1   0   1   1   0   1   1
0   0   1   1   0   1   1   1
Weightening provides the final tube,
e1  e2  e3  e4  v1  v2  v3  v4
1   0   0   1   1   1   1   1
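The kMatching example can be traced with the following Python sketch. It is an illustration under assumptions, not the authors' implementation: the edge list is read off the EdgeInducedGraphs tube of the example (assumed to agree with Fig. 2), and the functions `initial_library`, `edge_induced_graphs`, and `keep_weight` are hypothetical helpers whose parameters do not claim to match those of the sticker operations.

```python
from itertools import combinations

# Assumed example graph: e1 = v1v2, e2 = v1v3, e3 = v2v3, e4 = v3v4.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4)]
N = 4  # number of vertices


def initial_library(m, n, k):
    """All bit strings of length m + n with exactly k of the first m bits on."""
    library = []
    for chosen in combinations(range(m), k):
        bits = [0] * (m + n)
        for j in chosen:
            bits[j] = 1
        library.append(bits)
    return library


def edge_induced_graphs(tube, m):
    """Turn on the vertex bits of both endpoints of every selected edge."""
    for bits in tube:
        for j in range(m):
            if bits[j]:
                u, v = EDGES[j]
                bits[m + u - 1] = 1
                bits[m + v - 1] = 1
    return tube


def keep_weight(tube, start, length, weight):
    """Keep complexes with exactly `weight` bits on in bits[start:start+length]."""
    return [b for b in tube if sum(b[start:start + length]) == weight]


def k_matching(m, n, k):
    tube = initial_library(m, n, k)
    tube = edge_induced_graphs(tube, m)
    # A k-subset of edges is a matching iff it covers exactly 2k vertices.
    return keep_weight(tube, m, n, 2 * k)


RESULT = k_matching(len(EDGES), N, 2)
```

Calling `k_matching(len(EDGES), N, N // 2)` gives the perfect matching variant, since only the target weight changes.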
So the subgraph induced by the edges e1 and e4 is the only 2-matching in G.

5.7. The perfect matching problem

A small change in the kMatching algorithm solves the perfect matching problem. Let n > 0 be an even integer. A perfect matching in a graph G with n vertices is an n/2-matching in G. The problem of finding a perfect matching in a graph is also NP-complete [21].

Algorithm 12 PerfectMatching($N_0$, m, n)
Require: $[m + n, \binom{m}{k}]$ library $N_0$
EdgeInducedGraphs($N_0$, m)
Weightening($N_0$, m, n, n)
if ¬empty($N_0$) then
    return $N_0$
else
    report "no perfect matching"
end if
initial library. The minimum edge dominating k-set problem can be solved by invoking this algorithm for increasing parameters k = 1, ..., n (resp. input test tubes) until the corresponding output test tube is nonempty.

For example, take the library $N_0$ (with k = 1, m = 4, and n = 4) having the following memory complexes.

e1  e2  e3  e4  v1  v2  v3  v4  e1  e2  e3  e4
1   0   0   0   0   0   0   0   0   0   0   0
0   1   0   0   0   0   0   0   0   0   0   0
0   0   1   0   0   0   0   0   0   0   0   0
0   0   0   1   0   0   0   0   0   0   0   0
In terms of the graph G in Fig. 2, EdgeInducedGraphs produces the tube.
The input of the algorithm is an $[m + n, \binom{m}{k}]$ library $N_0$, with $k = n/2$, providing the encodings of all $n/2$-subsets of edges. In contrast to the kMatching algorithm, Weightening receives $n$ as its last parameter instead of $2k$. The algorithm requires $4m + 2n(n + 1) - n^2 - n = 4m + n^2 + n$ steps, $O(n)$ tubes, and $O\binom{m}{k}$ bit strings. Chen et al. [6] proposed a solution of the perfect matching problem using the surface-based DNA computing model. For this, the DNA molecules of the solution space are attached to a solid carrier, and biochemical reactions are used to establish the solution. This model requires an $O(2^n)$ initial library.
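The step count of PerfectMatching can be checked against that of kMatching: since only the last parameter of Weightening changes, substituting $2k = n$ into the kMatching count should give the same expression. A short worked derivation, written out here as a consistency check:

```latex
\begin{align*}
\bigl[\,4m + 2n(2k+1) - 4k^2 - 2k\,\bigr]_{2k = n}
  &= 4m + 2n(n+1) - n^2 - n \\
  &= 4m + 2n^2 + 2n - n^2 - n \\
  &= 4m + n^2 + n .
\end{align*}
```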
e1  e2  e3  e4  v1  v2  v3  v4  e1  e2  e3  e4
1   0   0   0   1   1   0   0   0   0   0   0
0   1   0   0   1   0   1   0   0   0   0   0
0   0   1   0   0   1   1   0   0   0   0   0
0   0   0   1   0   0   1   1   0   0   0   0
Then IncidenceRelation yields the tube.

e1  e2  e3  e4  v1  v2  v3  v4  e1  e2  e3  e4
1   0   0   0   1   1   0   0   1   1   1   0
0   1   0   0   1   0   1   0   1   1   1   1
0   0   1   0   0   1   1   0   1   1   1   1
0   0   0   1   0   0   1   1   0   1   1   1
Finally, Weightening gives the tube in which the last m substrands are turned on
5.8. The edge dominating k-set problem

The algorithm EdgeDominatingSet calculates all dominating edge sets of size k in a graph G. Let S be a k-subset of edges in G. Let $N_G[S]$ denote the set of edges in G which are in S or adjacent to an edge in S. If $N_G[S]$ is the edge set of G, then S is said to be k-dominating. The problem of finding a k-dominating edge set in a graph is NP-complete [32].
e1  e2  e3  e4  v1  v2  v3  v4  e1  e2  e3  e4
0   1   0   0   1   0   1   0   1   1   1   1
0   0   1   0   0   1   1   0   1   1   1   1
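The three stages of the edge dominating example, together with the search for a minimum edge dominating set by increasing k, can be sketched in Python as follows. This is an illustrative sketch only: the edge list is read off the EdgeInducedGraphs tube of the example (assumed to agree with Fig. 2), and `edge_dominating_sets` and `minimum_edge_dominating_sets` are hypothetical names that fold the sticker operations into plain loops.

```python
from itertools import combinations

# Assumed example graph: e1 = v1v2, e2 = v1v3, e3 = v2v3, e4 = v3v4.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4)]
N = 4            # number of vertices
M = len(EDGES)   # number of edges


def edge_dominating_sets(k):
    """Bit strings of length m + n + m encoding all k-dominating edge sets."""
    tube = []
    for chosen in combinations(range(M), k):
        bits = [0] * (M + N + M)
        # Library entry plus induced subgraph: mark edges and their endpoints.
        for j in chosen:
            bits[j] = 1
            for v in EDGES[j]:
                bits[M + v - 1] = 1
        # Incidence step: mark every edge touching a marked vertex.
        for j, (u, v) in enumerate(EDGES):
            if bits[M + u - 1] or bits[M + v - 1]:
                bits[M + N + j] = 1
        # Weight filter: keep the complex only if all m edges are marked.
        if sum(bits[M + N:]) == M:
            tube.append(bits)
    return tube


def minimum_edge_dominating_sets():
    """Try k = 1, ..., n and stop at the first nonempty output tube."""
    for k in range(1, N + 1):
        tube = edge_dominating_sets(k)
        if tube:
            return k, tube
    return None


K_MIN, TUBE = minimum_edge_dominating_sets()
```

For the example graph the search already stops at k = 1, which matches the final tube of the example.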
Therefore, {e2} and {e3} are 1-dominating edge sets in G.

6. Discussion
Algorithm 13 EdgeDominatingSet($N_0$, m, n)
Require: $[m + (n + m), \binom{m}{k}]$ library $N_0$
EdgeInducedGraphs($N_0$, m)
IncidenceRelation($N_0$, m + 1, m + n, m)
Weightening($N_0$, m + n, m, m)
if ¬empty($N_0$) then
    return $N_0$
else
    report "no edge dominating set"
end if
The input of the algorithm is an $[m + (n + m), \binom{m}{k}]$ library $N_0$ providing the encodings of all $k$-subsets of edges, where $1 \le k \le n$. Firstly, EdgeInducedGraphs provides all subgraphs in $G$ which are induced by the $k$-subsets of edges. Secondly, IncidenceRelation calculates, for the vertex set of each such subgraph, the set of incident edges. (These edges are stored in the third set of substrands, of size $m$.) The underlying $k$-subset of edges is $k$-dominating in $G$ if and only if this set of incident edges equals the edge set of $G$. Therefore, Weightening retains exactly those memory complexes whose $m$ substrands in the third set are all turned on. The algorithm requires $4m + nm + 2n + 2m(m + 1) - m^2 - m = n(m + 2) + m^2 + 5m$ steps, $O(m)$ tubes, and $O\binom{m}{k}$ bit strings. Guo et al. [14] solved a related problem, the vertex dominating-set problem, using a polynomial-time sticker algorithm with an $O(2^n)$
We have described sticker algorithms for several representative NP-complete graph problems. These problems are solved by $O(n^2)$ operations and $O(n)$ tubes, using an initial library with $O\binom{n}{k}$ bit strings. In view of the error model of the sticker system, sticker algorithms may never outperform electronic computers in solving NP-complete problems. However, DNA models of computation essentially provide digital code and manipulate massive amounts of DNA molecules in parallel. Therefore, DNA models of computation may be implemented in silico by computer architectures offering massive parallelism [7]. For instance, a sticker system may be implemented in principle by a massively parallel register machine. In particular, the nature of the bit-vertical sticker operations suggests that sticker algorithms can be realized by a highly parallel computer architecture such as a field-programmable gate array (FPGA) [12,31]. However, the combinatorial complexity of the initial library may severely impact the efficiency of these algorithms. For this reason, in silico DNA computing is also limited to solving moderate-size combinatorial problems in an exact fashion. Nevertheless, we believe the main direction of this approach is not to solve intractable problems in an exact manner, but in a heuristic one. We will explore the emulation of sticker algorithms on FPGAs in a forthcoming project.
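To make the remark on bit-vertical operations concrete, the following Python sketch stores a tube column-wise: one integer per substrand position, whose j-th bit belongs to memory complex j. This layout is an assumption about what a bit-vertical implementation might look like, not a description of the authors' design; the graph is the assumed example graph with e1 = v1v2, e2 = v1v3, e3 = v2v3, e4 = v3v4, and all function names are hypothetical.

```python
# Assumed example graph and sizes (m edges, n vertices).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4)]
M, N = len(EDGES), 4


def to_columns(rows, width):
    """Transpose a list of bit rows into one integer per bit position."""
    cols = [0] * width
    for j, row in enumerate(rows):
        for i, bit in enumerate(row):
            if bit:
                cols[i] |= 1 << j
    return cols


def to_rows(cols, count):
    """Inverse of to_columns for the first `count` memory complexes."""
    return [[(c >> j) & 1 for c in cols] for j in range(count)]


def edge_induced_graphs(cols):
    """Update all complexes at once: an edge bit turns on both endpoint bits."""
    for e, (u, v) in enumerate(EDGES):
        cols[M + u - 1] |= cols[e]  # one word-wide OR per endpoint
        cols[M + v - 1] |= cols[e]
    return cols


# The k = 1 library over the first m + n positions: complex j selects edge j.
ROWS = [[1 if i == j else 0 for i in range(M + N)] for j in range(M)]
RESULT = to_rows(edge_induced_graphs(to_columns(ROWS, M + N)), M)
```

In this layout the number of operations depends on the number of edges rather than on the number of memory complexes, which is the kind of data parallelism that wide registers or FPGA logic could exploit; the size of the library still determines the word width required.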
Acknowledgment

This research was supported by CONACYT and DAAD fellowships to IMMP.

References

[1] L. Adleman, Molecular computation of solutions to combinatorial problems, Science 266 (1994) 1021–1024.
[2] M. Amos, DNA computation, Ph.D. Thesis, Department of Computer Science, The University of Warwick, 1997.
[3] E. Bach, A. Condon, E. Glaser, C. Tanguay, DNA models and algorithms for NP complete problems, in: Proc. 11th Ann. IEEE Conf. Comp. Complexity, IEEE Computer Society Press, Philadelphia, Pennsylvania, 1996, pp. 290–299.
[4] W.-L. Chang, M. Ho, M. Guo, Molecular solution for the subset-sum problem on DNA-based supercomputing, Biosystems 73 (2004) 117–130.
[5] J. Chen, E. Antipov, B. Lemieux, W. Cedeno, D.H. Wood, DNA computing implementing genetic algorithms, in: Preliminary Proceedings DIMACS Workshop on Evolution as Computation, Piscataway, NJ, 1999, pp. 39–49.
[6] Z.-P. Chen, X.-L. Lu, L. Wang, Y.-P. Lin, A surface-based DNA algorithm for the perfect matching problem, Computer Research and Development 42 (2005) 1241–1246.
[7] D.E. Culler, J.P. Singh, Parallel Computer Architecture, Morgan Kaufmann, SF, 1999.
[8] T. Eng, D. Faulhammer, A surface-based DNA algorithm for minimal set cover, in: Proc. 3rd DIMACS Meeting on DNA Based Computers, Pennsylvania, 1997, pp. 74–82.
[9] B. Fu, Volume bounded molecular computation, Ph.D. Thesis, Department of Computer Science, Yale University, 1997.
[10] L. Gao, R.N. Ma, J. Xu, The molecular algorithm of the matching problem based on plasmid DNA, Progress in Biochemistry and Biophysics 29 (2002) 820–823.
[11] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman and Company, New York, 1979.
[12] M. Glesner, P. Zipf, M. Renovell, Field-Programmable Logic and Application, Springer, New York, 2002.
[13] M. Guo, W.L. Chang, M. Ho, J. Lu, J. Cao, Is optimal solution of every NP-complete or NP-hard problem determined from its characteristic for DNA-based computation, Biosystems 80 (2005) 71–82.
[14] M. Guo, M. Ho, W.L. Chang, Fast parallel molecular solution to the dominating-set problem on massively parallel bio-computing, Parallel Computing 30 (2004) 1109–1125.
[15] J. Hartmanis, On the weight of computations, Bulletin of the European Association for the Theoretical Computer Science 55 (1995) 136–138.
[16] T. Head, Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors, Bulletin of Mathematical Biology 49 (1987) 737–759.
[17] M. Ho, W.L. Chang, M. Guo, T. Yang, Fast parallel solution for set-packing and clique problems by DNA-based computing, IEICE Transactions on Information and Systems E87-D (2004) 1782–1788.
[18] N. Jonoska, S. Karl, A molecular computation of the road color problem, in: Proc. 2nd Annual Meeting on DNA Based Computing, Princeton, 1996, pp. 148–158.
[19] L. Kari, DNA computing: Arrival of biological mathematics, The Mathematical Intelligencer 19 (1997) 9–22.
[20] L. Kari, G. Paun, G. Rozenberg, A. Salomaa, S. Yu, DNA computing, sticker systems, and universality, Acta Informatica 35 (1998) 401–420.
[21] R.M. Karp, Reducibility among combinatorial problems, in: Proc. Sympos. IBM, Plenum, New York, 1972, pp. 85–103.
[22] J.Y. Lee, H.W. Lim, S.I. Yoo, B.T. Zhang, T.H. Park, Efficient initial pool generation for weighted graph problems using parallel overlap assembly, in: DNA10, Milano, 2005, pp. 215–223.
[23] R.J. Lipton, DNA solution of hard combinatorial problems, Science 268 (1995) 542–548.
[24] I.M. Martínez-Pérez, G. Zhang, Z. Ignatova, K.-H. Zimmermann, Biomolecular autonomous solution of the Hamiltonian path problems via DNA hairpin formation, International Journal of Bioinformatics Research Applications 1 (2005) 389–398.
[25] Q. Ouyang, P.D. Kaplan, S. Liu, A. Libchaber, DNA solution of the maximal clique problem, Science 278 (1997) 446–449.
[26] C. Papadimitriou, K. Steiglitz, Combinatorial Optimization, Prentice Hall, Englewood Cliffs, NJ, 1982.
[27] B. Ravinderjit, N. Chelyapov, C. Johnson, P. Rothemund, L. Adleman, Solution of a 20 variable 3-SAT problem on a molecular computer, Science 296 (2002) 499–502.
[28] S.E. Roweis, E. Winfree, B. Burgoyne, N.V. Chelyapov, M. Goodman, P. Rothemund, L. Adleman, A sticker based architecture for DNA computation, in: Proc. 2nd Annual DIMACS Meeting on DNA Based Computers, Princeton, 1996, pp. 1–29.
[29] K. Sakamoto, H. Gouzu, K. Komiya, D. Kiga, S. Yokoyama, T. Yokomori, M. Hagiya, Molecular computation by DNA hairpin formation, Science 288 (2000) 1223–1226.
[30] O. Scharrenberg, Stickersim: A programming library for sticker algorithms, Project Work, Hamburg University of Technology, Hamburg, Germany, 2006.
[31] R.C. Seals, G.F. Whapshott, Programmable Logic: PLDs and FPGAs, McGraw-Hill, 1997.
[32] M. Yannakakis, F. Gavril, Edge dominating sets in graphs, SIAM Journal of Applied Mathematics 48 (1980) 364–373.
[33] K.-H. Zimmermann, Efficient DNA sticker algorithms for graph theoretic problems, Computer Physics Communications 144 (2002) 297–309.
Israel M. Martinez-Perez is currently a Ph.D. student at the Department of Computer Engineering at Hamburg University of Technology. He received the BSc and MSc degrees in Electronic Systems and Intelligent Systems, respectively, from the Monterrey Institute of Technology (ITESM). His research interests include biomolecular computing, artificial intelligence, and optimization models inspired by nature.
Karl-Heinz Zimmermann is an Associate Professor of Computer Science at the Hamburg University of Technology. He received the Ph.D. degree from the University of Erlangen and the Habilitation from the University of Bayreuth. He was a Heisenberg fellow at the University of Karlsruhe and a Fulbright fellow at Princeton University. He serves as Associate Editor of the International Journal of Bioinformatics Research and Applications and the Journal of VLSI Signal Processing Systems. His current research interests include biomolecular computing, protein informatics, and combinatorics and optimization.