P_UNSAT approach of attractor calculation for Boolean gene regulatory networks

P_UNSAT approach of attractor calculation for Boolean gene regulatory networks

Accepted Manuscript P_UNSAT Approach of Attractor Calculation for Boolean Gene Regulatory Networks Qinbin He, Zhile Xia, Bin Lin PII: DOI: Reference:...

1MB Sizes 0 Downloads 52 Views

Accepted Manuscript

P_UNSAT Approach of Attractor Calculation for Boolean Gene Regulatory Networks Qinbin He, Zhile Xia, Bin Lin PII: DOI: Reference:

S0022-5193(18)30151-6 10.1016/j.jtbi.2018.03.037 YJTBI 9414

To appear in:

Journal of Theoretical Biology

Received date: Revised date: Accepted date:

21 August 2017 27 March 2018 28 March 2018

Please cite this article as: Qinbin He, Zhile Xia, Bin Lin, P_UNSAT Approach of Attractor Calculation for Boolean Gene Regulatory Networks, Journal of Theoretical Biology (2018), doi: 10.1016/j.jtbi.2018.03.037

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Highlights • We proposed an approach of attractor calculation for large-scale Boolean gene regulatory networks.

CR IP T

• The proposed method combined the predecessor approach and the logic unsatis?ability approach to accelerate the attractor calculation.

AC

CE

PT

ED

M

AN US

• The algorithm is effective to calculate all attractors for large-scale Boolean gene regulatory networks with a relatively large average degree.

1

ACCEPTED MANUSCRIPT

P UNSAT Approach of Attractor Calculation for Boolean Gene Regulatory Networks Qinbin He1,∗ ,Zhile Xia1 ,Bin Lin1

CR IP T

1 Department of Mathematics,Taizhou University, Linhai, Zhejiang 317000, China E-mail: [email protected],[email protected],[email protected]

Abstract

AN US

Boolean network models provide an efficient way for studying gene regulatory networks. The main dynamics of a Boolean network is determined by its attractors. Attractor calculation plays a key role for analyzing Boolean gene regulatory networks. An approach of attractor calculation was proposed in this study, which combined the predecessor approach and the logic unsatisfiability approach to accelerate attractor calculation. The proposed algorithm is effective to calculate all attractors for large-scale Boolean gene regulatory networks even the networks with a relatively large average degree.

1. Introduction

ED

M

Keywords: Boolean networks; Genetic regulatory networks; Attractor

AC

CE

PT

Boolean networks (BNs) provide a special kind of finite-state dynamical system originally proposed by Kauffman in 1969 to model genetic regulatory networks [1, 2]. Boolean network models can be particularly suited to large-scale gene regulatory networks [3, 4, 5, 6, 7]. In a Boolean network, the state of node mainly presents as ON (active, 1 value) and OFF (inactive, 0 value). The nodes on network connect with each other by directed edges, and logical rules assigned reflect the mutual regulations between interacted genes. The BNs as deterministic dynamics would finally flow into periodic sets regardless of the initial states called attractors. Attractor states which were known corresponding to cell phenotypes [8, 9, 10, 11, 12, 13]. BNs provide an effective way to studying the complexity of networks and show the statistical features of real living cells. Attractor computing plays a key role for analyzing BNs. Finding attractors of a BN is an NP-hard problem [14, 15, 16]. It is very difficult (often impossible) to find all attractors for a large network due to computational complexity. Although, the approach to exhaustively find attractors is able to calculate all attractors, it ∗

Author for correspondence

Preprint submitted to J THEOR BIOL

March 28, 2018

ACCEPTED MANUSCRIPT

CE

PT

ED

M

AN US

CR IP T

is not suitable for computing attractors of large-scale networks. However, networks based on real life systems can consist of dozens or even hundreds of nodes. Many approaches have been proposed to calculate attractors in the past years, such as sampling approach [17, 18], reduction approach [19, 20, 21], sub-network method [22, 23, 24], parallel simulation [25, 26], SAT-based algorithm [27], predecessor-based approach [28, 29, 30]. Until now, there is no generic method that can efficiently find every attractor for largescale networks. The approach based on sampling cannot guarantee finding all attractors. Reduction approach and sub-network method require specific networks and there is a risk of errors to get attractors. The parallel algorithm can speed up the attractor calculation, such as accelerating several times or dozens of times. However, this accelerating of attractor calculation can not solve the problem in essence. A good algorithm is the key to solve the attractor calculation of BNs. The paper [29] proposed an approach of finding every attractor in a Boolean network. The method analyzes certain partial states to see whether or not they are capable of occurring in an attractor state. Based on the paper [29], the paper [30] proposed the method of identification constant nodes and simplifying Boolean networks to accelerate attractor calculation. However, there are two problems with these predecessor-based approaches [28, 29, 30]. Firstly, it can not be very effective to determine the stability of the each partial state. Secondly, there are some part algorithms that use the traversal method, which is a major obstacle to solving large-scale networks. Although there are various disadvantages, in some respects, predecessor-based approach can be very effective for dealing with the attractors of large-scale Boolean networks. SAT-based algorithm [27] is based on the logic satisfiability method. SAT-based algorithm appears to be the most effective and promising method of attractor calculation for large-scale Boolean networks. However, SAT-based algorithm cannot effectively deal with large-scale Boolean networks with a relatively large average degree. And, it may use a relatively long time and even will fail to get the attractors. Therefore, we improved the paper [30] by combining the thought of logic satisfiability (unsatisfiability), and we expect that the new approach P UNSAT approach (Predecessor UNSAT approach) can more efficiently calculate the attractors of large-scale Boolean networks with a relatively large average degree.

AC

2. Results

2.1. Boolean networks The update schemes of BNs mainly are synchronous, asynchronous and stochastic, which correspond to synchronous, asynchronous and stochastic BNs, respectively. In this study the synchronous updating mechanism is only considered for analyzing Boolean networks. A directed network is composed of n nodes V = {v 1 , v 2 , ..., v n }. At each discrete time step t ≥ 0, each node vi has a Boolean state xi ∈ {0, 1} and forms a network state 3

ACCEPTED MANUSCRIPT

X = X(t) = (x1 (t), x2 (t), ..., xn (t)). The set of Boolean functions f = (f1 , f2 , ..., fn ) can be expressed as follows, xi (t + 1) = fi (X(t)) (i ∈ {1, ..., n}), (1) or

(2)

CR IP T

X(t + 1) = f (X(t)) = (f1 (X(t)), ..., fn (X(t))).

M

AN US

2.2. Concept and properties This function describes how to determine a Boolean-valued output based on the basic logical operations include logical AND ”∧”, OR ”∨” and NEGATION ”¬”. For any an initial state X = X(0) of Boolean network, the system updates itself until the system reaches its final states A = {X0 , X1 , ..., Xp−1 }(p ≥ 1), called an attractor. The attractor could be a limit cycle (p > 1) or a steady state (p = 1). The set of all states that converge to A forms an attractor basin. The all states of attractors constitute an attractor state set. All states except attractor states are called transient states or non-attractor states. For M ⊆ V = {v1 , v2 , ..., vn }, y M ∈ {0, 1}|M | is a set of |M | node states, called a partial state[29], e.g., y M = {v0 = 0, v1 = 1} = ¬v0 ∧ v1 . A partial state y M is stable if it is contained in a attractor state. A partial state y M is unstable if it is not contained in any an attractor state. If a partial state y N (regardless the states of the other nodes) will lead to a partial state y M after time k steps, we call y N is a k-predecessor of y M , and denote y M ≺ · · ≺} y N . Without causing confusion, we denote y M = y M (0) ≺ y M (−1) ≺ | ·{z k

AC

CE

PT

ED

· · · ≺ y M (−k). Here, y M (−i)(i = 1, 2, ..., k) is a i−predecessor of the partial state y M . The directed networks have their in-degrees and out-degrees (the numbers of edges into and out of each node) independently. The edge into (out) a node is also called an in-edge (out-edge). In addition, for a directed network, the average in-degree is equal to the average out-degree. In this study, the average in-degree or the average out-degree also is called average degree. In theory, as long as we exclude all non-attractor states in total state space of a Boolean network, we will get all attractor states. Correspondingly, we will get all attractors of a Boolean network. Therefore, we should as many as possible to exclude non-attractor states, so that left as less as possible the remaining states. As long as we traverse these remaining states, we will get all attractors of a Boolean network. Non-attractor state is closely related to the unstable partial states. Here, we first introduce some related lemmas[29, 30].

Lemma 1. If exists a periodic predecessor map in the predecessor maps of a partial state y M , i.e., y M ≺ · · · ≺ y N1 ≺ · · · ≺ y Nq+1 ≺ · · ·, here y N1 = y Nq+1 , the partial state y M is stable.

Lemma 2. If a partial state y M is unstable, all predecessor maps of the partial state y M will eventually end. 4

ACCEPTED MANUSCRIPT

A clause is a disjunction of literals over distinct variables. A propositional sentence n is in conjunctive normal form (CNF) if it has the form ϕ = ∧ ci = c1 ∧ c2 ∧ · · · ∧ cn , i=1

im

where each ci = ∨ lij = li1 ∨ li2 ∨ ... ∨ liim is a clause, and lij ∈ {0, 1} is a literal (i = j=1

AN US

CR IP T

1, 2, ..., n; j = 1, 2, ..., im ) [31]. For a logical formula, then there is an assignment of the formula, so that its value is true, we call the logic formula is satisfiable, otherwise we call the logic formula is unsatisfiable. Any a logical formula can be converted to CNF form. Usually we judge the satisfiability of a logical formula by transforming the logical formula to CNF. A logic formula P is said to imply another logic formula Q, denoted P → Q, iff every assignment that satisfies P also satisfies Q. If logic formulas P and Q are satisfied P → Q and Q → P , denoted P ↔ Q. Lemma 3. [31] If A, B are logical formulas , then it is satisfied: A = B ↔ (A → B) ∧ (B → A) ↔ (¬A ∨ B) ∧ (A ∨ ¬B).

ED

M

The technique in [27] represents attractor detection in a BN as a satisfiability (SAT) problem. The main idea is inspired by SAT-based bounded model checking: the transition relation of the BN is unfolded into a bounded number of steps in order to construct a propositional formula which encodes attractors and which is then solved by a SAT-based solver. In every unfolding step a new variable is required to represent a state of a node in the BN. It is clear that the efficiency of these algorithms largely depends on the number of unfolding steps required and the number of nodes in the BN. Here, we use unsatisfiability of a CNF to calculate the attractors of BNs.

PT

Rule 1. Unit resolution [32, 33]: if A is a logical formula and p is a literal, then it is satisfied: p ∧ (¬p ∨ A) = p ∧ A. Rule 2. If A is a logical formula and p is a literal, then it is satisfied: A ∧ (p ∨ A) = A.

CE

Proof. It is obvious: A ∧ (p ∨ A) = (A ∧ p) ∨ (A ∧ A) = (A ∧ p) ∨ A = A. 

AC

Rule 3. If A is a logical formula and p is a literal, then it is satisfied: (¬p∨A)∧(p∨A) = A. Proof. It is obvious:(¬p ∨ A) ∧ (p ∨ A) = (¬p ∨ p) ∧ (¬p ∨ A) ∧ (A ∨ p) ∧ (A ∨ A) = (¬p ∨ A) ∧ (A ∨ p) ∧ A = (¬p ∨ A) ∧ A = A.  Theorem 1. A partial state y M is unstable, iff all predecessor maps of the partial state y M eventually end. Proof. Based on lemma 1, we know that if exists a periodic predecessor map in the predecessor maps of a partial state y M , the partial state y M is stable. Therefore, if all predecessor maps of the partial state y M eventually end, partial state y M is unstable. On 5

ACCEPTED MANUSCRIPT

the other hand, based on lemma 2, we know that if a partial state y M is unstable, all predecessor maps of the partial state y M will eventually end. This finishes the proof.  It is obvious that let a logical formula ϕ = ϕ1 ∧ p ∧ ¬p, then ϕ is self-conflict, i.e., unsatisfiable. Here ϕ and ϕ1 are logical formula and p is a literal. k

CR IP T

Theorem 2. Let ϕk = y M (0) ∧ ∧ ψi (X(−i) = f (X(−i − 1))), if ϕk is self-conflict, i=0

the partial state y M is unstable. Here it is satisfied ψk (X(−k) = f (X(−k − 1))) = kl

∧ (xj (−k) = fj (X(−k − 1))), and all xj (−k) is presented in ϕk−1 (j = (k1 , ..., kl )).

j=k1

AN US

Proof. Based on equations (1), it is obvious that all assignments satisfying ϕk are all ipredecessors of the partial state y M (i = 1, 2, ..., k). If ϕk is self-conflict, ϕk is unsatisfiable. Therefore, there is no k-predecessor of y M , i.e., all predecessor maps of the partial state y M eventually end. Based on lemma 2, we know the partial state y M is unstable. This finishes the proof. 

AC

CE

PT

ED

M

2.3. Description of algorithm We use theorem 2 to judge a partial state y M stable or unstable, and we take the maximum value k can not exceed 20. During the process of judging stability, we continue to make use of Rule 1-3 to logical simplify CNF of ϕk , until the ϕk becomes self-conflict or reaches the limit of k = 20. The algorithm consists of several parts. We first give the flow chart of the algorithm, shown in Figure 1.

6

ACCEPTED MANUSCRIPT

Begin

Judge stability of all 1-node partial states

All states finish judged? yes

AN US

Judge stability of all 2-node partial states

no

CR IP T

Do samples for Boolean network

Calculate attractors of the network

M

End

ED

Figure 1: Flow chart of the algorithm.

AC

CE

PT

Do samples for Boolean network A sample of randomly selected network state is simulated to find a sample of attractors. The sampling approach used in the program would enable the sizes of the most prominent attractors to be estimated. We take 3000 samples in the program. The vast majority of the steady partial states can be judged by sampling. Judge stability of all 1-node partial states We use theorem 2 and Rule 1-3 to judge a 1-node partial state y M stable or unstable. At this stage, we should find as many as possible 1-node unstable partial states. We repeat doing the process of searching all 1-node partial states. When a process is complete, we compare the current result with the previous result. If the two results are the same, the stage of ”Judge stability of all 1-node partial states” is finished. Judge stability of all 2-node partial states We also use theorem 2 and Rule 1-3 to judge a 2-node partial state y M stable or unstable. However, when we finish the process of judging all 2-node states, we do not need to repeat this process. Calculate attractors of the network Up to now, we have established the set of unstable partial states involving 1,2 nodes. 7

ACCEPTED MANUSCRIPT

AN US

CR IP T

We know that any one of attractor states can not contain an unstable partial state. Based on the set of unstable partial states, we construct a logic discriminant that satisfies any one of a attractor state. Then, we further transform this logic discriminant to a CNF form, and calculate all the attractors. We give an example for further illustrate the process of the attractor calculation. Example 1 P UNSAT approach of attractor calculation for a 6-node Boolean gene regulatory network: B0(t + 1) = ¬B2(t) ∨ ¬B0(t) ∨ B5(t); B1(t + 1) = B2(t); B2(t + 1) = (B2(t) ∨ B4(t)) ∧ (B2(t) ∨ ¬B5(t)); B3(t + 1) = B3(t) ∧ B5(t); B4(t + 1) = B3(t) ∧ ¬B0(t); B5(t + 1) = ¬B3(t) ∧ ¬B2(t).

M

Based on lemma 3, we get the CNF form of the 6-node Boolean network, as follows: (¬B0(t + 1) ∨ ¬B0(t) ∨ ¬B2(t) ∨ B5(t)) ∧ (B0(t + 1) ∨ B2(t)) ∧ (B0(t + 1) ∨ B0(t)) ∧ (B0(t + 1) ∨ ¬B5(t)); (¬B1(t + 1) ∨ B2(t)) ∧ (B1(t + 1) ∨ ¬B2(t)); (¬B2(t + 1) ∨ B2(t) ∨ B4(t)) ∧ (¬B2(t + 1) ∨ B2(t) ∨ ¬B5(t)) ∧ (B2(t + 1) ∨ ¬B2(t)) ∧ (B2(t + 1) ∨ ¬B4(t) ∨ B5(t)); (¬B3(t + 1) ∨ B3(t)) ∧ (¬B3(t + 1) ∨ B5(t)) ∧ (B3(t + 1) ∨ ¬B3(t) ∨ ¬B5(t)); (B3(t) ∨ ¬B4(t + 1)) ∧ (¬B0(t) ∨ ¬B4(t + 1)) ∧ (B0(t) ∨ ¬B3(t) ∨ B4(t + 1)); (¬B3(t) ∨ ¬B5(t + 1)) ∧ (¬B2(t) ∨ ¬B5(t + 1)) ∧ (B2(t) ∨ B3(t) ∨ B5(t + 1));

ED

We take 3000 samples and get the attractor states shown as in table 1. Table 1: 3000 samples to get three attractor states

PT

B0 1 0 1

B1 1 1 0

B2 1 1 0

B3 0 0 0

CE

1 2 3

B4 0 0 0

B5 0 0 1

AC

From Table 1, we easily conclude that in all 2 × 6 1-node partial states, there may be two 1-node partial states that are unstable, i.e., the partial states B3 and B4. The 1-node partial states ¬B0, B0, ¬B1, B1, ¬B2, B2, ¬B3, ¬B4, ¬B5 and B5 are stable. In all 4C62 2-node partial states, there may be 28 partial states are unstable, for example, ¬B0 ∧ ¬B1, ¬B1 ∧ B2. Here, we determine the stability of the partial state B3. Based on theorem 2 and Rule 1-3, we get the calculation, as follows: B3 ↔ B3(0) ↔ B3(0)∧(¬B3(0)∨B3(−1))∧(¬B3(0)∨B5(−1))∧(B3(0)∨¬B3(−1)∨ ¬B5(−1)) ↔ B3(0) ∧ B3(−1) ∧ B5(−1) ↔ B3(0) ∧ B3(−1) ∧ B5(−1) 8

ACCEPTED MANUSCRIPT

ED

M

AN US

CR IP T

∧(¬B3(−1) ∨ B3(−2)) ∧ (¬B3(−1) ∨ B5(−2)) ∧ (B3(−1) ∨ ¬B3(−2) ∨ ¬B5(−1)) ∧(¬B3(−2) ∨ ¬B5(−1)) ∧ (¬B2(−2) ∨ ¬B5(−1)) ∧ (B2(−2) ∨ B3(−2) ∨ B5(−1)) ↔ B3(0) ∧ B3(−1) ∧ B5(−1) ∧ B3(−2) ∧ B5(−2) ∧ ¬B4(−2) ∧ ¬B3(−2). It is obvious that the above formula is self-conflict, i.e., B3 is unstable. Similarly, B4 is also unstable. To determine the 28 possible unstable partial states, we also use theorem 2 and Rule 1-3, and we get the 2-node unstable partial states: ¬B0∧¬B1, ¬B0∧¬B2, ¬B0∧B3, B0∧ B3, ¬B0 ∧ B4, ¬B0 ∧ B5, B1 ∧ ¬B2, ¬B1 ∧ B2, ¬B1 ∧ ¬B5, B1 ∧ B5, ¬B2 ∧ ¬B5, B2 ∧ B5. Thus, we have established the set of unstable partial states involving 1,2 nodes. Then all the attractor states must satisfy the following logic formula: ¬B3 ∧ ¬B4 ∧ ¬(¬B0 ∧ ¬B1) ∧ ¬(¬B0 ∧ ¬B2) ∧ ¬(¬B0 ∧ B3) ∧ ¬(B0 ∧ B3) ∧ ¬(¬B0 ∧ B4) ∧ ¬(¬B0 ∧ B5) ∧ ¬(B1 ∧ ¬B2) ∧ ¬(¬B1 ∧ B2) ∧ ¬(¬B1 ∧ ¬B5) ∧ ¬(B1 ∧ B5) ∧ ¬(¬B2 ∧ ¬B5) ∧ ¬(B2 ∧ B5). We can easily get the corresponding CNF form: ¬B3 ∧ ¬B4 ∧ (B0 ∨ B1) ∧ (B0 ∨ B2) ∧ (B0 ∨ ¬B3) ∧ (¬B0 ∨ ¬B3) ∧ (B0 ∨ ¬B4) ∧ (B0 ∨ ¬B5) ∧ (¬B1 ∨ B2) ∧ (B1 ∨ ¬B2) ∧ (B1 ∨ B5) ∧ (¬B1 ∨ ¬B5) ∧ (B2 ∨ B5) ∧ (¬B2 ∨ ¬B5). We calculate the attractors satisfying the above formula, and we get two attractors as follows: NO. 1(length 2) 1 1 1 0 0 0; 0 1 1 0 0 0. NO. 2(length 1) 1 0 0 0 0 1.

AC

CE

PT

2.4. simulations and parameters We build the set of unstable partial states involving k = 1,2 nodes. Our algorithm uses the thought of logic unsatisfiability to determine whether each partial state y M is stable or unstable. To initialize the algorithm, a sample of randomly selected network states is simulated to find a sample of attractors. The sampling approach used in the initialization would enable the sizes of the most prominent attractors to be estimated. We take 3000 samples in the program. Simulation shows that increasing the sampling size cannot effectively reduce the computation. Instead, for a larger network, too many samples will increase the computation. In this study, each network is a randomly constructed Boolean network. To construct a network with average degree d, we randomly take input nodes with maximum number r = [2d] (here, [.] denoted by the operation of rounding). In order to avoid over-calculation, we take k=20 in theorem 2. All simulations in this study are run by a computer with 2.13 GHz processor and 32GB of RAM(IBM X3550m3). The time of a simulation cannot exceed 20 min, otherwise the analysis was prematurely aborted and deemed unsuccessful. In the following study, each case is conducted by 100 simulations and each result is averaged by successful simulations. This approach is used as standard method unless otherwise indicated. 9

ACCEPTED MANUSCRIPT

3. Discussion

AN US

CR IP T

The algorithm in [30] optimizes the algorithm of [29] for analyzing the stability of partial states. Furthermore, the algorithm combines with the identification of constant nodes and simplifies Boolean networks to accelerate attractor calculation. Although the algorithm takes use of a lot of skills to simplify the calculation, it can not be very effective to determine the stability of the each partial state. There are some part methods that use the traversal method, which is a major obstacle to solving large-scale networks. This algorithm requires a lot of computation, also results in a large proportion of unsuccessful cases of judgment stability for partial states involving k=1,2 nodes. Therefore, the algorithm needs to further calculate the stability of all 3-node partial states, thus further increasing the large amount of computation. The algorithms in [29, 30] calculate the attractors based on truth tables, and P UNSAT algorithm calculates the attractors based on the logic equations. However, the logic equations can be converted into the truth tables. As in [29], the Boolean network of Example 1 can be represented in table 2. Table 2: The Boolean network and its specific structure of Example 1 f unction 11101111 01 01110101 0001 0010 1000

ED

M

node B0 B1 B2 B3 B4 B5

neighbors B0 B2 B5 B2 B2 B4 B5 B3 B5 B0 B3 B2 B3

AC

CE

PT

Here, we randomly generate logic equations (Boolean networks), and then converted into truth tables. We investigate the effectiveness of the P UNSAT algorithm by comparing the algorithms of [29, 30], as shown in Fig.2.

10

CR IP T

ACCEPTED MANUSCRIPT

(a)

(b)

AN US

Figure 2: Three algorithms calculate Boolean networks with average degree 2 and the network size varies from 10 to 50. (a)The time of computing attractors and the network size. (b)The percentage of successful simulations and the network size.

AC

CE

PT

ED

M

From, Fig.2(a), we can see that P UNSAT algorithm is faster than the other two algorithms, and we can seen from Fig.2(b) that the successful rate of P UNSAT algorithm is also significantly higher than the other two algorithms’. It is obvious that our algorithm in this paper is much batter than the algorithms of [29, 30]. SAT algorithm [27] appears to be the most effective and promising method of attractor calculation for large-scale Boolean networks. However, SAT-based algorithm cannot effectively deal with large-scale Boolean networks with a relatively large average degree. And, it may use a relatively long time and even will fail to get the attractors. Therefore, we improved the algorithm in [30] by combining the thought of logic unsatisfiability to calculate attractors of Boolean networks. Due to the efficiency of the algorithm, we do not need to determine the stability of 3-node partial states, further reducing the amount of computation. Different from the SAT-based algorithm, we proposed P UNSAT approach. We have conducted a great number of various simulations to test the effectiveness of the algorithm. On the whole, the proposed algorithm is effective to calculate all attractors for large-scale Boolean gene regulatory networks even the networks with a relatively large average degree. We investigate the effectiveness of the new algorithm by comparing SAT-based algorithm [27], shown in Fig.3. We can see from Fig.3(a), when the average degree is greater than 6, the calculation speed of P UNSAT approach is more quick than that of SAT-based algorithm. When the average degree is greater than 7, P UNSAT approach gets better results on both the calculation speed and the percentage of successful simulations. Especially, the calculation speed of P UNSAT approach is nearly ten times faster than SAT-based approach’s.

11

ACCEPTED MANUSCRIPT

2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5

90% 80% 70%

CR IP T

successful simulations

100%

P-UNSAT SAT

60% 50%

2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 average degree

(a)

(b)

AN US

Figure 3: Two algorithms calculate 64-node Boolean networks with different average degree. (a)The time of computing attractors and the average of networks. (b)The percentage of successful simulations and the average degree of networks.

In addition, we investigate the efficiency of the algorithm for various Boolean networks with average degree 6 and the network size varies from 32 to 128, shown in Fig.4.

ED

time(seconds)

64

successful simulations

M

128

32

Each network with

16

64

network size

CE

32

PT

average degree 6

96

128

100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0%

32

64

96

128

Each network with average degree 6

32

64

96

128

network size

(a)

(b)

AC

Figure 4: The computing attractors of Boolean networks with average degree 6 and the network size varies from 32 to 128. (a)The time of computing attractors and the network size. (b)The percentage of successful simulations and the network size.

We can see from Fig.4, P UNSAT approach can efficiently calculate the attractors of these networks. The amount of calculation increases when the size of networks increases. Even as for 128-node networks with average degree 6, P UNSAT approach is also effective. However, there are nearly 30% networks failed to get attractors within 20 min. We implement the P UNSAT algorithm to compute attractors of real biological networks and compare it with the algorithms of [27, 30]. As benchmarks we use existing 12

ACCEPTED MANUSCRIPT

Boolean networks models of real organisms, shown in Table 3. Table 3: Experimental results for the Boolean networks models of real organisms. Vertices n

Average degree

T-cell receptor T-helper cell Human gonadal sex

40 23 19

1.45 1.65217 4.15789

Attractors number × length 8×1,1×6 3×1 1×3

Time (sec) SAT-based <1 <1 <1

Ref.[30] 27.23 15.01 24.42

P UNSAT 2.34 2.23 6.01

CR IP T

Benchmark name

ED

M

AN US

There are three real biological networks in Table 3: T-cell receptor signaling pathway analysis [34], T-helper cell differentiation [35], and Human gonadal sex [36]. As we can see from Table 3, the P UNSAT algorithm is faster than the algorithm of [30]. However, SAT-based algorithm is faster than the other two algorithms. It is obvious that the SATbased algorithm has the advantage of attractor calculation for the networks of Table 3. For these networks, our algorithm slower than the SAT-based algorithm. The main reason is that our algorithm takes 3000 samples, which will use some time. Of course, reducing the number of samples may speed up the calculation. However, we have conducted a great deal of simulations and we take 3000 samples basically satisfied most of attractor calculation of networks. On the other hand, the core algorithm also determines that our method has no advantage in calculation attractors of networks with small average degree. The significance of our algorithm is that it can get better results than SAT-based algorithm [27] for calculation attractors of large-scale Boolean networks with a relatively large average degree, as shown in Fig.3. We also give three simulation-generated 80-node networks with average degrees 5, 6, and 7, respectively (Attachment S1). We implement the P UNSAT algorithm to compute attractors of the simulation networks and compare it with the algorithm of [27], shown in Table 4.

PT

Table 4: Calculation of three simulation networks.

Vertices n

Average degree

Network S1 Network S2 Network S3

80 80 80

5 6 7

CE

Simulation Network

Attractors number × length 4×4 2×1, 1×2, 3×4 4×1, 2×2

Time (sec) SAT-based 20 254 615

P UNSAT 4 58 13

AC

For in public benchmarks, we can not find a large-scale real biological Boolean network with average degree over 6. Of course, it does not mean that there is no real biological network with a large average degree. Maybe, with the further research in biology, there are large-scale Boolean networks with a large average degree, e.g., brain’s Boolean networks, and maybe other gene regulatory networks. Our algorithm may provide a good tool and research idea. Although the CAT approach is a very efficient algorithm, the CAT algorithm can not completely solve the problem of attractor calculation of Boolean networks. The efficiency of CAT approach is determined by its SAT-solver. Many people continue to 13

ACCEPTED MANUSCRIPT

CR IP T

study the SAT-solver, using a lot of advanced skills, such as chronological backtracking, non-chronological backtracking and conflict-driven clauses learning [31]. P UNSAT approach is based on Theorem 2 and Rule 1-3. However, theorem 2 can only determine whether a partial state is unstable, and it can not determine whether a partial state is stable. Nevertheless, the sampling of networks can ensure that vast majority of the attractor states are found, that is, most of the stable partial states can be found by sampling. From Fig.4(b), we also can see the calculation of some Boolean networks may fail to get attractors within 20 min or even fail to get the attractors. The efficiency of P UNSAT approach is determined by Rule 1-3. In the future, some more effective rules should be adopted by P UNSAT approach. Acknowledgments

AN US

This research was supported by Natural science foundation of Zhejiang province (Grant No. LY16A010008). References References

M

[1] Kauffman SA (1969) Metabolic stability and epigenesist in randomly constructed nets. Journal of Theoretical Biology 22, 437-467.

ED

[2] Kauffman SA (1993) The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, New York.

PT

[3] Albert R, Othmer HG (2003) The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila. Journal of Theoretical Biology 223, 1-18.

CE

[4] Hu YL, Gu YM, Wang YM, Huang YJ, Zou YM (2015) Integrated network model provides new insights into castration-resistant prostate cancer. Nature Scientific Reports 5, 1-12.

AC

[5] B´erenguier D, Chaouiya C, Monteiro PT, Naldi A, Remy E, Thieffry D, Tichit L (2013) Dynamical modeling and analysis of large cellular regulatory networks. Chaos 23, 025114. [6] Rodr´ıguez A, Sosa D, Torres L, Molina B, Fr´ıas S, Mendoza L (2012) A Boolean network model of the FA/BRCA pathway. Bionformatics 28(6), 858-66. [7] Singh A, Nascimento JM, Kowar S, Busch H, Boerries M (2012) Boolean approach to signalling pathway modelling in HGF-induced keratinocyte migration. Bioinformatics 28(18), i495-i501. 14

ACCEPTED MANUSCRIPT

[8] Grieco L, Calzone L, Bernard-Pierrot I, Radvanyi F, Kahn-Perles B, Thieffry D (2013)Integrative modelling of the influence of MAPK network on cancer cell fate decision. PLoS Comput Biol.9(10),e1003286. [9] Hanahan D, Weinberg RA (2000)The hallmarks of cancer. Cell. 100(1),57-70.

CR IP T

[10] Smith G, Carey FA, Beattie J, Wilkie MJ, Lightfoot TJ, Coxhead J, Garner RC, Steele RJ, Wolf CR (2002) Mutations in APC, Kirsten-ras, and p53-alternative genetic pathways to colorectal cancer. Proc Natl Acad Sci U S A.99(14),9433-9438. [11] Fumi¨e HF, Martins ML (2013)Boolean network model for cancer pathways: predicting carcinogenesis and targeted therapy outcomes. PLoS One.8(7),e69008.

AN US

[12] Lu J, Zeng H, Liang Z, Chen L, Zhang L, Zhang H, Liu H, Jiang H, Shen B, Huang M, Geng M, Spiegel S, Luo C (2015) Network modelling reveals the mechanism underlying colitis-associated colon cancer and identifies novel combinatorial anticancer targets. Sci Rep. 5,14739.

M

[13] Li Q, Wennborg A, Aurell E, Dekel E, Zou JZ, Xu Y, Huang S, Ernberg I (2016) Dynamics inside the cancer cell attractor reveal cell heterogeneity, limits of stability, and escape. Proc Natl Acad Sci U S A. 113(10):2672-2677.

ED

[14] Akutsu T, Kuhara S, Maruyama O, Miyano S (1998) A system for identifying genetic networks from gene Expression patterns produced by gene disruptions and overexpressions. Genome Inform. 9,151-60.

PT

[15] Zhao Q (2005) A remark on ”Scalar equations for synchronous Boolean networks with biological Applications” by C. Farrow, J. Heidel, J. Maloney, and J. Rogers. IEEE Trans Neural Netw. 16(6),1715-1716.

CE

[16] Akutsu T, Kosub S, Melkman AA, Tamura T (2012) Finding a periodic attractor of a Boolean network. IEEE/ACM Trans Comput Biol Bioinform. 9(5),1410-1421.

AC

[17] Oosawa C, Savageau MA (2002) Effects of alternative connectivity on behaviour of randomly constructed Boolean networks. Physica D 170, 143-161. [18] Raeymaekers L (2002) Dynamics of Boolean networks controlled by biologically meaningful functions. Journal of Theoretical Biology 218, 331-341. [19] Za´ nudo JG, Albert R (2013) An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks. Chaos 23(2), 025111. [20] Alan Veliz-Cuba (2011) Reduction of Boolean network models. Journal of Theoretical Biology 289(21), 167-172.

15

ACCEPTED MANUSCRIPT

[21] Assieh Saadatpour, R´eka Albert, Timothy CR (2013) A Reduction Method for Boolean Network Models Proven to Conserve Attractors. SIAM Journal on Applied Dynamical Systems 12(4), 1997-2011.

CR IP T

[22] Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recognition 41, 176-190. [23] Leicht EA, Newman MEJ (2008) Community structure in directed networks. Physical Review Letters 100, 118703. [24] Zhao Y, Kim J, Filippone M (2003) Aggregation Algorithm towards Large-Scale Boolean Network Analysis. IEEE Transactions on Automatic Control 58(8), 19761985.

AN US

[25] Kirill K, Klavdiya B,Piotr JG Janusz AH (2016) Parallel Simulation of Adaptive Random Boolean Networks. Procedia Computer Science 101, 35-44. [26] Wuensche A (1999) Discrete dynamical networks and their attractor basins. Complexity International 6, 1-24.

M

[27] Dubrova E, Teslenko M (2011) A SAT-Based Algorithm for Finding Attractors in Synchronous Boolean Networks. Transactions on Computational Biology and Bioinformatics 8(5), 1393-1399.

ED

[28] Wuensche A (1999) Discrete dynamical networks and their attractor basins. Complexity International 6, 1-24.

PT

[29] David James Irons (2006) Improving the efficiency of attractor cycle identification in Boolean networks. Physica D 217, 7-21.

CE

[30] He QB, Xia ZL, Lin B (2016) An efficient approach of attractor calculation for largescale Boolean gene regulatory networks. Journal of Theoretical Biology 408, 137-144. [31] Biere A, Heule M, Van Maaren H, Walsh T (2009) HANDBOOK OF SATISFIABILITY.IOS Press(ISBN:1586039296).

AC

[32] Davis M, Putnam H (1960) A computing procedure for quantification theory. Journal of the ACM 7, 201-215. [33] Davis M, Logemann G, Loveland D (1962) A machine program for theorem-proving. Communications of the ACM 5,394-397. [34] Klamt S, Saez-Rodriguez J, Lindquist JA, Simeoni L, Gilles ED (2006) A methodology for the structural and functional analysis of signaling and regulatory networks. BMC Bioinformatics 7,1-26. 16

ACCEPTED MANUSCRIPT

[35] Mendoza L, Xenarios I (2006) A method for the generation of standardized qualitative dynamical systems of regulatory networks. Theoretical Biology and Medical Modelling 3,1-18.

AC

CE

PT

ED

M

AN US

CR IP T

[36] R´ıos O, Frias S, Rodr´ıguez A, Kofman S, Merchant H, Torres L, Mendoza L (2015) A Boolean network model of human gonadal sex determination. Theoretical Biology and Medical Modelling 12,1-18.

17