Minimum dominating set-based methods for analyzing biological networks


Methods xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Methods journal homepage: www.elsevier.com/locate/ymeth

Jose C. Nacher a,*, Tatsuya Akutsu b,*

a Department of Information Science, Faculty of Science, Toho University, Miyama 2-2-1, Funabashi, Chiba 274-8510, Japan
b Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan

Article info

Article history: Received 17 September 2015; Received in revised form 16 November 2015; Accepted 16 December 2015; Available online xxxx

Keywords: Minimum dominating set; Complex networks; Protein-protein interaction networks; Network controllability

Abstract

The rapid growth of ‘multi-omics’ data not only poses a computational challenge for its analysis but also requires novel algorithmic methodologies to identify complex biological patterns and decipher the ultimate roots of human disorders. To that end, the massive integration of omics data with disease phenotypes is offering a new window into cell functionality. The minimum dominating set (MDS) approach has rapidly emerged as a promising algorithmic method to analyze complex biological networks integrated with human disorders, which can be composed of a variety of omics data, from proteomics and transcriptomics to metabolomics. Here we review the main theoretical foundations of the methodology and the key algorithms, and examine recent applications in which biological systems are analyzed using the MDS approach.

© 2016 Elsevier Inc. All rights reserved.

Contents

1. Introduction
2. Methods
   2.1. Minimum dominating set
   2.2. Computation of MDS
   2.3. Relation to structural controllability
3. Extensions
   3.1. Critical nodes
   3.2. MDS in bipartite networks
   3.3. Other extensions
4. Analyses of biological networks
5. Conclusions
Funding
References

* Corresponding authors. E-mail addresses: [email protected] (J.C. Nacher), takutsu@kuicr.kyoto-u.ac.jp (T. Akutsu).

http://dx.doi.org/10.1016/j.ymeth.2015.12.017 1046-2023/© 2016 Elsevier Inc. All rights reserved.

Please cite this article in press as: J.C. Nacher, T. Akutsu, Methods (2016), http://dx.doi.org/10.1016/j.ymeth.2015.12.017

1. Introduction

The rapid technological development of ‘multi-omics’ methodologies is providing an increasing amount of data on the fundamental constituents of cells, such as genes (genomics), RNAs (transcriptomics), proteins (proteomics) and metabolites (metabolomics). Computational and systems biologists therefore have a unique opportunity to design novel algorithmic and mathematical methods to analyse, identify and extract biological knowledge from the data [1]. Many computational techniques have been proposed to analyse biological phenomena based on the collected experimental data. Among them, network approaches are becoming relevant because they use knowledge not only from the individual molecules, but also from the complex web of interactions between the components of the cell [2]. This detail is relevant because biological functions and diseases cannot be associated with a single molecule. Functional patterns emerge through complex associations between the molecules of life. Similarly, disease phenotypes are the result of pathobiological processes that occur in a biological pathway or, on a larger scale, in the human interactome, which represents the entire map of all molecular interactions in a cell [3,4]. Network pharmacology is also becoming more relevant since drug efficacy depends on interactions with molecules located in a large network [5,6]. As a result, when the right target is selected, the drug may enhance its effects via network propagation; in the opposite case, it may lead to unwanted side effects. Strategies to target hubs, to use disease modules to identify new targets, and to disrupt strategic locations in disease pathways are among the present and future directions using cellular network concepts [3,7,8].

Network biology, therefore, rapidly emerged as a branch of computational biology that focuses on analyzing biological data using a network representation. Several types of interactions (protein-protein, chemical reactions, transcriptional regulation) can define specific network levels, which can be investigated using algorithmic tools and metrics. Protein networks are defined by proteins that physically bind to each other. Metabolic networks consist of metabolites and chemical reactions, where the latter are catalyzed by enzymes, and regulatory networks are represented by transcription factors that regulate genes. Evolutionary models, extraction of biological motifs, disease-gene identification and disease module prediction are some examples of the application of network techniques across different network levels [3].

In this review, we focus on a novel algorithmic approach that is showing promising results in the analysis of complex biological networks. The dominating set (DS) and minimum dominating set (MDS) concepts emerged from classical graph theory several decades ago.
The associated algorithms and their variants have been applied to a rich variety of problems, from computer and wireless communication networks to social systems [9]. However, the application of MDS to specific complex network patterns such as scale-free networks had not been explored, except in one study of web graphs [10]. The scale-free structure is important because it seems ubiquitous in biological systems. From protein-protein interaction networks to metabolic networks, the degree distribution follows a power law (i.e., the probability that a node has $k$ links follows $P(k) \propto k^{-\gamma}$). Networks with this degree distribution are called scale-free networks. An application of the MDS methodology to network controllability analysis unveiled the conditions necessary to achieve full controllability and showed that a small fraction of nodes can control the entire network [11]. This finding later inspired Wuchty to investigate controllability in protein interaction networks in more detail [12]. Unexpectedly, his findings showed not only that the optimized subset of proteins within the MDS can reach/control any protein outside the MDS, but also that the proteins in the MDS are enriched with unique biological functions and features, such as cancer-related and virus-targeted genes. This review assembles the state of the art on the MDS algorithmic approaches that are having an impact on the analysis of a wealth of ‘omics’ systems, from drug-target and protein networks to non-coding RNA interactions.

2. Methods

2.1. Minimum dominating set

The minimum dominating set is a well-known concept in graph theory and computer science [9]. In this paper, we assume that networks are represented as graphs. A graph $G(V, E)$ consists of a set of nodes $V$ and a set of edges $E$, where a node represents some object and an edge represents a relation, or the existence of a relation, between two objects. For example, in a protein-protein interaction (PPI) network, a node corresponds to a protein and an edge corresponds to an interaction between two proteins. Each edge has a direction in directed networks, whereas it does not (i.e., each edge is bi-directional) in undirected networks. Each edge is represented by a pair of nodes. For example, $(u, v)$ represents an edge between nodes $u$ and $v$. Note that $(u, v)$ means an edge directed from $u$ to $v$ in directed networks, whereas we do not distinguish $(u, v)$ from $(v, u)$ in undirected networks. Hereafter, networks (resp., graphs) mean undirected networks (resp., graphs) unless otherwise stated.

For a graph $G(V, E)$, a subset of nodes $S \subseteq V$ is called a dominating set (DS) if every node in $V$ is either an element of $S$ or is adjacent to an element of $S$. That is, for any node $v \in V$, $v \in S$ holds or there exists a node $u \in S$ such that $(u, v) \in E$. We say that $v$ is dominated by $u$ if $(u, v) \in E$. Then, $S$ is a dominating set if each node in $V$ is either in $S$ or is dominated by some node in $S$. A dominating set with the minimum number of elements is called a minimum dominating set (MDS). As discussed later, MDSs are not necessarily uniquely determined (i.e., there may exist multiple MDSs for a given graph $G(V, E)$). Fig. 1 illustrates an MDS and a DS, where dark gray circles represent nodes in an MDS or a DS. In Fig. 1, node a is dominated by node b, and node c is dominated by nodes b and e. In this case, the MDS is not uniquely determined: $S = \{a, d, e\}$ is also an MDS.

2.2. Computation of MDS

Although the MDS is a very important concept in graph theory, computing an MDS is known to be NP-hard, which means that a theoretically efficient (i.e., polynomial-time) algorithm for computing an MDS exactly is unlikely to exist. However, NP-hardness does not necessarily mean that no practically fast algorithm exists. In fact, many algorithms have been proposed for computing an MDS exactly. From a theoretical viewpoint, extensive studies have been done on the development of $O(a^n)$ time exact algorithms for smaller constants $a$ [13–15], where $n$ is the number of nodes in graph $G$. To our knowledge, the current best $a$ is 1.4689 [15]. However, most of these studies include no implementation results, and these algorithms do not seem to be practically useful. From a practical viewpoint, many heuristic methods have also been proposed using such techniques as simulated annealing, genetic algorithms, and ant colony optimization (see [16] and its references). However, these methods are not guaranteed to output exact solutions, and there is no theoretical guarantee on the quality of their solutions.

Among exact computational methods, the most widely used is probably the one based on integer linear programming (ILP). In the ILP-based method, an instance of the MDS problem is transformed into an integer linear program in a simple manner. Although ILP is also an NP-hard problem, there exist practical solvers such as CPLEX [17] and Gurobi [18] that can solve large-scale ILP instances, and thus we can utilize them. In this method, we assign a 0–1 variable $y_v$ to each node $v$ in $V$, where $y_v = 1$ (resp., $y_v = 0$) indicates that $v$ is in an MDS (resp., not in an MDS). From a given graph $G(V, E)$, we construct an integer linear program as follows.

$$
\begin{aligned}
\text{minimize} \quad & \sum_{v \in V} y_v, \\
\text{subject to} \quad & y_v + \sum_{(u,v) \in E} y_u \ge 1 \quad \text{for all } v \in V, \\
& y_v \in \{0, 1\} \quad \text{for all } v \in V.
\end{aligned}
\tag{1}
$$
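As a minimal sketch (not the ILP implementation used in the cited studies), the objective and constraints of Eq. (1) can be checked on a very small graph by brute-force enumeration. The function names and the toy path graph below are illustrative assumptions; for networks of realistic size, the same formulation would instead be handed to an ILP solver such as those cited above.

```python
from itertools import combinations

def is_dominating(adj, chosen):
    """Constraint of Eq. (1): every node v is chosen (y_v = 1)
    or has at least one chosen neighbour (some y_u = 1)."""
    return all(v in chosen or adj[v] & chosen for v in adj)

def minimum_dominating_set(adj):
    """Return one MDS by testing subsets in order of increasing size.
    Exponential time: only meant for toy graphs, as a stand-in for an ILP solver."""
    nodes = sorted(adj)
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            if is_dominating(adj, set(subset)):
                return set(subset)

# Hypothetical toy network: the undirected path a - b - c - d - e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

mds = minimum_dominating_set(adj)
print(sorted(mds))  # one of several minimum dominating sets of size 2
```

Because subsets are tried in increasing size, the first feasible subset has minimum cardinality, which corresponds to minimizing the sum of the $y_v$ variables in Eq. (1).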

Then, an MDS is given by the set $S = \{v \mid y_v = 1\}$. The objective function means that the number of nodes with value 1 (i.e., the number of nodes in $S$) must be minimized, and the constraints mean that each node $v$ must be in $S$ or be a neighbor of some node $u$ in $S$. It is shown in [19] that this method can compute an MDS exactly for moderate-size networks (e.g., networks with a thousand nodes).

Fig. 1. Examples of an MDS (a) and a DS (b). Dark gray circles represent nodes in the MDS and DS. A set of nodes S is called a DS if each node in a network is either an element of S or is a neighbor of an element of S. A DS with the minimum number of elements is called an MDS.

If we need to handle large-scale networks, approximation algorithms may be a good choice. For the MDS problem, a simple greedy approximation algorithm is known [20], which is intrinsically the same as that for the set cover problem [21,22]. It repeats the following steps until the graph becomes empty: (i) select a node $v$ with the maximum number of neighbors; (ii) remove $v$ and its neighbors from $G$. Clearly, the set of selected nodes is a dominating set. Furthermore, it is known that the size of this set is at most $1 + \log_2 n$ times that of an MDS [20].

It is also important to analyze how the size of an MDS changes with the network size and network topology. Nacher and Akutsu studied the relationship between the MDS size and the exponent $\gamma$ of the power-law degree distribution in a scale-free network [11,23], where the degree of a node is the number of neighboring nodes. Molnár et al. studied the relationship with the degree cutoff [24], and also with the degree-degree correlation (assortativity/disassortativity) [25].

2.3. Relation to structural controllability

Network controllability is one of the important research topics in complex networks. A set of nodes in a given network is called a set of driver nodes if control of them brings the network from any initial state to any desired state in finite time. In such a case, the system consisting of the network and the set of driver nodes is called controllable. Furthermore, the system without parameters is called structurally controllable if it is controllable for almost all sets of parameter values. Liu et al. showed that for networks with linear dynamics, a minimum set of driver nodes can be found by computing a maximum bipartite matching [26]. It is worth mentioning that although the work of Liu et al. popularized the “minimum input theorem” and rediscovered the maximum matching approach, some of these theoretical results had been discussed in a different context in previous research [27]. Liu et al. also analyzed the average number of driver nodes in scale-free and random (Erdős–Rényi) networks, and showed that the more heterogeneous the degree distribution of a network is, the larger the number of driver nodes required [26].

Nacher and Akutsu also showed a relationship between an MDS and structural controllability for linear systems [11], which has some similarity with the edge dynamics model proposed by Nepusz and Vicsek [28]. It is stated in [11] that if every edge in a network is bi-directional and every node in an MDS can control all of its outgoing edges individually, then the network is structurally controllable by selecting the nodes in an MDS as the driver nodes. Furthermore, they showed that the more heterogeneous the degree distribution of a network is, the smaller the number of driver nodes required. At first glance, this result seems to contradict the result of Liu et al. However, the difference comes from the difference in control models (see also Fig. 2). Liu et al. assumed that only driver node values can be directly controlled through external signals. On the other hand, the MDS approach assumes that each driver node can control its links independently. Therefore, a node with degree $k$ is treated as if it were a set of $k$ nodes.

Fig. 2. Comparison of the models by Liu et al. [26] (a) and by the MDS approach [11] (b), where $v_i$ and $u_j$ denote a node and a control signal, respectively. Only driver node values can be directly controlled through external signals in model (a), whereas each driver node can control its links individually in model (b).

From the above-mentioned relationship between an MDS and structural controllability, it is expected that nodes in an MDS have important roles in the control of the network.
Therefore, identification of an MDS may lead to identification of important nodes, that is, important proteins, genes, and other molecules.

3. Extensions

In this section, we briefly review extensions of the MDS-based approach.

3.1. Critical nodes

As discussed in the previous section, nodes in an MDS are considered to have important roles in controlling the whole network. However, as also mentioned before, MDSs are not necessarily uniquely determined; there may exist multiple dominating sets with the minimum cardinality. The same situation arises in the maximum bipartite matching approach of Liu et al. To cope with this ambiguity, Jia et al. introduced the concepts of critical nodes and redundant nodes [29], where critical nodes are nodes that appear in all minimum sets of driver nodes and redundant nodes are nodes that do not appear in any minimum set of driver nodes. These concepts have been adopted for MDSs [19]. In the context of MDS, a node is called critical if it appears in all MDSs, redundant if it never appears in any MDS, and intermittent otherwise (see also Fig. 3). It is obvious from the definitions that these nodes are uniquely determined. It is shown in [19] that the set of critical nodes can be computed by the following procedure.

1. Compute an MDS $M$ for $G(V, E)$ using ILP.
2. Let $C_{MDS}$ be an empty set.
3. Repeat steps 4–5 for all $v \in M$.
4. Make an ILP instance $I_v$ by adding a constraint of $y_v \le 0$ to the instance given by Eq. (1).
5. If $I_v$ does not have a feasible solution or $|M_v| > |M|$ holds, where $M_v = \{u \mid y_u = 1\}$ is the solution of $I_v$, then let $C_{MDS} \leftarrow C_{MDS} \cup \{v\}$.
6. Return $C_{MDS}$.


Fig. 3. Example of a network with two MDSs, (a) and (b). Dark gray circles represent nodes in MDSs. Black, hatched, and light gray circles represent critical, redundant, and intermittent nodes, respectively, where critical nodes appear in all MDSs and redundant nodes never appear in any MDS.

Similarly, the set of redundant nodes can be computed by the following procedure.

1. Compute an MDS $M$ for $G(V, E)$ using ILP.
2. Let $R_{MDS}$ be an empty set.
3. Repeat steps 4–5 for all $v \in V \setminus M$.
4. Make an ILP instance $I_v$ by adding a constraint of $y_v \ge 1$ to the instance given by Eq. (1).
5. If $I_v$ does not have a feasible solution or $|M_v| > |M|$ holds, where $M_v = \{u \mid y_u = 1\}$ is the solution of $I_v$, then let $R_{MDS} \leftarrow R_{MDS} \cup \{v\}$.
6. Return $R_{MDS}$.

The set of intermittent nodes is determined as the set of remaining nodes. It is expected that critical nodes play more important control roles than other MDS nodes do.
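The two procedures above can be sketched on a toy graph as follows. This is an illustrative sketch under stated assumptions, not the implementation of [19]: the ILP solves are replaced by brute-force enumeration, the function names and example graph are hypothetical, and for simplicity every node is tested both ways instead of restricting the tests to $v \in M$ and $v \in V \setminus M$ as the procedures do.

```python
from itertools import combinations

def dominates(adj, chosen):
    """True if every node is in `chosen` or adjacent to a node in `chosen`."""
    return all(v in chosen or adj[v] & chosen for v in adj)

def mds_size(adj, forced_in=frozenset(), forced_out=frozenset()):
    """Size of a smallest dominating set that contains `forced_in` (y_v >= 1)
    and avoids `forced_out` (y_v <= 0); None if no such set exists."""
    free = sorted(set(adj) - forced_in - forced_out)
    for extra in range(len(free) + 1):
        for subset in combinations(free, extra):
            if dominates(adj, set(subset) | forced_in):
                return extra + len(forced_in)
    return None

def classify_nodes(adj):
    """Label each node as critical, redundant or intermittent."""
    best = mds_size(adj)
    roles = {}
    for v in adj:
        without_v = mds_size(adj, forced_out=frozenset([v]))
        with_v = mds_size(adj, forced_in=frozenset([v]))
        if without_v is None or without_v > best:
            roles[v] = "critical"      # excluding v makes every solution larger
        elif with_v > best:
            roles[v] = "redundant"     # including v makes every solution larger
        else:
            roles[v] = "intermittent"  # v is in some MDSs but not all
    return roles

# Hypothetical toy network: hub c with leaves x and y, plus a tail c - p - q.
adj = {
    "c": {"x", "y", "p"},
    "x": {"c"},
    "y": {"c"},
    "p": {"c", "q"},
    "q": {"p"},
}
roles = classify_nodes(adj)
print(roles)
```

In this toy network the only MDSs are {c, p} and {c, q}, so c is critical, p and q are intermittent, and the leaves x and y are redundant.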

3.2. MDS in bipartite networks

We have so far considered unipartite networks, in which all nodes represent the same kind of object. For example, all nodes in a PPI network represent proteins. However, various kinds of biological networks have bipartite structures in which nodes are classified into two types and edges exist only between nodes of different types. For example, drug-target networks consist of drug nodes and protein nodes [30]. As another example, metabolic networks consist of compound nodes and reaction nodes. Therefore, it is important to extend the concept of MDS to bipartite networks.

Bipartite networks can be further classified into uni-directional networks and bi-directional networks. Let $X$ and $Y$ be the sets of nodes of each type, and let $E$ be a set of edges. Then, all edges in $E$ are directed from $X$ to $Y$ in uni-directional networks, whereas bi-directional networks can have edges from $X$ to $Y$ and edges from $Y$ to $X$. For example, drug-target networks are uni-directional whereas metabolic networks are bi-directional. Since it is difficult to appropriately define MDSs in bi-directional networks, we only consider here MDSs in uni-directional networks.

In uni-directional bipartite networks, nodes in a dominating set $S$ are selected only from $X$. Then, each node in $Y$ has at least one incoming edge emanating from a node in $S \subseteq X$. That is, $S \subseteq X$ is a dominating set if for each $v \in Y$, there exists an edge $(u, v) \in E$ such that $u \in S$. As in unipartite networks, an MDS is defined as a dominating set with the minimum cardinality, which is not necessarily uniquely determined. Fig. 4 gives an example of an MDS in a bipartite network.

Fig. 4. Example of an MDS in a bipartite network. Dark gray circles represent nodes in the MDS.

An MDS in a bipartite network is intrinsically the same as a set cover, computation of which is known to be NP-hard [20]. As in the case of unipartite networks, an MDS in a bipartite network can be exactly computed by using the following ILP formulation:

$$
\begin{aligned}
\text{minimize} \quad & \sum_{u \in X} x_u, \\
\text{subject to} \quad & \sum_{(u,v) \in E} x_u \ge 1 \quad \text{for all } v \in Y, \\
& x_u \in \{0, 1\} \quad \text{for all } u \in X.
\end{aligned}
\tag{2}
$$
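To make the correspondence with set cover concrete, the sketch below solves Eq. (2) exactly by enumeration on a toy uni-directional bipartite network and compares the result with the standard greedy set-cover heuristic. All names and the drug-target example data are hypothetical, and the enumeration is only a stand-in for an ILP solver on small instances.

```python
from itertools import combinations

def covers(targets_of, chosen, targets):
    """Constraint of Eq. (2): every node in Y has an incoming edge from a chosen node in X."""
    covered = set()
    for u in chosen:
        covered |= targets_of[u]
    return covered >= targets

def exact_bipartite_mds(targets_of):
    """Smallest S within X dominating all of Y, by enumeration (exponential time)."""
    targets = set().union(*targets_of.values())
    sources = sorted(targets_of)
    for size in range(len(sources) + 1):
        for subset in combinations(sources, size):
            if covers(targets_of, subset, targets):
                return set(subset)

def greedy_bipartite_ds(targets_of):
    """Greedy set cover: repeatedly pick the X node covering the most uncovered Y nodes."""
    uncovered = set().union(*targets_of.values())
    chosen = set()
    while uncovered:
        best = max(sorted(targets_of), key=lambda u: len(targets_of[u] & uncovered))
        chosen.add(best)
        uncovered -= targets_of[best]
    return chosen

# Hypothetical drug -> target-protein edges (X = drugs d1..d4, Y = proteins p1..p6).
targets_of = {
    "d1": {"p1", "p2", "p3"},
    "d2": {"p1", "p4"},
    "d3": {"p2", "p5"},
    "d4": {"p3", "p6"},
}
exact = exact_bipartite_mds(targets_of)
greedy = greedy_bipartite_ds(targets_of)
print(sorted(exact), sorted(greedy))
```

In this example the greedy heuristic first selects the highest-degree drug d1 and ends with four drugs, whereas the exact solution needs only three, which illustrates why the greedy result is a dominating set but not necessarily a minimum one.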

If the target network is large, it may take too long to obtain an ILP solution. In such a case, approximation algorithms for the set cover problem [21,22] may be employed.

3.3. Other extensions

We have assumed that edges are stable. However, some interactions may become unavailable due to changes in the environment and other factors. Therefore, it may be useful to extend the MDS so that it is robust against failure of edges. In [31], the concept of the C-robust MDS (C-RMDS) was proposed. For a given network $G(V, E)$, $S \subseteq V$ is called a C-robust dominating set if each node $v \in V$ belongs to $S$ or has at least $C$ edges connecting to nodes in $S$. Then, each node in $V$ is covered by at least one node in $S$ even if any $C - 1$ edges are deleted. A C-robust MDS is defined as a C-robust dominating set with the minimum cardinality. On the other hand, Molnár et al. considered robustness against node failures [32]. They proposed the Flexible-Redundancy Dominating Set and the Flexible-Cost Dominating Set, and demonstrated the effectiveness of these concepts using simulations on scale-free and real network structure data.

In Section 3.1, we reviewed the critical nodes that were introduced to address the non-uniqueness of MDSs. Zhang et al. proposed another approach to address the non-uniqueness [33]. They proposed the Centrality-Corrected Minimum Dominating Set, which is a dominating set minimizing the weighted sum of selected nodes, where the weights are given by a kind of inverse of some centrality measure from the field of complex networks. Although uniqueness is not necessarily guaranteed, it is expected that such a set is uniquely determined in most cases.

4. Analyses of biological networks

In this section, we review applications of the MDS approach to biological networks, which are summarized in Table 1. Research interest in MDS algorithmic methods for analyzing biological systems increased with the advent of controllability concepts in the context of complex networks. Before that, however, domination concepts had already been applied to biological networks, and statistically significant associations between structurally central genes and aging and infectious diseases, among others, had been reported [34].

Table 1. The main features of the biological networks analyzed using a variety of MDS approaches. U: Unipartite. B: Bipartite. UD: Undirected. D: Directed. UDT: Unidirectional. PPI: protein-protein interaction network. DT: Drug-protein target network. ncRNA-P: non-coding RNA-protein network. C-MN: Cancer Metabolic Network. DS-DC: Dominating Set with Degree Centrality. CC-MDS: Centrality-Corrected Minimum Dominating Set. MDS-CS: MDS approach as Cover Set. C-MN-EC: Cancer Metabolic Network with Enzyme-centric projection. MDS-ADT: MDS with Anticancer Drug-Target nodes as drivers.

Network    Type     Network size                       MDS method      Refs.
PPI        U/UD     9141 proteins                      DS-DC           [34]
PPI        U/UD     8073 proteins                      MDS             [12]
PPI        U/UD     7865 proteins                      CC-MDS          [33]
PPI        U/UD     1607 proteins                      Critical-MDS    [19]
DT         B/UDT    888 drugs, 394 target proteins     MDS-CS          [30]
ncRNA-P    B/UDT    3894 ncRNAs, 5783 proteins         Critical-MDS    [39]
C-MN-EC    U/D      2687 reactions                     MDS-ADT         [40]

Later, Wuchty presented a detailed analysis of controllability in human and yeast protein-protein interaction networks [12] using the MDS approach proposed by Nacher and Akutsu [11]. His findings showed that the proteins of the minimum dominating set occupy strategic locations from which each protein outside the MDS can be reached by at least one interaction. One might think that, because the MDS is highly enriched with high-degree proteins, the identified optimized subset of proteins is simply a selection of hubs. This is not true, because the degree distribution of the proteins that belong to the MDS also follows a power law [11,12]. This indicates that key proteins with low degree also participate in the MDS. Computational experiments also showed that the deletion of MDS proteins affected the resilience of the network more than the suppression of the hubs alone.

Furthermore, besides the topological implications for network controllability, the MDS nodes are also significantly enriched with proteins that play roles in diseases. Using 496 oncogenes and 876 tumor suppressor genes collected from the CancerGenes database, Wuchty determined a significant association of cancer genes with the identified MDS of proteins. The analysis was also extended to proteins targeted by various human viruses, using the Molecular INTeraction (MINT) database, and to essential genes collected from the Database of Essential Genes (DEG); a significant enrichment of the MDS was found in both cases. The relationship between betweenness centrality and the MDS was also explored by taking the top 10% of interactions with the highest edge betweenness and computing the fraction of links whose end nodes fall into each of the four options (MDS or non-MDS, MDS or non-MDS).
The results showed that, in protein-protein networks, interactions with high betweenness are connected with MDS proteins with high significance. Moreover, the analysis showed that genetic interactions and phosphorylation events preferentially appeared between MDS proteins, i.e., in (MDS, MDS) pairs. These bottleneck interactions therefore seem to play an important functional role in the network, and could not have been uncovered without the MDS approach. It is worth noting that the importance of betweenness in the MDS computation was also highlighted in the bipartite network analysis of the drug-gene network using the MDS [30]. Further extensions of the controllability analysis for the protein interaction network included enrichment calculations of functional classes for essential proteins in the MDS. The result showed that the essential proteins are enriched with transcription, replication and signal transduction functions. In contrast, depletion of transportation and translation functions was observed [35].

A drawback of the Wuchty analysis is that the MDS is not unique. There are multiple solutions of the same ILP that can dominate the entire network. This may be a relevant issue when assigning biological functionalities or diseases to each protein in the network. To address this issue, the critical set of proteins should be computed as defined in [19]. However, there is a computational limitation when large-scale networks are analyzed. As discussed before, Zhang et al. suggested an alternative method that uses the Centrality-Corrected Minimum Dominating Set (CC-MDS) [33]. This method is intended to identify a dominating set by minimizing the weighted sum of selected nodes, where the weights are given by a kind of inverse of some centrality measure, such as degree or betweenness centrality. A drawback of the method is that the uniqueness of the MDS is not necessarily guaranteed. However, it is likely that in most cases the MDS is uniquely determined. When compared with Wuchty's findings, the results showed that the CC-MDS proteins are more significantly enriched in protein complexes than the MDS proteins. Similarly, more essential genes and disease-associated genes appear among CC-MDS proteins than among MDS proteins. Finally, the CC-MDS proteins also show a higher enrichment of transcription factors and protein kinases. In any case, Wuchty's work opened the door to investigating functional and disease-related proteins in more detail using the MDS framework, which was a largely unexplored area. Therefore, novel research in this direction, assisted by both new computational methods and experimental approaches, is expected and encouraged.

Another important problem is to examine the dynamics of biological networks from a control theory viewpoint using minimum dominating sets. Wang et al. investigated the MDS identified in a protein interaction network specific to the yeast cell cycle [36]. Gene expression data corresponding to two full yeast cell cycles, collected at 17 time points at 10 min intervals, were used in the study [37].
Although MDS is an NP-hard problem, exact solutions can be obtained for networks of certain sizes. Here, the problem was formalized as a binary integer programming problem and solved using a linear programming (LP)-based branch-and-bound algorithm [38], as was also done in [12]. By mapping gene expression data and other annotated biological data, such as semantic similarity from Gene Ontology, the statistical properties of the identified MDS were examined. The results, although not conclusive, suggested that the proteins associated with the MDS play an important role in coordinating a variety of molecular functions at different time steps of the cell cycle. However, unlike in a previous study [12], the proteins identified in the MDS were not found to be significantly enriched with essential genes. The authors also indicated that the dataset used in the study is small and was published in 2005, and that a higher-quality binary protein interaction network should be used for future improvements and extensions of their analysis.

As described above, bipartite networks have also been studied in the context of controllability. Let $X$ and $Y$ be the sets of nodes of each type, and let $E$ be a set of edges. Then, in uni-directional networks, all edges in $E$ are directed from $X$ to $Y$. This is the case for the drug-target protein network. The theoretical details of controllability in uni-directional bipartite networks were investigated in [30]. There it was shown that the MDS depends significantly on the maximum degree of the nodes in $X$, which unveiled another tool to optimize network control. Moreover, the analysis demonstrated that in bipartite networks the MDS model requires fewer nodes than the nodal dynamics model using maximum matching [26] to structurally control the entire network.
An important aspect of the problem is that the structural controllability studied in [26] only guarantees that the system is controllable provided that there exists some configuration of weights for the edges. This can be a problem when real systems do not have these particular weights. The results provided by the MDS model, however, still hold even in the extreme case in which all edges have the same weight. In the MDS case, the application of a unique signal through an edge to a single node makes it controllable and, therefore, structural controllability is not even needed. This detail is important because it suggests that the MDS algorithmic approach could be applied to some cases of non-linear and/or discrete models, whereas the nodal dynamics approach is limited to linear systems.

The computation of the MDS in the drug-target protein network showed that only 21% of the approved drugs could structurally control the known druggable proteome [30]. By considering only the giant connected component, the fraction of drugs required to achieve full network control decreased to 8%. While some fraction of disease-gene products are covered by a highly connected set of drugs, other proteins associated with complex disorders such as cancer and immunological diseases tend to be controlled by low-degree drugs. The projection of the bipartite network onto the drug space showed that the MDS tends to be composed of nodes with high betweenness centrality.

The controllability of bipartite networks using the MDS approach also has the drawback that multiple MDS solutions may exist for a given network. Therefore, Kagami et al. developed a new algorithm that detects the critical nodes in a network, that is, the nodes that are always included in the MDS solution [39]. The bipartite network representation of the entire set of ncRNA-protein interactions is shown in Fig. 5. The algorithm also classifies the control roles of nodes as redundant or intermittent, as defined in [19,29]. They then applied the new algorithm to the unidirectional bipartite non-coding RNA-protein network composed of almost 93,000 interactions and identified the non-coding RNAs with critical roles in network control. Moreover, they mapped annotated information on more than 350 human disorders onto the ncRNAs and examined the enrichment of these ncRNAs in human diseases.
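The classification of control roles described above can be illustrated with a small Python sketch for a unidirectional bipartite network. It assumes the usual reading of the three categories (a critical node belongs to every MDS, a redundant node to none, and an intermittent node to some but not all), and it enumerates all minimum solutions by brute force. This is not the algorithm of [39], which is designed for large networks; the function names and the toy ncRNA-protein data are hypothetical.

```python
from itertools import combinations


def all_minimum_dominating_sets(targets_of):
    """Enumerate every smallest subset of X whose targets cover all of Y.

    `targets_of` maps each X node (e.g. an ncRNA) to the set of Y nodes
    (e.g. proteins) it points to. Brute force, for toy examples only.
    """
    x_nodes = sorted(targets_of)
    y_nodes = set().union(*targets_of.values())
    for k in range(1, len(x_nodes) + 1):
        found = [
            set(subset)
            for subset in combinations(x_nodes, k)
            if set().union(*(targets_of[x] for x in subset)) == y_nodes
        ]
        if found:  # the first size with any solution is the minimum size
            return found
    return []


def classify_control_roles(targets_of):
    """Label each X node as critical, intermittent or redundant."""
    solutions = all_minimum_dominating_sets(targets_of)
    roles = {}
    for x in targets_of:
        hits = sum(x in s for s in solutions)
        if hits == len(solutions):
            roles[x] = "critical"      # present in every minimum solution
        elif hits == 0:
            roles[x] = "redundant"     # present in no minimum solution
        else:
            roles[x] = "intermittent"  # present in some, but not all
    return roles


# Hypothetical toy network: ncRNAs r1..r5 pointing to proteins p1..p4.
toy = {
    "r1": {"p1", "p2"},
    "r2": {"p2", "p3"},
    "r3": {"p3"},
    "r4": {"p4"},
    "r5": {"p2"},
}
print(classify_control_roles(toy))
```

In this toy network, r1 and r4 are the only regulators of p1 and p4 and are therefore critical, r2 and r3 are interchangeable for covering p3 and are intermittent, and r5 never appears in a minimum solution and is redundant.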
The results not only showed that a small set of ncRNAs can control the entire network but also highlighted that the critical set of nodes is enriched with high-degree ncRNAs. The disease mapping on the ncRNAs also showed that the critical set of ncRNAs was significantly enriched with disease associations. The analysis was also extended to each particular disease, identifying those that have the highest number of ncRNAs engaged in critical network control, such as hepatocellular carcinoma and stomach, breast and colorectal neoplasms.

Metabolic pathways are also an important example of biological bipartite networks. Masoudi-Nejad’s group analyzed controllability in the metabolic networks of 15 cancer cell types and the corresponding normal cell types by considering anticancer drug targets as potential driver nodes [40]. The work examined the connections between structural network controllability using the MDS conceptual approach, topological metrics and metabolic drug targets (i.e., targets of approved anticancer metabolic drugs). The MDS controllability approach in metabolic networks was justified by the fact that controlling cancer metabolism through internal signals seems biologically reasonable. However, the analysis did not determine the driver nodes as in previous studies [11]. Instead, the authors focused on whether the MDS concept, in which it is assumed that each driver node can control its links independently, is useful for finding new drug targets in cancer metabolic networks. The study considered separately the chemical compound (metabolite-centric) and enzyme (enzyme-centric) unipartite projections of the bipartite metabolic network representation. The results showed that the distribution of drug targets among the top 100 nodes ranked by twelve primary centrality metrics was not significant. However, more complex metrics such as motifs and modules showed a different pattern. The decomposition of drug targets in the enzyme-centric projection using clusters or modules revealed that most of the drug targets are part of one specific cluster or module computed by the MCODE method in the enzyme-projected network.
Similarly, motif extraction analysis showed a significant variation and abundance of drug targets per motif in the enzyme-projected network for cancer cells.

Fig. 5. (a) Bipartite network representation of the entire ncRNA-protein interactions. (b) The highlighted subgraph of the network shows that only three critical ncRNA nodes can dominate the entire system.

Complex network science has provided a large variety of tools and metrics to measure the structural properties of real networks. One of these metrics is modularity, which emerged from the observation that some nodes are densely connected to each other. While these tightly connected nodes define a module, they exhibit weak connectivity to nodes in other modules. Recent analyses have investigated the relation between controllability and modularity [41] using the MDS approach proposed for undirected networks [11]. The results showed that it is easier to control networks with high modularity. Moreover, when the networks have communities with a large number of nodes, more driver nodes are necessary than when the network is organized into smaller modules [41]. These results could be linked to the findings reported in [40], in which most of the targets of approved anticancer metabolic drugs (driver nodes) belong to one specific cluster of the enzyme-centric network for cancer cells.
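The notion of modularity used above can be made concrete with a short Python sketch. The text does not spell out a formula, so the sketch assumes the widely used Newman-Girvan definition for an undirected, unweighted network with a given partition into modules, Q = sum over modules c of [ L_c / m - (d_c / 2m)^2 ], where m is the number of edges, L_c is the number of edges inside module c and d_c is the total degree of its nodes. Whether [41] uses exactly this definition is not stated here; the function name and the toy graph are hypothetical.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected graph.

    `edges` is a list of (u, v) pairs, each undirected edge listed once;
    `community_of` maps every node to its module label.
    """
    m = len(edges)
    intra = {}       # L_c: number of edges with both ends in module c
    degree_sum = {}  # d_c: total degree of the nodes in module c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single bridge edge (c - d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}

print(modularity(edges, two_modules))  # 5/14, about 0.357
print(modularity(edges, one_module))   # 0.0
```

Splitting the toy graph into its two triangles gives a clearly positive value, whereas placing all nodes in a single module gives zero. This is the sense in which dense internal connectivity and weak connectivity between modules define modular structure.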

5. Conclusions

The variety and impact of the reviewed findings suggest that the MDS is a promising approach for analysing complex biological networks composed of ‘multi-omics’ data, from proteomics and transcriptomics to metabolomics. The integration of omics data with disease information has also led to novel findings. The high enrichment of disease associations observed in the driver nodes suggests that these nodes, which are placed in strategic locations, could be important target molecules for the future development and design of drugs. Because the module concept is closely related to that of biological function, which is performed by a group of molecules, and also to that of human disorder, in which specific disease phenotypes tend to be agglomerated in the same network neighborhood [4], research on the associations between the MDS and the modules identified in biological networks could be an interesting extension of the work done by Sun [41]. Moreover, the development of more efficient algorithms is desirable to identify the critical set of nodes quickly and uniquely in large-scale networks, such as proteome-wide protein interaction networks. Future directions may include the development of more sophisticated controllability models by further extending the concepts introduced by the MDS approach. Experiments that combine the identified MDS with drugs to regulate specific functions could also be important for exploring and developing new and effective therapies.

Funding

J.C.N. and T.A. were partially supported by MEXT, Japan (Grant-in-Aid 25330351 and Grant-in-Aid 26540125, respectively). This work was also supported in part by research collaboration projects of the Institute for Chemical Research, Kyoto University.

References

[1] B. Berger, J. Peng, M. Singh, Computational solutions for omics data, Nat. Rev. Genet. 14 (2013) 333–346.
[2] M. Vidal, M.E. Cusick, A.-L. Barabási, Interactome networks and human disease, Cell 144 (2011) 986–995.
[3] A.-L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nat. Rev. Genet. 12 (2011) 56–68.
[4] J. Menche, A. Sharma, M. Kitsak, D. Ghiassian, M. Vidal, J. Loscalzo, A.-L. Barabási, Uncovering disease-disease relationships through the incomplete interactome, Science 347 (2015) 1257601-1.
[5] A.L. Hopkins, Network pharmacology: the next paradigm in drug discovery, Nat. Chem. Biol. 4 (2008) 682–690.
[6] P. Csermely, T. Korcsmaros, H.J.M. Kiss, G. London, R. Nussinov, Structure and dynamics of biological networks: a novel paradigm of drug discovery. A comprehensive review, Pharmacol. Ther. 138 (2013) 333–408.
[7] L. Chu, B.S. Chen, Construction of a cancer-perturbed protein-protein interaction network for discovery of apoptosis drug targets, BMC Syst. Biol. 2 (2008) 56.
[8] A. Azmi, Z. Wang, P.A. Phillip, R.M. Mohammad, F.H. Sarkar, Proofs of concept: a review on how network and systems biology approaches aid in the discovery of potent anticancer drug combinations, Mol. Cancer Ther. 9 (2010) 3137–3144.
[9] T. Haynes, S.T. Hedetniemi, P.J. Slater, Fundamentals of Domination in Graphs, Pure Applied Mathematics, Chapman and Hall/CRC, New York, 1998.
[10] C. Cooper, R. Klasing, M. Zito, Lower bounds and algorithms for dominating sets in web graphs, Internet Math. 2 (2005) 275–300.
[11] J.C. Nacher, T. Akutsu, Dominating scale-free networks with variable scaling exponent: heterogeneous networks are not difficult to control, New J. Phys. 14 (2012) 073005.
[12] S. Wuchty, Controllability in protein interaction networks, Proc. Natl. Acad. Sci. U.S.A. 111 (2014) 7156–7160.
[13] F.V. Fomin, F. Grandoni, D. Kratsch, A measure & conquer approach for the analysis of exact algorithms, J. ACM 56 (2009) 25.
[14] J.M.M. van Rooij, H.L. Bodlaender, Exact algorithms for dominating set, Discrete Appl. Math. 159 (2011) 2147–2164.
[15] Y. Iwata, A faster algorithm for dominating set analyzed by the potential method, Lect. Notes Comput. Sci. 7112 (2012) 41–54.
[16] A.-R. Hedar, R. Ismail, Simulated annealing with stochastic local search for minimum dominating set problem, Int. J. Mach. Learn. Cyber. 3 (2012) 97–109.
[17] CPLEX web site: http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/ (Sept. 1, 2015).
[18] Gurobi web site: http://www.gurobi.com/ (Sept. 1, 2015).
[19] J.C. Nacher, T. Akutsu, Analysis of critical and redundant nodes in controlling directed and undirected complex networks using dominating sets, J. Complex Netw. 2 (2014) 394–412.
[20] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, M. Protasi, Complexity and Approximation. Combinatorial Optimization Problems and Their Approximability Properties, Springer-Verlag, Berlin, 1999.
[21] V.V. Vazirani, Approximation Algorithms, Springer-Verlag, Berlin, 2001.
[22] D.P. Williamson, D.B. Shmoys, The Design of Approximation Algorithms, Cambridge University Press, New York, 2011.
[23] J.C. Nacher, T. Akutsu, Analysis on controlling complex networks based on dominating sets, J. Phys. Conf. Ser. 410 (2013) 012104.
[24] F. Molnár Jr., S. Sreenivasan, B.K. Szymanski, G. Korniss, Minimum dominating sets in scale-free network ensembles, Sci. Rep. 3 (2013) 1736.
[25] F. Molnár Jr., N. Derzsy, É. Czabarka, L. Székely, B.K. Szymanski, G. Korniss, Dominating scale-free networks using generalized probabilistic methods, Sci. Rep. 4 (2014) 6308.
[26] Y.-Y. Liu, J.-J. Slotine, A.-L. Barabási, Controllability of complex networks, Nature 473 (2011) 167–173.
[27] C. Commault, J.-M. Dion, J.W. van der Woude, Characterization of generic properties of linear structured systems for efficient computations, Kybernetika 38 (2002) 503–520.
[28] T. Nepusz, T. Vicsek, Controlling edge dynamics in complex networks, Nat. Phys. 8 (2012) 568–573.
[29] T. Jia, Y.-Y. Liu, E. Csóka, M. Posfai, J.-J. Slotine, A.-L. Barabási, Emergence of bimodality in controlling complex networks, Nat. Commun. 4 (2013) 2002.
[30] J.C. Nacher, T. Akutsu, Structural controllability of unidirectional bipartite networks, Sci. Rep. 3 (2013) 1647.
[31] J.C. Nacher, T. Akutsu, Structurally robust control of complex networks, Phys. Rev. E 91 (2015) 012826.
[32] F. Molnár Jr., N. Derzsy, B.K. Szymanski, G. Korniss, Building damage-resilient dominating sets in complex networks against random and targeted attacks, Sci. Rep. 5 (2015) 8321.
[33] X.-F. Zhang, L. Ou-Yang, Y. Zhu, M.-Y. Wu, D.-Q. Dai, Determining minimum set of driver nodes in protein-protein interaction networks, BMC Bioinform. 16 (2015) 146.
[34] T. Milenkovic, V. Memisevic, A. Bonato, N. Przulj, Dominating biological networks, PLoS One 6 (2011) e23016.
[35] S. Khuri, S. Wuchty, Essentiality and centrality in protein networks revisited, BMC Bioinform. 16 (2015) 109.
[36] H. Wang, H. Zheng, F. Browne, C. Wang, Minimum dominating sets in cell cycle specific protein interaction networks, in: Proc. of the 2014 IEEE Int. Conf. on Bioinformatics and Biomedicine (BIBM), pp. 25–30.
[37] U. de Lichtenberg, L. Jensen, S. Brunak, P. Bork, Dynamic complex formation during the yeast cell cycle, Science 307 (2005) 724–727.
[38] A.H. Land, A.G. Doig, An automatic method of solving discrete programming problems, Econometrica 28 (1960) 497–520.
[39] H. Kagami, T. Akutsu, S. Maegawa, H. Hosokawa, J.C. Nacher, Determining associations between human diseases and non-coding RNAs with critical roles in network control, Sci. Rep. 5 (2015) 14577.
[40] Y. Asgari, A. Salehzadeh-Yazdi, F. Schreiber, A. Masoudi-Nejad, Controllability in cancer metabolic networks according to drug targets as driver nodes, PLoS One 8 (2013) e79397.
[41] P.G. Sun, Controllability and modularity of complex networks, Inf. Sci. 325 (2015) 20–32.
