17th IFAC Symposium on System Identification
Beijing International Convention Center
October 19-21, 2015. Beijing, China

Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 48-28 (2015) 007–011

Inferring the miRNA-disease associations based on domain-disease associations

Gui-Min Qin*. Rui-Yi Li**. Xing-Ming Zhao***

* School of Software, Xidian University, Xi’an 710126, China (Tel: 13571870190; e-mail: [email protected]).
** School of Software, Xidian University, Xi’an 710126, China (e-mail: [email protected])
*** School of Electronics and Information Engineering, Tongji University, Shanghai 210804, China (e-mail: [email protected])

Abstract: MicroRNAs (miRNAs) are a class of small endogenous non-coding genes, acting as regulators in the post-transcriptional processes. Recently, miRNAs have been found to be widely involved in different types of diseases. Therefore, identification of disease associated miRNAs can help understand the mechanisms that underlie the disease and identify new biomarkers. However, it is not easy to identify the miRNAs related to diseases due to their extensive involvement in various biological processes. In this work, we present a new approach to identify disease associated miRNAs based on domains, the functional and structural blocks of proteins. The results on real datasets demonstrate that our method can effectively identify disease related miRNAs with high precision.

Keywords: Computational methods; RNA; Algorithms; Biomedical systems; Statistical analysis; Biotechnology

1. INTRODUCTION

MicroRNAs (miRNAs) are a class of small (~22nt) non-coding regulatory RNAs, normally suppressing the expression of their target genes at the post-transcriptional stage (Ambros, 2004; Bartel, 2004). Accumulating evidence indicates that miRNAs are among the most important components of the cell, playing critical roles in many biological processes (Cheng et al., 2005; Karp and Ambros, 2005) including development, proliferation, differentiation, apoptosis, signal transduction, viral infection, and so on. It has been found that a single miRNA can regulate as many as 200 genes (Esquela-Kerscher and Slack, 2006) and about one third of human genes are targeted by miRNAs (Bandyopadhyay et al., 2010; Yang et al., 2008). These miRNA-mRNA interactions play critical roles in various biological processes (Friedman et al., 2009; J. Li et al., 2012).

Recently, it was found that miRNAs play important roles in many diseases. For example, databases such as HMDD (Human microRNA Disease Database) (Y. Li et al., 2013) and miR2Disease (Jiang et al., 2009) collect hundreds of miRNAs related to diseases. Since miRNAs are widely involved in various biological processes, it is a big challenge to identify the potential miRNAs related to diseases. Therefore, powerful computational methods for accurately identifying potential disease-related miRNAs are in strong demand (X. Chen et al., 2012).

In the literature, many computational approaches have been proposed to predict disease associated miRNAs. For example, by investigating known disease-miRNA associations, Lu et al (Lu et al., 2008) found that miRNAs related to phenotypically similar diseases tend to be functionally related. Chen et al (X. Chen et al., 2012) developed a new approach, named RWRMDA (Random Walk with Restart for MiRNA–Disease Association), to infer potential miRNA-disease interactions based on the miRNA–miRNA functional similarity network. Recently, Chen and Zhang (H. Chen and Zhang, 2013) adopted Net-CBI (Network Consistency Based Inference) to infer potential disease-miRNA associations based on the idea of network consistency as well as the miRNA functional similarity network, the disease similarity network and known miRNA-disease associations. Chen and Yan (X. Chen and Yan, 2014) developed RLSMDA (Regularized Least Squares for MiRNA-Disease Association) to uncover potential disease-miRNA associations by integrating known disease-miRNA associations, disease-disease similarities and miRNA-miRNA functional similarities. Recently, Zhao et al (X.-M. Zhao et al., 2014) presented a novel approach to predict disease associated miRNAs based on gene expression data and obtained promising results.

In general, the above methods identify associations between diseases and miRNAs through the miRNA-gene interactions. It is known that domains are structural and functional blocks of proteins, and the dysfunction of proteins in diseases should affect the function of domains. Similarly, the drug-protein interactions have been found to be determined by drug-domain interactions, although these interactions are not necessarily physical binding (Y.-Y. Wang et al., 2012).
2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.12.091
Recently, it was found that some mutations, e.g. non-synonymous single nucleotide polymorphisms (nsSNPs), occurring in protein domains are responsible for distinct diseases (W. Wang et al., 2010). Based on these observations, Wang et al developed a new computational approach to extract the associations between protein domains and diseases, and some disease related domains were identified (W. Wang et al., 2010).
Since miRNAs regulate the post-transcriptional process, their dysregulation may lead to the aberrant expression of target genes that are responsible for diseases, where the domain contents of the translated proteins will also be affected. Considering that domains are functional units of proteins and that they are more conserved than proteins, we assumed that investigating diseases based on domains may provide new insights into the disease mechanisms, and accordingly that the disease-miRNA associations may be unveiled at higher resolution. In this work, we propose a new approach to predict the associations between miRNAs and diseases based on domains instead of genes/proteins. The results on three cancer datasets demonstrate that the new approach outperforms the target gene based approach, indicating its effectiveness. The rest of the paper is organized as follows. The second part presents the data used here and the methods; the third part shows the results on real data; finally, the conclusions are drawn.

2. MATERIALS AND METHODS

2.1 Data sources

The domain information for proteins was obtained from the Pfam database (Punta et al., 2011), and both the manually curated, high-quality collection of domain families (Pfam-A) and the collection of predicted domain families (Pfam-B) were taken into account here. The disease associated non-synonymous SNPs (nsSNPs) were downloaded from the UniProt database (Jain et al., 2009), where the nsSNPs were classified into three categories, i.e. disease, polymorphism and unclassified. We only considered the nsSNPs belonging to the disease category. A domain was connected with a disease if the domain contains disease associated nsSNPs. Finally, we established 1021 associations between 518 domains and 799 diseases.

The miRNA target genes were retrieved from the miRTarBase database (Hsu et al., 2014), which contains experimentally validated miRNA-gene interactions. As a result, we obtained a regulatome composed of 39110 interactions between 597 miRNAs and 12108 genes. Moreover, we only considered genes that have been annotated in UniProt.

In this work, we considered three types of cancers, i.e. breast cancer, colon cancer and ovarian cancer. The gene expression and miRNA expression profiles for these three cancer types were obtained from TCGA (The Cancer Genome Atlas) (Network, 2011), and the statistics of the data can be found in Table 1. We only considered the samples that have both gene and miRNA expression. With the gene and miRNA expression profiles, the miRNA-gene interactions were further filtered by keeping only those interactions with negative co-expression correlations between pairs of genes and miRNAs, considering that most miRNAs suppress the expression of their corresponding target genes. With miRNA-gene interactions and protein domain annotations, we established the miRNA-domain associations, where a miRNA and a domain are connected if one of the miRNA’s target genes contains the domain. Consequently, 10126 associations between 328 miRNAs and 1039 domains were obtained.

Table 1. The datasets of three cancers

Cancer type      Expression profiles   Number of samples   Common samples
Breast cancer    Gene expression       337                 313
                 miRNA expression      313
Colon cancer     Gene expression       174                 147
                 miRNA expression      187
Ovarian cancer   Gene expression       586                 583
                 miRNA expression      587

The domain-domain interactions (DDIs) were obtained from the DOMINE database (Raghavachari et al., 2008), which contains experimentally verified DDIs based on protein structures and those predicted by different computational approaches based on protein-protein interactions (X. M. Zhao et al., 2010). Specifically, these interactions were grouped into distinct categories based on the interaction confidence. Here, we only used the DDIs with high and medium confidence, and constructed a domain-domain interaction network with 5526 interactions among 2106 domains.

MiRNAs with similar functions are more likely to be associated with similar diseases and vice versa (Lu et al., 2008). Therefore, a miRNA-miRNA functional similarity network was constructed here, where the functional similarity between a pair of miRNAs was estimated by measuring the semantic similarity of their associated diseases as described by Xuan et al (Xuan et al., 2013). Furthermore, the similarity between two miRNAs in the same family or cluster was set higher than that between miRNAs from different families or clusters, since the former are more likely to be associated with similar diseases.

The disease-miRNA associations were retrieved from HMDD, a resource containing curated, experimentally supported miRNA-disease associations. Currently, HMDD (released in June 2014) collects 10368 entries covering 572 miRNA genes and 378 diseases. These disease-miRNA associations were used as the gold standard to evaluate the performance of our computational approach.
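The construction of the miRNA-domain associations described in Section 2.1 can be illustrated with a minimal sketch. It assumes that the co-expression correlation is the Pearson correlation (the text does not name the measure), and all identifiers and toy values (`miR-A`, `G1`, `PF_X`, and so on) are hypothetical placeholders rather than data from this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def filter_negative(interactions, mirna_expr, gene_expr):
    """Keep miRNA-gene pairs whose expression profiles are negatively correlated."""
    return [
        (m, g) for m, g in interactions
        if m in mirna_expr and g in gene_expr
        and pearson(mirna_expr[m], gene_expr[g]) < 0
    ]

def mirna_domain_associations(interactions, gene_domains):
    """Connect a miRNA to every domain found in at least one of its target genes."""
    return {(m, d) for m, g in interactions for d in gene_domains.get(g, ())}

# Hypothetical toy data: expression over four common samples (illustrative only).
mirna_expr = {"miR-A": [1, 2, 3, 4], "miR-B": [1, 2, 3, 4]}
gene_expr = {"G1": [4, 3, 2, 1], "G2": [1, 2, 3, 5], "G3": [5, 4, 4, 1]}
interactions = [("miR-A", "G1"), ("miR-A", "G2"), ("miR-B", "G3")]
gene_domains = {"G1": {"PF_X", "PF_Y"}, "G3": {"PF_Y"}}

kept = filter_negative(interactions, mirna_expr, gene_expr)
associations = mirna_domain_associations(kept, gene_domains)
print(kept)
print(sorted(associations))
```

In this toy case the positively correlated pair (`miR-A`, `G2`) is discarded before the domain annotations are attached, so it cannot contribute any miRNA-domain edge.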
2.2 Methods

Fig. 1 shows the schematic illustration of our proposed approach for identifying disease associated miRNAs. Firstly, an extended set of disease associated domains was obtained based on the domain-domain interactions; secondly, the miRNAs were linked to diseases with the disease-domain, protein-domain and miRNA-gene associations; finally, the predicted disease-miRNA associations were further refined based on known disease-miRNA associations with the “guilt by association” rule. The details are given in the following parts.

Due to the incomplete knowledge about disease-domain associations, we extended the associations in a computational way. With the above domain-domain interaction network (DDIN), network smoothing was performed by employing random walk with restart (Vanunu et al., 2010). For a given disease d, all the domains confirmed to be associated with d were used as seed domains, while the rest in the DDIN were considered as candidate domains. For the seed domains, the initial probabilities p(0) were set to be the same, with their sum equal to 1. The initial probabilities for non-seed domains were set to 0. Here, restart of the random walk from the seed nodes was allowed with probability r (0 < r < 1), and the random walk with restart can be defined as follows.

$$p(t+1) = (1-r)\,W\,p(t) + r\,p(0) \qquad (1)$$

After the random walk stops, all candidate domains were ranked by their probabilities of being reached, and those with high probabilities are more likely to be associated with the given disease. In this way, we can find more disease associated domains. Similarly, the current knowledge about disease-miRNA associations is far from complete, and we predicted potential disease miRNAs by designing a new scoring scheme. With a set of candidate miRNAs for disease d as well as N miRNAs confirmed to be associated with d, the score for a candidate miRNA $R_p$ was defined as below.

$$\mathrm{score}(R_p) = \frac{\sum_{i=1}^{N} \mathrm{sim}(R_i, R_p)}{N} \cdot \log_2 N \qquad (2)$$

where $R_i$ is one of the N disease miRNAs and $\mathrm{sim}(R_i, R_p)$ denotes the functional similarity between the two miRNAs. In the above score, the more functionally similar $R_p$ is to the confirmed miRNAs, the more likely it is to be related to disease d. Moreover, the more disease associated miRNAs $R_p$ functionally interacts with, the more likely it is to be associated with the corresponding disease, which is why $\log_2 N$ was adopted in the score. With the score, the candidate miRNAs were ranked and the top ones were regarded as disease miRNAs.
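As a worked illustration of the score in Eq. (2), the following minimal sketch ranks two candidates against four confirmed disease miRNAs. It assumes that similarities are stored in a dictionary keyed by unordered miRNA pairs and that a missing pair has similarity 0; both choices, and all miRNA names and values, are hypothetical and not taken from this study.

```python
import math

def mirna_score(candidate, disease_mirnas, sim):
    """Eq. (2): mean similarity to the N confirmed miRNAs, weighted by log2(N)."""
    n = len(disease_mirnas)
    if n == 0:
        return 0.0
    # Assumption: a pair absent from `sim` is treated as similarity 0.
    total = sum(sim.get(frozenset((r, candidate)), 0.0) for r in disease_mirnas)
    return total / n * math.log2(n)

# Hypothetical toy data (illustrative only).
confirmed = ["miR-1", "miR-2", "miR-3", "miR-4"]
sim = {
    frozenset(("miR-1", "miR-X")): 0.5,
    frozenset(("miR-2", "miR-X")): 0.25,
    frozenset(("miR-3", "miR-X")): 0.75,
    frozenset(("miR-4", "miR-X")): 0.5,
    frozenset(("miR-1", "miR-Y")): 0.25,
    frozenset(("miR-2", "miR-Y")): 0.25,
}

scores = {c: mirna_score(c, confirmed, sim) for c in ("miR-X", "miR-Y")}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # miR-X: (2.0 / 4) * log2(4) = 1.0, miR-Y: (0.5 / 4) * log2(4) = 0.25
print(ranking)
```

Note that, as written, the score is 0 whenever N = 1, because $\log_2 1 = 0$.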
Step 1: For a given disease d, mapping d-related domains onto the DDIN.
Step 2: Identifying more d associated domains with random walk with restart on the DDIN.
Step 3: Obtaining the initial d-related miRNA set I by combining the d-related domains acquired from Step 2 and the miRNA-domain association network.
Step 4: For the set I, choosing the most likely d-related miRNAs according to their scores.
(Legend: miRNA, domain, disease.)

Fig. 1. The flowchart of predicting miRNA-disease associations based on domain-disease interactions.
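Step 2 of the flowchart, the random walk with restart on the DDIN, can be sketched as follows. This is only an illustration under stated assumptions: the transition matrix W is taken to be the adjacency matrix normalized by node degree (each node spreads its probability evenly over its neighbours), the restart probability is set to r = 0.5, and the iteration stops when the L1 change falls below a tolerance. None of these settings is specified in the text, and the four-domain network is a hypothetical toy example.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (domain, domain) interactions."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def random_walk_with_restart(adj, seeds, r=0.5, tol=1e-10, max_iter=1000):
    """Iterate p(t+1) = (1 - r) W p(t) + r p(0) until the L1 change is below tol."""
    nodes = sorted(adj)
    # Seeds share the initial probability equally; non-seed domains start at 0.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: r * p0[n] for n in nodes}
        for j in nodes:
            # Assumed normalization: node j passes its mass evenly to its neighbours.
            share = (1.0 - r) * p[j] / len(adj[j])
            for i in adj[j]:
                nxt[i] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical toy DDIN: a chain D1 - D2 - D3 - D4 with D1 as the only seed domain.
adj = build_adjacency([("D1", "D2"), ("D2", "D3"), ("D3", "D4")])
seeds = {"D1"}
p = random_walk_with_restart(adj, seeds)
candidates = sorted((n for n in adj if n not in seeds), key=p.get, reverse=True)
print(candidates)
```

On this chain the candidate domains are ranked by their distance from the seed, which is the smoothing behaviour the method relies on to extend the set of disease associated domains.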
3. RESULTS
3.1 MiRNA-domain-disease network
With the miRNA-domain and domain-disease associations (see Methods), we constructed an miRNA-domain-disease association network, shown in Fig. 2, which contains only edges between miRNAs and domains and edges between domains and diseases. With this association network, we aimed to find possible links between miRNAs and diseases.
Fig. 2. The network of miRNA-domain-disease with 287 miRNAs, 556 diseases, and 299 domains.
Since one domain may occur in multiple proteins, to avoid false positive miRNA-domain associations, we only considered those domains that occur frequently in the miRNAs' target genes. For each miRNA-domain pair, the occurrence frequency (OF) of the domain in the miRNA's target genes was calculated. For example, miR-205 has 8 target genes, i.e. DDX5, ERBB2, ERBB3, LRP1, LRRK2, SRC, VEGFA, and ZEB1. Among these 8 target genes, domain PF07714 occurs in 3 genes (ERBB2, ERBB3, and SRC), so its occurrence frequency is 38% (3/8). Fig. 3 shows the distribution of the domain OF. From the results, we can see that most domains occur in only a few proteins.
3.2 Performance evaluation
In this paper, we applied our approach, hereafter called miRdomain, to identify miRNAs associated with breast cancer, colon cancer and ovarian cancer. We also compared it with an intuitive approach that predicts miRNA-disease associations from miRNAs' target genes, called miRtarget here. Using the disease-gene associations from the OMIM and GAD databases, miRtarget regards an miRNA as related to a disease if at least one of the miRNA's target genes is known to be associated with the disease.
With the gold standard disease-miRNA associations, the two approaches were evaluated on the three cancers of interest. Specifically, precision, recall and F1 were used to evaluate the predictions of the two approaches. For a disease d, precision is the proportion of predicted miRNAs that are confirmed to be d-related, recall is the proportion of confirmed d-related miRNAs that are correctly predicted, and F1 is the harmonic mean of precision and recall, which serves as a single measure of overall performance.
Table 2. The performance of methods based on two different associations
Cancer type      Method      Precision  Recall  F1
Breast cancer    miRtarget   0.53       0.91    0.67
Breast cancer    miRdomain   0.73       0.67    0.70
Colon cancer     miRtarget   0.55       0.89    0.68
Colon cancer     miRdomain   0.72       0.70    0.71
Ovarian cancer   miRtarget   0.40       0.87    0.55
Ovarian cancer   miRdomain   0.55       0.65    0.60
Table 2 shows the performance of the two approaches. miRdomain outperforms miRtarget in overall performance, achieving a higher F1 on all three cancers. miRtarget finds more disease-associated miRNAs (high recall), but at the cost of many false positives (low precision). For example, miRdomain predicted 126 breast cancer related miRNAs, of which 92 were reported to be related to breast cancer in HMDD. The miRtarget approach predicted 300 breast cancer associated miRNAs, of which 160 can be validated by HMDD. The good performance of the miRdomain approach on three different cancer datasets demonstrates that the functional blocks of proteins, i.e. domains, can help bridge miRNAs and diseases better than genes, and also confirms the effectiveness of our proposed approach.
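The evaluation measures can be made concrete with a short Python sketch. It computes precision, recall and F1 from a predicted set and a gold standard set, and then checks the precision values implied by the breast cancer counts quoted above (92 of 126 and 160 of 300). The function names and the toy sets are assumptions for illustration. The size of the gold standard set is not stated here, so recall is only demonstrated on toy data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, gold):
    """Evaluate a set of predicted miRNAs against confirmed disease miRNAs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy sets, made up for illustration.
toy_predicted = {"a", "b", "c", "d"}
toy_gold = {"b", "c", "e"}
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_predicted, toy_gold)

# Precision implied by the breast cancer counts reported in the text.
mirdomain_precision = 92 / 126
mirtarget_precision = 160 / 300
print(round(mirdomain_precision, 2), round(mirtarget_precision, 2))

# F1 recomputed from the rounded breast cancer precision and recall in Table 2.
print(round(f1_score(0.73, 0.67), 2), round(f1_score(0.53, 0.91), 2))
```

Recomputing F1 from the rounded precision and recall values in this way reproduces the F1 column of Table 2 for both methods on breast cancer.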
Fig. 3. The distribution of domains' occurrence frequency in miRNAs' target genes.
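The occurrence frequency (OF) whose distribution is summarized in Fig. 3 can be reproduced for the miR-205 example from Section 3.1 with the sketch below. The function names and the 10% cut-off are hypothetical, and the gene-to-domain mapping lists only the PF07714 annotations mentioned in the text, with all other domain annotations left out.

```python
from collections import Counter


def occurrence_frequency(target_genes, gene_domains):
    """Return {domain: fraction of the miRNA's target genes containing it}."""
    if not target_genes:
        return {}
    counts = Counter()
    for gene in target_genes:
        # set() makes sure a domain is counted at most once per gene.
        counts.update(set(gene_domains.get(gene, ())))
    return {domain: count / len(target_genes) for domain, count in counts.items()}


def frequent_domains(frequencies, min_of):
    """Keep domains whose OF exceeds min_of (a hypothetical cut-off)."""
    return {domain for domain, of in frequencies.items() if of > min_of}


mir205_targets = ["DDX5", "ERBB2", "ERBB3", "LRP1", "LRRK2", "SRC", "VEGFA", "ZEB1"]
# Only the PF07714 annotations stated in the text; other domains are omitted.
gene_domains = {
    "ERBB2": {"PF07714"},
    "ERBB3": {"PF07714"},
    "SRC": {"PF07714"},
}
mir205_of = occurrence_frequency(mir205_targets, gene_domains)
print(mir205_of["PF07714"])  # 3 of 8 target genes, i.e. 0.375
```

Counting each domain once per gene keeps the OF within the range 0 to 1, even when a protein contains several copies of the same domain.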
4. CONCLUSION
Identifying potential disease-miRNA associations is important for understanding the pathogenesis of disease and designing new therapies. In this paper, we presented a new approach to predict disease-associated miRNAs based on domains instead of genes. The good performance of our approach on real datasets demonstrates its effectiveness, which also implies that domains can bridge miRNAs and diseases very well and provide new perspectives on diseases.
ACKNOWLEDGEMENT
This work was partly supported by the National Natural Science Foundation of China (91130032, 61103075, and 61100157), Innovation Program of Shanghai Municipal Education Commission (13ZZ072), Shanghai Pujiang Program (13PJD032), the China Postdoctoral Science Foundation (2013M540386) and the Fundamental Research Funds for the Central Universities (BDZ011401, JB141001).
REFERENCES
Ambros, V. (2004). The functions of animal microRNAs. Nature, 431(7006), 350-355.
Bandyopadhyay, S., Mitra, R., Maulik, U., and Zhang, M. Q. (2010). Development of the human cancer microRNA network. Silence, 1(1), 6.
Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116(2), 281-297.
Chen, H., and Zhang, Z. (2013). Similarity-based methods for potential human microRNA-disease association prediction. BMC medical genomics, 6(1), 12.
Chen, X., Liu, M.-X., and Yan, G.-Y. (2012). RWRMDA: predicting novel human microRNA–disease associations. Molecular BioSystems, 8(10), 2792-2798.
Chen, X., and Yan, G.-Y. (2014). Semi-supervised learning for potential human microRNA-disease associations inference. Scientific reports, 4.
Cheng, A. M., Byrom, M. W., Shelton, J., and Ford, L. P. (2005). Antisense inhibition of human miRNAs and indications for an involvement of miRNA in cell growth and apoptosis. Nucleic acids research, 33(4), 1290-1297.
Esquela-Kerscher, A., and Slack, F. J. (2006). Oncomirs—microRNAs with a role in cancer. Nature Reviews Cancer, 6(4), 259-269.
Friedman, R. C., Farh, K. K.-H., Burge, C. B., and Bartel, D. P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome research, 19(1), 92-105.
Hsu, S.-D., Tseng, Y.-T., Shrestha, S., Lin, Y.-L., Khaleel, A., Chou, C.-H., Chu, C.-F., Huang, H.-Y., Lin, C.-M., and Ho, S.-Y. (2014). miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions. Nucleic acids research, 42(D1), D78-D85.
Jain, E., Bairoch, A., Duvaud, S., Phan, I., Redaschi, N., Suzek, B. E., Martin, M. J., McGarvey, P., and Gasteiger, E. (2009). Infrastructure for the life sciences: design and implementation of the UniProt website. BMC bioinformatics, 10(1), 136.
Jiang, Q., Wang, Y., Hao, Y., Juan, L., Teng, M., Zhang, X., Li, M., Wang, G., and Liu, Y. (2009). miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic acids research, 37(suppl 1), D98-D104.
Karp, X., and Ambros, V. (2005). Encountering microRNAs in cell fate signaling. Science, 310(5752), 1288-1289.
Li, J., Liu, Y., Xin, X., Kim, T. S., Cabeza, E. A., Ren, J., Nielsen, R., Wrana, J. L., and Zhang, Z. (2012). Evidence for positive selection on a number of microRNA regulatory interactions during recent human evolution. PLoS genetics, 8(3), e1002578.
Li, Y., Qiu, C., Tu, J., Geng, B., Yang, J., Jiang, T., and Cui, Q. (2013). HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic acids research, gkt1023.
Lu, M., Zhang, Q., Deng, M., Miao, J., Guo, Y., Gao, W., and Cui, Q. (2008). An analysis of human microRNA and disease associations. PloS one, 3(10), e3420.
Cancer Genome Atlas Research Network (2011). Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353), 609-615.
Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., and Clements, J. (2011). The Pfam protein families database. Nucleic acids research, gkr1065.
Raghavachari, B., Tasneem, A., Przytycka, T. M., and Jothi, R. (2008). DOMINE: a database of protein domain interactions. Nucleic acids research, 36(suppl 1), D656-D661.
Vanunu, O., Magger, O., Ruppin, E., Shlomi, T., and Sharan, R. (2010). Associating genes and protein complexes with disease via network propagation. PLoS computational biology, 6(1), e1000641.
Wang, W., Zhang, W., Jiang, R., and Luan, Y. (2010). Prioritisation of associations between protein domains and complex diseases using domain–domain interaction networks. IET systems biology, 4(3), 212-222.
Wang, Y.-Y., Nacher, J. C., and Zhao, X.-M. (2012). Predicting drug targets based on protein domains. Mol. BioSyst., 8(5), 1528-1534.
Xuan, P., Han, K., Guo, M., Guo, Y., Li, J., Ding, J., Liu, Y., Dai, Q., Li, J., and Teng, Z. (2013). Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PloS one, 8(8), e70204.
Yang, H., Dinney, C. P., Ye, Y., Zhu, Y., Grossman, H. B., and Wu, X. (2008). Evaluation of genetic variants in microRNA-related genes and risk of bladder cancer. Cancer research, 68(7), 2530-2537.
Zhao, X.-M., Liu, K.-Q., Zhu, G., He, F., Duval, B., Richer, J.-M., Huang, D.-S., Jiang, C.-J., Hao, J.-K., and Chen, L. (2014). Identifying cancer-related microRNAs based on gene expression data. Bioinformatics, btu811.
Zhao, X. M., Chen, L., and Aihara, K. (2010). A discriminative approach for identifying domain–domain interactions from protein–protein interactions. Proteins: Structure, Function, and Bioinformatics, 78(5), 1243-1253.