Co-expression network with protein–protein interaction and transcription regulation in malaria parasite Plasmodium falciparum

Co-expression network with protein–protein interaction and transcription regulation in malaria parasite Plasmodium falciparum

Gene 518 (2013) 7–16 Contents lists available at SciVerse ScienceDirect Gene journal homepage: www.elsevier.com/locate/gene Co-expression network w...

801KB Sizes 3 Downloads 71 Views

Gene 518 (2013) 7–16

Contents lists available at SciVerse ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Co-expression network with protein–protein interaction and transcription regulation in malaria parasite Plasmodium falciparum Fu-Dong Yu a, b, 1, Shao-You Yang b, 2, Yuan-Yuan Li b,⁎, Wei Hu a,⁎⁎ a b

School of Life Sciences, Fudan University, 220 Handan Road, Shanghai 200433, PR China Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Shanghai 201203, PR China

a r t i c l e

i n f o

Available online 26 December 2012 Keywords: Gene co-expression network Malaria parasite Module analysis

a b s t r a c t Background: Malaria continues to be one of the most severe global infectious diseases, as a major threat to human health and economic development. Network-based biological analysis is a promising approach to uncover key genes and biological processes from a network viewpoint, which could not be recognized from individual gene-based signatures. Results: We integrated gene co-expression profile with protein–protein interaction and transcriptional regulation information to construct a comprehensive gene co-expression network of Plasmodium falciparum. Based on this network, we identified 10 core modules by using ICE (Iterative Clique Enumeration) algorithm, which were essential for malaria parasite development in intraerythrocytic developmental cycle (IDC) stages. In each module, all genes were highly correlated probably due to co-regulation or formation of a protein complex. Some of these genes were recognized to be differentially coexpressed among three close-by IDC stages. The gene of prpf8 (PFD0265w) encoding pre-mRNA processing splicing factor 8 product was identified as DCGs (differentially co-expressed genes) among IDC stages, although this gene function was seldom reported in previous researches. Integrating the species-specific gene prediction and differential co-expression gene detection, we found some modules could perform species-specific functions according to some of genes in these modules were species-specific genes, like the module 10. Furthermore, in order to reveal the underlying mechanisms of the erythrocyte invasion by P. falciparum, Steiner Tree algorithm was employed to identify the invasion subnetwork from our gene co-expression network. The subnetwork-based analysis indicated that some important Plasmodium parasite specific genes could corporate with each other and be co-regulated during the parasite invasion process, which including a head-to-head gene pair of PfRH2a (PF13_0198) and PfRH2b (MAL13P1.176). Conclusions: This study based on gene co-expression network could shed new insights on the mechanisms of pathogenesis, even virulence and P. falciparum development. Crown Copyright © 2012 Published by Elsevier B.V. All rights reserved.

1. Introduction Malaria remains to be one of the most severe global infectious diseases, infecting 300–500 million people and causing millions of deaths every year. Plasmodium falciparum, the most devastating human malaria parasite, is responsible for over 90% of human deaths from malaria. However, no effective anti-malaria vaccines are available for clinical use in humans (Crompton et al., 2010). During the past

Abbreviations: IDC, intraerythrocytic developmental cycle; DCGs, differentially co-expressed genes; MCL, Markov cluster algorithm; ICE, Iterative Clique Enumeration; PCC, Pearson Correlation Coefficient; PPI, protein–protein interaction; GO, Gene Ontology; ISSPG, Identification of Species-Specific eukaryotic Parasite Genes; TF, transcription factor. ⁎ Corresponding author. Tel.: +86 21 20283720; fax: +86 21 20283780. ⁎⁎ Corresponding author. Tel.: +86 21 65643031; fax: +86 21 54653512. E-mail addresses: [email protected] (F.-D. Yu), [email protected] (S.-Y. Yang), [email protected] (Y.-Y. Li), [email protected] (W. Hu). 1 Tel.: +86 21 20283722; fax: +86 21 20283780. 2 Tel.: +86 21 20283759; fax: +86 21 20283780.

several decades, the control of malaria still relies heavily on chemotherapy, which causing drug resistance in malaria parasites increased the morbidity and mortality rates in malaria endemic regions. It's urgently needed to develop new drug and vaccine. On the other hand, some mechanism researches about parasitism, pathogenesis and even virulence could improve and accelerate the development of drug and vaccine. After the genome sequence of P. falciparum became available (Gardner et al., 2002), the gene expression profiles of its intraerythrocytic developmental cycle (IDC) (Bozdech et al., 2003; Llinas et al., 2006) and other asexual and bloodborne stages (Le Roch et al., 2003) have been illustrated by microarray. Several researches have discovered the possible regulation relationships between transcriptional factors of Apicomplexan-specific AP2 (ApiAP2) family and their binding motifs (Balaji et al., 2005; De Silva et al., 2008). ApiAP2 family putative transcriptional regulators were proved to play key roles in development and environmental stress response pathways in Plasmodium species (De Silva et al., 2008; Jurgelenaite et al., 2009; Sims et al., 2009).

0378-1119/$ – see front matter. Crown Copyright © 2012 Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.gene.2012.11.092

8

F.-D. Yu et al. / Gene 518 (2013) 7–16

was above the threshold. One of clique-percolation algorithms named ICE (Iterative Clique Enumeration) algorithm (Shi et al., 2010) was employed to identify functional modules (or named subnetworks), which could represent major transcriptional programs in co-expression network or protein complex, based on the topology of our constructed comprehensive gene co-expression network. To address each module function, a hypergeometric distribution enrichment analysis was used to analyze the function of modules identified by ICE algorithms using Gene Ontology (GO) annotation and KEGG pathway information. Thus we obtained the mostly significant enriched function of each module using this functional enrichment analysis although each module covered diversity heterogeneity functions with significant statistics. Therefore, genes highly co-expressed in each module could work together to perform potential homogeneity function, no matter the genes in one module with poorly understood or putative annotation. Besides, this functional module analysis is also one of the popular approaches used to annotate those genes with poorly functional annotation, and even postulate previously unrecognized members of some subnetworks. As we all know, more than half of the genes in P. falciparum are predicted with poor annotation, and almost all of them are species-specific genes or genus-specific genes. And these genes are attracted for drug development. Considering the significance of species-specific genes for drug development, we detected the potential species-specific genes in function modules in this study. For each module, all genes implemented functional biological process or regulation complex together with highly co-expression. Some of them co-expressed

Recently, experimental map of physical interactions between proteins of P. falciparum were released by yeast two-hybrid screen (LaCount et al., 2005). Some studies based on protein–protein interaction (PPI) network showed the insights of core interactome (Wuchty et al., 2009) and protease-associated cellular subnetworks (Lilburn et al., 2011) in P. falciparum. Although significant progresses have been made, there are still much more works needed to learn and decipher the functions of these genes or proteins, including how they interact with other proteins or regulate other genes in differential developmental stages to coordinate important biological processes or pathways related to growth, invasion, pathogenesis, response to drug treatment and resistance in malaria parasites. One of the most popular methods to uncover these potential functions in biological systems depends on comprehensive network biology study. As usually knowledge-based information is used to guide to build a network of proteins or genes. In this study we incorporated the protein–protein interaction data from STRING database (Szklarczyk et al.) and the yeast 2-hybrid (Y2H) screened physical protein interaction data (Aurrecoechea et al., 2009; LaCount et al., 2005), ApiAP2 transcriptional regulators driven regulation data (De Silva et al., 2008), IDC stages gene expression data (Llinas et al., 2006), and literature text mining information into building a comprehensive gene co-expression network with protein–protein interaction or regulation of P. falciparum (see the workflow of this study, Fig. 1). In this network, each node represented a gene or protein and each edge represented co-expression relation between each two gene-pair if co-expression correlation value

Knowledgebase

Gene expression datasets

Protein-protein interaction (PPI)

Gene Co-expression Network

TF-regulation

Invasive pathway (As seed) Gene Co-expression Network with PPI and TF-regulation

Modularization

Steiner Tree algorithm

ICE algorithm

Module

Module

Module

Invasive subnetwork

Module functional enrichment processing Gene Ontology

DCGs

Module

DCGs

KEGG

Module

Module

Species-specific genes

Fig. 1. The workflow of comprehensive gene co-expression network analysis in Plasmodium falciparum.

F.-D. Yu et al. / Gene 518 (2013) 7–16

differentially among different IDC developmental stages, which might lead to improve our understanding of transcriptional mechanisms especially the underlying phenotypic changes. Invasion to host erythrocyte cell from P. falciparum is essential biological process for access to a rich source of nutrients from host cells (Cowman & Crabb, 2006). It's important to demonstrate those related genes of invasion pathway in our constructed gene co-expression network for malaria research. Furthermore, we also detected this pathway associated genes in P. falciparum co-expression relation above selection threshold (PCC value > 0.8, adjusted P-value b 0.005). Four modules identified from these co-expression related gene-pairs, indicated that they were essential to cooperate with each other for invasion to host erythrocyte cell process. Taken together, these insights could help us to improve our understanding of the mechanisms inherent parasite metabolism, regulation, invasion and infection during P. falciparum IDC stages. 2. Materials and methods 2.1. Data processing In order to further analyze the gene expression profile of P. falciparum 3D7, we collected and revised the time series gene expression normalized data from PlasmoDB (Aurrecoechea et al., 2009), which were released by DeRisi laboratory (Llinas et al., 2006) and published at Nucleic Acids Research journal in 2006 (Llinas et al., 2006). And this normalized data included 53 hourly time points gene expression profile, covering intraerythrocytic developmental cycle (IDC) stages of P. falciparum. As ApiAp2 family genes are important transcriptional factors (TFs) especially in Apicomplexan parasites (De Silva et al., 2008), we built the relationship information of ApiAP2 TFs and their regulated target genes in P. falciparum collected from literature (Berger et al., 2006; Levy & Hannenhalli, 2002) and PlasmoDB (Aurrecoechea et al., 2009) database. After removing redundant data processes, a total of 24 TFs of ApiAp2 family members with their target genes of 19,398 relationships were used in this study. Interaction between proteins in each organism could infer a mechanistic overview for most biological processes. In order to study the co-expression gene pairs and their products with protein–protein interaction (PPI) in P. falciparum, we gathered the yeast two-hybrid (Y2H) screen data from literature (LaCount et al., 2005) and the predicted data from String database (Szklarczyk et al.). After redundancy filtering process, we got the 1268 genes with unique 5259 PPIs of P. falciparum for this study. To verify our constructed network and identified modules, we download the other independent RNA-Seq dataset from PlasmoDB database (Aurrecoechea et al., 2009), which was published in Molecular Microbiology (Otto et al., 2010). This dataset contains 7 time points (0 h, 8 h, 16 h, 24 h, 32 h, 40 h and 48 h) covering all the IDC stages of P. falciparum. The normalized RNA-Seq dataset were transformed into log2 value to further verify our constructed comprehensive gene co-expression network. 2.2. Comprehensive gene co-expression network construction To evaluate the co-expression power in all genes, Pearson's correlation coefficients (PCCs) are used in this study to calculate gene pair correlation based on gene expression. For a given gene expression Þ matrix with n genes and m samples, all the nðn−1 gene correlation 2 pair number is generated. Each gene pair in this expression matrix is filtered as pair genes have PPI interaction or TF regulation. And in order to decrease false positive, a correlation threshold with absolute PPC value (|PCC| > 0.75) and Bonferroni-adjusted P value (P b 0.005) is used to select the co-expression gene pairs like those of previous researches. Based on this selection cutoff, a comprehensive gene

9

co-expression network of P. falciparum is constructed in which each node represents a gene while each edge represents co-expression gene pair with PPI interaction or TF regulation. 2.3. Co-expression module identification To facilitate further analyses of transcriptional mechanisms or protein complexes encoded in the above constructed gene co-expression network, we perform the ICE (Iterative Clique Enumeration) algorithm (Shi et al., 2010) to identify relatively independently co-expression modules with regularly maximal clique number in each module. Furthermore, we tried to test the reasonableness of extracted co-expression modules by using MCL (Markov cluster algorithm), comparing those modules using the ICE algorithm. And the modules by MCL are less separated as the gene number in these modules is overage or too little while the modules by ICE are not. Therefore, we select the ICE algorithm to propose co-expression modules in this study. Gene sets containing less than 5 genes overlapping with the network were removed from the analysis. These modules were visualized in the Cytoscape package (Smoot et al., 2011). 2.4. Module expression measurement In order to compare the expression pattern between modules, we employed the average expression of all genes in a module to represent the module expression. For a given module Mj, the average expression of module Mj could be estimated by using the formulation of ∑ EGk EMj ¼

Gk ∈Mj N Mj

. NMj represented the gene number of module Mj, while

EGk represented the gene expression in module Mj. 2.5. Functional enrichment analysis We conducted functional enrichment analysis for the networks based on Gene Ontology (GO) (Ashburner et al., 2000) and KEGG pathway (Kanehisa & Goto, 2000). Enrichment significance was determined by analyzing a hypergeometric distribution. For given a module with n genes and a priori M genes with functional category in the whole network, a hypergeometric test could evaluate the significance of the m genes in each module with the functional category. All N genes in the whole gene network are used as a reference. The significance of gene enrichment function in each module can be calculated N−M M  n X n−m   i . The cutoff of significance gene enrichment by p ¼ 1− N i¼m

n

analysis was selected with P b 0.0001 in this study. 2.6. Species-specific genes prediction As we all know, species-specific genes in eukaryotic parasites are usually considered as potential drug targets (Fichera & Roos, 1997; Li et al., 2003). It's worth evaluating the species-specific genes of eukaryotic parasites. We developed a method to identify the species-specific genes of P. falciparum based on sequence similarity by using BLAST against NR database (see our ISSPG sever: http://www.scbit.org/isspg/ upload/homepage.action). From these BLAST result, we could identify those genes which only exist in P. falciparum (species-specific genes), in Plasmodium genus (genus-specific genes) and in eukaryotic parasites (parasite-specific genes) separately (see Supplement Fig. 1 about ISSPG workflow). 2.7. Differentially co-expressed gene (DCG) detection Differentially co-expressed gene (DCG) identification is important to increasingly investigate the global transcriptional mechanisms in

10

F.-D. Yu et al. / Gene 518 (2013) 7–16

diseases research. In order to measure the DCGs' function in differential malaria development stages, we used the DCe (Differential Coexpression Enrichment) algorithm from DCGL R package (Liu et al., 2010) to make out DCGs of P. falciparum from all normalized gene expression dataset based on dramatical changes of differentially co-expressed links between DCGs and their neighbors as our recent work (Yu et al., 2011). And DCGs were measured among two different close-by development stages of P. falciparum with an FDR-adjusted P-value (q-value) of PCC (Pearson Correlation Coefficient) using the Benjamini–Hochberg method. The significant statistic DCGs were selected with the q-value threshold of 0.0001in this study. 3. Result 3.1. A knowledge guided genome scale comprehensive gene co-expression network with PPI interaction and TF regulation and module identification Microarray gene expression normalized data of P. falciparum 3D7 in this study were downloaded from DeRisi laboratory (Llinas et al., 2006) and regulated by using PlasmoDB (Aurrecoechea et al., 2009) corresponding data. This dataset of gene expression covered 53 time-points including different four IDC (Intraerythrocytic Development Cycle) stages. A knowledge base was built using the information of protein–protein interactions and transcription regulation, in which the protein–protein interaction (PPI) information of P. falciparum was collected from publication (LaCount et al., 2005) of yeast-two-hybrid (Y2H) experiment screening data and from STRING database (Szklarczyk et al.) of predicted PPIs data, while the transcription regulation data of transcription factors (ApiAp2 family in P. falciparum) driven data were captured from publication (Berger et al., 2006; Levy & Hannenhalli, 2002) and those from PlasmoDB database (Aurrecoechea et al., 2009) to adjust these regulation data. The co-expression relationship between genes was measured by Pearson Correlation Coefficients (PCCs) in this study. And these filtered co-expression gene pairs above the cutoff selection could be built to a gene co-expression network. A knowledge-guided method of PPI interaction and ApiAp2 transcription factor driven regulation was employed to turn this network into a comprehensive gene co-expression network, in which each node was a gene and two genes were connected by an edge representing co-expression relationship. Our constructed comprehensive gene co-expression network had 2547 nodes and 5538 edges while each edge restrictively represented one co-expression gene pair with PPI interaction or

ApiAp2-driven regulation above the selection cutoff (see Materials and methods section). As previously reported, biological networks, including gene co-expression networks, usually share some hierarchical scale-free network characteristics of the power law degree distribution and high clustering coefficient (Stuart et al., 2003; van Noort et al., 2004). Therefore, we could further examine these topological characteristics of our constructed network with some methods to measure the scale-free networks. In biological networks, modules usually represent functional pathways and molecular complexes as previous study (Lilburn et al., 2011; Shi et al., 2010; Spirin & Mirny, 2003; Wuchty et al., 2009; Zheng et al., 2011). So it's necessary to identify functional modules from our constructed gene co-expression network to study this kind of module or subnetwork functions of P. falciparum. Considering the characteristic of gene co-expression networks is usually in keeping with a scale-free distribution, it's applicable to identify modules or subnetworks from these co-expression networks using clique-based algorithms (Voy et al., 2006). Although we also tried to employ MCL (Markov cluster algorithm) to recognize modules from our constructed network, the identified modules were almost not separated completely. Some of these modules share genes, and some hub genes connected different modules, and therefore it was not easy for Cytoscape package to visualize the results and to narrow down the study. We applied the ICE (Iterative Clique Enumeration) algorithm (Shi et al., 2010) to propose module detection with the threshold of 5 cliques. Using the ICE algorithm, we obtained 10 core modules with 85 nodes (genes) and 852 edges (co-expression relation between two genes) from our constructed network (see Fig. 2, the topology of 10 core modules). The genes in each module could be considered as high proportion of co-expression with protein–protein interaction or transcriptional regulation, which could be disposed to explore relatively independent biological processes and transcription regulation programs as working together in P. falciparum. In this study hub genes among modules were defined to occupy 2 and more than 2 modules. Hub genes are highly connected nodes in given network, which often act as essential genes in PPI network as previously reported (Lilburn et al., 2011; Yu et al., 2004). 3.2. The representative network verification For systematic analysis of gene biological networks, it's important to verify the constructed networks or validate the gene function in networks. In general, representative network verification could be

Fig. 2. Modules identified by the ICE (Iterative Clique Enumeration) algorithm. 10 modules identified by ICE algorithm with GO enrichment annotation. We selected the most significant GO enrichment term as the module main function, although each module had more diversity GO term functions above the selection threshold (P-valueb 0.0001). In each module, each node represented one gene, and each edge represented one gene pair highly co-expressed with protein–protein interaction or transcription regulation. Module_x represents the module “x”, like Module_1 as Module 1 and Module_1_4 represents the hub genes in module 1 and module 4.

F.-D. Yu et al. / Gene 518 (2013) 7–16

approached from two possible perspectives using bioinformatics or computational analysis technique like previous research (Zheng et al., 2011). One is to verify the robustness of the measurement method of network construction and the network searching strategy. The other one is to validate the robustness of the gene expression patterns in network or modules (or subnetworks). Considering the characteristic of IDC stages in P. falciparum, we selected the other IDC stage transcriptome dataset generated by RNA-Seq sequencing technology (Otto et al., 2010) to verify our comprehensive gene co-expression network. In order to verify the robustness of the method network construction and the network searching strategy, this RNA-Seq dataset was used to construct comprehensive gene co-expression network and identify modules using identical procedures. We obtained a comprehensive gene co-expression network with 1418 nodes and 2465 edges, and 35.85% genes (913 nodes) were found in the original network. After modularization using the ICE algorithm, 9 modules with 69 nodes and 412 edges were generated from this verification network, in which there were 52.94% genes (45 nodes) were found in those original modules. In order to verify the gene expression patterns in modules in this study, we mapped the verification of 9 modules into those origin 10 modules. There were 7 modules (with 70%) corresponding to 8 original modules (see Supplemental Table 1). For module expression pattern comparison, we used average expression of all genes in a module to represent the module expression. The expression patterns of 7 verification modules were similar to the expression patterns of 8 original modules (see Supplemental Figs. 2A and B). From these representative verification comparison results, the consistency suggested the robustness and reproducibility of comprehensive gene co-expression network and identified modules.

3.3. Function analysis of conserved modules in P. falciparum gene co-expression network In our generated 10 modules, we found the module 1 and the module 4 with 18 hub genes (connecting 2 modules), which indicated that these two modules (module 1 and module 4) could propose similar function or compose of similar protein complex. This result indicated that it was necessary to detect the potential function of each module. In order to evaluate the possible functions of these identified modules, we performed function enrichment analysis with GO (Gene Ontology) biological process categories in each module. Among these 10 modules, all of them had at least one of the GO categories with 0.0001 P-value threshold, and the most significant GO function was selected to be the module function. In the GO annotation result, these modules carried out a diverse variety of biological processes including translation, ribosome biogenesis, protein catabolic process, mRNA processing, ubiquitin-dependent protein catabolic process, DNA replication, RNA splicing via transesterification reactions and regulation of transcription, and we selected the most significant GO enrichment result term as the module main function (see Fig. 2, about module topology and modules function). The interesting GO enrichment result of module 7 suggested that the genes of this module could associate with adhesion to host process by probably using some genes such as PFC0285c, PFC0350c, PFF0430w, or some other genes in this module (see Table 1). The topology result of modules showed that the module 1 and the module 4 could perform similar biological process like translation with the same GO term (GO:0006412, translation process) while some genes as hub gene represented these two modules (see Fig. 2 and Table 1). These hub genes (connecting 2 modules) among modules could connect different modules which might work together for some important biological processes. These GO enrichment analysis results indicated that each module identified by the ICE algorithm could fulfill highly functional homogeneity and relatively independent biological processes, which were mainly essential biological processes for development of P. falciparum.

11

Furthermore, we had done module pathway enrichment analysis with KEGG pathway to confirm the potential module function. In the result of module pathway enrichment, 8 modules out of 10 were enriched to at least one metabolic pathway including ribosome, ribosome biogenesis in eukaryotes, proteasome, spliceosome and DNA replication (see Table 2) with P-value less than 10 −6. While the pathway enrichment result about module 1 and module 4 was of the same annotation with ribosome metabolic pathway annotation (KEGG pathway ID: 03010), it ensured that these two modules function corresponding to their GO enrichment result. The module function enrichment results with GO and KEGG pathway presented that core modules identified from comprehensive gene co-expression network could perform cooperatively essential biological processes for parasite development in IDC stages, such as transcription, translation, DNA replication, and associated transcriptional regulation process. In addition, in our generated modules with enriched function annotation there were some genes in each module with uncharacteristic annotation or without GO categories or without KEGG pathway annotation (see Supplement Table 2), and these poorly annotated genes could carry out the function same as module function, which could be annotated at module level. And among these unannotated genes, some of them might be potential species-specific genes without orthologous genes in other species. Therefore, it's necessary to identify those potential species-specific genes and detect their possible function in our constructed P. falciparum network. 3.4. Species-specific genes in modules identified by the ICE algorithm The species-specific genes in eukaryotic parasites could mainly perform unique, species-specific biological processes, and are often used to be potential targets for anti-parasites (Bozdech, 2004; Singh et al., 2006). For drug development, some essential genes conserved across species are often candidates for broad-spectrum drug targets while those specific genes for one species are candidates for speciesspecific ones. Thus, it's necessary to detect these species-specific genes in P. falciparum for their functional study. One interesting approach to detect these genes is using comparative genomics of different species genome sequences, which could offer the insights into drug discovery. In order to identify these species-specific genes or parasites-specific genes (which were specific in parasite species), we developed a web server (ISSPG: Identification of Species-Specific eukaryotic Parasite Genes) based on sequence similarity (http://www.scbit.org/isspg/ upload/homepage.action). By using this web-server, we could get access to the potential specific genes, such as species-specific genes of P. falciparum, genus-specific genes in Plasmodium genus, and parasite-specific genes of eukaryotic parasite species (see Supplement Fig. 3). And these specific genes were important for drug development. Based on the species-specific gene predicted result and 10 modules identified by ICE algorithm, we found that four out of five genes in module 10 were genus-specific genes, four genes of PFD0885c, SIP2 (PFF0200c), MAL8P1.153 (one of ApiAP2 TFs) and MSP9 (PFL1385c) were genus-specific gene in Plasmodium (see Supplement Table 3). This result indicated that the genes of module 10 could work together specifically in Plasmodium parasites for transcription regulation process. Relying on the potential function of module 10 of DNA-dependent regulation of transcription, PFD0885c gene could also be inferred to perform similar function of transcriptional regulation. 3.5. Differentialy co-expressed genes among recognized 10 modules In each module, all genes were co-expressed with each other. And among co-expressed genes, some of them are often differentially co-expressed with other genes (named as their linkers) in different development stages. Identification of these differentially co-expressed genes (DCGs) could increase the investigation of the global transcriptional mechanisms or pathways. We used the DCe (Differential

12

F.-D. Yu et al. / Gene 518 (2013) 7–16

Table 1 All GO (Gene Ontology) annotation above the threshold for each module identified by ICE algorithm. Modules GO_ID

GO_term

1

GO:0006412 Translation

1 1

GO:0006414 Translational elongation GO:0019856 Pyrimidine base biosynthetic process GO:0022900 Electron transport chain GO:0006396 RNA processing GO:0042254 Ribosome biogenesis GO:0042255 Ribosome assembly GO:0042262 DNA protection GO:0006508 Proteolysis GO:0006510 ATP-dependent proteolysis GO:0006511 Ubiquitin-dependent protein catabolic process GO:0030163 Protein catabolic process GO:0006412 Translation

1 2 2 2 2 3 3 3 3 4

4 4 4 5 5 6 6

6 7 7 7 7 7 7 8 8 9

9 9 10

GO:0006414 Translational elongation GO:0019856 Pyrimidine base biosynthetic process GO:0022900 Electron transport chain GO:0006397 mRNA processing GO:0008380 RNA splicing GO:0006511 Ubiquitin-dependent protein catabolic process GO:0051603 Proteolysis involved in cellular protein catabolic process GO:0051604 Protein maturation GO:0006457 Protein folding GO:0007010 Cytoskeleton organization GO:0007015 Actin filament organization GO:0044267 Cellular protein metabolic process GO:0044406 Adhesion to host GO:0051083 ‘De novo’ cotranslational protein folding GO:0006260 DNA replication GO:0006270 DNA-dependent DNA replication initiation GO:0000375 RNA splicing, via transesterification reactions GO:0000398 Nuclear mRNA splicing, via spliceosome GO:0008380 RNA splicing GO:0006355 Regulation of transcription, DNA-dependent

P-value

Genes_enriched_module

Gene_number_module

MAL13P1.209 PF07_0043 PF07_0080 PF07_0088 PF08_0076 PF10_0038 PF10_0043 PF10_0264 PF10_0272 PF11_0065 PF11_0272 PF11_0313 PF11_0438 PF13_0045 PF13_0129 PF13_0132 PF13_0268 PF14_0231 PF14_0240 PF14_0391 PFC0775w PFD0770c PFD1055w PFE0185c PFE0845c PFE0850c PFE1005w PFF0885w PFI0190w 0.000610262 PF11_0245 PF11_0313 1.45E−05 PF13_0132 PFE1005w

30

4.34E−10 0.000115532 2.13E−05 2.13E−05 2.13E−05 8.06E−06 8.06E−06

PF07_0088 PF08_0055 PF08_0041 PF08_0041 PF08_0041 PF10_0081 PF10_0081

30 11 11 11 11 7 7

6.14E−07

PF13_0063 PF14_0632 PFL2345c

0

PF08_0076 PF10_0038 PF10_0264 PF13_0045 PFC0775w PF14_0174 PF11_0090 PF11_0090 PF11_0090 PF11_0303 PF11_0314 PF11_0303 PF11_0314

2.97E−12 0

30 30

7

PF10_0081 PF11_0314 PF13_0063 PFL2345c MAL13P1.209 PF07_0043 PF07_0080 PF10_0043 PF10_0264 PF11_0065 PF13_0045 PF13_0049 PF13_0129 PF13_0132 PF14_0230 PF14_0231 PF14_0240 PF14_0391 PF14_0448 PFC0775w PFD0770c PFE0350c PFE0845c PFE0850c PFE1005w PFF0885w 0.000310779 PF14_0486 PFC0400w 7.27E−06 PF13_0132 PFE1005w

7 24

3.77E−07 8.17E−06 5.85E−06 0

PF10_0264 PF14_0587 PF14_0194 PF07_0112

PF13_0045 PF14_0448 PFC0775w PFE0865c PFE0865c MAL8P1.128 PF13_0156 PFE0915c PFF0420c

24 6 6 5

0

PF07_0112 MAL8P1.128 PF13_0156 PFE0915c PFF0420c

5

6.29E−14 8.42E−10 2.03E−07

PF07_0112 PF13_0156 PFE0915c PFF0420c PF11_0331 PFC0285c PFC0350c PFF0430w PF11_0331 PFL2215w

5 5 5

2.03E−07

PF11_0331 PFL2215w

5

1.11E−15

PF11_0331 PFC0285c PFC0350c PFF0430w

5

1.43E−11 4.24E−10

PFC0285c PFC0350c PFF0430w PF11_0331 PFC0285c PFC0350c PFF0430w

5 5

5.81E−08 3.04E−07

PF07_0023 PF14_0177 PF14_0352 PF07_0023 PF14_0177

5 5

3.89E−09

PF10_0041 PF10_0217 PFD0265w

5

3.04E−07

PF10_0041 PFD0265w

5

2.94E−06 7.36E−05

PF10_0217 PFD0265w MAL8P1.153 PFF0200c

5 5

24 24

P-value represents the enrichment P-value; Gene_number represents the gene number of module.

Co-expression enrichment) algorithm from DCGL R package (Liu et al., 2010) to identify those DCGs in different developmental stages of P. falciparum 3D7. In this study, we divided the developmental IDC stages of P. falciparum into four stages of ring, trophozoite, schizont and merozoite from a time of 1 h to 53 h covering one IDC cycle based on previous literature (Bozdech et al., 2003). At the whole genome level, the potential DCGs between two close-by stages of above four IDC stages were detected above the threshold (see Fig. 3). Among the above 10 modules, we found that prpf8 (PFD0265w) gene in the module 9 was a significant DCG among three comparisons of ring and trophozoite, trophozoite and schizont, schizont and merozoite. This gene with annotation of putative pre-mRNA-processing splicing

factor 8 could carry out RNA splicing biological process, which corresponds to the modular annotation of GO term (GO:0000375 and GO:0008380) and spliceosome KEGG pathway (KEGG id: 03040) further confirm this potential gene function. Furthermore, malaria parasite metabolic pathway information (Ginsburg) could verify this gene to perform important biological function in the splicing of pre-mRNA pathway, while the expression pattern of prpf8 gene (see Supplement Fig. 4) in IDC stages showed that this gene was important for the P. falciparum development especially for the stages late-ring and trophozoite which could be involve in DNA replication, transcription and translation biological processes. In addition, this prpf8 gene was conserved from protozoa to human species in accordance with that of

F.-D. Yu et al. / Gene 518 (2013) 7–16

13

Table 2 Modules function enriched by KEGG pathway information. Module pathway_id pathway_name P-value

geneNum_module geneName_module expGeneNum_withpathway wholeGeneNum_withPathway geneNum_module

1

3010

Ribosome

0

30

2

3008

Ribosome biogenesis in eukaryotes

1.49E− 11

11

3

3050

Proteasome

0

7

4

3010

Ribosome

0

24

5

3040

Spliceosome

4.99E− 09

6

6

3050

Proteasome

0

5

8

3030

DNA replication

1.10E− 08

5

9

3040

Spliceosome

2.86E− 07

5

MAL13P1.209 PF07_0043 PF07_0080 PF07_0088 PF08_0076 PF10_0038 PF10_0043 PF10_0264 PF10_0272 PF11_0065 PF11_0272 PF11_0313 PF11_0438 PF13_0045 PF13_0129 PF13_0132 PF13_0268 PF14_0231 PF14_0240 PF14_0391 PFC0775w PFD1055w PFE0185c PFE0845c PFE0850c PFE1005w PFI0190w PF08_0041 PF08_0055 PF11_0191 PF14_0174 PFF0625w PF08_0109 PF10_0081 PF11_0303 PF11_0314 PF13_0063 PF14_0632 PFL2345c MAL13P1.209 PF07_0043 PF07_0080 PF10_0043 PF10_0264 PF11_0065 PF13_0045 PF13_0049 PF13_0129 PF13_0132 PF14_0230 PF14_0231 PF14_0240 PF14_0391 PF14_0448 PFC0400w PFC0775w PFE0350c PFE0845c PFE0850c PFE1005w PF11_0250 PF11_0108 PF14_0194 PF14_0587 PF07_0112 MAL8P1.128 PF13_0156 PFE0915c PFF0420c PF07_0023 PF14_0177 PFD0420c PF10_0041 PFC0375c PFD0265w

27

50

30

5

17

11

7

19

7

21

50

24

4

41

6

5

19

5

3

19

5

3

41

5

14

F.-D. Yu et al. / Gene 518 (2013) 7–16

R1_T2

T2_S3

348 589

257

380

523

505

617

S3_M4 Fig. 3. The statistic result of DCGs (differentially co-expressed genes) in all Plasmodium falciparum genes among two different close-by IDC stages. R1_T2 represents the two adjacent IDC stages of Ring and Trophozoite; T2_S3 represents the two adjacent stages of trophozoite and schizont; and S3_M4 represents the two adjacent stages of schizont and merozoites. The number in each circle means the DCG number in the relative adjacent IDC stages. Number in circles represents the gene number of DCGs in relative stage.

the HomoGene database (Sayers et al., 2013), which further shed insights that this conserved gene was essential and important in splicing biological process. According to the module function and prpf8 gene function, all the genes in the module 9 were more likely to work together to be involved in the splicing of pre-mRNA pathway. Therefore, the potential functions of DCGs detected in the two close-by stages of IDC development should coincide with the modules' function. During the ring and early trophozoite stages, P. falciparum genes were led to associate with the transcriptional and translational machinery, glycolysis and ribonucleotide biosynthesis (Bozdech et al., 2003), and the detected DCGs between these two stages were differentially co-expressed for similar functions, which were also enriched with their related modules' function like the modules of 1, 4, 7, and 9. Following stepping into the schizont stage, the malaria parasite cells will prepare for reinvasion of host new red blood cells (RBCs) by replicating and starting for those gene expressions by regulation process. Some of these genes were co-expressed differentially accounting for this process, which ranged over seven out of ten modules. During the late schizont and merozoite stages, the parasite cells step into schizogenesis process after replication preparation and invasion of new host erythrocyte cells, and merozoite member proteins were stage-specifically expressed for invasion process as previously reported (Bozdech et al., 2003; Llinas et al., 2006). However, little is known about what genes were involved in this process to regulate the expression of these stage-specific parasite genes (Bozdech et al., 2003). Our analysis result of core functional modules identified from our constructed comprehensive gene co-expression network and some DCGs detected in these stages among core modules suggested that genes, such as PF10_0272, PF14_0486, PFC0805w and prpf8 (PFD0265w) among the modules of 1, 4 and 9, shed some important clues of this gene regulation for invasion process.

3.6. P. falciparum invasion subnetwork As we all know, it's crucial for malaria parasites' survival and cell cycle to invade the host erythrocyte cell. Considering this reason, the malaria parasites' invasion process was made to account for much of the anti-malaria drug and vaccine development to prevent Plasmodium invasion. It's worth doing analysis on the P. falciparum invasion pathway in our constructed gene co-expression network. As for the above modularization result by the ICE algorithms, we did not found the core modules which were involved directly in the invasion pathway in the whole IDC developmental stages of P. falciparum. Thus in this study we performed invasion pathway analysis in order to uncover potential mechanisms of those genes involved in the invasion process in gene co-expression network by software GenRev using the Steiner Tree algorithms (Zheng & Zhao, 2012). Based on this invasion pathway information of malaria metabolic pathway database (Ginsburg), we collected manually 71 genes which were involved in this biological process, and searched invasion subnetwork in our constructed gene co-expression network with these invasion genes as input seed by the Steiner Tree algorithms. The invasion subnetwork was generated by the Steiner Tree algorithms (see Fig. 4), in which we could figure out some genes (like some transcription factor genes in this result) which were not found to be involved in the invasion process by using co-expression relationship. Each edge in this subnetwork represented co-expressed gene pair with protein–protein interaction or transcriptional regulation. In this study hub genes in subnetwork were defined with more than 10 linkers (or edges). We found that 6 genes in this subnetwork were hub genes which were highly connected to more than 10 nodes although these 6 genes were ApiAP2 family transcription factors. In order to study those genes which were not found in the invasion pathway while in our invasion subnetwork, we selected RH2a and RH2b gene pair as an example for description. As we all know, erythrocyte invasion of Apicomplexan parasites into host cells is involving multiple ligands known as micronems and rhoptries. For P. falciparum, there are two major protein families, named the Duffy binding-like family (DBL) and the Reticulocyte binding homologues (RH), that mediate erythrocyte invasion by binding to specific receptors of host red blood cell. RH2a (PF13_0198) and RH2b (MAL13P1.176) were identified as the first two members of the PfRH family (Rayner et al., 2000; Triglia et al., 2001). In chromosome location, these two genes were adjacently located genes on chromosome 13 as a head-to-head (H2H) gene pair with different orientation (Duraisingh et al., 2003; Dvorin et al., 2010; Rayner et al., 2000). And these two genes were proved as key adhesive molecules to invade host erythrocytes (Sahar et al., 2011; Triglia et al., 2011). Immune experiment indicated that RH2b N-terminal region played the key role for erythrocyte invasion by binding to the erythrocyte cell (Triglia et al., 2011). But transcription regulation for these two H2H genes was little reported in previous researches. Based on transcription factor (TF) binding site information, ApiAP2 family TFs could bind to the promoter region of these two H2H genes. From our invasion subnetwork analysis result, we found that this two H2H gene pair could share the two TF regulation including ApiAP2 family TFs of AP2-O (PF11_0442) and PFD0985w, while RH2a was also regulated by MAL8P1.153 (see Supplement Fig. 5). The gene expression profile of these two H2H genes and their potential transcriptional factor further confirmed the regulation of RH2a/2b gene pair (data not shown). As for previous research, malaria parasites invade host cells through a moving junction (MJ) complex on the parasite–host cell interface, and the MJ complex assembly is triggered by parasite rhoptry neck proteins (RONs) and their receptor. During this process, AMA1 (PF11_0344) as a receptor of P. falciparum RON2 (PF14_0495) worked together for initiating MJ complex assembly for erythrocyte invasion (Tonkin et al., 2011). Antibodies for binding of RON2 and AMA1 experiment revealed that RON2 initiated the junction formation while AMA1 further induced parasitophorous vacuole formation

F.-D. Yu et al. / Gene 518 (2013) 7–16

15

SERA4 RH3 RON6 P12

PF13_0097

RALP1

PF13_0161

DBLMSP MAL8P1.153

RH2a

MSP3

Pf34

Pf332

H101

MSP4

MSP10 AP2-O

RAP2 Pf92 EBA140

SERA3

EBA165 PFD0985w GLURP

SUB2

RH2b

P38

EBA175

P41

AARP

RH1 MTRAP

ASP PF14_0675

MSP2

AMA1 CLAG2

MSP9 MSRP3

SIP2

MSP6

RAMA

MAL7P1.170

RAP1 MCM2 PFF0670w

RhopH2 SERA5

CLAG3.2 PF10_0343

MSP7

RhopH3

CLAG3.1

PF14_0231 SERA6 PF11_0404 MSRP4 Pf113

MSP3.8

CLAG8

RON4 MSP1

PFE0840c

RON2

CLAG9

PF10_0324

RH5 RAP3

PF11_0277 PTRAMP

EBA181

Fig. 4. The topology of invasion subnetwork identified by the Steiner Tree algorithm. The triangle represents transcription factor. Green color represents the genes involving in invasion process were not reported in previous researches.

(Srinivasan et al., 2011). In our invasion subnetwork result, RON2 genes could form a protein complex with RON4, MSP1 and PF10_0324 for MJ complex formation. Furthermore, gene expression profile suggested that AMA1 gene was regulated by PF11_0404 positive regulation and PFD0985w negative regulation (see Supplement Fig. 6) while RON2 was regulated by PFE0840c. Taken together, our invasion subnetwork analysis of P. falciparum could indicate that the core members involved in this invasion process cooperated to invade the host erythrocytes for malaria parasite growth and development, including how to potentially regulate and interact with each other. 4. Conclusion and discussion The network biology approach analysis based on knowledge base could show some characteristics at topology level which could not be detected at other low dimensional levels, although there were some limitations, including missing data, noisy data, no more than ApiAp2 gene family transcription regulation data, and some others. A more informational integration of expression data with different dimension data like protein interaction and regulation data could help to reduce this kind of limitations. Despite these limitations, our comprehensive co-expression gene network analysis could produce known proved associations responsible for P. falciparum development in IDC stages. Besides, we also found some previously unrecognized results in the biology of this parasite, such as those genes in functional core modules with highly co-expressed, species-specific module like the module 10. On the other hand, our analysis result could functionally annotate putative annotation genes.

By using the clique-based ICE (Iterative Clique Enumeration) algorithm, relatively independent maximal cliques as modules could identify those core modules with essential functions for the development of P. falciparum in IDC stages. In our identified 10 modules enriched to carry out necessary biological functions, such as transcription, translation, regulation and metabolic processes. Some of the genes in the modules might also help to be involved in malaria parasite invasion to host erythrocyte process. And even some of the genes in these modules were differentially co-expressed during P. falciparum development in IDC stages. Among these DCGs (differentially co-expressed genes) by using DCGL R package, the gene of prpf8 (PFD0265w) encoded pre-mRNA processing splicing factor 8 was identified to co-express differentially in IDC stages. Incorporating speciesspecific gene analysis, we found some Plasmodium genus-specific genes in some of the above 10 modules to carry out species-specific functions with their similar module functions, although there were a little functional annotations or research reports. Considering the importance of invasion to erythrocyte process and those genes identified in the above modules involving this invasion pathway, we further employed the Steiner Tree algorithm to recognize the invasion subnetwork from our constructed comprehensive gene co-expression network. The subnetwork result implied that those genes involved in this process could corporate with each other or co-regulate among IDC stages. Some genes in this subnetwork were potential drug targets or vaccines because of their essential functions in invasion process. Taken together, our study could provide new insights into the mechanisms underlying pathogenesis, even virulence and parasite development.

16

F.-D. Yu et al. / Gene 518 (2013) 7–16

Competing interests The authors declare that they have no competing interests. Authors' contributions FDY contributed to the design and conception of this study, conducted computational experiments, performed and interpreted data and drafted the manuscript. SYY and FDY designed and developed the web ISSPG server. WH and YYL conceived this project and participated in its design, helped to analyze and interpret the data and drafted the manuscript. All authors contributed to and approved the final manuscript. Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.gene.2012.11.092. Acknowledgments We would like to thank Mr. Chao Chen and Dr. Xu-Feng Zhang from Shanghai Center for Bioinformation Technology for web development suggestion. And we thank for Dr. Qi Liu from Shanghai Jiao Tong University for web-server naming and web development. This work was supported by grants from the National Natural Science Foundation of China (30900834, 31171268) and the Shanghai Postdoctoral Sustentation Fund, China (09R21411100). References Ashburner, M., et al., 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25 (1), 25–29. Aurrecoechea, C., et al., 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37 (Database issue), D539–D543. Balaji, S., et al., 2005. Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains. Nucleic Acids Res. 33 (13), 3994–4006. Berger, M.F., et al., 2006. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24 (11), 1429–1435. Bozdech, Z., 2004. Molecular approaches to malaria: on the way to integration. Genome Biol. 5 (4), 319. Bozdech, Z., et al., 2003. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 1 (1), E5. Cowman, A.F., Crabb, B.S., 2006. Invasion of red blood cells by malaria parasites. Cell 124 (4), 755–766. Crompton, P.D., Pierce, S.K., Miller, L.H., 2010. Advances and challenges in malaria vaccine development. J. Clin. Invest. 120 (12), 4168–4178. De Silva, E.K., et al., 2008. Specific DNA-binding by apicomplexan AP2 transcription factors. Proc. Natl. Acad. Sci. U. S. A. 105 (24), 8393–8398. Duraisingh, M.T., et al., 2003. Phenotypic variation of Plasmodium falciparum merozoite proteins directs receptor targeting for invasion of human erythrocytes. EMBO J. 22 (5), 1047–1057. Dvorin, J.D., et al., 2010. Functional diversification between two related Plasmodium falciparum merozoite invasion ligands is determined by changes in the cytoplasmic domain. Mol. Microbiol. 75 (4), 990–1006. Fichera, M.E., Roos, D.S., 1997. A plastid organelle as a drug target in apicomplexan parasites. Nature 390 (6658), 407–409. Gardner, M.J., et al., 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419 (6906), 498–511. Ginsburg, H., a. Malaria Parasite Metabolic Pathwayshttp://priweb.cc.huji.ac.il/malaria/. Jurgelenaite, R., et al., 2009. Gene regulation in the intraerythrocytic cycle of Plasmodium falciparum. Bioinformatics 25 (12), 1484–1491.

Kanehisa, M., Goto, S., 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28 (1), 27–30. LaCount, D.J., et al., 2005. A protein interaction network of the malaria parasite Plasmodium falciparum. Nature 438 (7064), 103–107. Le Roch, K.G., et al., 2003. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301 (5639), 1503–1508. Levy, S., Hannenhalli, S., 2002. Identification of transcription factor binding sites in the human genome sequence. Mamm. Genome 13 (9), 510–514. Li, L., Stoeckert Jr., C.J., Roos, D.S., 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13 (9), 2178–2189. Lilburn, T.G., et al., 2011. Protease-associated cellular networks in malaria parasite Plasmodium falciparum. BMC Genomics 12 (Suppl. 5), S9. Liu, B.H., et al., 2010. DCGL: an R package for identifying differentially coexpressed genes and links from gene expression microarray data. Bioinformatics 26 (20), 2637–2638. Llinas, M., et al., 2006. Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains. Nucleic Acids Res. 34 (4), 1166–1173. Otto, T.D., et al., 2010. New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq. Mol. Microbiol. 76 (1), 12–24. Rayner, J.C., et al., 2000. Two Plasmodium falciparum genes express merozoite proteins that are related to Plasmodium vivax and Plasmodium yoelii adhesive proteins involved in host cell selection and invasion. Proc. Natl. Acad. Sci. U. S. A. 97 (17), 9648–9653. Sahar, T., et al., 2011. Plasmodium falciparum reticulocyte binding-like homologue protein 2 (PfRH2) is a key adhesive molecule involved in erythrocyte invasion. PLoS One 6 (2), e17102. Sayers, E.W., et al., 2013. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39 (2), D38–D51. Shi, Z., Derow, C.K., Zhang, B., 2010. Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression. BMC Syst. Biol. 4, 74. Sims, J.S., et al., 2009. Patterns of gene-specific and total transcriptional activity during the Plasmodium falciparum intraerythrocytic developmental cycle. Eukaryot. Cell 8 (3), 327–338. Singh, S., Malik, B.K., Sharma, D.K., 2006. Molecular drug targets and structure based drug design: a holistic approach. Bioinformation 1 (8), 314–320. Smoot, M.E., et al., 2011. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27 (3), 431–432. Spirin, V., Mirny, L.A., 2003. Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. U. S. A. 100 (21), 12123–12128. Srinivasan, P., et al., 2011. Binding of Plasmodium merozoite proteins RON2 and AMA1 triggers commitment to invasion. Proc. Natl. Acad. Sci. U. S. A. 108 (32), 13275–13280. Stuart, J.M., et al., 2003. A gene-coexpression network for global discovery of conserved genetic modules. Science 302 (5643), 249–255. Szklarczyk, D., et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 39 (Database issue), D561–8. Tonkin, M.L., et al., 2011. Host cell invasion by apicomplexan parasites: insights from the co-structure of AMA1 with a RON2 peptide. Science 333 (6041), 463–467. Triglia, T., et al., 2001. Identification of proteins from Plasmodium falciparum that are homologous to reticulocyte binding proteins in Plasmodium vivax. Infect. Immun. 69 (2), 1084–1092. Triglia, T., et al., 2011. Plasmodium falciparum merozoite invasion is inhibited by antibodies that target the PfRh2a and b binding domains. PLoS Pathog 7 (6), e1002075. van Noort, V., Snel, B., Huynen, M.A., 2004. The yeast coexpression network has a smallworld, scale-free architecture and can be explained by a simple model. EMBO Rep. 5 (3), 280–284. Voy, B.H., et al., 2006. Extracting gene networks for low-dose radiation using graph theoretical algorithms. PLoS Comput. Biol. 2 (7), e89. Wuchty, S., Adams, J.H., Ferdig, M.T., 2009. A comprehensive Plasmodium falciparum protein interaction map reveals a distinct architecture of a core interactome. Proteomics 9 (7), 1841–1849. Yu, H., et al., 2004. Genomic analysis of essentiality within protein networks. Trends Genet. 20 (6), 227–231. Yu, H., et al., 2011. Link-based quantitative methods to identify differentially coexpressed genes and gene pairs. BMC Bioinformatics 12, 315. Zheng, S., Zhao, Z., 2012. GenRev: exploring functional relevance of genes in molecular networks. Genomics 99 (3), 183–188. Zheng, S., et al., 2011. Integrative network analysis identifies key genes and pathways in the progression of hepatitis C virus induced hepatocellular carcinoma. BMC Med. Genomics 4, 62.