Transcriptome analysis of the adult rumen fluke Paramphistomum cervi following next generation sequencing


GENE-40572; No. of pages: 7; 4C: Gene xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Research paper

Transcriptome analysis of the adult rumen fluke Paramphistomum cervi following next generation sequencing

Vijayata Choudhary a,1, Sweta Garg a,1, Reetika Chourasia a,1, J.J. Hasnani a, P.V. Patel a, Tejas M. Shah b,1, Vaibhav D. Bhatt b, Amitbikram Mohapatra b, Damer P. Blake c, Chaitanya G. Joshi b,⁎

a Department of Veterinary Parasitology, College of Veterinary Science & Animal Husbandry, AAU, Anand 388 001, Gujarat, India
b Department of Animal Biotechnology, College of Veterinary Science & Animal Husbandry, AAU, Anand 388 001, Gujarat, India
c Pathology and Pathogen Biology, Royal Veterinary College, Hawkshead Lane, North Mymms, Hertfordshire AL9 7TA, UK

Article info

Article history:
Received 30 December 2014
Received in revised form 21 April 2015
Accepted 1 June 2015
Available online xxxx

Keywords: Paramphistomum cervi; Paramphistomosis; Transcriptome; Cathepsin; Next generation sequencing

Abstract

Rumen flukes are parasitic trematodes (Platyhelminthes: Digenea) of major socioeconomic importance in many countries. Key representatives, such as Paramphistomum cervi, can cause “Rumen fluke disease” or paramphistomosis and undermine economic animal productivity and welfare. P. cervi is primarily a problem in sheep, goat and buffalo production as a consequence of reduced weight gain and milk production, clinical disease or death. Recent technological advances in genomics and bioinformatics now provide unique opportunities for the identification and pre-validation of drug targets and vaccines through improved understanding of the biology of pathogens such as P. cervi and their relationship with their hosts at the molecular level. Here, we report next generation transcriptome sequencing analysis for P. cervi. RNAseq libraries were generated from RNA extracted from 15 adult P. cervi parasites sampled from each of three different host species (sheep, goat and buffalo) and a reference transcriptome was generated by assembly of all Ion Torrent PGM sequencing data. Raw reads (7,433,721 in total) were initially filtered for host nucleotide contamination and ribosomal RNAs and the remaining reads were assembled into 43,753 high confidence transcript contigs. In excess of 50% of the assembled transcripts were annotated with domain- or protein sequence similarity derived functional information. The reference adult P. cervi transcriptome will serve as a basis for future work on the biology of this important parasite. Using the widely investigated trematode virulence factor and vaccine candidate Cathepsin L as an example, the epitope GPISIAINA was found to be conserved in P. cervi isolated from three different host species supporting its candidacy for vaccine development and illustrating the utility of the adult P. cervi transcriptome.

© 2015 Elsevier B.V. All rights reserved.

Abbreviations: RSEM, RNAseq by Expectation Maximization; GO, Gene ontology; EC, Enzyme commission numbers; KAAS, KEGG Automatic Annotation Server; BBH, Bidirectional best hit method; ANN, Artificial Neural Network; SVM, Support Vector Machine; SRA, Short Read Archive; NCBI, National Center for Biotechnology Information; KEGG, Kyoto Encyclopedia of Genes and Genomes.
⁎ Corresponding author. E-mail address: [email protected] (C.G. Joshi).
1 Authors have made equal contributions.

1. Introduction

The rumen fluke, Paramphistomum cervi, is one of the most common causes of paramphistomosis. The parasite can affect a wide variety of livestock species across a broad geographical range including subtropical and tropical regions. Completion of the P. cervi lifecycle requires two hosts, featuring snail intermediate and mammalian (usually ruminant) definitive hosts. Infection of the definitive host is initiated by ingestion of encysted metacercariae attached to vegetation or floating in water (Shank and Russell, 1976). The subsequent pathology is predominantly associated with the immature fluke as it develops in the small intestine (duodenum). In contrast adult fluke are primarily located in the rumen (Dalton and Pole, 1978) where they are largely considered to be commensal, although severe mucosal damage can be provoked by heavy infection (Rolfe et al., 1994). Heavy parasite burdens commonly compromise livestock production through reduced feed conversion efficiency, loss of weight and decreased milk yield, incurring economic losses with elevated morbidity and mortality and compromised welfare (Kilani et al., 2003; Rolfe et al., 1991).

Understanding parasite genomes and their impact on lifecycles and host interactions is commonly undermined by the absence of good quality resources, most notably for many non-zoonotic veterinary pathogens. Advances in sequencing technologies and assembly algorithms now provide cost-effective opportunities to address such questions, offering insights into parasite biology, parasite–host interactions and the impact of host diversity (Young et al., 2010). In the absence of a reference genome sequence assembly RNAseq transcriptome profiling using next generation sequencing (NGS) technologies has become an efficient alternative, offering insights into transcriptome size and nature, transcription levels, synonymous and non-synonymous polymorphism

http://dx.doi.org/10.1016/j.gene.2015.06.002 0378-1119/© 2015 Elsevier B.V. All rights reserved.

Please cite this article as: Choudhary, V., et al., Transcriptome analysis of the adult rumen fluke Paramphistomum cervi following next generation sequencing, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.06.002


V. Choudhary et al. / Gene xxx (2015) xxx–xxx

2. Materials and methods

molecular function and cellular component. The top ten BLAST hits with a cutoff E-value of ≤1e−6 and a similarity cutoff of 55% were used for GO annotation. The resulting annotations were enriched and refined using ANNEX, and level 2 GO annotations are presented. GO-slim term analysis was also performed using Blast2GO to obtain a broad overview of the ontology distributions, using the generic GO-slim set. Enzyme Commission (EC) numbers of the corresponding GO-annotated sequences were also obtained with an E-value cutoff of 1e−6.

KEGG pathways were assigned to the assembled transcripts using the online KEGG Automatic Annotation Server (KAAS, http://www.genome.jp/tools/kaas) (Moriya et al., 2007). KEGG Ortholog assignments and pathway maps were obtained using the bidirectional best hit (BBH) method on the KAAS website.
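The hit-selection thresholds described above can be illustrated with a minimal Python sketch. It is not the Blast2GO implementation: the function name `select_hits`, the tuple layout of a hit and the example records are hypothetical, and only the three thresholds (E-value ≤1e−6, similarity ≥55%, top ten hits) come from the text.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-6   # E-value cutoff stated in the text
MIN_SIMILARITY = 55.0   # similarity cutoff (percent) stated in the text
MAX_HITS = 10           # "top ten BLAST hits"


def select_hits(hits):
    """Keep, per query contig, up to ten hits that pass both cutoffs.

    `hits` is an iterable of (query_id, subject_id, evalue, similarity_pct)
    tuples; this layout is an assumption made for the illustration.
    """
    by_query = defaultdict(list)
    for query, subject, evalue, similarity in hits:
        if evalue <= E_VALUE_CUTOFF and similarity >= MIN_SIMILARITY:
            by_query[query].append((subject, evalue, similarity))
    # Rank the surviving hits by E-value (smallest first) and truncate.
    return {
        query: sorted(kept, key=lambda hit: hit[1])[:MAX_HITS]
        for query, kept in by_query.items()
    }


# Hypothetical records, for illustration only.
example_hits = [
    ("contig_1", "protA", 1e-30, 80.0),
    ("contig_1", "protB", 1e-4, 90.0),   # rejected: E-value too large
    ("contig_1", "protC", 1e-10, 40.0),  # rejected: similarity too low
    ("contig_2", "protD", 1e-6, 55.0),   # kept: exactly on both cutoffs
]
selected = select_hits(example_hits)
```

Ranking by E-value is a design choice of this sketch; the source does not state how the top ten hits were ordered.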

2.1. Collection of parasites

2.4. Full length cDNA prediction

Individual adult P. cervi were sampled from the rumen of sheep, goat and buffalo at an Ahmedabad abattoir in Gujarat, India, washed in 1× phosphate buffered saline (PBS; pH 7.4) and stored in RNAlater (Invitrogen, UK) at −80 °C. Parasite identity was confirmed histologically by comparison of the pharynx, genital opening and acetabulum type (Fig. S1) (Coskun et al., 2012).

Putative full-length cDNAs were identified using the online tool TargetIdentifier (Min et al., 2005) and comparison with the NCBI non-redundant protein database with a cutoff E-value of 1e−5. A cDNA sequence was considered full-length once both the start codon (ATG) and the poly(A) tail had been identified.
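The decision rule stated above (start codon plus poly(A) tail) can be sketched in a few lines of Python. This is only a simplified illustration and not the TargetIdentifier algorithm, which also uses the BLASTx alignment; the function name `looks_full_length` and the minimum tail length of 10 are assumptions.

```python
import re


def looks_full_length(cdna, min_poly_a=10):
    """Return True if the sequence ends in a poly(A) tail and carries an ATG
    upstream of that tail.

    `min_poly_a` is a hypothetical threshold chosen for this sketch; the
    source does not state a minimum tail length.
    """
    seq = cdna.upper()
    tail = re.search(r"A+$", seq)          # run of A at the 3' end
    if tail is None or len(tail.group()) < min_poly_a:
        return False
    body = seq[:tail.start()]              # sequence without the tail
    return "ATG" in body                   # any start codon before the tail


# Toy sequences, for illustration only.
with_both = "GGATGCCCTTTT" + "A" * 12
no_tail = "GGATGCCC"
no_start = "CCCCGGGG" + "A" * 12
```

A stricter check would also require the ATG to open a reading frame supported by a protein alignment, which is the part this sketch leaves out.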

and the occurrence of novel splice variants (Morozova et al., 2009). Opportunities include the identification of conserved parasite-specific metabolic pathways and mechanisms of host interaction which may be exploited in the development of novel drugs and vaccines (Fitzpatrick et al., 2005). For P. cervi, the economic importance of infection has long been recognized in ruminants but appropriate genomic and transcriptomic datasets are currently lacking. The study described here elucidates for the first time the adult P. cervi transcriptome using next-generation (high throughput) sequencing and advanced in silico analyses as a foundation for future genomic, proteomic and systems biological studies and may prove of value in the development of transmission blocking vaccines.

2.2. Ion PGM transcriptome sequencing

Fifteen adult P. cervi flukes collected from a single representative of each host species were used for RNA extraction. Total RNA was extracted using TRIzol® reagent (Invitrogen, UK) and purified using an RNeasy® mini kit according to the manufacturer's instructions (Qiagen, Netherlands). RNA integrity and concentration were confirmed using the Bioanalyzer 2100 (Agilent Technologies, UK) RNA 6000 Nanochip. mRNA was isolated from each RNA sample using an mRNA isolation kit (Roche, Germany), according to the manufacturer's instructions. mRNA was fragmented using RNase III at 37 °C for 4 min and fragmentation was confirmed by RNA 6000 Picochip on the Bioanalyzer 2100 (Agilent Technologies, UK). Strand-specific transcriptome libraries were prepared for each sample using an Ion Total RNA-Seq Kit v2, clonally amplified and sequenced on the Ion PGM system (Life Technologies) as per the manufacturer's instructions.

2.3. Bioinformatics analyses

An overview of the biological and bioinformatics pipeline used is shown in Fig. S2. Clean reads were obtained from the raw Ion Torrent data by trimming adaptor sequences and poly-A tails and by removing reads with mean quality scores < 20, ambiguous bases (‘N’), low complexity or short length (< 60 bp) using PRINSEQ v0.20.2 (http://prinseq.sourceforge.net/). These filtered RNAseq reads were aligned to the Bos taurus (cattle) genome (version bosTau7) using GMAP (Wu and Watanabe, 2005) with default settings to remove contaminating host-derived sequences. The resulting non-aligned reads were considered to represent the adult P. cervi transcriptome. These remaining high quality adult P. cervi reads from all three hosts (sheep, goat and buffalo) were combined into a single dataset and assembled using Trinity to generate a ‘reference’ de novo transcriptome assembly (Haas et al., 2013).
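The read-level criteria listed above (mean quality below 20, ambiguous bases, reads shorter than 60 bp) can be expressed as a small predicate. The sketch below is not PRINSEQ and omits adaptor trimming, poly-A trimming and the low-complexity filter; the function names and the representation of a read as a sequence string plus a list of Phred scores are assumptions.

```python
def mean_quality(quals):
    """Arithmetic mean of per-base Phred quality scores."""
    return sum(quals) / len(quals)


def keep_read(seq, quals, min_mean_q=20, min_len=60):
    """Apply three of the filters named in the text to a single read.

    A read is discarded if it is shorter than `min_len`, contains an
    ambiguous base 'N', or has a mean quality below `min_mean_q`.
    `quals` is assumed to hold one Phred score per base of `seq`.
    """
    if len(seq) < min_len:
        return False
    if "N" in seq.upper():
        return False
    if mean_quality(quals) < min_mean_q:
        return False
    return True


# Toy reads, for illustration only.
good = keep_read("ACGT" * 20, [30] * 80)              # 80 bp, Q30
too_short = keep_read("ACGT" * 10, [30] * 40)         # 40 bp
ambiguous = keep_read("ACGT" * 19 + "ACGN", [30] * 80)
low_quality = keep_read("ACGT" * 20, [10] * 80)       # mean Q10
```

The length check runs first so that the quality mean is never computed for an empty read.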
RNAseq by Expectation Maximization (RSEM) (Li and Dewey, 2011) was used to estimate the transcript FPKM values (fragments per kilobase of transcript per million mapped reads) for each of the host-specific datasets based on read abundance using the Bowtie aligner (Langmead et al., 2009). The assembled transcript contigs were subjected to a similarity search against NCBI's non-redundant (nr) database using BLASTx, with a permissive cutoff E-value of ≤1e−6. The BLASTx results were then combined and imported into Blast2GO v2.6.2 (Conesa et al., 2005) for gene ontology (GO) term analysis, describing biological process,

2.5. T cell epitope prediction

Contigs predicted to encode the vaccine candidates Cathepsin B, Cathepsin D, Cathepsin L and Legumain were subjected to T cell epitope prediction using CTLPred (Bhasin and Raghava, 2004). To optimize the accuracy of prediction both Artificial Neural Network (ANN) and Support Vector Machine (SVM) based prediction methods were used with cutoff scores set to the default.

3. Results

3.1. Transcriptomic characterization of adult P. cervi

We produced a total of 7,433,721 single-end (SE) reads using Ion Torrent PGM technology, representing 1.14 Gb of data (Table 1). Because the parasites were collected in vivo we initially used the genome alignment tool GMAP (Wu and Watanabe, 2005) to sequentially align the RNAseq reads to the B. taurus (bosTau7) reference genome to filter out contaminating host reads. After quality assessment and data filtering 3,272,893 reads (50.26% GC content) were selected for de novo assembly using a combined reads approach. Filtered reads from adult P. cervi sampled from sheep, goat and buffalo hosts were combined. Using Trinity software (Haas et al., 2013) 43,753 transcript contigs were assembled, presenting a mean length of 532 bp and an N50 length of 658 bp. The GMAP aligner mapped 99% of the reads back onto the assembled contigs, considering only properly mapped reads. All sequencing reads were deposited into the Short Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), and can be

Table 1. Summary of reads and assembled transcript contigs of Paramphistomum cervi sampled from sheep, goat and buffalo.

Per-host values:
Parameter | Sheep | Goat | Buffalo
Total raw reads per host | 2,677,227 | 1,721,860 | 3,034,634

Pooled data (all three hosts):
Parameter | Value
Total raw reads in pooled data | 7,433,721
Total bases in pooled data (Gb) | 1.14
Total reads after quality filter | 3,272,893
Q20 bases (Mb) | 550
GC percentage | 50.26
Total assembled transcript contigs | 43,753
Assembled transcript contigs mean size (bp) | 532
Assembled transcript contig N50 size (bp) | 658
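For readers unfamiliar with the N50 statistic reported in Table 1, the following Python sketch shows the usual definition: the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative; the actual contig lengths of the assembly are not reproduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches at least half of the overall length; the length
    of the contig that crosses this threshold is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths supplied")


# Toy contig lengths (bp), for illustration only.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)
```

Because N50 weights long contigs more heavily than the mean does, an N50 (658 bp) above the mean length (532 bp), as in Table 1, is the expected pattern for a de novo assembly with many short contigs.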


accessed under the accession numbers SRA091604 (sheep), SRA039814 (goat) and SRA091607 (buffalo). Parasite species identity was confirmed by bidirectional BLASTn using GenBank sequence KF475773 (P. cervi mitochondrion, complete sequence) as the query, identifying contig comp12830_c0_seq2 as the top hit for cytochrome oxidase subunit-1. Reciprocal BLASTn comparison confirmed the top hit to be KF475773 (E-value 3.00e−70).

The 43,753 assembled transcript contigs were queried against the NCBI nr database. We found 23,258 contigs (53.16%) similar to proteins in the nr database. The top-hit species distribution is presented in Fig. 1. Among the nr BLASTx top hits, 13,676 were Clonorchis sinensis proteins. A further 4315 top hits were against Schistosoma mansoni proteins, along with 2529 (Schistosoma japonicum), 422 (Echinococcus granulosus) and 222 (Hymenolepis microstoma) protein hits. Combined, these five parasite species accounted for 91% of the total nr top hits. A strong correlation was observed between transcript contig length and annotation success, with 60% of the contigs between 500 and 1999 bp in length successfully annotated, whereas 85% of the longer contigs (> 2000 bp) retrieved hits above the E-value cutoff (Fig. 2).

3.2. Global gene transcription profile between P. cervi sampled from sheep, goat and buffalo

To compare the three sampled P. cervi transcriptomes, abundance estimation was applied to quantify transcription levels. Comparison of the output revealed data skewed in favor of the sheep-derived P. cervi dataset, which presented ~8.7- and 6.6-fold greater transcript complexity than the goat and buffalo datasets, respectively, and this precluded further host-specific analysis (Table S1, Fig. S3).
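The annotation figures quoted in Section 3.1 can be re-derived from the counts given in the text. The short Python check below uses only numbers stated in the article; the variable names are illustrative.

```python
# Counts taken directly from Section 3.1.
total_contigs = 43_753
annotated_contigs = 23_258
top_hits = {
    "Clonorchis sinensis": 13_676,
    "Schistosoma mansoni": 4_315,
    "Schistosoma japonicum": 2_529,
    "Echinococcus granulosus": 422,
    "Hymenolepis microstoma": 222,
}

# Share of assembled contigs with a BLASTx hit in nr.
annotated_pct = round(100 * annotated_contigs / total_contigs, 2)

# Share of all nr top hits contributed by the five named species.
top5_total = sum(top_hits.values())
top5_pct = round(100 * top5_total / annotated_contigs, 1)

print(annotated_pct, top5_total, top5_pct)
```

Both derived values agree with the percentages reported in the text (53.16% annotated contigs and 91% of top hits from the five species).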
Analysis of the combined dataset found that the majority of genes involved in fatty acid elongation, fatty acid metabolism, the citrate cycle (TCA cycle), glycolysis, gluconeogenesis, oxidative phosphorylation and amino acid metabolism were transcribed in P. cervi (Table S2). Genes associated with sugar and cobalamin binding were upregulated, as were genes associated with transport of cations, anions and oligopeptides. Among the glucose transporter genes, comp13958_c0_seq3 (FPKM > 80) and comp14322_c4_seq1 (FPKM > 30) were transcribed at high levels (Table S1). The gene encoding phosphoenolpyruvate carboxykinase (comp14436_c3_seq2; PEPCK, EC 4.1.1.32), which can transform oxaloacetate to phosphoenolpyruvate and play a role in malic acid disproportionation (Wu et al., 2012), was also found to be transcribed at a high level (FPKM > 400) in P. cervi. In addition, fructose-1,6-bisphosphatase (comp7024_c0_seq1), a key regulatory enzyme of gluconeogenesis, was also highly transcribed

Fig. 2. De novo assembly contig length distribution. Histogram of the sequence-length distribution of transcript contigs with significant BLASTx hits in the NCBI nr database.

(FPKM > 20) (Table S1). The Cathepsin D gene comp14449_c0_seq3, a major component of the lysosomal proteolytic system (Koike et al., 2000), was transcribed at extremely high levels in the adult fluke (FPKM > 300), indicating that P. cervi could utilize proteins from the host (Table S1).

3.3. Functional and pathway annotation of proteins encoded in the P. cervi transcriptome

The associated BLASTx hits were searched for their respective Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Enzyme Commission (EC) codes for each transcript contig and the highest bit score was selected. The annotated contigs were then assigned to GO terms for functional classification. The three main categories of GO classification (i.e., biological process, molecular function and cellular component) were analyzed separately to learn as much as possible about their functional distribution. A total of 11,952 of the annotated transcript contigs could be assigned to one or more GO terms. To simplify the functional distribution, the annotated

Fig. 1. Top-Hit species distribution of the Paramphistomum cervi reference transcriptome contigs showing an abundance of top-hits to sequences from the Trematoda.


Fig. 3. Distribution of GO-slim functional classifications. (A), (B) and (C) are the distribution of the level 2 biological process, cellular component and molecular function for the Paramphistomum cervi reference assembly.


sequences were assigned to GO-slim terms to obtain a “thin” version of the classification. Metabolic process (GO:0008152) and cellular process (GO:0009987) within biological process, binding activity (GO:0005488) and catalytic activity (GO:0003824) within molecular function, and cell (GO:0005623) and organelle (GO:0043226) within cellular component were the most representative level 2 GO terms in the P. cervi assembled datasets (Fig. 3). All annotated contigs were then associated with enzyme codes (ECs), which returned 6005 unique EC numbers. To further identify the biological pathways active during adult P. cervi infection, annotated transcripts were searched against pathway collections in the KEGG database. A total of 84 pathways were predicted (Table S2). The most representative pathways included purine and pyrimidine metabolism, involving DNA and RNA biosynthesis and metabolism. Aminoacyl-tRNA biosynthesis, pyruvate metabolism and N-glycan biosynthesis pathways were also prominent within the mapping results. An InterProScan analysis of transcript contigs identified 19,622 conserved protein families in P. cervi.

3.4. Full-length cDNA prediction

A total of 7003 full-length contigs with completely sequenced open reading frames (ORFs) were identified from the assembly with a cutoff E-value of 1e−5, with sequence lengths from 201 bp to 5213 bp (Fig. S4). Most of the identified full-length cDNA sequences were shorter than 1 kb, suggesting that long full-length cDNA sequences were not easily assembled using only the current set of transcriptome data.

3.5. Identification of vaccine candidate homologues and T-cell epitope prediction

We classified the P. cervi assembled transcript proteases into four functional groups based on catalytic type, namely serine, threonine, aspartate and metallo- or cysteine proteases. Cysteine proteases showed the highest transcription levels (~60%) among the four types of proteases in all three sampled P. cervi (Tables S1 and S2).
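The host-by-host comparison of predicted epitopes reported in this section reduces to simple set operations. The Python sketch below is a minimal illustration that uses only the four epitopes named in the article and their reported host distribution; it does not reproduce the CTLPred predictions themselves, and the variable names are illustrative.

```python
from collections import Counter

# Epitopes named in the article, grouped by the host dataset in which
# each was reported. Other predicted epitopes are not listed in the text.
epitopes_by_host = {
    "sheep": {"GPISIAINA", "RLPGFVRYK", "ARLPGFVRY"},
    "goat": {"GPISIAINA", "RLPGFVRYK", "IPYGNEYAL"},
    "buffalo": {"GPISIAINA", "ARLPGFVRY", "IPYGNEYAL"},
}

# Epitopes present in every host-derived dataset.
conserved = set.intersection(*epitopes_by_host.values())

# Number of host datasets in which each epitope occurs.
occurrence = Counter(
    epitope for epitopes in epitopes_by_host.values() for epitope in epitopes
)
shared_by_two = {epitope for epitope, n in occurrence.items() if n == 2}

print(sorted(conserved), sorted(shared_by_two))
```

Only the epitope found in all three sets is treated as conserved here, which matches the way the article singles out GPISIAINA.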
Of the P. cervi cysteine proteases, a Cathepsin B-like isoenzyme was transcribed across all three P. cervi datasets. The predicted epitope GPISIAINA within the Cathepsin L gene was identified in all three P. cervi datasets. The predicted epitopes RLPGFVRYK (sheep and goat), ARLPGFVRY (sheep and buffalo) and IPYGNEYAL (goat and buffalo) were each shared by two of the three sampled P. cervi datasets.

4. Discussion

Presented here is, to our knowledge, the first high throughput transcriptome dataset for the amphistome P. cervi, an invasive parasite that threatens important livestock species. Recently, identification of P. cervi has been complicated by the diagnosis of Calicophoron daubneyi (Gordon et al., 2013). In the study described here histological parasite identification was supplemented by bioinformatics comparison of the mitochondrial cytochrome oxidase subunit-1 to confirm subject identity as P. cervi. In total 43,753 transcript contigs were generated from adult P. cervi fluke sampled from three different ruminant hosts. The number of flukes used was kept to a minimum to reduce complications of intraspecific sequence heterogeneity, as reported in previous studies (Blouin et al., 1995). These data will support a broad spectrum of molecular and biological research on this ecologically important parasite and can inform study of the more pathogenic immature fluke lifecycle stage.

As P. cervi lives in close association with its host, we used exhaustive filtering to remove all host-derived contamination from the raw data. We used a combined reads approach to merge all data generated (Haas et al., 2013) to assemble a reference transcriptome that had low redundancy and high completeness. For adult P. cervi, 53% of the proteins predicted here had putative C. sinensis homologues, with lower percentages (10–20%) for other digenean trematodes such as S. mansoni and S. japonicum, possibly reflecting differences in lifecycle stage sampled, lifestyle and habitat (alimentary compared to blood systems). Orphan molecules (i.e. those for which no homologues were identified) may represent P. cervi-specific transcripts, highlighting the biological uniqueness of the parasite and lifecycle stage sampled. Others were likely to be poorly assembled reads, intergenic non-coding RNAs and 3′ or 5′ untranslated regions.

Approximately one third of the peptides inferred from the adult stage P. cervi transcriptome mapped to biological pathways linked to pyruvate metabolism, aminoacyl-tRNA biosynthesis, oxidative phosphorylation, TCA cycle, pentose phosphate pathway and N-glycan biosynthesis, the latter of which was particularly interesting. Glycans are abundant on the surfaces of helminths and within their secreted antigens (Cummings and Nyame, 1996; Khoo and Dell, 2001). Many helminth glycans contain highly antigenic moieties that comprise either unusual (foreign) monosaccharides, or a foreign sequence or unusual linkage of common monosaccharides. Such immunogenic glycan antigens can induce specific anti-glycan antibody responses which are dominant within many helminth-infected hosts (van Die and Cummings, 2006). There is good evidence that humoral immunity can provide protection against infections by parasitic trematodes in the Schistosoma genus, which suggests possibilities for future developments of glycan-based vaccines (Moloney and Webbe, 1990; Jankovic et al., 1999). Targeting the biosynthesis of the unusual glycans in parasitic helminths may provide new treatments and preventatives for these infections.

While our experiment was not designed to identify host-specific differential transcription between the three P. cervi samples, we did use methods developed for comparison of RNAseq libraries to infer gene transcription for each.
The low complexity identified for the goat and buffalo derived datasets, compared to that from the sheep, may be an artifact of the library preparation process (Table S1). For this reason only the sheep P. cervi dataset was considered further. Consistent with data from C. sinensis developmental RNAseq libraries (Yoo et al., 2011), increased transcription of genes with protein kinase and cysteine-type peptidase activities predominated. Oxidoreductase activity also increased, consistent with the transcription pattern of detoxification genes in C. sinensis (Yoo et al., 2011). As P. cervi adults inhabit the rumen, in an anaerobic and low glucose environment, their energy sources and metabolic pathways are focal points of their parasitic biology. The energy requirements of many organisms are mostly satisfied with fatty acids. Trematode parasites, however, are thought to lack a de novo fatty acid synthesis pathway (Berriman et al., 2009). In parasites, fatty acid binding proteins (FABPs) are critical proteins that enable fatty acid transportation (Janvilisri et al., 2007). These proteins were found to be highly transcribed in P. cervi and likely facilitate the ability of the adult rumen fluke to efficiently utilize fatty acids from the rumen. Moreover, unlike Schistosoma, KEGG pathway analysis suggests that fatty acid elongation can occur within P. cervi since enzymes involved in the pathway were transcribed at high levels (FPKM > 20) in adult fluke. Additionally, genes encoding enzymes involved in the pathway that converts glucose to acetyl-CoA were transcribed at high levels (FPKM > 20) in P. cervi. This observation is consistent with a study showing glucose serving as an energy source in C. sinensis (van Grinsven et al., 2009; Huang et al., 2013). Given these data, how does the adult fluke obtain enough glucose in the rumen? Nineteen transcripts encoding glucose transporters were identified.
These highly transcribed glucose transporters might help the adult fluke absorb blood glucose efficiently from capillaries destroyed in the rumen epithelia when the adult invades the rumen. Meanwhile, genes involved in gluconeogenesis were also found in the P. cervi transcriptome (Tables S1 and S2). Taken together, P. cervi adult fluke might have the ability to generate glucose from non-carbohydrate carbon substrates by gluconeogenesis. Amino acids, such as cysteine, serine, alanine and glycine can be transformed to enter the TCA cycle and could also provide energy and metabolites.


Excretory–secretory proteins (ESPs) of parasites have attracted attention in the research community because of their potential uses in the development of diagnostics, vaccines, and drug therapies (Ju et al., 2009). Based on enrichment analysis of the biological process and molecular function, these genes were found to be enriched in lipid-binding, lipid-transport and cysteine-type peptidase functions for P. cervi. Nineteen of them, including putative Cathepsins B, D and L, were assigned to the cysteine protease family. Six of these genes were transcribed at high levels (FPKM > 300), particularly comp14407_c0_seq4 and comp14932_c2_seq6 (FPKM > 600). Cysteine protease family members play important roles in parasite invasion by degrading host proteins (Na et al., 2006). These putative ESPs provide potential targets for anti-parasitic drugs and may be developed as vaccine candidates. Cysteine, aspartic and metalloproteases, as well as aminopeptidases, have been implicated in important aspects of parasite function, including hemoglobin digestion and anticoagulant activity. Comparative transcriptome data can aid target selection by indicating potential functionally important proteins such as proteases that are enriched in the rumen-dwelling adult stages, suggesting a role in blood feeding as well as accessibility to host antibodies. Cathepsin B proteases (cbl) are part of an ordered hemoglobin degradation pathway, functioning after aspartic proteases (APRs) and upstream of metalloproteases (MEPs) and aminopeptidases (Williamson et al., 2004). Cathepsin B diversity may therefore be key in generating an array of substrates from ingested nutrients for efficient cleavage by downstream proteases and may be involved in the high blood digestion capacity of P. cervi. The cbl genes show increased transcription in adult P. cervi, identifying these as potentially important control targets.
Comparison between the different host-derived datasets offered value through opportunities to identify conserved epitopes, promoting targeted vaccine development. The transcriptomic data presented here can prove of value to fundamental molecular studies of parasite development, reproduction and metabolic pathways. In the ongoing absence of genomic sequences for P. cervi a future focus should be developing functional genomics assays for different developmental stages of P. cervi. In addition, these transcriptomic data will also support future assembly of the P. cervi genome as well as the determination of gene structures, prediction of alternate transcript splicing and the characterization of regulatory elements. Ultimately, these advances will provide a sound platform for the delivery of applied outcomes, including the development of novel drugs and vaccines against P. cervi as well as tools for the diagnosis of paramphistomosis.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2015.06.002.

Conflicts of interest disclosure

The authors have no conflict of interest.

Acknowledgments

We acknowledge the help rendered by the Department of Animal Biotechnology, AAU, Anand, Gujarat for providing the sequencing and bioinformatics analysis facility and all the members of the department for their help throughout this project. This work was supported by the AAU University Plan Project (AAU/Compt/BGT/ Plan R-3/10787-836/2013), who had no involvement in the study design, sample collection, data analysis or preparation of this manuscript.
References Berriman, M., Haas, B.J., LoVerde, P.T., Wilson, R.A., Dillon, G.P., Cerqueira, G.C., Mashiyama, S.T., Al-Lazikani, B., Andrade, L.F., Ashton, P.D., Aslett, M.A., Bartholomeu, D.C., Blandin, G., Caffrey, C.R., Coghlan, A., Coulson, R., Day, T.A., Delcher, A., DeMarco, R., Djikeng, A., Eyre, T., Gamble, J.A., Ghedin, E., Gu, Y., HertzFowler, C., Hirai, H., Hirai, Y., Houston, R., Ivens, A., Johnston, D.A., Lacerda, D.,

Macedo, C.D., McVeigh, P., Ning, Z., Oliveira, G., Overington, J.P., Parkhill, J., Pertea, M., Pierce, R.J., Protasio, A.V., Quail, M.A., Rajandream, M.A., Rogers, J., Sajid, M., Salzberg, S.L., Stanke, M., Tivey, A.R., White, O., Williams, D.L., Wortman, J., Wu, W., Zamanian, M., Zerlotini, A., Fraser-Liggett, C.M., Barrell, B.G., El-Sayed, N.M., 2009. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352–358. Bhasin, M., Raghava, G.P., 2004. Prediction of CTL epitopes using QM, SVM and ANN techniques. Vaccine 22, 3195–3204. Blouin, M.S., Yowell, C.A., Courtney, C.H., Dame, J.B., 1995. Host movement and the genetic structure of populations of parasitic nematodes. Genetics 141, 1007–1014. Conesa, A., Gotz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., Robles, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676. Coskun, S., Eslami, A., Halajian, A., Nikpey, A., 2012. Amphistome species in cattle in South coast of Caspian Sea. Iran. J. Parasitol. 7, 32–35. Cummings, R.D., Nyame, A.K., 1996. Glycobiology of schistosomiasis. FASEB J. 10, 838–848. Dalton, P.R., Pole, D., 1978. Water-contact patterns in relation to Schistosoma haematobium infection. Bull. World Health Organ. 56, 417–426. Fitzpatrick, J.M., Johnston, D.A., Williams, G.W., Williams, D.J., Freeman, T.C., Dunne, D.W., Hoffmann, K.F., 2005. An oligonucleotide microarray for transcriptome analysis of Schistosoma mansoni and its application/use to investigate gender-associated gene expression. Mol. Biochem. Parasitol. 141, 1–13. Gordon, D.K., Roberts, L.C., Lean, N., Zadoks, R.N., Sargison, N.D., Skuce, P.J., 2013. Identification of the rumen fluke, Calicophoron daubneyi, in GB livestock: possible implications for liver fluke diagnosis. Vet. Parasitol. 195, 65–71. 
Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., Macmanes, M.D., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C.N., Henschel, R., Leduc, R.D., Friedman, N., Regev, A., 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512.
Huang, Y., Chen, W., Wang, X., Liu, H., Chen, Y., Guo, L., Luo, F., Sun, J., Mao, Q., Liang, P., Xie, Z., Zhou, C., Tian, Y., Lv, X., Huang, L., Zhou, J., Hu, Y., Li, R., Zhang, F., Lei, H., Li, W., Hu, X., Liang, C., Xu, J., Li, X., Yu, X., 2013. The carcinogenic liver fluke, Clonorchis sinensis: new assembly, reannotation and analysis of the genome and characterization of tissue transcriptomes. PLoS ONE 8, e54732.
Jankovic, D., Wynn, T.A., Kullberg, M.C., Hieny, S., Caspar, P., James, S., Cheever, A.W., Sher, A., 1999. Optimal vaccination against Schistosoma mansoni requires the induction of both B cell- and IFN-gamma-dependent effector mechanisms. J. Immunol. 162, 345–351.
Janvilisri, T., Likitponrak, W., Chunchob, S., Grams, R., Vichasri-Grams, S., 2007. Charge modification at conserved positively charged residues of fatty acid binding protein (FABP) from the giant liver fluke Fasciola gigantica: its effect on oligomerization and binding properties. Mol. Cell. Biochem. 305, 95–102.
Ju, J.W., Joo, H.N., Lee, M.R., Cho, S.H., Cheun, H.I., Kim, J.Y., Lee, Y.H., Lee, K.J., Sohn, W.M., Kim, D.M., Kim, I.C., Park, B.C., Kim, T.S., 2009. Identification of a serodiagnostic antigen, legumain, by immunoproteomic analysis of excretory–secretory products of Clonorchis sinensis adult worms. Proteomics 9, 3066–3078.
Khoo, K.H., Dell, A., 2001. Glycoconjugates from parasitic helminths: structure diversity and immunobiological implications. Adv. Exp. Med. Biol. 491, 185–205.
Kilani, K., Guillot, J., Chermett, R., 2003. Amphistomes: digestive. In: Lefevre, P.C., Blanco, J., Chermatt, J. (Eds.), TEC & DOC Edition, pp. 1400–1410.
Koike, M., Nakanishi, H., Saftig, P., Ezaki, J., Isahara, K., Ohsawa, Y., Schulz-Schaeffer, W., Watanabe, T., Waguri, S., Kametaka, S., Shibata, M., Yamamoto, K., Kominami, E., Peters, C., von Figura, K., Uchiyama, Y., 2000. Cathepsin D deficiency induces lysosomal storage with ceroid lipofuscin in mouse CNS neurons. J. Neurosci. 20, 6898–6906.
Langmead, B., Trapnell, C., Pop, M., Salzberg, S.L., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.
Li, B., Dewey, C.N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 12, 323.
Min, X.J., Butler, G., Storms, R., Tsang, A., 2005. TargetIdentifier: a webserver for identifying full-length cDNAs from EST sequences. Nucleic Acids Res. 33, W669–W672.
Moloney, N.A., Webbe, G., 1990. Antibody is responsible for the passive transfer of immunity to mice from rabbits, rats or mice vaccinated with attenuated Schistosoma japonicum cercariae. Parasitology 100 (Pt 2), 235–239.
Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C., Kanehisa, M., 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182–W185.
Morozova, O., Hirst, M., Marra, M.A., 2009. Applications of new sequencing technologies for transcriptome analysis. Annu. Rev. Genomics Hum. Genet. 10, 135–151.
Na, B.K., Kim, S.H., Lee, E.G., Kim, T.S., Bae, Y.A., Kang, I., Yu, J.R., Sohn, W.M., Cho, S.Y., Kong, Y., 2006. Critical roles for excretory–secretory cysteine proteases during tissue invasion of Paragonimus westermani newly excysted metacercariae. Cell. Microbiol. 8, 1034–1046.
Rolfe, P.F., Boray, J.C., Nichols, P., Collins, G.H., 1991. Epidemiology of paramphistomosis in cattle. Int. J. Parasitol. 21, 813–819.
Rolfe, P.F., Boray, J.C., Collins, G.H., 1994. Pathology of infection with Paramphistomum ichikawai in sheep. Int. J. Parasitol. 24, 995–1004.
Shank, C., Russell, S., 1976. Epidemiology and Community Health in Water Climate Countries. Churchill Livingstone, New York.
van Die, I., Cummings, R.D., 2006. Glycans modulate immune responses in helminth infections and allergy. Chem. Immunol. Allergy 90, 91–112.
van Grinsven, K.W., van Hellemond, J.J., Tielens, A.G., 2009. Acetate:succinate CoA-transferase in the anaerobic mitochondria of Fasciola hepatica. Mol. Biochem. Parasitol. 164, 74–79.

Please cite this article as: Choudhary, V., et al., Transcriptome analysis of the adult rumen fluke Paramphistomum cervi following next generation sequencing, Gene (2015), http://dx.doi.org/10.1016/j.gene.2015.06.002

Williamson, A.L., Lecchi, P., Turk, B.E., Choe, Y., Hotez, P.J., McKerrow, J.H., Cantley, L.C., Sajid, M., Craik, C.S., Loukas, A., 2004. A multi-enzyme cascade of hemoglobin proteolysis in the intestine of blood-feeding hookworms. J. Biol. Chem. 279, 35950–35957.
Wu, T.D., Watanabe, C.K., 2005. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875.
Wu, X., Fu, Y., Yang, D., Zhang, R., Zheng, W., Nie, H., Xie, Y., Yan, N., Hao, G., Gu, X., Wang, S., Peng, X., Yang, G., 2012. Detailed transcriptome description of the neglected cestode Taenia multiceps. PLoS ONE 7, e45830.


Yoo, W.G., Kim, D.W., Ju, J.W., Cho, P.Y., Kim, T.I., Cho, S.H., Choi, S.H., Park, H.S., Kim, T.S., Hong, S.J., 2011. Developmental transcriptomic features of the carcinogenic liver fluke, Clonorchis sinensis. PLoS Negl. Trop. Dis. 5, e1208.
Young, N.D., Campbell, B.E., Hall, R.S., Jex, A.R., Cantacessi, C., Laha, T., Sohn, W.M., Sripa, B., Loukas, A., Brindley, P.J., Gasser, R.B., 2010. Unlocking the transcriptomes of two carcinogenic parasites, Clonorchis sinensis and Opisthorchis viverrini. PLoS Negl. Trop. Dis. 4, e719.
