WormBase ParaSite − a comprehensive resource for helminth genomics

WormBase ParaSite − a comprehensive resource for helminth genomics

G Model MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS Molecular & Biochemical Parasitology xxx (2016) xxx–xxx Contents lists available at ScienceDir...

4MB Sizes 5 Downloads 93 Views

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Molecular & Biochemical Parasitology

WormBase ParaSite − a comprehensive resource for helminth genomics Kevin L. Howe a,∗ , Bruce J. Bolt a , Myriam Shafie b , Paul Kersey a , Matthew Berriman b a b

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

a r t i c l e

i n f o

Article history: Received 7 October 2016 Received in revised form 24 November 2016 Accepted 25 November 2016 Available online xxx Keywords: Genome browser Comparative genomics Functional genomics Helminths WormBase Ensembl

a b s t r a c t The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite. wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction The WormBase project (http://www.wormbase.org, [1]) was initiated to facilitate and accelerate biological research that uses the model nematode Caenorhabditis elegans, by making the collected outputs of scientists accessible from a single resource. This enables the transfer of this wealth of knowledge to the study of other metazoa, from nematodes to humans. C. elegans data described in the research literature, deposited in the archives, or submitted directly is placed into context via a combination of detailed manual curation and semi-automatic data integration. In addition, the project curates the reference genome sequence, gene structures and other genomic features for C. elegans, thereby providing a high quality foundation for downstream studies. The WormBase mission also extends to free-living relatives of C. elegans, which were the focus of early post-genomic research in the areas of evolution and comparative biology [2]. In recent years, WormBase has begun to expand its mission to include plant and animal parasitic nematodes which, while more distantly related to C. elegans, have direct biomedical and

∗ Corresponding author. E-mail address: [email protected] (K.L. Howe).

agricultural importance and therefore attract research interest in their own right. The first parasitic nematode to have its genome fully sequenced was Brugia malayi [3], and since then, many have followed [4–37]. Of the WormBase “core” species (the ones for which the reference genome sequence and annotation are curated), three are parasitic worms: Brugia malayi, Onchocerca volvulus and Strongyloides ratti (http://www.wormbase.org/species). However, there are a number of challenges associated with expanding the remit to helminths. Firstly, the primary research goal of parasitologists is to identify ways of controlling the parasite, and as such their desired entry points and common use-cases for WormBase are often distinct from those of scientists doing basic science using C. elegans as a model. Secondly, the wider community of worm parasitologists includes those studying platyhelminths (flatworms), which are outside the taxonomic scope of WormBase, which is a nematode resource. Thirdly, there have been recent concerted efforts to sequence the genomes of many helminths (nematodes and platyhelmiths) (e.g. the 50 Helminth Genomes Initiative [38]), resulting in a flood of new draft genomes that vary considerably in contiguity. In response to these challenges, we have created WormBase ParaSite, a comprehensive new resource for parasitic worm genomes, aiming to serve parasitologists working on helminths who use genomics as an investigative tool. WormBase ParaSite leverages the

http://dx.doi.org/10.1016/j.molbiopara.2016.11.005 0166-6851/© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4. 0/).

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

2

infrastructure and expertise of the WormBase project, with a main mission of breadth (foundational data for many species) rather than depth (rich data for a single species).

Table 1 Species with RNA-Seq expression tracks in WormBase ParaSite (release 7). Included are some unpublished studies that are publicly available via the European Nucleotide Archive. Species

2. Data integration and analysis

Studies

Samples (tracks)

Nematode

Brugia malayi Onchocerca volvulus Strongyloides ratti Strongyloides stercoralis Trichuris muris

3 [58,59] 2 [60,61] 2 [34] 2 [62] 1 [25]

57 19 8 42 22

Platyhelminth

Echinococcus granulosus Echinococcus multicularis Fasciola hepatica Hymenolepis microstoma Schistosoma mansoni

2 [15] 2 [15] 2 [31] 2 [15] 6 [63–67]

6 35 17 17 33

2.1. Genomes Our mission is to include all publicly available nematode and platyhelminth genomes in WormBase ParaSite. Where multiple genome assemblies exist for the same species (e.g. different genome projects for Haemonchus contortus [18,19], and male and female isolates for Trichuris suis [24]), we have included all. Release 7 of the resource (August 2016) included genomes from 82 nematode species (98 genomes) and 28 platyhelminth species (30 genomes). We maintain an up-to-date list of genomes for ease of reference (http://parasite.wormbase.org/species.html). Our primary source for genome sequences is the International Nucleotide Sequence Database Collaboration (INSDC) resources [39]. In rare cases, we collect genomes from project-specific FTP sites or direct engagement with genome project scientists. However, it is our strong preference to use genomes deposited with INSDC as these have been formally checked, processed and versioned by an authoritative sequence archive resource. As well as allowing us to formally disambiguate between different genomes for the same species, by way of the INSDC BioProject identifier (http://www.ncbi.nlm.gov/bioproject), it also simplifies the process of integrating and displaying functional genomics data that has been submitted to the archives. For the species in common between WormBase and WormBase ParaSite (C. elegans and other free-living nematodes, Brugia malayi, Strongyloides ratti and Onchocerca volvulus), we synchronise the data with a specific release of WormBase. For example, WormBase ParaSite release 7 was synchronised with WormBase release WS254, meaning that data for species in common were identical in both resources.

2.2. Genome annotation 2.2.1. Gene structures The definition of the intron/exon structure of protein-coding genes is an important foundation for interpretation of genome function. WormBase curates gene structures for a C. elegans and a small set of nematode species with high-quality reference genomes [40]. For others, we import annotations from the acknowledged authority for that genome (e.g. GeneDB [41] for Schistosoma mansoni), or from the group that sequenced and published the genome [4–37]. In certain circumstances, we collaborate with genome projects to provide first pass annotation of protein-coding gene structures. We have co-developed a pipeline that produces high-quality gene predictions using MAKER [42] to integrate evidence from multiple sources: ab initio gene predictions from AUGUSTUS [43], GeneMark-ES [44], and SNAP [45]; projected annotations from C. elegans and the taxonomically nearest previously-annotated helminth using GenBlastG [46] and RATT [47]; and alignments of ESTs, mRNAs and proteins from related organisms. The pipeline was used to annotate the majority of the genomes sequenced as part of the 50 Helminth Genomes Project [38]. A small number of genomes have curated structures for noncoding RNAs, which we also import into WormBase ParaSite. To supplement these, we have a pipeline that uses RNAmmer [48], tRNAScan-SE [49] and Rfam [50] to predict structures for (respectively) ribosomal RNAs, transfer RNAs, and other non-coding RNAs.

2.2.2. Functional annotation Literature describing gene function is sparse for most of the species in WormBase ParaSite, although we anticipate that the availability of the genomes will stimulate new research. In lieu of annotations based on experimental evidence, we have used established automated methods to predict the function of as many gene products as possible. Firstly, we broker the submission of all protein sequences in WormBase ParaSite to the UniProt Knowledgebase [51], and import the product names assigned by them. These are defined by a combination of manual curation and automatic annotation. For more detailed annotation, we use InterProScan [52] from the InterPro project [53]. As well as predicting protein domains (e.g. from the Pfam [54] database), it also assigns terms from the Gene Ontology (GO) [55]. We run the latest version of the pipeline and data across the complete proteome for every genome each release, so that we are always up-to-date with the latest InterPro domain and Gene Ontology annotations. 2.2.3. Gene expression High-throughput sequencing of RNA has become the standard assay for measuring gene expression, and numerous studies conducting “RNA-Seq” experiments in helminth species have now been performed and deposited in the sequence archives. We ultimately aim to align all helminth RNA-Seq data to the corresponding reference genome using a standard pipeline, in collaboration with the Gene Expression Atlas project [56] (see Section 4). In the meantime, we have processed data for a small number of species using our own pipeline, as a pilot study (Table 1). Briefly, we use STAR [57] to align each experiment to the reference genome, merging resulting BAM files when they are technical replicates of the same sample. We have labelled each sample with the appropriate descriptive terms, using ontological terms where possible (e.g. from the WormBase life-stage ontology). The alignments can be viewed on the genome browser (see Section 3.2). 2.3. Comparative genomics Having genomes, genes and proteins for many helminth species provides the opportunity to study the evolution of helminths at the molecular level. We use the Ensembl Compara system [68] to infer the orthology and paralogy relationships between all nematode and platyhelminth genes, supplemented by genes from a number of other comparator species, including human, mouse and yeast. The method can be briefly summarised as follows: genes are first organised into homologous clusters. Traditionally, this has been done using NCBI BLAST+ [69] and hcluster sg (H. Li, unpublished) but more recent versions of the system support the use of Hidden Markov Models from the PANTHER protein classification system [70]. A protein multiple alignment for each cluster is then constructed using M-Coffee [71] or MAFFT [72]. The choice of program

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

3

Fig. 1. Viewing variation data in WormBase ParaSite. Top: the variation image view of a DNA helicase gene in Strongyloides ratti. The genomic variants are shown in a track along the top, drawn to show their location with respect to the gene structure and protein domains, and colour-coded according to their putative effect on the protein. Bottom: summary view of a single variant in the S. ratti genome. This view shows the effect of the variant on the reference annotation, genotypes for all samples assayed, and (not shown in figure) metrics associated with the quality of the variant call (e.g. read-depth). This particular variant gives rise to a nonsense mutation in 6 of the 10 strains assayed.

is made dynamically by the pipeline using a set of empiricallyderived rules regarding certain properties of the input data (for example, MAFFT is generally favoured for larger clusters). Finally, TreeBeST [68] is used to produce a gene tree, combining a number of methods which reconcile the sequence-based gene phylogeny with the species phylogeny. The resulting tree represents an evolutionary history of the gene family, which can be used to infer true orthologues (genes in different species related by a speciation event) and paralogues (genes in the same species related by a duplication event).

[75,76]; and (c) the website displays and tools [73]. We have customised some of the Ensembl tools for use in WormBase ParaSite. For example, we have modified Ensembl code to provide sequence search and data mining services that allow the interrogation of all species, or large sub-groups of species (e.g. all nematodes) at once in a single query (see Section 3.3).

2.4. Infrastructure

The most common entry point to WormBase ParaSite is via the home page (http://parasite.wormbase.org), which collects together the latest news from the project, some basic statistics, and a tool for finding genomes of interest. The header bar is present on all pages in WormBase ParaSite. As well as providing short-cut links to a variety of tools and documentation pages (http://parasite.wormbase.org/ info), it has a search-box which allows searching for genes by a vari-

The Ensembl infrastructure [73] is the basis for much of the management and analysis of the data in WormBase ParaSite. The main components we use are (a) the MySQL database schema and Application Programming Interface for loading and storing the data [74]; (b) the genome analysis pipelines and workflow management tools

3. Website − displays and tools 3.1. General navigation

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9 4

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Fig. 2. Compara tree showing the evolution of fox-1 in Trichinella spiralis, a gene involved in sex determination. In this view, the Trichinella sub-tree has been expanded for additional detail. Other parts of the tree are collapsed for brevity. A schematic of a protein multiple alignment of the homologs is shown in the right, with regions of conservation represented as coloured blocks.

ety of criteria, including gene name, product name, protein domain names/accessions, and Gene Ontology terms. The search box has an auto-complete function, allowing searches to be performed using only partial information. We also maintain a page for each genome and gene in WormBase ParaSite. The genome pages collect together a description of the species, attribution/references for the genome sequencing and annotation, and example entry points to data for that genome (e.g. gene, genomic region). Also shown are some basic statistics about the quality of the genome, for example CEGMA [77] and BUSCO [78] scores. Recent releases have displayed these graphically in an Assembly Statistics panel, using code developed by the LepBase project [79]. We also provide these data as columns in the table of all genomes (http://parasite.wormbase.org/species.html), which can be sorted by each of the criteria. The gene pages act a starting place for exploring various types of information associated with the genes, including transcript models, functional annotations, orthologues and paralogues, crossreferences to other resources, and the genomic context of the gene via the genome browser. A recent addition has been variation data from re-sequencing of isolates from selected species. These data can be viewed directly in WormBase ParaSite, via custom pages and tables (Fig. 1), and the genome browser (see Section 2.2). The principal entry point for the comparative genomics data (Section 2.3) is from the gene pages. The left-hand side-bar has links to tables of orthologues and paralogues for the gene, which can be filtered in a variety of ways. It is also possible to view the full tree for the family of which the gene is a member, collapsing and

expanding sub-trees according to interest (Fig. 2). Alongside this, protein multiple alignments for the family, or selected sub-trees, can be viewed and downloaded in a variety of formats. It is also possible to create a personal user account in WormBase ParaSite, and to store configuration and tool results under that account. Attaching a custom genome browser track whilst “logged in” will result in that track becoming available on any computer where the user has logged into their account. Additionally, results from the online tools (see Sections 3.3–3.5) are stored indefinitely for retrieval at a later date. 3.2. Genome browser We provide a fully interactive genome browser for every genome. By default, this shows the assembly and annotated gene models. Additional tracks can be switched on using the left-hand navigation menu of the genome browser. Some tracks are available for every genome: DNA sequence, 6-frame protein translation, %GC content, repetitive elements, protein-coding gene structures, and non-coding RNAs. Other tracks are available only for genomes for which data is currently available. For example, we have RNA-Seq tracks for 8 genomes (see Section 2.2). The RNA-Seq tracks can be used obtain a graphical overview of the gene expression landscape for a gene, or genomic region (Fig. 3). The browser allows users to view their own data in the context of the reference data. For small files (less than 20MB), these can be uploaded directly through the web interface. Larger files must be stored externally (e.g. on a FTP server), where they will be retrieved

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

5

Fig. 3. Genome browser view for the Smp 033000 gene in Schistosoma mansoni. Top: a zoomed in view showing 6-frame translation of DNA, the location of start and stop codons. Bottom: a zoomed-out view, showing the genomic context of the gene, with other protein-coding and non-coding genes, and repetitive elements. Both views show expression in 4 life-stages, showing that this gene has highest expression in the Somule 3 h post infection, consistent with the findings of the original study [63].

on-demand by the WormBase ParaSite website. A variety of file formats can be attached, including BigWig, BAM, CRAM and UCSC Track Hubs (http://parasite.wormbase.org/info/Browsing/Upload).

3.4. Advanced querying

3.3. Sequence searching

Advanced search and custom data export is available through our BioMart tool (http://parasite.wormbase.org/biomart). BioMart is designed to allow the simple and intuitive creation of custom tables and sequence files from sets of genes that meet a defined set of criteria [80]. A typical use of WormBase ParaSite BioMart involves three basic steps:

Our BLAST service (using NCBI-BLAST+ [69]) can be used to search a nucleotide or peptide sequence against the complete proteome or genome of any combination of species in WormBase ParaSite. We have short-cut buttons for searching common groups of species (e.g. all nematodes, or all platyhelminths), and also provide a graphical widget for defining a fully customised list. By default, BLAST results are saved on the WormBase ParaSite servers for seven days. However, by saving a BLAST submission to a user account, results are retained indefinitely. BLAST hits can be viewed and downloaded in a variety of formats (http://parasite.wormbase. org/info/Tools/blast.html).

1. Define query filters: these are a set of criteria that must be met for genes to be included in the output. Filters can be narrow (e.g. a specific list of gene identifiers) or broad (e.g. all genes for a species or entire clade of species). 2. Define output attributes: these are the columns that will appear in the output table or, for sequences, the information that will be included in the entry headers. 3. View results. Initially, a preview of the first 10 lines of output are shown in the browser. Once happy that the query is working as intended, the user can then export the full result set to a file.

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

6

Fig. 4. Using BioMart to identify C. elegans genes that could be used to model the response of drug targeting transmembrane signaling receptors in Schistosoma mansoni. Left: setting query filters to restrict the query to S. mansoni genes with a C. elegans orthologue, without a human orthologue and associated with transmembrane signaling receptor activity. Right, top: setting output attributes to customize the output table. Right, bottom: preview of the results within the web browser, prior to download.

The types of data that be used as filters and attributes in include: • • • • • • •

Gene IDs, names and descriptions Identifiers for data from external databases (e.g. UniProt, RefSeq) Gene structure (e.g. exons, introns) Protein domains and function (e.g. InterPro, Pfam) Gene Ontology annotations Orthologues and paralogues Sequences (e.g. genomic, transcript, peptide)

A typical use-case for BioMart might be to find all membrane proteins in filarial nematodes that do not have a human orthologue. This query relies on the fact that we have flagged potential transmembrane proteins using the TMHMM [81] software. The query is performed by (a) selecting the “Filarioidea” taxon in the SPECIES filter; (b) selecting “Restrict to genes without orthologues in Human” in the HOMOLOGY filter; (c) selecting “Limit to genes with TMMHMM protein features” in the PROTEIN DOMAINS filter; and (d) Clicking the “Results” button. Another example use-case for shown in Fig. 4, and more detailed instructions and worked examples are available from our help pages (http://parasite.wormbase. org/info/Tools/biomart.html). 3.5. Other tools We provide a collection of files for each genome, including the genome sequence itself (both plain and repeat-masked), and the annotations in a variety of formats. Individual files can be downloaded from our browser (http://parasite.wormbase.org/ftp.html), or bulk downloads can be performed by FTP (ftp://ftp.wormbase. org/pub/wormbase/parasite). The structure of the site is fairly selfexplanatory, with folders nested by (a) releases, (b) species in that

release, (c) genomes for that species (identified by NCBI BioProject), and (d) data files for that genome. Another tool we make available is the Ensembl Variant Effect Predictor (VEP) [82]. This annotates genomic variants from resequencing or population genomics studies with predictions of the how the reference annotations are affected by the variants (e.g. whether they coincide with protein-coding genes, and the putative effect they have on the corresponding protein sequence). The user can upload a list of variants in some standard formats, e.g. the Variant Call Format (VCF). Results are fully exportable, including as annotated VCF files. Full documentation and examples are available on the website (http://parasite.wormbase.org/info/Tools/vep. html). For bioinformaticians wishing to develop applications using the data in WormBase ParaSite, direct programmatic access is available via two routes. Firstly, we provide a “RESTful” application programming interface (API) [83] that can be used with any programming language. A documented catalogue of “endpoints” and example code is available through the RESTful API section of the website (http://parasite.wormbase.org/rest). Secondly, data be retrieved from our BioMart service using the R programming language and BioMaRt Bioconductor package (http://parasite.wormbase.org/ info/Tools/biomart.html). These services allow software developers and analysts to retrieve and manipulate the latest data in WormBase ParaSite on demand, without having to download complete data sets in bulk.

4. Challenges and future directions The availability of many helminth reference genomes opens up the possibility a wide variety of functional genomics investigations, from life-stage specific gene expression to population genetics.

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Making the data from these studies available in way that is useful to users has traditionally required inefficient and redundant interactions between data providers, WormBase curators and numerous archive and specialist resources. In order to reduce these costs as data volumes continue to grow rapidly, we are increasingly moving towards a model where our website pulls data directly from the archives and other complementary resources, using their application programming interfaces. As a pilot for this model, we have brokered the submission of genome variation data from Strongyloides ratti (Mark Viney, pers. comm) and Schistosoma mansoni [84] to the European Variation Archive (EVA) [85]. Our website code has been extended to extract these data from the EVA and display them directly (see Section 3.1). For the past year, we have been aligning selected helminth RNASeq data sets to the corresponding reference genome, and making the alignments available as Track Hubs [86] for visualization in WormBase-ParaSite and other compliant browsers (Fig. 4). We are now collaborating with the Gene Expression Atlas (GxA)[56] to provide curation, analysis, display and querying of helminth gene expression data. Specific high-value data-sets are curated within the GxA framework, and these can be interrogated by the GxA tools (for example querying for genes with a similar expression profile to a given gene). To simplify the user experience, we plan to embed access to these tools directly in the WormBase ParaSite. We will also expand the curation to expression data sets in other species. We will continue to extend WormBase ParaSite with functionality geared towards the use-cases of helminth parasitologists. One example of this is the identification of candidate genes for therapeutics. In the near future, we will develop a pipeline to identify reliable homology between helminth genes and known drug targets in the ChEMBL database [87].

[4]

[5]

[6]

[7]

[8]

[9]

Funding WormBase ParaSite is funded by the UK Biotechnology and Biological Sciences Research Council [BB/K020080].

[10]

Acknowledgements We are indebted to Alessandra Traini, Eleanor Stanley and Jane Lomax for their work on WormBase ParaSite. We would also like to thank Michael Paulini, Avril Coghlan, and other members of the WormBase and WTSI Parasite genomics groups for useful advice and feedback.

[11]

[12]

References [13] [1] K.L. Howe, B.J. Bolt, S. Cain, J. Chan, W.J. Chen, P. Davis, J. Done, T. Down, S. Gao, C. Grove, T.W. Harris, R. Kishore, R. Lee, J. Lomax, Y. Li, H.M. Muller, C. Nakamura, P. Nuin, M. Paulini, D. Raciti, G. Schindelman, E. Stanley, M.A. Tuli, K. Van Auken, D. Wang, X. Wang, G. Williams, A. Wright, K. Yook, M. Berriman, P. Kersey, T. Schedl, L. Stein, P.W. Sternberg, WormBase 2016: expanding to enable helminth genomic research, Nucleic Acids Res. 44 (D1) (2016) D774–D780. [2] L.D. Stein, Z. Bao, D. Blasiar, T. Blumenthal, M.R. Brent, N. Chen, A. Chinwalla, L. Clarke, C. Clee, A. Coghlan, A. Coulson, P. D’Eustachio, D.H. Fitch, L.A. Fulton, R.E. Fulton, S. Griffiths-Jones, T.W. Harris, L.W. Hillier, R. Kamath, P.E. Kuwabara, E.R. Mardis, M.A. Marra, T.L. Miner, P. Minx, J.C. Mullikin, R.W. Plumb, J. Rogers, J.E. Schein, M. Sohrmann, J. Spieth, J.E. Stajich, C. Wei, D. Willey, R.K. Wilson, R. Durbin, R.H. Waterston, The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics, PLoS Biol. 1 (2) (2003) E45. [3] E. Ghedin, S. Wang, D. Spiro, E. Caler, Q. Zhao, J. Crabtree, J.E. Allen, A.L. Delcher, D.B. Guiliano, D. Miranda-Saavedra, S.V. Angiuoli, T. Creasy, P. Amedeo, B. Haas, N.M. El-Sayed, J.R. Wortman, T. Feldblyum, L. Tallon, M. Schatz, M. Shumway, H. Koo, S.L. Salzberg, S. Schobel, M. Pertea, M. Pop, O. White, G.J. Barton, C.K. Carlow, M.J. Crawford, J. Daub, M.W. Dimmic, C.F. Estes, J.M. Foster, M. Ganatra, W.F. Gregory, N.M. Johnson, J. Jin, R. Komuniecki, I. Korf, S. Kumar, S. Laney, B.W. Li, W. Li, T.H. Lindblom, S. Lustigman, D. Ma, C.V. Maina, D.M. Martin, J.P. McCarter, L. McReynolds, M. Mitreva, T.B. Nutman, J. Parkinson, J.M. Peregrin-Alvarez, C. Poole, Q. Ren, L.

[14]

[15]

[16]

[17]

7

Saunders, A.E. Sluder, K. Smith, M. Stanke, T.R. Unnasch, J. Ware, A.D. Wei, G. Weil, D.J. Williams, Y. Zhang, S.A. Williams, C. Fraser-Liggett, B. Slatko, M.L. Blaxter, A.L. Scott, Draft genome of the filarial nematode parasite Brugia malayi, Science 317 (5845) (2007) 1756–1760. P. Abad, J. Gouzy, J.M. Aury, P. Castagnone-Sereno, E.G. Danchin, E. Deleury, L. Perfus-Barbeoch, V. Anthouard, F. Artiguenave, V.C. Blok, M.C. Caillaud, P.M. Coutinho, C. Dasilva, F. De Luca, F. Deau, M. Esquibet, T. Flutre, J.V. Goldstone, N. Hamamouch, T. Hewezi, O. Jaillon, C. Jubin, P. Leonetti, M. Magliano, T.R. Maier, G.V. Markov, P. McVeigh, G. Pesole, J. Poulain, M. Robinson-Rechavi, E. Sallet, B. Segurens, D. Steinbach, T. Tytgat, E. Ugarte, C. van Ghelder, P. Veronico, T.J. Baum, M. Blaxter, T. Bleve-Zacheo, E.L. Davis, J.J. Ewbank, B. Favery, E. Grenier, B. Henrissat, J.T. Jones, V. Laudet, A.G. Maule, H. Quesneville, M.N. Rosso, T. Schiex, G. Smant, J. Weissenbach, P. Wincker, Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita, Nat. Biotechnol. 26 (8) (2008) 909–915. C.H. Opperman, D.M. Bird, V.M. Williamson, D.S. Rokhsar, M. Burke, J. Cohn, J. Cromer, S. Diener, J. Gajan, S. Graham, T.D. Houfek, Q. Liu, T. Mitros, J. Schaff, R. Schaffer, E. Scholl, B.R. Sosinski, V.P. Thomas, E. Windham, Sequence and genetic map of Meloidogyne hapla: a compact nematode genome for plant parasitism, Proc. Natl. Acad. Sci. U. S. A. 105 (39) (2008) 14802–14807. Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, The Schistosoma japonicum genome reveals features of host-parasite interplay, Nature, 460, 7253, (2009) 345-51. Please see https:// www.ncbi.nlm.nih.gov/pubmed/19606140. M. Berriman, B.J. Haas, P.T. LoVerde, R.A. Wilson, G.P. Dillon, G.C. Cerqueira, S.T. Mashiyama, B. Al-Lazikani, L.F. Andrade, P.D. Ashton, M.A. Aslett, D.C. Bartholomeu, G. Blandin, C.R. Caffrey, A. Coghlan, R. Coulson, T.A. Day, A. Delcher, R. DeMarco, A. Djikeng, T. Eyre, J.A. Gamble, E. Ghedin, Y. Gu, C. Hertz-Fowler, H. Hirai, Y. Hirai, R. Houston, A. Ivens, D.A. Johnston, D. Lacerda, C.D. Macedo, P. McVeigh, Z. Ning, G. Oliveira, J.P. Overington, J. Parkhill, M. Pertea, R.J. Pierce, A.V. Protasio, M.A. Quail, M.A. Rajandream, J. Rogers, M. Sajid, S.L. Salzberg, M. Stanke, A.R. Tivey, O. White, D.L. Williams, J. Wortman, W. Wu, M. Zamanian, A. Zerlotini, C.M. Fraser-Liggett, B.G. Barrell, N.M. El-Sayed, The genome of the blood fluke Schistosoma mansoni, Nature 460 (7253) (2009) 352–358. M. Mitreva, D.P. Jasmer, D.S. Zarlenga, Z. Wang, S. Abubucker, J. Martin, C.M. Taylor, Y. Yin, L. Fulton, P. Minx, S.P. Yang, W.C. Warren, R.S. Fulton, V. Bhonagiri, X. Zhang, K. Hallsworth-Pepin, S.W. Clifton, J.P. McCarter, J. Appleton, E.R. Mardis, R.K. Wilson, The draft genome of the parasitic nematode Trichinella spiralis, Nat. Genet. 43 (3) (2011) 228–235. T. Kikuchi, J.A. Cotton, J.J. Dalzell, K. Hasegawa, N. Kanzaki, P. McVeigh, T. Takanashi, I.J. Tsai, S.A. Assefa, P.J. Cock, T.D. Otto, M. Hunt, A.J. Reid, A. Sanchez-Flores, K. Tsuchihara, T. Yokoi, M.C. Larsson, J. Miwa, A.G. Maule, N. Sahashi, J.T. Jones, M. Berriman, Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus, PLoS Pathog. 7 (9) (2011) e1002219. A.R. Jex, S. Liu, B. Li, N.D. Young, R.S. Hall, Y. Li, L. Yang, N. Zeng, X. Xu, Z. Xiong, F. Chen, X. Wu, G. Zhang, X. Fang, Y. Kang, G.A. Anderson, T.W. Harris, B.E. Campbell, J. Vlaminck, T. Wang, C. Cantacessi, E.M. Schwarz, S. Ranganathan, P. Geldhof, P. Nejsum, P.W. Sternberg, H. Yang, J. Wang, J. Wang, R.B. Gasser, Ascaris suum draft genome, Nature 479 (7374) (2011) 529–533. N.D. Young, A.R. Jex, B. Li, S. Liu, L. Yang, Z. Xiong, Y. Li, C. Cantacessi, R.S. Hall, X. Xu, F. Chen, X. Wu, A. Zerlotini, G. Oliveira, A. Hofmann, G. Zhang, X. Fang, Y. Kang, B.E. Campbell, A. Loukas, S. Ranganathan, D. Rollinson, G. Rinaldi, P.J. Brindley, H. Yang, J. Wang, J. Wang, R.B. Gasser, Whole-genome sequence of Schistosoma haematobium, Nat. Genet. 44 (2) (2012) 221–225. C. Godel, S. Kumar, G. Koutsovoulos, P. Ludin, D. Nilsson, F. Comandatore, N. Wrobel, M. Thompson, C.D. Schmid, S. Goto, F. Bringaud, A. Wolstenholme, C. Bandi, C. Epe, R. Kaminsky, M. Blaxter, P. Maser, The genome of the heartworm, Dirofilaria immitis, reveals drug and vaccine targets, FASEB J. 26 (11) (2012) 4650–4661. J. Wang, M. Mitreva, M. Berriman, A. Thorne, V. Magrini, G. Koutsovoulos, S. Kumar, M.L. Blaxter, R.E. Davis, Silencing of germline-expressed genes by DNA elimination in somatic cells, Dev. Cell 23 (5) (2012) 1072–1080. Y. Huang, W. Chen, X. Wang, H. Liu, Y. Chen, L. Guo, F. Luo, J. Sun, Q. Mao, P. Liang, Z. Xie, C. Zhou, Y. Tian, X. Lv, L. Huang, J. Zhou, Y. Hu, R. Li, F. Zhang, H. Lei, W. Li, X. Hu, C. Liang, J. Xu, X. Li, X. Yu, The carcinogenic liver fluke, Clonorchis sinensis: new assembly, reannotation and analysis of the genome and characterization of tissue transcriptomes, PLoS One 8 (1) (2013) e54732. I.J. Tsai, M. Zarowiecki, N. Holroyd, A. Garciarrubio, A. Sanchez-Flores, K.L. Brooks, A. Tracey, R.J. Bobes, G. Fragoso, E. Sciutto, M. Aslett, H. Beasley, H.M. Bennett, J. Cai, F. Camicia, R. Clark, M. Cucher, N. De Silva, T.A. Day, P. Deplazes, K. Estrada, C. Fernandez, P.W. Holland, J. Hou, S. Hu, T. Huckvale, S.S. Hung, L. Kamenetzky, J.A. Keane, F. Kiss, U. Koziol, O. Lambert, K. Liu, X. Luo, Y. Luo, N. Macchiaroli, S. Nichol, J. Paps, J. Parkinson, N. Pouchkina-Stantcheva, N. Riddiford, M. Rosenzvit, G. Salinas, J.D. Wasmuth, M. Zamanian, Y. Zheng, Taenia solium Genome Consortium, X. Cai, X. Soberon, P.D. Olson, J.P. Laclette, K. Brehm, M. Berriman, The genomes of four tapeworm species reveal adaptations to parasitism, Nature 496 (7443) (2013) 57–63. C.A. Desjardins, G.C. Cerqueira, J.M. Goldberg, J.C. Dunning Hotopp, B.J. Haas, J. Zucker, J.M. Ribeiro, S. Saif, J.Z. Levin, L. Fan, Q. Zeng, C. Russ, J.R. Wortman, D.L. Fink, B.W. Birren, T.B. Nutman, Genomics of Loa loa, a wolbachia-free filarial parasite of humans, Nat. Genet. 45 (5) (2013) 495–500. X. Bai, B.J. Adams, T.A. Ciche, S. Clifton, R. Gaugler, K.S. Kim, J. Spieth, P.W. Sternberg, R.K. Wilson, P.S. Grewal, A lover and a fighter: the genome

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

8

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

sequence of an entomopathogenic nematode Heterorhabditis bacteriophora, PLoS One 8 (7) (2013) e69618. R. Laing, T. Kikuchi, A. Martinelli, I.J. Tsai, R.N. Beech, E. Redman, N. Holroyd, D.J. Bartley, H. Beasley, C. Britton, D. Curran, E. Devaney, A. Gilabert, M. Hunt, F. Jackson, S.L. Johnston, I. Kryukov, K. Li, A.A. Morrison, A.J. Reid, N. Sargison, G.I. Saunders, J.D. Wasmuth, A. Wolstenholme, M. Berriman, J.S. Gilleard, J.A. Cotton, The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery, Genome Biol. 14 (8) (2013) R88. E.M. Schwarz, P.K. Korhonen, B.E. Campbell, N.D. Young, A.R. Jex, A. Jabbar, R.S. Hall, A. Mondal, A.C. Howe, J. Pell, A. Hofmann, P.R. Boag, X.Q. Zhu, T. Gregory, A. Loukas, B.A. Williams, I. Antoshechkin, C. Brown, P.W. Sternberg, R.B. Gasser, The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus, Genome Biol. 14 (8) (2013) R89. H. Zheng, W. Zhang, L. Zhang, Z. Zhang, J. Li, G. Lu, Y. Zhu, Y. Wang, Y. Huang, J. Liu, H. Kang, J. Chen, L. Wang, A. Chen, S. Yu, Z. Gao, L. Jin, W. Gu, Z. Wang, L. Zhao, B. Shi, H. Wen, R. Lin, M.K. Jones, B. Brejova, T. Vinar, G. Zhao, D.P. McManus, Z. Chen, Y. Zhou, S. Wang, The genome of the hydatid tapeworm Echinococcus granulosus, Nat. Genet. 45 (10) (2013) 1168–1175. P.H. Schiffer, M. Kroiher, C. Kraus, G.D. Koutsovoulos, S. Kumar, J.I. Camps, N.A. Nsah, D. Stappert, K. Morris, P. Heger, J. Altmuller, P. Frommolt, P. Nurnberg, W.K. Thomas, M.L. Blaxter, E. Schierenberg, The genome of Romanomermis culicivorax: revealing fundamental changes in the core developmental genetic toolkit in Nematoda, BMC Genomics 14 (2013) 923. Y.T. Tang, X. Gao, B.A. Rosa, S. Abubucker, K. Hallsworth-Pepin, J. Martin, R. Tyagi, E. Heizer, X. Zhang, V. Bhonagiri-Palsikar, P. Minx, W.C. Warren, Q. Wang, B. Zhan, P.J. Hotez, P.W. Sternberg, A. Dougall, S.T. Gaze, J. Mulvenna, J. Sotillo, S. Ranganathan, E.M. Rabelo, R.K. Wilson, P.L. Felgner, J. Bethony, J.M. Hawdon, R.B. Gasser, A. Loukas, M. Mitreva, Genome of the human hookworm Necator americanus, Nat. Genet. 46 (3) (2014) 261–269. J.A. Cotton, C.J. Lilley, L.M. Jones, T. Kikuchi, A.J. Reid, P. Thorpe, I.J. Tsai, H. Beasley, V. Blok, P.J. Cock, S. Eves-van den Akker, N. Holroyd, M. Hunt, S. Mantelin, H. Naghra, A. Pain, J.E. Palomares-Rius, M. Zarowiecki, M. Berriman, J.T. Jones, P.E. Urwin, The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode, Genome Biol. 15 (3) (2014) R43. A.R. Jex, P. Nejsum, E.M. Schwarz, L. Hu, N.D. Young, R.S. Hall, P.K. Korhonen, S. Liao, S. Thamsborg, J. Xia, P. Xu, S. Wang, J.P. Scheerlinck, A. Hofmann, P.W. Sternberg, J. Wang, R.B. Gasser, Genome and transcriptome of the porcine whipworm Trichuris suis, Nat. Genet. 46 (7) (2014) 701–706. B.J. Foth, I.J. Tsai, A.J. Reid, A.J. Bancroft, S. Nichol, A. Tracey, N. Holroyd, J.A. Cotton, E.J. Stanley, M. Zarowiecki, J.Z. Liu, T. Huckvale, P.J. Cooper, R.K. Grencis, M. Berriman, Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction, Nat. Genet. 46 (7) (2014) 693–700. N.D. Young, N. Nagarajan, S.J. Lin, P.K. Korhonen, A.R. Jex, R.S. Hall, H. Safavi-Hemami, W. Kaewkong, D. Bertrand, S. Gao, Q. Seet, S. Wongkham, B.T. Teh, C. Wongkham, P.M. Intapan, W. Maleewong, X. Yang, M. Hu, Z. Wang, A. Hofmann, P.W. Sternberg, P. Tan, J. Wang, R.B. Gasser, The Opisthorchis viverrini genome provides insights into life in the bile duct, Nat. Commun. 5 (2014) 4378. L.J. Tallon, X. Liu, S. Bennuru, M.C. Chibucos, A. Godinez, S. Ott, X. Zhao, L. Sadzewicz, C.M. Fraser, T.B. Nutman, J.C. Dunning Hotopp, Single molecule sequencing and genome assembly of a clinical specimen of Loa loa, the causative agent of loiasis, BMC Genomics 15 (2014) 788. H.M. Bennett, H.P. Mok, E. Gkrania-Klotsas, I.J. Tsai, E.J. Stanley, N.M. Antoun, A. Coghlan, B. Harsha, A. Traini, D.M. Ribeiro, S. Steinbiss, S.B. Lucas, K.S. Allinson, S.J. Price, T.S. Santarius, A.J. Carmichael, P.L. Chiodini, N. Holroyd, A.F. Dean, M. Berriman, The genome of the sparganosis tapeworm Spirometra erinaceieuropaei isolated from the biopsy of a migrating brain lesion, Genome Biol. 15 (11) (2014) 510. X.Q. Zhu, P.K. Korhonen, H. Cai, N.D. Young, P. Nejsum, G. von Samson-Himmelstjerna, P.R. Boag, P. Tan, Q. Li, J. Min, Y. Yang, X. Wang, X. Fang, R.S. Hall, A. Hofmann, P.W. Sternberg, A.R. Jex, R.B. Gasser, Genetic blueprint of the zoonotic pathogen Toxocara canis, Nat. Commun. 6 (2015) 6145. E.M. Schwarz, Y. Hu, I. Antoshechkin, M.M. Miller, P.W. Sternberg, R.V. Aroian, The genome and transcriptome of the zoonotic hookworm Ancylostoma ceylanicum identify infection-specific gene families, Nat. Genet. 47 (4) (2015) 416–422. K. Cwiklinski, J.P. Dalton, P.J. Dufresne, J. La Course, D.J. Williams, J. Hodgkinson, S. Paterson, The Fasciola hepatica genome: gene duplication and polymorphism reveals adaptation to the host environment and the capacity for rapid evolution, Genome Biol. 16 (2015) 71. A.R. Dillman, M. Macchietto, C.F. Porter, A. Rogers, B. Williams, I. Antoshechkin, M.M. Lee, Z. Goodwin, X. Lu, E.E. Lewis, H. Goodrich-Blair, S.P. Stock, B.J. Adams, P.W. Sternberg, A. Mortazavi, Comparative genomics of Steinernema reveals deeply conserved gene regulatory networks, Genome Biol. 16 (2015) 200. K. Wasik, J. Gurtowski, X. Zhou, O.M. Ramos, M.J. Delas, G. Battistoni, O. El Demerdash, I. Falciatori, D.B. Vizoso, A.D. Smith, P. Ladurner, L. Scharer, W.R. McCombie, G.J. Hannon, M. Schatz, Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano, Proc. Natl. Acad. Sci. U. S. A. 112 (40) (2015) 12462–12467. V.L. Hunt, I.J. Tsai, A. Coghlan, A.J. Reid, N. Holroyd, B.J. Foth, A. Tracey, J.A. Cotton, E.J. Stanley, H. Beasley, H.M. Bennett, K. Brooks, B. Harsha, R. Kajitani,

[35]

[36]

[37]

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45] [46]

[47] [48]

[49]

[50]

[51]

[52]

[53]

[54]

[55]

[56]

A. Kulkarni, D. Harbecke, E. Nagayasu, S. Nichol, Y. Ogura, M.A. Quail, N. Randle, D. Xia, N.W. Brattig, H. Soblik, D.M. Ribeiro, A. Sanchez-Flores, T. Hayashi, T. Itoh, D.R. Denver, W. Grant, J.D. Stoltzfus, J.B. Lok, H. Murayama, J. Wastling, A. Streit, T. Kikuchi, M. Viney, M. Berriman, The genomic basis of parasitism in the Strongyloides clade of nematodes, Nat. Genet. 48 (3) (2016) 299–307. P.K. Korhonen, E. Pozio, G. La Rosa, B.C. Chang, A.V. Koehler, E.P. Hoberg, P.R. Boag, P. Tan, A.R. Jex, A. Hofmann, P.W. Sternberg, N.D. Young, R.B. Gasser, Phylogenomic and biogeographic reconstruction of the Trichinella complex, Nat. Commun. 7 (2016) 10513. S.T. Small, L.J. Reimer, D.J. Tisch, C.L. King, B.M. Christensen, P.M. Siba, J.W. Kazura, D. Serre, P.A. Zimmerman, Population genomics of the filarial nematode parasite Wuchereria bancrofti from mosquitoes, Mol. Ecol. 25 (7) (2016) 1465–1477. S.N. McNulty, C. Strube, B.A. Rosa, J.C. Martin, R. Tyagi, Y.J. Choi, Q. Wang, K. Hallsworth Pepin, X. Zhang, P. Ozersky, R.K. Wilson, P.W. Sternberg, R.B. Gasser, M. Mitreva, Dictyocaulus viviparus genome, variome and transcriptome elucidate lungworm biology and support future intervention, Sci. Rep. 6 (2016) 20316. Wellcome Trust Sanger Institute, The 50 Helminth Genomes Initiative, 2014, http://www.sanger.ac.uk/science/collaboration/50hgp, Accessed 6 October 2016. G. Cochrane, I. Karsch-Mizrachi, T. Takagi, International nucleotide sequence database the international nucleotide sequence database collaboration, Nucleic Acids Res. 44 (D1) (2016) D48–D50. K. Howe, P. Davis, M. Paulini, M.A. Tuli, G. Williams, K. Yook, R. Durbin, P. Kersey, P.W. Sternberg, WormBase: annotating many nematode genomes, Worm 1 (1) (2012) 15–21. F.J. Logan-Klumpler, N. De Silva, U. Boehme, M.B. Rogers, G. Velarde, J.A. McQuillan, T. Carver, M. Aslett, C. Olsen, S. Subramanian, I. Phan, C. Farris, S. Mitra, G. Ramasamy, H. Wang, A. Tivey, A. Jackson, R. Houston, J. Parkhill, M. Holden, O.S. Harb, B.P. Brunk, P.J. Myler, D. Roos, M. Carrington, D.F. Smith, C. Hertz-Fowler, M. Berriman, GeneDB—an annotation database for pathogens, Nucleic Acids Res. 40 (2012) D98–D108, Database issue. C. Holt, M. Yandell, MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects, BMC Bioinf. 12 (2011) 491. M. Stanke, O. Keller, I. Gunduz, A. Hayes, S. Waack, B. Morgenstern, AUGUSTUS: ab initio prediction of alternative transcripts, Nucleic Acids Res. 34 (2006) W435–W439, Web Server issue. V. Ter-Hovhannisyan, A. Lomsadze, Y.O. Chernoff, M. Borodovsky, Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training, Genome Res. 18 (12) (2008) 1979–1990. I. Korf, Gene finding in novel genomes, BMC Bioinf. 5 (2004) 59. R. She, J.S. Chu, B. Uyar, J. Wang, K. Wang, N. Chen, genBlastG: using BLAST searches to build homologous gene models, Bioinformatics 27 (15) (2011) 2141–2143. T.D. Otto, G.P. Dillon, W.S. Degrave, M. Berriman, RATT: rapid annotation transfer tool, Nucleic Acids Res. 39 (9) (2011) e57. K. Lagesen, P. Hallin, E.A. Rodland, H.H. Staerfeldt, T. Rognes, D.W. Ussery, RNAmmer: consistent and rapid annotation of ribosomal RNA genes, Nucleic Acids Res. 35 (9) (2007) 3100–3108. T.M. Lowe, S.R. Eddy, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucleic Acids Res. 25 (5) (1997) 955–964. E.P. Nawrocki, S.W. Burge, A. Bateman, J. Daub, R.Y. Eberhardt, S.R. Eddy, E.W. Floden, P.P. Gardner, T.A. Jones, J. Tate, R.D. Finn, Rfam 12.0: updates to the RNA families database, Nucleic Acids Res. 43 (2015) D130–D137, Database issue. UniProt Consortium, UniProt: a hub for protein information, Nucleic Acids Res, 43, (Database issue), (2015) D204-12. Please see https://www.ncbi.nlm. nih.gov/pubmed/25348405. P. Jones, D. Binns, H.Y. Chang, M. Fraser, W. Li, C. McAnulla, H. McWilliam, J. Maslen, A. Mitchell, G. Nuka, S. Pesseat, A.F. Quinn, A. Sangrador-Vegas, M. Scheremetjew, S.Y. Yong, R. Lopez, S. Hunter, InterProScan 5: genome-scale protein function classification, Bioinformatics 30 (9) (2014) 1236–1240. A. Mitchell, H.Y. Chang, L. Daugherty, M. Fraser, S. Hunter, R. Lopez, C. McAnulla, C. McMenamin, G. Nuka, S. Pesseat, A. Sangrador-Vegas, M. Scheremetjew, C. Rato, S.Y. Yong, A. Bateman, M. Punta, T.K. Attwood, C.J. Sigrist, N. Redaschi, C. Rivoire, I. Xenarios, D. Kahn, D. Guyot, P. Bork, I. Letunic, J. Gough, M. Oates, D. Haft, H. Huang, D.A. Natale, C.H. Wu, C. Orengo, I. Sillitoe, H. Mi, P.D. Thomas, R.D. Finn, The InterPro protein families database: the classification resource after 15 years, Nucleic Acids Res. 43 (2015) D213–D221, Database issue. R.D. Finn, P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry, A.L. Mitchell, S.C. Potter, M. Punta, M. Qureshi, A. Sangrador-Vegas, G.A. Salazar, J. Tate, A. Bateman, The Pfam protein families database: towards a more sustainable future, Nucleic Acids Res. 44 (D1) (2016) D279–D285. M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G. Sherlock, Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet. 25 (1) (2000) 25–29. R. Petryszak, M. Keays, Y.A. Tang, N.A. Fonseca, E. Barrera, T. Burdett, A. Fullgrabe, A.M. Fuentes, S. Jupp, S. Koskinen, O. Mannion, L. Huerta, K. Megy, C. Snow, E. Williams, M. Barzine, E. Hastings, H. Weisser, J. Wright, P. Jaiswal,

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model MOLBIO-11030; No. of Pages 9

ARTICLE IN PRESS K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

[57]

[58]

[59]

[60]

[61]

[62]

[63]

[64]

[65]

[66] [67]

[68]

[69] [70]

W. Huber, J. Choudhary, H.E. Parkinson, A. Brazma, Expression Atlas update—an integrated database of gene and protein expression in humans, animals and plants, Nucleic Acids Res. 44 (D1) (2016) D746–G752. A. Dobin, C.A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T.R. Gingeras, STAR: ultrafast universal RNA-seq aligner, Bioinformatics 29 (1) (2013) 15–21. Y.J. Choi, E. Ghedin, M. Berriman, J. McQuillan, N. Holroyd, G.F. Mayhew, B.M. Christensen, M.L. Michalski, A deep sequencing approach to comparatively analyze the transcriptome of lifecycle stages of the filarial worm, Brugia malayi, PLoS Negl. Trop. Dis. 5 (12) (2011) e1409. C. Ballesteros, L. Tritten, M. O’Neill, E. Burkman, W.I. Zaky, J. Xia, A. Moorhead, S.A. Williams, T.G. Geary, The effect of In vitro cultivation on the transcriptome of adult brugia malayi, PLoS Negl. Trop. Dis. 10 (1) (2016) e0004311. J.A. Cotton, S. Bennuru, A. Grote, B. Harsha, A. Tracey, R. Beech, S.R. Doyle, M. Dunn, J.C. Hotopp, N. Holroyd, T. Kikuchi, O. Lambert, A. Mhashilkar, P. Mutowo, N. Nursimulu, J.M. Ribeiro, M.B. Rogers, E. Stanley, L.S. Swapna, I.J. Tsai, T.R. Unnasch, D. Voronin, J. Parkinson, T.B. Nutman, E. Ghedin, M. Berriman, S. Lustigman, The genome of Onchocerca volvulus, agent of river blindness, Nat. Microbiol. 2 (2016) 16216. S.N. McNulty, B.A. Rosa, P.U. Fischer, J.M. Rumsey, P. Erdmann-Gilmore, K.C. Curtis, S. Specht, R.R. Townsend, G.J. Weil, M. Mitreva, An integrated multiomics approach to identify candidate antigens for serodiagnosis of human onchocerciasis, Mol. Cell. Proteomics 14 (12) (2015) 3224–3233. J.D. Stoltzfus, S. Minot, M. Berriman, T.J. Nolan, J.B. Lok, RNAseq analysis of the parasitic nematode Strongyloides stercoralis reveals divergent regulation of canonical dauer pathways, PLoS Negl. Trop. Dis. 6 (10) (2012) e1854. A.V. Protasio, I.J. Tsai, A. Babbage, S. Nichol, M. Hunt, M.A. Aslett, N. De Silva, G.S. Velarde, T.J. Anderson, R.C. Clark, C. Davidson, G.P. Dillon, N.E. Holroyd, P.T. LoVerde, C. Lloyd, J. McQuillan, G. Oliveira, T.D. Otto, S.J. Parker-Manuel, M.A. Quail, R.A. Wilson, A. Zerlotini, D.W. Dunne, M. Berriman, A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni, PLoS Negl. Trop. Dis. 6 (1) (2012) e1455. J.J. Collins 3rd, B. Wang, B.G. Lambrus, M.E. Tharp, H. Iyer, P.A. Newmark, Adult somatic stem cells in the human parasite Schistosoma mansoni, Nature 494 (7438) (2013) 476–479. S. Leutner, K.C. Oliveira, B. Rotter, S. Beckmann, C. Buro, S. Hahnel, J.P. Kitajima, S. Verjovski-Almeida, P. Winter, C.G. Grevelding, Combinatory microarray and SuperSAGE analyses identify pairing-dependently transcribed genes in Schistosoma mansoni males, including follistatin, PLoS Negl. Trop. Dis. 7 (11) (2013) e2532. B. Wang, J.J. Collins 3rd, P.A. Newmark, Functional genomic characterization of neoblast-like stem cells in larval Schistosoma mansoni, Elife 2 (2013) e00768. C.L. Valentim, D. Cioli, F.D. Chevalier, X. Cao, A.B. Taylor, S.P. Holloway, L. Pica-Mattoccia, A. Guidi, A. Basso, I.J. Tsai, M. Berriman, C. Carvalho-Queiroz, M. Almeida, H. Aguilar, D.E. Frantz, P.J. Hart, P.T. LoVerde, T.J. Anderson, Genetic and molecular basis of drug resistance and species-specific drug action in schistosome parasites, Science 342 (6164) (2013) 1385–1389. A.J. Vilella, J. Severin, A. Ureta-Vidal, L. Heng, R. Durbin, E. Birney, EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates, Genome Res. 19 (2) (2009) 327–335. C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, T.L. Madden, BLAST+: architecture and applications, BMC Bioinf. 10 (2009) 421. H. Mi, S. Poudel, A. Muruganujan, J.T. Casagrande, P.D. Thomas, PANTHER version 10: expanded protein families and functions, and analysis tools, Nucleic Acids Res. 44 (D1) (2016) D336–D342.

9

[71] I.M. Wallace, O. O’Sullivan, D.G. Higgins, C. Notredame, M-Coffee: combining multiple sequence alignment methods with T-Coffee, Nucleic Acids Res. 34 (6) (2006) 1692–1699. [72] K. Katoh, D.M. Standley, MAFFT multiple sequence alignment software version 7: improvements in performance and usability, Mol. Biol. Evol. 30 (4) (2013) 772–780. [73] A. Yates, W. Akanni, M.R. Amode, D. Barrell, K. Billis, D. Carvalho-Silva, C. Cummins, P. Clapham, S. Fitzgerald, L. Gil, C.G. Giron, L. Gordon, T. Hourlier, S.E. Hunt, S.H. Janacek, N. Johnson, T. Juettemann, S. Keenan, I. Lavidas, F.J. Martin, T. Maurel, W. McLaren, D.N. Murphy, R. Nag, M. Nuhn, A. Parker, M. Patricio, M. Pignatelli, M. Rahtz, H.S. Riat, D. Sheppard, K. Taylor, A. Thormann, A. Vullo, S.P. Wilder, A. Zadissa, E. Birney, J. Harrow, M. Muffato, E. Perry, M. Ruffier, G. Spudich, S.J. Trevanion, F. Cunningham, B.L. Aken, D.R. Zerbino, P. Flicek, Ensembl 2016, Nucleic Acids Res. 44 (D1) (2016) D710–D716. [74] A. Stabenau, G. McVicker, C. Melsopp, G. Proctor, M. Clamp, E. Birney, The Ensembl core software libraries, Genome Res. 14 (5) (2004) 929–933. [75] S.C. Potter, L. Clarke, V. Curwen, S. Keenan, E. Mongin, S.M. Searle, A. Stabenau, R. Storey, M. Clamp, The Ensembl analysis pipeline, Genome Res. 14 (5) (2004) 934–941. [76] J. Severin, K. Beal, A.J. Vilella, S. Fitzgerald, M. Schuster, L. Gordon, A. Ureta-Vidal, P. Flicek, J. Herrero, eHive: an artificial intelligence workflow system for genomic analysis, BMC Bioinf. 11 (2010) 240. [77] G. Parra, K. Bradnam, I. Korf, CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes, Bioinformatics 23 (9) (2007) 1061–1067. [78] F.A. Simao, R.M. Waterhouse, P. Ioannidis, E.V. Kriventseva, E.M. Zdobnov, BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs, Bioinformatics 31 (19) (2015) 3210–3212. [79] R.J. Challis, S. Kumar, K.K.K. Dasmahapatra, C.D. Jiggins, M. Blaxter, Lepbase: the Lepidopteran genome database, bioRxiv (2016). [80] D. Smedley, S. Haider, B. Ballester, R. Holland, D. London, G. Thorisson, A. Kasprzyk, BioMart—biological queries made easy, BMC Genomics 10 (2009) 22. [81] A. Krogh, B. Larsson, G. von Heijne, E.L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, J. Mol. Biol. 305 (3) (2001) 567–580. [82] W. McLaren, L. Gil, S.E. Hunt, H.S. Riat, G.R. Ritchie, A. Thormann, P. Flicek, F. Cunningham, The Ensembl variant effect predictor, Genome Biol. 17 (1) (2016) 122. [83] A. Yates, K. Beal, S. Keenan, W. McLaren, M. Pignatelli, G.R. Ritchie, M. Ruffier, K. Taylor, A. Vullo, P. Flicek, The Ensembl REST API: ensembl data for any language, Bioinformatics 31 (1) (2015) 143–145. [84] T. Crellen, F. Allan, S. David, C. Durrant, T. Huckvale, N. Holroyd, A.M. Emery, D. Rollinson, D.M. Aanensen, M. Berriman, J.P. Webster, J.A. Cotton, Whole genome resequencing of the human parasite Schistosoma mansoni reveals population history and effects of selection, Sci. Rep. 6 (2016) 20954. [85] C.E. Cook, M.T. Bergman, R.D. Finn, G. Cochrane, E. Birney, R. Apweiler, The european bioinformatics institute in 2016: data growth and integration, Nucleic Acids Res. 44 (D1) (2016) D20–D26. [86] B.J. Raney, T.R. Dreszer, G.P. Barber, H. Clawson, P.A. Fujita, T. Wang, N. Nguyen, B. Paten, A.S. Zweig, D. Karolchik, W.J. Kent, Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC genome browser, Bioinformatics 30 (7) (2014) 1003–1005. [87] A.P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Kruger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos, J.P. Overington, The ChEMBL bioactivity database: an update, Nucleic Acids Res. 42 (2014) D1083–D1090, Database issue.

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005