Parasite genomes


International Journal for Parasitology 35 (2005) 465–479 www.parasitology-online.com

Invited review

Parasite genomes

Ross L. Coppel*, Casilda G. Black

Department of Microbiology and the Victorian Bioinformatics Consortium, Monash University, Melbourne, Vic. 3800, Australia

Received 24 January 2005; received in revised form 24 February 2005; accepted 24 February 2005

Abstract

The availability of genome sequences and the associated transcriptome and proteome mapping projects has revolutionised research in the field of parasitology. As more parasite species are sequenced, comparative and phylogenetic analyses are improving the quality of gene prediction and annotation. Genome sequences of parasites are also providing important data sets for understanding parasite biology and for identifying new vaccine candidates and drug targets. We review some of the preliminary conclusions from examination of parasite genome sequences and discuss some of the bioinformatics approaches taken in this analysis.

© 2005 Australian Society for Parasitology Inc. Published by Elsevier Ltd. All rights reserved.

Keywords: Genome sequence; Bioinformatics; Protozoa; Helminths; Structural biology; Gene mapping; Functional genomics

* Corresponding author. Tel.: +61 3 9905 4822; fax: +61 3 9905 4811. E-mail address: [email protected] (R.L. Coppel).

0020-7519/$30.00 © 2005 Australian Society for Parasitology Inc. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.ijpara.2005.01.010

1. Introduction

The study of infectious organisms, including parasites, has been revolutionised by the availability of complete genome sequences. Allied with microarray and proteomic technologies, the complete gene lists, their patterns of transcription, and the timing of synthesis and location of their protein products provide a data set of enormous power and complexity. Before we discuss some of the approaches to analysing these data and some of the early findings from genome sequencing, it is worth taking a moment to comment on just how unusual genome sequences are as examples of scientific practice. Because of their great expense and complexity, genome sequences are likely, at least in the near future, to represent experiments that are unrepeatable and thus, unlike most experiments we regard as scientific, unverifiable. At the time of publication of the Plasmodium falciparum genome (Gardner et al., 2002a), very little of the sequence, no more than 1–2%, had been independently verified. Since the 2002 publication, almost no further verification has been performed: genes are routinely amplified using primers derived from the sequence, but the sequence itself is seldom checked in detail. Verification is supplied indirectly by the similarity of the genome sequences of related parasite species, but this comparison is not usually performed by independent laboratories.

Another oddity of genome sequences is the effective absence of the conventional review process prior to publication. As part of its analysis, the genome centre predicts the location of coding genes and attempts to define their function. This can be extremely hard to do, particularly for sequences of unusual base composition such as the AT-rich genome of P. falciparum. Sequences with this base composition were not used to train the gene-finding algorithms, which were typically trained on human or mouse sequences with a significantly higher GC content. Manuscript reviewers cannot repeat this process to re-derive the genome annotation and perform further analysis. They are therefore somewhat in the position of a reviewer asked to assess a paper by examining only its discussion section, and cannot really be sure that the conclusions cited are firmly based on the data.

Finally, in a paradoxical way, genome sequence papers can render their object of study more mysterious. Again taking the case of Plasmodium, before the genome sequence we had identified something on the order of 50–100 gene products. These were all described in detail and ascribed a role in some physiological process such as invasion, immune evasion, intermediary metabolism, cytoadherence or modification of the host cell. The genome showed us how little we knew. Of the more than 5200 open reading frames, about 60% are not similar enough to sequences in any other organism to assign a function. Well-characterised genes such as the serine repeat antigen (SERA) and the SERA homologue were shown to be two of a total of nine homologous genes with as yet incompletely understood roles (Gardner et al., 2002a; Miller et al., 2002), and enumeration of families such as the reticulocyte binding protein (RBP) homologues and the erythrocyte binding antigens (EBAs) has revealed many unexpected additional members. It will be the work of decades to make sense of this vast array of genes and to understand their functions not just individually but also collectively. Nevertheless, unverifiable, unreviewable and daunting as they are, genome sequences provide an incredibly fertile source of new questions about parasite biology and about how we may intervene to damage the parasite.
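The introduction notes that gene finders trained on GC-richer mammalian sequences struggle with AT-rich genomes such as that of P. falciparum. Base composition itself is simple to profile. The Python sketch below computes GC content overall and in sliding windows; it is an illustration only, the function names are hypothetical, the window and step sizes are arbitrary defaults, and the toy sequence is invented rather than taken from any parasite genome.

```python
"""Sliding-window GC content: a minimal sketch for profiling base composition."""
from typing import Iterator, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among the unambiguous A/C/G/T bases in seq (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq: str, window: int = 100, step: int = 50) -> Iterator[Tuple[int, float]]:
    """Yield (start, GC fraction) for every full-length window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


if __name__ == "__main__":
    # Invented toy sequence: a GC-richer stretch followed by an AT-only stretch.
    toy = "ATGGCGCGCCGGCTAGCGGC" * 5 + "ATATTTATAATATTAAATAT" * 5
    print(f"overall GC: {gc_fraction(toy):.2f}")
    for start, gc in gc_windows(toy, window=50, step=50):
        print(start, round(gc, 2))
```

On the toy sequence the overall GC fraction is 0.40, and the four 50-base windows report 0.8, 0.8, 0.0 and 0.0, so the AT-only half stands out at once. Ambiguous bases such as N are excluded from the denominator, which is one reasonable convention among several.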

2. Informatic analyses of parasite genomes

The process of determining a genome sequence, which involves shotgun sequencing, read editing and assembly, and the joining of overlapping contigs into a finished sequence, is beyond the scope of this article (for an accessible description of the process see http://www.wellcome.ac.uk/en/genome/technologies/hg17b004.html). Things start to get interesting at the next stage, preliminary annotation, when gene predictions are made. By and large, this is done by computer programs (HEXAMER, GLIMMER, PHAT) that look for particular features in the sequence and have been trained on sets of known genes from different organisms. They are good at finding long exons but can have difficulty accurately predicting open reading frames (ORFs) composed of multiple small exons, particularly in organisms with high AT content such as P. falciparum. Thus gene predictions for chromosome 2, the first P. falciparum chromosome sequenced, were not particularly accurate (Gardner et al., 1998; Huestis and Fischer, 2001). The initial gene predictions identified 210 genes encoded by 349 exons (Gardner et al., 1998). A newer annotation (Huestis and Fischer, 2001), much of it verified by PCR experiments, suggested that these genes were encoded by 499 exons, of which only 257 were in common with the original 349 predictions. Over 30% of ORF predictions were incorrect in some way. Thirty-seven proteins had new or altered N-termini and 22 proteins had new or altered C-termini, so that a substantial number of proteins had an altered predicted cellular location through the loss or gain of transmembrane domains or signal sequences. Although a reasonable amount of this improved gene-finding methodology was incorporated into the first draft annotation released in 2002 (Gardner et al., 2002a), it is almost certain that a number of gene predictions in this annotation are still incorrect.

It may be that P. falciparum is a worst-case scenario among parasite genomes because of its high codon bias and the large number of unknown proteins. It is simply not known whether gene predictions for other protozoan or helminth genome sequences show similar error levels, but it seems reasonable to expect some level of error. With some form of ongoing curation, these errors will undoubtedly be corrected over time, but the gene predictions should not be regarded as definitive at present. Beyond the gene predictions (i.e. the gene models giving the start and stop base of each exon) are the annotations themselves (i.e. the assignment of a predicted protein or a gene name). These represent a first approximation, and even those quite strongly based on sequence similarity to known sequences can be in error. For example, two P. falciparum sequences were assigned identities as members of the mitogen-activated protein (MAP) kinase family on the basis of sequence homology. Dorin and co-workers (2005) noted that the organism appeared to lack the kinase activity necessary to activate such MAP kinases. One sequence, PfPK7, had homology to MAP kinase kinase (MAPKK), but it had no demonstrable MAPKK activity and could not phosphorylate the putative MAPK homologs in vitro. This led to a re-evaluation of the putative MAPK sequences, which suggested that they too were not true homologs and, further, that malaria parasites lack a classical three-component module-dependent MAPK signalling pathway, a finding unique among eukaryotes. Similar validation will be required for most of the genes of each parasitic protozoan species. This is a daunting prospect indeed, not only for parasitologists but also for funding agencies. Information gathered from experimental studies will need to be added to the genome annotation in a timely manner.
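The chromosome 2 comparison described earlier in this section is, at heart, an exon-level set comparison between two annotations. The Python sketch below shows one way to do that bookkeeping. It assumes, as a simplification, that an exon counts as "in common" only when chromosome, start, end and strand match exactly; the published comparison may have used a different matching rule, and the `Exon` type, `compare_exons` function and coordinates are hypothetical.

```python
"""Exon-level comparison of two gene-model annotations (a minimal sketch)."""
from typing import Iterable, NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int   # first base of the exon
    end: int     # last base of the exon
    strand: str  # "+" or "-"


def compare_exons(old: Iterable[Exon], new: Iterable[Exon]) -> dict:
    """Count exons shared by two annotations; an exon is shared only when
    chromosome, start, end and strand all match exactly (an assumption)."""
    old_set, new_set = set(old), set(new)
    shared = old_set & new_set
    return {
        "old_total": len(old_set),
        "new_total": len(new_set),
        "shared": len(shared),
        "only_old": len(old_set - new_set),   # original exons not retained
        "only_new": len(new_set - old_set),   # exons introduced by the revision
        "fraction_old_retained": len(shared) / len(old_set) if old_set else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    old = [Exon("chr2", 100, 400, "+"), Exon("chr2", 600, 900, "+")]
    new = [Exon("chr2", 100, 400, "+"), Exon("chr2", 580, 900, "+"),
           Exon("chr2", 1200, 1500, "-")]
    print(compare_exons(old, new))
```

Applying the same bookkeeping to the chromosome 2 counts quoted above (349 exons originally, 499 in the revision, 257 shared) gives 92 original exons not retained, 242 exons new to the revision, and a retained fraction of 257/349, or about 74%. Exact matching is strict: an exon whose start shifts by a single base counts as both lost and gained, which is arguably the right behaviour when the question is whether the gene model is correct.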
At present, there is no generally agreed method for doing this, and several schemes have been proposed (Berry et al., 2004). All of them will require the continued involvement of the scientific community and a continued commitment to the process. A recent report by the American Academy of Microbiology highlights the problem and notes that, in addition to the issues we have discussed here, a number of enzyme functions have been described in organisms for which no genes have yet been identified (http://www.asm.org/Academy/index.asp?bid=32664). The authors propose the establishment of a database of peer-reviewed, experimentally verified annotations. For parasite genomes, such a database would currently cover a very low percentage of the predicted genes.

Most molecular biologists have a passing acquaintance with bioinformatics analyses, although this is not necessarily true of biologists in general. There are numerous ways to self-train in these techniques, and the World Wide Web offers many resources of varying quality. Table 1 lists sites that provide links to a variety of resources, tutorials and exercises. There are also some excellent textbooks available at various levels of complexity. ‘Introduction to Bioinformatics’ by Lesk (2002) and ‘Bioinformatics for Dummies’ by Claverie and Notredame (2003) are aimed at beginners. ‘Bioinformatics and Functional Genomics’ by Pevsner (2003) is an excellent intermediate text. By and large, however, these books provide training that is useful for the analysis of one or a few sequences; the techniques are less readily applied to whole-genome analysis because of the programming skill required to chain together multiple analyses and process many genes.

Various sites have been set up in which a number of analyses are performed on the entire genome and the results made available for inspection. These genome database sites typically consist of a relational database that stores the genome sequence, the annotation, and the results of BLAST searches, hydrophobicity plots and various other informatic analyses. The advantage of such sites is that they provide whole-genome analyses for those without the computing skills to perform them. The disadvantage is that many of the analyses are performed using default parameters, which may miss a proportion of true-positive results in some circumstances. However, this is unlikely to matter except in a small proportion of cases, and as long as users are aware of it, they can perform alternative analyses if they wish. Undoubtedly, the most influential of these sites in parasitology research is PlasmoDB (Table 2). Developed with the support of the Burroughs Wellcome Fund, PlasmoDB is based on a highly complex relational schema, which allows it to store genome sequences and analyses as well as results from microarray, serial analysis of gene expression (SAGE) and proteomics experiments. The site is now at version 4.3 and continues to develop. It has a good tutorial mode, and its features are described in several reviews (Kissinger et al., 2002; Kissinger and Roos, 2004). The design of PlasmoDB has been used to develop genome databases for other organisms, including Toxoplasma gondii (ToxoDB) and Cryptosporidium (CryptoDB) (Table 2). The other genome database of importance to parasitologists is the GeneDB site developed by the Sanger Institute and the European Bioinformatics Institute (Table 2).

Table 1
General bioinformatics resources for sequence analysis and useful links

Resource | URL

Search tools and databases
Artemis | http://www.sanger.ac.uk/Software/Artemis/
BLAST | http://www.ncbi.nlm.nih.gov/BLAST/
ChloroP | http://www.cbs.dtu.dk/services/ChloroP/
Ensembl | http://www.ensembl.org/
Entrez | http://www.ncbi.nlm.nih.gov/Entrez/index.html
Expasy Proteomics Tools | http://au.expasy.org/tools/
Genefinder | http://ftp.genome.washington.edu/cgi-bin/Genefinder
Genscan | http://genes.mit.edu/GENSCAN.html
Glimmer | http://www.tigr.org/software/glimmer/
Organelle DB | http://organelledb.lsi.umich.edu
Pfam | http://pfam.jouy.inra.fr/
Prosite | http://www.expasy.org/cgi-bin/prosite-search-de
PSORT | http://psort.nibb.ac.jp/
SignalP | http://www.cbs.dtu.dk/services/SignalP/
Swiss-Prot | http://au.expasy.org/sprot/
TargetP | http://www.cbs.dtu.dk/services/TargetP/

General help sites
A Guide to Molecular Sequence Analysis | http://www.brunel.ac.uk/depts/bl/project/biocomp/sequence/seqanal_guide/contents.html
Beginners Guide to Molecular Biology | http://www.rothamsted.bbsrc.ac.uk/notebook/courses/guide/
Biocomputing Tutorials | http://www.hgmp.mrc.ac.uk/Embnetut/Universl/embnettu.html
Generic model organism database construction set | http://www.gmod.org/
GenomeWeb | http://www.hgmp.mrc.ac.uk/GenomeWeb/genome-db.html
Google List of Bioinformatics Resources | http://directory.google.com/Top/Science/Biology/Bioinformatics/
Introduction to Human Genome Computing | http://stein.cshl.org/talks/BioWWW/ch
Knowledge.com Bioinformatics Links | http://directory.knowledge.com/science/biology/bioinformatics
NCBI Resource Guide | http://www.ncbi.nlm.nih.gov/Sitemap/ResourceGuide.html
Research and Education at bioinformatik.de | http://www.bioinformatik.de/cgi-bin/browse/Catalog/Research_and_Education/Online_Courses_and_Tutorials/
The Bioinformatics Resource of the Collaborative Computational Project 11 | http://www.hgmp.mrc.ac.uk/CCP11/index.jsp
Training Page of the Victorian Bioinformatics Consortium | http://www.vicbioinformatics.com/vbcbob/vbceat.shtml

Research institutes
Bioinformatics Links at the University of Bergen | http://www.ii.uib.no/~inge/list.html
Computational Biology Links at UCSC | http://www.cse.ucsc.edu/~karplus/compbio_pages.html
NCBI | http://www.ncbi.nlm.nih.gov/
Sanger Institute | http://www.sanger.ac.uk
Stanford Genome Technology Center | http://sequence-www.stanford.edu
The Institute for Genomic Research (TIGR) | http://www.tigr.org
Tutorial on Protein Sequence Analysis | http://umber.sbs.man.ac.uk/dbbrowser/bioactivity/prefacefrm.html
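As noted above, whole-genome analysis mostly means chaining several per-sequence analyses and repeating them for every gene, which is what the genome database sites do on the user's behalf. The Python sketch below shows the pattern on a tiny scale: it parses a FASTA file of predicted proteins and runs a Kyte-Doolittle hydropathy scan on each one, counting stretches whose 19-residue window average exceeds 1.6 (the window and threshold commonly used with that scale). It is an illustration under those assumptions, not how PlasmoDB or any other site actually computes its results; the function names are hypothetical, and a hydropathy window is only a crude stand-in for dedicated prediction tools.

```python
"""Chaining per-protein analyses across a whole proteome (a minimal sketch)."""
import io
from typing import Dict, Iterator, List, TextIO, Tuple

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def hydropathy(seq: str, window: int = 19) -> List[float]:
    """Mean Kyte-Doolittle score of each full window; unknown residues score 0."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def count_hydrophobic_segments(seq: str, window: int = 19,
                               threshold: float = 1.6) -> int:
    """Count runs of consecutive windows whose mean exceeds threshold."""
    runs, inside = 0, False
    for value in hydropathy(seq, window):
        if value > threshold and not inside:
            runs += 1
        inside = value > threshold
    return runs


def annotate_proteome(handle: TextIO) -> Dict[str, dict]:
    """Run every per-protein analysis on every sequence in the file."""
    results = {}
    for name, seq in read_fasta(handle):
        results[name] = {
            "length": len(seq),
            "hydrophobic_segments": count_hydrophobic_segments(seq),
        }
    return results


if __name__ == "__main__":
    # Invented sequences: one all-polar, one with a 22-residue leucine stretch.
    toy = io.StringIO(">soluble\n" + "DEKRS" * 12 + "\n>membrane\n"
                      + "DEKRS" * 4 + "L" * 22 + "DEKRS" * 4 + "\n")
    for name, info in annotate_proteome(toy).items():
        print(name, info)
```

Adding a further analysis means writing one more per-sequence function and one more key in `annotate_proteome`; the loop over the proteome stays the same, which is the part that is tedious to do by hand for thousands of genes.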

3. Parasite genome sequences


The majority of parasite genomes now available are derived from protozoa, with only a single parasitic helminth sequence currently available. We present a summary of the initial findings for each of the completed genomes, but there are also a large number of partial sequences, unassembled shotgun sequences and expressed sequence tag (EST) surveys at various stages of completion and reporting. This area has also been reviewed recently by others (Ersfeld, 2003).

Table 2
Parasite genomics, genetics and biology information resources

Information resource | URL
ApiDB | http://www.cs.uga.edu/~gao/project/gus/gus.htm
ApiDoTS (ApiESTDB) | http://www.cbil.upenn.edu/apidots/
Blaxter lab nematode genomics database (NEMBASE) | http://www.nematodes.org/
CryptoDB | http://cryptodb.org
DeRisi laboratory malaria transcriptome database | http://malaria.ucsf.edu/index.php
Genomes OnLine Database (GOLD) | http://www.genomesonline.org/
KEGG implementation of P. falciparum metabolic pathways | http://www.genome.ad.jp/dbget-bin/www_bfind?p.falciparum
Malaria full-length complementary DNA project | http://fullmal.ims.u-tokyo.ac.jp/
Malaria parasite metabolic pathways | http://sites.huji.ac.il/malaria/
NCBI malaria genetics and genomics | http://ncbi.nlm.nih.gov/projects/Malaria/
NCBI nematode genomics resources | http://www.ncbi.nlm.nih.gov/genome/guide/nematode/
GeneDB | http://www.genedb.org
P. falciparum GeneDB site | http://www.genedb.org/genedb/malaria/index.jsp
Parasite databases of clustered sequences | http://paradb.cis.upenn.edu/
PlasmoCyc genome pathway database | http://plasmocyc.stanford.edu/
PlasmoDB | http://www.plasmodb.org
Sanger Institute P. falciparum genome project | http://www.sanger.ac.uk/Projects/P_falciparum
Sanger Institute parasite genomes project | http://www.sanger.ac.uk/Projects/Protozoa/
Structural genomics of parasitic protozoa consortium | http://depts.washington.edu/sgpp
TcruziDB | http://tcruzidb.org
TIGR parasites database | http://www.tigr.org/tdb/parasites/
ToxoDB | http://www.toxodb.org/
WHO/TDR malaria database | http://www.wehi.edu.au/MalDB-www/who.html

3.1. Apicomplexan genomes

3.1.1. The Plasmodium genomes

The genome of P. falciparum is ~23 Mb in length, comparable to the genome size of other protozoan parasites (see Table 3). It consists of 14 chromosomes ranging in size from 0.6 to 3.5 Mb (Kappe et al., 2001). The DNA is remarkably AT rich, with an average GC content of 18% (Pollack et al., 1982), a property believed to account for the instability of large fragments of P. falciparum DNA in Escherichia coli (Coppel and Black, 1998). The AT content rises to more than 90% in introns and intergenic regions and is higher than 97% at the sites of putative centromeres. The P. falciparum genome project was carried out by three sequencing centres, The Institute for Genomic Research (TIGR, Rockville, MD, USA), the Sanger Institute (Hinxton, UK) and Stanford University (Stanford, CA, USA), each of which sequenced individual chromosomes. Funding was supplied by the Burroughs Wellcome Fund, the National Institutes of Health (NIH) and the Department of Defence (Carucci et al., 1998; Gardner, 1999).

The approach to the genome was dictated in part by the split nature of the project. Chromosomes of P. falciparum are not rendered visible by conventional karyotypic procedures, so the chromosomes were separated by pulsed field gel electrophoresis and purified from the gel. The individual chromosomes were sheared and used to construct chromosome-specific libraries. These libraries were shotgun sequenced, and gaps were closed by the usual mix of methods, including PCR and pinning to the YAC contigs determined during the mapping project. Not all chromosomes could be separated, and at least three chromosomes (6, 7 and 8) co-migrated in a structure euphoniously named the BLOB. Optical mapping techniques were used to assist the assembly of sequence data into large contigs (Lai et al., 1999). At the time of the project's inception in the mid 1990s, it was not clear whether sequencing an organism with such an AT-rich genome was technically feasible, given the numerous reports of difficulties that had bedevilled the sequencing of individual genes (Coppel and Black, 1998). The success in finishing chromosomes 2 and 3 laid these fears partly to rest; however, some gaps in the sequence proved difficult to close.

A snapshot of the current status of the Plasmodium genome projects is provided in Table 3 and shows that sequences are available for several parasite species. Chromosomes 2 and 3 were published in 1998 and 1999, respectively, and a complete first draft of the genome sequence was published in 2002 (Gardner et al., 1998, 2002a,b; Bowman et al., 1999; Hall et al., 2002; Hyman et al., 2002). At the same time as the P. falciparum genome was released, the genome sequences of the rodent malaria parasite Plasmodium yoelii yoelii and the mosquito vector Anopheles gambiae were reported (Carlton et al., 2002; Holt et al., 2002). The genomes of another five Plasmodium species (Plasmodium berghei, Plasmodium chabaudi, Plasmodium vivax, Plasmodium knowlesi and Plasmodium reichenowi) are currently at various stages of sequencing and annotation (Table 3).

Table 3
Current status of parasite genome projects

Parasite | Genome size (Mb) | Project status | Sequencing centre | Website reference

Protozoa

Apicomplexa
Plasmodium falciparum 3D7 | 23 | Published | TIGR, Sanger, Univ. Stanford | http://www.tigr.org/tdb/e2k1/pfa1/; http://www.sanger.ac.uk/Projects/P_falciparum/
Plasmodium falciparum Dd2 | 23 | Incomplete | Univ. Florida | http://parasite.vetmed.ufl.edu/falc.htm
Plasmodium vivax Salvador I | 30 | Incomplete | TIGR | http://www.tigr.org/tdb/e2k1/pva1/
Plasmodium y. yoelii 17XNL | 23 | Published | TIGR | http://www.tigr.org/tdb/e2k1/pya1/
Plasmodium berghei ANKA | 26 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/P_berghei/
Plasmodium c. chabaudi AS | 30 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/P_chabaudi/
Plasmodium knowlesi H | 23 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/P_knowlesi/
Plasmodium reichenowi | 23 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/P_reichenowi/
Plasmodium gallinaceum | 23 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/P_gallinaceum/
Theileria parva | 8.5 | Incomplete | TIGR | http://www.tigr.org/tdb/e2k1/tpa1/
Theileria annulata | 8.5 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/T_annulata/
Toxoplasma gondii | ~65 | Incomplete | TIGR, Sanger | http://www.tigr.org/tdb/e2k1/tga1/; http://www.sanger.ac.uk/Projects/T_gondii/
Eimeria tenella Houghton | 60 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/E_tenella/
Cryptosporidium parvum IOWA | 9 | Published | UMN, VCU | http://www.cbc.umn.edu/ResearchProjects/AGAC/Cp/index.htm
Cryptosporidium hominis | 9 | Published | UMN, VCU | http://www.parvum.mic.vcu.edu/
Babesia bovis | 9.4 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/B_bovis/
Sarcocystis neurona | | Incomplete | TIGR, Wash. Univ. | http://www.ncbi.nlm.nih.gov
Neospora caninum | | Incomplete | TIGR, Wash. Univ. | http://www.ncbi.nlm.nih.gov

Kinetoplastida
Trypanosoma brucei TREU 927/4 | 35 | Incomplete | TIGR, Sanger | http://www.tigr.org/tdb/e2k1/tba1/; http://www.sanger.ac.uk/Projects/T_brucei/
Trypanosoma cruzi | 44 | Incomplete | TIGR, SBRI, Karolinska | http://www.tigr.org/tdb/e2k1/tca1/
Trypanosoma congolense | 35 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/T_congolense/
Trypanosoma vivax | 35 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/T_vivax/
Leishmania major Friedlin | ~34 | Incomplete | EULEISH, SBRI, Sanger | http://www.sanger.ac.uk/Projects/T_brucei/
Leishmania infantum | ~34 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/L_infantum/
Leishmania braziliensis | ~34 | Incomplete | Sanger | http://www.sanger.ac.uk/Projects/L_braziliensis/

Amitochondriates
Trichomonas vaginalis | 70–90 | Incomplete | TIGR | http://www.tigr.org/tdb/e2k1/tvg/
Giardia lamblia | 12 | Incomplete | MBL | http://jbpc.mbl.edu/Giardia-HTML/
Entamoeba histolytica | 20 | Incomplete | Sanger, TIGR | http://www.sanger.ac.uk/Projects/E_histolytica/; http://www.tigr.org/tdb/e2k1/eha1/
Entamoeba dispar | 20 | Incomplete | TIGR, Sanger | http://www.tigr.org/tdb/e2k1/eha1/; http://www.sanger.ac.uk/Projects/T_dispar/
Entamoeba invadens | 20 | Incomplete | TIGR, Sanger | http://www.tigr.org/tdb/e2k1/eha1/; http://www.sanger.ac.uk/Projects/T_invadens/

Metazoa
Brugia malayi | 110 | Complete | TIGR, Sanger, Univ. Edinburgh | http://www.tigr.org/tdb/e2k1/bma1/; http://www.sanger.ac.uk/Projects/B_malayi/
Schistosoma mansoni | 270 | Incomplete | TIGR, Sanger | http://www.tigr.org/tdb/e2k1/sma1/; http://www.sanger.ac.uk/Projects/S_mansoni/
Schistosoma japonicum | | Incomplete | CNHGC Shanghai | http://schistosoma.chgc.sh.cn/
Onchocerca volvulus | ~150 | Incomplete | Smith College | http://www.nematode.net
Wuchereria bancrofti | | Incomplete | FilGenNet | http://www.ncbi.nlm.nih.gov
Haemonchus contortus | | Incomplete | Sanger, Univ. Edinburgh | http://www.nematode.net
Necator americanus | | Incomplete | Sanger, Univ. Edinburgh | http://www.nematode.net
Ascaris suum | | Incomplete | Sanger, Univ. Edinburgh | http://www.nematode.net

The availability of genome data for a range of different malaria parasites has been extremely useful (Thompson et al., 2001; Carlton et al., 2002; Waters, 2002; Carlton, 2003). The high degree of synteny between genomes is aiding the process of genome annotation. It appears that over 60% of genes identified in P. falciparum have orthologs in P. yoelii, and these common genes occur in arrangements in which synteny between the species is largely conserved. The sites where synteny breaks down correspond to the locations of species-specific genes involved in host–parasite interactions, for example merozoite surface protein 2 (MSP2) and erythrocyte-binding protein 1 (EBL1). Genes involved in antigenic variation, immune evasion and host cell adhesion show the least similarity between the species, even though they are clustered at sub-telomeric locations on chromosomes in all species (Doolittle, 2002). Indeed, the sub-telomeric regions of the Plasmodium chromosomes have revealed a conserved order of features, including repetitive DNA sequences, members of multigene families involved in virulence and antigenic variation, a number of conserved pseudogenes, and several genes of unknown function. Genes in this region include the variant antigen (var) genes and members of the rif and stevor gene families of P. falciparum (Gardner et al., 2002a; Rasti et al., 2004), the P. vivax vir genes (del Portillo et al., 2001), the P. yoelii yir genes (Carlton et al., 2002), and the P. chabaudi cir genes (Janssen et al., 2001, 2002).
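The conserved gene order described above suggests a simple computational check: walk along one genome and ask whether the orthologs of neighbouring genes are also neighbours in the other. The Python sketch below does this under simplifying assumptions: a one-to-one ortholog map is taken as given (how it was obtained, for example by reciprocal similarity searches, is outside the sketch), gene positions are ranks along a chromosome, and local inversions are tolerated rather than reported. Genes with no ortholog are collected at each break because, as noted above, such species-specific genes tend to sit where synteny breaks down. The function name and data are hypothetical.

```python
"""Locating synteny breakpoints between two gene orders (a minimal sketch)."""
from typing import Dict, List, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, rank of the gene along that chromosome)


def synteny_breaks(order_a: List[str],
                   ortholog_pos_b: Dict[str, Position]
                   ) -> List[Tuple[str, str, List[str]]]:
    """Walk species A's gene order and report each point where collinearity
    is interrupted: consecutive ortholog-bearing genes are not neighbours in
    species B, or A-specific genes (no ortholog in B) sit between them.

    Returns (left_gene, right_gene, unmatched_genes_between) for each break.
    """
    breaks = []
    prev: Optional[str] = None
    unmatched: List[str] = []
    for gene in order_a:
        if gene not in ortholog_pos_b:
            unmatched.append(gene)  # candidate species-specific gene
            continue
        if prev is not None:
            chrom_p, rank_p = ortholog_pos_b[prev]
            chrom_g, rank_g = ortholog_pos_b[gene]
            # Either orientation counts as adjacent, so small inversions pass.
            adjacent = chrom_p == chrom_g and abs(rank_g - rank_p) == 1
            if not adjacent or unmatched:
                breaks.append((prev, gene, unmatched))
        prev, unmatched = gene, []
    return breaks


if __name__ == "__main__":
    # Hypothetical data: x1 has no ortholog; g5's ortholog lies on another chromosome.
    order_a = ["g1", "g2", "x1", "g3", "g4", "g5"]
    pos_b = {"g1": ("c1", 1), "g2": ("c1", 2), "g3": ("c1", 3),
             "g4": ("c1", 4), "g5": ("c7", 10)}
    for left, right, between in synteny_breaks(order_a, pos_b):
        print(left, right, between)
```

With the example data the sketch reports two breaks: one between g2 and g3, where the species-specific gene x1 is inserted, and one between g4 and g5, where the ortholog jumps to a different chromosome. Real comparisons usually tolerate small gaps when defining syntenic blocks, so the strict adjacency test here would overcount breaks in noisy data.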
The proteins encoded by the var and rif genes (PfEMP1 and rifins) are expressed on the surface of the infected red blood cell (Kyes et al., 1999, 2001), whereas stevor gene products localise to internal membrane structures of the red cell known as Maurer’s clefts (Kaviratne et al., 2002; Blythe et al., 2004; McRobert et al., 2004). A search of PlasmoDB revealed that 29 var, 44 rif and five stevor genes are located within 20 kb of telomeres (Kissinger et al., 2002). From the complete genome sequence of the 14 chromosomes of P. falciparum 3D7, a total of 5268 genes was predicted. In addition, the parasite carries a 6 kb linear mitochondrial genome encoding three proteins plus rRNAs and a 35 kb circular plastid (apicoplast) genome encoding 30 proteins (Wilson et al., 1996; Krungkrai, 2004; Waller and McFadden, 2005).

The initial application of genome information to malaria research has been to help scientists identify genes similar to sequences of interest or genes for which they already possessed partial information. For example, recognition of enzymes involved in the mevalonate-independent pathway of isoprenoid synthesis enabled researchers to focus on a pathway potentially vulnerable to attack. Indeed, two compounds that block this pathway have been identified and shown to cure mice of experimental malaria infection (Jomaa et al., 1999). Other investigators also used early sequence data to uncover novel targets for drug development, such as several plant-like lipid biosynthesis enzymes that are essential for parasite function but absent in humans (Waller et al., 1998). Mining of the genome data has also revealed several paralogs of genes involved in red blood cell invasion, including erythrocyte-binding antigen 175 (EBA175), SERA, proteins containing epidermal growth factor-like domains similar to those found in MSP1, MSP4 and MSP5, and genes similar to the P. vivax RBP (reviewed in Topolska et al., 2004).

Reports are just starting to appear of studies that take the genome sequence as a whole to infer general properties of the organism. In a recent report, the genome data were searched for proteins involved in transcriptional regulation (Coulson et al., 2004), as these elements have proved rather elusive. Although the basic machinery of transcription initiation is present, there is a striking paucity of regulatory proteins: the large families of transcriptional regulators found in other eukaryotes are virtually absent in P. falciparum. In stark contrast, there are twice as many CCCH-type zinc-finger proteins as found in other eukaryotes, suggesting that post-transcriptional removal of mRNA is a common method of gene regulation in malaria parasites. In another study, a bioinformatics approach was taken to search for regulatory elements in the genome sequence itself, using the heat shock protein (Hsp) family as a model system (Militello et al., 2004). The results showed that the P. falciparum hsp genes do not contain standard eukaryotic regulatory elements. However, a novel G-rich regulatory motif, named the G-box, was identified upstream of several P. falciparum hsp genes and of the P. yoelii, P. berghei and P. vivax hsp86 genes. The G-boxes were required for maximal reporter gene expression in transient transfection assays.
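Counting CCCH-type zinc-finger proteins, as in the genome-wide survey of transcriptional regulators described above, starts with recognising the motif in predicted proteins. The Python sketch below uses a regular expression for the commonly cited simplified consensus C-x8-C-x5-C-x3-H. This is an assumption-laden illustration, not the method of the study cited: published surveys generally rely on profile-based domain searches (for example with Pfam, listed in Table 1), which tolerate spacing variants that a fixed pattern misses. The function names are hypothetical.

```python
"""Scanning predicted proteins for CCCH zinc-finger motifs (a minimal sketch)."""
import re
from typing import Dict, List

# Simplified consensus C-x8-C-x5-C-x3-H written as a regular expression.
# The lookahead makes overlapping matches visible to finditer.
CCCH = re.compile(r"(?=(C.{8}C.{5}C.{3}H))")


def find_ccch(protein: str) -> List[int]:
    """Return 0-based start positions of every C-x8-C-x5-C-x3-H match."""
    return [m.start() for m in CCCH.finditer(protein.upper())]


def count_ccch_proteins(proteome: Dict[str, str], min_motifs: int = 1) -> int:
    """Count proteins carrying at least min_motifs CCCH motifs."""
    return sum(1 for seq in proteome.values() if len(find_ccch(seq)) >= min_motifs)


if __name__ == "__main__":
    # Invented sequences: one motif built to the consensus, one protein without it.
    motif = "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H"
    proteome = {"zf_example": "MK" + motif + "GG", "no_motif": "MKGGSTRE"}
    for name, seq in proteome.items():
        print(name, find_ccch(seq))
    print("proteins with a CCCH motif:", count_ccch_proteins(proteome))
```

Comparing the resulting counts between organisms only makes sense if the same search is applied to every proteome, which is one reason the profile-based approach is preferred in practice.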
The G-box is not homologous to any known eukaryotic regulatory motif and is, to date, the best-defined functional element identified in Plasmodium spp. Recently, Hall and colleagues (2005) have added data from microarrays and proteomics to the genome sequence to define the repertoires of genes expressed at different stages. Of great interest was the observation that post-transcriptional gene silencing appears to be an important mechanism for restricting translation of mRNA to certain stages. A 47-base motif (containing a UUGU submotif) located in the 3′ untranslated region (UTR) of several known post-transcriptionally silenced genes has been implicated in the regulation of this phenomenon (Hall et al., 2005).

3.1.2. The Toxoplasma gondii genome

The apicomplexan T. gondii is an obligate intracellular protozoan parasite and a significant human and veterinary pathogen. Genetic manipulation, including stable and transient transfection, is well established for this organism (Kim and Weiss, 2004). The pathogenic life cycle stages of T. gondii are easily propagated in the laboratory, the mouse infection model is well established, and reagents for cell biology studies are readily available.


For these reasons, T. gondii is an important model system for the study of apicomplexan biology (Roos et al., 1994; Kim and Weiss, 2004). The genome project at TIGR is sequencing the T. gondii B7 strain, a recent clonal isolate of the type II parasites most commonly associated with AIDS. Based on the assembly of contigs in ToxoDB (Kissinger et al., 2003) (Table 2), the current size estimate for the T. gondii genome is ~65 Mb, consisting of 14 chromosomes (D. Sibley, personal communication). More than 60,000 T. gondii ESTs have now been sequenced from different life cycle stages and the three major genotypes (Ajioka, 1998; Ajioka et al., 1998; Li et al., 2003). Overall, the genes of T. gondii are much more intron-rich than those of Plasmodium or Cryptosporidium. In general, genes for surface antigens and for proteins secreted from rhoptries, micronemes and dense granules have fewer introns than housekeeping genes. Furthermore, many of these genes are located near the ends of chromosomes (Kim and Weiss, 2004). Comparative analyses of apicomplexan genomes have shown that most apicomplexan-specific genes are related to the group's unique apical specialisation and encode components of the secretory organelles and surface antigens (Li et al., 2003). Included in this category are genes encoding secretory proteins such as apical membrane antigen 1 (AMA1) and other microneme proteins likely to be involved in cell recognition (Soldati et al., 2001), the major surface antigens (SAGs) (Boothroyd et al., 1998) and a number of proteins characterised from sporozoites and oocysts. Another highly conserved, yet uniquely apicomplexan, protein is the myosin known as TgMyoA in T. gondii (Hettmann et al., 2000) and MyoA in P. falciparum (Pinder et al., 1998). This motor protein plays an important role in the characteristic gliding motility of this group of parasites (Pinder et al., 1998).
Several metabolic enzymes are also included on this list, such as the unique apyrase known as nucleoside triphosphate hydrolase, first characterised in T. gondii (Asai et al., 1995) and also found in Neospora caninum (Asai et al., 1998).

3.1.3. The Cryptosporidium genome

Cryptosporidium parvum is an apicomplexan parasite that causes cryptosporidiosis and is an important opportunistic pathogen in people with AIDS. This parasite can infect a broad range of mammalian hosts and causes acute gastrointestinal disease that can be fatal in immunocompromised hosts. C. parvum is notoriously difficult to work with in the laboratory, and long-term in vitro culture is not available. The genomes of both the human (type I, strain H, recently renamed Cryptosporidium hominis (Morgan-Ryan et al., 2002)) and the bovine (type II, strain IOWA) isolates of C. parvum have been sequenced, and annotation is ongoing. The complete sequences of both genomes have recently been published (Abrahamsen et al., 2004; Xu et al., 2004a,b). The genome is relatively small for a protozoan at
~9 Mb, consisting of eight chromosomes ranging in size from ~0.9 to ~1.4 Mb. Compared with P. falciparum, genome reduction and compaction have occurred predominantly through the shortening of intergenic regions, the loss and shortening of introns and a reduction in the number and mean length of the genes themselves (Keeling, 2004; Templeton et al., 2004). The C. hominis genome is predicted to encode 3994 genes (60% of which show similarity to known genes), while an estimated 3807 genes are present in C. parvum. Two unique and abundant repeats have been identified in the genome, almost exclusively in non-coding regions, suggesting a possible regulatory or other conserved function (Bankier et al., 2003; Xu et al., 2004a,b). A comprehensive database, CryptoDB, now serves as a public resource for genomic, EST and genome sequence survey (GSS) data for both sequenced genomes (Puiu et al., 2004) (Table 2). Analysis of the sequence data has revealed that the parasite possesses a highly tailored glycolysis-based metabolism. Both aerobic and anaerobic pathways are present, the former requiring an alternative electron transport system in a simplified mitochondrion (Williams and Keeling, 2003; Putignani et al., 2004; Henriquez et al., 2005). A large repertoire of transporters has been identified, presumably to compensate for the lack of de novo biosynthetic capacity for purines, pyrimidines and amino acids. There is no evidence of apicoplast-encoded genes, consistent with previous studies showing that Cryptosporidium lacks an apicoplast (Tetley et al., 1998; Zhu et al., 2000). The C. hominis genome encodes multiple proteins associated with apical organelles, including micronemes and rhoptries; however, no specific dense granule proteins were identified. In contrast to other protozoan parasites, no extensive sub-telomeric clusters of variant surface antigens (exemplified by the large families of var, stevor and rifin genes of P.
falciparum) were observed in the Cryptosporidium genome. However, more than 20 genes were identified that encode mucin-like proteins (Barnes et al., 1998; Cevallos et al., 2000), and several other novel families of putative cell-surface and secreted proteins were also detected. Interestingly, a putative ortholog of the Plasmodium chloroquine resistance-linked gene PfCRT (Fidock et al., 2000) was identified, although this parasite does not possess a food vacuole. The elucidation of core metabolic pathways, including enzymes with similarities to plant and bacterial counterparts, should facilitate drug discovery and development (Striepen and Kissinger, 2004). A comparative genome database, ApiDB (Table 2), which combines all of the apicomplexan genome sequences, has been used for comparative analysis of nucleotide biosynthesis in Cryptosporidium (Striepen et al., 2004). The results revealed that Cryptosporidium lacks all six genes encoding the enzymes of de novo pyrimidine synthesis, a pathway present in all other apicomplexans for which sequence data are available. This absence appears to be compensated for by a family of unique pyrimidine salvage enzymes, one of which is
thymidine kinase (TK), the target of the antiviral drug ganciclovir. Comparative genomic analysis has also identified several genes known to encode mitochondrial proteins in other apicomplexans (Abrahamsen et al., 2004). Expression of one of these (Cpn60) fused to a reporter in T. gondii resulted in correct targeting of the chimeric protein to the mitochondrion, indicating the presence of a functional import sequence.

3.2. Kinetoplastid genomes

3.2.1. The Leishmania major genome

Leishmania are protozoan parasites (order Kinetoplastida) whose life cycle alternates between an intracellular amastigote residing in vertebrate macrophages and an extracellular promastigote living in the digestive tract of sandflies. The numerous human-infective Leishmania spp. are responsible for a spectrum of disease ranging from asymptomatic infection to lethal illness (Ashford, 2000). Leishmania, like other trypanosomatids, possess unusual mechanisms of gene expression, including polycistronic transcription (Johnson et al., 1987; Martinez-Calvillo et al., 2003) and RNA editing of mitochondrial transcripts (Sibley and Boothroyd, 1992; Seiwert, 1995). In these organisms, mature nuclear mRNAs are generated from primary transcripts by trans-splicing, a process that adds a capped 39-nucleotide mini-exon or spliced leader (SL) to the 5′ end of each mRNA (Parsons et al., 1984). Post-transcriptional mechanisms involving the 3′ untranslated regions appear to regulate the steady-state levels of most mature mRNAs (Myung et al., 2002). Promoters for RNA polymerase I (Pol I) and Pol III have been well characterised in trypanosomatids (Zomerdijk et al., 1991; Rudenko et al., 1995; Yan et al., 1999); however, little is known about the sequences promoting expression of Pol II-transcribed genes (Gilinger and Bellofatto, 2001; Martinez-Calvillo et al., 2004).
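The processing described above, in which a polycistronic primary transcript is resolved into individual mRNAs that each carry the same 5′ spliced leader, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a model of the splicing machinery: the function name `trans_splice` and the example data are hypothetical, the spliced leader is a placeholder of the correct length (39 nt) rather than a real SL sequence, and acceptor sites are supplied by hand rather than predicted. Polyadenylation and removal of intergenic sequence are deliberately ignored.

```python
from typing import List

# Hypothetical placeholder for the capped 39-nt spliced leader (mini-exon);
# "N" marks unspecified bases, so no real SL sequence is implied.
SL_LENGTH = 39
SPLICED_LEADER = "N" * SL_LENGTH


def trans_splice(precursor: str, acceptor_sites: List[int]) -> List[str]:
    """Cut a polycistronic precursor at each trans-splice acceptor site and
    prepend the spliced leader to every resulting mRNA.

    acceptor_sites are 0-based positions where each mature mRNA body starts.
    Sequence upstream of the first site is discarded; everything between two
    consecutive sites is assigned to the upstream mRNA (a simplification).
    """
    sites = sorted(set(acceptor_sites))
    if not sites:
        return []
    if sites[0] < 0 or sites[-1] >= len(precursor):
        raise ValueError("acceptor site lies outside the precursor")
    mrnas = []
    for i, start in enumerate(sites):
        end = sites[i + 1] if i + 1 < len(sites) else len(precursor)
        # Every mature mRNA receives the same 5' spliced leader.
        mrnas.append(SPLICED_LEADER + precursor[start:end])
    return mrnas


if __name__ == "__main__":
    # Toy precursor: 3 nt of 5' sequence followed by three 5-nt "genes".
    for mrna in trans_splice("GGGAAAAATTTTTCCCCC", [3, 8, 13]):
        print(mrna[SL_LENGTH:])  # prints AAAAA, then TTTTT, then CCCCC
```

The point of the sketch is only that one precursor yields several mRNAs sharing an identical 5′ leader, which is why the SL can serve as a common handle on trypanosomatid transcripts.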
Leishmania major (Friedlin strain) was chosen as the genome reference strain for both technical and biological reasons: a physical map had already been constructed, the strain is capable of passage through the sandfly vector, it has served as the model strain for studying the immunological aspects of infection and some stages of its life cycle can be propagated in vitro (Ivens et al., 1998). The Leishmania genome sequencing project was carried out by several sequencing centres (including SBRI, the Sanger Institute and EULEISH). To sequence the entire genome, large-insert cosmid libraries were generated and mapped to individual chromosomes separated by pulsed-field gel electrophoresis; the cosmids were then sheared into shorter fragments and sequenced. The complete sequence of the shortest chromosome (chr 1) was published in 1999, and several others (chr 3, 4, 5 and 24) have since been completed (Myler et al., 1999; Worthey et al., 2003). More recently, a shotgun optical map of the entire genome was constructed to facilitate the remaining
assemblies and validate ongoing and finished sequences (Zhou et al., 2004). The total number of genes is estimated to be of the order of 8000. About 53% of these genes have unknown functions, a proportion similar to that found in P. falciparum (Ersfeld, 2003). The nuclear (haploid) genome is ~34 Mb in size and contains 36 chromosomes, ranging from 0.3 to 2.5 Mb (Wincker et al., 1996). Analysis of the finished sequences of chromosomes 1, 3 and 4 has revealed exciting biological insights. Most interestingly, the genes are organised into large (>100–500 kb) polycistronic clusters of adjacent genes on the same DNA strand (Myler et al., 2000). Chromosome 1 contains two such clusters organised in a divergent manner (i.e. the two sets of genes are both transcribed towards the telomeres) (Martinez-Calvillo et al., 2003). Nuclear run-on analysis suggests that transcription is initiated in both directions within the divergent region (Myler et al., 2001). Chromosome 3 contains two convergent polycistronic gene clusters separated by a tRNA gene, with a single divergent gene at one telomere (Worthey et al., 2003). Transfection studies support the presence of a bidirectional promoter in the region between the two gene clusters of Chr 1 (Martinez-Calvillo et al., 2003) and Chr 3 (Martinez-Calvillo et al., 2004). It has recently been shown that Pol II transcription on both strands of Chr 3 terminates in the tRNA gene region, which is itself transcribed by Pol III (Martinez-Calvillo et al., 2004). The surface of Leishmania parasites is covered with a dense coat of glycoconjugates, among them a membrane-bound lipophosphoglycan (LPG). Based on their location and abundance, these molecules are believed to be major virulence factors with roles in establishing infection and in parasite survival within the host (Turco et al., 2001). Analysis of the available genome sequence has aided the elucidation of the biochemical pathways of LPG synthesis. Studies in L.
major have shown that the parasite loses its virulence when an essential enzyme in the LPG pathway is knocked out (Spath et al., 2000). However, when the equivalent pathway was knocked out in the closely related species Leishmania mexicana, no effect on virulence was observed (Ilg, 2000). Novel drugs to combat leishmaniasis are urgently needed, as only three classes of drugs (with adverse side-effects) are currently available (Guerin et al., 2002). Mining of the genomic information should greatly speed the identification of potential new drug targets.

3.2.2. The Trypanosoma brucei genome

Trypanosoma brucei subspecies are the causative agents of human African trypanosomiasis, or sleeping sickness (T. b. rhodesiense and T. b. gambiense), and of N’gana, a wasting disease of domestic cattle (T. b. brucei). The parasite is transmitted through the bite of the tsetse fly. Unlike a number of other parasites, African trypanosomes are extracellular throughout their entire life cycle.

The T. brucei genome contains three main chromosome types (termed minichromosomes, intermediate chromosomes and megabase chromosomes), classified by their size ranges (Gull, 2001). The megabase chromosomes have internal regions comprising protein-coding genes (which are transcribed in a polycistronic manner) and telomeric expression sites (ES) for the metacyclic and bloodstream versions of the variant surface glycoprotein (VSG) genes. The parasite has evolved a strategy for switching the expression of its major surface antigen (encoded by the VSG genes). This process, termed VSG switching, allows a sub-population of parasites in the blood to continuously change their surface coat to a different VSG variant (Vanhamme et al., 2001; Pays et al., 2004). To maintain the ability to switch to new VSG variants, the parasite genome contains a repertoire of hundreds of VSG genes, only one of which is expressed at a time, from ES located on most (but not all) of the 11 pairs of megabase chromosomes (Melville et al., 1998, 2000). Using fine-resolution mapping, it has been shown that the minichromosomes and intermediate chromosomes of T. brucei have a canonical structure based around a large central core of 177-bp palindromic repeats. The intermediate chromosomes differ from minichromosomes only in the length of the non-repetitive subtelomeric sequences they possess (Wickstead et al., 2004). A unique feature of the sub-telomeric regions of the T. brucei genome is the presence of a novel family of genes, termed retrotransposon hot spot (RHS) genes (Bringaud et al., 2002). In addition to intact genes, several RHS pseudogenes reside within the RHS clusters. RHS-related sequences have also been identified in Trypanosoma cruzi. RHS genes/pseudogenes are clustered in the genome and often occur as tandem repeats. The clusters examined to date are located upstream of bloodstream ES, and also in sub-telomeric regions on chromosomes not carrying an ES.
The RHS sequences show no similarity to known genes, and their function is unknown. As the T. brucei genome project is still incomplete, only a few studies using the genome database resources have been published. For instance, database searches for proteins located in the glycosome (a kinetoplastid-specific organelle that contains most of the glycolytic enzymes) revealed 16 different genes encoding homologues of plant enzymes (Guerra-Giraldez et al., 2002; Hannaert et al., 2003). Because trypanosomes, unlike Plasmodium, do not have an apicoplast, the authors speculated that these plant-like genes were acquired through an ancient gene transfer from an algal endosymbiont. Orthologous genes have also been identified in Leishmania (Hannaert et al., 2003), raising the possibility that herbicide-type chemicals could be used to target the enzymes they encode. Trypanosoma cruzi is the causative agent of a potentially fatal illness known as Chagas’ disease or American trypanosomiasis. It has been shown that approximately 9–14% of the total parasite DNA comprises highly
repetitive sequences (Castro et al., 1981; Craig et al., 1990). The first such sequence described was a satellite DNA with a repeat length of 195–196 bp and no detectable homology with the 177-bp satellite repeat of T. brucei. Although this tandemly repeated satellite DNA was originally postulated to reside in minichromosomes (Gonzalez et al., 1984), it has since been shown that T. cruzi lacks minichromosomes and that the satellite DNA sequences are distributed across several, but not all, of the parasite’s chromosomes (Cano et al., 1995). In addition to the satellite DNA, the T. cruzi genome contains several families of interspersed repetitive sequences that resemble the short interspersed nuclear elements (SINEs) found in the genomes of mammals and other higher eukaryotes (reviewed in Requena et al. (1996)). The T. cruzi genome sequencing project is currently underway at TIGR, in collaboration with the Seattle Biomedical Research Institute (SBRI) and the Karolinska Institute (Table 3). The sequence data generated to date have already provided valuable insight into gene synteny and the evolution of genome architecture in trypanosomatids, including a striking conservation of gene order and the presence of an ancestral retrotransposon-like element (Ghedin et al., 2004a).

3.3. Parasitic nematode genomes

3.3.1. The Brugia malayi genome

The genome of the human filarial worm Brugia malayi has recently been sequenced to 9× redundancy using a whole-genome shotgun approach (Ghedin et al., 2004b) (Table 3). This is the first parasitic nematode genome to be fully sequenced, following those of the free-living nematodes Caenorhabditis elegans and Caenorhabditis briggsae. The Filarial Genome Project (FGP) was initiated in 1994 with funds from the World Health Organisation and is led by a consortium of international laboratories that collaborate in generating genomics resources.
In 2001, the NIH/NIAID awarded TIGR a grant to begin large-scale sequencing of the B. malayi genome. From the genomic data, the nuclear genome is estimated to be between 85 and 95 Mb, consistent with pre-genome predictions (McReynolds et al., 1986; Blaxter et al., 2002). More than 26,000 ESTs have been clustered into ~8000 unique genes, using algorithms developed at TIGR (Quackenbush et al., 2001) and NEMBASE (Blaxter et al., 1999, 2002) (Table 2). The functions of many of these genes are unknown, as very few have homologues in other organisms and many appear to be unique to Brugia. Using the assembled segments of the genome, the gene arrangements of B. malayi and C. elegans were compared to assess synteny between the species (Ghedin et al., 2004b). Where homologous genes were found in both organisms, genes located side by side on a B. malayi contig usually lay 2–10 Mb apart on a C. elegans chromosome, indicating that synteny is rather weak between these distantly related nematodes. About 15% of
the B. malayi genome consists of repeats, similar to the repeat content of C. elegans. These include >13,000 novel repeats, among them some repeat families with >100 members that share no homology with other nematode repeats or with any other known repeats. RNA interference (RNAi) is already being used to investigate gene function in B. malayi (Aboobaker and Blaxter, 2003), and such experiments will greatly benefit from the complete genome sequence. The genome project will also aid the search for new drug targets for the treatment of lymphatic filariasis.

3.3.2. Caenorhabditis elegans as a model for the study of gene function in parasitic nematodes

The free-living nematode C. elegans is an excellent model system for studying the developmental and functional biology of metazoans (Hashmi et al., 2001). Genetic and physical maps of its six chromosomes have been constructed, and the sequence of the entire genome has been determined and annotated (The C. elegans Sequencing Consortium, 1998). An extensive collection of mutants and a variety of chromosome maps of mutant genes and molecular markers are publicly available (http://elegans.swmed.edu/genome.shtml). C. elegans shares morphological characteristics and biological processes with other nematodes, including a protective cuticle, dauer stages, biochemical adaptations to extreme conditions, molting and reproduction. Parasitic nematode EST projects have revealed that more than 40% of the genes identified are homologous to genes in C. elegans, including several vaccine and drug target candidates (Hashmi et al., 2001). Studies of gene function in parasitic nematodes have been hindered by the lack of molecular genetic tools for investigating the roles of genes during development directly. In C.
elegans, however, the ability to create gene-specific mutants by transposon insertion, chemical mutagenesis and RNAi, followed by their rescue by complementation, has been used successfully to assign functions to gene products (Kamath and Ahringer, 2003; Kamath et al., 2003; Lee et al., 2003; Pothof et al., 2003; Simmer et al., 2003; Vastenhouw et al., 2003; Lettre et al., 2004). The availability of the complete C. elegans genome sequence, combined with molecular protocols that allow rapid progression from gene sequence to potential function, has paved the way for investigations into cellular and developmental processes. This in turn has facilitated the study of the corresponding processes in parasitic nematodes (Hashmi et al., 2001). For example, a recent study has shown that a family of cathepsin L-like cysteine protease (CPL) genes of B. malayi and B. pahangi, together with the closely related CPL genes identified in Onchocerca volvulus and C. elegans, are involved in larval molting and in cuticle and eggshell remodelling (Guiliano et al., 2004). In another study, cofactor-independent phosphoglycerate mutase was found to have an essential function in C. elegans and to be highly conserved in parasitic nematodes (Zhang et al., 2004), suggesting that it may be a target for new anthelmintics.
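The synteny comparison described above for B. malayi and C. elegans (asking how far apart the C. elegans homologues of genes that are adjacent on a B. malayi contig lie) reduces to a simple calculation once homologues have been assigned. A minimal sketch, assuming a precomputed one-to-one homologue mapping; the function names, gene identifiers, coordinates and the 100 kb cutoff are all hypothetical, and this is not the procedure used by Ghedin et al. (2004b).

```python
from typing import Dict, List, Optional, Tuple

# (chromosome, start coordinate in bp) of a gene in the reference genome.
Location = Tuple[str, int]


def neighbour_distances(
    contig_genes: List[str],
    homologue_of: Dict[str, str],
    reference_location: Dict[str, Location],
) -> List[Optional[int]]:
    """For each pair of adjacent genes on a contig, return the distance (bp)
    between their homologues in the reference genome.

    None is returned for a pair when either gene lacks a homologue or the two
    homologues lie on different chromosomes. Every homologue named in
    homologue_of is assumed to have an entry in reference_location.
    """
    distances: List[Optional[int]] = []
    for left, right in zip(contig_genes, contig_genes[1:]):
        a, b = homologue_of.get(left), homologue_of.get(right)
        if a is None or b is None:
            distances.append(None)
            continue
        chrom_a, pos_a = reference_location[a]
        chrom_b, pos_b = reference_location[b]
        distances.append(abs(pos_a - pos_b) if chrom_a == chrom_b else None)
    return distances


def fraction_conserved(distances: List[Optional[int]], max_gap: int = 100_000) -> float:
    """Fraction of comparable neighbour pairs whose homologues lie within
    max_gap bp of each other (an arbitrary, illustrative cutoff)."""
    comparable = [d for d in distances if d is not None]
    if not comparable:
        return 0.0
    return sum(d <= max_gap for d in comparable) / len(comparable)


if __name__ == "__main__":
    # Hypothetical B. malayi contig and C. elegans coordinates.
    contig = ["Bm1", "Bm2", "Bm3", "Bm4"]
    homologue_of = {"Bm1": "ce_a", "Bm2": "ce_b", "Bm3": "ce_c"}  # Bm4: none
    ce_location = {
        "ce_a": ("III", 1_000_000),
        "ce_b": ("III", 6_000_000),
        "ce_c": ("III", 6_050_000),
    }
    d = neighbour_distances(contig, homologue_of, ce_location)
    print(d)                      # [5000000, 50000, None]
    print(fraction_conserved(d))  # 0.5
```

A low conserved fraction, with most neighbour pairs mapping megabases apart, is the pattern the text describes as weak synteny.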

3.4. Parasite vector genomes

Understanding any parasite requires an understanding of its interactions with its various hosts, particularly its definitive and intermediate hosts. In the case of P. falciparum, the genome of its mosquito vector, A. gambiae (PEST strain), has been completed (Holt et al., 2002). The genome is a massive 278 Mb and is predicted to encode about 14,000 genes. This gene number is quite similar to that of the fruit fly Drosophila melanogaster, but the Anopheles genome is about 160 Mb larger, predominantly because of a vast amount of intergenic sequence, which has been lost from the Drosophila genome over time. Most of the annotation of the Anopheles genome was generated automatically, so the conclusions that can be drawn remain strongly provisional. Relative to the Drosophila genome, there appears to have been an expansion in the number of serine proteases and of proteins acting as effectors of innate immunity and in other proteolytic processes. Of interest is a family of proteins containing extracellular adhesion domains such as fibrinogen and cadherin domains. These are typically found in lectin-like proteins capable of binding carbohydrates and in cell–cell contact processes, respectively. Some fibrinogen-domain-containing proteins were upregulated following blood feeding and may be implicated in innate immune processes that make vectors resistant to infection by the malaria parasite (Dimopoulos et al., 2002; Christophides et al., 2004). A number of other gene families show altered regulation following blood feeding, but many of these appear to be involved in digestion of the blood and in cellular processes secondary to the appearance of nutrients and the commencement of egg laying (Holt et al., 2002). It is not yet apparent whether a particular set of genes has evolved to limit malaria replication in the mosquito or whether resistance is essentially an accidental phenomenon.
Theoretically, it may be possible to exploit such mechanisms, and there has been much discussion of constructing transgenic mosquitoes that cannot carry malaria and using these to displace current vector populations in endemic areas. It is difficult to see how this might be achieved under current regulatory and legislative conditions, or how the risk could be minimised sufficiently to win over a public that is highly suspicious of genetically modified organisms. An EST project for the T. brucei vector, the tsetse fly (Glossina morsitans morsitans), is currently underway at the Sanger Institute (http://www.sanger.ac.uk/Projects/G_morsitans/) (Hertz-Fowler and Berriman, 2004). As an initial step, ~21,000 ESTs from the adult midgut of the tsetse fly were sequenced and clustered into ~9000 unique genes. Putative functions were ascribed to ~4000 of these genes based on homology (Lehane et al., 2003). The International Glossina Genomics Initiative (IGGI) has recently been established to co-ordinate the ongoing genome sequencing efforts (Butler, 2004).
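The EST projects described here (and for B. malayi above) group overlapping reads that derive from the same transcript into putative unique genes. A heavily simplified sketch of that idea, assuming that an exact shared k-mer is sufficient evidence of overlap: reads sharing any k-mer are linked, and connected groups are reported as clusters (single linkage, implemented with a union-find structure). This is not the TIGR, NEMBASE or Sanger clustering algorithm; the function names, the value of k and the reads are hypothetical, and real pipelines also handle reverse complements, sequencing errors, repeats and alignment scores.

```python
from collections import defaultdict
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k (empty if the read is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: Dict[str, str], k: int = 8) -> List[List[str]]:
    """Single-linkage clustering: ESTs that share at least one k-mer end up
    in the same cluster, directly or through a chain of shared k-mers."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link each EST to the first EST seen that contains the same k-mer.
    first_owner: Dict[str, str] = {}
    for name, seq in ests.items():
        for kmer in kmers(seq.upper(), k):
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    groups: Dict[str, List[str]] = defaultdict(list)
    for name in ests:
        groups[find(name)].append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    reads = {
        "est1": "ACGTACGTTTGCA",
        "est2": "CGTTTGCAGGAT",  # overlaps the end of est1 by 8 nt (CGTTTGCA)
        "est3": "GGGGCCCCAAAA",  # shares no 8-mer with the others
    }
    print(cluster_ests(reads))  # [['est1', 'est2'], ['est3']]
```

Single linkage is chosen here only because it is the simplest rule that lets partially overlapping reads from one transcript join through intermediates; it also tends to merge gene families that share repeats, which is one reason production pipelines use stricter criteria.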

4. Data integration and future developments

Parasite genome sequences do not of themselves provide a full explanation for the biology of the organism. For example, the gene complements of C. parvum and C. hominis are essentially identical, yet the two parasites show marked differences in biological properties, including host range and pathogenicity (Xu et al., 2004a,b). These biological differences may be explained by differences in sequence, but other factors may also be responsible, such as differences in gene expression, particularly in response to environmental conditions within the host. More information is required, and further studies using functional genomics approaches may be informative. Such studies will in turn generate large volumes of information, as the genome sequence is of course only the first step in multi-gene analyses of organisms. The transcriptional profile at various times in the life cycle, or in response to various stimuli, can be determined by a combination of EST sequencing, SAGE analysis and microarrays. To this may be added proteomic analyses, again at various stages of the parasite life cycle, as well as the results of various cellular and immunological assays. Cumulatively, this generates a multidimensional data set of vast proportions. Many issues arise in considering the analysis of these data, and we can touch on them only briefly. Firstly, it is becoming clear that analyses such as microarrays are very sensitive to initial conditions and to fine details of the protocol, such as mRNA preparation and hybridisation conditions. It is still uncertain how the large amount of variation inherent in these techniques will be handled within microarray databases, and how validation and agreement about the definitive set of results will be achieved. Once data are validated, the issue of integration and data mining arises. There is no perfect solution at present, and a range of different approaches is being taken.
The initial step is to bring these functional genomics data sets into a single source where they can be queried along with genome information. PlasmoDB takes this approach, but the analytical methods available are still quite rudimentary, employing simple queries and Boolean operators. The ‘point and click’ interface of web sites discourages active data exploration and encourages complacency. We will need to move to environments that draw heavily on visualisation and exploratory three-dimensional displays to enable scientists to harness the unrivalled pattern-matching ability of the human brain. The availability of such methods, together with modelling approaches that allow phenomena to be explained in terms of multiple gene products changing simultaneously, will allow fuller exploitation of these valuable sequence resources.
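The simple queries with Boolean operators mentioned above correspond to set operations on the gene lists returned by individual searches: AND is intersection, OR is union and NOT is difference. A minimal sketch, with a hypothetical function name (`run_query`), hypothetical search names and hypothetical gene identifiers; it does not reproduce PlasmoDB's actual query interface.

```python
from typing import Dict, Iterable, Set


def run_query(
    results: Dict[str, Set[str]],
    all_of: Iterable[str],
    any_of: Iterable[str] = (),
    none_of: Iterable[str] = (),
) -> Set[str]:
    """Combine named result sets: AND over all_of, OR over any_of (if given),
    then NOT for each set in none_of. Input sets are never modified."""
    required = [results[name] for name in all_of]
    if not required:
        raise ValueError("at least one all_of term is needed")
    hits = set.intersection(*required)  # AND (returns a new set)
    alternatives = [results[name] for name in any_of]
    if alternatives:
        hits &= set.union(*alternatives)  # OR
    for name in none_of:
        hits -= results[name]  # NOT
    return hits


if __name__ == "__main__":
    # Hypothetical gene sets returned by three separate searches.
    results = {
        "gametocyte_transcripts": {"gene_A", "gene_B", "gene_C"},
        "signal_peptide": {"gene_B", "gene_C", "gene_D"},
        "sporozoite_proteome": {"gene_C", "gene_E"},
    }
    hits = run_query(
        results,
        all_of=["gametocyte_transcripts", "signal_peptide"],
        none_of=["sporozoite_proteome"],
    )
    print(sorted(hits))  # ['gene_B']
```

Queries of this kind only filter lists; they do not capture the simultaneous changes in many gene products that the modelling approaches discussed above are intended to address.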

Acknowledgements

We would like to acknowledge the support of the National Institutes of Health (Grant DK-32094), the Australian
National Health and Medical Research Council and the Burroughs Wellcome Fund. RLC is an international fellow of the Howard Hughes Medical Institute.

References

Aboobaker, A.A., Blaxter, M.L., 2003. Use of RNA interference to investigate gene function in the human filarial nematode parasite Brugia malayi. Mol. Biochem. Parasitol. 129, 41–51. Abrahamsen, M.S., Templeton, T.J., Enomoto, S., Abrahante, J.E., Zhu, G., Lancto, C.A., Deng, M., Liu, C., Widmer, G., Tzipori, S., Buck, G.A., Xu, P., Bankier, A.T., Dear, P.H., Konfortov, B.A., Spriggs, H.F., Iyer, L., Anantharaman, V., Aravind, L., Kapur, V., 2004. Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science 304, 441–445. Ajioka, J.W., 1998. Toxoplasma gondii: ESTs and gene discovery. Int. J. Parasitol. 28, 1025–1031. Ajioka, J.W., Boothroyd, J.C., Brunk, B.P., Hehl, A., Hillier, L., Manger, I.D., Marra, M., Overton, G.C., Roos, D.S., Wan, K.L., Waterston, R., Sibley, L.D., 1998. Gene discovery by EST sequencing in Toxoplasma gondii reveals sequences restricted to the Apicomplexa. Genome Res. 8, 18–28. Asai, T., Miura, S., Sibley, L.D., Okabayashi, H., Takeuchi, T., 1995. Biochemical and molecular characterization of nucleoside triphosphate hydrolase isozymes from the parasitic protozoan Toxoplasma gondii. J. Biol. Chem. 270, 11391–11397. Asai, T., Howe, D.K., Nakajima, K., Nozaki, T., Takeuchi, T., Sibley, L.D., 1998. Neospora caninum: tachyzoites express a potent type-I nucleoside triphosphate hydrolase. Exp. Parasitol. 90, 277–285. Ashford, R.W., 2000. The leishmaniases as emerging and reemerging zoonoses. Int. J. Parasitol. 30, 1269–1281. Bankier, A.T., Spriggs, H.F., Fartmann, B., Konfortov, B.A., Madera, M., Vogel, C., Teichmann, S.A., Ivens, A., Dear, P.H., 2003. Integrated mapping, chromosomal sequencing and sequence analysis of Cryptosporidium parvum. Genome Res. 13, 1787–1799. Barnes, D.A., Bonnin, A., Huang, J.X., Gousset, L., Wu, J., Gut, J., Doyle, P., Dubremetz, J.F., Ward, H., Petersen, C., 1998. A novel multidomain mucin-like glycoprotein of Cryptosporidium parvum mediates invasion. Mol. Biochem. Parasitol. 96, 93–110.
Berry, A.E., Gardner, M.J., Caspers, G.J., Roos, D.S., Berriman, M., 2004. Curation of the Plasmodium falciparum genome. Trends Parasitol. 20, 548–552. Blaxter, M., Aslett, M., Guiliano, D., Daub, J., 1999. Parasitic helminth genomics. Filarial genome project. Parasitology 118 (Suppl.), S39–S51. Blaxter, M., Daub, J., Guiliano, D., Parkinson, J., Whitton, C., 2002. The Brugia malayi genome project: expressed sequence tags and gene discovery. Trans. R. Soc. Trop. Med. Hyg. 96, 7–17. Blythe, J.E., Surentheran, T., Preiser, P.R., 2004. STEVOR-a multifunctional protein? Mol. Biochem. Parasitol. 134, 11–15. Boothroyd, J.C., Hehl, A., Knoll, L.J., Manger, I.D., 1998. The surface of Toxoplasma: more and less. Int. J. Parasitol. 28, 3–9. Bowman, S., Lawson, D., Basham, D., Brown, D., Chillingworth, T., Churcher, C.M., Craig, A., Davies, R.M., Devlin, K., Feltwell, T., Gentles, S., Gwilliam, R., Hamlin, N., Harris, D., Holroyd, S., Hornsby, T., Horrocks, P., Jagels, K., Jassal, B., Kyes, S., McLean, J., Moule, S., Mungall, K., Murphy, L., Oliver, K., Quail, M.A., Rajandream, M.-A., Rutter, S., Skelton, J., Squares, R., Squares, S., Sulston, J.E., Whitehead, S., Woodward, J.R., Newbold, C., Barrell, B.G., 1999. The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum. Nature 400, 532–538. Bringaud, F., Biteau, N., Melville, S.E., Hez, S., El-Sayed, N.M., Leech, V., Berriman, M., Hall, N., Donelson, J.E., Baltz, T., 2002. A new,
expressed multigene family containing a hot spot for insertion of retroelements is associated with polymorphic subtelomeric regions of Trypanosoma brucei. Eukaryot. Cell 1, 137–151. Butler, D., 2004. African labs win major role in tsetse-fly genome project. Nature 427, 384. Cano, M.I., Gruber, A., Vazquez, M., Cortes, A., Levin, M.J., Gonzalez, A., Degrave, W., Rondinelli, E., Zingales, B., Ramirez, J.L., Alonso, C., Requena, J.M., da Silveira, J.F., 1995. Molecular karyotype of clone CL Brener chosen for the Trypanosoma cruzi genome project. Mol. Biochem. Parasitol. 71, 273–278. Carlton, J., 2003. The Plasmodium vivax genome sequencing project. Trends Parasitol. 19, 227–231. Carlton, J.M., Angiuoli, S.V., Suh, B.B., Kooij, T.W., Pertea, M., Silva, J.C., Ermolaeva, M.D., Allen, J.E., Selengut, J.D., Koo, H.L., Peterson, J.D., Pop, M., Kosack, D.S., Shumway, M.F., Bidwell, S.L., Shallom, S.J., van Aken, S.E., Riedmuller, S.B., Feldblyum, T.V., Cho, J.K., Quackenbush, J., Sedegah, M., Shoaibi, A., Cummings, L.M., Florens, L., Yates, J.R., Raine, J.D., Sinden, R.E., Harris, M.A., Cunningham, D.A., Preiser, P.R., Bergman, L.W., Vaidya, A.B., van Lin, L.H., Janse, C.J., Waters, A.P., Smith, H.O., White, O.R., Salzberg, S.L., Venter, J.C., Fraser, C.M., Hoffman, S.L., Gardner, M.J., Carucci, D.J., 2002. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 419, 512–519. Carucci, D.J., Gardner, M.J., Tettelin, H., Cummings, L.M., Smith, H.O., Adams, M.D., Hoffman, S.L., Venter, J.C., 1998. The malaria genome sequencing project. Exp. Rev. Mol. Med. 1998, 1–9. Castro, C., Craig, S.P., Castaneda, M., 1981. Genome organization and ploidy number in Trypanosoma cruzi. Mol. Biochem. Parasitol. 4, 273–282. Cevallos, A.M., Bhat, N., Verdon, R., Hamer, D.H., Stein, B., Tzipori, S., Pereira, M.E., Keusch, G.T., Ward, H.D., 2000. 
Mediation of Cryptosporidium parvum infection in vitro by mucin-like glycoproteins defined by a neutralizing monoclonal antibody. Infect. Immun. 68, 5167–5175. Christophides, G.K., Vlachou, D., Kafatos, F.C., 2004. Comparative and functional genomics of the innate immune system in the malaria vector Anopheles gambiae. Immunol. Rev. 198, 127–148. Claverie, J.-M., Notredame, C., 2003. Bioinformatics for Dummies. Wiley, New York, NY. Coppel, R.L., Black, C.G., 1998. Malaria parasite DNA. In: Sherman, I.W. (Ed.), Malaria: Parasite Biology, Pathogenesis and Protection. ASM Press, New York, NY, pp. 185–202. Coulson, R.M., Hall, N., Ouzounis, C.A., 2004. Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum. Genome Res. 14, 1548–1554. Craig, S.P., Castro, C., Eakin, A.E., Castaneda, M., 1990. Trypanosoma (Schizotrypanum) cruzi: repetitive DNA sequence evolution in three geographically distinct isolates. Comp. Biochem. Physiol. B 95, 657–662. del Portillo, H.A., Fernandez-Becerra, C., Bowman, S., Oliver, K., Preuss, M., Sanchez, C.P., Schneider, N.K., Villalobos, J.M., Rajandream, M.A., Harris, D., Pereira da Silva, L.H., Barrell, B., Lanzer, M., 2001. A superfamily of variant genes encoded in the subtelomeric region of Plasmodium vivax. Nature 410, 839–842. Dimopoulos, G., Christophides, G.K., Meister, S., Schultz, J., White, K.P., Barillas-Mury, C., Kafatos, F.C., 2002. Genome expression analysis of Anopheles gambiae: responses to injury, bacterial challenge, and malaria infection. Proc. Natl Acad. Sci. USA 99, 8814–8819. Doolittle, R.F., 2002. The grand assault. Nature 419, 493–494. Dorin, D., Semblat, J.P., Poullet, P., Alano, P., Goldring, J.P., Whittle, C., Patterson, S., Chakrabarti, D., Doerig, C., 2005. PfPK7, an atypical MEK-related protein kinase, reflects the absence of classical three-component MAPK pathways in the human malaria parasite Plasmodium falciparum. Mol. Microbiol. 55, 184–186. Ersfeld, K., 2003.
Genomes and genome projects of protozoan parasites. Curr. Issues Mol. Biol. 5, 61–74.

Fidock, D.A., Nomura, T., Talley, A.K., Cooper, R.A., Dzekunov, S.M., Ferdig, M.T., Ursos, L.M., Sidhu, A.B., Naude, B., Deitsch, K.W., Su, X.Z., Wootton, J.C., Roepe, P.D., Wellems, T.E., 2000. Mutations in the P. falciparum digestive vacuole transmembrane protein PfCRT and evidence for their role in chloroquine resistance. Mol. Cell 6, 861–871.
Gardner, M.J., 1999. The genome of the malaria parasite. Curr. Opin. Genet. Dev. 9, 704–708.
Gardner, M.J., Tettelin, H., Carucci, D.J., Cummings, L.M., Aravind, L., Koonin, E.V., Shallom, S., Mason, T., Yu, K., Fujii, C., Pederson, J., Shen, K., Jing, J.P., Aston, C., Lai, Z.W., Schwartz, D.C., Pertea, M., Salzberg, S., Zhou, L.X., Sutton, G.G., Clayton, R., White, O., Smith, H.O., Fraser, C.M., Adams, M.D., Venter, J.C., Hoffman, S.L., 1998. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science 282, 1126–1132.
Gardner, M.J., Hall, N., Fung, E., White, O., Berriman, M., Hyman, R.W., Carlton, J.M., Pain, A., Nelson, K.E., Bowman, S., Paulsen, I.T., James, K., Eisen, J.A., Rutherford, K., Salzberg, S.L., Craig, A., Kyes, S., Chan, M.S., Nene, V., Shallom, S.J., Suh, B., Peterson, J., Angiuoli, S., Pertea, M., Allen, J., Selengut, J., Haft, D., Mather, M.W., Vaidya, A.B., Martin, D.M., Fairlamb, A.H., Fraunholz, M.J., Roos, D.S., Ralph, S.A., McFadden, G.I., Cummings, L.M., Subramanian, G.M., Mungall, C., Venter, J.C., Carucci, D.J., Hoffman, S.L., Newbold, C., Davis, R.W., Fraser, C.M., Barrell, B., 2002a. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419, 498–511.
Gardner, M.J., Shallom, S.J., Carlton, J.M., Salzberg, S.L., Nene, V., Shoaibi, A., Ciecko, A., Lynn, J., Rizzo, M., Weaver, B., Jarrahi, B., Brenner, M., Parvizi, B., Tallon, L., Moazzez, A., Granger, D., Fujii, C., Hansen, C., Pederson, J., Feldblyum, T., Peterson, J., Suh, B., Angiuoli, S., Pertea, M., Allen, J., Selengut, J., White, O., Cummings, L.M., Smith, H.O., Adams, M.D., Venter, J.C., Carucci, D.J., Hoffman, S.L., Fraser, C.M., 2002b. Sequence of Plasmodium falciparum chromosomes 2, 10, 11 and 14. Nature 419, 531–534.
Ghedin, E., Bringaud, F., Peterson, J., Myler, P., Berriman, M., Ivens, A., Andersson, B., Bontempi, E., Eisen, J., Angiuoli, S., Wanless, D., Von Arx, A., Murphy, L., Lennard, N., Salzberg, S., Adams, M.D., White, O., Hall, N., Stuart, K., Fraser, C.M., El-Sayed, N.M., 2004a. Gene synteny and evolution of genome architecture in trypanosomatids. Mol. Biochem. Parasitol. 134, 183–191.
Ghedin, E., Wang, S., Foster, J.M., Slatko, B.E., 2004b. First sequenced genome of a parasitic nematode. Trends Parasitol. 20, 151–153.
Gilinger, G., Bellofatto, V., 2001. Trypanosome spliced leader RNA genes contain the first identified RNA polymerase II gene promoter in these organisms. Nucleic Acids Res. 29, 1556–1564.
Gonzalez, A., Prediger, E., Huecas, M.E., Nogueira, N., Lizardi, P.M., 1984. Minichromosomal repetitive DNA in Trypanosoma cruzi: its use in a high-sensitivity parasite detection assay. Proc. Natl Acad. Sci. USA 81, 3356–3360.
Guerin, P.J., Olliaro, P., Sundar, S., Boelaert, M., Croft, S.L., Desjeux, P., Wasunna, M.K., Bryceson, A.D., 2002. Visceral leishmaniasis: current status of control, diagnosis, and treatment, and a proposed research and development agenda. Lancet Infect. Dis. 2, 494–501.
Guerra-Giraldez, C., Quijada, L., Clayton, C.E., 2002. Compartmentation of enzymes in a microbody, the glycosome, is essential in Trypanosoma brucei. J. Cell Sci. 115, 2651–2658.
Guiliano, D.B., Hong, X., McKerrow, J.H., Blaxter, M.L., Oksov, Y., Liu, J., Ghedin, E., Lustigman, S., 2004. A gene family of cathepsin L-like proteases of filarial nematodes are associated with larval molting and cuticle and eggshell remodeling. Mol. Biochem. Parasitol. 136, 227–242.
Gull, K., 2001. The biology of kinetoplastid parasites: insights and challenges from genomics and post-genomics. Int. J. Parasitol. 31, 443–452.
Hall, N., Pain, A., Berriman, M., Churcher, C., Harris, B., Harris, D., Mungall, K., Bowman, S., Atkin, R., Baker, S., Barron, A., Brooks, K., Buckee, C.O., Burrows, C., Cherevach, I., Chillingworth, C., Chillingworth, T., Christodoulou, Z., Clark, L., Clark, R., Corton, C., Cronin, A., Davies, R., Davis, P., Dear, P., Dearden, F., Doggett, J., Feltwell, T., Goble, A., Goodhead, I., Gwilliam, R., Hamlin, N., Hance, Z., Harper, D., Hauser, H., Hornsby, T., Holroyd, S., Horrocks, P., Humphray, S., Jagels, K., James, K.D., Johnson, D., Kerhornou, A., Knights, A., Konfortov, B., Kyes, S., Larke, N., Lawson, D., Lennard, N., Line, A., Maddison, M., McLean, J., Mooney, P., Moule, S., Murphy, L., Oliver, K., Ormond, D., Price, C., Quail, M.A., Rabbinowitsch, E., Rajandream, M.A., Rutter, S., Rutherford, K.M., Sanders, M., Simmonds, M., Seeger, K., Sharp, S., Smith, R., Squares, R., Squares, S., Stevens, K., Taylor, K., Tivey, A., Unwin, L., Whitehead, S., Woodward, J., Sulston, J.E., Craig, A., Newbold, C., Barrell, B.G., 2002. Sequence of Plasmodium falciparum chromosomes 1, 3–9 and 13. Nature 419, 527–531.
Hall, N., Karras, M., Raine, J.D., Carlton, J.M., Kooij, T.W., Berriman, M., Florens, L., Janssen, C.S., Pain, A., Christophides, G.K., James, K., Rutherford, K., Harris, B., Harris, D., Churcher, C., Quail, M.A., Ormond, D., Doggett, J., Trueman, H.E., Mendoza, J., Bidwell, S.L., Rajandream, M.A., Carucci, D.J., Yates 3rd, J.R., Kafatos, F.C., Janse, C.J., Barrell, B., Turner, C.M., Waters, A.P., Sinden, R.E., 2005. A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science 307, 82–86.
Hannaert, V., Saavedra, E., Duffieux, F., Szikora, J.P., Rigden, D.J., Michels, P.A., Opperdoes, F.R., 2003. Plant-like traits associated with metabolism of Trypanosoma parasites. Proc. Natl Acad. Sci. USA 100, 1067–1071.
Hashmi, S., Tawe, W., Lustigman, S., 2001. Caenorhabditis elegans and the study of gene function in parasites. Trends Parasitol. 17, 387–393.
Henriquez, F.L., Richards, T.A., Roberts, F., McLeod, R., Roberts, C.W., 2005. The unusual mitochondrial compartment of Cryptosporidium parvum. Trends Parasitol. 21, 68–74.
Hertz-Fowler, C., Berriman, M., 2004. Continuing tsetse and Trypanosoma genome sequencing projects. Trends Parasitol. 20, 308–310.
Hettmann, C., Herm, A., Geiter, A., Frank, B., Schwarz, E., Soldati, T., Soldati, D., 2000. A dibasic motif in the tail of a class XIV apicomplexan myosin is an essential determinant of plasma membrane localization. Mol. Biol. Cell 11, 1385–1400.
Holt, R.A., Subramanian, G.M., Halpern, A., Sutton, G.G., Charlab, R., Nusskern, D.R., Wincker, P., Clark, A.G., Ribeiro, J.M., Wides, R., Salzberg, S.L., Loftus, B., Yandell, M., Majoros, W.H., Rusch, D.B., Lai, Z., Kraft, C.L., Abril, J.F., Anthouard, V., Arensburger, P., Atkinson, P.W., Baden, H., de Berardinis, V., Baldwin, D., Benes, V., Biedler, J., Blass, C., Bolanos, R., Boscus, D., Barnstead, M., Cai, S., Center, A., Chaturverdi, K., Christophides, G.K., Chrystal, M.A., Clamp, M., Cravchik, A., Curwen, V., Dana, A., Delcher, A., Dew, I., Evans, C.A., Flanigan, M., Grundschober-Freimoser, A., Friedli, L., Gu, Z., Guan, P., Guigo, R., Hillenmeyer, M.E., Hladun, S.L., Hogan, J.R., Hong, Y.S., Hoover, J., Jaillon, O., Ke, Z., Kodira, C., Kokoza, E., Koutsos, A., Letunic, I., Levitsky, A., Liang, Y., Lin, J.J., Lobo, N.F., Lopez, J.R., Malek, J.A., McIntosh, T.C., Meister, S., Miller, J., Mobarry, C., Mongin, E., Murphy, S.D., O’Brochta, D.A., Pfannkoch, C., Qi, R., Regier, M.A., Remington, K., Shao, H., Sharakhova, M.V., Sitter, C.D., Shetty, J., Smith, T.J., Strong, R., Sun, J., Thomasova, D., Ton, L.Q., Topalis, P., Tu, Z., Unger, M.F., Walenz, B., Wang, A., Wang, J., Wang, M., Wang, X., Woodford, K.J., Wortman, J.R., Wu, M., Yao, A., Zdobnov, E.M., Zhang, H., Zhao, Q., Zhao, S., Zhu, S.C., Zhimulev, I., Coluzzi, M., della Torre, A., Roth, C.W., Louis, C., Kalush, F., Mural, R.J., Myers, E.W., Adams, M.D., Smith, H.O., Broder, S., Gardner, M.J., Fraser, C.M., Birney, E., Bork, P., Brey, P.T., Venter, J.C., Weissenbach, J., Kafatos, F.C., Collins, F.H., Hoffman, S.L., 2002. The genome sequence of the malaria mosquito Anopheles gambiae. Science 298, 129–149.
Huestis, R., Fischer, K., 2001. Prediction of many new exons and introns in P. falciparum chromosome 2. Mol. Biochem. Parasitol. 118, 187–199.
Hyman, R.W., Fung, E., Conway, A., Kurdi, O., Mao, J., Miranda, M., Nakao, B., Rowley, D., Tamaki, T., Wang, F., Davis, R.W., 2002. Sequence of Plasmodium falciparum chromosome 12. Nature 419, 534–537.
Ilg, T., 2000. Lipophosphoglycan is not required for infection of macrophages or mice by Leishmania mexicana. Eur. Mol. Biol. Organ. J. 19, 1953–1962.
Ivens, A.C., Lewis, S.M., Bagherzadeh, A., Zhang, L., Chan, H.M., Smith, D.F., 1998. A physical map of the Leishmania major Friedlin genome. Genome Res. 8, 135–145.
Janssen, C.S., Barrett, M.P., Lawson, D., Quail, M.A., Harris, D., Bowman, S., Phillips, R.S., Turner, C.M., 2001. Gene discovery in Plasmodium chabaudi by genome survey sequencing. Mol. Biochem. Parasitol. 113, 251–260.
Janssen, C.S., Barrett, M.P., Turner, C.M., Phillips, R.S., 2002. A large gene family for putative variant antigens shared by human and rodent malaria parasites. Proc. R. Soc. Lond. B Biol. Sci. 269, 431–436.
Johnson, P.J., Kooter, J.M., Borst, P., 1987. Inactivation of transcription by UV irradiation of T. brucei provides evidence for a multicistronic transcription unit including a VSG gene. Cell 51, 273–281.
Jomaa, H., Wiesner, J., Sanderbrand, S., Altincicek, B., Weidemeyer, C., Hintz, M., Turbachova, I., Eberl, M., Zeidler, J., Lichtenthaler, H.K., Soldati, D., Beck, E., 1999. Inhibitors of the nonmevalonate pathway of isoprenoid biosynthesis as antimalarial drugs. Science 285, 1573–1576.
Kamath, R.S., Ahringer, J., 2003. Genome-wide RNAi screening in Caenorhabditis elegans. Methods 30, 313–321.
Kamath, R.S., Fraser, A.G., Dong, Y., Poulin, G., Durbin, R., Gotta, M., Kanapin, A., Le Bot, N., Moreno, S., Sohrmann, M., Welchman, D.P., Zipperlen, P., Ahringer, J., 2003. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421, 231–237.
Kappe, S.H., Gardner, M.J., Brown, S.M., Ross, J., Matuschewski, K., Ribeiro, J.M., Adams, J.H., Quackenbush, J., Cho, J., Carucci, D.J., Hoffman, S.L., Nussenzweig, V., 2001. Exploring the transcriptome of the malaria sporozoite stage. Proc. Natl Acad. Sci. USA 98, 9895–9900.
Kaviratne, M., Khan, S.M., Jarra, W., Preiser, P.R., 2002. Small variant STEVOR antigen is uniquely located within Maurer’s clefts in Plasmodium falciparum-infected red blood cells. Eukaryot. Cell 1, 926–935.
Keeling, P.J., 2004. Reduction and compaction in the genome of the apicomplexan parasite Cryptosporidium parvum. Dev. Cell 6, 614–616.
Kim, K., Weiss, L.M., 2004. Toxoplasma gondii: the model apicomplexan. Int. J. Parasitol. 34, 423–432.
Kissinger, J.C., Roos, D.S., 2004. Getting the most out of bioinformatics resources. In: Waters, A.P., Janse, C.J. (Eds.), Malaria Parasites: Genomes and Molecular Biology. Caister Academic Press, Norfolk, pp. 65–99.
Kissinger, J.C., Brunk, B.P., Crabtree, J., Fraunholz, M.J., Gajria, B., Milgram, A.J., Pearson, D.S., Schug, J., Bahl, A., Diskin, S.J., Ginsburg, H., Grant, G.R., Gupta, D., Labo, P., Li, L., Mailman, M.D., McWeeney, S.K., Whetzel, P., Stoeckert, C.J., Roos, D.S., 2002. The Plasmodium genome database. Nature 419, 490–492.
Kissinger, J.C., Gajria, B., Li, L., Paulsen, I.T., Roos, D.S., 2003. ToxoDB: accessing the Toxoplasma gondii genome. Nucleic Acids Res. 31, 234–236.
Krungkrai, J., 2004. The multiple roles of the mitochondrion of the malarial parasite. Parasitology 129, 511–524.
Kyes, S.A., Rowe, J.A., Kriek, N., Newbold, C.I., 1999. Rifins: a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Proc. Natl Acad. Sci. USA 96, 9333–9338.
Kyes, S., Horrocks, P., Newbold, C., 2001. Antigenic variation at the infected red cell surface in malaria. Ann. Rev. Microbiol. 55, 673–707.
Lai, Z., Jing, J., Aston, C., Clarke, V., Apodaca, J., Dimalanta, E.T., Carucci, D.J., Gardner, M.J., Mishra, B., Anantharaman, T.S., Paxia, S., Hoffman, S.L., Craig Venter, J., Huff, E.J., Schwartz, D.C., 1999. A shotgun optical map of the entire Plasmodium falciparum genome. Nat. Genet. 23, 309–313.


Lee, S.S., Lee, R.Y., Fraser, A.G., Kamath, R.S., Ahringer, J., Ruvkun, G., 2003. A systematic RNAi screen identifies a critical role for mitochondria in C. elegans longevity. Nat. Genet. 33, 40–48.
Lehane, M.J., Aksoy, S., Gibson, W., Kerhornou, A., Berriman, M., Hamilton, J., Soares, M.B., Bonaldo, M.F., Lehane, S., Hall, N., 2003. Adult midgut expressed sequence tags from the tsetse fly Glossina morsitans morsitans and expression analysis of putative immune response genes. Genome Biol. 4, R63.
Lesk, A.M., 2002. Introduction to Bioinformatics. Oxford University Press, Oxford.
Lettre, G., Kritikou, E.A., Jaeggi, M., Calixto, A., Fraser, A.G., Kamath, R.S., Ahringer, J., Hengartner, M.O., 2004. Genome-wide RNAi identifies p53-dependent and -independent regulators of germ cell apoptosis in C. elegans. Cell Death Differ. 11, 1198–1203.
Li, L., Brunk, B.P., Kissinger, J.C., Pape, D., Tang, K., Cole, R.H., Martin, J., Wylie, T., Dante, M., Fogarty, S.J., Howe, D.K., Liberator, P., Diaz, C., Anderson, J., White, M., Jerome, M.E., Johnson, E.A., Radke, J.A., Stoeckert Jr., C.J., Waterston, R.H., Clifton, S.W., Roos, D.S., Sibley, L.D., 2003. Gene discovery in the apicomplexa as revealed by EST sequencing and assembly of a comparative gene database. Genome Res. 13, 443–454.
Martinez-Calvillo, S., Yan, S., Nguyen, D., Fox, M., Stuart, K., Myler, P.J., 2003. Transcription of Leishmania major Friedlin chromosome 1 initiates in both directions within a single region. Mol. Cell 11, 1291–1299.
Martinez-Calvillo, S., Nguyen, D., Stuart, K., Myler, P.J., 2004. Transcription initiation and termination on Leishmania major chromosome 3. Eukaryot. Cell 3, 506–517.
McReynolds, L.A., DeSimone, S.M., Williams, S.A., 1986. Cloning and comparison of repeated DNA sequences from the human filarial parasite Brugia malayi and the animal parasite Brugia pahangi. Proc. Natl Acad. Sci. USA 83, 797–801.
McRobert, L., Preiser, P., Sharp, S., Jarra, W., Kaviratne, M., Taylor, M.C., Renia, L., Sutherland, C.J., 2004. Distinct trafficking and localization of STEVOR proteins in three stages of the Plasmodium falciparum life cycle. Infect. Immun. 72, 6597–6602.
Melville, S.E., Leech, V., Gerrard, C.S., Tait, A., Blackwell, J.M., 1998. The molecular karyotype of the megabase chromosomes of Trypanosoma brucei and the assignment of chromosome markers. Mol. Biochem. Parasitol. 94, 155–173.
Melville, S.E., Leech, V., Navarro, M., Cross, G.A., 2000. The molecular karyotype of the megabase chromosomes of Trypanosoma brucei stock 427. Mol. Biochem. Parasitol. 111, 261–273.
Militello, K.T., Dodge, M., Bethke, L., Wirth, D.F., 2004. Identification of regulatory elements in the Plasmodium falciparum genome. Mol. Biochem. Parasitol. 134, 75–88.
Miller, S.K., Good, R.T., Drew, D.R., Delorenzi, M., Sanders, P.R., Hodder, A.N., Speed, T.P., Cowman, A.F., de Koning-Ward, T.F., Crabb, B.S., 2002. A subset of Plasmodium falciparum SERA genes are expressed and appear to play an important role in the erythrocytic cycle. J. Biol. Chem. 277, 47524–47532.
Morgan-Ryan, U.M., Fall, A., Ward, L.A., Hijjawi, N., Sulaiman, I., Fayer, R., Thompson, R.C., Olson, M., Lal, A., Xiao, L., 2002. Cryptosporidium hominis n. sp. (Apicomplexa: Cryptosporidiidae) from Homo sapiens. J. Eukaryot. Microbiol. 49, 433–440.
Myler, P.J., Audleman, L., deVos, T., Hixson, G., Kiser, P., Lemley, C., Magness, C., Rickel, E., Sisk, E., Sunkin, S., Swartzell, S., Westlake, T., Bastien, P., Fu, G., Ivens, A., Stuart, K., 1999. Leishmania major Friedlin chromosome 1 has an unusual distribution of protein-coding genes. Proc. Natl Acad. Sci. USA 96, 2902–2906.
Myler, P.J., Sisk, E., McDonagh, P.D., Martinez-Calvillo, S., Schnaufer, A., Sunkin, S.M., Yan, S., Madhubala, R., Ivens, A., Stuart, K., 2000. Genomic organization and gene function in Leishmania. Biochem. Soc. Trans. 28, 527–531.
Myler, P.J., Beverley, S.M., Cruz, A.K., Dobson, D.E., Ivens, A.C., McDonagh, P.D., Madhubala, R., Martinez-Calvillo, S., Ruiz, J.C., Saxena, A., Sisk, E., Sunkin, S.M., Worthey, E., Yan, S., Stuart, K.D., 2001. The Leishmania genome project: new insights into gene organization and function. Med. Microbiol. Immunol. (Berl.) 190, 9–12.
Myung, K.S., Beetham, J.K., Wilson, M.E., Donelson, J.E., 2002. Comparison of the post-transcriptional regulation of the mRNAs for the surface proteins PSA (GP46) and MSP (GP63) of Leishmania chagasi. J. Biol. Chem. 277, 16489–16497.
Parsons, M., Nelson, R.G., Watkins, K.P., Agabian, N., 1984. Trypanosome mRNAs share a common 5′ spliced leader sequence. Cell 38, 309–316.
Pays, E., Vanhamme, L., Perez-Morga, D., 2004. Antigenic variation in Trypanosoma brucei: facts, challenges and mysteries. Curr. Opin. Microbiol. 7, 369–374.
Pevsner, J., 2003. Bioinformatics and Functional Genomics. Wiley, New York, NY.
Pinder, J.C., Fowler, R.E., Dluzewski, A.R., Bannister, L.H., Lavin, F.M., Mitchell, G.H., Wilson, R.J., Gratzer, W.B., 1998. Actomyosin motor in the merozoite of the malaria parasite, Plasmodium falciparum: implications for red cell invasion. J. Cell Sci. 111, 1831–1839.
Pollack, Y., Katzen, A.L., Spira, D.T., Golenser, J., 1982. The genome of Plasmodium falciparum. I: DNA base composition. Nucleic Acids Res. 10, 539–546.
Pothof, J., van Haaften, G., Thijssen, K., Kamath, R.S., Fraser, A.G., Ahringer, J., Plasterk, R.H., Tijsterman, M., 2003. Identification of genes that protect the C. elegans genome against mutations by genome-wide RNAi. Genes Dev. 17, 443–448.
Puiu, D., Enomoto, S., Buck, G.A., Abrahamsen, M.S., Kissinger, J.C., 2004. CryptoDB: the Cryptosporidium genome resource. Nucleic Acids Res. 32, D329–D331.
Putignani, L., Tait, A., Smith, H.V., Horner, D., Tovar, J., Tetley, L., Wastling, J.M., 2004. Characterization of a mitochondrion-like organelle in Cryptosporidium parvum. Parasitology 129, 1–18.
Quackenbush, J., Cho, J., Lee, D., Liang, F., Holt, I., Karamycheva, S., Parvizi, B., Pertea, G., Sultana, R., White, J., 2001. The TIGR gene indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29, 159–164.
Rasti, N., Wahlgren, M., Chen, Q., 2004. Molecular aspects of malaria pathogenesis. Fed. Eur. Microbiol. Soc. Immunol. Med. Microbiol. 41, 9–26.
Requena, J.M., Lopez, M.C., Alonso, C., 1996. Genomic repetitive DNA elements of Trypanosoma cruzi. Parasitol. Today 12, 279–283.
Roos, D.S., Donald, R.G., Morrissette, N.S., Moulton, A.L., 1994. Molecular tools for genetic dissection of the protozoan parasite Toxoplasma gondii. Methods Cell Biol. 45, 27–63.
Rudenko, G., Blundell, P.A., Dirks-Mulder, A., Kieft, R., Borst, P., 1995. A ribosomal DNA promoter replacing the promoter of a telomeric VSG gene expression site can be efficiently switched on and off in T. brucei. Cell 83, 547–553.
Seiwert, S.D., 1995. The ins and outs of editing RNA in kinetoplastids. Parasitol. Today 11, 362–368.
Sibley, L.D., Boothroyd, J.C., 1992. Construction of a molecular karyotype for Toxoplasma gondii. Mol. Biochem. Parasitol. 51, 291–300.
Simmer, F., Moorman, C., van der Linden, A.M., Kuijk, E., van den Berghe, P.V., Kamath, R.S., Fraser, A.G., Ahringer, J., Plasterk, R.H., 2003. Genome-wide RNAi of C. elegans using the hypersensitive rrf-3 strain reveals novel gene functions. PLoS Biol. 1, E12.
Soldati, D., Dubremetz, J.F., Lebrun, M., 2001. Microneme proteins: structural and functional requirements to promote adhesion and invasion by the apicomplexan parasite Toxoplasma gondii. Int. J. Parasitol. 31, 1293–1302.
Spath, G.F., Epstein, L., Leader, B., Singer, S.M., Avila, H.A., Turco, S.J., Beverley, S.M., 2000. Lipophosphoglycan is a virulence factor distinct from related glycoconjugates in the protozoan parasite Leishmania major. Proc. Natl Acad. Sci. USA 97, 9258–9263.
Striepen, B., Kissinger, J.C., 2004. Genomics meets transgenics in search of the elusive Cryptosporidium drug target. Trends Parasitol. 20, 355–358.

Striepen, B., Pruijssers, A.J., Huang, J., Li, C., Gubbels, M.J., Umejiego, N.N., Hedstrom, L., Kissinger, J.C., 2004. Gene transfer in the evolution of parasite nucleotide biosynthesis. Proc. Natl Acad. Sci. USA 101, 3154–3159.
Templeton, T.J., Iyer, L.M., Anantharaman, V., Enomoto, S., Abrahante, J.E., Subramanian, G.M., Hoffman, S.L., Abrahamsen, M.S., Aravind, L., 2004. Comparative analysis of apicomplexa and genomic diversity in eukaryotes. Genome Res. 14, 1686–1695.
Tetley, L., Brown, S.M., McDonald, V., Coombs, G.H., 1998. Ultrastructural analysis of the sporozoite of Cryptosporidium parvum. Microbiology 144, 3249–3255.
The C. elegans Sequencing Consortium, 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018.
Thompson, J., Janse, C.J., Waters, A.P., 2001. Comparative genomics in Plasmodium: a tool for the identification of genes and functional analysis. Mol. Biochem. Parasitol. 118, 147–154.
Topolska, A.E., Wang, L., Black, C.G., Coppel, R.L., 2004. Merozoite cell biology. In: Waters, A.P., Janse, C.J. (Eds.), Malaria Parasites: Genomes and Molecular Biology. Caister Academic Press, Norfolk, pp. 365–444.
Turco, S.J., Spath, G.F., Beverley, S.M., 2001. Is lipophosphoglycan a virulence factor? A surprising diversity between Leishmania species. Trends Parasitol. 17, 223–226.
Vanhamme, L., Pays, E., McCulloch, R., Barry, J.D., 2001. An update on antigenic variation in African trypanosomes. Trends Parasitol. 17, 338–343.
Vastenhouw, N.L., Fischer, S.E., Robert, V.J., Thijssen, K.L., Fraser, A.G., Kamath, R.S., Ahringer, J., Plasterk, R.H., 2003. A genome-wide screen identifies 27 genes involved in transposon silencing in C. elegans. Curr. Biol. 13, 1311–1316.
Waller, R.F., McFadden, G.I., 2005. The apicoplast: a review of the derived plastid of apicomplexan parasites. Curr. Issues Mol. Biol. 7, 57–79.
Waller, R.F., Keeling, P.J., Donald, R.G., Striepen, B., Handman, E., Lang-Unnasch, N., Cowman, A.F., Besra, G.S., Roos, D.S., McFadden, G.I., 1998. Nuclear-encoded proteins target to the plastid in Toxoplasma gondii and Plasmodium falciparum. Proc. Natl Acad. Sci. USA 95, 12352–12357.
Waters, A.P., 2002. Orthology between the genomes of Plasmodium falciparum and rodent malaria parasites: possible practical applications. Philos. Trans. R. Soc. Lond. B Biol. Sci. 357, 55–63.
Wickstead, B., Ersfeld, K., Gull, K., 2004. The small chromosomes of Trypanosoma brucei involved in antigenic variation are constructed around repetitive palindromes. Genome Res. 14, 1014–1024.
Williams, B.A., Keeling, P.J., 2003. Cryptic organelles in parasitic protists and fungi. Adv. Parasitol. 54, 9–68.


Wilson, R.J., Denny, P.W., Preiser, P.R., Rangachari, K., Roberts, K., Roy, A., Whyte, A., Strath, M., Moore, D.J., Moore, P.W., Williamson, D.H., 1996. Complete gene map of the plastid-like DNA of the malaria parasite Plasmodium falciparum. J. Mol. Biol. 261, 155–172.
Wincker, P., Ravel, C., Blaineau, C., Pages, M., Jauffret, Y., Dedet, J.P., Bastien, P., 1996. The Leishmania genome comprises 36 chromosomes conserved across widely divergent human pathogenic species. Nucleic Acids Res. 24, 1688–1694.
Worthey, E.A., Martinez-Calvillo, S., Schnaufer, A., Aggarwal, G., Cawthra, J., Fazelinia, G., Fong, C., Fu, G., Hassebrock, M., Hixson, G., Ivens, A.C., Kiser, P., Marsolini, F., Rickell, E., Salavati, R., Sisk, E., Sunkin, S.M., Stuart, K.D., Myler, P.J., 2003. Leishmania major chromosome 3 contains two long convergent polycistronic gene clusters separated by a tRNA gene. Nucleic Acids Res. 31, 4201–4210.
Xu, P., Widmer, G., Wang, Y., Ozaki, L.S., Alves, J.M., Serrano, M.G., Puiu, D., Manque, P., Akiyoshi, D., Mackey, A.J., Pearson, W.R., Dear, P.H., Bankier, A.T., Peterson, D.L., Abrahamsen, M.S., Kapur, V., Tzipori, S., Buck, G.A., 2004a. The genome of Cryptosporidium hominis. Nature 431, 1107–1112.
Xu, P., Widmer, G., Wang, Y., Ozaki, L.S., Alves, J.M., Serrano, M.G., Puiu, D., Manque, P., Akiyoshi, D., Mackey, A.J., Pearson, W.R., Dear, P.H., Bankier, A.T., Peterson, D.L., Abrahamsen, M.S., Kapur, V., Tzipori, S., Buck, G.A., 2004b. Corrigendum: the genome of Cryptosporidium hominis. Nature 432, 415.
Yan, S., Lodes, M.J., Fox, M., Myler, P.J., Stuart, K., 1999. Characterization of the Leishmania donovani ribosomal RNA promoter. Mol. Biochem. Parasitol. 103, 197–210.
Zhang, Y., Foster, J.M., Kumar, S., Fougere, M., Carlow, C.K., 2004. Cofactor-independent phosphoglycerate mutase has an essential role in Caenorhabditis elegans and is conserved in parasitic nematodes. J. Biol. Chem. 279, 37185–37190.
Zhou, S., Kile, A., Kvikstad, E., Bechner, M., Severin, J., Forrest, D., Runnheim, R., Churas, C., Anantharaman, T.S., Myler, P., Vogt, C., Ivens, A., Stuart, K., Schwartz, D.C., 2004. Shotgun optical mapping of the entire Leishmania major Friedlin genome. Mol. Biochem. Parasitol. 138, 97–106.
Zhu, G., Marchewka, M.J., Keithly, J.S., 2000. Cryptosporidium parvum appears to lack a plastid genome. Microbiology 146, 315–321.
Zomerdijk, J.C., Kieft, R., Shiels, P.G., Borst, P., 1991. Alpha-amanitin-resistant transcription units in trypanosomes: a comparison of promoter sequences for a VSG gene expression site and for the ribosomal RNA genes. Nucleic Acids Res. 19, 5153–5158.