Gene family phylogeny and the evolution of parasite cell surfaces


G Model

ARTICLE IN PRESS

MOLBIO-10970; No. of Pages 12

Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Molecular & Biochemical Parasitology

Andrew P. Jackson

Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, Liverpool Science Park Ic2, 146 Brownlow Hill, Liverpool L3 5RF, UK

Article info

Article history: Received 25 January 2016; received in revised form 18 March 2016; accepted 19 March 2016; available online xxx

Keywords: Cell-surface; Gene family; Phylogeny; Orthology; Paralogy; Protolog

Abstract

Parasite genomes typically contain unique contingency gene families encoding multi-copy effector proteins that are often expressed abundantly on the parasite cell surface and beyond. The functions of these gene families are incompletely understood but it is clear that they perform fundamental roles at the host-parasite interface. Over evolutionary timescales, the evolution of these gene families is likely to have decisive effects on the pathology and virulence of parasitic infections. In this review, I will compare the evolutionary dynamics of multiple examples from trypanosomatids and apicomplexan parasites to demonstrate how their inherent mutability makes their phylogeny very different to ‘normal’ gene families. I will argue that phylogenetic analyses could help to understand the functions of these enigmatic genes. © 2016 Elsevier B.V. All rights reserved.

1. Introduction

Comparison of anything reveals similarities and differences. Comparison of genomes invariably shows us features that change rapidly in form and quantity, features that hardly change at all over immense time, and indeed, everything in between. The most changeable features of unicellular parasite genomes consistently pertain to cell surfaces, or to the secretory realm beyond. Hence, while trypanosomatid parasites (Trypanosoma, Leishmania, Leptomonas and others) display a common physiology and ultrastructure, their cell-surface molecules are lineage-specific and mutually exclusive [1–3]. Similarly, apicomplexan parasites (Plasmodium, Babesia, Eimeria, Toxoplasma and others) share an underlying ultrastructure and developmental regimen that is reflected in their genome content [4], but their cell surface architectures are so distinct that a common ancestral structure cannot currently be imagined [5,6]. The genomes of other unicellular parasites such as the plant-pathogenic Phytophthora spp. [7,8] and Entamoeba spp. [9] continue this trend.

Rapid evolutionary change of these genes is intuitive. The cell surface and its immediate environs comprise the host-parasite interface, and cell surface-expressed gene families have strong associations with pathology and virulence. No other compartment is subject to such powerful co-evolutionary pressures. This is reflected in the structural diversity of cell surface gene families,


often called contingency gene families [10] because precise regulation of the expression of structural isoforms allows pathogens to respond flexibly to diverse environmental pressures. To this we might also add the idea of contingency regions of genomes, e.g. sub-telomeres. These are typically outside of regular chromosomal cores, and house contingency gene loci in conditions that facilitate their specialized (often irregular) expression [4,11]. The true scale of parasite contingency gene families has only become apparent in the genome sequencing era. Plasmodium genomes can contain hundreds of pir genes [12,13]; African trypanosomes can have in excess of 2500 VSG genes in their genome [14–17]. In parasites with complex life cycles, multi-copy families are often developmentally regulated. However, it generally remains the case that we do not understand all (or any) of the functions they perform. In this review, I will present a series of phylogenies for parasite cell-surface gene families of trypanosomatids and apicomplexan parasites that reflect the diversity in evolutionary dynamics that exist. I will show how phylogenetic analysis of these genes can help to understand their enigmatic origins and the processes that regulate their conspicuous diversity. I will also advance the view that phylogenetics provides an experimental rationale for determining gene functions. Deciding which paralog(s) to manipulate in an experiment is a difficult but crucial question when dealing with large and diverse families. Phylogenetic analysis identifies differences in evolutionary dynamic among paralogous loci, the simplest being whether they are conserved or species-specific. It can identify the locus most likely to

http://dx.doi.org/10.1016/j.molbiopara.2016.03.007 0166-6851/© 2016 Elsevier B.V. All rights reserved.

Please cite this article in press as: A.P. Jackson, Gene family phylogeny and the evolution of parasite cell surfaces, Mol Biochem Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.03.007


Fig. 1. Homology and the definition of a protolog. Each phylogeny considers four species (A–D) that share a gene family with four sub-families and an ancestral locus (the ‘protolog’; boxed) that may or may not be present in an outgroup (‘OG’). (a) In the absence of gene duplication or recombination between paralogs, clades have widespread distributions, consisting of orthologs of each sub-family from each species. (b) With gene duplication within species but no recombination, clades are still widespread, but consist of several conspecific copies of each sub-family, themselves forming clades that are co-orthologous. (c) Under conditions of concerted evolution, (i.e. rapid turnover due to gene loss or conversion), there is loss of orthology and clades consist of paralogs from a single species. A single-copy protolog may be present.

represent the origin of a family, as well as distinct sub-families that are evolving under different selective environments. My contention is that such discontinuities are caused by functional differences and that, in the absence of a full understanding of functions a priori, the cladistic structure can guide our decisions as to which genes should be knocked out to expose the functional consequences of a particular evolutionary event, and which other genes we should employ in rescuing gene function to test hypotheses of redundancy or functional differentiation.

2. Cell-surface gene family phylogenetics

Given that parasite cell-surface gene families are known for their mutability and variation, phylogenetic analysis within and between species is crucial to understanding their biology. Interspecific comparisons seek to distinguish orthologs and paralogs. Orthologs are homologous gene copies present in different species and descended from a common ancestor. They generally retain the same genomic position in related species. By contrast, paralogs are descendants of a gene duplication event either in the same genome or an ancestor, often associated with the creation of a new locus in a different genomic context. Orthologs are thought more likely to maintain the same function in different species [18]; therefore, by segregating gene families into orthologous clades we begin the task of understanding functional evolution. All else being equal, we would expect orthologs to cluster together in a phylogeny. This is shown in Fig. 1(a) for a gene family in four species (A–D) consisting of four loci (plus a related but divergent locus that does not belong to the family, i.e. an ‘outgroup’). Each gene is present in each species, leading to clades of orthologs in the phylogeny. If recent gene duplications have occurred, we may instead see each species represented by a clade of paralogs (Fig. 1(b)), but these clades remain most closely related to paralogs in a different species (i.e. they are co-orthologs).

In the phylogenies of many cell-surface gene families concerted evolution complicates this simple distinction between orthologs and paralogs. Concerted evolution describes how, for a set of species each possessing a multi-copy gene family inherited from their ancestor, the copies in each species are more closely related to each other than they are to homologs in other species. We may think of concerted evolution occurring as ancestral sequence types (which retain the signature of orthology) are gradually replaced by recently derived sequences [19] (e.g. Fig. 1(c)). This may happen due to high gene turnover (i.e. rapid and random duplication and loss of gene copies) or gene conversion, whereby gene sequences are ‘overwritten’ by homologous donor sequences during the repair of DNA strand breaks [20,21]. Under these circumstances we observe loss of orthology. A literal interpretation of Fig. 1(c) would be that all gene copies had emerged after speciation. However, as we typically have good biological reasons for thinking that the ancestral state was similar to the derived states, we do not interpret this literally, but as evidence for rapid gene turnover. Multi-copy gene families and concerted evolution are not unique to parasites; nevertheless, they are a consistent feature of parasite genomes [7,16,22–24]. Besides the many paralogs of recent origin, these families often include rare members, which I will refer to as ‘protologs’ because they are the most likely to represent the ancestral state, prior to the elaboration of a gene family. Putative protologs usually have atypical structures and are located outside of the contingency region, (e.g. in a chromosome-internal locus). They are typically present as orthologs in multiple species both parasitic and free-living, and branch closest to the root in the phylogeny. Together, these various properties point to their ancient origins. 
Experimental approaches to the evolution of gene function clearly need to address a protolog, if there is one, in comparison with the derived functions of parasite-specific genes.
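The three patterns of Fig. 1 can be told apart from the species composition of a clade. The following is a minimal sketch only, not a method used in this review: it assumes a toy tree encoded as nested tuples with leaf labels of the hypothetical form `species|gene`, and it ignores branch support and the internal arrangement of conspecific copies.

```python
def leaves(tree):
    """Return every leaf label beneath a node (a leaf is a 'species|gene' string)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def classify_clade(tree):
    """Label a clade by species composition, following the patterns of Fig. 1."""
    species = [label.split("|")[0] for label in leaves(tree)]
    distinct = set(species)
    if len(distinct) == 1:
        # Every copy comes from one species: the Fig. 1(c) pattern.
        return "species-specific paralogs"
    if len(species) == len(distinct):
        # Exactly one copy per species: the Fig. 1(a) pattern.
        return "orthologs"
    # Several species, some with more than one copy: the Fig. 1(b) pattern.
    return "co-orthologs"


# Hypothetical clades for species A-D (labels are illustrative, not real genes).
fig1a = (("A|g1", "B|g1"), ("C|g1", "D|g1"))
fig1b = ((("A|g1a", "A|g1b"), ("B|g1a", "B|g1b")),
         (("C|g1a", "C|g1b"), ("D|g1a", "D|g1b")))
fig1c = (("A|g1", "A|g2"), ("A|g3", "A|g4"))

for name, clade in [("Fig. 1(a)", fig1a), ("Fig. 1(b)", fig1b), ("Fig. 1(c)", fig1c)]:
    print(name, classify_clade(clade))
```

A clade flagged as species-specific is only suggestive of concerted evolution when, as argued above, there are biological reasons to think the family predates speciation.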


Fig. 2. Maximum likelihood phylogeny of b-type Variant Surface Glycoprotein-like sequences in African trypanosomes, estimated from amino acid sequences using an LG + Γ model. The phylogeny is unrooted. Clades are labelled by species and sub-family, as identified previously [Jackson et al. [16]]. T. congolense has two distinct b-VSG clades, previously referred to as ‘Fam13’ and ‘Fam16’ [Jackson et al. [35]]. T. brucei sequences may be split between those encoding canonical variant antigens (‘b-type VSG’), those encoding invariant, VSG-related (‘VR’) proteins and ‘ESAG2’ (enclosed by a dashed line among ‘Fam13’). Bootstrap values are shown below subtending selected nodes.

3. Loss of orthology in contingency gene families

The phylogenies of contingency gene families display evolutionary dynamics that are profoundly abnormal, with a complete lack of orthology between genes in different parasite species. One example is the phylogeny of Variant Surface Glycoproteins (VSG) from three African trypanosome species, shown in Fig. 2. VSG are expressed as a continuous monolayer across the surface of African trypanosomes (Trypanosoma spp.) during their bloodstream (vertebrate) stage. While African trypanosome genomes contain hundreds of VSG genes [14–16], only a single VSG is expressed at any one time. Through sequential replacement of the VSG coat, their function is to evade the immune response through antigenic variation, ultimately resulting in recurrent parasitaemia and chronic infection [25,26]. The active VSG is only transcribed from one of several telomeric expression sites that have an independent promoter [27]. Transposition of an inactive VSG into the expression site through biased gene conversion, and/or epigenetic control of expression sites, is thought to cause antigenic switching [25,28].

In Fig. 2, VSG cluster by species; sequences from different trypanosomes do not mix, as they would if there were orthology between VSG in different species. Analyses of Trypanosoma brucei VSG diversity during infections and between strains have indicated that gene conversion is the principal mechanism of sequence evolution [16,29–31]. It is required not only for antigenic switching but also for the assembly of novel antigens during the chronic stages of infection. The T. brucei genome is adapted to promote ectopic recombination among VSG through the conservation of repetitive motifs immediately up- (i.e. the 70 bp repeat sequence) and downstream (i.e. the cysteine-rich C-terminal domain) of VSG variable domains [25,28]. Hence, we can imagine how ancestral genes shared between trypanosome species can quickly become ‘overwritten’ by this homogenizing effect after speciation. The fitness benefits of sequence evolution relate to antigenic diversity rather than any particular structure, meaning that VSG are functionally redundant.

Opposition to concerted evolution only arises when new functions are derived, creating negative fitness consequences for their homogenization. One example is ESAG2 in T. brucei, shown in Fig. 2. ESAG2 is an invariant VSG sub-family, located within the VSG expression site; the ESAG2 protein localizes to the cell body during the bloodstream stage [32]. Its precise function is unknown but it does not appear to be a variant antigen. Irrespective of its new function, ESAG2 clusters with VSG from T. congolense rather than T. brucei, indicating that it is an ancestral-type sequence that has resisted the homogenizing forces that have led to concerted evolution of T. brucei VSG generally [16]. The transferrin receptor (TFR) of T. brucei and T. congolense provides another example of neofunctionalization of VSG, albeit with an older origin [33]. This gene encodes a membrane transporter required for the uptake of transferrin, which provides the parasite with haem [34]. There are multiple copies of TFR in both T. brucei and T. congolense but in phylogenetic analyses, they cluster tightly together and distinct from related VSG sequences [16,35], demonstrating that they retain co-orthology. Like ESAG2, TFR phylogeny shows how a gene can ‘escape’ the concerted evolution typical of VSG, if it adopts a new function.

A similar pattern to VSG can be seen in the phylogeny of another variant antigen family (ves1) that encodes the Variant Erythrocyte Surface Antigen (VESA) in the apicomplexan hemoparasites Babesia spp. VESA is a heterodimeric protein encoded by the multi-copy ves1α and ves1β gene families in Babesia bovis [36,37], with similar situations in other Babesia spp. Like VSG, ves1 genes are expressed from specific sites (i.e. Loci of Active Transcription (LAT)) to ensure mono-allelic expression [38–40]. In a comparison of ves1-like genes in B. bovis, B. bigemina and B. divergens, concerted evolution was found to be complete among ves1 [23]. In addition, the Babesia comparison revealed truncated forms of ves1 in all three species, named ves2 [23], which have been independently derived from


Fig. 3. Plasmodium Interspersed Repeat (pir) evolution. (a) Maximum likelihood phylogeny of PIR protein sequences in Plasmodium spp., estimated using an LG + Γ model. The phylogeny is unrooted. Terminal nodes are labelled by species as shown in the key. Bootstrap values are shown below subtending selected nodes. (b) Relationship between genetic distance between Plasmodium species shown in (a) based on glyceraldehyde-3-phosphate dehydrogenase (GAPDH) nucleotide sequences and the frequency of orthology (two heterospecific sequences as sister taxa) or paraphyly (a sequence nested within a clade of heterospecific homologs). (c) A clade containing orthologs for an atypical pir sequence type (i.e. vir-D; Neafsey et al. [65]).


Fig. 4. Phylogeny of nucleoside transporter protein sequences in (a) apicomplexan parasites and (b) trypanosomatids, estimated using an LG + Γ model. Clades of orthologous or co-orthologous sequences are enclosed by shaded boxes. Nodes corresponding to basal gene duplication events are marked by black stars. Putative protologs are enclosed by dashed lines. The positions of four nucleoside transporter paralogs in L. major (NT1-4, see text) are noted. The trees are rooted using free-living outgroups to trypanosomatids (Bodo saltans) and apicomplexan parasites (Vitrella brassicaformis), which are written in bold. Bootstrap values are shown below subtending selected nodes.

the N-terminal domain of ves1 and are perhaps secreted [36,37]. Whatever their novel role is, ves2 represent structural derivations of ves1 that have subsequently remained distinct and resisted the concerted evolution affecting ves1. I would argue that loss of orthology among VSG and ves1 genes is due to their functional redundancy, and selection for antigenic diversity rather than any particular antigenic structure. Furthermore, adaptations for the promotion of sequence diversity ensure that contingency gene families are driven towards concerted evolution.

In this respect, both families evoke the var genes of Plasmodium falciparum, which encode the PfEMP1 proteins responsible for parasite sequestration [reviewed by Ref. [41]], the antigenic variation and hyper-variability of which are described elsewhere [42–46]. Among var genes one particular paralog, var2csa, stands out for its functional and evolutionary properties. VAR2CSA was first recognized as being uniquely expressed by P. falciparum infecting pregnant women, where it binds the placenta-specific ligand chondroitin sulfate A [47]. It was also noted early that var2csa belongs to an unusual subfamily of var that do not appear to be antigenically variant and are well conserved in all P. falciparum strains [48], indeed in related Plasmodium species [49,50]. Recent research has shown that var2csa is likely to provide more than a specific PfEMP1 isoform for malaria in pregnancy, since it is transcribed by all individuals [51–53], though VAR2CSA is seldom expressed [54,55]. In fact, Ukaegbu et al. [56] show that there may be another reason for the exceptional conservation of var2csa. They show that transcription of var2csa is a response to changes in the regulation of other var, and indeed var2csa occupies a pivotal position in the antigenic switching cascade in P. falciparum [56]. This is supportive of hypotheses that var2csa has been conserved as a key

regulator of var switching [48] and protected from the homogenizing effect that otherwise typifies the var repertoire.

While they are paramount in P. falciparum infections, most Plasmodium genomes lack var genes; instead, the Plasmodium interspersed repeat (pir) family is the largest multi-gene family [22,57–59]. PIR are transmembrane proteins exported by the parasite on, or close to, the surface of the infected red blood cell, or dispersed in the host cell cytoplasm [60–62]. Their function is unknown, but they do not appear to be variant antigens in the sense of VSG, ves1 or var, since many pir are expressed simultaneously and may be functionally non-redundant to some extent [13,63,64]. The pir phylogeny, shown in Fig. 3(a), has some now familiar features. Sequences cluster primarily by species, or closely related species groups. However, unlike the VSG and ves1 trees, there is overlap of pir sequences from different rodent-infecting Plasmodium (i.e. P. berghei, P. yoelii and P. chabaudi), and also among P. vivax, P. cynomolgi and P. knowlesi. The presence of clades that are taxonomically widespread demonstrates that concerted evolution is not complete and suggests that pir paralogs are sufficiently functionally differentiated that negative selection preserves certain distinct structures. Recent evidence suggests that pir are indeed functionally differentiated: PIR are expressed in distinct sub-cellular compartments of the red blood cells infected by P. vivax [61] and in distinct developmental life stages of P. berghei [13]. Nonetheless, there is complete loss of orthology when less-related genomes are compared. The proportion of orthologs can be seen to decline steeply with genetic divergence of Plasmodium species (Fig. 3(b)). Like all foregoing examples, pir sequences offer abundant evidence for recombination that would provide a mechanism for concerted evolution [13,64]. Hence, the impression given by pir phylogeny is of a slower pace of concerted evolution than previous examples,


Fig. 5. Phylogeny of Major Surface Protease (MSP) protein sequences in trypanosomatid parasites, estimated using an LG + Γ model. The tree is rooted using Bodo saltans. Clades are labelled by sub-family at right [see Yao [78]]. Terminal nodes are labelled by species as shown in the key. Bootstrap values are shown below subtending selected nodes. Putative protologs (MSP-D) are enclosed by dashed lines.


Fig. 6. Phylogeny of amastin protein sequences in trypanosomatid parasites, estimated using an LG + Γ model. The tree is rooted using Bodo saltans. Clades are labelled by sub-family at right [see Jackson [100]]. Terminal nodes are labelled by species as shown in the key. Bootstrap values are shown below subtending selected nodes. Putative protologs (α-amastin) are enclosed by dashed lines.

slowed perhaps by some degree of functional differentiation due to developmental regulation, but ultimately with the same result. Finally, the pir phylogeny once again offers an anomaly to the inexorable concerted evolution. One pir isoform, first identified as vir-D in P. vivax [65,66], is often present in single copy, and like var2csa it retains orthology across Plasmodium species (Fig. 3(c)). The function of this exceptional gene is not yet known but its position within the phylogeny is not inconsistent with a protolog.
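The caption of Fig. 3(b) defines orthology operationally as two heterospecific sequences appearing as sister taxa. A simplified version of that count can be sketched as below. This is an illustrative sketch under stated assumptions, not the analysis behind Fig. 3(b): the tree is a toy nested-tuple structure, the leaf labels (`Pb|1` and so on) are invented, and the frequency is taken over all sister pairs in one tree rather than per species pair against genetic distance.

```python
def cherries(tree):
    """Collect every pair of leaves that are each other's sister taxa."""
    if isinstance(tree, str):
        return []
    found = []
    if len(tree) == 2 and all(isinstance(child, str) for child in tree):
        found.append(tree)
    for child in tree:
        found.extend(cherries(child))
    return found


def orthology_frequency(tree):
    """Fraction of sister pairs that join sequences from two different species."""
    pairs = cherries(tree)
    if not pairs:
        return 0.0
    heterospecific = sum(
        1 for a, b in pairs if a.split("|")[0] != b.split("|")[0]
    )
    return heterospecific / len(pairs)


# Invented toy tree: one heterospecific sister pair out of four.
toy_tree = ((("Pb|1", "Py|1"), ("Pb|2", "Pb|3")),
            (("Pv|1", "Pv|2"), ("Pf|1", "Pf|2")))
print(orthology_frequency(toy_tree))
```

Under complete concerted evolution every sister pair would be conspecific and the frequency would fall to zero, which is the limit the pir family approaches as more divergent genomes are compared.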

4. Orthology and functional differentiation

Typically, gene duplications are selected for because paralogs acquire novel roles or subdivide existing functions [67]. Most parasite gene families, as in all organisms, consist of structurally distinct paralogs with subtly different roles that evolved long before the origins of contemporary species (i.e. Fig. 1(a)). Concerted evolution in this situation is highly unlikely because the sequences are too divergent for gene conversion to operate, and because, even if it were to,


Fig. 7. Phylogeny of Serine Repeat Antigen (SERA) protein sequences in Plasmodium spp., estimated using an LG + Γ model. The tree is rooted using Theileria annulata (TA10955). Clades are labelled by sub-family at right [see Arisue et al. [105]]. Terminal nodes are labelled by species as shown in the key. The positions of nine SERA paralogs in P. falciparum are noted; SERA9 is P. falciparum specific and additional to the SERA tandem array (see text). Bootstrap values are shown below subtending selected nodes. Putative protologs (SERA8/Clade I) are enclosed by dashed lines.

purifying selection would eliminate any structural changes (including homogenisation) that undermined their distinct functions. Thus, there is an association between orthology and functional differentiation. In contrast to the contingency gene families we have already met, some cell-surface gene families offer evidence for this. In both trypanosomatids and apicomplexans, there are multiple, paralogous nucleoside transporter genes, required for scavenging of nucleosides from the host [34]. Their phylogenies, shown in Fig. 4, demonstrate how several parasite-specific loci evolved after the origin of parasitism and then retained their orthology during the differentiation of each parasitic group. Gene sequences cluster by locus, whether as single orthologs representing apicomplexan species (left), or with multi-copy tandem arrays as in trypanosomatids (right). The distinct clades correspond in part to the known functional differences between nucleoside transporters; for example, the four paralogs in Leishmania major are developmentally regulated in distinct life cycle stages, with NT1 and NT2 being transcribed preferentially in promastigote forms while NT3 and NT4 are expressed principally in vertebrate life stages [68]. This is reflected in the


physiological conditions under which the corresponding proteins function optimally, with the amastigote-specific NT4 being optimized for acidic conditions of the parasitophorous vacuole [69].

This pattern is observed in the phylogenies of many other cell-surface gene families but, in the main, the evidence for functional differentiation is implied and not yet proven. In Plasmodium spp. certain Merozoite Surface Proteins (i.e. MSP1, 4, 5, 8 and 10) are GPI-anchored ligands on the parasite surface during its erythrocytic stages, which share an EGF-like domain at their carboxy terminus. Orthologs of each MSP protein are conserved in all Plasmodium species and all family members bind the erythrocyte surface [70–72]. Specific host ligands have been identified for MSP1 only, which is essential for invasion by merozoites in P. falciparum [71,72]. At present, the precise functions of other MSP are unknown and the evidence that each is distinct is circumstantial. The structures of MSP4, 5, 8 and 10 are highly diverse; indeed they cannot be related other than by the C-terminal EGF domain, and antibodies raised against one MSP do not cross-react with others [73,74]. Thus, they probably do not bind the known partners of MSP1. While MSP1 is cleaved from the erythrocyte surface during invasion, MSP4 is internalized after RBC invasion in P. falciparum and then maintained [75]. In P. vivax, MSP4 is expressed across the entire merozoite surface, while MSP5 (its sibling lineage) is localized only to the apical complex [74]. In P. falciparum, MSP8 does not appear to be expressed on the merozoite at all, but instead within the parasitophorous vacuole of the ring stage [76,77]. Hence, MSP provide an example of a parasite-specific, cell-surface protein family engaged in host recognition and invasion that has retained orthology during diversification. It remains to be proven that these orthologs perform distinct, non-redundant functions.

In trypanosomatids, an unrelated MSP protein family, the Major Surface Proteases, plays an equally prominent role at the host-parasite interface. The MSP gene family encodes metalloproteases contributing to pathogenesis and virulence in Leishmania [78,79]. MSP undermine host defences by degrading components of innate immunity [80–82] and by manipulating immune cell signalling pathways [83–88]. In Trypanosoma, MSP are equally abundant in gene copy number and protein abundance. Here, MSP have a role in stage differentiation of T. brucei [89] and cell invasion by T. cruzi [90,91], while they may also suppress cell-mediated innate immunity [92]. The MSP phylogeny is described in Fig. 5. It shows how, beginning from a single ancestral locus, MSP has differentiated into distinct clades in both Leishmania and Trypanosoma. Each clade is associated with a conserved locus, and some of these distinct lineages are developmentally regulated [78]. For instance, MSP-A and MSP-C are up-regulated in bloodstream form T. brucei, while MSP-B is predominantly seen in the procyclic form [93,94]. Therefore, the trypanosomatids may have elaborated their MSP repertoire at least in part to regulate function during the life-cycle. However, Fig. 5 also demonstrates that MSP in Leishmania and Trypanosoma cluster by genus, suggesting that their similarities in genomic structure, developmental regulation and pathogenesis have evolved independently. The phylogeny also reveals one particular locus that is single copy (unlike most parasite-specific loci), and which, based on its basal position (i.e. closest to the mid-point of the tree), represents the protolog of the parasite-specific loci. This locus, which has been called MSP-D [95], is the only MSP shared by trypanosomatids and the non-parasitic kinetoplastid Bodo saltans [96]. The function of MSP-D is not yet known although it may be secreted [95]. Elucidating the functional differences between parasite-specific proteins and MSP-D will be crucial to understanding the expansion of this family in trypanosomatids.
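Two of the properties used above to recognise a protolog, single-copy status and presence in a free-living relative, are simple enough to screen for before inspecting a tree. The sketch below is a hypothetical filter, not a procedure from this review: the function name, the sub-family names, the species labels (A–D and an outgroup `OG`, as in Fig. 1) and the copy numbers are all invented, and basal position in the phylogeny is deliberately not checked.

```python
from collections import Counter


def candidate_protologs(subfamilies, free_living):
    """Return sub-families that are single copy in every species where they
    occur and that are shared between parasites and a free-living relative."""
    candidates = []
    for name, species_per_copy in subfamilies.items():
        copies = Counter(species_per_copy)  # species -> number of gene copies
        single_copy = all(count == 1 for count in copies.values())
        in_free_living = any(sp in free_living for sp in copies)
        in_parasite = any(sp not in free_living for sp in copies)
        if single_copy and in_free_living and in_parasite:
            candidates.append(name)
    return candidates


# Invented copy-number data: one list entry per gene copy, naming its species.
toy_family = {
    "sub1": ["A", "A", "A", "B", "B"],  # multi-copy, parasite-specific
    "sub2": ["A", "B", "C", "OG"],      # single copy, shared with the outgroup
    "sub3": ["C", "C", "D"],            # multi-copy in species C
}
print(candidate_protologs(toy_family, free_living={"OG"}))
```

A sub-family that passes this screen is only a candidate; its position relative to the root still has to be confirmed in the phylogeny.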


5. Transitions in evolutionary dynamics as signatures of functional change

We have compared the phylogenetics of cell-surface gene families that display concerted evolution and functional redundancy with those that retain orthology and display, or at least suggest, functional differentiation. Finally, we will look at gene families that span this divide, combining both dynamics in their phylogenies.

Amastins are a family of transmembrane, cell-surface glycoproteins in trypanosomatids [97]. They are most prominent, and most familiar, as stage-specific proteins expressed in the intracellular, amastigote stage of Leishmania and T. cruzi during their respective vertebrate life stages [98,99]. Fig. 6 shows that there were two phases of expansion in trypanosomatids [100]. The first occurred in the common ancestor of Trypanosoma and Leishmania, which created four distinct sub-families (α, β, γ and p δ) that remain orthologous in contemporary species. Amastin from Bodo saltans is most closely related to α-amastin, indicating that this sub-family occupies a basal position in the tree (see Fig. 6) and is the most plausible protolog [96]. The second phase of expansion occurred after the divergence of Leishmania from monoxenic insect parasites (e.g. Leptomonas pyrrhicoris), which led to multiple δ-amastin loci that are unique to Leishmania [100]. Hence, there is a notable change in dynamic following the origin of δ-amastin, coincident with the evolution of an intracellular life stage by Leishmania. The dynamic shifts from maintenance of a few orthologous loci, to widespread within-genome paralogy caused by the duplication of δ-amastin [100]. Following the current argument, we should expect this shift to be reflected in functional evolution. Recent work by de Paiva et al. [101] has suggested that amastin is homologous to claudins [102], a component of tight junctions. They further show that δ-amastin is localized in L. braziliensis to such structures in the parasitophorous vacuole membrane, where it is essential for development of the amastigote [101]. Perhaps due to the quantity of δ-amastin required for this function, or possibly the diversifying selection this more exposed position solicits from the host immune system, this functional derivation has led to a change in evolutionary dynamic. Exploring the functions of other amastin sub-families is clearly vital to understanding why δ-amastin evolved this phenotype. Other sub-families have different expression profiles, and are sometimes not expressed in amastigotes [103]. Thus, their functions are likely different, but a role in tight junctions cannot at present be confirmed.

The Serine Repeat Antigen (SERA) family of cysteine proteases in Plasmodium spp. is also associated with parasite development and cell invasion. The gene family is always arranged in a tandem gene array, but the number of tandem duplicates varies by species [104,105]. All SERAs are exported to the parasitophorous vacuole [106,107] but we know most about SERA5, which is considerably more abundant than other family members [108]. SERA5 is essential for growth of blood-stage P. falciparum [109,110] and it has a key role in the egress of merozoites from the host erythrocyte [111]. The phylogeny of sera genes (Fig. 7) shows that, up to a point, orthology is maintained across species such that paralogs cluster by position within the tandem array, rather than by species. The three 3′-most positions in the array (sera6-8 in P. falciparum) contain single-copy genes that maintain orthology and are known as clades I–III [104]. Clade I (i.e. sera8 and orthologs) represents the protolog, and is the only sera present outside of Plasmodium (in Theileria annulata). Upstream of clades I–III, there is a variable number of paralogs belonging to clade IV (sera1-5 in P. falciparum). Clade IV paralogs do cluster by species or species groups [105], suggestive of a different evolutionary dynamic involving rapid gene duplication within species. The significance of this change in dynamic part-way along the tandem array may relate to a structural difference among sera. In
Clade IV paralogs do cluster by species or species groups [105], suggestive of a different evolutionary dynamic involving rapid gene duplication within species. The significance of this change in dynamic part-way along the tandem array may relate to a structural difference among sera. In clade I–III sera, the active site has a cysteine residue but this is replaced in clade IV sera by serine [112,113]. The latter remain functional enzymes and are necessary for normal blood-stage development [110,113]. As with amastin, we do not currently know enough about the functional distinctions, if any, among sera to interpret the importance of the cysteine-to-serine derivation. Targeted knock-outs of sera in P. berghei have suggested that clade III sera can compensate for loss of clade IV function, casting doubt on the significance of the amino acid replacement [114]. However, it is notable that rodent-infecting Plasmodium species like P. berghei have the least developed sera repertoire [105] and that their clade IV genes retain orthology between species (see Fig. 7).
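The contrast drawn in this section, between clades that cluster by locus across species (orthology maintained) and clades that cluster by species (within-species duplication), can be made concrete with a small script. The following is only a minimal sketch under stated assumptions: the tree is an invented toy topology written as nested tuples, the tip labels (`sp1|pos8`, `sp2|dupA` and so on) are hypothetical, and the helper names `classify` and `maximal_clades` are introduced here for illustration. It is not the method used to produce Fig. 6 or Fig. 7.

```python
from collections import Counter

# Toy gene tree as nested tuples; each tip is "species|gene".
# Topology and names are invented for illustration only.
TOY_TREE = (
    (("sp1|pos8", "sp2|pos8"), ("sp1|pos7", "sp2|pos7")),
    (("sp1|dupA", ("sp1|dupB", "sp1|dupC")), ("sp2|dupA", "sp2|dupB")),
)


def leaves(node):
    """Return all tip labels below a node."""
    if isinstance(node, str):
        return [node]
    tips = []
    for child in node:
        tips.extend(leaves(child))
    return tips


def classify(node):
    """Label a clade by how its tips are distributed across species."""
    counts = Counter(tip.split("|", 1)[0] for tip in leaves(node))
    if len(counts) == 1:
        return "species-specific"  # all copies from one species
    if max(counts.values()) == 1:
        return "orthologous"       # at most one copy per species
    return "mixed"


def maximal_clades(node, results=None):
    """Collect the largest clades that are cleanly orthologous or species-specific."""
    if results is None:
        results = []
    if isinstance(node, str):
        return results
    label = classify(node)
    if label == "mixed":
        # Descend until the clade resolves into one pattern or the other.
        for child in node:
            maximal_clades(child, results)
    else:
        results.append((label, sorted(leaves(node))))
    return results


if __name__ == "__main__":
    for label, tips in maximal_clades(TOY_TREE):
        print(label, tips)
```

On this toy tree the two conserved positions are reported as orthologous pairs, while the duplicated copies group by species. A real analysis would read a tree estimated from sequence data and would need to weigh branch support, which this sketch ignores.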

6. The cell-surface re-invented: Plus ça change, plus c’est la même chose?

The absence of orthology from a multi-copy gene family, and the concerted evolution that this entails, is a signature of rapid gene turnover. Parasites like T. brucei, B. bovis and P. falciparum provide some of the most compelling and sophisticated adaptations for promoting this process, which is intuitive given the intense interactions between these organisms and the immune systems of their hosts, mediated by their cell-surface molecules. In this review, I have presented the dynamics of gene families as categorical, as if some are driven by concerted evolution and others by functional differentiation. But of course reality is not categorical; all families are affected by these different pressures in different measures, and examples like VSG and nucleoside transporters are ideal types on a continuous scale.

The contention is that phylogeny reflects something about function. At present, the evidence for such inferences is scant, consisting mainly of differences in gene expression profile. In designing the experiments to determine function, phylogeny can play a key role if we accept the contention provisionally. In the case of amastin sub-families and MSP clades among trypanosomatids, as well as widespread pir clades and sera (i.e. cysteine-type vs. serine-type) in Plasmodium, we must design experiments that manipulate representatives of distinct lineages that maintain orthology. These might compare the phenotypes of knock-out mutants, and test whether one lineage can rescue knock-outs of another (i.e. the extent of functional differentiation). Similarly, our experiments must also target variant antigen isoforms that resist concerted evolution and retain orthology, like vir-D among pir, var2csa among var, and ESAG2 among VSG.
Finally, to explain why cell surface gene families have come to such prominence in parasite genomes, we must focus on functional comparisons of protologs, such as MSP-D and α-amastin in trypanosomatids and sera8 in Plasmodium, with parasite-specific paralogs.

If loss of orthology characterizes these phylogenies, what is the significance for how the surface evolves? It indicates that, on an evolutionary timescale, the cell-surface is constantly re-invented; ancestral structures are superseded by species-specific homologs (or even analogs) with something like a Red Queen effect [115,116]; parasite surfaces change to continue fulfilling their interactive role, rather than to change it. This does not exclude adaptation, since that clearly happens (indeed the exceptions to concerted evolution are a clear indication of its frequency), but it argues that adaptation appears against a background of constant renewal, due to selection for antigenic diversity per se. Of all parasite cellular domains, the cell surface is consistently the fastest evolving. The phylogenies of cell-surface gene families, especially contingency gene families, reflect the underlying dynamic driving divergence, which is that antigenic diversity is an end in itself, and becomes the object of selection. Under these conditions, structures are constantly, indeed entirely, changed but without necessarily changing their essential function.

References

[1] N.M. El-Sayed, P.J. Myler, G. Blandin, M. Berriman, J. Crabtree, G. Aggarwal, et al., Comparative genomics of trypanosomatid parasitic protozoa, Science 309 (2005) 404–409. [2] A. Acosta-Serrano, C. Hutchinson, E.S. Nakayasu, I. Almeida, M. Carrington, Comparison and evolution of the surface architecture of trypanosomatid parasites, in: J.D. Barry, R. McCulloch, J.C. Mottram, A. Acosta-Serrano (Eds.), African Trypanosomes: After the Genome, Horizon Bioscience, Wymondham, UK, 2007, pp. 319–338. [3] E. Handman, A.T. Papenfuss, T.P. Speed, J.W. Goding, Leishmania surface proteins, in: P.J. Myler, N. Fasel (Eds.), Leishmania: After the Genome, Caister Academic Press, Norfolk, UK, 2008, pp. 177–204. [4] J.C. Kissinger, J. DeBarry, Genome cartography: charting the apicomplexan genome, Trends Parasitol. 27 (2011) 345–354. [5] C.H. Kuo, J.C. Kissinger, Consistent and contrasting properties of lineage-specific genes in the apicomplexan parasites Plasmodium and Theileria, BMC Evol. Biol. 8 (2008) 108. [6] A.J. Reid, Large, rapidly evolving gene families are at the forefront of host-parasite interactions in Apicomplexa, Parasitology 142 (Suppl. 1) (2015) S57–70. [7] B.J. Haas, S. Kamoun, M.C. Zody, R.H. Jiang, R.E. Handsaker, L.M. Cano, et al., Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans, Nature 461 (2009) 393–398. [8] P.D. Spanu, The genomics of obligate (and nonobligate) biotrophs, Annu. Rev. Phytopathol. 50 (2012) 91–109. [9] G.D. Weedall, N. Hall, Evolutionary genomics of Entamoeba, Res. Microbiol. 162 (2011) 637–645. [10] K.W. Deitsch, E.R. Moxon, T.E. Wellems, Shared themes of antigenic variation and virulence in bacterial, protozoal, and fungal infections, Microbiol. Mol. Biol. Rev. 61 (1997) 281–293. [11] J.D. Barry, M.L. Ginger, P. Burton, R. McCulloch, Why are parasite contingency genes often associated with telomeres? Int. J. Parasitol. 33 (2003) 29–45. [12] J.M. Carlton, J.H. Adams, J.C. Silva, S.L.
Bidwell, H. Lorenzi, E. Caler, et al., Comparative genomics of the neglected human malaria parasite Plasmodium vivax, Nature 455 (2008) 757–763. [13] T.D. Otto, U. Böhme, A.P. Jackson, M. Hunt, B. Franke-Fayard, W.A. Hoeijmakers, et al., A comprehensive evaluation of rodent malaria parasite genomes and gene expression, BMC Biol. 12 (2014) 86. [14] M. Berriman, E. Ghedin, C. Hertz-Fowler, G. Blandin, H. Renauld, D.C. Bartholomeu, et al., The genome of the African trypanosome Trypanosoma brucei, Science 309 (2005) 416–422. [15] A.P. Jackson, M. Sanders, A. Berry, J. McQuillan, M.A. Aslett, M.A. Quail, et al., The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human african trypanosomiasis, PLoS Negl. Trop. Dis. 4 (2009) e658. [16] A.P. Jackson, A. Berry, M. Aslett, H.C. Allison, P. Burton, J. Vavrova-Anderson, et al., Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species, Proc. Natl. Acad. Sci. U. S. A. 109 (2012) 3416–3421. [17] G.A. Cross, H.S. Kim, B. Wickstead, Capturing the variant surface glycoprotein repertoire (the VSGnome) of Trypanosoma brucei Lister 427, Mol. Biochem. Parasitol. 195 (2014) 59–73. [18] T. Gabaldón, E.V. Koonin, Functional and evolutionary implications of gene orthology, Nat. Rev. Genet. 14 (2013) 360–366. [19] J.F. Elder Jr., B.J. Turner, Concerted evolution of repetitive DNA sequences in eukaryotes, Q. Rev. Biol. 70 (1995) 297–320. [20] D. Liao, T. Pavelitz, J.R. Kidd, K.K. Kidd, A.M. Weiner, Concerted evolution of the tandemly repeated genes encoding human U2 snRNA (the RNU2 locus) involves rapid intrachromosomal homogenization and rare interchromosomal gene conversion, EMBO J. 16 (1997) 588–598. [21] G. Santoyo, D. Romero, Gene conversion and concerted evolution in bacterial genomes, FEMS Microbiol. Rev. 29 (2005) 169–183. [22] C.S. Janssen, R.S. Phillips, C.M. Turner, M.P. 
Barrett, Plasmodium interspersed repeats: the major multigene superfamily of malaria parasites, Nucleic Acids Res. 32 (2004) 5712–5720. [23] A.P. Jackson, T.D. Otto, A. Darby, A. Ramaprasad, D. Xia, I.E. Echaide, et al., The evolutionary dynamics of variant antigen genes in Babesia reveal a history of genomic innovation underlying host-parasite interaction, Nucleic Acids Res. 42 (2014) 7113–7131. [24] M.M. Zilversmit, E.K. Chase, D.S. Chen, P. Awadalla, K.P. Day, G. McVean, Hypervariable antigen genes in malaria have ancient roots, BMC Evol. Biol. 13 (2013) 110. [25] D. Horn, Antigenic variation in African trypanosomes, Mol. Biochem. Parasitol. 195 (2014) 123–129. [26] K.R. Matthews, R. McCulloch, L.J. Morrison, The within-host dynamics of African trypanosome infections, Philos. Trans. R. Soc. Lond. B Biol. Sci. 370 (1675) (2015), pii: 20140288. [27] R. McCulloch, D. Horn, What has DNA sequencing revealed about the VSG expression sites of African trypanosomes, Trends Parasitol. 25 (2009) 359–363.

[28] G. Rudenko, African trypanosomes: the genome and adaptations for immune evasion, Essays Biochem. 51 (2011) 47–62. [29] L. Marcello, J.D. Barry, Analysis of the VSG gene silent archive in Trypanosoma brucei reveals that mosaic gene expression is prominent in antigenic variation and is favored by archive substructure, Genome Res. 17 (2007) 1344–1352. [30] J.P. Hall, H. Wang, J.D. Barry, Mosaic VSGs and the scale of Trypanosoma brucei antigenic variation, PLoS Pathog. 9 (2013) e1003502. [31] M.R. Mugnier, G.A. Cross, F.N. Papavasiliou, The in vivo dynamics of antigenic variation in Trypanosoma brucei, Science 347 (2015) 1470–1473. [32] C. Gadelha, W. Zhang, J.W. Chamberlain, B.T. Chait, B. Wickstead, M.C. Field, Architecture of a host-parasite interface: complex targeting mechanisms revealed through proteomics, Mol. Cell. Proteomics 14 (2015) 1911–1926. [33] D. Salmon, J. Hanocq-Quertier, F. Paturiaux-Hanocq, A. Pays, P. Tebabi, D.P. Nolan, et al., Characterization of the ligand-binding site of the transferrin receptor in Trypanosoma brucei demonstrates a structural relationship with the N-terminal domain of the variant surface glycoprotein, EMBO J. 16 (1997) 7272–7278. [34] P. Borst, A.H. Fairlamb, Surface receptors and transporters of Trypanosoma brucei, Annu. Rev. Microbiol. 52 (1998) 745–778. [35] A.P. Jackson, H.C. Allison, J.D. Barry, M.C. Field, C. Hertz-Fowler, M. Berriman, A cell-surface phylome for African trypanosomes, PLoS Negl. Trop. Dis. 7 (2013) e2121. [36] D.R. Allred, J.M. Carlton, R.L. Satcher, J.A. Long, W.C. Brown, P.E. Patterson, et al., The ves multigene family of B. bovis encodes components of rapid antigenic variation at the infected erythrocyte surface, Mol. Cell 5 (2000) 153–162. [37] Y.P. Xiao, B. Al-Khedery, D.R. Allred, The Babesia bovis VESA1 virulence factor subunit 1b is encoded by the beta branch of the ves multigene family, Mol. Biochem. Parasitol. 171 (2010) 81–88. [38] A.K. Zupańska, P.B. Drummond, D.M. Swetnam, B. Al-Khedery, D.R. Allred, Universal primers suitable to assess population dynamics reveal apparent mutually exclusive transcription of the Babesia bovis ves1alpha gene, Mol. Biochem. Parasitol. 66 (2009) 47–53. [39] B. Al-Khedery, D.R. Allred, Antigenic variation in Babesia bovis occurs through segmental gene conversion of the ves multigene family, within a bidirectional locus of active transcription, Mol. Microbiol. 59 (2006) 402–414. [40] K.A. Brayton, A.O. Lau, D.R. Herndon, L. Hannick, L.S. Kappmeyer, S.J. Berens, et al., Genome sequence of Babesia bovis and comparative analysis of apicomplexan hemoprotozoa, PLoS Pathog. 3 (2007) 1401–1413. [41] L. Hviid, A.T. Jensen, PfEMP1 – a parasite protein family of key importance in Plasmodium falciparum malaria immunity and pathogenesis, Adv. Parasitol. 88 (2015) 51–84. [42] C.P. Ward, G.T. Clottey, M. Dorris, D.D. Ji, D.E. Arnot, Analysis of Plasmodium falciparum PfEMP-1/var genes suggests that recombination rearranges constrained sequences, Mol. Biochem. Parasitol. 102 (1999) 167–177. [43] J. Peters, E. Fowler, M. Gatton, N. Chen, A. Saul, Q. Cheng, High diversity and rapid changeover of expressed var genes during the acute phase of Plasmodium falciparum infections in human volunteers, Proc. Natl. Acad. Sci. U. S. A. 99 (2002) 10689–10694. [44] P.C. Bull, M. Berriman, S. Kyes, M.A. Quail, N. Hall, M.M. Kortok, et al., Plasmodium falciparum variant surface antigen expression patterns during malaria, PLoS Pathog. 1 (2005) e26. [45] S.M. Kraemer, S.A. Kyes, G. Aggarwal, A.L. Springer, S.O. Nelson, Z. Christodoulou, et al., Patterns of gene recombination shape var gene repertoires in Plasmodium falciparum: comparisons of geographically diverse isolates, BMC Genomics 8 (2007) 45. [46] M. Frank, L. Kirkman, D. Costantini, S. Sanyal, C. Lavazec, T.J. Templeton, K.W. Deitsch, Frequent recombination events generate diversity within the multi-copy variant antigen gene families of Plasmodium falciparum, Int. J. Parasitol.
38 (2008) 1099–1109. [47] B. Gamain, A.R. Trimnell, C. Scheidig, A. Scherf, L.H. Miller, J.D. Smith, Identification of multiple chondroitin sulfate A (CSA)-binding domains in the var2CSA gene transcribed in CSA-binding parasites, J. Infect. Dis. 191 (2005) 1010–1013. [48] S.M. Kraemer, J.D. Smith, Evidence for the importance of genetic structuring to the structural and functional specialization of the Plasmodium falciparum var gene family, Mol. Microbiol. 50 (2003) 1527–1538. [49] J. Bockhorst, F. Lu, J.H. Janes, J. Keebler, B. Gamain, P. Awadalla, et al., Structural polymorphism and diversifying selection on the pregnancy malaria vaccine candidate VAR2CSA, Mol. Biochem. Parasitol. 155 (2007) 103–112. [50] D.B. Larremore, S.A. Sundararaman, W. Liu, W.R. Proto, A. Clauset, D.E. Loy, et al., Ape parasite origins of human malaria virulence genes, Nat. Commun. 6 (2015) 8368. [51] B.W. Mok, U. Ribacke, N. Rasti, F. Kironde, Q. Chen, P. Nilsson, M. Wahlgren, Default pathway of var2csa switching and translational repression in Plasmodium falciparum, PLoS One 3 (2008) e1982. [52] B. Amulic, A. Salanti, T. Lavstsen, M.A. Nielsen, K.W. Deitsch, An upstream open reading frame controls translation of var2csa, a gene implicated in placental malaria, PLoS Pathog. 5 (2009) e1000256. [53] C. Bancells, K.W. Deitsch, A molecular switch in the efficiency of translation reinitiation controls expression of var2csa, a gene implicated in pregnancy-associated malaria, Mol. Microbiol. 90 (2013) 472–488.

[54] T. Lavstsen, P. Magistrado, C.C. Hermsen, A. Salanti, A.T. Jensen, R. Sauerwein, et al., Expression of Plasmodium falciparum erythrocyte membrane protein 1 in experimentally infected humans, Malar. J. 4 (2005) 21. [55] M.F. Duffy, A. Caragounis, R. Noviyanti, H.M. Kyriacou, E.K. Choong, K. Boysen, et al., Transcribed var genes associated with placental malaria in Malawian women, Infect. Immun. 74 (2006) 4875–4883. [56] U.E. Ukaegbu, X. Zhang, A.R. Heinberg, M. Wele, Q. Chen, K.W. Deitsch, A unique virulence gene occupies a principal position in immune evasion by the malaria parasite Plasmodium falciparum, PLoS Genet. 11 (2015) e1005234. [57] H.A. del Portillo, C. Fernandez-Becerra, S. Bowman, K. Oliver, M. Preuss, C.P. Sanchez, et al., A superfamily of variant genes encoded in the subtelomeric region of Plasmodium vivax, Nature 410 (2001) 839–842. [58] J.M. Carlton, S.V. Angiuoli, B.B. Suh, T.W. Kooij, M. Pertea, J.C. Silva, et al., Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii, Nature 419 (2002) 512–519. [59] C.S. Janssen, M.P. Barrett, C.M. Turner, R.S. Phillips, A large gene family for putative variant antigens shared by human and rodent malaria parasites, Proc. Biol. Sci. 269 (2002) 431–436. [60] D. Cunningham, J. Lawton, W. Jarra, P. Preiser, J. Langhorne, The pir multigene family of Plasmodium: antigenic variation and beyond, Mol. Biochem. Parasitol. 170 (2010) 65–73. [61] M. Bernabeu, F.J. Lopez, M. Ferrer, L. Martin-Jaular, A. Razaname, G. Corradin, et al., Functional analysis of Plasmodium vivax VIR proteins reveals different subcellular localizations and cytoadherence to the ICAM-1 endothelial receptor, Cell. Microbiol. 14 (2012) 386–400. [62] E.M. Pasini, J.A. Braks, J. Fonager, O. Klop, E. Aime, R. Spaccapelo, et al., Proteomic and genetic analyses demonstrate that Plasmodium berghei blood stages export a large and diverse repertoire of proteins, Mol. Cell. Proteomics 12 (2013) 426–448. 
[63] P. Ebbinghaus, J. Krücken, Characterization and tissue-specific expression patterns of the Plasmodium chabaudi cir multigene family, Malar. J. 10 (2011) 272. [64] J. Lawton, T. Brugat, X.Y. Yam, A.J. Reid, U. Boehme, T.D. Otto, et al., Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS), BMC Genomics 13 (2012) 125. [65] D.E. Neafsey, K. Galinsky, R.H. Jiang, L. Young, S.M. Sykes, S. Saif, et al., The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum, Nat. Genet. 44 (2012) 1046–1050. [66] C. Frech, N. Chen, Variant surface antigens of malaria parasites: functional and evolutionary insights from comparative gene family classification and analysis, BMC Genomics 14 (2013) 427. [67] M. Lynch, J.S. Conery, The evolutionary fate and consequences of duplicate genes, Science 290 (2000) 1151–1155. [68] N.S. Akopyants, R.S. Matlib, E.N. Bukanova, M.R. Smeds, B.H. Brownstein, G.D. Stormo, S.M. Beverley, Expression profiling using random genomic DNA microarrays identifies differentially expressed genes associated with three major developmental stages of the protozoan parasite Leishmania major, Mol. Biochem. Parasitol. 136 (2004) 71–86. [69] D. Ortiz, M.A. Sanchez, H.P. Koch, H.P. Larsson, S.M. Landfear, An acid-activated nucleobase transporter from Leishmania major, J. Biol. Chem. 284 (2009) 16164–16169. [70] Y. Cheng, Y. Wang, D. Ito, D.H. Kong, K.S. Ha, J.H. Chen, et al., The Plasmodium vivax merozoite surface protein 1 paralog is a novel erythrocyte-binding ligand of P. vivax, Infect. Immun. 81 (2013) 1585–1595. [71] C.S. Lin, A.D. Uboldi, D. Marapana, P.E. Czabotar, C. Epp, H. Bujard, et al., The merozoite surface protein 1 complex is a platform for binding to human erythrocytes by Plasmodium falciparum, J. Biol. Chem. 289 (2014) 25655–25669. [72] M.R. Baldwin, X. Li, T. Hanada, S.C. Liu, A.H. 
Chishti, Merozoite surface protein 1 recognition of host glycophorin A mediates malaria parasite invasion of red blood cells, Blood 125 (2015) 2704–2711. [73] C.G. Black, T. Wu, L. Wang, A.R. Hibbs, R.L. Coppel, Merozoite surface protein 8 of Plasmodium falciparum contains two epidermal growth factor-like domains, Mol. Biochem. Parasitol. 114 (2001) 217–226. [74] C.G. Black, L. Wang, T. Wu, R.L. Coppel, Apical location of a novel EGF-like domain-containing protein of Plasmodium falciparum, Mol. Biochem. Parasitol. 127 (2003) 59–68. [75] M.J. Boyle, C. Langer, J.A. Chan, A.N. Hodder, R.L. Coppel, R.F. Anders, J.G. Beeson, Sequential processing of merozoite surface proteins during and after erythrocyte invasion by Plasmodium falciparum, Infect. Immun. 82 (2014) 924–936. [76] D.R. Drew, R.A. O’Donnell, B.J. Smith, B.S. Crabb, A common cross-species function for the double epidermal growth factor-like modules of the highly divergent Plasmodium surface proteins MSP-1 and MSP-8, J. Biol. Chem. 279 (2004) 20147–20153. [77] D.R. Drew, P.R. Sanders, B.S. Crabb, Plasmodium falciparum merozoite surface protein 8 is a ring-stage membrane protein that localizes to the parasitophorous vacuole of infected erythrocytes, Infect. Immun. 73 (2005) 3912–3922. [78] C. Yao, Major surface protease of trypanosomatids: one size fits all, Infect. Immun. 78 (2010) 22–31. [79] M. Olivier, V.D. Atayde, A. Isnard, K. Hassani, M.T. Shio, Leishmania virulence factors: focus on the metalloprotease GP63, Microbes Infect. 14 (2012) 1377–1389.

[80] M.M. Kulkarni, W.R. McMaster, E. Kamysz, W. Kamysz, D.M. Engman, B.S. McGwire, The major surface-metalloprotease of the parasitic protozoan, Leishmania, protects against antimicrobial peptide-induced apoptotic killing, Mol. Microbiol. 62 (2006) 1484–1497. [81] T. Lieke, S. Nylén, L. Eidsmo, W.R. McMaster, A.M. Mohammadi, A. Khamesipour, et al., Leishmania surface protein gp63 binds directly to human natural killer cells and inhibits proliferation, Clin. Exp. Immunol. 153 (2008) 221–230. [82] D. Matheoud, N. Moradin, A. Bellemare-Pelletier, M.T. Shio, W.J. Hong, M. Olivier, et al., Leishmania evades host immunity by inhibiting antigen cross-presentation through direct cleavage of the SNARE VAMP8, Cell Host Microbe 14 (2013) 15–25. [83] M.A. Gomez, I. Contreras, M. Hallé, M.L. Tremblay, R.W. McMaster, M. Olivier, Leishmania GP63 alters host signaling through cleavage-activated protein tyrosine phosphatases, Sci. Signal. 2 (2009) ra58. [84] M. Hallé, M.A. Gomez, M. Stuible, H. Shimizu, W.R. McMaster, M. Olivier, M.L. Tremblay, The Leishmania surface protease GP63 cleaves multiple intracellular proteins and actively participates in p38 mitogen-activated protein kinase inactivation, J. Biol. Chem. 284 (2009) 6893–6908. [85] I. Contreras, M.A. Gómez, O. Nguyen, M.T. Shio, R.W. McMaster, M. Olivier, Leishmania-induced inactivation of the macrophage transcription factor AP-1 is mediated by the parasite metalloprotease GP63, PLoS Pathog. 6 (10) (2010) e1001148. [86] G. Arango Duque, M. Fukuda, S.J. Turco, S. Stäger, A. Descoteaux, Leishmania promastigotes induce cytokine secretion in macrophages through the degradation of synaptotagmin XI, J. Immunol. 193 (2014) 2363–2372. [87] M.T. Shio, J.G. Christian, J.Y. Jung, K.P. Chang, M. Olivier, PKC/ROS-Mediated NLRP3 inflammasome activation is attenuated by Leishmania zinc-metalloprotease during infection, PLoS Negl. Trop. Dis. 9 (2015) e0003868. [88] H. Álvarez de Celis, C.P. Gómez, A. Descoteaux, P. 
Duplay, Dok proteins are recruited to the phagosome and degraded in a GP63-dependent manner during Leishmania major infection, Microbes Infect. 17 (2015) 285–294. [89] P.M. Grandgenett, K. Otsu, H.R. Wilson, M.E. Wilson, J.E. Donelson, A function for a specific zinc metalloprotease of African trypanosomes, PLoS Pathog. 3 (2007) 1432–1445. [90] I.C. Cuevas, J.J. Cazzulo, D.O. Sánchez, gp63 homologues in Trypanosoma cruzi: surface antigens with metalloprotease activity and a possible role in host cell infection, Infect. Immun. 71 (2003) 5739–5749. [91] M.M. Kulkarni, C.L. Olson, D.M. Engman, B.S. McGwire, Trypanosoma cruzi GP63 proteins undergo stage-specific differential posttranslational modification and are important for host cell infection, Infect. Immun. 77 (2009) 2193–2200. [92] A. Oladiran, M. Belosevic, Recombinant glycoprotein 63 (Gp63) of Trypanosoma carassii suppresses antimicrobial responses of goldfish (Carassius auratus L.) monocytes and macrophages, Int. J. Parasitol. 42 (2012) 621–633. [93] D.J. LaCount, A.E. Gruszynski, P.M. Grandgenett, J.D. Bangs, J.E. Donelson, Expression and function of the Trypanosoma brucei major surface protease (GP63) genes, J. Biol. Chem. 278 (2003) 24658–24664. [94] M.D. Urbaniak, M.L. Guther, M.A. Ferguson, Comparative SILAC proteomic analysis of Trypanosoma brucei bloodstream and procyclic lifecycle stages, PLoS One 7 (2012) e36619. [95] V. Marcoux, G. Wei, H. Tabel, H.J. Bull, Characterization of major surface protease homologues of Trypanosoma congolense, J. Biomed. Biotechnol. (2010) 418157. [96] A.P. Jackson, T.D. Otto, M. Aslett, S.D. Armstrong, F. Bringaud, A. Schlacht, et al., Kinetoplastid phylogenomics reveals the evolutionary innovations associated with the origins of parasitism, Curr. Biol. 26 (2016) 161–172. [97] A. Rochette, F. McNicoll, J. Girard, M. Breton, E. Leblanc, M.G. Bergeron, B. 
Papadopoulou, Characterization and developmental gene regulation of a large gene family encoding amastin surface proteins in Leishmania spp, Mol. Biochem. Parasitol. 140 (2005) 205–220. [98] S.M. Teixeira, D.G. Russell, L.V. Kirchhoff, J.E. Donelson, A differentially expressed gene family encoding amastin, a surface protein of Trypanosoma cruzi amastigotes, J. Biol. Chem. 269 (1994) 20509–20516. [99] Y. Wu, Y. El Fakhry, D. Sereno, S. Tamar, B. Papadopoulou, A new developmentally regulated gene family in Leishmania amastigotes encoding a homolog of amastin surface proteins, Mol. Biochem. Parasitol. 110 (2000) 345–357.

[100] A.P. Jackson, The evolution of amastin surface glycoproteins in trypanosomatid parasites, Mol. Biol. Evol. 27 (2010) 33–45. [101] R.M. de Paiva, V. Grazielle-Silva, M.S. Cardoso, B.N. Nakagaki, R.P. Mendonça-Neto, A.M. Canavaci, et al., Amastin knockdown in Leishmania braziliensis affects parasite-macrophage interaction and results in impaired viability of intracellular amastigotes, PLoS Pathog. 11 (2015) e1005296. [102] M. Furuse, K. Fujita, T. Hiiragi, K. Fujimoto, S. Tsukita, Claudin-1 and -2: novel integral membrane proteins localizing at tight junctions with no sequence similarity to occludin, J. Cell Biol. 141 (1998) 1539–1550. [103] M.M. Kangussu-Marcolino, R.M. de Paiva, R.R. Araújo, R.P. de Mendonça-Neto, L. Lemos, D.C. Bartholomeu, et al., Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi, BMC Microbiol. 13 (2013) 10. [104] N. Arisue, M. Hirai, M. Arai, H. Matsuoka, T. Horii, Phylogeny and evolution of the SERA multigene family in the genus Plasmodium, J. Mol. Evol. 65 (2007) 82–91. [105] N. Arisue, S. Kawai, M. Hirai, N.M. Palacpac, M. Jia, A. Kaneko, et al., Clues to evolution of the SERA multigene family in 18 Plasmodium species, PLoS One 6 (2011) e17775. [106] P. Delplace, B. Fortier, G. Tronchin, J.F. Dubremetz, A. Vernes, Localization, biosynthesis, processing and isolation of a major 126 kDa antigen of the parasitophorous vacuole of Plasmodium falciparum, Mol. Biochem. Parasitol. 23 (1987) 193–201. [107] B. Knapp, E. Hundt, U. Nau, H.A. Küpper, Molecular cloning, genomic structure and localization in a blood stage antigen of Plasmodium falciparum characterized by a serine stretch, Mol. Biochem. Parasitol. 32 (1989) 73–83. [108] S. Aoki, J. Li, S. Itagaki, B.A. Okech, T.G. Egwang, H. Matsuoka, N.M.
Palacpac, et al., Serine repeat antigen (SERA5) is predominantly expressed among the SERA multigene family of Plasmodium falciparum, and the acquired antibody titers correlate with serum inhibition of the parasite growth, J. Biol. Chem. 277 (2002) 47533–47540. [109] S.K. Miller, R.T. Good, D.R. Drew, M. Delorenzi, P.R. Sanders, A.N. Hodder, et al., A subset of Plasmodium falciparum SERA genes are expressed and appear to play an important role in the erythrocytic cycle, J. Biol. Chem. 277 (2002) 47524–47532. [110] W.D. Fairlie, T.P. Spurck, J.E. McCoubrie, P.R. Gilson, S.K. Miller, G.I. McFadden, et al., Inhibition of malaria parasite development by a cyclic peptide that targets the vital parasite protein SERA5, Infect. Immun. 76 (2008) 4332–4344. [111] X.L. Pang, T. Mitamura, T. Horii, Antibodies reactive with the N-terminal domain of Plasmodium falciparum serine repeat antigen inhibit cell proliferation by agglutinating merozoites and schizonts, Infect. Immun. 67 (1999) 1821–1827. [112] M.C. Kiefer, K.A. Crawford, L.J. Boley, K.E. Landsberg, H.L. Gibson, D.C. Kaslow, P.J. Barr, Identification and cloning of a locus of serine repeat antigen (sera)-related genes from Plasmodium vivax, Mol. Biochem. Parasitol. 78 (1996) 55–65. [113] A.N. Hodder, D.R. Drew, V.C. Epa, M. Delorenzi, R. Bourgon, S.K. Miller, et al., Enzymic, phylogenetic, and structural characterization of the unusual papain-like protease domain of Plasmodium falciparum SERA5, J. Biol. Chem. 278 (2003) 48169–48177. [114] E.D. Putrianti, A. Schmidt-Christensen, I. Arnold, V.T. Heussler, K. Matuschewski, O. Silvie, The Plasmodium serine-type SERA proteases display distinct expression patterns and non-essential in vivo roles during life cycle progression of the malaria parasite, Cell. Microbiol. 12 (2010) 725–739. [115] L. Van Valen, A new evolutionary law, Evol. Theory (1973) 1–30. [116] M.A. Brockhurst, T. Chapman, K.C. King, J.E. Mank, S. Paterson, G.D. 
Hurst, Running with the Red Queen: the role of biotic conflicts in evolution, Proc. Biol. Sci. 281 (2014), pii: 20141382.
