Infection, Genetics and Evolution 6 (2006) 277–286 www.elsevier.com/locate/meegid
The phylogenetic and evolutionary history of a novel alpha-globin-type gene in orangutans (Pongo pygmaeus)
Michael E. Steiper a,b,*, Nathan D. Wolfe c, William B. Karesh d, Annelisa M. Kilbourn d,†, Edwin J. Bosi e, Maryellen Ruvolo f
a Department of Anthropology, Hunter College of the City University of New York, New York, USA
b Department of Anthropology, The Graduate Center of the City University of New York, New York, USA
c Epidemiology and Molecular Microbiology and Immunology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
d Field Veterinary Program, Wildlife Conservation Society, Bronx, New York, USA
e Sabah Wildlife Department, Sepilok Orangutan Rehabilitation Center, Sandakan, Sabah, Malaysia
f Department of Anthropology, Harvard University, Cambridge, Massachusetts, USA
Received 28 April 2005; received in revised form 19 July 2005; accepted 1 August 2005 Available online 19 September 2005
Abstract
The alpha-globin genes are implicated in human resistance to malaria, a disease caused by Plasmodium parasites. This study is the first to analyze DNA sequences from a novel alpha-globin-type gene in orangutans, a species affected by Plasmodium. Phylogenetic methods show that the gene is a duplication of an alpha-globin gene and is located 5′ of alpha-2 globin. The alpha-globin-type gene is notable for having four amino acid replacements relative to the orangutan’s alpha-1 and alpha-2 globin genes, with no synonymous differences. Pairwise Ka/Ks methods and likelihood ratio tests (LRTs) revealed that the evolutionary history of the alpha-globin-type gene has been marked by either neutral or positive evolution, but not purifying selection. A comparative analysis of the amino acid replacements of the alpha-globin-type gene with human hemoglobinopathies and hemoglobin structure showed that two of the four replaced sites are members of the same molecular bond, one that is crucial to the proper functioning of the hemoglobin molecule. This suggested an adaptive evolutionary change. Functionally, this locus may result in a thalassemia-like phenotype in orangutans, possibly as an adaptation to combat Plasmodium.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Gene duplication; Natural selection; Correlated change; Malaria; Primates; Globins
* Corresponding author. Tel.: +1 212 772 5418; fax: +1 212 772 5423. E-mail address: [email protected] (M.E. Steiper).
† Deceased.
1567-1348/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2005.08.001

1. Introduction
The adult human hemoglobin molecule is composed of two alpha-globin and two beta-globin proteins. Both alpha- and beta-globin genes are part of large gene superfamilies that reside on separate chromosomes (Proudfoot et al., 1980b). In humans, there are three major classes of alpha-globin genes: zeta-, alpha-, and theta-globin (Fig. 1a). The genes reside on chromosome 16. The genomic organization of the alpha-globin region is broadly similar among humans and our closest evolutionary relatives, the hominoids (humans, chimpanzees, gorillas, orangutans, and gibbons) (Lauer et al., 1980; Marks et al., 1986a,b; Bailey et al., 1997). The 5′-most gene of the alpha-globin gene family, zeta-globin, encodes the embryonically expressed alpha-chain (Lauer et al., 1980; Pressley et al., 1980; Proudfoot et al., 1982). During gestation, production is shifted to both alpha-1 and alpha-2 globin, which are the main producers of alpha-globin throughout adult life (Higgs et al., 1989). In humans, the alpha-1 and alpha-2 globin genes encode exactly the same amino acid chain, with their similarity maintained by gene conversion (Zimmer et al., 1980; Liebhaber et al., 1981; Hess et al., 1983, 1984; Michelson and Orkin, 1983). The 3′-most gene, theta-globin, is expressed, but its precise function is unresolved (Marks et al., 1986b,c; Leung et al., 1987, 1989; Hsu et al., 1988).
M.E. Steiper et al. / Infection, Genetics and Evolution 6 (2006) 277–286
Fig. 1. (a) Genetic map of the alpha-globin region of hominoids based on the human reference sequence. Regions broadly similar between alpha-1 and alpha-2 globin are denoted homologous regions X, Y, and Z, as indicated elsewhere in the literature (e.g. Bailey et al., 1997). (b) Amino acid differences among different orangutan alpha-globin genes, including the alpha-globin-type gene. Positions that match the top sequence are indicated by a ‘’. (c) DNA differences among different orangutan alpha-1 globin, alpha-2 globin, and the alpha-globin-type gene. For alpha-1, part of the 5′ and 3′ region is unknown and indicated by ‘N’s. Positions that match the top sequence are indicated by a ‘’.
Aside from the functional genes, pseudogenes are also part of this gene family (Proudfoot and Maniatis, 1980a; Proudfoot et al., 1982). The primate alpha-globin superfamily is a very fluid genomic region (Marks et al., 1986a). Unequal crossing-over among the alpha-globin genes has led to numerous instances of alpha-globin gene number polymorphism, forming the basis for gene family change. The best-known alpha-globin gene number polymorphism is alpha-thalassemia, a reduction in alpha-globin gene copy number in humans that has been implicated in resistance to Plasmodium parasites (Flint et al., 1986). Aside from alpha-globin gene number reduction, increases in alpha-globin gene copy number are known in a range of primates including humans, macaques, orangutans, and chimpanzees (Goossens et al., 1980; Zimmer et al., 1980; Takenaka et al., 1991, 1993). Thus far, no DNA sequences have been collected from the additional alpha-globin genes of non-human primates. It is therefore unknown whether these additional alpha-globin genes are exact duplicates of alpha-1 and alpha-2, or if they are different. Evidence based on amino acid sequencing suggests these genes may be different from alpha-1 and
alpha-2 globin (Boyer et al., 1971a,b, 1973). The additional alpha-globin genes of primates are hypothesized to be under natural selection by malaria (Boyer et al., 1971b; Takenaka et al., 1993), akin to the relationship between malarial selection and alpha-thalassemia in humans. Recently, the population genetics of the alpha-2 globin genomic region of orangutans (Pongo pygmaeus) were studied (Steiper et al., 2005). Orangutans are interesting subjects to explore malarial selection in non-human primates for a number of reasons: Plasmodium parasites are present in high frequencies in wild and semi-captive orangutans (Wolfe et al., 2002), individuals can become ill from the parasite (Wolfe, 1999), and anti-malarial medicines ameliorate their symptoms (Wolfe, 1999). During the analysis of alpha-2 globin, a novel version of alpha-globin was discovered in Bornean orangutans (Fig. 1b and c). The DNA sequence of this novel gene is unique relative to the other members of the orangutan alpha-globin gene family; it has 4–5 amino acid replacements relative to alpha-1 and alpha-2 globin, as well as up- and downstream differences. In this study, we provide evidence that these DNA sequences are from a novel gene and we analyze these genes
evolutionarily to assess the role of natural selection on this locus. This gene is referred to here as the orangutan ‘alpha-globin-type’ gene. Three approaches were undertaken to understand the evolutionary history of this novel alpha-globin-type gene. First, the alpha-globin-type gene of orangutans was compared phylogenetically to the other members of the orangutan alpha-globin gene family, as well as the alpha-globin genes of other hominoids. This allowed an estimate of this gene’s relationship to the other alpha-globin genes and its genomic placement relative to these other alpha-globin genes. Second, the role of natural selection on the evolution of this novel alpha-globin gene was explored using both phylogenetic and pairwise methods. Finally, the four amino acids replaced in the alpha-globin-type gene were placed in the context of their role in hemoglobin structure, their relationship to human hemoglobinopathies, and their variability among mammals, in order to assess their possible functional significance.
2. Materials and methods
2.1. Sampling, PCR, and DNA sequencing
Orangutan samples were obtained from field sites, zoos, and research facilities from both Sumatran and Bornean subspecies. These analyses assumed unrelatedness among individuals. A 1.5 kilobase portion of the alpha-globin region was amplified, cloned, and sequenced as specified in Steiper et al. (2005). This PCR amplified both alpha-2 globin and, when present, the novel DNA sequence analyzed here, the alpha-globin-type gene. Sequences are available in GenBank (accession DQ186522-4). DNA sequences were collected from 4 Sumatran and 16 Bornean orangutans. These individuals were drawn from throughout the range of orangutans, though many were from northern Borneo. We do not expect excessive relatedness among individuals.
2.2. Population genetics
The frequency of the alpha-globin-type gene was further examined using restriction digests on a larger sample of orangutans than was examined by DNA sequencing. In this analysis, 45 orangutans were screened (6 Sumatran and 39 Bornean). Primers were designed to amplify a region containing a restriction enzyme recognition site that distinguished alpha-globin DNA from the alpha-globin-type gene (Geno-F: 5′-TCC-TGG-CCC-CGG-ACC-CAA-AC-3′; Geno-R: 5′-CCG-CCG-CTC-ACC-TTG-AAG-TTG-3′). This PCR product was exposed to the restriction enzyme BclI (NEB, Beverly, MA), which cuts 5′-T^GATCA-3′, a motif present in the alpha-globin-type gene, but not in the other alpha-globins. As in the DNA sequencing data set, these individuals were drawn from throughout the range of orangutans, though many were from northern Borneo. We do not expect excessive relatedness among individuals.
2.3. Phylogenetic methods
Two distinct phylogenetic analyses were undertaken to account for the history and genomic structure of the alpha-globin region. This approach was taken because these genes reside within a much larger homologous region (Fig. 1a) (Lauer et al., 1980; Liebhaber et al., 1981; Marks et al., 1986b; Bailey et al., 1997). In one phylogenetic analysis, the 5′ noncoding region and the coding region of both alpha-1 and alpha-2 globin were included. In a second analysis, the 3′ noncoding region of both genes was included for analysis (Fig. 1c). A sample of orangutan alpha-2 globin alleles from Steiper et al. (2005) was analyzed along with three alpha-globin-type alleles and the alpha-1 and alpha-2 globin genes of other hominoids (human #J00153.1, gibbon #M94634, orangutan #AH003091). Sequences were aligned using ClustalX (Thompson et al., 1997) and modified by eye. The 5′ and coding region alignment contained 847 base pairs and the 3′ region alignment contained 153 base pairs. Human pseudo-alpha-1 globin was used as an outgroup. For both alignments, phylogenies were reconstructed using distance, parsimony, and likelihood techniques. Distance-based trees were built using the neighbor-joining (NJ) algorithm (Saitou and Nei, 1987) using the HKY correction for genetic distances (Hasegawa et al., 1985). Maximum parsimony (MP) trees were reconstructed with transitions and transversions equally weighted in both alignments, reflecting the lack of such a bias in this data set. A heuristic search with 50 random additions was performed. Maximum likelihood (ML) trees were estimated using the most likely model of molecular evolution derived from Modeltest (Posada and Crandall, 1998) with a heuristic search of 50 random taxon additions. All analyses were performed using PAUP* (Swofford, 2002).
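The BclI genotyping logic of Section 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two amplicon sequences are invented placeholders (the real amplicon sequence is not reproduced here), and the function names are hypothetical. The only details taken from the text are the recognition site (T^GATCA) and the screening counts (7 carriers among 45 orangutans).

```python
# Illustrative sketch of an in silico BclI digest.
# The amplicon sequences below are invented placeholders, not real data.

BCLI_SITE = "TGATCA"  # BclI recognition site; the enzyme cuts T^GATCA
CUT_OFFSET = 1        # cut falls after the first base of the site


def digest(seq, site=BCLI_SITE, cut_offset=CUT_OFFSET):
    """Return the fragment lengths produced by cutting seq at every site."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def is_cut(seq):
    """True when the amplicon carries at least one BclI site."""
    return len(digest(seq)) > 1


# Hypothetical amplicons: one without and one with the diagnostic motif.
amplicon_without_site = "ACGTACGTGGATCCACGTACGT"
amplicon_with_site = "ACGTACGTTGATCAACGTACGT"

fragments_without = digest(amplicon_without_site)  # single uncut fragment
fragments_with = digest(amplicon_with_site)        # two fragments

# Carrier frequency reported in the text: 7 of 45 screened orangutans.
carrier_percent = round(100 * 7 / 45)
```

An amplicon that yields more than one fragment carries the diagnostic motif and is scored as derived from the alpha-globin-type gene; an uncut product is scored as one of the other alpha-globins.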
Both pairwise and maximum likelihood approaches were used to estimate the ratio of the replacement (Ka) to silent (Ks) substitutions in the coding region of alpha-2 and alpha-globin-type genes (Ka/Ks or ω). These enabled tests of different hypotheses for the selective history of both genes. These analyses were performed on alpha-2 globin coding regions from one human, one chimpanzee, one gibbon, two orangutans (51.1 and Rudy.1) and one alpha-globin-type gene (Rizan.1). Orangutans 51.1 and Rudy.1 were chosen because their sequences show at least one synonymous mutation on each branch of the gene tree within orangutans. Pairwise Ka/Ks ratio estimates were generated with FENS (de Koning and Stewart, 1998) using the method of Li (1993), and employed a t-test to detect significant departures from neutral evolution (Ka/Ks = 1). Likelihood ratio tests (LRT) were performed on different hypotheses about the selective history of the alpha-2 and alpha-globin-type genes. Estimates for the likelihood of trees under a range of
parameters and estimates of the Ka/Ks ratio (ω) were generated in PAML v. 3.0 (Yang, 1997). In all calculations, the transition/transversion ratio was estimated, no molecular clock was assumed, and no rate heterogeneity was allowed.
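The distinction between replacement and silent changes that underlies Ka/Ks can be illustrated with a simple codon-by-codon comparison. This is a deliberately crude sketch, not the method of Li (1993) or the codon models in PAML, which estimate per-site rates and correct for multiple hits. The two short coding sequences are invented; only the amino acid changes they encode (tryptophan to cysteine, threonine to isoleucine) echo replacements described in the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(seq_a, seq_b):
    """Count differing codons as (synonymous, nonsynonymous).

    Both sequences must be in-frame, ungapped, and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in-frame coding DNA")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical four-codon sequences (Trp-Ala-Thr-Leu vs. Cys-Ala-Ile-Leu).
reference = "TGGGCCACCCTG"
variant = "TGCGCCATCCTA"

counts = classify_differences(reference, variant)
```

In this toy pair, one codon differs without changing the amino acid and two codons differ with an amino acid change. Turning such raw counts into Ka and Ks additionally requires normalizing by the numbers of nonsynonymous and synonymous sites, which this sketch omits.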
3. Results
3.1. Population genetics and phylogenetic analysis
The alpha-globin-type gene was directly observed via DNA sequencing in three different orangutans. The three alpha-globin-type alleles differed from one another at only one base pair. Compared to the orangutan alpha-2 globin gene, the novel alpha-globin-type sequence had 4–5 non-synonymous substitutions with no synonymous changes (Fig. 1b and c). There were also numerous up- and downstream differences between it and alpha-1 and alpha-2 globin (Fig. 1c). A restriction digest analysis of 45 orangutans determined that the alpha-globin-type gene was present in 7 orangutans (16%), all from northern Borneo. Two observations support the conclusion that the alpha-globin-type gene was a novel alpha-globin gene and not an allele of alpha-globin. First, in one of the three orangutans where the alpha-globin-type gene was sequenced, two different alpha-2 globin alleles were also found. The other two orangutans may have been homozygous at alpha-2 globin, disallowing this test of exclusion since only one type of alpha-2 allele would be present in a homozygote. Second, one individual that was positive for the alpha-globin-type allele based on restriction analysis also had two alleles at alpha-2 globin. These two observations showed that the alpha-globin-type gene was not an allelic version of alpha-2 globin and represented a novel gene. Phylogenetic analyses bolstered this finding. To further investigate the relationship of the alpha-globin-type gene to the other orangutan alpha-globin genes, two alignments of hominoid alpha-globin (including the alpha-globin-type gene) were phylogenetically analyzed, the 5′ and coding region and the 3′ region (Fig. 1c). Phylogenetic analysis of the 5′ and coding region linked alpha-2 and alpha-1 globin as sister taxa in each hominoid species, confirming the impact of persistent gene conversion at these loci (Fig. 2a). In this analysis, the orangutan alpha-globin-type genes form a cluster within the orangutan alpha-1 and alpha-2 globin genes. The alpha-globin-type sequences are specifically linked to one alpha-2 globin sequence. This relationship may reflect the alpha-globin-type gene’s origination from an alpha-2 globin ancestor, though gene conversion may have erased the differences in this portion of the alpha-1 and alpha-2 globin regions. Analyses of the non-homologous 3′ regions of these alpha-globin genes, which are not subject to interlocus gene conversion, may prove more useful for deciphering the position of the alpha-globin-type gene. This is because immediately downstream of alpha-2 globin is the alpha-1
Fig. 2. (Tree a) 50% consensus maximum parsimony alpha-globin gene tree of the 5′ and coding regions, based on 184 trees, each 267 steps in length. Each species is shown inside a box. The alpha-globin-type alleles are in boldface type. Numbers above branches represent the percentage of times that a particular grouping was found in the consensus. Similar results were found in the maximum likelihood and distance analysis. (Tree b) Maximum likelihood tree for the alpha-globin gene 3′ region (ln likelihood = -540.43410). The alpha-globin-type alleles are in boldface type. The maximum parsimony tree had the identical topology as the likelihood tree and was 72 steps long.
globin region, while downstream from the alpha-1 globin gene there is a distinctive alpha-globin gene member, theta-globin (Fig. 1a and c). In the phylogenetic analyses of the 3′ region, the alpha-globin-type gene is consistently allied to orangutan alpha-2 globin, to the exclusion of alpha-1 globin (Fig. 2b). This linkage of alpha-2 and the alpha-globin-type gene was also observable based on the alignment in Fig. 1c, which clearly showed the similarities between these two genes to the exclusion of alpha-1 globin. Based on this phylogenetic analysis, two hypotheses emerged for the position of the alpha-globin-type gene: between alpha-2 and alpha-1, and 5′ to alpha-2. The presence of an alpha-globin gene 3′ to the alpha-globin-type gene (Fig. 1c) rules out placement as the 3′-most alpha-globin gene. While gene conversion acts on alpha-2 and alpha-1 globin (Zimmer et al., 1980; Liebhaber et al., 1981; Hess et al., 1983, 1984; Michelson and Orkin, 1983), no conversion tracts are detected between the alpha-globin-type gene and alpha-2 globin using the method of Sawyer (1999). With three loci in a row, and the alpha-globin-type gene in the middle, it is unlikely that only the two external genes would continue to undergo gene conversion, without gene conversion affecting the intervening locus. Given the known map of the orangutan alpha-globin region (Marks et al., 1986b), this is direct evidence that immediately downstream of the alpha-globin-type gene is another globin gene. Therefore, the most parsimonious placement for the alpha-globin-type locus is a position immediately upstream of alpha-2 globin.
3.2. The evolutionary history of the alpha-globin-type gene of orangutans
Because of the pattern of four non-synonymous and no synonymous changes at the alpha-globin-type gene, this gene was examined for the signature of natural selection. Evidence for adaptive evolutionary changes was examined using two methods: pairwise Ka/Ks tests and likelihood ratio tests (LRT).
Pairwise Ka/Ks statistics were calculated for a DNA sequence alignment of hominoid alpha-2 globin genes and the alpha-globin-type gene (Table 1). Most pairwise Ka/Ks values between the alpha-2 globin genes were significantly lower than the neutral expectation of Ka/Ks = 1.0, evidence for purifying selection. Comparisons between the alpha-globin-type gene and all other genes, on
Fig. 3. Maximum likelihood estimates for the numbers of replacement and synonymous changes (above each branch) and the ω value (below each branch) for the lineages in this tree of hominoid alpha-2 and orangutan alpha-globin-type globin. Calculated using PAML (Yang, 1997).
the other hand, did not differ significantly from the neutral expectation. Pairwise Ka/Ks values from the orangutan alpha-2 globin genes to the alpha-globin-type gene were 1.3 and 1.9. This suggested that after the duplication event, the alpha-globin-type gene was evolving either neutrally or positively, but not under purifying selection. A phylogeny of the hominoid alpha-globin genes and the numbers of synonymous and non-synonymous DNA changes along each branch is presented in Fig. 3. Likelihood ratio tests (LRTs) were able to evaluate specific hypotheses about the selective history of the alpha-globin-type gene using a phylogenetic approach by calculating the likelihood of a data set, given this phylogeny and an evolutionary model. These different evolutionary models are described as follows. First, the likelihood score of a selectively neutral model was calculated with an ω value for all branches in the tree fixed at 1 (Ka/Ks = ω) (A) (Table 2). Second, the likelihood of a ‘free-ratio’ model (B), where all branches are allowed to vary and have their own unique ω value, was calculated. Third, the score of a tree with one estimated ω value for the entire tree was calculated (model C). Fourth, scores were calculated from trees with two variable ω values (model D): one estimated for the branch leading to the alpha-globin-type gene and a second (ω0) estimated for all remaining branches. Finally, scores were calculated with a fixed ω0 value of 1.0 for the branch leading to the alpha-globin-type gene and a second estimated ω value for the
Table 1
Pairwise Ka/Ks values for primate alpha-globins

                                   1          2          3          4          5
1  Human α-2
2  Chimpanzee α-2                  0.0
3  Gibbon α-2                      0.2151*    0.2628*
4  Orangutan 1 α-2                 0.1337*    0.1610*    0.2448
5  Orangutan 2 α-2                 0.0937**   0.1143*    0.1964*    0.1986
6  Orangutan alpha-globin-type     0.3289     0.4166     0.6327     1.3348     1.9124

* p < 0.05.
** p < 0.01.
Table 2
Likelihood values for different evolutionary models

Model   Description (a)                                                                               N pars. (b)   ln likelihood (c)
A       Neutrality (ω fixed at 1)                                                                     10            -661.63
B       ‘Free-ratio’ (ω free to vary in all branches) (see Fig. 4)                                    19            -646.25
C       One ω estimated for all branches (ω = 0.1385)                                                 11            -651.98
D       Two ω values estimated: alpha-globin-type vs. all other branches                              12            -648.17
E       Alpha-globin-type branch ω fixed at 1; second ω estimated for rest of branches (ω = 0.082)    11            -648.81

(a) Described further in text.
(b) Number of parameters estimated by maximum likelihood.
(c) Likelihood value estimated for the model, given the data set, by PAML (Yang, 1997).
remaining branches in the tree (model E). From these five models, multiple hypotheses were addressed using LRTs. First, the ‘neutral’ model (A) was compared to the remaining models (B, C, D, E) to test whether the evolutionary history of the alpha-2 globin and alpha-globin-type gene has been predominantly determined by neutral evolution. The neutral model was rejected as a very poor fit compared to the other models (p < 0.001) (Table 3). A second group of analyses tested the ‘free-ratio’ model (B) against one or two estimated ω values for the entire tree (C, D and E). Unsurprisingly, the ‘free-ratio’ model, with the highest number of estimated parameters, yielded the highest likelihood values, but there was no statistical difference between this tree and those with fewer estimated ω values. This test means that statistically, the ω values for the tree can be fit by one estimate for the entire tree; the fit was not improved by having different ω values for different branches. The single estimated ω was 0.1385, evidence for purifying selection. However, the branch leading to the alpha-globin-type gene was reconstructed to have four replacement and no silent mutations, qualitatively different from the other branches of the tree (Fig. 3). Although not statistically rigorous given this test, additional hypotheses were explored for the branch leading to the alpha-globin-type gene.

Table 3
Likelihood ratio tests of different models of evolution

Likelihood ratio test (a)   2ΔL value (b)   d.f. (c)   p-value (d)
A vs. B                     30.75           9          <0.001***
A vs. C                     19.29           1          <0.001***
A vs. D                     26.91           2          <0.001***
A vs. E                     25.64           1          <0.001***
B vs. C                     11.46           8          0.177
B vs. D                     3.84            7          0.798
B vs. E                     5.11            8          0.745
C vs. D                     7.62            1          0.005**
D vs. E                     1.27            1          0.259

(a) Models described in Table 2 and in text.
(b) Two times the difference between the likelihood values estimated for the two models.
(c) Degrees of freedom.
(d) Probability that one model is a significantly better fit to the data than the other.
** p < 0.01.
*** p < 0.001.
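The arithmetic behind the likelihood ratio tests in Table 3 can be retraced from the values in Table 2. The sketch below is an illustration, not the authors' procedure: it assumes the log-likelihoods are negative (the sign appears to have been lost in typesetting), takes twice the difference in log-likelihood as the test statistic, uses the difference in parameter counts as the degrees of freedom, and evaluates the chi-square upper tail with a small pure-Python helper (`chi2_sf`, a hypothetical name) so that no external library is required. Because the inputs are rounded to two decimals, the recomputed statistics can differ from the published ones in the last digit.

```python
import math

# Log-likelihoods and parameter counts from Table 2 (signs assumed negative).
LNL = {"A": -661.63, "B": -646.25, "C": -651.98, "D": -648.17, "E": -648.81}
NPARS = {"A": 10, "B": 19, "C": 11, "D": 12, "E": 11}


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2:  # odd df: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:       # even df: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt(simpler, richer):
    """Return (2*delta lnL, degrees of freedom, p-value) for nested models."""
    stat = 2.0 * (LNL[richer] - LNL[simpler])
    df = NPARS[richer] - NPARS[simpler]
    return stat, df, chi2_sf(stat, df)


# The comparison singled out in the text: one ω (C) versus two ω values (D).
stat_cd, df_cd, p_cd = lrt("C", "D")
# Neutral-branch model (E) versus freely estimated branch ω (D).
stat_ed, df_ed, p_ed = lrt("E", "D")
```

Under these assumptions the C versus D comparison gives a statistic of 7.62 on one degree of freedom, and the E versus D comparison gives a p-value near 0.26, in line with Table 3.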
A test examined whether one or two major selective environments better explained the ω values among the branches. This was done by comparing model C (one ω) to model D (one ω for the branch leading to the alpha-globin-type gene and another ω for the rest of the tree). Here, the two-ω model (D) is a significantly better fit than the one-ω model (C) (p = 0.005). This suggested that the selection operating on the alpha-globin-type gene was different from the selection operating on alpha-2 globin. This was in agreement with the pairwise analysis that showed that the history of alpha-2 is marked by purifying selection, but the alpha-globin-type gene was not. This left two hypotheses for the evolutionary history along the alpha-globin-type gene lineage: positive selection and neutral evolution. These competing hypotheses for the evolutionary history of the alpha-globin-type gene did not reveal significant differences (p = 0.26). Together, LRTs and pairwise Ka/Ks analyses suggested that the alpha-globin-type gene has had a different selective environment than alpha-2 globin, which has been under purifying selection. Since its duplication, the alpha-globin-type gene has been evolving by either positive selection or neutral evolution. An examination of the four amino acid substitutions in the alpha-globin-type gene will assist in distinguishing between these alternative hypotheses.
3.3. Comparative functional analysis of the alpha-globin-type amino acid chain
Unfortunately, it was not possible to test directly whether the alpha-globin-type gene was transcribed or translated in the orangutan, as only minute quantities of orangutan samples were available for this study. As a proxy, the alpha-globin-type gene sequences were examined for features inconsistent with gene expression, such as nonsense or missense mutations or mutations in promoter regions. No such changes were found.
The four non-synonymous substitutions in the alpha-globin-type gene would likely change the properties of the hemoglobin made from it (Figs. 4 and 5). To assess the functional significance of these changes, three indirect methods were used. First, an alignment of 148 therian alpha-globin proteins was analyzed to obtain the number of different amino acid states present at each homologous site, as a benchmark to identify
Fig. 4. Description of the four amino acid positions replaced in the alpha-globin-type of orangutans. ‘Amino Acid Position’ is the position in the alpha-chain. ‘Quality’ refers to Li et al.’s (1985) parsing of Grantham’s (1974) distance matrix into discrete changes ranging from conservative to radical. ‘Structural properties’ were taken from Sack et al. (1978). ‘Variability within Therians’ refers to the number of different amino acid states at a given site based on an alignment of 148 alpha-globin proteins (the mean was 3.2 states per site). ‘‘Known Human Mutations and their Properties’’ shows similar human mutations and their known phenotypic results. (1) Moo-Penn et al. (1983), Honig et al. (1984); (2) Pootrakul et al. (1974), Vella et al., 1974; and (3) Griffiths et al. (1977), Chihchuan et al. (1981), Al-Awamy et al. (1985). Bold denotes noteworthy aspects mentioned in greater length in the text.
sites that were relatively variable or constrained among taxa. The average number of different states at a given amino acid site in alpha-globin was 3.2. Second, each of the replaced amino acids of the alpha-globin-type gene was examined in light of its known role in human hemoglobin structure. Third, each amino acid replacement of the alpha-globin-type gene was compared to the phenotypic properties of humans with similar hemoglobin mutations/hemoglobinopathies. This was carried out for the four amino acids replaced at the alpha-globin-type gene, sites 14, 15, 18, and 67 (Fig. 4). The most noteworthy replacement was found at amino acid 14, where a tryptophan residue was replaced in the alpha-globin-type gene with a cysteine residue. Based on chemical properties, this is the single most radical shift between any two amino acids (Grantham, 1974). Furthermore, the 14th amino acid was constrained in mammals,
where the only other known residue was a phenylalanine (Braunitzer et al., 1977; Sgouros et al., 1988), another large, aromatic amino acid. Based on human hemoglobin structure, this amino acid is probably constrained because it makes a hydrogen bond between itself and amino acid 67, linking helix A of the alpha-chain to helix E (Sack et al., 1978) (Fig. 5). Note that the 67th amino acid was also mutated in the alpha-globin-type gene. No tryptophan to cysteine mutations have been cataloged at site 14 in humans, but a tryptophan to arginine replacement is known, HbEvanston (Moo-Penn et al., 1983; Honig et al., 1984). Interestingly, HbEvanston leads to alpha-thalassemia in humans. This is because the replacement of tryptophan leaves a large gap in the structure between itself and amino acid 67, loosening the hemoglobin molecule (Honig et al., 1984). Judging from its size alone, cysteine would likely leave an even larger gap in the hemoglobin structure.
Fig. 5. Human hemoglobin structure (SwissPDB) showing portions of helix A and helix E of an alpha-globin chain. This diagram is based on the known structure of human hemoglobin. At left, the hydrogen bond between sites 14 and 67 is shown, which bonds the two helices in normal human hemoglobin. At right is the view of the alpha-globin-type protein showing the large gap between helices A and E caused by the replacement of tryptophan by cysteine and threonine by isoleucine. This diagram is not based on the known structure of orangutan hemoglobin made from the alpha-globin-type gene; it is instead a hypothetical representation of the outcome of these amino acid replacements. The diagram shows how sizable the gap between these two helices would be if the amino acids were replaced. However, it is not clear how these replacements would affect the overall hemoglobin structure of orangutans without further analyses.
Crucially, the other half of this bond, amino acid 67, is also replaced in the alpha-globin-type gene.
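The site-variability benchmark used in Section 3.3 (the number of different amino acid states observed at each homologous site) is straightforward to compute from a protein alignment. The sketch below assumes a gap character of "-" and uses a tiny invented alignment; it does not reproduce the 148-sequence therian alignment analyzed in the paper, and the function name is hypothetical.

```python
def states_per_site(alignment, gap="-"):
    """Return the number of distinct residues in each alignment column.

    Gap characters are ignored; all sequences must have equal length.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [
        len({residue for residue in column if residue != gap})
        for column in zip(*alignment)
    ]


# Invented toy alignment of four short protein fragments.
toy_alignment = [
    "MWKTA",
    "MFKTA",
    "MWRSA",
    "MW-TG",
]

site_counts = states_per_site(toy_alignment)
mean_states = sum(site_counts) / len(site_counts)
```

A site whose count falls well below the alignment-wide mean (3.2 states per site in the paper) would be read as constrained, which is the argument made for position 14, where only tryptophan and phenylalanine are reported among mammals.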
4. Discussion
In orangutans, approximately 16% of individuals harbor the alpha-globin-type gene. A different study of alpha-globin gene copy number found that 42% of orangutans have three alpha-globin genes (Takenaka et al., 1993), though these alpha-globin genes were not sequenced. Here we provide evidence that the novel alpha-globin-type gene of orangutans is this third alpha-globin gene. Unlike the nearly complete similarity between alpha-1 and alpha-2 globin, the third alpha-globin gene has four non-synonymous changes. This does not resolve the differences in incidence found between this study and that of Takenaka et al. (1993) (16% versus 42%). This difference could be due to sampling error, though additional third alpha-globin genes not detected by the methods of this study could be present in orangutans. Another possibility is different patterns of relatedness among individuals in either study. Alpha-globin gene copy polymorphisms have been described in many primates (Zimmer et al., 1980; Takenaka et al., 1991, 1993), but without DNA sequencing of the different genes. Early protein electrophoresis studies, however, recovered and sequenced unique alpha-globin amino acid chains from both chimpanzees and gorillas (Boyer et al., 1971a,b, 1973). The amino acid sequences of these proteins had eight and nine amino acid differences from the alpha-1 and alpha-2 globin amino acid chains. Interestingly, these replacements differ from those found in the orangutan alpha-globin-type gene, showing that the underlying DNA leading to these novel alpha-globin chains is different between orangutans and the African apes. This suggests that there have been at least two independent duplication events of the alpha-globin genes of apes, supporting the idea that alpha-globin is an evolutionarily fluid region (Marks et al., 1986a). Both Boyer et al. (1971b) and Takenaka et al.
(1993) hypothesize that malaria could be a selective force driving alpha-globin gene duplication in non-human primates. The DNA sequence analysis of the orangutan alpha-globin-type gene bears on these hypotheses. Boyer et al.’s (1971b) hypothesis is that frequent unequal crossing-over occurring at closely juxtaposed alpha-globin loci would lead to thalassemia. This model does not fit the orangutan case. If unequal crossing-over were common, there would necessarily be both single and triplicated alpha-globin chromosomes present. In orangutans, no chromosomes with single alpha-globin genes are observed (Takenaka et al., 1993). Takenaka et al. (1993) make a less specific hypothesis about the triplicated chromosomes, simply linking differences in gene copy number to tropical areas where malaria is likely endemic in primates. This is similar to the case in humans, but in
humans it is fewer than the normal number of alpha-globin loci that is inferred as an adaptation against malaria. In primates, additional alpha-globin genes are linked to tropical regions. The present study contributes to and extends this hypothesis, suggesting that the extra alpha-globin gene present in orangutans is not simply an additional exact copy of the alpha-1 and alpha-2 genes. Instead, this gene is a novel DNA sequence that causes differences at the amino acid level. This derived sequence originated as a duplication during an unequal crossing-over. Following this event, this novel gene evolved under neutral or positive evolution into the alpha-globin-type gene. Though the statistical evidence does not overwhelmingly support an adaptive, non-neutral hypothesis, the fact that two of the four amino acids replaced in the alpha-globin-type gene are structurally linked suggests a possible role for natural selection in its evolutionary history. Specifically, these two replacements seem to be a case of correlated amino acid change, rather than two unrelated mutation events. Furthermore, one of these replacements mirrors a mutation that causes thalassemia in humans (Moo-Penn et al., 1983; Honig et al., 1984), a disease linked to malaria. This can be coupled to the evidence that independent alpha-globin duplication events have occurred in both African and Asian apes. While the statistical evidence from the Ka/Ks and LRT analyses does not overwhelmingly support a strong pattern of non-neutral evolution and is more supportive of nonpurifying selection, the multiple lines of related evidence together generally support an adaptive hypothesis for the evolution of the alpha-globin-type gene of orangutans. Additional work is necessary to further explore the role of natural selection on this locus.
Detailed tests that can simultaneously address gene duplication, Ka/Ks ratios, and the probabilities of correlated molecular change would assist in solidifying rejection of neutral evolution for this locus.

Although there is evidence that malarial parasites are a likely selective pressure in orangutans (Wolfe, 1999; Wolfe et al., 2002), Takenaka et al. (1993) found no relationship between orangutans with additional alpha-globin genes and malarial infection. That study cannot be considered conclusive, however, since it examined the hematological indices of a limited number of captive orangutans. It is unclear how Plasmodium would influence an individual orangutan’s hematology, especially in captive animals that may not reflect the course of infection in wild conditions. Therefore, it is crucial to conduct detailed functional studies of the alpha-globin-type gene and its associations with Plasmodium in wild primates. If Plasmodium infection in orangutans is mitigated by the adaptive presence of the alpha-globin-type gene, this would support the idea that humans are not the only primates adversely affected by Plasmodium parasites (Wang et al., 2003; Steiper et al., 2005) and, furthermore, that this disease may have a history in preagricultural humans and earlier hominids (Rich et al., 1998; Ayala et al., 1999; Coluzzi, 1999; Rich and Ayala, 2000; Steiper et al., 2005).
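The logic of a likelihood ratio test of the kind referred to above can be illustrated with a minimal sketch. It assumes two nested codon models fit to the same alignment, for example one with the Ka/Ks ratio fixed at 1 (neutrality) and one with the ratio estimated freely, and it uses the standard chi-square approximation for twice the difference in log-likelihood. The function name and the log-likelihood values are hypothetical and are not taken from the analyses in this study.

```python
import math


def lrt_p_value(lnl_null, lnl_alt, df):
    """Likelihood ratio test for two nested models.

    lnl_null: log-likelihood of the constrained (null) model
    lnl_alt:  log-likelihood of the more general (alternative) model
    df:       difference in the number of free parameters (1 or 2 here)

    Returns (statistic, p_value), where statistic = 2 * (lnl_alt - lnl_null)
    and the p-value comes from the chi-square distribution with df degrees
    of freedom.
    """
    # A nested alternative cannot fit worse than the null; clamp small
    # negative values that arise from incomplete optimization.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square survival function, 1 degree of freedom.
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square survival function, 2 degrees of freedom.
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p


if __name__ == "__main__":
    # Hypothetical log-likelihoods, invented purely for illustration.
    stat, p = lrt_p_value(lnl_null=-1234.56, lnl_alt=-1233.10, df=1)
    print(f"2*delta lnL = {stat:.2f}, p = {p:.3f}")
```

With these invented values the statistic falls below the 5% critical value of 3.84 for one degree of freedom, so neutrality would not be rejected. A result of this kind is compatible with either neutral or weakly positive evolution, which is why the structural evidence is considered alongside the statistical tests.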
Acknowledgements

We thank the WCS-Harvard Orangutan Study for the Sabah orangutan DNA samples. For non-Sabah samples we thank Oliver Ryder and Leona Chemnick at the San Diego Zoo’s Center for the Reproduction of Endangered Species (C.R.E.S.)/Frozen Zoo, the Orangutan S.S.P., Yerkes R.P.C., and the Topeka Zoo. This work was supported by grants from the National Science Foundation (BCS-0097057), the Lindbergh Foundation, and the Harvard Anthropology Department to M.S. and M.R. Research Centers in Minority Institutions award RR-03037 from the National Center for Research Resources of the National Institutes of Health, which supports the infrastructure of the Anthropological Genetics Laboratory at Hunter College, is also acknowledged. The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NCRR/NIH. We also thank two anonymous reviewers for their comments on this manuscript and Andrew Berry, Dan Hartl, and David Pilbeam for comments on an earlier draft of this work. Randall Collura, Amanda Lobell, and Glenn Maston were generous with assistance in the laboratory and with analyses.
References

Al-Awamy, B.H., Niazi, G.A., Naeem, M.A., Wilson, J.B., Huisman, T.H., 1985. Hemoglobin Handsworth or α2 18(A16)Gly→Arg β2 in a Saudi newborn. Hemoglobin 9, 183–186.
Ayala, F.J., Escalante, A.A., Rich, S.M., 1999. Evolution of Plasmodium and the recent origin of the world populations of Plasmodium falciparum. Parassitologia 41, 55–68.
Bailey, A.D., Shen, C.C., Shen, C.-K.J., 1997. Molecular origin of the mosaic sequence arrangements of higher primate α-globin duplication units. Proc. Natl. Acad. Sci. U.S.A. 94, 5177–5182.
Boyer, S.H., Crosby, E.F., Noyes, A.N., Fuller, G.F., Leslie, S.E., Donaldson, L.J., Vrablik, G.R., Schaefer Jr., E.W., Thurmon, T.F., 1971a. Primate hemoglobins: some sequences and some proposals concerning the character of evolution and mutation. Biochem. Genet. 5, 405–448.
Boyer, S.H., Noyes, A.N., Boyer, M.L., Marr, K., 1973. Hemoglobin 3 chains in apes. Primary structures and the presumptive nature of back mutation in a normally silent gene. J. Biol. Chem. 248, 992–1003.
Boyer, S.H., Noyes, A.N., Vrablik, G.R., Donaldson, L.J., Schaefer Jr., E.W., Gray, C.W., Thurmon, T.F., 1971b. Silent hemoglobin alpha genes in apes: potential source of thalassemia. Science 171, 182–185.
Braunitzer, G., Schrank, B., Stangl, A., 1977. Die Sequenz der α-Ketten der Haemoglobine des Schweines und des Lamas (Aspekte zur Atmung im Hochland). H-S Z. Physiol. Chem. 358, 409–412.
Chih-chuan, L., Hai-nan, T., Kuo-feng, C., 1981. Hemoglobin Handsworth (alpha 18 (A16) Gly leads to Arg) in a Chinese. Hemoglobin 5, 191–193.
Coluzzi, M., 1999. The clay feet of the malaria giant and its African roots: hypotheses and inferences about origin, spread and control of Plasmodium falciparum. Parassitologia 41, 277–283.
de Koning, A.P.J., Stewart, C.-B., 1998. FENS – Facilitated estimates of nucleotide substitutions, University at Albany, SUNY.
Flint, J., Hill, A.V., Bowden, D.K., Oppenheimer, S.J., Sill, P.R., Serjeantson, S.W., Bana-Koiri, J., Bhatia, K., Alpers, M.P., Boyce, A.J., et al., 1986. High frequencies of α-thalassaemia are the result of natural selection by malaria. Nature 321, 744–750.
Goossens, M., Dozy, A.M., Embury, S.H., Zachariades, Z., Hadjiminas, M.G., Stamatoyannopoulos, G., Kan, Y.W., 1980. Triplicated α-globin loci in humans. Proc. Natl. Acad. Sci. U.S.A. 77, 518–521.
Grantham, R., 1974. Amino acid difference formula to help explain protein evolution. Science 185, 862–864.
Griffiths, K.D., Lang, A., Lehmann, H., Mann, J.R., Plowman, D., Raine, D.N., 1977. Haemoglobin Handsworth α 18 (A16) glycine leads to arginine. FEBS Lett. 75, 93–95.
Hasegawa, M., Kishino, H., Yano, T., 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174.
Hess, J.F., Fox, M., Schmid, C., Shen, C.K., 1983. Molecular evolution of the human adult α-globin-like gene region: insertion and deletion of Alu family repeats and non-Alu DNA sequences. Proc. Natl. Acad. Sci. U.S.A. 80, 5970–5974.
Hess, J.F., Schmid, C.W., Shen, C.K., 1984. A gradient of sequence divergence in the human adult α-globin duplication units. Science 226, 67–70.
Higgs, D.R., Vickers, M.A., Wilkie, A.O., Pretorius, I.M., Jarman, A.P., Weatherall, D.J., 1989. A review of the molecular genetics of the human α-globin gene cluster. Blood 73, 1081–1104.
Honig, G.R., Shamsuddin, M., Vida, L.N., Mompoint, M., Valcourt, E., Bowie, L.J., Jones, E.C., Powers, P.A., Spritz, R.A., Guis, M., et al., 1984. Hemoglobin Evanston (α 14 Trp→Arg). An unstable alpha-chain variant expressed as alpha-thalassemia. J. Clin. Invest. 73, 1740–1749.
Hsu, S.L., Marks, J., Shaw, J.P., Tam, M., Higgs, D.R., Shen, C.C., Shen, C.K., 1988. Structure and expression of the human θ-1 globin gene. Nature 331, 94–96.
Lauer, J., Shen, C.K., Maniatis, T., 1980. The chromosomal arrangement of human alpha-like globin genes: sequence homology and alpha-globin gene deletions. Cell 20, 119–130.
Leung, S., Proudfoot, N.J., Whitelaw, E., 1987. The gene for theta-globin is transcribed in human fetal erythroid tissues. Nature 329, 551–554.
Leung, S.O., Whitelaw, E., Proudfoot, N.J., 1989. Transcriptional and translational analysis of the human θ-globin gene. Nucleic Acids Res. 17, 8283–8300.
Li, W.H., 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36, 96–99.
Li, W.H., Wu, C.I., Luo, C.C., 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2, 150–174.
Liebhaber, S.A., Goossens, M., Kan, Y.W., 1981. Homology and concerted evolution at the α-1 and α-2 loci of human alpha-globin. Nature 290, 26–29.
Marks, J., Shaw, J.-P., Perez-Stable, C., Hu, W.-S., Ayres, T.M., Shen, C., Shen, C.-K.J., 1986a. The primate α-globin gene family: a paradigm of the fluid genome. Cold Spring Harb. Symp. Quant. Biol. 51, 499–508.
Marks, J., Shaw, J.-P., Shen, C.-K.J., 1986b. The orangutan adult α-globin gene locus: duplicated functional genes and a newly detected member of the primate α-globin gene family. Proc. Natl. Acad. Sci. U.S.A. 83, 1413–1417.
Marks, J., Shaw, J.P., Shen, C.K., 1986c. Sequence organization and genomic complexity of primate θ-1 globin gene, a novel α-globin-like gene. Nature 321, 785–788.
Michelson, A.M., Orkin, S.H., 1983. Boundaries of gene conversion within the duplicated human α-globin genes. Concerted evolution by segmental recombination. J. Biol. Chem. 258, 15245–15254.
Moo-Penn, W.F., Baine, R.M., Jue, D.L., Johnson, M.H., McGuffey, J.E., Benson, J.M., 1983. Hemoglobin Evanston: alpha 14(A12) Trp leads to Arg. A variant hemoglobin associated with alpha-thalassemia-2. Biochim. Biophys. Acta 747, 65–70.
Pootrakul, S., Srichiyanont, S., Wasi, P., Suanpan, S., 1974. Hemoglobin Siam (alpha 2 15 arg beta 2): a new alpha-chain variant. Humangenetik 23, 199–204.
Posada, D., Crandall, K.A., 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818.
Pressley, L., Higgs, D.R., Clegg, J.B., Weatherall, D.J., 1980. Gene deletions in alpha thalassemia prove that the 5′ ζ-locus is functional. Proc. Natl. Acad. Sci. U.S.A. 77, 3586–3589.
Proudfoot, N.J., Gil, A., Maniatis, T., 1982. The structure of the human ζ-globin gene and a closely linked, nearly identical pseudogene. Cell 31, 553–563.
Proudfoot, N.J., Maniatis, T., 1980a. The structure of a human α-globin pseudogene and its relationship to alpha-globin gene duplication. Cell 21, 537–544.
Proudfoot, N.J., Shander, M.H., Manley, J.L., Gefter, M.L., Maniatis, T., 1980b. Structure and in vitro transcription of human globin genes. Science 209, 1329–1336.
Rich, S.M., Ayala, F.J., 2000. Population structure and recent evolution of Plasmodium falciparum. Proc. Natl. Acad. Sci. U.S.A. 97, 6994–7001.
Rich, S.M., Light, M.C., Hudson, R.R., Ayala, F.J., 1998. Malaria’s eve: evidence of a recent population bottleneck throughout the world populations of Plasmodium falciparum. Proc. Natl. Acad. Sci. U.S.A. 95, 4425–4430.
Sack, J.S., Andrews, L.C., Magnus, K.A., Hanson, J.C., Rubin, J., Love, W.E., 1978. Location of amino acid residues in human deoxy hemoglobin. Hemoglobin 2, 153–169.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Sawyer, S.A., 1999. GENECONV: a computer package for the statistical detection of gene conversion. Distributed by the author, Department of Mathematics, Washington University in St. Louis, available at http://www.math.wustl.edu/sawyer.
Sgouros, J.G., Kleinschmidt, T., Braunitzer, G., 1988. The primary structure of the hemoglobin of the Indian false vampire (Megaderma lyra Microchiroptera). Biol. Chem. H-S 369, 47–53.
Steiper, M.E., Wolfe, N.D., Karesh, W.B., Kilbourn, A.M., Bosi, E.J., Ruvolo, M., 2005. The population genetics of the α-2 globin locus of orangutans (Pongo pygmaeus). J. Mol. Evol. 60, 400–408.
Swofford, D.L., 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, MA.
Takenaka, A., Udono, T., Miwa, N., Varavudhi, P., Takenaka, O., 1993. High frequency of triplicated α-globin genes in tropical primates, crab-eating macaques (Macaca fascicularis), chimpanzees (Pan troglodytes), and orang-utans (Pongo pygmaeus). Primates 34, 55–60.
Takenaka, A., Ueda, S., Terao, K., Takenaka, O., 1991. Multiple α-globin genes in crab-eating macaques (Macaca fascicularis). Mol. Biol. Evol. 8, 320–326.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Vella, F., Casey, R., Lehmann, H., Labossierre, A., Jones, T.G., 1974. Haemoglobin Ottawa: α2 15 (A13) Gly→Arg β2. Biochim. Biophys. Acta 336, 25–29.
Wang, H.Y., Tang, H., Shen, C.K., Wu, C.I., 2003. Rapidly evolving genes in human. I. The glycophorins and their possible role in evading malaria parasites. Mol. Biol. Evol. 20, 1795–1804.
Wolfe, N.D., 1999. Pathogen evolution and exchange in Bornean orangutans. Department of Immunology and Infectious Disease, Harvard School of Public Health, Harvard University, Boston.
Wolfe, N.D., Karesh, W.B., Kilbourn, A.M., Cox-Singh, J., Bosi, E.J., Rahman, H.A., Prosser, A.T., Singh, B., Andau, M., Spielman, A., 2002. The impact of ecological conditions on the prevalence of malaria among orangutans. Vector Borne Zoonot. 2, 97–103.
Yang, Z., 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556.
Zimmer, E.A., Martin, S.L., Beverley, S.M., Kan, Y.W., Wilson, A.C., 1980. Rapid duplication and loss of genes coding for the alpha chains of hemoglobin. Proc. Natl. Acad. Sci. U.S.A. 77, 2158–2162.