HLA Class I Diversity in Kolla Amerindians

Ann-Margaret Little, Iain Scott, Susanna Pesoa, Steven G. E. Marsh, Rafael Argüello, Steven T. Cox, Daniel Ramon, Carlos Vullo, and J. Alejandro Madrigal

ABSTRACT: Human leukocyte antigen (HLA) class I polymorphism was studied within a population of 70 unrelated Kolla Amerindians from the far northwest of Argentina close to the Bolivian border. The results indicate that the HLA-A, -B, and -C alleles typical of other Amerindian populations also predominate in the Kolla. These alleles belong to the following allele groups: HLA-A*02, *68, *31, *24, HLA-B*35, *15, *51, *39, *40, *48, and Cw*01, *03, *04, *07, *08, and *15. For the HLA-A locus, heterogeneity was seen for HLA-A*02 with A*0201, *0211, and *0222; and for A*68 with *68012 and *6817, the latter being a novel allele identified in this population. Analysis of HLA-B identified heterogeneity for all Amerindian allele groups except HLA-B*48, including the identification of the novel B*5113 allele. For HLA-C, heterogeneity was identified within the Cw*07, *04, and *08 groups with Cw*0701/06, *0702, *04011, *0404, *0803, and *0809 identified. The most frequent "probable" haplotype found in this population was B*3505-Cw*04011. This study supports previous studies, which demonstrate increased diversity at HLA-B compared with HLA-A and -C. The polymorphism identified within the Kolla HLA-A, -B, and -C alleles supports the hypothesis that HLA evolution is subject to positive selection for diversity within the peptide binding site. Human Immunology 62, 170–179 (2001). © American Society for Histocompatibility and Immunogenetics, 2001. Published by Elsevier Science Inc.

KEYWORDS: HLA; Kolla Amerindians; polymorphism; evolution

ABBREVIATIONS
SSOP  sequence specific oligonucleotide probing
RSCA  reference strand conformation analysis
HLA   human leukocyte antigen
NK    natural killer cells
MHC   major histocompatibility complex

From the Anthony Nolan Research Institute (A.-M.L., I.S., S.P., S.G.E.M., R.A., S.T.C., D.R., J.A.M.), The Royal Free Hospital, London, United Kingdom; Department of Haematology (A.-M.L., S.G.E.M., J.A.M.), The Royal Free Hospital School of Medicine, London, United Kingdom; HLA Laboratory (S.P., C.V.), Hospital Nacional de Clinicas, Cordoba, Argentina; Immunobiology Department (R.A.), Faculty of Medicine of Torreón, Universidad Autónoma de Coahuila, Torreón, Coah, Mexico; and the Imperial College School of Medicine (J.A.M.), London, United Kingdom.
Address reprint requests to: Professor J. A. Madrigal, Anthony Nolan Research Institute, The Royal Free Hospital, Pond Street, London NW3 2QG, United Kingdom; Tel: +44 (020) 7284-8315; Fax: +44 (020) 7284-8331; E-mail: [email protected].
Received October 26, 2000; revised November 15, 2000; accepted November 21, 2000.
0198-8859/01/$–see front matter PII S0198-8859(00)00248-2

INTRODUCTION
Human leukocyte antigen (HLA) genes located within the human major histocompatibility complex (MHC) on chromosome 6 encode polymorphic HLA class I and II molecules [1]. These molecules play a crucial role in the development of immune responses against infectious pathogens and malignant tissue through their interaction with effector cells such as T cells and natural killer (NK) cells. The polymorphism exhibited by HLA class I and II molecules is localized mainly to the functional domain of the molecule, the peptide binding site, such that each HLA class I and II allotype can bind thousands of different peptides derived from intracellular and extracellular proteins, respectively. It is the variety in the types of peptides that bind to HLA molecules that triggers reactions from circulating T cells, whereas the role of bound peptide in interactions with NK cells is less clear. With over 1300 HLA class I and II alleles defined, it is
clear that not every population shares the same alleles, such that particular alleles are found within different ethnic groups. One theory for the extensive polymorphism found at HLA loci is that positive selection for advantageous HLA molecules has occurred during human evolution and that this selection has been affected by the local antigenic stimuli experienced by individual populations, reviewed in Little et al. [2]. Thus, the extreme level of polymorphism at HLA loci lends itself to the genetic studies of populations. Despite the role of HLA in antigen presentation, and the focusing of allelic variation to within the encoded antigen binding site, direct evidence of pathogen driven selection of allotypes is extremely scarce. Assuming HLA polymorphism is maintained by pathogen driven selection there are several reasons why this is so difficult to observe. One crucial factor is that most of the world’s current populations are the product of genetic admixture resulting in a large number of HLA specificities at low frequencies for a given locus. By reducing the genetic complexity of the population, evidence of selective forces in maintaining HLA diversity may be unmasked. In the case of HLA studies within Amerindian populations studied, insight has been gained both in the genetic relationship of tribes and the molecular evolution and probable function of HLA loci [3–5]. Archaeologic and genetic studies suggest the peopling of the Americas first took place between 11,000 and 40,000 years BP [4 –7]. The descendants of this population have restricted diversity within many genetic systems including the HLA, suggesting either a small founder population or a genetic bottleneck sometime between the migration from Siberia and the populating of the Americas. Studies of HLA class I polymorphism within Amerindian populations have been particularly illuminating. 
Despite the low level of polymorphism of HLA class I within indigenous populations such as genetically isolated Amerindian tribes, sequencing studies identified many variants not seen outside Central and South America. These variants are found predominantly at HLA-B and are generally the product of recombination between probable founder alleles [8]. The study of HLA polymorphism within genetically isolated groups is useful in anthropologic and genetic studies. The information gleaned also improves our understanding of the range of diversity found within human populations. This ultimately aids the selection of donors for both solid organ and stem cell transplantation, particularly when donor and recipient differ in ethnic origin. Additionally, knowledge of regional HLA diversity will play an important role in the selection of immunotherapy approaches to treat region-specific pathogens. This study presents an analysis of the HLA class I polymorphism of the Kolla people, who inhabit a remote desert region at an altitude of 3780 m in the far northwest of Argentina, close to the Bolivian border. The current population of approximately 3000 individuals is largely endogamous, although some admixture with Bolivian Kolla is known. The tribe now speaks Quechua and Aymara, languages that belong to the Andean linguistic group, reflecting influences from Bolivian and Peruvian populations.
METHODS AND MATERIALS

Sample Collection
Peripheral blood lymphocytes were collected from 70 unrelated individuals. Where possible, these individuals were the mother and father of the family. Care was taken to exclude siblings of those already sampled. DNA was extracted by standard salting-out protocols.

Polymerase Chain Reaction–Sequence Specific Oligonucleotide Probing Assay
HLA-A and -C typing was performed using methods previously described [9, 10]. Typing for HLA-B was performed using the method described previously [11] with slight modifications: the 3′ primers D1 (GCG GCG GTC CAG GAG CT) and D2 (GCG GCG GTC CAG GAG CG) were used independently in combination with the 5′ primer. This allowed the separate amplification of alleles to reduce oligotyping ambiguities in certain heterozygous combinations. Further polymerase chain reaction–sequence specific oligonucleotide probing (PCR-SSOP) subtyping was performed for HLA-A2 and B35 positive pairs as described [12, 13].

Reference Strand Conformation Analysis
Allelic level typing was performed by reference strand conformation analysis (RSCA) for HLA-A, -B, and -C loci as previously described [14].

Sequence-Based Typing
Direct sequencing of HLA-A, -B, and -C products was performed as previously described [15].

Frequency Analysis
Frequencies corresponding to DNA-defined alleles were calculated by counting the number of subjects possessing each allele/sequence, assuming that individuals typed as homozygous possessed two copies of the identical allele/sequence.
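The direct-counting rule described under Frequency Analysis can be sketched as follows. This is a minimal illustration, not the authors' software: the function names and the toy genotypes are hypothetical, and the only study figures used are the A*02 count of 72 and the sample size of 70 reported in Table 1.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Direct-counting allele frequencies.

    genotypes: list of (allele, allele) pairs, one pair per individual.
    A homozygous individual is entered with the same allele twice,
    matching the assumption stated in the Methods.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_copies = 2 * len(genotypes)  # two chromosomes per individual
    return {allele: n / total_copies for allele, n in counts.items()}


def frequency_from_count(copies, individuals):
    """Frequency from an allele count and a number of individuals."""
    return round(copies / (2 * individuals), 3)


# Hypothetical toy genotypes, not data from the study
toy = [("A*02", "A*02"), ("A*02", "A*68"), ("A*31", "A*24")]
toy_freqs = allele_frequencies(toy)
print(toy_freqs["A*02"])  # 3 of 6 copies

# Count taken from Table 1: 72 copies of A*02 among 70 individuals
print(frequency_from_count(72, 70))
```

With 72 copies among 70 individuals the second call reproduces the frequency of 0.514 listed for A*02 in Table 1.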
TABLE 1 HLA-A frequencies within the Kolla population

Broad HLA-A specificity and frequency    HLA-A allele specificity and frequency
Specificity  Frequency  n(a)             Allele        Frequency  n
A*02         0.514      72               A*0201        0.493      69
                                         A*0211        0.007      1
                                         A*0222        0.014      2
A*68         0.157      22               A*68012       0.088      13
                                         A*6817        0.051      7
                                         A*68(a)       0.021      2
A*31         0.128      18               A*31012       0.128      18
A*24         0.078      11               A*24021       0.078      11
A*03         0.071      10               A*03011(c)    0.043      5
                                         A*03011       0.021      4
                                         A*0302        0.007      1
A*29(b)      0.028      4                                         4
A*74         0.007      1                A*7401/02     0.007      1
A*33(b)      0.007      1                                         1
A*01         0.007      1                A*0101        0.007      1

(a) n = number of individuals tested. (b) Allelic typing not performed. (c) Intron 2 variation found in this allele.

TABLE 2 HLA-B frequencies within the Kolla population

Broad HLA-B specificity and frequency    HLA-B allele specificity and frequency
Specificity  Frequency  n(a)             Allele        Frequency  n
B*35         0.314      44               B*3505        0.200      28
                                         B*3501        0.071      10
                                         B*3506        0.036      5
                                         B*35(b)       0.007      1
B*15         0.236      33               B*1501        0.114      16
                                         B*1504        0.093      13
                                         B*1522        0.007      1
                                         B*1539        0.007      1
                                         B*1503        0.007      1
                                         B*15          0.007      1
B*51         0.136      19               B*51011       0.114      16
                                         B*51131       0.021      3
B*39         0.107      15               B*3903        0.050      7
                                         B*3912        0.021      3
                                         B*39013       0.014      2
                                         B*3909        0.007      1
                                         B*39061       0.007      1
                                         B*39v(c)      0.007      1
B*40         0.064      9                B*4004        0.036      5
                                         B*4008        0.014      2
                                         B*4002        0.014      2
B*49(b)      0.064      9
B*48         0.036      5                B*4801        0.036      5
B*44         0.036      5                B*4403        0.036      5
B*07         0.007      1                B*0702/04     0.007      1

(a) Number of individuals tested. (b) Allelic typing not done. (c) Discrepant result between RSCA and SSO, which could not be repeated due to lack of DNA.

RESULTS

Allele/Specificity Frequencies
SSOP typing of the 70 Kolla Amerindian DNA samples provided low to medium level resolution of HLA class I type. These results (Tables 1, 2, and 3) indicate that the HLA-A, -B, and -C specificities typical of other Amerindian tribes studied also predominate in the Kolla. These Amerindian specificities, HLA-A*02, *68, *31, *24, HLA-B*35, *15, *51, *39, *40, *48, and HLA-Cw*01, *03, *04, *07, *08, and *15, have cumulative frequencies of 0.877, 0.893, and 0.892, respectively. The frequency for HLA-A is lower than that for HLA-B because some individuals possess a non-Amerindian HLA-A type together with HLA-B and -C types considered to be Amerindian, e.g., A*03, B*35, Cw*04. Other specificities were found, and these probably represent a low level of admixture with non-Amerindian HLA types (HLA-A*03, *29, *74, *33, *01; HLA-B*1503, *49, *44, *07). The HLA-Cw*07 specificity was further resolved and found to encompass Cw*07011 on haplotypes with B*49 and one example of Cw*0702 with B*07. These HLA-Cw*07 alleles are therefore considered non-Amerindian and have been accounted for when calculating the HLA-C Amerindian gene frequency above. Other non-Amerindian HLA-C alleles identified are Cw*1203 and *1601. Although the level of resolution of PCR-SSOP was limited, it was clear that there was heterogeneity within specificities. To further resolve the HLA class I polymorphism within the Kolla, high resolution HLA typing
was performed initially by RSCA. RSCA identifies allelic variation based on the differential mobility of DNA fragments (with different nucleotide sequence) in a nondenaturing polyacrylamide gel [16]. Heterogeneity in RSCA mobility values for DNA samples belonging to the different specificity groups defined by SSOP was further characterized by direct sequencing of exons 2 and 3 of selected samples, allowing the assignment of allelic names to each variant detected. From this, those variants with the same duplex mobility as determined by RSCA were assumed to represent the same allele. In all cases where more than one sample of the same mobility was sequenced, the same allele was identified. Using this approach 12 HLA-A, 22 HLA-B, and 11 HLA-C locus alleles were identified (Tables 1, 2, and 3). For HLA-A, heterogeneity was seen for HLA-A*02 and *68, with three and two alleles identified, respectively (Table 1). The most frequent HLA-A*02 allele, A*02011 (allele frequency 0.493), is found at high frequencies within Caucasoid individuals but has also been found in other Amerindian groups, whereas A*0211 (one individual) and A*0222 (two individuals) are restricted
TABLE 3 HLA-C frequencies within the Kolla population

Broad HLA-C specificity and frequency    HLA-C allele specificity and frequency
Specificity  Frequency  n(a)             Allele        Frequency  n
Cw*04        0.312      43               Cw*04011      0.072      10
                                         Cw*0404       0.007      1
                                         Cw*04(b)      0.232      32
Cw*01        0.210      29               Cw*0102       0.029      4
                                         Cw*01(b)      0.181      25
Cw*07        0.181      25               Cw*07011(c)   0.065      10
                                         Cw*0702       0.116      15
Cw*03        0.123      17               Cw*03041      0.029      4
                                         Cw*03(b)      0.094      13
Cw*15        0.109      15               Cw*15021      0.036      5
                                         Cw*15(b)      0.072      10
Cw*08        0.029      4                Cw*0803       0.007      1
                                         Cw*0809       0.021      3
Cw*16        0.029      4                Cw*1601       0.014      2
                                         Cw*16(b)      0.014      2
Cw*1203      0.007      1                Cw*1203       0.007      1

(a) Number of individuals studied (HLA-C typing was not performed for one individual). (b) Allelic typing not done. (c) HLA-Cw*07011, *07012, and *0706 were not distinguished as their differences lie outside of exons 2 and 3, which were targeted by the methods used.
to Amerindian populations and admixed populations derived from them. HLA-A*68012 (allele frequency 0.088) has been found in other South American Amerindian groups [3], and the second A*68 allele identified in the Kolla, A*6817 (allele frequency 0.051) is a novel HLA-A allele that has not yet been described in any other population [17]. Within the presumed non-Amerindian specificities, heterogeneity was identified within the A*03 group, both A*03011 and A*0302 alleles were identified. Further variation was also identified within intron 2 of A*03011 with both known intron 2 variations found within the Kolla Amerindians (unpublished data). The A*03011 allele was identified on haplotypes containing supposedly Amerindian HLA-B and -C alleles, whereas the A*03011 intron 2 variant allele was always detected on haplotypes with B*4901 and Cw*07011. This finding suggests that this variant allele was a more recent introduction to the Kolla population and originated from one non-Amerindian individual or family. RSCA analysis of HLA-B identified heterogeneity for all Amerindian specificities except HLA-B*48 (Table 2). No RSCA variation was detected in the specificities of presumed non-Amerindian origin. The most heterogenous groups within the HLA-B locus are the B*15 and B*39 groups, each with five different alleles identified. Within the B*15 group, the alleles (with respective allele frequencies), B*1501 (0.114) and B*1504 (0.093)
predominate. B*1501 is found in various ethnic populations including Caucasoids, Orientals, and Amerindians, whereas B*1504 appears restricted to Amerindian populations [3, 18]. The other B*15 alleles identified, B*1522 [19, 20] and B*1539 [21], have also been described within Amerindian populations, and each is found in a single Kolla individual. The finding of B*1503 within one Kolla individual is unusual, as B*1503 is generally found within populations of African origin, and its presence here may be the result of recent admixture. The same individual also possesses the A*7401/02 allele (predominantly found in African populations) identified in the HLA-A locus analysis. All of the B*39 alleles identified are typical of other Amerindian tribes. The most frequent B*39 allele found was B*3903 [18] (allele frequency 0.050), followed by B*3912 [22] in three individuals; B*39011/013 in two individuals; and B*3909 [23] and B*39061 [19], each in a single individual. One B*39 allele was not subtyped due to lack of DNA to resolve a discrepancy between RSCA and SSOP results, but it may represent an additional novel allele. Following B*15 and B*39, the next most heterogeneous groups of alleles are the B*35 and B*40 groups. The B*35 group contains B*3505 (allele frequency 0.200), B*3501 (allele frequency 0.071), and B*3506 (allele frequency 0.036). Both B*3505 and B*3506 are typical Amerindian alleles [3], whereas B*3501 is found in various ethnic groups. The B*40 group contains B*4004, B*4008, and B*4002, all of which have been described in South or Central American populations, including Amerindians [3, 24]. The B*51 group of alleles was shown to consist of the B*51011 allele, which has been detected in many different ethnic groups, and a previously unknown B*51 allele, B*51131, which was detected within three Kolla individuals [25].
HLA-B*51131 differs from B*51011 by two nucleotide substitutions, one synonymous and one nonsynonymous; the latter results in a single amino acid difference at residue 116 in the peptide binding site of the HLA-B51 molecule. Heterogeneity was observed within the Cw*04, *07, and *08 groups, with Cw*04011, *0404, *07011, *0702, *0803, and a novel allele, Cw*0809, identified. Both Cw*04011 and Cw*0404 have been described in Amerindian populations. In the Kolla population all Cw*04 alleles are found in association with B*35 alleles; Cw*0404 was found in association with B*3501 in a single individual, and all other B*3501 haplotypes studied were found with Cw*04011. A novel HLA-Cw*08 allele, Cw*0809, was identified in three of the four Kolla individuals possessing the Cw*08 specificity; the second Cw*08 allele was identified as Cw*0803. Cw*0809 was identified initially by sequence-based typing, and the sequence obtained was confirmed by full-length sequencing of cloned genomic Cw*0809 (data not shown). All Cw*08 alleles were found in association with B*4801. Cw*0809 differs from Cw*0803 by five nucleotides and from Cw*0801 by four nucleotide substitutions at positions 29, 412, 419, and 598. These four substitutions result in four amino acid differences: at residue -15 within the leader peptide, residues 114 and 116 on the base of the peptide binding site, and residue 176 at the end of the α2 helix. These differences are summarized in Table 4.

TABLE 4 Nucleotide and amino acid differences between Cw*0801, *0803, and *0809

           Nuc(a) 29  AA(b) -15  Nuc 412  AA 114  Nuc 419  AA 116  Nuc 595  AA 99  Nuc 598  AA 122
Cw*0801    T          Ile        A        Asn     T        Phe     G        Gly    A        Lys
Cw*0803    —          —          —        —       —        —       A        Arg    —        —
Cw*0809    C          Thr        G        Asp     C        Ser     —        —      G        Glu

(a) Nuc = nucleotide; (b) AA = amino acid; — indicates identity with Cw*0801.

Function of Polymorphism Identified
All alleles identified differ from other related alleles (within each specificity group defined by the first two digits of the allele name) by at least one amino acid residue located within the peptide binding site of the encoded HLA class I molecule. This suggests that the different allotypes will have distinct (albeit subtle) peptide-binding properties. Most of the polymorphic residues listed have been shown to be involved in peptide contact with reference to the A*0201 crystal structure. The three-dimensional structure of HLA class I molecules has identified six "pockets" within the peptide binding region named A, B, C, D, E, and F, and the residues that potentially contact these pockets are indicated in Tables 5, 6, and 7. Pockets B and F have been defined as anchor pockets that bind to position 2 and the carboxy-terminus position of the bound peptide, respectively. At the HLA-A locus, there is limited diversity within the specificities identified (Table 5). The three HLA-A*02 alleles differ within pockets C, D, and E, none of which associates with anchor residues of the bound peptide. However, the substitution of threonine and histidine in A*0201 at residues 73 and 74 for isoleucine and aspartic acid in A*0211 results in a significant change from a positive charge to negative. The change from an aliphatic leucine in A*0201 to aromatic tryptophan in A*0222 is also significant. HLA-A*68012 and A*6817 differ by a single amino acid residue within pocket F. This change from a negative aspartic acid in A*68012 to valine in A*6817 is likely to define the carboxy-terminus residue of the bound peptides. A*6801 binds peptides that predominantly have a positive charge at their carboxy-terminus, such as arginine and lysine [26]. The valine found in A*6817 at residue 116 is not seen in other HLA class I allotypes, and this novel replacement is likely to alter the repertoire of peptides that bind the A*6817 allotype compared with A*6801. Greater diversity is found within the specificities encoded by the HLA-B locus (Table 6). The HLA-B*15 group is represented by the founder allele, B*1501 (Table 8), and the Amerindian alleles B*1504, B*1522, and B*1539. HLA-B*1504 differs from B*1501 by two amino acids at residues 95 and 97. The change of a small aliphatic leucine in B*1501 to a bulky tryptophan at residue 95 in B*1504, together with the change of a positively charged arginine to threonine at 97, will affect the structure of pockets C and E of the peptide binding site.
TABLE 5 HLA-A polymorphism at amino acids within peptide binding pockets

Pocket    A               | B                   | C                   | D           | E           | F
Residue   63  66  99  167 | 9   63  66  70  99  | 9   70  73  74  97  | 99  114 156 | 97  114 156 | 77  80  81  95  116
A*0201    E   K   Y   W   | F   E   K   H   Y   | F   H   T   H   R   | Y   H   L   | R   H   L   | D   T   L   V   Y
A*0211    —   —   —   —   | —   —   —   —   —   | —   —   I   D   —   | —   —   —   | —   —   —   | —   —   —   —   —
A*0222    —   —   —   —   | —   —   —   —   —   | —   —   —   —   —   | —   —   W   | —   —   W   | —   —   —   —   —
A*24021   —   —   F   G   | S   —   —   —   F   | S   —   —   D   M   | F   —   Q   | M   —   Q   | N   I   A   L   —
A*31012   —   N   —   —   | T   —   N   —   —   | T   —   I   D   M   | —   Q   —   | M   Q   —   | —   —   —   I   D
A*68012   N   N   —   —   | Y   N   N   Q   —   | Y   Q   —   D   M   | —   R   W   | M   R   W   | —   —   —   I   D
A*6817    N   N   —   —   | Y   N   N   Q   —   | Y   Q   —   D   M   | —   R   W   | M   R   W   | —   —   —   I   V
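The charge and side-chain changes discussed for the HLA-A substitutions can be tabulated with a small helper. This is an illustrative sketch only: the charge table is a simplification, histidine is treated as positively charged to follow the wording of the text (an assumption), and the function and variable names are hypothetical.

```python
# Formal side-chain charge near physiological pH. Treating histidine (H)
# as positive follows the text's description and is a simplifying assumption.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1, "H": 1}
AROMATIC = {"F", "W", "Y"}


def describe_substitution(reference, variant):
    """Summarise how one amino acid replacement changes charge and aromaticity.

    reference, variant: one-letter amino acid codes.
    """
    ref_charge = CHARGE.get(reference, 0)
    var_charge = CHARGE.get(variant, 0)
    return {
        "charge_before": ref_charge,
        "charge_after": var_charge,
        "charge_change": var_charge - ref_charge,
        "gains_aromatic": variant in AROMATIC and reference not in AROMATIC,
        "loses_aromatic": reference in AROMATIC and variant not in AROMATIC,
    }


# Substitutions taken from the text and Table 5
examples = {
    "A*0201 -> A*0211, residue 74 (His -> Asp)": ("H", "D"),
    "A*0201 -> A*0222, residue 156 (Leu -> Trp)": ("L", "W"),
    "A*68012 -> A*6817, residue 116 (Asp -> Val)": ("D", "V"),
}
for label, (ref, var) in examples.items():
    print(label, describe_substitution(ref, var))
```

A classification of this kind only flags candidate changes; whether a substitution alters the bound peptide repertoire still has to be established experimentally, as with the motif comparisons cited in the text.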
TABLE 6 HLA-B polymorphism at amino acids within peptide binding pockets

Pocket    A               | B                       | C           | D           | E               | F
Residue   63  99  163 171 | 9   24  45  63  67  99  | 9   74  97  | 99  114 156 | 97  114 152 156 | 77  80  81  95  116 143 147
B*1501    E   Y   L   Y   | Y   A   M   E   S   Y   | Y   Y   R   | Y   D   W   | R   D   E   W   | S   N   L   L   S   T   W
B*1504    —   —   —   —   | —   —   —   —   —   —   | —   —   T   | —   —   —   | T   —   —   —   | —   —   —   W   —   —   —
B*1522    N   —   —   —   | —   —   T   N   F   —   | —   —   —   | —   —   —   | —   —   —   —   | —   —   —   —   —   —   —
B*1539    —   —   —   —   | —   —   —   —   —   —   | —   —   —   | —   —   L   | —   —   —   L   | —   —   —   —   —   —   —
B*3501    N   —   —   —   | —   —   T   N   F   —   | —   —   —   | —   —   L   | —   —   V   L   | —   —   —   I   —   —   —
B*3505    N   —   —   —   | —   —   T   N   F   —   | —   —   S   | —   —   L   | S   —   V   L   | —   —   —   —   —   —   —
B*3506    N   —   —   —   | —   —   T   N   F   —   | —   —   —   | —   N   L   | —   N   V   L   | —   —   —   I   F   —   —
B*3901    N   —   T   —   | —   S   E   N   C   —   | —   D   —   | —   N   L   | —   N   V   L   | —   —   —   —   F   —   —
B*3903    N   —   T   —   | —   S   E   N   C   —   | —   D   S   | —   N   L   | S   N   V   L   | —   —   —   —   F   —   —
B*3906    N   —   T   —   | —   S   E   N   C   —   | —   D   T   | —   N   L   | T   N   V   L   | —   —   —   W   F   —   —
B*3909    N   S   T   —   | —   S   E   N   C   S   | —   D   —   | S   N   L   | —   N   V   L   | —   —   —   —   F   —   —
B*3912    N   —   T   —   | D   S   E   N   C   —   | D   D   —   | —   N   L   | —   N   V   L   | —   —   —   —   F   —   —
B*4002    —   —   E   —   | H   T   K   E   —   —   | H   —   S   | —   N   L   | S   N   V   L   | —   —   —   —   Y   —   —
B*4004    —   —   E   —   | H   T   K   E   —   —   | H   —   —   | —   N   L   | —   N   V   L   | —   —   —   I   Y   —   —
B*4008    N   —   E   —   | H   T   K   N   F   —   | H   —   S   | —   N   L   | S   N   V   L   | —   —   —   —   Y   —   —
B*4801    —   —   E   —   | —   S   E   —   —   —   | —   —   S   | —   N   L   | S   N   V   L   | —   —   —   —   Y   S   L
B*5101    N   —   —   H   | —   —   T   N   F   —   | —   —   T   | —   N   L   | T   N   —   L   | N   I   A   W   Y   —   —
B*5113    N   —   —   H   | —   —   T   N   F   —   | —   —   T   | —   N   L   | T   N   —   L   | N   I   A   W   F   —   —

The α1 domain of HLA-B*1522 is identical to that of B*3501 (whereas the remainder of the heavy chain is identical to B*1501). Therefore the B anchor pocket of the peptide binding site of B*1522 is likely to be very different from that of B*1501. The B*1501 allotype has a preference for glutamine and leucine residues at position 2 of the bound peptide, whereas B*3501 and presumably B*1522 bind peptides with a proline at this position. B*1539 differs from B*1501 by a single amino acid at residue 156, where the change from a bulky tryptophan to leucine is likely to influence the peptide binding pockets D and E. The HLA-B*35 and B*40 allele groups are each represented by a founder allele and two Amerindian alleles. All three HLA-B*35 alleles possess a conserved B pocket. In B*3506, the carboxy-terminal anchor, pocket
F is altered by a substitution of a serine (B*3501 and B*3505) for a phenylalanine, which may limit the types of peptides that can bind. Further differences exist within nonanchor pockets (C, D, and E), which could influence the acceptable repertoire of peptides that can bind. The peptide binding motifs of HLA-B*3501 and *3505 have been defined and compared [27]. This analysis demonstrated the effect that differences within nonanchor pockets have on peptide binding, with the B*3505 allotype exhibiting a preference for positively charged residues at positions 6 and 7 of the bound peptide. The B*3501 allotype did not show such a preference for positively charged amino acids at positions 6 and 7. This is likely due to the presence of positively charged arginine at residue 97 of the B*3501 encoded heavy chain, which is located in the region
TABLE 7 HLA-C polymorphism at amino acids within peptide binding pockets

Pocket    A           | B               | C           | D           | E               | F
Residue   66  99  163 | 9   24  66  99  | 9   73  97  | 99  113 114 | 97  114 152 156 | 77  80  95  116 147
Cw*0102   K   C   T   | F   S   K   C   | F   T   W   | C   Y   D   | W   D   E   R   | S   N   L   Y   W
Cw*0304   —   Y   L   | Y   A   —   Y   | Y   —   R   | Y   —   —   | R   —   —   L   | —   —   I   —   —
Cw*0401   —   F   —   | S   A   —   F   | S   A   R   | F   —   N   | R   N   —   —   | N   K   —   F   —
Cw*0404   —   F   —   | S   —   —   F   | S   A   R   | F   —   N   | R   N   —   L   | N   K   —   F   —
Cw*0702   —   S   —   | D   A   —   S   | D   A   R   | S   —   —   | R   —   A   L   | —   —   —   S   L
Cw*0803   —   Y   —   | Y   A   —   Y   | Y   —   R   | Y   —   N   | R   N   T   L   | —   —   —   F   —
Cw*0809   —   Y   —   | Y   A   —   Y   | Y   —   R   | Y   —   N   | R   N   T   L   | —   —   —   F   —
Cw*1502   N   Y   —   | Y   A   N   Y   | Y   —   R   | Y   H   —   | R   —   —   L   | N   K   I   L   —
TABLE 8 Potential founder and donor alleles for Kolla Amerindian alleles

Amerindian allele  Recipient (founder) allele  Donor allele(a)                                             Minimum exchange(b)  Maximum exchange(b)
A*0211             A*02011                     A*31012                                                     290–292              271–354
A*0222             A*02011                     A*68012                                                     539                  419–805
A*6817             A*68012                     Point mutation or B*3506, B*39011,(c) or various HLA-C      419                  various
B*1504             B*1501                      B*51011/131                                                 354–369              320–411
B*1522             B*1501                      B*3501                                                      1–272                1–352
B*1539             B*1501                      B*3501, B*39011, B*4002, B*51011                            538–539              various
B*3505             B*3501                      B*4002, *4801                                               363                  273–378
B*3506             B*3501                      B*39011                                                     412–419              388–543
B*3903             B*39011/13                  B*4002, *4801                                               363                  293–362
B*39061            B*39011/13                  B*51011/131, B*1504                                         354–362              various
B*3909             B*39011/13                  Cw*0702/03/13                                               368–369              342–378
B*3912             B*39011/13                  Cw*0602/04, Cw*07,(d) Cw*18                                 97–126               74–174
B*4004             B*4002                      B*3501                                                      353–379              273–386
B*4008             B*4002                      B*1522, B*39011,(c) B*51011/131                             259–272              various
B*51131            B*51011                     B*39011(c)                                                  419–435              370–526
Cw*0404            Cw*0401                     Point mutation or Cw*03041, Cw*0702, Cw*0803, Cw*15021      539                  various
Cw*0803            Cw*0801                     Point mutation(e)                                           595                  none
Cw*0809            Cw*0801                     Point mutation                                              29                   none
                                               Point mutation                                              598                  none
                                               Cw*0702, B*1501/04/22/39, B*3501/05                         412–419

(a) Donor alleles considered from other Kolla alleles. (b) Nucleotide number from exon 1-7/8; intron sequences were not included in the numbering. (c) Other Kolla B*39 alleles. (d) Not Cw*0703. (e) Also seen in Cw*0806.
where peptide positions 6 and 7 interact with the class I heavy chain. HLA-B*4004 differs from B*4002 by four amino acids at residues 94, 95, 97, and 103, of which residues 95 and 97 are thought to interact directly with bound peptide via the defined pockets C and E. B*4008 differs by two amino acids from the founder B*4002 allele at residues 63 and 67. These substitutions of a glutamate and serine for asparagine and phenylalanine, respectively, involve significant changes in both size and charge and are likely to evoke a distinct peptide binding profile for B*4008 through the effect on anchor pocket B. As described above, a novel HLA-B*51 allele, B*5113, was identified in the Kolla; it has a single coding change from HLA-B*51011. The effect of this substitution from tyrosine to phenylalanine at position 116 is unknown, but it does result in the F pocket being less polar. Similar to the other allele groups, the HLA-B*39 alleles exhibit polymorphism, resulting in probable alterations in peptide-binding properties. The founder allotype B*3901 differs from B*3903 and B*3909 by a single amino acid residue at positions 97 and 99, respectively. The change from an arginine in B*3901 to a serine in B*3903 will impact on pockets C and E within the peptide binding site, whereas the change from a tyrosine in B*3901 to a serine in B*3909 will impact on pockets A, B, and D. B*3906 and B*3912 differ from B*3901 by two amino acids, at positions 95 and 97 for B*3906 and residues 9 and 11 for B*3912. The change from leucine 95 and arginine 97 in B*3901 to tryptophan and threonine in B*3906 is likely to affect peptide binding pockets C, E, and F. The change from an aromatic tyrosine at residue 9 in B*3901 to acidic aspartate in B*3912 will affect the properties of pocket B, whereas the substitution at residue 11 (serine to alanine) will have less of an impact on peptide binding. HLA-Cw*04011 and *0404 allotypes differ by a single amino acid at residue 156, which interacts with pocket E of the peptide-binding site (Table 7). The change from a positively charged arginine in Cw*0401 to leucine in Cw*0404 is likely to alter the characteristics of bound peptides without affecting the anchor residues. Peptide position 7 interacts with pocket E, and a T-cell epitope has been defined as possessing negatively charged glutamate at this position [28]. Therefore it is likely that the loss of the positively charged amino acid at residue 156 may prevent the binding of peptides with
negative amino acids at position 7. Thus Cw*0401 and *0404 allotypes may present peptides with subtle differences to T-cell receptors. Cw*0803 and Cw*0809 differ by two amino acids predicted to affect peptide-binding properties. The first difference, from asparagine in Cw*0803 to negatively charged aspartic acid in Cw*0809 at residue 114, is likely to influence pocket D and pocket F, which influence the binding of peptide positions 3 and 7, respectively. The second difference is a change from aromatic, bulky phenylalanine in Cw*0803 to serine in Cw*0809 at residue 116. This substitution will enlarge the available space within pocket F of the Cw*0809 allotype and may have a significant effect on the carboxyl-terminus of bound peptides.

Origin of Kolla Amerindian Alleles
The polymorphism exhibited within the Kolla Amerindians is probably the result of point mutation and recombination events between alleles, predominantly allele conversion. Within the HLA-A, -B, and -C groups of alleles identified, putative founder alleles are found (Table 8). Previous studies of genetically isolated Amerindian tribes have found that founder alleles appear to be replaced by newly arisen variants, such that the frequency of Amerindian (new) alleles is greater than that of founder (old) alleles. Within the Kolla Amerindians, founder alleles predominate within some groups (A*0201, A*68012, B*1501, B*51011, Cw*04011), whereas in other groups the founder alleles do not predominate (B*3501, B*39011/013, B*4002) or are not found, as with Cw*0801 (Table 8). The novel valine found at residue 116 of A*6817 may be the result of point mutation within A*68012, but may also have arisen through interlocus gene conversion with several potential HLA-B and -C donor alleles. Gene conversion is also likely to have played a part in the generation of HLA-B*3909 and B*3912. It has previously been described that HLA-B*3909 is probably the product of recombination between HLA-B*39011 and a Cw*07 allele. This also appears to be the case with HLA-B*3912, as the three nucleotide differences from B*39011 have only been identified in HLA-Cw*0602, Cw*07, and Cw*18, of which only Cw*0702 is generally found in Amerindians.

Haplotypes
Potential Amerindian HLA-B-C haplotypes were discerned by direct counting and comparison with known haplotypes (Table 9). The most frequent haplotype is B*3505-Cw*04011, observed in 28 of 136 haplotypes studied. The HLA-B-C associations were easy to discern.
TABLE 9 HLA-B-C presumed haplotypes in the Kolla Amerindians

Haplotype                 n     Frequency (n = 68)a
B*3505-Cw*04(011)         28    0.206
B*3501-Cw*04(011)         10    0.074
B*3506-Cw*04(011)         5     0.037
B*3501-Cw*0404            1     0.007
B*15011-Cw*01(02)         16    0.118
B*1504-Cw*01(02)          10    0.074
B*1504-Cw*03041           1     0.007
B*15-Cw*03041             1     0.007
B*1522-Cw*01(02)          1     0.007
B*1539-Cw*03(041)         1     0.007
B*51011-Cw*15(021)        12    0.088
B*5101-Cw*03041           3     0.022
B*5113-Cw*15(021)         2     0.015
B*3903-Cw*0702            7     0.051
B*3912-Cw*0702            3     0.022
B*39011/013-Cw*0702       2     0.015
B*39061-Cw*0702           1     0.007
B*3909-Cw*0702            1     0.007
B*39v-Cw*0702             1     0.007
B*4004-Cw*03(041)         5     0.037
B*4008-Cw*03(041)         2     0.015
B*4002-Cw*03(041)         2     0.015
B*4801-Cw*0809            3     0.022
B*4801-Cw*0803            1     0.007
B*4801-Cw*03              1     0.007
B*4403-Cw*1601            5     0.037
B*1503-Cw*1203            1     0.007
B*4901-Cw*0701            9     0.066
B*0702/04-Cw*0702         1     0.007
a No. of individuals included in analysis. From the total of 70 individuals, no HLA-C data was available for one individual, and for a second individual the haplotypes could not be discerned (HLA-B*51011, *5113, Cw*03041, *15021).
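The frequencies in Table 9 are consistent with each haplotype count being divided by the total number of haplotypes (2 x 68 = 136). The following is a minimal sketch of that arithmetic in Python, with the counts transcribed from Table 9; the names `TABLE9_COUNTS` and `haplotype_frequencies` are illustrative and do not come from the study.

```python
# Haplotype counts transcribed from Table 9 (labels kept as printed there).
TABLE9_COUNTS = {
    "B*3505-Cw*04(011)": 28, "B*3501-Cw*04(011)": 10, "B*3506-Cw*04(011)": 5,
    "B*3501-Cw*0404": 1, "B*15011-Cw*01(02)": 16, "B*1504-Cw*01(02)": 10,
    "B*1504-Cw*03041": 1, "B*15-Cw*03041": 1, "B*1522-Cw*01(02)": 1,
    "B*1539-Cw*03(041)": 1, "B*51011-Cw*15(021)": 12, "B*5101-Cw*03041": 3,
    "B*5113-Cw*15(021)": 2, "B*3903-Cw*0702": 7, "B*3912-Cw*0702": 3,
    "B*39011/013-Cw*0702": 2, "B*39061-Cw*0702": 1, "B*3909-Cw*0702": 1,
    "B*39v-Cw*0702": 1, "B*4004-Cw*03(041)": 5, "B*4008-Cw*03(041)": 2,
    "B*4002-Cw*03(041)": 2, "B*4801-Cw*0809": 3, "B*4801-Cw*0803": 1,
    "B*4801-Cw*03": 1, "B*4403-Cw*1601": 5, "B*1503-Cw*1203": 1,
    "B*4901-Cw*0701": 9, "B*0702/04-Cw*0702": 1,
}


def haplotype_frequencies(counts):
    """Divide each haplotype count by the total number of haplotypes counted."""
    total = sum(counts.values())
    return {haplotype: n / total for haplotype, n in counts.items()}


total_haplotypes = sum(TABLE9_COUNTS.values())
frequencies = haplotype_frequencies(TABLE9_COUNTS)

# Print the table sorted from most to least frequent haplotype.
for haplotype, freq in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{haplotype:<22} {TABLE9_COUNTS[haplotype]:>3} {freq:.3f}")
```

Rounded to three decimal places, these values can be compared directly with the frequency column of Table 9.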
All three B*35 alleles were found associated with Cw*04011, with a single B*3501 associated with Cw*0404. All six B*39 alleles were found associated with Cw*0702 and all three B*40 alleles were found on haplotypes with Cw*03041. For the B*15 alleles, B*1501, B*1504, and B*1522 were all found associated with Cw*01, with the exception of a single B*1504 allele and an undefined B*15 allele that were associated with Cw*03. B*51011 and B*5113 were most commonly found on haplotypes with Cw*15021. Where multiple haplotypes were found for alleles presumed to have been introduced to the Amerindian population within the last 500 years, all haplotypes were identical. This extended to HLA-A for the B*4403-Cw*1601 haplotype, as all four individuals possessed A*29. For the B*49-Cw*0701 haplotype, three individuals possessed A*0201, with the remaining six possessing the A*03011 intron 1 variant allele.
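The B*35, B*39, and B*40 associations described above can be checked by tallying Table 9 counts per HLA-B group and HLA-C group. The sketch below does this for the relevant subset of rows; the helper names (`split_haplotype`, `partners_by_b_group`) and the reduction of each allele to its two-digit group are assumptions made for illustration, not part of the study's methods.

```python
from collections import defaultdict

# Subset of Table 9: the B*35, B*39 and B*40 haplotypes with their counts.
ROWS = {
    "B*3505-Cw*04(011)": 28, "B*3501-Cw*04(011)": 10, "B*3506-Cw*04(011)": 5,
    "B*3501-Cw*0404": 1,
    "B*3903-Cw*0702": 7, "B*3912-Cw*0702": 3, "B*39011/013-Cw*0702": 2,
    "B*39061-Cw*0702": 1, "B*3909-Cw*0702": 1, "B*39v-Cw*0702": 1,
    "B*4004-Cw*03(041)": 5, "B*4008-Cw*03(041)": 2, "B*4002-Cw*03(041)": 2,
}


def split_haplotype(label):
    """Return the two-digit B group and Cw group of a 'B*...-Cw*...' label."""
    b_allele, c_digits = label.split("-Cw*")
    return b_allele[:4], "Cw*" + c_digits[:2]


def partners_by_b_group(rows):
    """Sum haplotype counts for every (B group, Cw group) combination."""
    tally = defaultdict(lambda: defaultdict(int))
    for label, n in rows.items():
        b_group, c_group = split_haplotype(label)
        tally[b_group][c_group] += n
    return {b: dict(c) for b, c in tally.items()}


associations = partners_by_b_group(ROWS)
for b_group, partners in associations.items():
    print(b_group, partners)
```

In this subset each B group pairs with a single Cw group, which is the pattern reported in the text at the two-digit level.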
DISCUSSION
Studies of Amerindian HLA polymorphism using high-resolution techniques have been illuminating regarding the diversity of HLA found within these indigenous populations. The general conclusion from the various studies that have been performed, including the present study, is that each group of Amerindians possesses a limited number of HLA-A and -B alleles as defined by the first two digits of the allele name. These two digits tend to correspond to the serologic reactivity of the encoded HLA molecule. However, within each two-digit specificity defined, further diversity has been identified, with HLA-B being more diverse than HLA-A, and HLA-C being the least diverse. Within the groups of Amerindians so far studied, numerous novel HLA-A, -B, and -C alleles have been identified, most of which have not been seen outside of Amerindian populations (or populations admixed with Amerindians). This finding supports the theory that the diversity seen at HLA loci within the Amerindians has arisen since the peopling of the American continent. As not all of the Amerindian groups studied share identical alleles, this would suggest that diversification of HLA loci occurred after separation of the founding population into localized populations or tribes. This finding supports the hypothesis that the diversity found is the result of selection on alleles that are of benefit to the population such that an immune response against local pathogens can be mounted. Indeed, all of the Amerindian alleles found within the Kolla individuals studied differ from the closest related allele by nonsynonymous nucleotide substitutions that encode changes in the amino acids located in and around the peptide binding site. With the exception of the novel alleles A*6817, B*5113, and Cw*0809, all Kolla alleles have been described in other South American populations, but have not necessarily been found in North American Amerindians.
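The grouping of alleles by the first two digits of the allele name, as described above, can be sketched in Python as follows. The regular expression and function names are illustrative assumptions; the allele names are taken from this study and follow the naming convention in use at the time of publication.

```python
import re
from collections import defaultdict

# Locus (A, B or Cw), then '*', then the two-digit group and remaining digits.
ALLELE_RE = re.compile(r"^(?:HLA-)?(A|B|Cw)\*(\d{2})")


def allele_group(name):
    """Return (locus, two-digit group) for an allele name such as 'B*3505'."""
    match = ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised allele name: {name!r}")
    return match.group(1), match.group(2)


def group_alleles(names):
    """Collect allele names under their (locus, two-digit group) key."""
    groups = defaultdict(list)
    for name in names:
        groups[allele_group(name)].append(name)
    return dict(groups)


kolla_examples = [
    "A*0201", "A*0211", "A*0222", "A*68012", "A*6817",
    "B*3501", "B*3505", "B*3506", "B*5113", "Cw*0803", "Cw*0809",
]
groups = group_alleles(kolla_examples)
for (locus, digits), members in groups.items():
    print(f"{locus}*{digits}: {', '.join(members)}")
```

Here the three A*02 alleles fall into one two-digit group even though they differ at high resolution, which is the distinction the paragraph draws between serologic specificity and allelic diversity.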
The founder alleles have also been found in various other populations including North American Amerindians. One explanation for the difference between alleles found in North and South American Amerindians is that South American Amerindians were founded by a population who arrived on the South American continent after migration from South East Asia or Japan. To support this hypothesis, alleles common to both South American Amerindians and South East Asian populations have been identified. However, none of the Kolla Amerindian alleles identified in this study have been found in South East Asian populations, and as the founder alleles for the Kolla are shared with North American Amerindians, it seems more likely that the Kolla and North American Amerindians share common ancestors who migrated to South America via North America. The mechanisms that have generated the HLA diversity within the alleles found in the Kolla population include examples of all the known mechanisms thought to be involved in generating HLA polymorphism, i.e., point mutation, gene conversion, allele conversion, and single recombination. The polymorphism found within the Kolla HLA alleles is located at residues known to be polymorphic, with the exception of A*6817 and Cw*0809, which differ from A*68012 and Cw*0801, respectively, by substitutions at positions previously not found to be polymorphic. Thus, this study supports other studies of Amerindian populations, and indeed all world populations, in that diversity at the HLA-B locus exceeds that of HLA-A and HLA-C, such that selection for novel alleles at HLA-B is a more dynamic process than that at the other loci. The name Cw*0809 was officially assigned by the WHO Nomenclature Committee in September 2000. This follows the agreed policy that, subject to the conditions stated in the most recent Nomenclature Report [29], names will be assigned to new sequences as they are identified. Lists of such new names will be published in the following WHO Nomenclature Report.

ACKNOWLEDGMENTS
This study was supported by the Anthony Nolan Bone Marrow Trust. The authors thank the volunteers who took part in this study through their donation of blood samples.
REFERENCES
1. Marsh SGE, Parham P, Barber LD: The HLA FactsBook. London: Academic Press, 2000.
2. Little A-M, Parham P: Polymorphism and evolution of HLA genes and molecules. Rev Immunogen 1:105, 1999.
3. Belich MP, Madrigal JA, Hildebrand WH, Zemmour J, Williams RC, Luz R, Petzl-Erler ML, Parham P: Unusual HLA-B alleles in two tribes of Brazilian Indians. Nature 357:326, 1992.
4. Gibbons A: Geneticists trace the DNA trail of the first Americans [news]. Science 259:312, 1993.
5. Neel JV, Biggar RJ, Sukernik RI: Virologic and genetic studies relate Amerind origins to the indigenous people of the Mongolia/Manchuria/southeastern Siberia region. Proc Natl Acad Sci USA 91:10737, 1994.
6. Wallace DC, Torroni A: American Indian prehistory as written in the mitochondrial DNA: a review. Hum Biol 64:403, 1992.
7. Cavalli-Sforza LL, Menozzi P, Piazza A: Demic expansions and human evolution [published erratum appears in Science 261:1508]. Science 259:639, 1993.
8. Parham P, Arnett K, Adams E, Little A, Tees K, Barber L, Marsh S, Ohta T, Markow T, Petzl-Erler M: Episodic evolution and turnover of HLA-B in the indigenous human populations of the Americas. Tissue Antigens 50:219, 1997.
9. Oh SH, Fleischhauer K, Yang SY: Isoelectric focusing subtypes of HLA-A can be defined by oligonucleotide typing. Tissue Antigens 41:135, 1993.
10. Kennedy LJ, Poulton KV, Dyer PA, Ollier WE, Thomson W: Definition of HLA-C alleles using sequence-specific oligonucleotide probes (PCR-SSOP). Tissue Antigens 46:187, 1995.
11. Middleton D, Williams F, Cullen C, Mallon E: Modification of an HLA-B PCR-SSOP typing system leading to improved allele determination. Tissue Antigens 45:232, 1995.
12. Rufer N, Breur-Vriesendorp B, Tiercy J, Gauchat-Feiss D, Shi X, Slavcev A, Lardy N, Chapuis B, Gratwohl A, Speiser D, et al.: HLA-B35-subtype mismatches in ABDR serologically matched unrelated donor-recipient pairs. Hum Immunol 41:96, 1994.
13. Gauchat-Feiss D, Rufer N, Speiser D, Jeannet M, Roosnek E, Tiercy JM: Heterogeneity of HLA-B35. Oligotyping and direct sequencing for B35 subtypes reveals a high mismatching rate in B35 serologically compatible kidney and bone marrow donor/recipient pairs. Transplantation 60:869, 1995.
14. Argüello J, Little A-M, Bohan E, Goldman J, Marsh S, Madrigal J: High resolution HLA class I typing by reference strand mediated conformation analysis (RSCA). Tissue Antigens 52:57, 1998.
15. Cox ST, Arguello JR, Marsh SG, Lau M, Kwan PL, Madrigal JA, Little AM: Sequence of HLA-A*6808. Tissue Antigens 53:597, 1999.
16. Argüello R, Little A-M, Pay A, Gallardo D, Rojas I, Marsh S, Goldman J, Madrigal J: Mutation-detection and typing of polymorphic loci through double-strand conformation analysis. Nature Genetics 18:192, 1998.
17. Ramon D, Scott I, Cox S, Pesoa S, Vullo C, Little A-M, Madrigal J: HLA-A*6817, identified in the Kolla Amerindians of North-West Argentina possesses a novel nucleotide substitution. Tissue Antigens 55:453, 2000.
18. Watkins DI, McAdam SN, Liu X, Strang CR, Milford EL, Levine CG, Garber TL, Dogon AL, Lord CI, Ghim SH, et al.: New recombinant HLA-B alleles in a tribe of South American Amerindians indicate rapid evolution of MHC class I loci. Nature 357:329, 1992.
19. Garber TL, Butler LM, Trachtenberg EA, Erlich HA, Rickards O, De Stefano G, Watkins DI: HLA-B alleles of the Cayapa of Ecuador: new B39 and B15 alleles [published erratum appears in Immunogenetics 1995;42:308]. Immunogenetics 42:19, 1995.
20. Martinez-Laso J, De Juan D, Martinez-Quiles N, Gomez-Casado E, Cuadrado E, Arnaiz-Villena A: The contribution of the HLA-A, -B, -C and -DR, -DQ DNA typing to the study of the origins of Spaniards and Basques. Tissue Antigens 45:237, 1995.
21. Olivo-Diaz A, Gomez-Casado E, Gorodezky C, Martinez-Laso J, Longas J, Gonzalez-Hevilla M, Alvarez M, Arnaiz-Villena A: A new HLA-B15 allele (B*1541) found in a Mexican of Nahua (Aztec) descent. Immunogenetics 48:148, 1998.
22. Marcos CY, Fernandez-Vina MA, Lazaro AM, Moraes ME, Moraes JR, Stastny P: Novel HLA-A and HLA-B alleles in South American Indians. Tissue Antigens 53:476, 1999.
23. Ramos M, Postigo JM, Vilches C, Layrisse Z, Lopez de Castro JA: Primary structure of a novel HLA-B39 allele (B*3909) from the Warao Indians of Venezuela. Further evidence for local HLA-B diversification in South America. Tissue Antigens 46:401, 1995.
24. Adams E, Little A, Arnett K, Leushner J, Parham P: Identification of a novel HLA-B40 allele (B*4008) in a patient with leukemia. Tissue Antigens 46:204, 1995.
25. Scott I, Dunn PP, Day S, Pesoa S, Little AM, Madrigal JA, Vullo C: A novel HLA allele, HLA-B*5113, identified in the Kolla Amerindians of North-West Argentina. Tissue Antigens 53:194, 1999.
26. Guo HC, Jardetzky TS, Garrett TP, Lane WS, Strominger JL, Wiley DC: Different length peptides bind to HLA-Aw68 similarly at their ends but bulge out in the middle. Nature 360:364, 1992.
27. Kenneally A, Liang B, Barber L: The peptide-binding motif of HLA-B*3505. Immunogenetics 51:866, 2000.
28. Wilson CC, Kalams SA, Wilkes BM, Ruhl DJ, Gao F, Hahn BH, Hanson IC, Luzuriaga K, Wolinsky S, Koup R, Buchbinder SP, Johnson RP, Walker BD: Overlapping epitopes in human immunodeficiency virus type 1 gp120 presented by HLA A, B, and C molecules: effects of viral variation on cytotoxic T-lymphocyte recognition. J Virol 71:1256, 1997.
29. Bodmer JG, Marsh SG, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, Hansen JA, Mach B, Mayr WR, Parham P, Petersdorf EW, Sasazuki T, Schreuder GM, Strominger JL, Svejgaard A, Terasaki PI: Nomenclature for factors of the HLA system, 1998. Tissue Antigens 53:407, 1999.