International Congress Series 1296 (2006) 106 – 114
www.ics-elsevier.com
Investigating the health of our ancestors: Insights from the evolutionary genetic consequences of prehistoric diseases
Mark Stoneking *
Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
Abstract. Investigations into the health and diseases of prehistoric human populations are traditionally based on various approaches for analyzing skeletal remains. In this paper, I propose that a complementary approach, based on patterns of genetic variation in contemporary populations, can also provide insights into prehistoric health and disease. The idea is that selection for increased reproductive fitness, because of some selective force (such as resistance to a particular disease), will leave a signature on the gene(s) involved. I describe how modern genomic approaches can identify such genes, and two examples are given of genes that have clearly been influenced by selection. However, while modern genomics is making it relatively straightforward to identify genes that have been subject to selection, the challenge remains of identifying the responsible selective force. © 2006 Elsevier B.V. All rights reserved.
Keywords: Genome scan; Selection; Genetic distance; Lactase persistence; Prion protein
* Tel.: +49 341 3550 502; fax: +49 341 3550 555. E-mail address:
[email protected]. 0531-5131/© 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.ics.2006.03.035
M. Stoneking / International Congress Series 1296 (2006) 106–114
1. Introduction
Traditionally, there are a number of ways to investigate the health and disease status of prehistoric populations, all of which are based on skeletal remains. Skeletons can be examined directly for features that indicate the presence of disease or injury [1,2]. Even if such indications are absent, estimates of morbidity, mortality, and longevity can be obtained from the age distribution of a sample of skeletal remains, thereby providing indirect evidence concerning the health of a prehistoric population [3]. More recently, direct analysis of ancient DNA obtained from skeletal remains has been used to detect the DNA of disease-causing organisms [4,5], or to look for genotypes associated with a particular disease, such as hemoglobinopathies [6].
While all of these methods have proven to be useful, they all have their drawbacks. Not all diseases or injuries are manifested in the skeleton, and individuals may harbor a disease for some time before it leaves symptoms in the skeleton; the sample of skeletons that manifest a particular disease therefore typically underestimates the true incidence of the disease, potentially by a considerable amount. Accurate estimates of the morbidity, mortality, and longevity of a prehistoric population require accurate methods for estimating the age of a skeleton, and moreover assume that the sample of remains is a random sample of the people in the population; both of these are problematic [7]. Ancient DNA is plagued with contamination issues [8,9], and moreover, while obtaining DNA from a pathogen may indicate the presence of the pathogen in the remains, it is not clear how to interpret negative results: when is the absence of evidence sufficient to indicate evidence of absence? Given these problems, it would seem useful to seek additional sources of insight into prehistoric health and disease.
One such approach, which I discuss here, is based on patterns of genetic variation in contemporary populations. The idea is that in the past, when confronted with infectious diseases (or novel selective forces), our ancestors evolved genetic resistance to such diseases (or evolved in response to the novel selective force), and that patterns of variation at the gene(s) involved will then differ from the typical pattern of variation. In other words, all of us carry within our genes a record of the history of those genes, and by searching for those genes that have unusual patterns of variation, we can learn something about the diseases and other selective forces that influenced prehistoric human populations.
A classic example is sickle cell anemia, caused by an allele of the β-globin gene [10]. Homozygotes for the normal (A) allele are susceptible to malaria; homozygotes for the defective (S) allele suffer from high mortality due to anemia; but heterozygotes do not suffer from anemia and are resistant to malaria. This is a balanced polymorphism, in which the reproductive fitness of the heterozygote is higher than the fitness of either homozygote, and therefore the S allele is maintained at a much higher frequency in African populations living in a malarial environment than would be expected, given that homozygotes for the S allele suffer such high mortality. The unusually high frequency of an allele that has a high mortality rate when homozygous was an indication that some selective force (in this case, resistance to malaria) must be maintaining this allele at such a high frequency, and thus indicates that malaria has indeed been a powerful selective force on human populations. Finding unusual patterns
of variation for other genes therefore might provide insights into other important selective forces.
2. Finding unusual genes
How can we go about finding genes that exhibit unusual patterns of variation? Two approaches have been used: the candidate gene approach and the genome scan approach. In the candidate gene approach, genes that are thought to have a potential influence on a particular phenotype (for example, genes that might be related to resistance to a particular disease) are screened in appropriate populations, and departures from neutral expectations that might indicate selection are sought. To date, all cases of genes that have been subject to selection in the past have been found by this approach, and in the following sections I will describe two such examples and what they tell us about prehistoric health and disease. A drawback of the candidate gene approach is that if one is either not very lucky or not very clever in choosing the set of candidate genes for investigation, one could do a lot of work investigating patterns of variation at these genes and end up with nothing of significance.
There is therefore growing interest in the genome scan approach, which does not presuppose any information on phenotype or any pre-selection of candidate genes. Instead, the genome scan approach involves screening a large number of marker loci across the genome in one or more populations, determining the average pattern of variation for the set of markers, and then looking for outliers, i.e. those markers that deviate significantly from the average pattern [11]. The marker loci are selected to be randomly spaced across the genome; the idea is that when selection occurs for a particular gene, any marker near the gene will also increase in frequency (Fig. 1), a phenomenon known as “hitch-hiking”.
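The outlier step of a genome scan, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-locus statistic, the marker names, the numerical values, and the z-score cutoff of 3 are all hypothetical, and published genome scans typically judge outliers against empirical or simulated distributions rather than a simple z-score.

```python
from statistics import mean, stdev


def find_outlier_loci(stat_by_locus, z_cutoff=3.0):
    """Return loci whose statistic deviates strongly from the genome-wide average.

    stat_by_locus maps a marker name to some per-locus summary statistic
    (for example a genetic distance between two populations).
    The z-score rule and the cutoff are illustrative assumptions.
    """
    values = list(stat_by_locus.values())
    mu = mean(values)
    sigma = stdev(values)
    outliers = {}
    for locus, value in stat_by_locus.items():
        z = (value - mu) / sigma
        if abs(z) >= z_cutoff:
            outliers[locus] = z
    return outliers


# Hypothetical data: 29 markers with ordinary values and one extreme marker.
scan = {f"marker_{i:02d}": 0.10 + 0.01 * (i % 5) for i in range(1, 30)}
scan["marker_30"] = 0.60

outliers = find_outlier_loci(scan)
print(outliers)
```

A single extreme locus inflates the standard deviation it is measured against, so a more careful version might use a robust spread estimate or rank-based empirical percentiles.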
The degree to which a nearby marker will show a hitch-hiking effect depends on the strength of the selection and how long ago the selection started; over time, recombination and new mutations will decrease the strength of the signal of the hitch-hiking effect. Thus, genome scan approaches are best suited for detecting relatively strong selective events that happened relatively recently in human prehistory [12]. As an example of the genome scan approach, my group had approximately 350 marker loci typed in a sample of Europeans and a sample of sub-Saharan Africans [13]. Fig. 2
Fig. 1. Cartoon illustrating the effect of selection on linked variation. Bars represent haplotypes (chromosomes) and circles represent mutations, with the dark bar and circle indicating the selected mutation (and haplotype), and open circles and light bars indicating non-selected mutations/haplotypes. (A) Before selection on the mutation indicated by the solid circle. (B) Immediately after selection, this haplotype has increased in frequency, accompanied by a decrease in variation (increase in homozygosity) surrounding the selected mutation. (C) With the passage of time following selection, recombination and new mutations increase the variation around the selected mutation.
Fig. 2. Scatter plot of two measures of genetic differentiation between an African and a European population, for 332 loci (adapted from Kayser et al. [13]). LN-RV is the natural logarithm of the ratio in the allele size variance [37] for Africans vs. Europeans; large values of LN-RV indicate loci with much greater variance in Africans than in Europeans. Rst is a measure of genetic distance between populations [38]; large values of Rst indicate larger than average genetic differences between Africans and Europeans. There are several loci with high LN-RV and/or Rst values, and these indicate genomic regions that have potentially been subject to different selection in Europeans vs. Africans.
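As a rough illustration of the LN-RV measure named in the caption of Fig. 2, the following Python sketch takes the natural logarithm of the ratio of allele size variances in two population samples. The allele sizes are invented for the example, and the function is a simplified reading of the caption's definition, not a reimplementation of the published statistic [37].

```python
from math import log
from statistics import variance


def ln_rv(allele_sizes_pop1, allele_sizes_pop2):
    """Natural log of the ratio of allele size variances (pop1 / pop2).

    Inputs are allele sizes (e.g. repeat counts) observed at one locus.
    Raises ZeroDivisionError if the second sample has no variation.
    """
    return log(variance(allele_sizes_pop1) / variance(allele_sizes_pop2))


# Hypothetical allele sizes at one locus in two population samples.
african = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
european = [14.0, 15.0, 15.0, 15.0, 15.0, 16.0]

value = ln_rv(african, european)
print(round(value, 3))
```

A large positive value, as here, corresponds to a locus with much less allele size variance in the second sample than in the first, which is the pattern the caption associates with candidate regions of selection.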
shows a plot of two measures of genetic distance between the Africans and Europeans for these marker loci; several outliers showing unusually large genetic distances are apparent. We are currently investigating some of the genes near these outliers, and preliminary analyses indicate that selection has indeed probably influenced the pattern of variation for at least some of these (D. Hughes and M. Stoneking, unpublished data). Genome scans are potentially quite powerful methods, as the number of available marker loci has increased enormously (a recent study published data on 1.5 million marker loci in three populations [14]), and there are now several studies that have used genome scans to look for evidence of selection [13,15–17]. The major drawback of the genome scan approach is that while it is relatively straightforward (albeit computationally intensive) to identify genes that exhibit unusual patterns of variation, and hence are candidates for selection, it is not at all straightforward to then identify the nature of the selective force or disease that is responsible for the altered pattern of variation (as my group is learning as we investigate potential candidates identified from genome scans!).
3. Case study I: Lactase persistence
In most mammals, the ability to digest lactose (the major sugar present in milk) disappears soon after weaning. However, some humans retain the ability to digest lactose into adulthood, and the frequency of lactase persistence (LP) across populations correlates with dairying; the LP frequency is about 80% in European populations, but less than 10% in Native American or African groups that do not practice dairying [18]. Moreover, within Europe, there is a gradient in LP frequency from nearly 90% in northwestern European populations (which rely heavily on dairying) to around 20–30% in southeastern European populations (which rely less on dairying).
In addition, the frequency of LP is significantly higher in African populations that practice milk drinking than in African populations that do not drink milk [18]. Genetically, LP segregates as an autosomal dominant trait, and recently a mutation has been discovered in the regulatory region of the lactase gene (the gene that encodes the
enzyme lactase, which breaks down lactose) that co-segregates with LP in families [19]. The frequency of this mutation, which is known as the CT-14 kb polymorphism, is correlated with the frequency of LP in European populations, but not in African milk-drinking populations, which suggests that a different mutation is responsible for LP in Africa [18,20]. An investigation of haplotypes carrying the LP allele shows that there is a dramatic increase in the extent of homozygosity near the LP allele (cf. Fig. 1), which is a signal of very strong selection for this allele [21]. The observed long stretch of extended homozygosity around the LP allele thus implies that individuals with this allele enjoyed a much higher reproductive fitness than individuals without the allele.
But why should there have been such strong selection? The conventional scenario is that in an environment where milk was readily available, the nutritional advantages of being able to digest lactose into adulthood provide the selective advantage for the LP allele. However, given that people who cannot digest lactose do not currently suffer any obvious decrease in reproductive fitness, it is difficult to imagine that the nutritional advantage conferred by the LP allele could account for such a strong signal of selection. To be sure, perhaps in the past milk was a much more important nutritional source, and hence the LP allele would have conferred a significant fitness effect. Nevertheless, it is worth considering possible alternative benefits of LP. One suggestion is that milk may have been important not just as a nutritional source, but also as a source of drinking water, especially in arid zones [22]. However, the fact that the LP allele increases in frequency towards northwest Europe, which is less arid than southern Europe, argues against this hypothesis. Another suggestion is that drinking milk may have been an important source of dietary calcium for preventing rickets and/or osteoporosis [23].
But again, it is difficult to imagine that this would result in such strong selection for the LP allele. Another speculation, which has arisen out of discussion with members of my group, is that perhaps the benefit of LP was that it allowed earlier weaning of infants, and hence a faster resumption of post-natal fertility in women. Women who wean their infants earlier thus would end up with more children during their reproductive life span, which in turn would result in an increase in the frequency of the LP allele. Whether or not this effect would be strong enough to explain the apparent selection for LP could be investigated by modeling or simulations. In any event, this discussion illustrates that it is much easier to identify the signal of selection than it is to identify why selection has happened.
4. Case study II: Prion protein gene
The prion protein is a mysterious protein found in the brain whose normal function remains unknown. What is known is that occasionally, by an unknown post-translational mechanism, the prion protein adopts an alternative conformation that leads to the formation of spontaneous aggregates in the brain, which in turn leads to neurodegenerative “prion” diseases such as Creutzfeldt–Jakob disease, kuru, bovine spongiform encephalopathy (BSE), and scrapie [24,25]. Homozygosity for a particular amino acid polymorphism, a methionine–valine substitution at codon 129, is associated with increased susceptibility to, and an earlier age of onset of, prion diseases [26]; apparently heterozygosity inhibits the spontaneous conformational change in the prion protein that leads to disease.
Recently, the codon 129 polymorphism was analyzed in a sample of women from the Fore of highland New Guinea who had been repeatedly exposed to, but never developed, kuru [27]. The Fore are notable because kuru is largely restricted to this group, and in Nobel prize-winning work by Carleton Gajdusek and his associates, kuru was shown to be transmitted by the regular consumption of human brains during mortuary feasts [28]. Although kuru was first thought to be caused by a slow virus, it is now classified as a prion disease [29]. The women studied by Mead et al. [27] were all over the age of 50, had all participated in multiple mortuary feasts, and hence had been exposed to kuru on multiple occasions, yet none had developed the disease. Twenty-three of the 30 women studied were heterozygous for the codon 129 polymorphism, which is significantly different both from Hardy–Weinberg expectations and from the genotype frequencies in a sample of the Fore that had not participated in mortuary feasts. This in itself is a remarkable finding; because departure from Hardy–Weinberg expectations is a notoriously inefficient method of detecting selection, there must indeed be a strong effect of balancing selection due to kuru on the prion protein gene to produce the results observed in the Fore. But what is even more remarkable is that Mead et al. [27] went on to sequence the prion protein gene in four other populations (Africans, Japanese, and two European populations), and found that all of the populations deviated significantly from neutrality (Fig. 3), suggesting that strong balancing selection has been influencing the prion protein gene in all human populations. Moreover, the signal of selection is apparently the strongest yet documented for any gene in humans [30]. So what might account for the selection?
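To make the Hardy–Weinberg comparison for the Fore sample concrete, here is a minimal Python sketch of the expected genotype counts and a chi-square statistic. Only the figures of 23 heterozygotes among 30 women come from the text; the split of the remaining seven women into 4 and 3 homozygotes is a hypothetical assumption made purely so the calculation can run. Note that for a two-allele locus the expected heterozygote fraction 2pq can never exceed 0.5, so at most 15 heterozygotes would be expected among 30 individuals whatever the allele frequencies.

```python
def hardy_weinberg_chi2(n_hom1, n_het, n_hom2):
    """Expected genotype counts under Hardy-Weinberg and a chi-square statistic.

    Returns ((exp_hom1, exp_het, exp_hom2), chi2). With two alleles and the
    allele frequency estimated from the data, chi2 has one degree of freedom.
    """
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)   # frequency of allele 1
    q = 1 - p                            # frequency of allele 2
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2


# 23 heterozygotes out of 30 is from the text; the 4/3 homozygote split is
# a hypothetical placeholder, not a reported value.
expected, chi2 = hardy_weinberg_chi2(4, 23, 3)
print([round(e, 2) for e in expected], round(chi2, 2))
```

With expected counts this small, an exact test would normally be preferred over the chi-square approximation; the sketch is meant only to show where the excess of heterozygotes enters the calculation.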
One possibility is that heterozygosity at the prion protein gene provides resistance to some unknown infectious disease (analogous to the resistance to malaria conferred by heterozygosity for the S and A alleles of the β-globin gene). Another possibility is that heterozygosity provides resistance to prion diseases transmitted by consuming animal flesh, such as BSE. But the most provocative hypothesis is that balancing selection arose because heterozygosity at the prion protein gene imparts resistance to human prion diseases, for which the exposure is the regular consumption of human flesh (as with kuru in the Fore). In other words, the strong signal of
Fig. 3. Tajima’s D values [39] for the prion protein gene for five populations (Japan, Africa, Papua New Guinea, and two European populations), compared to the distribution of D values for 313 other genes [40]. The D values are a test for neutrality, with large positive values indicating balancing selection and large negative values indicating directional selection. However, D values are also influenced by changes in population size, and the fact that the average D value is significantly less than zero is consistent with a past demographic expansion in human populations. The D values for the prion protein gene are among the highest ever observed, indicating very strong balancing selection on this gene. Figure adapted from Stoneking [41].
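For readers unfamiliar with the statistic plotted in Fig. 3, the sketch below computes Tajima's D [39] from a small set of aligned sequences using the standard formula, which compares the mean number of pairwise differences with the number of segregating sites. The two toy alignments are invented for illustration; they show only that variants at intermediate frequency push D upward and rare variants push it downward.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences (no missing data)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    # S: number of sites at which more than one state is observed.
    seg_sites = sum(
        1 for i in range(length) if len({s[i] for s in sequences}) > 1
    )
    if seg_sites == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from the standard definition of the statistic.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments (hypothetical): variants at intermediate vs. low frequency.
d_intermediate = tajimas_d(["AA", "AA", "TT", "TT"])
d_rare = tajimas_d(["AA", "AA", "TA", "AT"])
print(round(d_intermediate, 2), round(d_rare, 2))
```

Because D depends on the full frequency spectrum, it is sensitive to which sites are included in the analysis as well as to demographic history, which is why the caption cautions that population size changes also shift its value.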
balancing selection for the prion protein gene suggests widespread prehistoric human cannibalism [27]. Cannibalism tends to invoke a strong emotional response, and for that reason the standards of proof for accepting archaeological evidence of cannibalism seem to be unfairly high [31]. Nevertheless, the strong selection documented at the prion protein gene is consistent with the growing view (however disquieting it might be) from archaeological evidence [32–34] that cannibalism may have been widespread among prehistoric populations.
5. Conclusion
Patterns of genetic variation in contemporary populations can tell us much about our past, and hence can usefully supplement traditional investigations of prehistoric health and disease based on skeletal remains. In particular, genes that show strong evidence of past selection, such as the S allele at the β-globin gene, the LP allele, and the codon 129 polymorphism of the prion protein gene, indicate that something must have had a profound effect on our reproductive fitness; otherwise, there would not be such a signal of selection. Genomic approaches, and the wealth of data that are becoming increasingly available, promise to make it relatively straightforward to identify genes that were probably the target of selection. But the challenge will remain: how do we then identify the reason for selection on a particular gene?
Acknowledgements
I thank David Hughes and Sean Myles for useful discussion. Research supported by funds from the Max Planck Society.
Appendix A. Note added in proof
Recently, it has been shown that the large positive Tajima D values observed for the prion protein gene in worldwide populations (cf. Fig. 3) are an artifact resulting from analyzing only previously ascertained polymorphic sites in the prion protein gene, rather than obtaining complete sequences of the prion protein gene [35,36].
There is thus no evidence for balancing selection on this gene in worldwide populations, although the evidence for balancing selection on the prion protein gene in the Fore of highland New Guinea remains persuasive.
References
[1] Aufderheide A, Rodriguez-Martin C. The Cambridge encyclopedia of human paleopathology. Cambridge, UK: Cambridge University Press; 1998.
[2] Ortner D. Identification of pathological conditions in human skeletal remains. San Diego, CA, USA: Academic Press; 2003.
[3] Larsen C. Bioarchaeology: the lives and lifestyles of past people. J Archaeol Res 2002;10:119–66.
[4] Salo WL, Aufderheide AC, Buikstra J, Holcomb TA. Identification of Mycobacterium tuberculosis DNA in a pre-Columbian Peruvian mummy. Proc Natl Acad Sci U S A 1994;91:2091–4.
[5] Haas CJ, Zink A, Palfi G, Szeimies U, Nerlich AG. Detection of leprosy in ancient human skeletal remains by molecular identification of Mycobacterium leprae. Am J Clin Pathol 2000;114:428–36.
[6] Beraud-Colomb E, Roubin R, Martin J, Maroc N, Gardeisen A, Trabuchet G, et al. Human beta-globin gene polymorphisms characterized in DNA extracted from ancient bones 12,000 years old. Am J Hum Genet 1995;57:1267–74.
[7] Wood J, Milner G, Harpending H, Weiss K. The osteological paradox: problems of inferring prehistoric health from skeletal samples. Curr Anthropol 1992;33:343–70.
[8] Cooper A, Poinar HN. Ancient DNA: do it right or not at all. Science 2000;289:1139.
[9] Pääbo S, Poinar H, Serre D, Jaenicke-Despres V, Hebler J, Rohland N, et al. Genetic analyses from ancient DNA. Annu Rev Genet 2004;38:645–79.
[10] Allison A. Protection afforded by sickle-cell trait against subtertian malarial infection. Br Med J 1954;1:290–4.
[11] Storz JF. Using genome scans of DNA polymorphism to infer adaptive population divergence. Mol Ecol 2005;14:671–88.
[12] Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 2002;419:832–7.
[13] Kayser M, Brauer S, Stoneking M. A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol Biol Evol 2003;20:893–900.
[14] Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, et al. Whole-genome patterns of common DNA variation in three human populations. Science 2005;307:1072–9.
[15] Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. Interrogating a high-density SNP map for signatures of natural selection. Genome Res 2002;12:1805–14.
[16] Akey JM, Eberle MA, Rieder MJ, Carlson CS, Shriver MD, Nickerson DA, et al. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol 2004;2:e286.
[17] Storz JF, Payseur BA, Nachman MW. Genome scans of DNA variability in humans reveal evidence for selective sweeps outside of Africa. Mol Biol Evol 2004;21:1800–11.
[18] Swallow DM. Genetics of lactase persistence and lactose intolerance. Annu Rev Genet 2003;37:197–219.
[19] Enattah NS, Sahi T, Savilahti E, Terwilliger JD, Peltonen L, Järvelä I. Identification of a variant associated with adult-type hypolactasia. Nat Genet 2002;30:233–7.
[20] Mulcare CA, Weale ME, Jones AL, Connell B, Zeitlyn D, Tarekegn A, et al. The T allele of a single-nucleotide polymorphism 13.9 kb upstream of the lactase gene (LCT) (C-13.9kbT) does not predict or cause the lactase-persistence phenotype in Africans. Am J Hum Genet 2004;74:1102–10.
[21] Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, et al. Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 2004;74:1111–20.
[22] Cook G. Did persistence of intestinal lactase into adult life originate in the Arabian peninsula? Man 1978;13:418–27.
[23] Birlouez-Aragon I. Effect of lactose hydrolysis on calcium absorption during duodenal milk perfusion. Reprod Nutr Dev 1988;28:1465–72.
[24] Prusiner SB. Novel proteinaceous infectious particles cause scrapie. Science 1982;216:136–44.
[25] Collinge J. Prion diseases of humans and animals: their causes and molecular basis. Annu Rev Neurosci 2001;24:519–50.
[26] Palmer MS, Dryden AJ, Hughes JT, Collinge J. Homozygous prion protein genotype predisposes to sporadic Creutzfeldt–Jakob disease. Nature 1991;352:340–2.
[27] Mead S, Stumpf MP, Whitfield J, Beck JA, Poulter M, Campbell T, et al. Balancing selection at the prion protein gene consistent with prehistoric kuru-like epidemics. Science 2003;300:640–3.
[28] Gajdusek DC. Unconventional viruses and the origin and disappearance of kuru. Science 1977;197:943–60.
[29] Goldfarb LG. Kuru: the old epidemic in a new mirror. Microbes Infect 2002;4:875–82.
[30] Hedrick PW. A heterozygote advantage. Science 2003;302:57.
[31] Diamond JM. Talk of cannibalism. Nature 2000;407:25–6.
[32] Fernandez-Jalvo Y, Carlos Diez J, Caceres I, Rosell J. Human cannibalism in the Early Pleistocene of Europe (Gran Dolina, Sierra de Atapuerca, Burgos, Spain). J Hum Evol 1999;37:591–622.
[33] Marlar RA, Leonard BL, Billman BR, Lambert PM, Marlar JE. Biochemical evidence of cannibalism at a prehistoric Puebloan site in southwestern Colorado. Nature 2000;407:74–8.
[34] White TD. Once we were cannibals. Sci Am 2001;285:58–65.
[35] Kreitman M, Di Rienzo A. Balancing claims for balancing selection. Trends Genet 2004;20:300–4.
[36] Soldevila M, Calafell F, Helgason A, Stefansson K, Bertranpetit J. Assessing the signatures of selection in PRNP from polymorphism data: results support Kreitman and Di Rienzo’s opinion. Trends Genet 2005;21:381–91.
[37] Schlötterer C. A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics 2002;160:753–63.
[38] Slatkin M. A measure of population subdivision based on microsatellite allele frequencies. Genetics 1995;139:457–62.
[39] Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989;123:585–95.
[40] Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 2001;293:489–93.
[41] Stoneking M. Widespread prehistoric human cannibalism: easier to swallow? Trends Ecol Evol 2003;18:489–90.