Molecular clocks


Current Biology Magazine

entirely clear, and it may also involve telomerase-independent mechanisms. Recently, somatic mutations of POT1 were reported in 3.5% of cases of chronic lymphocytic leukemia. Most of these mutations clustered within the POT1-OB (oligonucleotide/oligosaccharide-binding) DNA-binding folds, and hence might compromise binding of POT1 to the single-stranded 3′ overhang. Furthermore, rare germline variants of POT1 have been identified in familial cases of glioma and melanoma.

Is shelterin conserved in other species?

All eukaryotes protect their chromosome ends with a telomere-binding protein complex. However, a shelterin-like complex is not always present. As in mammals, fission yeast telomeres are bound by a shelterin-like complex, consisting of a TPP1/POT1-like dimer, Tpz1–Pot1, and a TRF-like protein, Taz1. The Tpz1–Pot1 complex is connected to Taz1 via Rap1 and Poz1, establishing a link between the double-stranded and single-stranded telomeric DNA-binding factors. In contrast, the architecture of the telomere-binding complex and the proteins involved are quite distinct in budding yeast. Telomeres in budding yeast are bound by Rap1, which is the only structurally conserved shelterin component, although the mammalian and fission yeast Rap1 do not bind DNA. The single-stranded telomeric DNA in yeast is protected by the yeast CST complex.

Where can I find out more?

Arnoult, N., and Karlseder, J. (2015). Complex interactions between the DNA-damage response and mammalian telomeres. Nat. Struct. Mol. Biol. 22, 859–866.
de Lange, T. (2010). How shelterin solves the telomere end-protection problem. Cold Spring Harb. Symp. Quant. Biol. 75, 167–177.
Holohan, B., Wright, W.E., and Shay, J.W. (2014). Cell biology of disease: Telomeropathies: an emerging spectrum disorder. J. Cell Biol. 205, 289–299.
Palm, W., and de Lange, T. (2008). How shelterin protects mammalian telomeres. Annu. Rev. Genet. 42, 301–334.
Schmidt, J.C., and Cech, T.R. (2015). Human telomerase: biogenesis, trafficking, recruitment, and activation. Genes Dev. 29, 1095–1105.

Laboratory for Cell Biology and Genetics, Rockefeller University, 1230 York Avenue, New York, NY 10065, USA. *E-mail: [email protected]

Primer

Molecular clocks

Michael S.Y. Lee¹ and Simon Y.W. Ho²

In the 1960s, several groups of scientists, including Emile Zuckerkandl and Linus Pauling, noted that proteins experience amino acid replacements at a surprisingly consistent rate across very different species. This presumed single, uniform rate of genetic evolution was subsequently described using the term ‘molecular clock’. Biologists quickly realised that such a universal pacemaker could be used as a yardstick for measuring the timescale of evolutionary divergences: estimating the rate of amino acid exchanges per unit of time and applying it to protein differences across a range of organisms would allow deduction of the divergence times of their respective lineages (Figure 1).

In the 50 years since, leaps in genomic sequencing technology and new computational tools have revealed a more complex and interesting reality: the rates of genetic change vary greatly across the tree of life. The term ‘molecular clock’ is now used more broadly to refer to a suite of methods and models that assess how rates of genetic evolution vary across the tree of life, and use this information to put an absolute timescale on this tree. Modern molecular clocks are thus critical to inferring evolutionary timescales and understanding the process of genetic change. Analyses of genomic data using clock models that accommodate variation in evolutionary rates have shed new light on the tree of life, as well as the organismal and environmental factors driving genetic change along its branches. However, some major theoretical, empirical and computational challenges remain.

Evolutionary rate variation

Modern molecular clocks can handle various forms of evolutionary rate heterogeneity. Rates can vary across different parts of the genome (site effects), across taxa (lineage effects), and across time (here termed ‘epoch effects’).

Site effects occur when different parts of the genome evolve at distinct rates (Figure 2A). A widely recognized example involves protein-coding genes, which have a higher rate of evolution at the third position of codons than at the first and second. This is because changes at first and second codon sites are more likely to change the encoded amino acid, with potential consequences for protein function. In animals, mitochondrial DNA evolves faster than nuclear DNA, for reasons that are still debated. These site effects were the first major sources of rate heterogeneity to be characterized and accounted for during genetic analysis.

Lineage effects occur when different taxa exhibit distinct rates of molecular evolution (Figure 2B). For example, rodents have higher rates of genetic change than do other mammals, partly due to their short generation times. Likewise, parasitic plants evolve more rapidly than their free-living relatives. The importance of this form of rate variation took longer to be appreciated, but was confirmed in the 1970s when formal statistical tests of among-lineage rate variation were developed. This led to the introduction of ‘relaxed-clock’ approaches, which attempt to statistically model rate variation across branches of the evolutionary tree. These methods allow evolutionary timescales to be estimated using molecular clock approaches even when rates vary across lineages.

Epoch effects occur when rates of evolution differ across different time slices (Figure 2C). For instance, evolutionary rates in influenza were found to have undergone a sharp increase around 1990. Such temporal heterogeneity is harder to detect and model than either site effects or lineage effects. This is partly because it generates patterns of genetic divergence among living taxa that are very similar to those expected when rates have remained constant through time. An extra layer of interest and complexity emerges when two or more sources of rate heterogeneity interact.
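The text above mentions formal statistical tests of among-lineage rate variation without naming one. As a rough illustration only, the sketch below uses Tajima's relative-rate test as a simple stand-in: given two ingroup sequences and an outgroup, it counts the sites at which only one ingroup sequence has changed and compares the two counts with a one-degree-of-freedom chi-square statistic. The function name, the toy sequences and the choice of test are assumptions made for this example, not something taken from the article.

```python
# Minimal sketch of a relative-rate test (Tajima's test), used here as an
# assumed example of a "formal test of among-lineage rate variation".

CHI2_CRITICAL_5PCT_1DF = 3.841  # standard chi-square critical value, 1 d.f.


def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Return (m_a, m_b, chi2) for three aligned DNA sequences.

    m_a counts sites where only seq_a differs (seq_b matches the outgroup),
    m_b counts sites where only seq_b differs (seq_a matches the outgroup).
    Under equal rates in both lineages, m_a and m_b should be similar.
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    m_a = m_b = 0
    for a, b, o in zip(seq_a.upper(), seq_b.upper(), outgroup.upper()):
        if not {a, b, o} <= valid:
            continue  # skip gaps and ambiguous bases
        if a != b and b == o:
            m_a += 1
        elif a != b and a == o:
            m_b += 1

    total = m_a + m_b
    chi2 = (m_a - m_b) ** 2 / total if total else 0.0
    return m_a, m_b, chi2


# Toy alignment (hypothetical data): lineage A has accumulated far more
# unique changes than lineage B since they split.
outgroup = "A" * 20
seq_a = "C" * 9 + "A" * 11
seq_b = "A" * 19 + "G"

m_a, m_b, chi2 = tajima_relative_rate(seq_a, seq_b, outgroup)
rates_differ = chi2 > CHI2_CRITICAL_5PCT_1DF
print(m_a, m_b, round(chi2, 3), rates_differ)
```

A test of this kind only flags that two lineages differ in rate; it does not by itself model the variation, which is the job of the relaxed-clock approaches described in the text.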
Site and lineage effects interact when different genes have different patterns of rate variability across taxa (Figure 2D). Mitochondrial DNA has greatly accelerated rates of evolution in snakes and dragon lizards compared with typical lizards, but nuclear DNA shows no such trend. Genomic analyses suggest that such interactions are widespread. Selection might be relaxed on particular genes in particular taxa and thus lead to rapid molecular evolution. For example, the genes coding for tooth enamel are no longer under stabilizing selection in mammals that lack teeth or tooth enamel, such as anteaters and sloths. Thus, those genes evolve much more rapidly in these lineages, but this pattern is not seen for most other genes. Such complex patterns of rate variation can be accommodated using partitioned clock models, where different portions of the genome are recognized as evolving according to separate clocks or ‘pacemakers’.

Figure 1. The simplest molecular clock approach for inferring evolutionary timescales. The rate of genetic change is first ascertained for one part of the tree of life (e.g. primates), often by calibrating the amount of genetic divergence to the absolute age of divergence as suggested by the fossil record. This rate is then extrapolated across the rest of the tree, allowing relative genetic divergences between all other taxa (e.g. carnivores) to be translated into absolute time, even without recourse to fossil evidence.

Calibrating the molecular clock

Genetic divergences alone, even when analysed using the most sophisticated molecular clock models, can provide only a relative timescale. For example, DNA evidence suggests that the major lineages (orders) of living birds diverged from each other during the first quarter of the evolutionary history of modern birds, but does not tell us the absolute timeframe of this diversification. The molecular clock needs to be calibrated in order to translate these relative dates into absolute ones. We would then be able to make statements such as “the major lineages of birds diverged in an interval of 20 million years spanning the end of the Cretaceous period”.

Calibrations are typically derived from the fossil record: for instance, when dating a molecular tree of vertebrates, the clade ‘modern birds’ must be at least as old as the most ancient fossil that can be robustly assigned to that group, currently the 67-million-year-old Vegavis. Well-dated geological events, such as island formation, continental rifting or river capture, can also be used to constrain the ages of evolutionary divergences between taxa presumed to have been affected by these events. For very shallow trees, spanning short time periods, such as those of virus epidemics, the ages of ‘fossilized’ genomes sampled across real time can be used to calibrate the molecular clock. With caution, previous estimates of evolutionary rates and divergence times can also be used for calibration.

Most attention has focused on statistical approaches for capturing information from the fossil record for calibrating molecular trees, but similarly rigorous approaches are now being developed for biogeographic information. Different types and numbers of calibrations can be used simultaneously, but this can change the balance between the signals from the calibrations and from the genetic data. Incorporating as much calibrating information as possible can severely constrain the possible range of inferred timescales. On the other hand, using a smaller subset of temporal information allows the molecules more latitude to speak for themselves.

Current Biology 26, R387–R407, May 23, 2016
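The simplest approach, the one summarised in Figure 1, can be written out as two lines of arithmetic. The sketch below assumes a strict (single-rate) clock and pairwise genetic distances measured in substitutions per site; the function names and all numbers are hypothetical and chosen only to make the calculation easy to follow. Because the two lineages have each been evolving since their split, a distance d accumulated over t million years implies a per-lineage rate of d / (2t).

```python
# Minimal sketch of strict-clock calibration and extrapolation.
# All values are hypothetical; real analyses use statistical clock models.


def calibrate_rate(distance, split_age_my):
    """Per-lineage rate (substitutions/site/My) from one calibrated split.

    distance     : pairwise genetic distance between the two calibrating taxa
    split_age_my : age of their divergence in millions of years (e.g. from a fossil)
    """
    if split_age_my <= 0:
        raise ValueError("calibration age must be positive")
    return distance / (2.0 * split_age_my)


def divergence_time(distance, rate):
    """Divergence time (My) for another pair of taxa, assuming the same rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Step 1: calibrate on a pair with a dated split (hypothetical primate pair).
primate_distance = 0.12   # substitutions per site (assumed)
primate_split_my = 30.0   # fossil-based age in My (assumed)
rate = calibrate_rate(primate_distance, primate_split_my)

# Step 2: extrapolate the rate to a pair with no fossil date
# (hypothetical carnivore pair).
carnivore_distance = 0.20
carnivore_split_my = divergence_time(carnivore_distance, rate)

print(f"rate = {rate:.4f} subs/site/My, inferred split = {carnivore_split_my:.1f} My")
```

If the fossil supplies only a minimum age for the calibrated split, then under these assumptions the rate computed here is an upper bound and the extrapolated date a lower bound. Lineage effects break the shared-rate assumption in step 2, which is why relaxed-clock models are generally preferred in practice.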


Evolutionary timescales

Molecular clocks are vital to reconstructing the detailed timescale and branching pattern of the tree of life, especially in soft-bodied groups that have left few or no fossils. In turn, this can shed light on how major evolutionary events have been influenced by Earth history. However, the use of inappropriate clock models or erroneous calibrations can produce highly misleading estimates of evolutionary timescales. These issues have led to vigorous debates about the timing and drivers of major evolutionary events, including the origins of animal phyla, the ordinal divergences of birds and mammals or the radiation of flowering plants.

Some of the earliest molecular clock analyses of divergences between animal phyla concluded that metazoans diverged about a billion years ago, nearly twice the age of the explosion of animal fossils in Cambrian rocks. These results were at least partly driven by failure to account for lineage effects: genetic change generally occurs more slowly in vertebrates than in invertebrates, but early molecular analyses extrapolated the slow vertebrate evolutionary rate across the entire animal tree. This caused the estimates of animal divergence times to be stretched deep into the Precambrian. Subsequent analyses with better models of rate variation and more carefully chosen calibrations moved the initial radiation of animals to a later time, into the early Ediacaran period, when the world was gripped by several massive glaciation events (‘snowball earth’). Nevertheless, this still precedes the first definitive metazoan fossils by tens of millions of years.

In other groups of organisms, improved molecular clock analyses have also often increased the congruence between timescales inferred from molecular data and from the fossil record. However, a persistent pattern is that the molecular dates of evolutionary events are often still substantially older than suggested by the fossil record. Recent molecular clock studies suggest that modern birds and placental mammals both radiated deep within the Cretaceous period (alongside dinosaurs), yet the oldest well-accepted fossils of both groups occur very close to the end of the Cretaceous. Similarly, molecular estimates have placed the origin of flowering plants (angiosperms) more than 60 million years before the first fossils from this group appear (Figure 3).

These repeated discrepancies between molecular and fossil dates still need to be explained. As vertebrates and plants are generally well represented in Cretaceous fossils, the lack of modern bird, mammal or angiosperm fossils is unlikely to be due to lack of fossil preservation or discovery. One intriguing but largely untested suggestion is that molecular evolution might occur much more rapidly during evolutionary radiations, leading to big genetic divergences in short time intervals. This would be likely to cause current clock models to overestimate divergence ages.

Molecular clock analyses can also shed light on the organismal traits and ecological processes driving evolution. For example, phylodynamic analyses reveal that influenza virus strains with the highest rates of molecular evolution are the most adept at evading the host immune system. As a consequence, these are the most likely to cause severe symptoms, switch hosts and persist across summer into the following winter. The link between higher rates of evolution and evolutionary success might prove to be more general, and relevant for phenotypic as well as genetic traits.

[Figure 2 panels: (A) Site effect; (B) Lineage effect; (C) Epoch effect; (D) Site & lineage effects. Axis: Time (My ago), 0 to 200. Key: gene 1, gene 2; fast and slow evolutionary rate; clade X marked on each tree.]

Figure 2. Modern molecular clocks can accommodate complex variation in rates of genetic change across the tree of life. (A) Rate variation across sites: gene 1 evolves rapidly but gene 2 evolves slowly, across all lineages. (B) Rate variation across lineages: genes 1 and 2 both evolve rapidly in clade X. (C) Rate variation across time periods or ‘epochs’: genes 1 and 2 both evolved rapidly between 140 and 80 Ma. (D) Site and lineage effects can interact, so different sites exhibit different rate patterns across lineages: here, gene 1 evolves rapidly in clade X, but gene 2 shows no such pattern (Silhouettes by A. Palci).

[Figure 3 labels: gymnosperms; angiosperms (flowering plants): Amborella, water lilies, magnolias, monocots, eudicots. Axis: age (millions of years ago), from 0 (present day) to 300. Markers: oldest fossils of crown angiosperms; age of crown angiosperms (molecular clock estimate); oldest fossils of total-group angiosperms; age of total-group angiosperms (molecular clock estimate).]

Figure 3. Divergence dates derived from molecular clocks are often older than those suggested by the fossil record. The molecular estimate for the date when flowering plants split from conifers (age of total-group angiosperms) and the date of the oldest divergences within flowering plants (age of crown-group angiosperms) are both much older than dates implied by the fossil record. (Water lily image by Brian Choo; Amborella image by Scott Zona.)

Genomic clocks

Our ability to collect vast amounts of genetic data is rapidly outstripping our ability to rigorously analyse them, and molecular clock analyses are no exception. Genomic datasets now routinely contain hundreds to thousands of gene loci, each (potentially) evolving according to a separate molecular clock. Other datasets contain only a few loci, but sampled across thousands of species. Finally, there are datasets in the pipeline (e.g. for insects and vertebrates) that will contain thousands of loci from thousands of species. All of these vast datasets preclude analyses using the most complex molecular clock models, which are very computationally intensive because they try to estimate evolutionary relationships, genetic divergences and absolute timescales simultaneously. Instead, a new generation of fast, approximate methods is being developed. These methods first quickly infer evolutionary trees that depict only relative genetic divergences, and then stretch these into time-calibrated trees using simple molecular clock models.

A different approach might allow the efficient analysis of genomic datasets while minimally compromising methodological rigour. For example, it is possible to find groups of genes that have been subject to the same lineage effects, meaning that they have evolved according to the same molecular clock. By focusing on these groups of genes, we can greatly reduce the number of molecular clock models that are needed. Also, we can choose to identify and analyse only the genes that display the most desirable characteristics and best fit our models, such as the genes that show the least rate variation across lineages. These approaches reduce the number of parameters in the clock models and the size of the dataset, so that more computationally intensive (and arguably more accurate) analyses can still be performed.

Perhaps surprisingly, molecular clock analyses of genomic datasets have still failed to resolve many of the long-standing questions about evolutionary timescales. One reason for this is that adding large amounts of genetic information often leads to only small improvements in molecular clock estimates of divergence dates. Data from additional genes largely serve to refine estimates of genetic distances, which are already likely to be well estimated even using moderate amounts of data. Increasing the amount of genetic data does little to mitigate the major remaining sources of error, namely the model of rate variation and the calibrations, which are the other two critical components of molecular clock inferences. In such cases, seemingly minor changes to the model of rate variation or the calibrations can have a much greater impact than substantially increasing the amount of DNA sequence data.

Thus, although gathering vast amounts of genomic data will undoubtedly shed new light on molecular evolution and its link to the phenotype, it is not the only way to advance our knowledge of the shape and timescale of the tree of life. We are reaching the point where increasing the amount of molecular data brings a declining benefit to our estimates of evolutionary timescales. Instead, more room for improvement might lie in developing better models of rate variation and refining our knowledge and use of calibrations. The resulting improvements to molecular estimates of timescales will lead to a better understanding of the temporal dynamics of life at all levels: from the origins of animal phyla during and after snowball Earth, to genetic evolution in parasitic plants, to influenza outbreaks in modern societies. For these reasons, molecular clocks will continue to play a key role in shaping our understanding of the evolution of life and the genes that code for it.

FURTHER READING

Beaulieu, J.M., O’Meara, B.C., Crane, P., and Donoghue, M.J. (2015). Heterogeneous rates of molecular evolution and diversification could explain the Triassic age estimate for angiosperms. Syst. Biol. 64, 869–878.
dos Reis, M., et al. (2015). Uncertainty in the timing of origin of animals and the limits of precision in molecular timescales. Curr. Biol. 25, 2939–2950.
dos Reis, M., Donoghue, P.C.J., and Yang, Z. (2016). Bayesian molecular clock dating of species divergences in the genomics era. Nat. Rev. Genet. 17, 71–80.
Duchene, S., and Ho, S.Y.W. (2014). Using multiple relaxed-clock models to estimate evolutionary timescales from DNA sequence data. Mol. Phyl. Evol. 77, 65–70.
Drummond, A.J., Ho, S.Y.W., Phillips, M.J., and Rambaut, A. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88. http://dx.doi.org/10.1371/journal.pbio.0040088.
Jarvis, E.D., et al. (2014). Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346, 1320–1331.
Landis, M.J. (2016). Biogeographic dating of speciation times using paleogeographically informed processes. Syst. Biol. [in press; preprint http://biorxiv.org/content/early/2015/10/08/028738]
Peterson, K.J., Lyons, J.B., Nowak, K.S., Takacs, C.M., Wargo, M.J., and McPeek, M.A. (2004). Estimating metazoan divergence times with a molecular clock. Proc. Natl. Acad. Sci. USA 101, 6536–6541.
Yang, Z., and Rannala, B. (2006). Bayesian estimation of species divergence times under a molecular clock using fossil calibrations with soft bounds. Mol. Biol. Evol. 23, 212–226.
Zuckerkandl, E., and Pauling, L. (1962). Molecular disease, evolution, and genic heterogeneity. In Horizons in Biochemistry, M. Kasha, and B. Pullman, eds. (New York: Academic Press). pp. 189–225.
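The gene-filtering idea described under ‘Genomic clocks’ (keeping the genes that show the least rate variation across lineages) can be sketched with a very simple score. The example below assumes that, for each gene, root-to-tip distances have already been measured on a gene tree whose tips were all sampled at the same time, so that a perfectly clock-like gene would give identical distances for every tip. The coefficient of variation of those distances is then used as a rough proxy for departure from a clock. The gene names, the numbers and the choice of score are illustrative assumptions, not the procedure used in any particular study.

```python
# Minimal sketch: rank genes by how clock-like they look, using the
# coefficient of variation (CV) of root-to-tip distances as an assumed proxy.
from statistics import mean, pstdev


def rate_variation(root_to_tip):
    """CV of root-to-tip distances for one gene; 0 means perfectly clock-like."""
    mu = mean(root_to_tip)
    if mu <= 0:
        raise ValueError("root-to-tip distances must have a positive mean")
    return pstdev(root_to_tip) / mu


def most_clocklike(genes, k):
    """Return the names of the k genes with the least among-lineage variation."""
    scores = {name: rate_variation(dists) for name, dists in genes.items()}
    return sorted(scores, key=scores.get)[:k]


# Hypothetical root-to-tip distances (substitutions/site) for four tips.
genes = {
    "geneA": [0.10, 0.10, 0.11, 0.09],
    "geneB": [0.05, 0.20, 0.10, 0.30],  # strong lineage effects
    "geneC": [0.20, 0.21, 0.19, 0.20],
}

selected = most_clocklike(genes, k=2)
print(selected)
```

Filtering like this shrinks the dataset and lets a simpler clock model fit the genes that remain. It does nothing to address errors in the calibrations, which the text identifies as a separate source of uncertainty.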


¹School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide SA 5001, Australia; and Earth Sciences Section, South Australian Museum, North Terrace, Adelaide SA 5000, Australia. ²School of Life and Environmental Sciences, University of Sydney, Sydney NSW 2006, Australia. E-mail: [email protected] (M.S.Y.L.), [email protected] (S.Y.W.H.)