Quantitative Genetic Variation
TFC Mackay, North Carolina State University, Raleigh, NC, USA
© 2016 Elsevier Inc. All rights reserved.
Natural populations exhibit variation for most measurable aspects of their phenotypes: morphology, physiology, behavior, and disease susceptibility. This phenotypic variation is typically continuously distributed, and correspondingly these phenotypes are called 'quantitative traits' (Falconer and Mackay, 1996; Lynch and Walsh, 1998). The continuous variation arises from genetic complexity and environmental sensitivity: because alleles segregate at multiple loci, many genotypes can give rise to the same phenotype, and the same genotype can have different phenotypic effects in different environments. Quantitative genetic variation is the substrate for phenotypic evolution in natural populations and for selective breeding of domesticated crop and animal species. Quantitative genetic variation also underlies susceptibility to common complex diseases and behavioral disorders, as well as variation in responses to pharmacological therapies. Knowledge of the detailed genetic underpinnings of variation for quantitative traits is thus critical for addressing unresolved evolutionary questions about the maintenance of genetic variation for quantitative traits within populations and the mechanisms of divergence of quantitative traits between populations and species; for increasing the rate of selective improvement of agriculturally important species; and for developing novel and more personalized therapeutic interventions to improve human health.
Genetics began with Mendel's discovery of the laws of segregation and independent assortment of alleles with large, qualitative effects that can be tracked in pedigrees. However, these laws seemed to contradict the observations that many traits vary continuously and that offspring phenotypes are often close to the average of their two parents.
Early in the twentieth century, Fisher resolved the apparent contradiction between the discrete nature of Mendelian inheritance and the continuous variation of quantitative traits, and provided the theoretical basis for understanding quantitative trait genetics (Fisher, 1918). Each locus affecting a quantitative trait has the same properties as a Mendelian locus. The differences are that the phenotypic effects are smaller and must be measured (rather than categorized); many such loci act together; the quantitative effects are sensitive to the environment; and the population allele frequencies need to be taken into account.
Properties of Quantitative Traits
To understand how individual loci with discrete allelic differences can give rise to patterns of trait variation, consider the simple case of a single diploid locus with two alleles, and thus three genotypes (A1A1, A1A2, A2A2; by convention, the A2A2 genotype is associated with decreased values of the trait). Starting from this simple bi-allelic model does not imply that traits in nature have such a simple genetic basis; rather, it provides the foundation on which our understanding of multi-locus inheritance can be built.
The mean value of each genotype is its mean phenotype in the population, averaged across all other segregating genotypes and the range of environments experienced by the population. These mean values, which are expressed in the units in which the phenotype is measured, can be assigned general genotypic values of +a (A1A1), d (A1A2), and −a (A2A2). These values are scaled relative to the midpoint between the two homozygotes, which is set to zero. Thus the additive effect, a, is one half of the difference in mean phenotype between the two homozygous genotypes, and the dominance effect, d, is the difference between the mean phenotypic value of the heterozygous genotype and the average of the two homozygous genotypes (Falconer and Mackay, 1996). d can express any degree of dominance, from d = +a (the A1 allele is dominant) to d = −a (the A2 allele is dominant), or anything in between. If d = 0, we call the effect additive (or co-dominant) because the mean phenotype of heterozygotes is the same as the mean of the two homozygotes; d > +a and d < −a represent overdominance and underdominance, respectively.
Mendelian loci often exhibit a phenomenon called 'epistasis,' whereby the effects of one locus mask the effects of another, giving rise to modified segregation ratios (such as 9:3:4 instead of 9:3:3:1) in a cross of doubly heterozygous genotypes. Epistasis can be modeled for quantitative traits by considering the joint effects of two loci affecting the trait, A and B, each with two alleles, so that there are nine possible two-locus genotypes. If the values of these genotypes can be predicted from the additive and dominance effects at each of the two loci (e.g., the value of genotype A1A2B1B1 is $d_A + a_B$), then we say that the two loci act additively. Note that we use the same word, additive, to refer to the absence of dominance at a single locus and to the absence of epistasis between loci.
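To make the single-locus scale and the two-locus additivity criterion concrete, here is a minimal Python sketch. The function names and the example genotype means are hypothetical illustrations, not data from any study, and the code assumes, following the convention above, that A1 is the allele that increases the trait (a > 0). The two-locus check uses the hallmark of epistasis described in the next paragraph: loci act additively only if the additive and dominance effects of locus A are the same in every locus-B background.

```python
"""Sketch: single-locus genotypic values (a, d) and a two-locus additivity check."""


def single_locus_effects(m11, m12, m22):
    """Return (midpoint, a, d) from the mean phenotypes of A1A1, A1A2 and A2A2."""
    midpoint = (m11 + m22) / 2   # origin of the +a / d / -a scale
    a = (m11 - m22) / 2          # additive effect: half the homozygote difference
    d = m12 - midpoint           # dominance effect: heterozygote deviation from midpoint
    return midpoint, a, d


def dominance_class(a, d, tol=1e-9):
    """Label the degree of dominance, assuming a > 0 (A1 increases the trait)."""
    if abs(d) <= tol:
        return "additive"
    if abs(d - a) <= tol:
        return "A1 dominant"
    if abs(d + a) <= tol:
        return "A2 dominant"
    if d > a:
        return "overdominance"
    if d < -a:
        return "underdominance"
    return "partial dominance"


def effects_of_a_by_background(table):
    """(a_A, d_A) of locus A within each locus-B genotype.

    table[i][j] is the mean of A genotype i (A1A1, A1A2, A2A2)
    combined with B genotype j (B1B1, B1B2, B2B2).
    """
    return [single_locus_effects(table[0][j], table[1][j], table[2][j])[1:] for j in range(3)]


def loci_act_additively(table, tol=1e-9):
    """True if a_A and d_A are identical in every B background (no epistasis)."""
    effects = effects_of_a_by_background(table)
    a0, d0 = effects[0]
    return all(abs(a - a0) <= tol and abs(d - d0) <= tol for a, d in effects)


if __name__ == "__main__":
    print(single_locus_effects(10.0, 8.0, 4.0))    # midpoint 7, a = 3, d = 1
    print(dominance_class(3.0, 1.0))               # partial dominance

    # Additive two-locus table: genotype value = value at A + value at B.
    a_values, b_values = (1.0, 0.2, -1.0), (0.5, 0.0, -0.5)
    additive = [[va + vb for vb in b_values] for va in a_values]
    # Epistatic table: locus A has an effect only in the B1B1 background.
    epistatic = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
    print(loci_act_additively(additive), loci_act_additively(epistatic))  # True False
```

Working with genotype means rather than individual records keeps the sketch focused on the definitions; with real data, each mean would be estimated with error and the comparison across backgrounds would become a statistical test for interaction.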
If, however, the values of the two-locus genotypes are not equal to those expected from the two individual loci, then there is epistasis, which is parameterized by additive by additive ($aa_{AB}$), additive by dominance ($ad_{AB}$, $da_{AB}$), and dominance by dominance ($dd_{AB}$) interaction effects (Falconer and Mackay, 1996; Mackay, 2014). These interaction effects are not restricted to the traditional epistatic effects detected for Mendelian loci but can take on any quantitative value. The hallmark of epistasis for a quantitative trait is that the additive and/or dominance effect of one locus varies depending on the genotype at the interacting locus (Mackay, 2014). Epistatic interactions are not limited to pairs of loci; they can be of higher order, and such interactions are very difficult to parameterize.
Alleles affecting quantitative traits can have a range of effects depending on the physical and social environment experienced by individuals with the same genotype, a phenomenon called 'phenotypic plasticity.' These effects can be large if the individuals are reared in very different environments (sometimes called macro-environments), or more subtle when environmental conditions are tightly controlled. Even genetically identical individuals reared in the same environment will have different phenotypes, a phenomenon called environmental noise or micro-environmental plasticity (Falconer and Mackay, 1996; Lynch and Walsh, 1998). To further complicate matters, different genotypes can have different plastic responses to the same large environmental change (called genotype by environment interaction, GEI) or to the same range of micro-environments (Falconer and Mackay, 1996; Lynch and Walsh, 1998). In addition, different genotypes may experience different environments because of their phenotype (called genotype–environment correlation). Examples of the latter include feeding dairy cows according to their milk yield, training race horses according to their pedigree, and 'niche construction' in nature (Falconer and Mackay, 1996).
Encyclopedia of Evolutionary Biology, Volume 3
doi:10.1016/B978-0-12-800049-6.00041-X
Finally, quantitative trait phenotypes can differ between populations if the allele frequencies at the loci affecting the trait differ, even if the underlying genetic architecture is identical. This is easy to visualize by considering just one locus: if a population is fixed for the A2A2 genotype, the population mean will be −a, whereas if it is fixed for the A1A1 genotype, the population mean will be +a.
When we say that we wish to understand the genetic architecture of a quantitative trait, this is what we need to know (Mackay, 2001):
(1) The numbers and identities of genes at which mutations affecting the trait arise, and the subset of these genes at which alleles affecting the trait segregate in natural populations.
(2) The distribution of allelic effects of new mutations and of segregating variants in nature. This could range from alleles of large effect segregating at a few loci to alleles of small effect segregating at many loci, or follow an exponential distribution, with alleles of moderately large effect segregating at a few loci and alleles of small effect segregating at a large number of loci.
(3) The effects of new mutations and segregating alleles on other quantitative traits (pleiotropy), including intermediate molecular phenotypes and reproductive fitness.
(4) The manner in which alleles at multiple loci interact (additively or epistatically).
(5) The answers to the above questions in a range of ecologically relevant environments, and the magnitude of GEI.
(6) The identity, site class (i.e., regulatory or coding region), and allele frequencies of causal molecular variants.
This seems like an impossible task, given the nonlinearity of the genotype–phenotype map for quantitative traits. In the pregenome era, the general principles of genetic architecture were worked out by recognizing that although the individual loci cannot be distinguished, they collectively affect the trait variance. Further, relatives resemble each other for quantitative trait phenotypes, and the extent of the resemblance depends on how close the relationship is. These two principles can be combined to infer the relative contributions of genetic and environmental variance to the phenotypic variance of a quantitative trait by measuring trait phenotypes on collections of related individuals (Falconer and Mackay, 1996; Lynch and Walsh, 1998). Comparing different types of relatives also allows the different sources of genetic variation to be distinguished, as described in more detail below.
Partitioning Variance Components
We begin with the components of phenotypic variance ($\sigma^2_P$) for the trait, a quantity that can be readily estimated by measuring the trait on a sample of individuals from the population of interest. This variance can be broadly partitioned into components attributable to all sources of genetic ($\sigma^2_G$) and environmental ($\sigma^2_E$) variation. The genetic variance can be further partitioned into additive genetic variance ($\sigma^2_A$), dominance variance ($\sigma^2_D$), and epistatic interaction variance ($\sigma^2_I$) (Falconer and Mackay, 1996; Lynch and Walsh, 1998). These genetic variance components are themselves complicated: they are functions of the additive, dominance, and epistatic effects of all loci contributing to genetic variation in the trait, and of their allele frequencies. Genotype–environment correlation adds $2\mathrm{cov}_{GE}$ to the phenotypic variance (the correlation is parameterized by the covariance to retain the units of variance), and genotype–environment interaction adds $\sigma^2_{GE}$ (Falconer and Mackay, 1996; Lynch and Walsh, 1998), so that
$$\sigma^2_P = \sigma^2_G + \sigma^2_E + 2\mathrm{cov}_{GE} + \sigma^2_{GE}, \qquad \sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I.$$
In experimental quantitative genetic analyses, genotypes are randomized across standardized environmental conditions, so there should be no contribution of genotype–environment correlation or interaction to the phenotypic variance. This is not possible in natural settings, however. If $\mathrm{cov}_{GE}$ exists and we do not know about it, in practice it is treated as part of the genotype, and $\sigma^2_G$ is inflated by $2\mathrm{cov}_{GE}$ (Falconer and Mackay, 1996). If we do not know the environments experienced by the individuals, $\sigma^2_{GE}$ is absorbed into $\sigma^2_E$. $\sigma^2_E$ can itself be partitioned into components due to any shared common spatial ($\sigma^2_{Ec(S)}$), temporal ($\sigma^2_{Ec(T)}$), or maternal ($\sigma^2_{Ec(M)}$) environments, which increase the phenotypic resemblance of related individuals over and above that due to genetic relatedness, and the residual environmental variance ($\sigma^2_{EW}$) (Falconer and Mackay, 1996; Lynch and Walsh, 1998).
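How the genetic components depend on gene action and allele frequency can be seen for a single bi-allelic locus under random mating. The standard textbook results (e.g., Falconer and Mackay, 1996) give the population mean as $a(p-q) + 2pqd$, the additive variance as $2pq\alpha^2$ with average effect of an allele substitution $\alpha = a + d(q-p)$, and the dominance variance as $(2pqd)^2$, where p and q are the frequencies of A1 and A2. The minimal sketch below (hypothetical function names; one locus, Hardy–Weinberg proportions, no epistasis) checks that the two components sum to the genotypic variance computed directly from genotype frequencies, and shows that a completely dominant locus still generates additive variance.

```python
"""Sketch: single-locus mean and variance components as functions of allele frequency."""
import math


def single_locus_components(p, a, d):
    """Mean, additive variance and dominance variance for one locus.

    p is the frequency of A1; genotypic values are +a (A1A1), d (A1A2), -a (A2A2);
    random mating (Hardy-Weinberg proportions) is assumed.
    """
    q = 1.0 - p
    mean = a * (p - q) + 2 * p * q * d
    alpha = a + d * (q - p)            # average effect of an allele substitution
    var_a = 2 * p * q * alpha ** 2     # additive genetic variance
    var_d = (2 * p * q * d) ** 2       # dominance variance
    return mean, var_a, var_d


def genotypic_variance_direct(p, a, d):
    """Variance of genotypic values computed directly from genotype frequencies."""
    q = 1.0 - p
    freqs = (p * p, 2 * p * q, q * q)
    values = (a, d, -a)
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


if __name__ == "__main__":
    a, d = 1.0, 1.0  # complete dominance of A1
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        mean, var_a, var_d = single_locus_components(p, a, d)
        assert math.isclose(var_a + var_d, genotypic_variance_direct(p, a, d))
        print(f"p={p:.1f}  mean={mean:+.3f}  VA={var_a:.3f}  VD={var_d:.3f}")
```

Running the loop shows the additive and dominance variances shifting as p changes even though a and d are fixed, which is why variance components are properties of a population rather than of the loci alone.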
In experimental quantitative genetic studies, $\sigma^2_{Ec(S)}$ and $\sigma^2_{Ec(T)}$ can usually be avoided or estimated; $\sigma^2_{Ec(M)}$ can be more problematic, especially in mammals.
Relatives resemble each other phenotypically because they share alleles, and sometimes genotypes, by descent (genetic covariance) and because they share environments (common environmental variance). (Note that here we refer to the genetic covariance among individuals, not between traits as discussed in later chapters.) The probability that relatives share alleles because they have a common ancestor is the coefficient of relatedness, r; the probability that relatives share genotypes due to common ancestry is u. In a random mating population, r = ½ for parents and offspring and for full siblings (both parents in common), r = ¼ for half siblings (one parent in common), and r = 1 for monozygotic twins. Relatives in a random mating population can share genotypes due to common ancestry only if they are related through both of their parents; for example, u = 0 for parents and offspring and for half siblings, and u = ¼ for full siblings. The genetic covariance for any degree of relationship in a random mating population is (Falconer and Mackay, 1996; Lynch and Walsh, 1998):
$$\mathrm{cov}_G = r\sigma^2_A + u\sigma^2_D + r^2\sigma^2_{AA} + ru\,\sigma^2_{AD} + u^2\sigma^2_{DD} + r^3\sigma^2_{AAA} + r^2u\,\sigma^2_{AAD} + ru^2\sigma^2_{ADD} + u^3\sigma^2_{DDD} + \cdots$$
For most relatives, the largest coefficient is that of the additive genetic variance, and the coefficients of the epistatic variance terms are small (at most ¼). The exception is monozygotic twins, for which r = u = 1. Note, however, that all estimates of additive variance from data on relatives include fractions of the higher order additive by additive interaction variances, and that for n loci there are n additive effects but on the order of $n^2$ pairwise interactions, $n^3$ three-way interactions, and so on.
Inbreeding (mating of related individuals) redistributes genetic variance between and within inbred lines. If all genetic variance is additive, the variance within inbred lines is $(1-F)\sigma^2_A$ and the variance between inbred lines is $2F\sigma^2_A$, so that the total genetic variance is $(1+F)\sigma^2_A$; here F, the inbreeding coefficient, is the probability that the two alleles at a locus are identical by descent (i.e., copies of the same ancestral allele) (Falconer and Mackay, 1996; Lynch and Walsh, 1998). Thus, if lines are inbred to homozygosity, F = 1 and there is no genetic variance within each line, while the genetic variance between lines is double that in the outbred population from which they were derived. With dominance and epistasis, the redistribution of genetic variance depends on the allele frequencies of the underlying loci, so there is no general solution in terms of the variance components of the base population until F = 1. At that point, the genetic variance between inbred lines is $2\sigma^2_A + 4\sigma^2_{AA} + 8\sigma^2_{AAA} + \cdots$; i.e., pairwise and higher order additive by additive interactions contribute to genetic divergence between fully inbred lines (Falconer and Mackay, 1996; Lynch and Walsh, 1998).
We can readily estimate the regression of offspring phenotypic values on those of their parents, as well as the intraclass correlation coefficient within full- or half-sib families. These simple classic designs illustrate the complexities of estimating genetic variance components for quantitative traits. Regressions (b) and intraclass correlations (t) estimate, respectively, the phenotypic covariance between offspring and parents divided by the total phenotypic variance, and the phenotypic covariance among members of full- (or half-) sib families divided by the total phenotypic variance.
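The redistribution of purely additive variance under inbreeding can be checked by simulation. Under the additive model, the between-line variance is $2F\sigma^2_A$ and the within-line variance is $(1-F)\sigma^2_A$, summing to $(1+F)\sigma^2_A$. The minimal NumPy sketch below (all names and parameter values are hypothetical) founds each line from one outbred individual drawn from a Hardy–Weinberg base population of unlinked additive loci, carries it forward by single-seed descent under selfing, and compares the simulated variances with these expectations. Selfing is assumed only because its inbreeding coefficient has the simple form $F = 1 - (1/2)^t$ after t generations.

```python
"""Sketch: redistribution of additive variance under inbreeding by selfing."""
import numpy as np


def self_fertilize(counts, rng):
    """One generation of selfing; counts = number of A1 alleles (0, 1, 2) per locus."""
    prob = counts / 2.0  # chance that a gamete carries A1
    return rng.binomial(1, prob) + rng.binomial(1, prob)


def simulate_selfed_lines(n_lines=4000, n_loci=50, p=0.5, a=1.0,
                          generations=3, family_size=10, seed=1):
    """Return (between-line, within-line) genetic variance after `generations` of selfing."""
    rng = np.random.default_rng(seed)
    # Founders: one outbred individual per line, Hardy-Weinberg proportions (F = 0).
    parents = rng.binomial(2, p, size=(n_lines, n_loci))
    # Single-seed descent: one selfed individual carries each line forward.
    for _ in range(generations - 1):
        parents = self_fertilize(parents, rng)
    # Last generation: a family of selfed offspring per line, with F = 1 - 0.5**generations.
    family = self_fertilize(np.repeat(parents[:, None, :], family_size, axis=1), rng)
    values = a * (family - 1).sum(axis=2)        # additive genotypic values, shape (lines, family)
    # For purely additive loci the expected line mean equals the selfed parent's value.
    line_means = a * (parents - 1).sum(axis=1)
    between = line_means.var(ddof=1)
    within = values.var(axis=1, ddof=1).mean()
    return between, within


if __name__ == "__main__":
    p, a, n_loci, t = 0.5, 1.0, 50, 3
    var_a = n_loci * 2 * p * (1 - p) * a ** 2    # additive variance in the base population
    F = 1 - 0.5 ** t
    between, within = simulate_selfed_lines(n_loci=n_loci, p=p, a=a, generations=t)
    print(f"between: simulated {between:.2f}, expected 2F*VA = {2 * F * var_a:.2f}")
    print(f"within:  simulated {within:.2f}, expected (1-F)*VA = {(1 - F) * var_a:.2f}")
```

Increasing `generations` drives the within-line variance toward zero and the between-line variance toward $2\sigma^2_A$, the fully inbred limit described in the text; the sketch deliberately omits dominance and epistasis, for which no such simple expression exists before F = 1.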
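If we ignore three-way and higher order epistatic interactions, the expected genetic and environmental contributions to these regressions and intraclass correlations in an outbred population are as follows (Falconer and Mackay, 1996; Lynch and Walsh, 1998):
Regression of offspring on one parent:
$$b_{OP} = \frac{\tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_{AA} + \sigma^2_{Ec(OP)}}{\sigma^2_P}$$
Intraclass correlation of full sibs:
$$t_{FS} = \frac{\tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D + \tfrac{1}{4}\sigma^2_{AA} + \tfrac{1}{8}\sigma^2_{AD} + \tfrac{1}{16}\sigma^2_{DD} + \sigma^2_{Ec(FS)}}{\sigma^2_P}$$
Intraclass correlation of half sibs:
$$t_{HS} = \frac{\tfrac{1}{4}\sigma^2_A + \tfrac{1}{16}\sigma^2_{AA} + \sigma^2_{Ec(HS)}}{\sigma^2_P}$$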
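The genetic parts of these numerators follow directly from the general expression for $\mathrm{cov}_G$ in terms of r and u. The short sketch below (hypothetical helper names; exact arithmetic with Python's standard `fractions` module) generates the coefficient of each variance component up to pairwise interactions and reproduces the numerators for parent–offspring, full-sib, and half-sib pairs. Common environment terms are passed in as a single number rather than modeled.

```python
"""Sketch: coefficients of variance components in the genetic covariance of relatives."""
from fractions import Fraction

# (r, u) for common relatives in a random mating population, as given in the text.
RELATIVES = {
    "parent-offspring": (Fraction(1, 2), Fraction(0)),
    "full sibs": (Fraction(1, 2), Fraction(1, 4)),
    "half sibs": (Fraction(1, 4), Fraction(0)),
    "MZ twins": (Fraction(1), Fraction(1)),
}


def covariance_coefficients(r, u, max_order=2):
    """Map each component label (A, D, AA, AD, DD, ...) to its coefficient r^i * u^j."""
    coefficients = {}
    for order in range(1, max_order + 1):
        for n_dom in range(order + 1):
            n_add = order - n_dom
            label = "A" * n_add + "D" * n_dom
            coefficients[label] = r ** n_add * u ** n_dom
    return coefficients


def expected_ratio(r, u, components, var_p, common_env=0.0):
    """Expected b or t: (genetic covariance + common environment) / phenotypic variance.

    `components` maps labels such as "A" or "AA" to variance values; missing labels count as 0.
    """
    coefficients = covariance_coefficients(r, u)
    cov_g = sum(float(c) * components.get(label, 0.0) for label, c in coefficients.items())
    return (cov_g + common_env) / var_p


if __name__ == "__main__":
    for name, (r, u) in RELATIVES.items():
        terms = ", ".join(f"{c} s2_{label}" for label, c in covariance_coefficients(r, u).items() if c)
        print(f"{name}: {terms}")
```

Printing the full-sib line gives the coefficients 1/2, 1/4, 1/4, 1/8, and 1/16 for $\sigma^2_A$, $\sigma^2_D$, $\sigma^2_{AA}$, $\sigma^2_{AD}$, and $\sigma^2_{DD}$, matching the $t_{FS}$ numerator; raising `max_order` adds the three-way terms that the expressions above ignore.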
For each of these designs, we therefore have two estimates (the regression or intraclass correlation, and the total phenotypic variance) but several unknowns (the additive genetic variance, other sources of genetic variance, and any spatial, temporal, or maternal common environment). We can estimate the joint contribution of all terms in the numerators of these expressions but cannot partition them further. In principle, we could estimate some individual components by comparing genetic covariances for relatives of different degrees. This is difficult in practice, however, because estimates of variance components have high standard errors and require extremely large sample sizes. Some organisms can be inbred to homozygosity as well as outcrossed, which provides the opportunity to infer the relative contributions of additive genetic and interaction variance components and enables more sophisticated experimental designs; however, such designs are not applicable to all organisms. Finally, in organisms where common environment effects cannot be eliminated or accounted for, they will always bias estimates of genetic covariance upwards.
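Comparing relatives of different degrees amounts to solving a small linear system for the variance components. The sketch below assumes, hypothetically, that epistasis and common environment effects are negligible, so that each observed covariance equals $r\sigma^2_A + u\sigma^2_D$, and solves for $\sigma^2_A$ and $\sigma^2_D$ by least squares with NumPy's `numpy.linalg.lstsq`. The covariances in the example are noise-free values constructed for illustration; real estimates carry the large standard errors noted above, and any epistatic variance would bias both estimates.

```python
"""Sketch: estimating additive and dominance variance from several types of relatives."""
import numpy as np

# (r, u) per relationship; epistasis and common environment are assumed negligible.
DESIGN = {
    "parent-offspring": (0.5, 0.0),
    "half sibs": (0.25, 0.0),
    "full sibs": (0.5, 0.25),
}


def estimate_va_vd(covariances):
    """Least-squares estimates of (sigma2_A, sigma2_D) from {relationship: covariance}."""
    names = list(covariances)
    X = np.array([DESIGN[name] for name in names])       # rows: [r, u]
    y = np.array([covariances[name] for name in names])  # observed phenotypic covariances
    solution, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(solution[0]), float(solution[1])


if __name__ == "__main__":
    true_va, true_vd = 40.0, 10.0
    observed = {name: r * true_va + u * true_vd for name, (r, u) in DESIGN.items()}
    print(estimate_va_vd(observed))  # recovers approximately (40.0, 10.0)
```

At least one relationship with u > 0 (here, full sibs) is needed; with only parent–offspring and half-sib data the u column is all zeros and $\sigma^2_D$ is not identifiable.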
Heritability
Of all the genetic variance components, the additive genetic variance, $\sigma^2_A$, is the most important, for two reasons. First, it makes the largest contribution to the resemblance of most types of relatives. Second, it is defined so that it represents the fraction of the genetic variance that is transmitted from parents to offspring. An important concept in quantitative genetics is the narrow sense heritability ($h^2$), defined as the ratio of the additive genetic variance to the total phenotypic variance, $h^2 = \sigma^2_A/\sigma^2_P$ (Falconer and Mackay, 1996; Lynch and Walsh, 1998). Put simply, the narrow sense heritability of a quantitative trait is the reliability of an individual's own phenotype as an indicator of its 'breeding value': the proportion of the deviation of the parents' phenotype from the population mean that is expected to be transmitted to their progeny. If the average phenotype of two parents is x units above the population mean, then the average phenotypic value of their offspring is expected to be $xh^2$ units above the mean. This concept underpins the ability of a population to respond phenotypically to natural or artificial truncation selection: if parents are chosen based on the deviation of their average phenotypic value (X) for a quantitative trait from the population mean ($\mu$), then the average phenotype of their offspring will be $h^2(X - \mu)$ units above the population mean. The narrow sense heritability can be readily estimated (with differing degrees of bias and precision) as $2b_{OP}$, $2t_{FS}$, and $4t_{HS}$; from the response to truncation selection; and using more elaborate experimental designs (Falconer and Mackay, 1996; Lynch and Walsh, 1998). Heritability can be a slippery concept. The word suggests that it refers to whether or not a phenotype is inherited, which is not the case: all phenotypes are inherited in the sense that developmental genetic programs are responsible for their manifestation.
Heritability is a population concept and refers to the fraction of the variation in phenotypes among individuals that is due to additive genetic variance. Additive genetic variance, in turn, depends on allele frequencies at the contributing loci and can differ between populations if allele frequencies differ. Nor does additive genetic variance convey information about the gene action of contributing loci: additive variance is generated by loci with dominant/recessive gene action, as well as by epistatic loci, over a wide range of allele frequencies. Heritability can also differ between populations that experience different environments. Heritability therefore conveys no information about any difference in mean phenotype between populations (Falconer and Mackay, 1996; Lynch and Walsh, 1998).
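The link between $h^2$, the parent–offspring regressions, and the response to truncation selection can be illustrated by simulation. The sketch below assumes a purely additive trait under the infinitesimal model (normally distributed breeding values, Mendelian sampling variance $\sigma^2_A/2$ within families, no dominance, epistasis, or common environment); the function name and parameter values are hypothetical. It estimates $h^2$ from the slope of offspring on mid-parent value and as $2b_{OP}$, and compares the response to selection on mid-parent value with $h^2$ times the selection differential.

```python
"""Sketch: heritability from parent-offspring regression and response to truncation selection."""
import numpy as np


def simulate_heritability(var_a=0.4, var_e=0.6, n_pairs=200_000, top_fraction=0.2, seed=42):
    rng = np.random.default_rng(seed)
    sd_a, sd_e = np.sqrt(var_a), np.sqrt(var_e)
    # Breeding values and phenotypes of randomly paired sires and dams.
    sire_bv, dam_bv = rng.normal(0, sd_a, n_pairs), rng.normal(0, sd_a, n_pairs)
    sire_p = sire_bv + rng.normal(0, sd_e, n_pairs)
    dam_p = dam_bv + rng.normal(0, sd_e, n_pairs)
    # One offspring per pair: parental average plus Mendelian sampling (variance var_a / 2).
    off_bv = (sire_bv + dam_bv) / 2 + rng.normal(0, np.sqrt(var_a / 2), n_pairs)
    off_p = off_bv + rng.normal(0, sd_e, n_pairs)

    midparent = (sire_p + dam_p) / 2
    slope_mid = np.cov(midparent, off_p)[0, 1] / np.var(midparent, ddof=1)  # estimates h2
    b_op = np.cov(sire_p, off_p)[0, 1] / np.var(sire_p, ddof=1)           # estimates h2 / 2

    # Truncation selection on mid-parent value: keep the top fraction of pairs.
    cutoff = np.quantile(midparent, 1 - top_fraction)
    chosen = midparent >= cutoff
    selection_differential = midparent[chosen].mean() - midparent.mean()
    response = off_p[chosen].mean() - off_p.mean()
    return {
        "h2_true": var_a / (var_a + var_e),
        "h2_midparent": slope_mid,
        "h2_from_2b": 2 * b_op,
        "S": selection_differential,
        "R": response,
    }


if __name__ == "__main__":
    res = simulate_heritability()
    print(res)
    print("h2 * S =", res["h2_true"] * res["S"], "vs observed R =", res["R"])
```

Changing `var_e` while holding `var_a` fixed changes $h^2$ and the response without altering any locus, which illustrates the point above that heritability is a property of a population in a particular environment.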
Quantitative Trait Loci Mapping
Estimation of variance components, particularly the additive genetic variance, is critical for predicting short-term responses to natural or artificial selection. However, addressing the questions raised above regarding the genetic architecture of quantitative traits requires that we know the causal molecular polymorphisms, their gene action (additive, dominance, and epistatic effects), and their allele frequencies. This information is required to predict individual phenotypes from genotypes (as opposed to mean offspring phenotypes). For this purpose, both additive and non-additive genotypic effects are important, even if the latter contribute little to the additive variance of the trait. Mapping the quantitative trait loci (QTLs) affecting natural variation for quantitative traits is achieved using polymorphic marker loci with clear Mendelian segregation in linkage or association mapping populations (Falconer and Mackay, 1996; Lynch and Walsh, 1998). The principles of QTL mapping have been known since the early twentieth century; however, only recently have the discovery of abundant molecular markers, advances in rapid and cost-effective genotyping methods (including whole genome sequencing), and the development of sophisticated statistical and computational methods facilitated the molecular dissection of quantitative traits in a wide range of organisms. QTL mapping requires a population in which there is genetic variation for the trait of interest and in which individuals have been phenotyped for the trait and genotyped for a panel of molecular markers. In a classical linkage mapping population, correlations between the unknown variants affecting the quantitative trait and the molecular marker genotypes (called linkage disequilibrium, LD) are generated by producing segregating progeny from two parental lines that are genetically divergent for the trait.
In an association mapping population, LD between the causal variants affecting the trait and the molecular markers is a product of the population's history, including many generations of recombination. In both cases, QTLs are detected if there is a difference in trait phenotype between the genotypes at a marker locus (Falconer and Mackay, 1996; Lynch and Walsh, 1998). This simple test is repeated for all markers (or pairs of markers) in a genome scan, and evidence for association is then evaluated after accounting for the multiple tests performed. The choice of experimental design for QTL mapping is largely dictated by the biology of the organism being studied. Linkage mapping has the advantages that all alleles are at intermediate frequencies, which increases the power of mapping, and that LD is generated by the experimental design rather than by other causes that could give rise to spurious associations. However, linkage mapping interrogates only a small sample of the genetic diversity in the population (this can be partially ameliorated by constructing populations from four or eight parental lines), and the precision of mapping is limited by the number of recombination events that occur when creating the population. In contrast, association mapping can achieve greater mapping precision, given the many generations of recombination experienced by most outbred populations, and it samples a wider range of genetic diversity. The drawbacks of association mapping are that very large numbers of marker loci are needed to account for the larger number of recombination events; that the power to detect QTLs declines as allele frequencies decrease (and individual rare alleles cannot be interrogated at all); and that LD can be caused by population processes unrelated to genetic linkage (e.g., population structure that is not accounted for), which can cause spurious associations. Very large sample sizes are required in both linkage and association mapping studies to detect alleles with small to moderate effects with P-values low enough to pass multiple testing criteria (Falconer and Mackay, 1996; Lynch and Walsh, 1998).
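The single-marker test and multiple-testing correction described above can be sketched for the simplest linkage design, an F2 intercross between two inbred lines that differ at one additive QTL. This is an illustration under stated assumptions, not a production mapping method: markers are evenly spaced on one chromosome with a fixed recombination fraction c between neighbors and no crossover interference, the QTL coincides with one marker, each marker is tested by regressing phenotype on the number of alleles from the first parental line, P-values use a normal approximation to the t statistic, and a Bonferroni threshold accounts for the number of markers tested. All function names and parameter values are hypothetical.

```python
"""Sketch: single-marker QTL scan in a simulated F2 intercross with a Bonferroni threshold."""
import math

import numpy as np


def simulate_gametes(n, n_markers, c, rng):
    """Parental origin (1 = line P1, 0 = line P2) along a chromosome in gametes of F1 parents."""
    start = rng.integers(0, 2, size=(n, 1))
    crossovers = (rng.random((n, n_markers - 1)) < c).astype(int)
    switches = np.concatenate([np.zeros((n, 1), dtype=int), np.cumsum(crossovers, axis=1)], axis=1)
    return (start + switches) % 2


def single_marker_scan(genotypes, phenotype):
    """Two-sided P-value for each marker from the regression of phenotype on genotype."""
    n = len(phenotype)
    pvalues = []
    for j in range(genotypes.shape[1]):
        r = np.corrcoef(genotypes[:, j], phenotype)[0, 1]
        t = r * math.sqrt((n - 2) / (1 - r ** 2))
        pvalues.append(math.erfc(abs(t) / math.sqrt(2)))  # normal approximation, large n
    return np.array(pvalues)


def run_f2_example(n=2000, n_markers=21, c=0.1, qtl=10, a=0.5, sigma=1.0, alpha=0.05, seed=7):
    rng = np.random.default_rng(seed)
    # F2 genotype = number of P1-derived alleles (0, 1, 2) from two independent F1 gametes.
    genotypes = simulate_gametes(n, n_markers, c, rng) + simulate_gametes(n, n_markers, c, rng)
    phenotype = a * (genotypes[:, qtl] - 1) + rng.normal(0, sigma, n)
    pvalues = single_marker_scan(genotypes, phenotype)
    threshold = alpha / n_markers  # Bonferroni correction for the number of markers tested
    return pvalues, threshold


if __name__ == "__main__":
    pvalues, threshold = run_f2_example()
    peak = int(np.argmin(pvalues))
    print(f"peak marker {peak}, P = {pvalues[peak]:.2e}, Bonferroni threshold {threshold:.1e}")
    print("markers passing threshold:", np.flatnonzero(pvalues < threshold).tolist())
```

Because markers near the QTL are in strong LD with it, several neighboring markers typically exceed the threshold; the peak, rather than the set of significant markers, localizes the QTL, and the width of that set reflects the limited number of recombination events in the F2, as noted above.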
Lessons Learned
Some general conclusions regarding the genetic architecture of quantitative traits have emerged from high-resolution QTL mapping studies in humans and especially in model organisms such as yeast, Drosophila, Arabidopsis, and mice (Mackay, 2001; Flint and Mackay, 2009; Mackay et al., 2009; Mackay, 2014).
1. Most quantitative traits are indeed highly polygenic, with hundreds or more contributing loci with additive effects. The corollary is that individual additive effects are typically small, and together the detected loci account for only a small fraction of the total genetic variation in association mapping populations, especially in humans. This could be because the total additive genetic variance is overestimated due to the contribution of dominance and epistatic variance, and/or because the effects of causal variants are underestimated when they are in incomplete LD with the marker loci.
2. The genetic basis of variation for quantitative traits inferred from mapping natural variants is typically novel and distinct from that inferred from studies of mutations with large effects. Possibly, segregating variation is not maintained in natural populations at loci that are required for wild-type expression of the trait; conversely, mutagenesis screens may miss (or ignore) loci at which mutations with subtle effects could affect quantitative traits.
3. Variants affecting quantitative traits have highly context-dependent effects; i.e., their effects differ in magnitude and/or direction in different genetic backgrounds (epistasis), in different environments (GEI), and between males and females (genetic variation in sexual dimorphism). Thus, the additive effects of alleles with highly context-dependent effects may not be detectable when averaged over multiple environments and/or genetic backgrounds. Context-dependent effects are nonetheless important: epistatic interactions reveal genetic networks affecting quantitative traits, and genotype by environment interaction and genetic variation in sexual dimorphism are potential mechanisms maintaining genetic variation for quantitative traits in natural populations.
4. Molecular variants, not genes, are the relevant unit of observation. Different variants in the same gene can have different effects on the same trait, and different variants in the same gene can independently affect different traits. Therefore, genes, but not variants, may be highly pleiotropic.
5. Most variants affecting quantitative traits do not cause amino acid changes in protein coding genes; rather, they are synonymous polymorphisms in coding regions (possibly associated with mRNA stability) and putative regulatory polymorphisms in promoters and introns that could affect transcription factor binding and mRNA splicing, and thereby the amount, timing, or tissue-specific pattern of expression. Joint mapping of variants that affect both organismal phenotypes and intermediate molecular phenotypes, such as networks of correlated gene expression traits, metabolites, and proteins, will yield novel insights into the biological mechanisms by which multiple polygenic perturbations affect quantitative traits.
Future Prospects
The genomics revolution and advances in computational power now enable us to perform genotype–phenotype mapping studies on the scale, and with the density of molecular markers, required to identify simultaneously many genes affecting variation for quantitative traits. In the foreseeable future, this will extend to complete genome sequences of all individuals in large mapping populations, as well as measurements of gene expression and other molecular endophenotypes, giving unprecedented insights into the biological underpinnings of complex traits and the pleiotropic connections among traits. As the new technologies mature and become cost-effective, all organisms will become model organisms, enabling us to understand the genetic basis of wide-ranging ecological specializations and adaptations. Finally, we now have the tools to address long-standing unanswered questions in evolutionary quantitative genetics, including (but not limited to) the mechanisms maintaining quantitative genetic variation, the polygenic mutation rate, the molecular basis of GEI, the causes of limits to long-term selection, and phenotypic stability in the face of genetic and environmental variation. The future of evolutionary quantitative genetics is bright.
References
Falconer, D.S., Mackay, T.F.C., 1996. Introduction to Quantitative Genetics, fourth ed. Harlow, Essex: Addison Wesley Longman.
Fisher, R.A., 1918. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh 52, 399–433.
Flint, J., Mackay, T.F.C., 2009. Genetic architecture of quantitative traits in mice, flies and humans. Genome Research 19, 723–733.
Lynch, M., Walsh, B., 1998. Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates, Inc.
Mackay, T.F.C., 2001. The genetic architecture of quantitative traits. Annual Review of Genetics 35, 303–339.
Mackay, T.F.C., 2014. Epistasis and quantitative traits: Using model organisms to study gene–gene interactions. Nature Reviews Genetics 15, 22–33.
Mackay, T.F.C., Stone, E.A., Ayroles, J.F., 2009. The genetics of quantitative traits: Challenges and prospects. Nature Reviews Genetics 10, 565–577.
See also: Genetic Architecture. Multivariate Quantitative Genetics. Quantitative Genetic Variation, Comparing Patterns of. Quantitative Genetics in Natural Populations