Gene Flow

selected; as the family becomes too small, the longer products will be selected. This cyclic process will cause a continuous oscillation around a mean in size. However, each contraction will result in the loss of divergent genes, whereas each expansion will result in the indirect `replacement' of these lost genes with identical copies of other genes in the family. With unequal crossovers occurring at random positions throughout the cluster and with selection acting in favor of the least divergence among family members, this process can dramatically slow the continuous process of genetic drift between family members. The second process responsible for concerted evolution is intergenic gene conversion between `nonallelic' family members. It is easy to see that different tandem elements of nearly identical sequence can take part in the formation of Holliday intermediates, which can resolve into either unequal crossing-over products or gene conversion between nonallelic sequences. Although the direction of information transfer from one gene copy to the next will be random in each case, selection will act upon this molecular process to ensure an increase in homogeneity among different gene family members. As discussed above, information transfer (presumably by means of gene conversion) can also occur across gene clusters that belong to the same family but are distributed to different chromosomes. Thus, with unequal crossing-over and intergenic gene conversion (which are actually two alternative outcomes of the same initial process), along with selection for homogeneity, all of the members of a gene family can be maintained with nearly the same DNA sequence. Nevertheless, concerted evolution will still lead to increasing divergence between whole gene families present in different species.

See also: Concerted Evolution; Gene Conversion; Globin Genes, Human; Immunoglobulin Gene Superfamily; Molecular Clock; Unequal Crossing Over

Gene Flow

J B Mitton

Copyright © 2001 Academic Press

doi: 10.1006/rwgn.2001.0508

Gene flow is defined as the movement of genes among populations. The rate of gene flow, m, is the proportion of the gene copies in a population that have been carried into that population by immigrants. Gene flow can be mediated by the dispersal of either gametes or individuals. But gene flow is not equivalent to dispersal, for gametes or individuals that move among populations but fail to contribute genes to the recipient gene pool have not mediated gene flow. Population structure, the pattern of genetic variation among populations, is produced by the joint action of gene flow, genetic drift, and natural selection. Genetic drift is the change in allelic frequencies produced by accidents of sampling and by chance variation in survival, mating success, and family size. Natural selection is defined as the differential reproduction of genotypes.

Elaboration

Genetic Drift Differentiates Populations

If populations are not connected by gene flow, stochastic changes will cause them to diverge over time. Imagine a large, genetically diverse, randomly mating population that is suddenly broken apart into two perfectly isolated populations, each considerably smaller than the initial population. Initially, these populations might share the same alleles at similar frequencies. The Hardy-Weinberg law demonstrates that, in the absence of selection, mutation, and migration, allelic frequencies will not change in an infinitely large population with random mating. But in finite populations, allelic frequencies drift over time because of stochastic variation in survival and reproduction. The stochastic loss of alleles may differ between populations, increasing the genetic distance between them. In addition, mutations may introduce new alleles into populations, further distinguishing them. With sufficient time, perfectly isolated populations will become completely differentiated, so that they do not share any alleles.

The rate of genetic drift in a population depends on the number of breeding adults in the population. Consider a gene segregating two alleles, A and a, at frequencies p and q, respectively, so that:

$$ p + q = 1.0 $$

The standard error (SE) of the allelic frequency is a measure of the magnitude of drift of the frequency of an allele in a single generation. The standard error of the frequency of an allele is:

$$ \mathrm{SE} = \sqrt{\frac{pq}{2N}} $$

where N is the number of breeding adults (the denominator is 2N because each diploid adult carries two copies of the gene). Most of the time (95%), the change in frequency will be less than two standard errors. For example, in a population with 1000 breeding adults, and p and q equal to 0.5, the standard error of allelic frequency is 0.01. Thus, in the next generation, 95% of the time, p will be greater than 0.48 but less than 0.52. However, in a population with the same frequencies but only 10 breeding adults, the standard error is 0.11, so the likely range of p in the next generation will be from 0.28 to 0.72. Thus, the rate of genetic drift increases with diminishing population size.
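The drift calculation above can be checked numerically. The following Python sketch computes SE = √(pq/2N) and the approximate 95% range p ± 2 SE, and also simulates one generation of drift by drawing 2N gene copies at random. It assumes a diploid, randomly mating population with no selection, mutation, or migration; the function names (`drift_se`, `approx_95_range`, `next_generation`) are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def drift_se(p: float, n_breeders: int) -> float:
    """Standard error of an allele frequency after one generation of drift.

    Assumes N diploid breeding adults, so 2N gene copies are sampled.
    """
    q = 1.0 - p
    return math.sqrt(p * q / (2 * n_breeders))


def approx_95_range(p: float, n_breeders: int) -> tuple[float, float]:
    """Approximate 95% range for next generation's p: p +/- 2 SE, clipped to [0, 1]."""
    se = drift_se(p, n_breeders)
    return max(0.0, p - 2 * se), min(1.0, p + 2 * se)


def next_generation(p: float, n_breeders: int, rng: random.Random) -> float:
    """Draw 2N gene copies at random from a gene pool with frequency p of allele A."""
    copies = 2 * n_breeders
    count_a = sum(1 for _ in range(copies) if rng.random() < p)
    return count_a / copies


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulation is repeatable
    for n in (1000, 10):
        lo, hi = approx_95_range(0.5, n)
        sims = [next_generation(0.5, n, rng) for _ in range(1000)]
        print(
            f"N={n}: SE={drift_se(0.5, n):.3f}, "
            f"range {lo:.2f} to {hi:.2f}, simulated SD={statistics.pstdev(sims):.3f}"
        )
```

For N = 10 the printed range is 0.28 to 0.72, and for N = 1000 it is 0.48 to 0.52, matching the hand calculation; the simulated standard deviation should land close to the analytical SE in both cases.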

Gene Flow Tends to Homogenize Populations

Gene flow among populations makes them more similar. This point can be made intuitively by considering an exercise with two glasses of wine, one red and one white. Imagine pouring a small amount from the glass of white wine into the glass of red, then swirling it. The red wine will still be red, but careful inspection would reveal that the intensity of its color has diminished. Now pour some of the red wine into the white. A few drops of red wine bring a tinge of red to the white wine. Now imagine repeating the exchanges many times. Ultimately, the colors in the two glasses will be identical. Similarly, some gene flow between populations will make them more similar, and high gene flow will make them indistinguishable.

The impact of gene flow on population structure can be illustrated quantitatively by modeling genetic variation at a single gene. Consider gene flow into a population from populations that have different allelic frequencies. If the proportion of migrants into a population is m, and the frequency of A in the migrants is p̄, then p′, the new frequency of A in the population, will be:

$$ p' = m\bar{p} + (1 - m)p $$

If gene flow were unopposed by other forces, the populations connected by gene flow would ultimately share the same alleles, at the same frequencies.
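The migration equation can be iterated to show the homogenizing effect numerically. Rearranging it gives p′ − p̄ = (1 − m)(p − p̄), so after t generations the gap between the population and the migrants is (1 − m)^t times its starting value. The Python sketch below assumes, for simplicity, that the migrants come from a source whose frequency p̄ stays constant (a continent-island style assumption that the text does not state) and ignores drift and selection; the function names are hypothetical.

```python
def migrate(p: float, p_migrants: float, m: float) -> float:
    """One generation of immigration: p' = m * p_bar + (1 - m) * p."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be a proportion between 0 and 1")
    return m * p_migrants + (1.0 - m) * p


def trajectory(p0: float, p_migrants: float, m: float, generations: int) -> list[float]:
    """Allele frequency in the recipient population over successive generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(migrate(freqs[-1], p_migrants, m))
    return freqs


def closed_form(p0: float, p_migrants: float, m: float, t: int) -> float:
    """Same result without iteration: p_t = p_bar + (1 - m)**t * (p0 - p_bar)."""
    return p_migrants + (1.0 - m) ** t * (p0 - p_migrants)


if __name__ == "__main__":
    # Hypothetical example: a population fixed for A (p = 1.0) receiving 5%
    # migrants per generation from a source that lacks A (p_bar = 0.0).
    freqs = trajectory(1.0, 0.0, 0.05, 100)
    for t in (0, 10, 50, 100):
        print(f"generation {t:3d}: p = {freqs[t]:.3f}")
```

Because (1 − m)^t shrinks toward zero for any m > 0, the recipient population converges on p̄ when nothing opposes gene flow.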

Natural Selection Can Overcome Gene Flow

Natural selection can oppose the homogenizing effect of gene flow, sustaining genetic differences among populations linked by gene flow. For example, the blue mussel, Mytilus edulis (Figure 1), exhibits an abrupt genetic boundary despite high gene flow. Blue mussels are native to the North Atlantic and are common in the rocky intertidal zone. They are dioecious, i.e., an individual is either male or female, and they release their gametes into the water. The gametes unite to form veliger larvae, which are carried by currents for at least 3 weeks. Studies of coastal currents suggest that larvae could be carried more than 100 km, and an estimate of gene flow from genetic data (see below) indicates that blue mussels exchange many individuals among populations each generation. Despite high levels of gene flow, the mussels in Long Island Sound remain distinctly differentiated from other populations. Long Island Sound receives water from several major rivers (Housatonic, Quinnipiac, Connecticut, Thames), which dilute the salinity of the Sound to about half that of the open ocean. Thus, the Sound is a distinct environment for mussels, which must make physiological adjustments to maintain the osmotic pressure of their cells. Variation at the gene coding for leucine aminopeptidase (Lap) plays an important role in the maintenance of cell pressure; some genotypes are most efficient at high salinity, while other genotypes are most efficient at low salinity. Each spring, millions of larvae are carried into Long Island Sound by currents sweeping west along the coast of Rhode Island and Connecticut. But each fall, mortality among the young mussels creates a sharp genetic cline in Lap frequencies near Guilford, Connecticut, where salinity changes abruptly. Although the veliger larvae are capable of dispersing more than 100 km, the genetic cline is only 20 km wide. In addition, studies of both ribbed mussels, Geukensia demissa, and acorn barnacles, Semibalanus balanoides, have reported significant differentiation between samples taken from the upper and lower portions of the intertidal zone, only one or two meters apart. These species, like the blue mussel, have pelagic larvae, and consequently gene flow would be expected to homogenize the frequencies of neutral or unselected genes within the intertidal zone. In each case, differentiation was produced by selection that differs among habitats in a heterogeneous environment.

Some Generalizations concerning Gene Flow

Dispersal

Although gene flow is not synonymous with dispersal, it is certainly true that long-distance dispersal provides the opportunity for long-distance gene flow, and hence for high levels of gene flow among populations. The larvae of some marine mollusks have been documented to be carried by equatorial currents from the coast of Africa to the Caribbean Sea, and we would expect those species to have high levels of gene flow between populations in Africa and the Caribbean. On the other hand, some marine mollusks brood their young or attach egg cases to the substrate, severely limiting the opportunity for dispersal and restricting gene flow. Species that are philopatric with respect to breeding sites, such as salamanders and some species of birds, are characterized by very low gene flow.

Figure 1 (See Plate 17) The blue mussel, Mytilus edulis.

Mating System

The mating system can have a profound impact on gene flow. For example, the mating systems of plants can be characterized as predominantly selfing, predominantly outcrossing, or mixed, employing an intermediate balance of selfing and outcrossing. Many species of plants, such as wheat, barley, oaks, and pines, are monoecious, meaning that an individual produces both male and female gametes. Wheat and barley produce their seeds predominantly (>99%) by selfing. This mating system is characterized by very low gene flow, for there is no gene flow in the fertilization of selfed seed, and the seeds typically disperse less than 2 m. Gene flow is much higher in oaks and pines, which are typically outcrossed and wind-pollinated. Outcrossed seeds have separate maternal and paternal parents, and wind pollination makes it possible for the parents to be distant from one another. How far can oak or pine pollen travel? Pollen traps on ships 150 km from shore have captured pine pollen, confirming long-distance dispersal and providing the opportunity for long-distance gene flow.

Behavior can have a major impact on gene flow. Plants with animal pollinators will have gene flow determined by the behavior of their pollinators. Plants pollinated by bees that visit many flowers on a plant before visiting an adjacent plant will have low gene flow. Gene flow mediated by various species of hummingbirds can be low or high, depending on whether the birds defend small territories or are `trapliners,' flying substantial distances between sequential pollinations. Pods of killer whales around the San Juan Islands, Washington State, have distinct feeding behaviors that constrain their social systems and limit gene flow among pods. Some of the pods prey predominantly on marine mammals, such as seals and sea lions, while other pods prey almost exclusively on salmon. Long-term studies of the behavior of the pods revealed that the pods defend their territories and are stealthy when they trespass into the territories defended by neighboring pods. Studies of mitochondrial DNA identified diagnostic differences between mammal-eating and fish-eating pods, and suggest that gene flow between these pods has not occurred for 2000 years.

Direct Measurement of Gene Flow

The most direct measure of gene flow is to permanently tag an individual at or near its natal site, and then record where it breeds. For example, bird bands, which are rings placed on a bird's leg, have been used to study gene flow in many species of birds. Tags have been attached to the fins of fish, and tiny bar-code labels have been glued onto insects. Radio beacons fashioned into collars have revealed the movements of wolves and lynx. Fluorescent dyes prepared as a fine dust have been used to mark birds and small mammals for short periods of time. Radio transmitters have been placed in the stomachs of snakes and beneath the skin of sharks. Mammals have had their coats numbered with bleach or paint. These marking techniques have the advantage of providing clear evidence of dispersal and, if the animal breeds at its destination, evidence of gene flow. Their disadvantage is that they are often labor-intensive, and some of the tags, such as radio transmitters, are both expensive and short-lived. Moreover, these techniques cannot be used in all species, and they provide just a single estimate of gene flow. Because animal behavior is flexible and can vary among years and generations, tagging studies may not reflect the average gene flow. Finally, population structure may be determined predominantly by historical events rather than by the current rate of gene flow.

Inference from Genetic Data

Fst Measures the Differentiation of Populations

Fst is a quantitative estimate of the degree of differentiation of populations. Consider a gene segregating two alleles, A and a, at frequencies p and q, respectively. Fst, a standardized variance of allelic frequencies, is defined as:

$$ F_{ST} = \frac{S_p^2}{\bar{p}\,\bar{q}} $$

where the numerator is the variance of p among populations, and the denominator is the product of the means of the allelic frequencies. The variance of allelic frequencies among populations is calculated as:

$$ S_p^2 = \frac{1}{d}\sum_{i=1}^{d} \left(p_i - \bar{p}\right)^2 $$

where d is the number of populations, p_i are the frequencies of the A allele in the populations, and p̄ is the mean of the frequencies. Fst is zero if all populations have the same alleles at the same frequencies, and 1.0 for two populations fixed for different alleles. Fst will increase over time between isolated populations, and because genetic drift increases with decreasing population size, the rate of divergence increases with decreasing population size. The degree of differentiation among populations will come to an equilibrium that reflects a balance between genetic drift and gene flow. The relationship between differentiation and gene flow is:

$$ F_{ST} = \frac{1}{4Nm + 1} $$

or, equivalently,

$$ Nm = \frac{1}{4}\left(\frac{1}{F_{ST}} - 1\right) $$

where N is the number of breeding individuals in a population and m is the rate of gene flow. Thus, if populations are completely isolated for a long time, Fst will approach 1.0, but if just one member of a population is a new immigrant each generation (i.e., Nm = 1), then the rate of gene flow is:

$$ m = \frac{1}{N} $$

and the equilibrium value of Fst will be 0.20. Higher rates of gene flow will make the populations even more similar. For example, if the number of immigrants is 5 per generation, then Fst will be less than 0.05, and the populations will be, for all practical purposes, very similar. An important threshold is placed at the rate of gene flow of Nm = 1.0. Effectively, when Nm < 1, gene flow is not sufficient to offset the effects of genetic drift. So populations connected by Nm < 1 will diverge in time, while for populations connected by Nm > 1, gene flow will prevent differentiation by genetic drift.
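The two calculations in this section, Fst from a set of population allele frequencies and the equilibrium link between Fst and Nm, can be sketched in Python as follows. The sketch implements the simple descriptive definition given in the text (populations weighted equally, variance divided by d); it is not a sample-size-corrected estimator such as Weir and Cockerham's θ. The function names and example frequencies are hypothetical.

```python
from statistics import fmean


def fst(freqs: list[float]) -> float:
    """Fst = S_p^2 / (p_bar * q_bar) for allele A across d populations.

    S_p^2 = (1/d) * sum((p_i - p_bar)**2), the variance among populations.
    """
    if not freqs:
        raise ValueError("need at least one population")
    d = len(freqs)
    p_bar = fmean(freqs)
    q_bar = 1.0 - p_bar
    if p_bar * q_bar == 0.0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    s2 = sum((p - p_bar) ** 2 for p in freqs) / d
    return s2 / (p_bar * q_bar)


def equilibrium_fst(nm: float) -> float:
    """Drift-gene flow equilibrium: Fst = 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(f: float) -> float:
    """Inverse relationship: Nm = (1/Fst - 1) / 4."""
    if not 0.0 < f <= 1.0:
        raise ValueError("Fst must lie in (0, 1]")
    return (1.0 / f - 1.0) / 4.0


if __name__ == "__main__":
    # Two populations fixed for different alleles, then identical populations.
    print(f"fixed for different alleles: Fst = {fst([1.0, 0.0]):.3f}")
    print(f"identical populations:       Fst = {fst([0.25, 0.25, 0.25]):.3f}")
    for nm in (0.25, 1.0, 5.0):
        print(f"Nm = {nm}: equilibrium Fst = {equilibrium_fst(nm):.3f}")
```

The loop shows the threshold discussed above: Nm = 1 gives an equilibrium Fst of 0.20, and Nm = 5 pushes it below 0.05.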

Inference of Gene Flow in Limber Pine

The organellar genomes of pines are ideal for measuring gene flow, because mitochondrial DNA (mtDNA) is maternally inherited and chloroplast DNA (cpDNA) is paternally inherited in pines. These different modes of inheritance allow us to explicitly identify gene flow mediated by pollen and by seeds. In addition, pollen and seeds have disparate potentials for dispersal. Wind-borne pollen has the potential to travel great distances, whereas the seeds of pines usually fall within a circle whose radius equals the height of the tree. Limber pine, Pinus flexilis (Figure 2), is native to western North America, where it is primarily restricted to windy ridges and scree slopes from the Sierra Madre of Mexico to the Canadian Rockies, and from Mt Pinos in southern California to the Black Hills of South Dakota. The seeds of limber pine are dispersed and planted by Clark's nutcracker, Nucifraga columbiana. The bird and pine are engaged in a mutualism sculpted by evolution. Limber pine relies on the bird to harvest, disperse, and plant its seeds. Clark's nutcracker relies on limber pine seeds to get through the winter. Both the bird and the pine have evolved morphological traits (a sublingual pouch, wingless seeds) to better serve and exploit their partner. The birds usually cache seeds on windy or south-facing slopes that will be free of snow in winter, and this explains the curious distribution of limber pine. A bird can carry approximately 30 limber pine seeds in its sublingual pouch. When its pouch is full, the bird flies to a suitable site to cache the seeds. The flight distances are highly variable; although the record flight exceeds 20 km, most flights are very short, from a few meters to a few hundred meters. The potentials for dispersal of pollen and seed lead biologists to expect high gene flow for genes dispersed by pollen (nuclear genes, cpDNA) and low gene flow for genes dispersed solely by seed. This hypothesis was tested with a study of gene flow among populations of limber pine in the Front Range of Colorado. The populations were distributed from tree line at the Continental Divide to an isolated stand of trees 100 miles to the east, on an escarpment on the Great Plains. Haplotype frequencies were used to calculate Fst for both cpDNA and mtDNA, and gene flow was inferred from Fst with the equation Nm = (1/Fst − 1)/4. The Fst values were 0.02 and 0.68 for cpDNA and mtDNA, respectively, suggesting that the numbers of migrants among populations per generation are 12.25 for pollen and 0.12 for seeds. The gene flow of cpDNA is high, and should tend to homogenize the frequencies of cpDNA haplotypes and nuclear genes among populations within distances of approximately 100 miles. In contrast, the gene flow of mtDNA is below the threshold of Nm = 1, below which the influence of genetic drift predominates. So mtDNA is expected to vary more among populations than nuclear genes and cpDNA, and genetic drift will cause populations to diverge with respect to mtDNA haplotypes.

Figure 2 (See Plate 18) The limber pine, Pinus flexilis.
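The limber pine arithmetic can be reproduced directly from the inverted equilibrium equation, Nm = (1/Fst − 1)/4. The short Python sketch below re-applies that equation to the two reported Fst values and labels each result against the Nm = 1 threshold. The helper name is hypothetical, and the sketch inherits whatever assumptions the equilibrium formula itself makes, including its application to organelle haplotypes.

```python
def nm_from_fst(fst: float) -> float:
    """Number of migrants per generation implied by Fst = 1 / (4Nm + 1)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("Fst must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


# Fst values reported for the Front Range limber pine populations.
reported_fst = {"cpDNA (pollen)": 0.02, "mtDNA (seed)": 0.68}

estimates = {marker: nm_from_fst(f) for marker, f in reported_fst.items()}

for marker, nm in estimates.items():
    regime = "gene flow exceeds drift (Nm > 1)" if nm > 1 else "drift predominates (Nm < 1)"
    print(f"{marker}: Nm = {nm:.2f}, {regime}")
```

It prints Nm = 12.25 for cpDNA and Nm = 0.12 for mtDNA, the values quoted above, placing the two markers on opposite sides of the threshold.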

Private Alleles Estimate Gene Flow

Private alleles, or alleles found in only a single population, can also be used to infer rates of gene flow among populations. The private alleles can come from mtDNA, cpDNA, nuclear DNA, or allozyme markers, and they are usually taken from surveys of geographical variation within a species. For example, a survey of allozyme variation throughout the range might reveal several or many private alleles. The average frequency of the private alleles, p̄(1), is located on a regression line relating ln p̄(1) (ordinate) to ln(Nm) (abscissa), and the corresponding value of Nm is read from the line. The regression line was estimated from a computer simulation study examining the relationship between genetic drift and gene flow in determining the geographical distribution of new mutations. Consider a species that has very low gene flow among populations. When mutation produces a novel allele in a single population, the allele could drift to moderate or even high frequencies before an individual bearing it migrated to another population and reproduced. However, if gene flow in the species were very high, then the new mutation would likely still be at a low frequency when it was successfully introduced into another population. Thus, low gene flow allows private alleles to drift to higher frequencies, while high gene flow holds private alleles to low frequencies.
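The private-allele method has two steps: find the alleles that occur in exactly one sampled population and average their frequencies to obtain p̄(1), then read Nm off the regression ln p̄(1) = a ln(Nm) + b. The Python sketch below assumes a simple data layout (a mapping from each population to its allele frequencies, with allele labels unique across loci) and takes the slope a and intercept b as inputs, because the fitted values come from Slatkin's simulations and depend on sample size. The coefficients and survey in the example are hypothetical placeholders, not Slatkin's values, and all names are illustrative.

```python
import math
from collections import Counter
from statistics import fmean


def mean_private_allele_frequency(samples: dict[str, dict[str, float]]) -> float:
    """Average frequency of alleles found in exactly one population.

    samples maps population name -> {allele label: frequency}; allele labels
    are assumed to be unique across loci (e.g. "Lap-94", "Pgm-100").
    """
    occurrences = Counter(
        allele for freqs in samples.values() for allele, f in freqs.items() if f > 0
    )
    private = [
        f
        for freqs in samples.values()
        for allele, f in freqs.items()
        if f > 0 and occurrences[allele] == 1
    ]
    if not private:
        raise ValueError("no private alleles in this survey")
    return fmean(private)


def nm_from_private_alleles(p1_bar: float, slope: float, intercept: float) -> float:
    """Invert ln(p1_bar) = slope * ln(Nm) + intercept to obtain Nm."""
    if not 0.0 < p1_bar < 1.0:
        raise ValueError("mean private-allele frequency must lie in (0, 1)")
    if slope >= 0.0:
        raise ValueError("expected a negative slope: more gene flow, rarer private alleles")
    return math.exp((math.log(p1_bar) - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical survey of three populations; a3, a4, and a5 are private.
    survey = {
        "pop1": {"a1": 0.5, "a2": 0.4, "a3": 0.1},
        "pop2": {"a1": 0.6, "a2": 0.3, "a4": 0.1},
        "pop3": {"a1": 0.7, "a2": 0.1, "a5": 0.2},
    }
    p1 = mean_private_allele_frequency(survey)
    slope, intercept = -0.5, -2.5  # placeholders, NOT fitted regression values
    print(f"mean private-allele frequency = {p1:.3f}")
    print(f"implied Nm = {nm_from_private_alleles(p1, slope, intercept):.2f}")
```

With any negative slope, a lower p̄(1) maps to a higher Nm, which is the qualitative relationship the simulation argument predicts.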

Estimates of gene flow from private alleles are usually, but not always, consistent with estimates from Fst. In a compilation of estimates of gene flow from private alleles, Slatkin (Table 1) found the very highest rate of gene flow in the blue mussel, M. edulis (Nm = 42) (Figure 1). This estimate is probably realistic, for the mussels have pelagic larvae that ride ocean currents for weeks. At the other end of the scale were four species of salamanders, all with values of Nm considerably below 1.0. Once again, this estimate of gene flow seems reasonable given our knowledge of salamanders. Salamanders forage only short distances, and they usually breed in their natal ponds. Consequently, movement of individuals among populations is rare.

Caveats concerning the Relationship of Fst to Nm

The relationships between Nm and Fst and between Nm and the frequency of private alleles are both dependent on assumptions that may be frequently violated in the data collected in range-wide surveys of genetic variation.

Assumption of `Evolutionary Equilibrium'

The inference of rates of gene flow from either Fst or private alleles depends on the assumption that there has been sufficient time for population structure to come to an evolutionary equilibrium determined by the joint action of gene flow and genetic drift. Conformity to this assumption is rarely considered, but some biologists believe that very few species have reached equilibrium. For example, limber pines were displaced from high elevations by the glaciers that reached their most recent glacial maximum 18 000 years ago. Once the glaciers subsided, limber pines were able to colonize numerous sites above 10 000 feet in the Rocky Mountains, where they commonly attain ages in excess of 1000 years. The populations with ancient trees are certainly not at an evolutionary equilibrium between drift and gene flow, for very few generations have passed since they recolonized high elevations. Similar scenarios apply to the plants and animals that moved northward in North America and Europe since the last glacial maximum.

Table 1 Estimates of the number of migrants moving among populations (Nm) from the average frequency of private alleles (p̄(1))

| Common name | Formal name | p̄(1) | Nm |
|---|---|---|---|
| Blue mussel | Mytilus edulis | 0.008 | 42.0 |
| Fruit fly | Drosophila willistoni | 0.014 | 9.9 |
| Milkfish | Chanos chanos | 0.030 | 4.2 |
| Desert lizard | Lacerta melisellensis | 0.066 | 1.9 |
| [Annual plant] | Stephanomeria exigua | 0.054 | 1.4 |
| Pacific treefrog | Hyla regilla | 0.081 | 1.4 |
| Valley pocket gopher | Thomomys bottae | 0.087 | 0.86 |
| Pacific slender salamander | Batrachoseps pacifica | 0.117 | 0.64 |
| Red back salamander | Plethodon cinereus | 0.200 | 0.22 |
| Oldfield mouse | Peromyscus polionotus | 0.158 | 0.31 |
| Camp's slender salamander | Batrachoseps campi | 0.338 | 0.16 |
| Zigzag salamander | Plethodon dorsalis | 0.294 | 0.10 |

Note: values of Nm have been adjusted for the sample sizes, so there is not a perfect rank-order correlation between p̄(1) and Nm. (Adapted from Slatkin, 1985.)

Heterogeneity among Estimates

In studies of gene flow based on Fst, the values of Fst are commonly heterogeneous among loci. This should not be the case for neutral characters, for migration and drift should influence all loci in similar ways. The relationship between Fst and Nm is appropriate only for neutral genes; selection on a subset of the loci can produce heterogeneous estimates of Fst. One of the most striking cases of heterogeneity of estimates of gene flow comes from a series of studies of the American oyster, Crassostrea virginica. Estimates of gene flow from allozyme markers suggest that the larvae move great distances, homogenizing allelic frequencies from Massachusetts to Texas. However, both mtDNA and several nuclear DNA markers reveal a picture of limited gene flow, with a major barrier to gene flow in the vicinity of Cape Canaveral, Florida. The authors attribute the heterogeneity of estimates of gene flow to balancing selection on the allozyme loci.

Heterogeneity of estimates of gene flow frequently involves lower estimates of Fst from microsatellite loci than from other nuclear markers. The differences are particularly pronounced when the populations are well differentiated and gene flow between them is low. This heterogeneity is attributable to differences in mutation rates. While the mutation rates for nuclear loci are typically 10⁻⁶ to 10⁻⁸, mutation rates for microsatellite loci are much higher, often around 10⁻³, and reaching 1/20. High mutation rates at microsatellite loci are due to the nature of the variation at these loci. Microsatellite alleles differ in their numbers of tandem repeats, and the different sizes of the alleles produce chromosomal rearrangements when chromosomes are unable to synapse perfectly in the first division of meiosis. The high rates of mutation generate many size variants in each population. For microsatellite loci, the sharing of alleles among populations may therefore be due to independent mutations rather than gene flow.

Biologists using genetic data to infer rates of migration are obliged to be cognizant of the assumptions underlying their methods. If there are egregious violations of the assumptions, estimates of gene flow may be unreliable.

Further Reading

Avise JC (1994) Molecular Markers, Natural History and Evolution. New York: Chapman & Hall.

Endler JA (1977) Geographic Variation, Speciation, and Clines. Princeton, NJ: Princeton University Press.

Futuyma DJ (1998) Evolutionary Biology, 3rd edn. Sunderland, MA: Sinauer Associates.

Latta RG and Mitton JB (1997) A comparison of population differentiation across four classes of gene marker in limber pine (Pinus flexilis James). Genetics 146: 1153-1163.

Mitton JB (1997) Selection in Natural Populations. New York: Oxford University Press.

Slatkin M (1985) Gene flow in natural populations. Annual Review of Ecology and Systematics 16: 393-430.

Reference

Slatkin M (1985) Rare alleles as indicators of gene flow. Evolution 39: 53-65.

See also: Genetic Colonization; Genetic Drift; Genetic Migration; Hybrid Zone, Mouse; Phylogeography; Population Genetics; Population Substructure

Gene Frequency

C F Aquadro

Copyright © 2001 Academic Press

doi: 10.1006/rwgn.2001.0509

Gene frequency refers to the proportion of the gene copies in a population that are of one type of variant, or allele, at a locus. More appropriately referred to as `allele frequency,' gene frequency ranges from 0 (where the particular variant is absent from the population) to 1 (where the variant type is the only allele present). In the latter case, the population is said to be `fixed' for this particular allele. While often defined in terms of a locus or gene and, in the early days of genetics, assessed from the phenotypes of the corresponding genotypes, the term gene frequency is now applied to the frequency of any alternative form found segregating in a population, e.g., alternative nucleotides at a single site in a sequence, whether in coding, intron, or intergenic regions, as well as insertion/deletion variants and even alternative gene rearrangements such as inversion types. Gene frequency is estimated by taking a random sample of individuals from what might be considered a population of the species of interest (e.g., from a