R34
Dispatch
Plant genetics: Seeing selection in S allele sequences
D. Charlesworth and D.S. Guttman
New data on allelic sequence diversity in natural populations provide evidence for natural selection acting on the self-incompatibility loci of two plant species; there are interesting parallels with, and differences from, other polymorphic systems such as mammalian MHC loci.

Address: Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, Illinois, 60637-1573, USA.

Electronic identifier: 0960-9822-007-R0034

Current Biology 1997, 7:R34–R37

© Current Biology Ltd ISSN 0960-9822
The self-incompatibility (S) genes of flowering plants are, with the fungal incompatibility and mammalian major histocompatibility (MHC) loci, among the most highly polymorphic loci known. In Oenothera organensis, for example, 35 alleles have been detected [1] by laborious compatibility tests between plants of this endemic species, the total population size of which may not exceed 5000 [2]. Different populations of the same species have usually been found to have highly overlapping sets of alleles [3], though only four of twenty-nine alleles were common to Turkish and Japanese populations of Brassica campestris [4]. The self-recognition aspect of plant incompatibility (pollen carrying an allele in common with a potential recipient plant is recognized as ‘self’ and prevented from fertilizing ovules) interests cell biologists [5,6], while its impressive polymorphism has attracted the interest of population geneticists.

The maintenance of variability is the simplest aspect of the polymorphism to explain [1,7]. Rare alleles have a fertility advantage because pollen carrying such an allele will not be rejected by incompatibility reactions of recipient plants; each new allele that arises will therefore tend to increase in frequency. In a large enough population, this ‘frequency-dependent’ selection favours each new incompatibility-type allele until an equilibrium is reached with equal allele frequencies. Mutation can thus lead to a large number of alleles being present in a population [1,7]. Allele numbers depend on population sizes, as alleles may be lost by chance in small populations, and also on the rates of mutation to new specificities, as frequent mutation tends to increase allele numbers [1,7]. The fertility advantage of low-frequency alleles also hinders chance loss of alleles from any population, and soon restores lost alleles if there is gene flow from other populations in the species. Alleles should thus be maintained in species for very long evolutionary times (Fig. 1) [1,7,8].
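The rare-allele advantage described above can be illustrated with a small simulation. The following is a minimal sketch, not a model taken from the cited work: it assumes a gametophytic system in which pollen is rejected when its single S allele matches either allele of the recipient plant, a constant population of hermaphrodite plants with non-overlapping generations, and no mutation. The function names, the population size of 500 and the starting genotype frequencies are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def next_generation(plants, rng, max_tries=1_000_000):
    """One generation of random pollination under gametophytic self-incompatibility.

    Each plant is a tuple of two different S alleles. Pollen carrying an allele
    that matches either maternal allele is rejected (assumed rule).
    """
    offspring = []
    tries = 0
    while len(offspring) < len(plants):
        tries += 1
        if tries > max_tries:
            raise RuntimeError("no compatible pollen found; too few alleles?")
        mother = rng.choice(plants)
        pollen = rng.choice(rng.choice(plants))  # one allele from a random father
        if pollen in mother:
            continue  # incompatible pollination, no seed produced
        offspring.append((rng.choice(mother), pollen))
    return offspring


def allele_frequencies(plants):
    """Frequency of each S allele among the 2N gene copies."""
    counts = Counter(allele for plant in plants for allele in plant)
    total = 2 * len(plants)
    return {allele: count / total for allele, count in sorted(counts.items())}


def simulate(n_plants=500, generations=30, seed=1):
    """Start with allele 2 rare (5% of gene copies) and follow it over time."""
    rng = random.Random(seed)
    plants = ([(0, 1)] * (n_plants * 90 // 100)
              + [(0, 2)] * (n_plants * 5 // 100)
              + [(1, 2)] * (n_plants * 5 // 100))
    for _ in range(generations):
        plants = next_generation(plants, rng)
    return allele_frequencies(plants)


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the initially rare allele rises quickly, because most recipient plants accept only pollen carrying it, and the three alleles settle close to equal frequencies. Adding mutation to new specificities and a smaller population size would be needed to explore the balance between allele gain and chance loss discussed in the text.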
Molecular genetic studies of S alleles have recently opened the way for renewed population studies. In several self-incompatible plant species, alleles have been cloned that segregate with the incompatibility types of plants in families, and encode sequences of cosegregating incompatibility proteins (see [9,10]). In the two best studied plant families, Solanaceae and Brassicaceae, enormous differences were found between alleles for different incompatibility types. Probes based on one cloned allele rarely detect other alleles by hybridization, even under low stringency, unless the alleles are unusually similar, as in the case of two alleles from Solanum chacoense that are 95 % similar, the mature proteins differing by only ten amino acids [11]. Multiple amino-acid differences between alleles are consistent with the long-term maintenance of
Figure 1
Lines of descent of S alleles in four hypothetical populations descended from a common ancestor and with three incompatibility types (indicated by colour differences). Mutations to new allelic types are indicated by bifurcations in the lines, and changes in sequence within an incompatibility type by changes in the intensity of the colour. Genetic differences between alleles should accumulate in proportion to the times since they separated by mutation to different specificities, just as if there were no selection [19] but with a much longer timescale. Without the selective maintenance of a polymorphism, alleles at a locus should all trace back to a common ancestor an average of 4N generations ago, where N is the population size, but, when the different alleles are maintained by selection, the expected time can be increased enormously [8]. Species A: only one ancestral allelic type now found — the other ancestral types once present were lost during the history of this species — and mutation has occurred to form two new incompatibility types. Species B: two ancestral allelic types. Species C: all three ancestral allelic types remain present; two derived representatives of the blue type are now found. Species D: underwent a bottleneck in population size, reducing the number of ancestral allelic types represented in the modern population, and new types have arisen since this event.
allele polymorphisms — the common ancestor of two alleles may even have predated speciation (Fig. 1) [12,13].
Figure 2
Richman et al. [16] used reverse transcriptase (RT) PCR to amplify S RNA sequences in pistils of S. carolinense and P. crassifolia plants; they analysed a 390 base region, encoding about 130 of the total 220 amino acids. Digestion of genomic DNA with restriction enzymes revealed sequence differences between alleles and showed that all the plants studied were heterozygotes, as expected at this locus. The incompatibility types of the alleles sequenced are not yet known for the plants in this study but, for a few Physalis families studied in detail, plants with the same two S allele sequences were shown to be invariably incompatible with one another, whereas plants with allelic differences were compatible, confirming that the sequences are indeed those of S alleles.

The new S. carolinense and P. crassifolia population data confirm earlier findings that alleles differ at multiple sites (Fig. 2). They also yield the best estimates to date of the average numbers of substitutions between S alleles (though only alleles producing different band mobilities were sequenced). In S. carolinense, the average numbers of synonymous (silent) and non-synonymous (amino-acid changing) differences between allele pairs were 0.51 and 0.38, respectively. These differences suggest that the alleles diverged a very long time ago; indeed, these alleles mingle in the phylogenetic tree constructed from their sequences with those from other species in the genera Petunia and Lycopersicon, and even Nicotiana (believed to have diverged from Solanum about 30 million years ago). Unlike the situation at the anciently diverged MHC loci, where sets of recombinationally rearranged sequences with relatively simple patterns of differences can be identified [17], the extensive sequence differences between S alleles prevent recognition of the same allelic types in different species.

The S. carolinense and P. crassifolia S allele sequences share some interesting features [16]. In both, the alleles fall into
[Figure 2 plot: histograms of Frequency (y axis) against Heterozygosity (x axis, 0 to 0.88) for two series, S. carolinense S alleles and human MHC alleles.]
Until recently, this diversity hindered study of these alleles in populations. As sequence information has accumulated, however, patterns of variability have emerged [9], and conserved regions have been discovered that can be used to design primers for analysis using the polymerase chain reaction (PCR). This makes it fairly straightforward to obtain S allele sequences, which is valuable for typing plants of self-incompatible horticultural species [14,15] and permits, for the first time, the study of individuals in natural populations. New data on alleles from two species in the family Solanaceae (two populations of Solanum carolinense and one of Physalis crassifolia) have now provided the first extensive evidence on the polymorphism of S alleles in natural populations [16].
© 1997 Current Biology
Distributions of the levels of diversity at different codons in thirteen S. carolinense S alleles and fourteen human MHC alleles from the highly polymorphic DQB locus.
two groups. In S. carolinense, the four alleles in one group differ from one another by 12 % or less at synonymous sites, whereas the alleles in the other group differ by more than 22 % both from one another and from the alleles in the first group (Fig. 3) [18]. Alleles that are more closely related tend to have higher ratios of non-synonymous to synonymous changes than do more diverged alleles. The P. crassifolia population shows a similar pattern, though in this species the pattern agrees more closely with the expectation from population genetics theory of an accelerating trend for increasing numbers of lineages as the present time is approached [19]. One of the two groups has four alleles and the other twenty-four alleles, and there are many differences, implying long evolutionary times, between sequences in different sets.

A 59 base region of the S allele sequences, including the most diverse portion, was analysed in detail [16]. The average ratio of non-synonymous to synonymous changes (Pn/Ps) is 1.35 for pairs of alleles in the same set, with Pn in the range 0–45 % (Fig. 3); alleles belonging to different sets differ much more (Pn ~40–61 %), but with a slight bias towards synonymous differences (Pn/Ps = 0.89). As natural selection usually eliminates alleles coding for proteins with variant amino-acid sequences, finding excess amino-acid substitutions indicates that ‘diversifying’ selection has occurred favouring new sequences, as found for other loci involved in recognition processes [20]. In several of these other systems, excess non-synonymous changes were also found mainly in comparisons of alleles that are relatively similar [20,21]. These patterns suggest that new allelic types differing in amino-acid sequence are favoured by selection but, once a new type is established, the presence of conserved sites, and sometimes restrictions in the amino acids allowed at variable sites, probably limit further protein sequence divergence [22].
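To make the Pn/Ps comparison concrete, the sketch below counts synonymous and non-synonymous differences per site between two aligned coding sequences. It follows the general logic of the Nei and Gojobori counting approach with simplifications: all mutational pathways between codons are weighted equally, pathways through stop codons are not excluded, and no correction for multiple hits is applied. It is not necessarily the procedure used in [16], and the function names and the two short test sequences are hypothetical.

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1.0 / 3.0
    return sites


def codon_differences(a, b):
    """Synonymous and non-synonymous differences between two codons,
    averaged over every order in which the differing positions could change."""
    positions = [k for k in range(3) if a[k] != b[k]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = nonsyn = 0.0
    for path in paths:
        current = a
        for k in path:
            step = current[:k] + b[k] + current[k + 1:]
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
    return syn / len(paths), nonsyn / len(paths)


def pn_ps(seq_a, seq_b):
    """Return (Pn, Ps): differences per non-synonymous and per synonymous site."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    syn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        syn_sites += (synonymous_sites(a) + synonymous_sites(b)) / 2
        s, n = codon_differences(a, b)
        syn_diffs += s
        nonsyn_diffs += n
    nonsyn_sites = len(seq_a) - syn_sites
    pn = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    ps = syn_diffs / syn_sites if syn_sites else 0.0
    return pn, ps


if __name__ == "__main__":
    pn, ps = pn_ps("TTTCTTGGG", "TTCCTAGAG")  # hypothetical sequences
    print(f"Pn = {pn:.3f}, Ps = {ps:.3f}")
```

A ratio Pn/Ps above 1 for a pair of alleles, as reported for alleles within the same set, is the signal of excess amino-acid change discussed above; for highly diverged pairs the uncorrected proportions underestimate the true number of changes.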
Current Biology, Vol 7 No 1
Figure 3
To gain further insight, it may be helpful to study sequences of other reference loci that are not polymorphic. One interesting comparison would be with flanking sequences from regions near the S locus. No natural population data are currently available, but there is already evidence that these regions are even more diverged than the coding sequences [23], as expected if they are subject to weaker selective constraints and can accumulate base changes almost indefinitely. If this is confirmed, it would suggest that natural selection acts to conserve sequences of at least parts of the S loci, as suggested by the differing levels of diversity in different regions of the locus itself.
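Differing levels of diversity across a locus, of the kind summarised per codon in Figure 2, can be quantified site by site. The sketch below assumes that heterozygosity at a site is defined as one minus the sum of squared frequencies of the states observed there, that is, the chance that two alleles drawn at random with replacement differ at that site. Whether the published figure used exactly this definition or a sample-size correction is not stated here, and the four short amino-acid sequences are hypothetical.

```python
from collections import Counter


def site_heterozygosity(states):
    """1 - sum(p_i^2) over the states (amino acids or codons) seen at one site."""
    n = len(states)
    return 1.0 - sum((count / n) ** 2 for count in Counter(states).values())


def heterozygosity_profile(alleles):
    """Heterozygosity at every aligned position of a set of allele sequences."""
    if len({len(allele) for allele in alleles}) != 1:
        raise ValueError("allele sequences must be aligned to the same length")
    return [site_heterozygosity(column) for column in zip(*alleles)]


if __name__ == "__main__":
    # Hypothetical aligned amino-acid sequences of four alleles.
    alleles = ["MKT", "MRT", "MRS", "MQS"]
    print(heterozygosity_profile(alleles))
```

A histogram of such a profile separates conserved positions, which have values near zero, from hypervariable positions, which have values near the maximum possible for the sample size.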
[Figure 3 plot: histograms of Frequency (y axis, 0 to 30) against Ps (x axis, 0 to 0.80) for three series: P. crassifolia within clusters, P. crassifolia between clusters, and S. carolinense.]
Distributions of numbers of synonymous sequence differences (Ps) between S. carolinense and P. crassifolia S alleles, showing clusters of relatively similar alleles and sets of more divergent alleles in both species. See text for an explanation of the P. crassifolia allele clusters.
Despite this generally similar picture, the two species differ both in allele numbers and in their apparent ages. From the numbers found in samples, it was estimated that S. carolinense has thirteen to fifteen, and P. crassifolia about forty-three or forty-four, S alleles [16]. These estimates are surprising in view of the clear evidence for long-term maintenance of the allelic polymorphism. The high degree of sequence variation between the S. carolinense alleles suggests that these allelic lineages are extremely ancient. That the alleles diverged from their common ancestor so long ago is hard to reconcile with there being so few alleles. The P. crassifolia lineages cluster much more than do the S. carolinense lineages, but this may be just a chance difference as, within independently evolved populations, the lines of descent of alleles will, of course, differ [19]. The fact that the larger cluster of P. crassifolia alleles are all most closely related to one another, rather than to any allele in another species — unlike the S. carolinense alleles — suggests a less ancient origin of these alleles, perhaps because of a population bottleneck (see Fig. 1). If so, however, the origin must have been so long ago that a clear picture is now obscured, as there is great divergence even within this cluster. Indeed, the divergence between allelic sequences makes it very difficult to infer reliably the very ancient history of these alleles and the populations in which they have been maintained. It appears from the sequences now available that many sites in these loci have undergone multiple changes to the point that the numbers of substitutions, and their order of occurrence, are obscure — that is, the sites are saturated and no longer accurately indicate the times since divergence began.
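The effect of saturation can be seen with a simple multiple-hit correction. The sketch below uses the Jukes–Cantor formula, d = -(3/4) ln(1 - 4p/3), which assumes equal base frequencies and equal rates for all changes. It is a crude model for synonymous sites and is not the analysis used in the work cited. The value 0.51 is the average synonymous difference quoted earlier for S. carolinense, treated here as a per-site proportion, which is an assumption.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from an observed proportion p of
    differing sites, under the Jukes-Cantor model (assumed, equal rates)."""
    if p < 0:
        raise ValueError("p must be non-negative")
    if p >= 0.75:
        return math.inf  # fully saturated: the divergence cannot be estimated
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.12, 0.22, 0.51, 0.70):
        print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.2f}")
```

Under this model an observed difference of 0.51 corresponds to roughly 0.85 substitutions per site, and the corrected value grows without bound as the observed proportion approaches 0.75. Small sampling errors in highly diverged comparisons therefore translate into very large uncertainty about divergence times.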
It would also be interesting to know how frequently alleles with the same specificity differ at silent or replacement sites. Differences within populations will probably be rare, because — where there are many alleles and thus few copies of each — genetic drift should ensure recent common ancestry of all extant alleles of the same incompatibility type. Thus no, or few, neutral differences should have arisen between alleles of any given type. Most sequences will therefore identify different alleles, and this will give a good way to find total allele numbers, which until now has required very laborious pollination experiments. To date, the few S alleles of matching type that have been sequenced from a single population have proved to be identical [10]. However, two synonymous differences were found in the coding sequence of Papaver rhoeas S alleles from British and Spanish populations [10], suggesting that appreciable time has elapsed since alleles from these populations had common ancestors, and additional examples may be uncovered once the incompatibility types corresponding to the P. crassifolia and S. carolinense S allele sequences are known. Such data may help identify which portions of S proteins are involved in recognition, by highlighting sites at which amino-acid differences leave incompatibility type unchanged or regions where amino-acid substitutions are rarer than expected from the numbers of silent differences. In the case of the mammalian MHC loci, data of this type suggest that selection has acted to conserve sequences of particular allelic types, in addition to the diversifying selection for new allelic types [17]. Sequences of alleles of different specificity do not give this information, because their long divergence times have led to the accumulation of too many differences for us to be able to identify those that may have functional significance. 
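If distinct sequences do identify distinct alleles, the total number of alleles in a population can be estimated from the number seen in a sample. The sketch below is one simple approach and is not claimed to be the method behind the estimates quoted from [16]. It assumes that all N alleles are equally frequent and that every plant carries two different alleles, so a given allele is missing from one plant with probability 1 - 2/N and the expected number of distinct alleles among r plants is N[1 - (1 - 2/N)^r]. The function names and the example sample are hypothetical.

```python
def expected_distinct(n_alleles, n_plants):
    """Expected number of distinct alleles seen in n_plants heterozygous plants,
    assuming n_alleles equally frequent alleles in the population."""
    return n_alleles * (1.0 - (1.0 - 2.0 / n_alleles) ** n_plants)


def estimate_allele_number(observed, n_plants):
    """Solve expected_distinct(N, n_plants) == observed for N by bisection."""
    if not 2 <= observed < 2 * n_plants:
        raise ValueError("need 2 <= observed < 2 * n_plants for a finite estimate")
    lo = float(observed)          # the population cannot hold fewer alleles
    hi = lo + 1.0
    for _ in range(60):           # expand the bracket until it contains the root
        if expected_distinct(hi, n_plants) >= observed:
            break
        hi *= 2.0
    else:
        raise ValueError("sample too close to all-distinct; estimate is unbounded")
    for _ in range(200):          # fixed number of bisection steps
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, n_plants) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical sample: 20 plants in which 25 distinct alleles were found.
    print(round(estimate_allele_number(25, 20), 1))
```

Unequal allele frequencies would make a sample appear to contain fewer alleles than this model expects, so an estimate made under the equal-frequency assumption is best read as a rough lower guide.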
Such comparisons alone can help identify conserved sites, such as those involved in the ribonuclease function of the S proteins of Solanaceae [24], but not the sites that are important in recognition. In the fungus Podospora anserina, it has been shown by this approach that specificity differences are due to single amino-acid differences, although different alleles
have multiple differences [25]. Population genetic analyses should thus help us to define functional regions of S proteins, and this further understanding should in turn lead to more refined analyses of the evolution of S allele polymorphism.

References
1. Wright S: The distribution of self-sterility alleles in populations. Genetics 1939, 24:538–552.
2. Ellstrand NC, Ritter K, Levin DA: Protein polymorphism in the narrow endemic Oenothera organensis. Evolution 1979, 33:534–542.
3. Lawrence MJ, Lane MD, O’Donnell S, Franklin-Tong VE: The population genetics of the self-incompatibility polymorphism in Papaver rhoeas. V. Cross-classification of the S-alleles from three natural populations. Heredity 1993, 71:581–590.
4. Nou IS, Watanabe M, Isogai A, Hinata K: Comparison of S-alleles and S-glycoproteins between two populations of Brassica campestris in Turkey and Japan. Sex Plant Reprod 1993, 6:79–86.
5. Li X, Nield J, Hayman D, Langridge P: Thioredoxin activity in the C terminus of Phalaris S protein. Plant J 1995, 8:133–138.
6. Rudd JJ, Franklin FCH, Lord JM, Franklin-Tong VE: Increased phosphorylation of a 26-kD pollen protein is induced by the self-incompatibility response in Papaver rhoeas. Plant Cell 1996, 8:713–724.
7. Wright S: The distribution of self-sterility alleles in populations. Evolution 1964, 18:609–619.
8. Vekemans X, Slatkin M: Gene and allelic genealogies at a gametophytic self-incompatibility locus. Genetics 1994, 137:1157–1165.
9. Sims TM: Genetic regulation of self-incompatibility. Crit Rev Plant Sci 1993, 12:129–167.
10. Walker EA, Ride JP, Kurup S, Franklin-Tong VE, Lawrence MJ, Franklin FCH: Molecular analysis of two functional homologues of the S3 allele of the Papaver rhoeas self-incompatibility gene isolated from different populations. Plant Mol Biol 1996, 30:983–994.
11. Saba-El-Leil MK, Rivard S, Morse D, Cappadocia M: The S11 and S13 self incompatibility alleles in Solanum chacoense are remarkably similar. Plant Mol Biol 1994, 24:571–583.
12. Ioerger TR, Clark AG, Kao T-H: Polymorphism at the self-incompatibility locus in Solanaceae predates speciation. Proc Natl Acad Sci USA 1990, 87:9732–9735.
13. Dwyer KG, Balent MA, Nasrallah JB, Nasrallah ME: DNA sequences of self-incompatibility genes from Brassica campestris and B. oleracea: polymorphism predating speciation. Plant Mol Biol 1991, 16:481–486.
14. Brace J, King GJ, Ockendon DJ: A molecular approach to the identification of S-alleles in Brassica oleracea. Sex Plant Reprod 1994, 20:203–208.
15. Janssens GA, Goderis IJ, Broekaert WF, Broothaerts W: A molecular method for S-allele identification in apple based on allele-specific PCR. Theoret Appl Genet 1995, 91:691–698.
16. Richman AD, Uyenoyama MK, Kohn JR: Allelic diversity and gene genealogy at the self-incompatibility locus in the Solanaceae. Science 1996, 273:1212–1216.
17. Gyllensten UB, Lashkari D, Erlich HA: Allelic diversification at the class II DQB locus of the mammalian major histocompatibility complex. Proc Natl Acad Sci USA 1990, 87:1835–1839.
18. Richman AD, Kao T-H, Schaeffer SW, Uyenoyama MK: S-allele sequence diversity in natural populations of Solanum carolinense (Horsenettle). Heredity 1995, 75:405–415.
19. Hudson RR: Gene genealogies and the coalescent process. Oxf Surv Evol Biol 1990, 7:1–45.
20. Lee Y-H, Vacquier VD: The divergence of species-specific abalone sperm lysins is promoted by positive Darwinian selection. Biol Bull 1992, 182:97–104.
21. Hughes A, Ota T, Nei M: Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol Biol Evol 1990, 7:515–524.
22. Tanaka T, Nei M: Positive Darwinian selection observed at the variable region genes of immunoglobulins. Mol Biol Evol 1989, 6:447–459.
23. Coleman CA, Kao T-H: The flanking regions of Petunia inflata S-alleles are heterogeneous and contain repetitive sequences. Plant Mol Biol 1992, 18:725–737.
24. Royo J, Kunz C, Kowyama Y, Anderson MA, Clarke AE, Newbigin E: Loss of a histidine residue at the active site of S-ribonuclease is associated with self-compatibility in Lycopersicon peruvianum. Proc Natl Acad Sci USA 1994, 91:6511–6514.
25. Deleu C, Clavé C, Begueret J: A single amino acid difference is sufficient to elicit vegetative incompatibility in the fungus Podospora anserina. Genetics 1993, 135:45–52.