Recent progress in reconstructing angiosperm phylogeny


-PLANT August 2000

17/7/00

9:04 am

Page 330

trends in plant science Reviews

Robert K. Kuzoff and Charles S. Gasser

In the past year, the study of angiosperm phylogeny has moved from tentative inferences based on relatively small data matrices into an era of sophisticated, multigene analyses and significantly greater confidence. Recent studies provide both strong statistical support and mutual corroboration for crucial aspects of angiosperm phylogeny. These include identifying the earliest extant lineages of angiosperms, confirming Amborella as the sister of all other angiosperms, confirming some previously proposed lineages and redefining other groups consistent with their phylogeny. This phylogenetic framework enables the exploration of both genotypic and phenotypic diversification among angiosperms.

Understanding the phylogenetic relationships among the principal lineages, or clades (Box 1), of angiosperms is essential for elucidating the evolutionary events that underlie the diversification and ascension of this ecologically dominant plant group. We also need to reconstruct flowering-plant phylogeny to facilitate comparative studies of plant development, metabolism, reproduction, pathology and genomics. For these and other reasons, reconstructing angiosperm phylogeny has been a major goal of plant systematists.

The state of knowledge before 1999

Attempts to unravel the overall phylogeny of angiosperms through cladistic analysis date back more than a decade1,2. Goals of such studies include identifying the composition of major lineages, the relationships among them and the earliest lineages (first-branching clades) of flowering plants. Analyses reported before 1999 were typically based on relatively small non-molecular2,3 or single-gene4–6 data matrices, with some exceptions7,8. Many results generated during this period constituted noteworthy advances that were largely upheld by subsequent work. For example, several clades were identified, including the eudicots, rosids and asterids; some previously proposed groups, including the Hamameliidae and Dilleniidae, were also shown to be assemblages of distantly related species2,4–6,8,9.

However, although a potentially accurate picture of angiosperm phylogeny was taking shape, the plant-systematic and larger biological communities did not place great confidence in it. In addition to obvious instances of conflict among the earlier studies, systematists were aware of several other problems that tempered their enthusiasm. One major concern was that statistical support for putative clades and the relationships among them was generally low, if investigated. A second concern was that earlier studies relied exclusively on parsimony as an optimality criterion in data analysis. However, in parsimony analyses of DNA sequences, long branches in a tree separated by short internodes can attract each other artifactually because of chance substitutions of identical nucleotides at homologous sequence positions10,11. Such long-branch attraction can be engendered by using distantly related outgroups. This is because the branch leading to the outgroups attracts another long branch to the base of the ingroup (Box 1). Alternatively, it can be engendered by insufficient taxon sampling, because taxonomically large groups are represented only by sparse, long branches in an analysis9,12–14. A third concern about these earlier studies was that the available analysis protocols and computer programs employed were not well suited to analysing complex phylogenies (those with large numbers of taxa5,15,16). Consequently, analyses of some complex phylogenies had to be stopped by the investigators before they could be completed4–6. Finally, it became clear that the amount of data being analyzed was not sufficient to resolve the phylogenetic problems addressed, both because there were too few phylogenetically informative characters9,12,15 and because some of the apparently informative characters were potentially biased and misleading9,17.

Breakthroughs during the past year

Beginning in late 1999, several more-rigorous, multigene studies have been published that address phylogenetic relationships among principal angiosperm lineages18–24. This surge in publications has been facilitated by extensive collaboration among international researchers, automated sequencing technology, advances in phylogenetic analysis software and access to increasingly powerful personal computers. Some recent inquiries focus on relationships near the phylogenetic root of the flowering plants18,20–22 and others explore a broader range of angiosperm phylogeny19,23,24. These studies use data matrices of between two18,23 and 17 (Ref. 22) complete, aligned gene sequences, and include up to 560 species19,24. Inferring trees from such large data matrices and assessing the statistical support for recovered clades have been facilitated by innovations in phylogenetic analysis software16,25–27. Major relationships elucidated by these analyses are generally not novel but, collectively, they provide abundant corroboration for each other as well as previously unrealized levels of statistical support.

Although the strategies used in recent investigations differ appreciably, taken together, they strongly support a suite of conclusions for overall angiosperm phylogeny18–22,24. First, these recent studies identify Amborella trichopoda as the sister group of all other angiosperms. Second, the next two successive branches in the angiosperm tree are the Nymphaeales and the ‘ITA’ (Illiciaceae, Schisandraceae, Trimeniaceae and Austrobaileyaceae) clade; Amborella, the Nymphaeales and ITA are known as the ‘ANITA’ clades20 (Figs 1 and 2). Third, the composition of several major lineages of angiosperms (Table 1) is consistent among all these studies (Table 2). Fourth, these studies, taken together, suggest that several relationships among these lineages have been confidently resolved, although several relationships remain unclear (Fig. 2).

1360-1385/00/$ – see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S1360-1385(00)01685-X

Five recent studies

One of the recent landmark studies18 focused on relationships near the phylogenetic root of angiosperms, using a fairly novel approach to identify these earliest branches that does not involve sampling outgroups. All other seed plants are distantly related to angiosperms and their use as outgroups might hinder phylogeny reconstruction. To circumvent this problem, species were sampled for two paralogous phytochrome genes, phyA and phyC, that are duplicated only among angiosperms. Sequences of these genes from 26 species (chosen to represent the angiosperm lineages thought to be the most basal) were aligned and analyzed in concert. In the resulting phylogenetic network (an unrooted tree), there is a branch that connects two largely symmetrical halves: one comprising all the phyA genes, the other all the phyC genes. The position of this connecting branch was used to root both halves, producing two highly concordant trees.

Several studies before 1999 indicated the positive effect on phylogenetic accuracy of increasing the number of characters sampled per species9,12,13,15. In a recent study22, the sequences of 17 chloroplast genes (~13.4 kb) were sampled from each of 21 species, representing three gymnosperm and 18 putative basal angiosperm lineages. Primary goals of this study were to identify the first-branching lineages of angiosperms, to assess the effects of greatly increased character sampling and to determine the phylogenetic usefulness of 14 previously untested chloroplast genes. Importantly, phylogenetic analyses were conducted with and without gymnosperm outgroups, to explore whether this would affect the relationships at the base of the angiosperms22.

On the basis of several Monte Carlo computer simulations (method reviewed in Ref. 28), it has been argued that the judicious addition of species to phylogenetic analyses of taxonomically large groups can break up otherwise long branches and improve the accuracy of results12–14. A third recent study19 attempted to improve the accuracy of an inferred angiosperm phylogeny by increased sampling of both the number of species (560) and the nucleotides per species (4733) relative to earlier studies4–8. The genes sampled were

Box 1. Glossary

Tree
A hypothesis of phylogenetic relationships (branching order) for a group of species (sometimes genes) inferred through phylogenetic analysis25–27,32 of DNA or protein sequences, non-molecular data, or a combination of these, sampled from each species (Fig. I). The root is generally determined by the position of the branch connecting outgroup (black) and ingroup (colored) species. The first branch (earliest-diverging lineage) within the ingroup is the sister group to all other ingroup species. The most recent common ancestor (MRCA) of all ingroup species is represented by the node adjacent to the root of the tree. A tree is composed of hierarchically nested clades (phylogenetic branches or lineages) of organisms, each comprising a MRCA for that group and all its descendants. All nodes in a completely resolved tree are bifurcating. An unresolved node is a polytomy (clade Y).

[Fig. I. Example tree. Labels: outgroup species; root of tree; first branch of ingroup (sister to rest of ingroup); species A to G; MRCA of A to G; MRCA of B to F; clades X, Y and Z.]

Bootstrapping
A technique for assessing how strongly phylogenetic data support clades in a tree29. The characters in the original data set are sampled, with replacement, until a new data set of the same size as the original is generated. The generated data set is analyzed phylogenetically and a summary tree produced. These two steps, constituting one replicate, are repeated a specified number of times. Bootstrap values represent the percentage of summary trees supporting a particular clade. Replicates can propagate biases present in the original data matrix.

Parsimony jackknifing
A technique for assessing how strongly phylogenetic data support clades in a tree25. Replicate data matrices are generated through independent and random deletion of characters from the original data matrix until a user-defined fraction [typically 0.5 or 1 − (1/e)] of the original characters remain. Summary trees are produced for each replicate data matrix through parsimony analysis. Jackknife values are the percentage of summary trees that support a clade. Replicates can propagate biases present in the original data matrix.
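The two resampling schemes defined in Box 1 can be illustrated with a minimal Python sketch. It covers only the resampling of characters (alignment columns) and the tallying of clade support; the tree-building step that would turn each replicate matrix into a summary tree is deliberately left out. All function names and the toy data are hypothetical, and implementing the jackknife as independent retention of each character with probability 1 − (1/e) is an assumption about how the "fraction remaining" is realized, not a description of any particular program.

```python
import math
import random


def bootstrap_columns(n_chars, rng):
    """Draw n_chars character indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_chars) for _ in range(n_chars)]


def jackknife_columns(n_chars, rng, keep_prob=1 - 1 / math.e):
    """Keep each character independently with probability keep_prob (assumed scheme)."""
    return [i for i in range(n_chars) if rng.random() < keep_prob]


def resample(matrix, columns):
    """Build a replicate data matrix (taxon -> sequence) from the chosen columns."""
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in matrix.items()}


def clade_support(replicate_clades, clade):
    """Percentage of replicate summary trees (given as sets of clades) containing clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy alignment: three taxa, six characters.
    toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}
    print(resample(toy, bootstrap_columns(6, rng)))
    print(resample(toy, jackknife_columns(6, rng)))
```

In a real analysis each replicate matrix would be passed to a tree-search program, and the clades of the resulting summary trees would be collected and given to `clade_support` to obtain the bootstrap or jackknife percentage.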

August 2000, Vol. 5, No. 8


Fig. 1. Flowers of representatives from ANITA clades. Amborella, the Nymphaeales and the ITA (Illiciaceae, Schisandraceae, Trimeniaceae and Austrobaileyaceae) clade are known as the ‘ANITA’ clades20. (a) Female flower of Amborella trichopoda (photograph courtesy of Sandra K. Floyd). Amborella comprises one species of dioecious, woody shrub endemic to New Caledonia. Flowers have a spirally arranged, undifferentiated perianth, a small hypanthium and either numerous more-or-less laminar stamens (male flower) or 4–8 distinct, urn-shaped carpels, with unfused margins (female flower). (b) Flower of Nymphaea sp. (Nymphaeales; photograph courtesy of Gregory M. Plunkett). Nymphaea comprises ~50 species of aquatic, rhizomatous herbs with a cosmopolitan distribution. Flowers are bisexual with several undifferentiated tepals, numerous more-or-less laminar stamens, numerous united carpels, with fused margins, and an inferior ovary. (c) Flower of Austrobaileya scandens (Austrobaileyaceae, ITA clade; photograph courtesy of Susana Magallón). Austrobaileya comprises one species of evergreen liana found in NE Australia. Flowers have numerous, undifferentiated, spirally arranged tepals, several spirally arranged, more-or-less laminar stamens, and 10–13 distinct carpels with unfused margins. (d) Flower of Illicium sp. (Illiciaceae, ITA clade; photograph courtesy of Douglas E. Soltis). Illicium comprises ~42 species of shrubs and small trees, distributed in eastern Asia and southern North America. Flowers are bisexual, with a spirally arranged, undifferentiated perianth, spirally arranged stamens with short, thick filaments, and carpels with partially fused margins. All descriptions are based on Refs 33,51.

rbcL and atpB from the chloroplast genome, and nuclear 18S rDNA. Because of the increased sampling of species, this study was the most-rigorous test of the monophyly of major angiosperm groups yet published. The many species included in this study merited the use of alternatives to standard computational techniques. For example, the authors used a recently developed search algorithm called the Ratchet16 (implemented by NONA 2.0, Ref. 26) that finds shorter trees in parsimony analyses of complex phylogenies more quickly than other available algorithms. Also, because traditional heuristic parsimony bootstrapping29 with this number of species was impractical, this study used a more rapid method, parsimony jackknifing25 (Box 1), to assess the support for inferred clades (Table 2).

The evolution of individual genes or genomes used in phylogeny reconstruction is generally not well understood and might have a negative impact on phylogenetic analyses. In a fourth recent study20, five genes from three genomes (nuclear, chloroplast and mitochondrial) were sampled from 105 species, representing all major gymnosperm and putative basal angiosperm lineages. The sampled genes, encompassing 8733 aligned nucleotides, were the same as those used in the third study, above19, with the addition of two mitochondrial genes, matR and atp1. Sampling three genomes with potentially different histories and modes of inheritance (the nuclear genome is biparentally inherited but the chloroplast and mitochondrial genomes are generally uniparentally inherited) might mitigate the effects of possible cryptic biases present in the individual genes or genomes. In addition, separate phylogenetic analyses were conducted for each of the five genes included in the study20.

For reasons discussed above, a fifth study21 also sampled five genes representing all three plant genomes (totaling 6564 nucleotides) for 51 species from several major angiosperm and gymnosperm lineages. These genes are nuclear 18S rDNA, chloroplast atpB and three mitochondrial genes (mtSSU, cox1 and rps2). In addition, the sequence data were analyzed with and without gymnosperm outgroups and a series of likelihood ratio tests (method reviewed in Ref. 30) were conducted to determine whether the position of Amborella as sister to all other angiosperms was significantly better than alternative candidates for this position. Although the other studies discussed above relied solely on parsimony, which can produce misleading results under certain circumstances10,11, the data in this study21 were reanalyzed using a model-based approach to phylogenetic inference called maximum-likelihood estimation31,32 (MLE). This analysis should be less sensitive to branch-length disparity, provided that the model of sequence evolution used is appropriate.

Mutual corroboration

Although there is always room to question phylogenetic results10,11,22, taken together, the five studies reviewed above provide a well-supported picture of angiosperm phylogeny (Fig. 2; Table 2) that withstands a variety of potential criticisms. Although taxon12–14 and character9,12,13,15 sampling can dramatically affect the accuracy of inferred phylogenies, altering these factors across a wide range (21 versus 560 species and 2.2 versus 13.4 kb per species; Table 2) did not alter the relationships among basal branches or the composition of major angiosperm lineages.

Long-branch attraction, whether it is due to the use of distantly related outgroups or to the presence of long ingroup branches, also does not appear to have had a negative impact on these phylogenetic results:
• Relationships near the root of the tree and the composition of major lineages were not affected by the removal of distantly related outgroups from analyses18,21,22.
• The first three branches in the trees from the five studies summarized above are not especially long compared with other branches in each of these studies18–22.
• MLE provides corroboration for the branch positions that were inferred through parsimony analysis and several likelihood ratio tests detected no better rootings21.

Hence, although long-branch attraction among the earliest angiosperm lineages cannot be decisively ruled out, it is unlikely that it affected these results.

Finally, these principal findings are not likely to be the result of cryptic-gene or genome biases. The positions of Amborella and the other ANITA clades in independent analyses of at least five individual genes representing all plant genomes6,18,20,23 and five multigene matrices18–22 strongly suggest that all these sources of data contain a significant, concordant phylogenetic signal. By far the simplest explanation for the congruence of these studies and their strong statistical support is that they have each accurately inferred elements of the underlying organismal phylogeny.

Impact and implications

Phenotypic evolution

The collective findings of the above studies provide a phylogenetic framework that enables additional research on several classical problems in angiosperm evolution. For example, there is tremendous interest in reconstructing the patterns of diversification among angiosperms for a variety of attributes including floral and vegetative morphology, metabolism, modes of reproduction, constituent gene families, and genomes18–22,33. This can be accomplished by mapping those characters onto individual trees, tracing their evolution throughout the phylogeny and reconstructing character states for ancestral taxa (e.g. using MacClade 3.07, Ref. 34).

Comparisons of features of families in the ANITA clades, in their phylogenetic context, provide insight into the origin and early diversification of flowering plants18,20,21,33. For example, the vegetative body of the common ancestor of all extant angiosperms probably had a woody habit, vessel-less wood, unilacunar leaf nodes with two traces, leaves with chloranthoid teeth along their margins and no ethereal oils. Flowers of this common ancestor probably had an undifferentiated perianth arranged in more than two cycles or series, perianth appendages that were unfused above the base and anthers that shed pollen towards the center of the flower. Carpels in these flowers were urn-shaped (ascidiate), were not attached to one another (apocarpous) and had margins that did not fuse completely but were closed at maturity by secretions. An herbaceous habit, ethereal oils, wood with vessels, differentiated sepals and petals, and complete carpel closure probably evolved after the origin of flowering plants. These derived features might have contributed to the rapid diversification of later-evolved angiosperm lineages (Table 1). The emerging picture of the earliest angiosperms differs appreciably from previously proposed models for the first angiosperms (reviewed in Refs 35–37).

Recent results from studies of all the seed-plant groups suggest that the extant gymnosperms form a monophyletic group that is the sister lineage to the angiosperms20,36,37 (Fig. 2). This contradicts some earlier models that placed one group of gymnosperms, the Gnetales, as the sister to all angiosperms. Consequently, future clues to the morphological transitions that led to the origin of the angiosperms will come largely from fossil33,36 and comparative molecular-genetic data37,38.

Genetic basis of phenotypic evolution

In general, understanding genotypic or phenotypic diversification requires knowledge of an organismal phylogeny to specify the locations, frequency and directionality of character state changes within a lineage. The phylogenetic results discussed above provide the requisite framework to dissect the evolution of genes, gene families and genomes, and to relate these events to morphological evolution. Molecular and phylogenetic dissection of petal and ovule evolution illustrates this point (additional examples are reviewed in Ref. 38).

It has been argued on the basis of comparative morphology that, among angiosperms, superficially similar petals have evolved several times from either stamens (andropetals) or sepals (bracteopetals)38. How frequently each has occurred can be elucidated only by the combined analysis of comparative morphology and gene-expression and gene-family evolution in the context of angiosperm phylogeny38–40. Recent results indicate that, among the eudicots, separate lineages of AP3 homologs have been recruited in independent

[Fig. 2. Supertree diagram. Group labels: extant gymnosperms; angiosperms; ANITA clades; eudicots. Terminal taxa: other conifers, Pinaceae, Gnetales, Ginkgo, cycads, Amborella, Nymphaeales, ITA, Chloranthales, Ceratophyllum, monocots, Magnoliales, Laurales, Winterales, Piperales, Ranunculales, Proteales, Sabiaceae, Trochodendrales, Buxaceae and Didymelaceae, Caryophyllales, asterids, rosids.]

Fig. 2. Simplified ‘supertree’ based on five recent multigene phylogenies for angiosperms18–22. The supertree is the strict consensus of 36 shortest trees recovered from a branch-and-bound analysis of clades in individual multigene studies (source trees) using matrix representation with parsimony52 and a weighting scheme based on the support values of branches in the source trees (<50% = 1; 51–70% = 2; 71–90% = 3; 91–100% = 4). Support values for clades in the supertree are based on parsimony bootstrap29 (blue) and jackknife25 (red) analyses from two source trees (‘-’ indicates <50% support). Support values from other source trees are listed in Table 2. Relationships among gymnosperms shown here are consistent with recent results from analyses of seed plants36. Abbreviation: ITA clade, Illiciaceae, Schisandraceae, Trimeniaceae and Austrobaileyaceae.
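The matrix representation with parsimony (MRP) coding and the support-based weighting scheme described in the Fig. 2 caption can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used for the figure: the function names and the toy source trees are hypothetical, a support value of exactly 50% (which the caption leaves unassigned) is placed in the lowest weight class, the all-zero outgroup row often added in MRP is omitted, and the parsimony search on the resulting matrix is not shown.

```python
def support_weight(support):
    """Map a branch support percentage to a character weight of 1 to 4."""
    if support <= 50:   # caption: <50% = 1 (exactly 50% placed here by assumption)
        return 1
    if support <= 70:   # caption: 51-70% = 2
        return 2
    if support <= 90:   # caption: 71-90% = 3
        return 3
    return 4            # caption: 91-100% = 4


def mrp_matrix(source_trees):
    """Encode source-tree clades as weighted binary characters.

    source_trees is a list of (taxa_in_tree, clades) pairs, where clades is a
    list of (clade_members, support_percent). For each clade, a taxon is coded
    '1' if it belongs to the clade, '0' if it is in that source tree but
    outside the clade, and '?' if it was not sampled in that source tree.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    weights = []
    for taxa, clades in source_trees:
        for members, support in clades:
            weights.append(support_weight(support))
            for taxon in all_taxa:
                if taxon in members:
                    rows[taxon].append("1")
                elif taxon in taxa:
                    rows[taxon].append("0")
                else:
                    rows[taxon].append("?")
    return {taxon: "".join(codes) for taxon, codes in rows.items()}, weights


if __name__ == "__main__":
    # Hypothetical source trees and support values, for illustration only.
    trees = [
        ({"Amborella", "Nymphaeales", "ITA", "Monocots"},
         [({"ITA", "Monocots"}, 98), ({"Nymphaeales", "ITA", "Monocots"}, 65)]),
        ({"Amborella", "ITA", "Monocots", "Eudicots"},
         [({"Monocots", "Eudicots"}, 45)]),
    ]
    matrix, weights = mrp_matrix(trees)
    for taxon, codes in matrix.items():
        print(f"{taxon:12s} {codes}")
    print("weights:", weights)
```

Each column of the resulting matrix stands for one clade from one source tree, so a weighted parsimony analysis of the matrix favors supertree groupings that are backed by strongly supported source-tree branches.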

evolutionary events to determine petal identity in the Ranunculales and in the Rosids and Asterids39,40. Among the grasses, AP3-homolog expression is conserved in petals but other genetic factors have been altered, converting the familiar petals into lodicules38,41. Although the directionality of change in these cases is fairly uncontroversial, the exact origins and frequency of these transformations remain unclear. Determining these will require testing additional organisms, selected according to the available phylogeny. In addition, an understanding of the general applicability of other aspects of floral development in model organisms will be achieved through future phylogenetically informed research42.

The Arabidopsis INNER NO OUTER (INO) gene has been shown to be essential for the asymmetric growth of ovule outer integuments43. Gene-expression patterns in wild-type and mutant Arabidopsis were compatible with INO acting either to establish abaxial–adaxial polarity in ovules or to promote outer integument outgrowth directly. The phylogenetic results discussed above permit the selection of angiosperm lineages that are radially symmetrical and bitegmic or asymmetrical (polar) and unitegmic in order to test the two models by examining the ancestral function of INO orthologs through comparative gene-expression studies. Tests of these two models through comparative study of expression of INO orthologs isolated from diverse species, selected according to the latest phylogenies, are in progress [R. Kuzoff et al., unpublished (http://www.ou.edu/cas/botany-micro/botany2000/section2/abstracts/28.shtml)].

Table 1. Several lineages (clades) of flowering plants recognized in recent phylogenetic analyses43

Clade | Families (species) | Examples
Amborella | 1 (1) | Amborella trichopoda
Nymphaeales | 2 (81) | Water lilies (Nymphaea spp.)
‘ITA’ [a] | 4 (95) | Star anise (Illicium verum)
Chloranthales | 1 (75) | Chloranthus, Ascarina
Ceratophyllum | 1 (6) | Ceratophyllum
Monocots | 102 (65 000) | Grasses, orchids, palms
Magnoliales | 6 (2700) | Magnolia, tulip tree (Liriodendron tulipifera)
Laurales | 7 (3400) | Sassafras, avocado (Persea americana)
Winterales | 2 (73) | Winter’s bark (Drimys)
Piperales | 4 (3500) | Black pepper (Piper)
Ranunculales | 7 (4400) | Buttercup (Ranunculus spp.), poppy (Papaver spp.)
Proteales | 3 (1600) | Sacred lotus (Nelumbo), plane (Platanus spp.)
Caryophyllales | 26 (9400) | Cactus (Cactaceae), beet (Beta vulgaris), spinach (Spinacia oleracea)
Asterids | 107 (87 000) | Sunflower (Helianthus spp.), carrot, tomato, holly (Ilex spp.), snapdragon (Antirrhinum majus)
Rosids | 149 (77 000) | Maple (Acer spp.), apple, pea, rose, Arabidopsis

[a] ‘ITA’ clade comprises Illiciaceae, Schisandraceae, Trimeniaceae and Austrobaileyaceae.

Genome evolution

A comparison of the molecular phylogeny of group-I introns and overall angiosperm phylogeny suggests that group-I introns have been independently acquired ~1000 times by various angiosperms44. Whether these introns have been acquired independently from fungal donors or through horizontal transfer from plants is unclear but it has important implications for the potential for genetic transfer from genetically engineered crop species to local flora. Resolving this question will depend on expanded analyses of group-I introns in the context of angiosperm phylogeny.

Phylogenetic analysis of genome size among the grasses reveals a predominantly unidirectional trend towards genomic ‘obesity’45. Whether genomes evolve towards greater size in general among angiosperms can now be explored in other well-defined angiosperm lineages.

Genome doubling through polyploidy is a common phenomenon in the history of angiosperms and is especially prevalent in crop species42. The number of polyploid origins and their parentage, consequences for gene evolution and biochemical impact can now be dissected in the context of overall angiosperm phylogeny.

Classification

Several authors have discussed the inadequacies of previously available classification systems for flowering plants9,35,46. Foremost among their concerns is that these classifications are at odds with robust results of published phylogenetic analyses. In an effort to address this need, the Angiosperm Phylogeny Group (APG), a consortium of over 40 plant systematists, has proposed a working classification of angiosperms into orders and some higher-level groupings46. Their provisional classification system was based on results of single-gene4–6 and preliminary multigene19 studies. The additional phylogenetic results summarized above are entirely concordant with their classification system, suggesting that it should be retained and extended. The proposed APG classification is useful for facilitating communication within and among disciplines of biology and presentation of fruitful research to the broader community.

Future directions

Although much of the overall angiosperm phylogeny is now confidently resolved, polytomies (Box 1) among magnoliid clades, Ceratophyllum and the monocots, as well as within the eudicots, require additional study (Fig. 2). The enormous effort that has already been put into these studies might lead to pessimism about the prospects for additional progress. Fortunately, there are several resources that can be brought to bear on the problem, which will clarify relationships further. For example, several available sources of data have not been fully exploited. Morphological data are being reanalyzed in light of the results discussed above to reveal previous misleading homology assessments, enhancing their utility in subsequent analyses and facilitating the incorporation of crucial fossil data33,35,36. Similar analyses of molecular data9,15,22,23 will reveal hidden tendencies in nucleotide substitution and permit the generation of

August 2000, Vol. 5, No. 8

-PLANT August 2000

17/7/00

9:04 am

Page 335

trends in plant science Reviews

Table 2. Recent molecular phylogenetic analyses of angiosperms [a]

| Study (sequences analyzed) | rbcL (Ref. 4) | 18S rDNA (Ref. 6) | atpB (Ref. 23) [b] | phyA, C (Ref. 18) [c] | 17-cp genes (Ref. 22) | 3 genes (rbcL, atpB, 18S rDNA) (Ref. 19) [d] | 5 genes (rbcL, atpB, 18S rDNA, matR, atp1) (Ref. 20) | 5 genes (atpB, 18S rDNA, cox1, rps2, mtSSU) (Ref. 21) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Genomes (genes) | Cp (1) | Nuc (1) | Cp (1) | Nuc (2) | Cp (17) | Nuc (1), Cp (2) | Nuc (1), Cp (2), Mt (2) | Nuc (1), Cp (1), Mt (3) |
| Nucleotides | 1428 | 1855 | 1460 | 2208 | ~13 400 | 4733 | 8733 | 6564 |
| Species | 499 | 228 | 357 | 26 | 21 | 560 | 105 | 51 |
| Amborella is first branch | − | + | + | +, 92%, 83% | +, 69% | +, 65% | +, 90% | +, 89% |
| ANITA clades adjacent to root | − | + (+ some Piperales) | + | +, 86% | +, 94% | +, 71% | +, 97% | +, 92% |
| Nymphaeales | + | + | +, 100% | +, 100% | +, 100% | +, 100% | +, 100% | +, 100% |
| Chloranthales | + | + | +, 59% | 1 sampled | Not sampled | +, 99% | +, 100% | 1 sampled |
| Monocots | + | Almost (− 1 species) | Almost (+ 1 species) | +, 100% | +, 93% | +, 95% | +, 99% | +, 92% |
| Magnoliales | + | + | + | +, 100% | 1 sampled | +, 93% | +, 100% | +, >90% |
| Laurales | + | + | +, 68% | +, 100% | 1 sampled | +, 97% | +, 100% | +, >90% |
| Winterales | + | + | +, 94% | +, 99% | 1 sampled | +, 94% | +, 100% | 1 sampled |
| Piperales | + | − | + | +, 100% | +, 99% | +, 88% | +, 100% | + |
| Eudicots | + | Almost (+ 3 orders) | + | +, 100% | +, 100% | +, 99% | +, 100% | +, 100% |

Rosids and Asterids (cells listed in source order; their column alignment is not recoverable from the extracted layout):
Rosids: +; Almost (+ 1 order); 1 sampled; +, 60%; Not sampled; +
Asterids: +; Almost (+ 1 order); 1 sampled; +, 99%; Not sampled; +, >90%
Unplaced cells: Almost (− 1 order); Not sampled; +, 66%; Not sampled

[a] Clades that are recovered (+) or not recovered (−) and bootstrap support values (Ref. 29) (unless otherwise indicated) are indicated in each analysis. Names for lineages generally follow Ref. 46, except for Winterales and Chloranthales, which were recognized in more recent analyses (Refs 19,20). ‘1 sampled’ indicates clades that were not tested for monophyly, because only one species was sampled.
[b] Although both atpB and rbcL sequences were analyzed (Ref. 23), results summarized here are based on atpB alone.
[c] Bootstrap values were inferred for the clade comprising all species other than Amborella in the separate phyA and phyC trees (Ref. 18); all other bootstrap values in this analysis came from the analysis of concatenated phyA and phyC sequences.
[d] Jackknife values were used (Ref. 25) rather than bootstrap values (Ref. 29) to assess support for clades recovered in this study.
Abbreviations: Nuc, nuclear; Cp, chloroplast; Mt, mitochondrial.
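The percentages in Table 2 are bootstrap (Ref. 29) or jackknife (Ref. 25) support values. As a rough illustration of what such a percentage measures, the Python sketch below resamples alignment columns and tallies how often a clade reappears across replicate trees. The function names and the toy replicate data are hypothetical, and the deletion probability of 1/e for the jackknife is an assumption made for this illustration; the tree search that each study would run on every resampled matrix is deliberately left out.

```python
import math
import random
from collections import Counter


def bootstrap_columns(n_sites, rng):
    """Bootstrap pseudoreplicate: draw n_sites column indices with replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


def jackknife_columns(n_sites, rng, p_delete=math.exp(-1)):
    """Jackknife pseudoreplicate: delete each column independently.

    p_delete = 1/e is an assumed setting for this sketch.
    """
    return [i for i in range(n_sites) if rng.random() >= p_delete]


def clade_support(replicate_clades):
    """Percentage of replicate trees in which each clade appears.

    replicate_clades: one set of clades per replicate tree,
    each clade being a frozenset of taxon names.
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    n_reps = len(replicate_clades)
    return {clade: 100.0 * count / n_reps for clade, count in counts.items()}


# Toy example: four replicate trees summarized by the clades they contain
# (hypothetical taxa A, B, C).
replicates = [
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "C"})},
    {frozenset({"A", "B"})},
]
support = clade_support(replicates)
print(support[frozenset({"A", "B"})])  # 75.0
```

In a real analysis each list of column indices would be used to build a resampled data matrix, a tree would be inferred from it, and the clades of that tree would be passed to `clade_support`.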

more-realistic models of sequence evolution. These clarified models will inform and increase the accuracy of subsequent model-based phylogenetic analyses (Refs 30–32). Also, the molecules used in several of the analyses described above (Refs 18,21,22) and elsewhere (Ref. 47) can be sampled from additional species and combined with existing data for more detailed analyses of relationships within principal angiosperm clades, such as the monocots, eudicots, rosids and asterids. Other complex phylogenies, in the grasses for example, required as many as eight independent data sets before complete resolution with strong support was achieved (Ref. 48). Computational strategies such as compartmentalization (Refs 5,49,50), in which large, well-supported clades are replaced in an analysis by an inferred ancestral state for that clade, also appear to be promising. In fact, an expanded analysis of 26S rDNA (Ref. 47) and other sequences (Ref. 20) using such compartmentalization of angiosperm clades has produced complete resolution among the basal lineages and with stronger support than was achieved previously (M. Zanis et al., pers. commun.). We are optimistic that angiosperm phylogenetics will progress rapidly on its present course, facilitating additional comparative studies and fuller exploitation of knowledge garnered from model research organisms.

Acknowledgements

We apologize to those whose work could not be included in this review owing to space limitations. We thank Olaf Bininda-Emonds, Jim Doyle, Sean Graham, Toby Kellogg, Jessie McAbee, Greg Plunkett, Vincent Savolainen, Doug and Pam Soltis, and Michael Zanis for helpful discussions, and the Katherine Esau Postdoctoral Fellowship and the National Science Foundation (IBN-9983354) for financial support.
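The compartmentalization strategy described under Future directions, in which a well-supported clade is replaced by an inferred ancestral state, can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a Fitch parsimony down-pass as the way to infer the ancestral states, codes ambiguous sites as "N", and uses hypothetical taxon names and sequences. The cited studies (Refs 5,49,50) may reconstruct ancestral states differently.

```python
def fitch_states(tree, seqs):
    """Fitch down-pass: one set of candidate states per site at the root of `tree`.

    tree: a taxon name (str) or a 2-tuple of subtrees.
    seqs: dict mapping taxon name -> aligned sequence (all of equal length).
    """
    if isinstance(tree, str):
        return [{base} for base in seqs[tree]]
    left, right = (fitch_states(child, seqs) for child in tree)
    # Use the intersection where the children agree, otherwise the union.
    return [(a & b) or (a | b) for a, b in zip(left, right)]


def leaves(tree):
    """List the taxon names at the tips of `tree`."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def compartmentalize(seqs, clade_tree, label):
    """Replace every member of a clade by one terminal with its inferred states."""
    ancestral = fitch_states(clade_tree, seqs)
    # Unambiguous sites keep their state; ambiguous sites are coded "N"
    # (a simplification chosen for this sketch).
    placeholder = "".join(
        next(iter(states)) if len(states) == 1 else "N" for states in ancestral
    )
    members = set(leaves(clade_tree))
    reduced = {name: seq for name, seq in seqs.items() if name not in members}
    reduced[label] = placeholder
    return reduced


# Hypothetical alignment: three members of one clade plus one other taxon.
alignment = {"m1": "ACGT", "m2": "ACGA", "m3": "ACTA", "out": "GCGT"}
reduced = compartmentalize(alignment, (("m1", "m2"), "m3"), "clade_M")
print(reduced)  # {'out': 'GCGT', 'clade_M': 'ACNA'}
```

The reduced matrix has fewer terminals, which is what makes a subsequent search over the remaining lineages more tractable; the cost is that the result depends on the assumed monophyly and internal topology of the compartmentalized clade.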


References

1 Dahlgren, R. and Bremer, K. (1985) Major clades of angiosperms. Cladistics 1, 349–368
2 Donoghue, M.J. and Doyle, J.A. (1989) Phylogenetic analysis of angiosperms and relationships of Hamamelidae. In Evolution, Systematics, and Fossil History of the Hamamelidae (Crane, P.R. and Blackmore, S., eds), pp. 17–45, Clarendon Press
3 Nixon, K.C. et al. (1994) A reevaluation of seed plant phylogeny. Ann. MO Bot. Gard. 81, 484–533
4 Chase, M.W. et al. (1993) Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. MO Bot. Gard. 80, 526–580
5 Rice, K.A. et al. (1997) Analysing large data sets: rbcL 500 revisited. Syst. Biol. 46, 554–563
6 Soltis, D.E. et al. (1997) Angiosperm phylogeny inferred from 18S ribosomal DNA sequences. Ann. MO Bot. Gard. 84, 1–49
7 Doyle, J.A. et al. (1994) Integration of morphological and ribosomal RNA data on the origin of angiosperms. Ann. MO Bot. Gard. 81, 419–450
8 Nandi, O.I. et al. (1998) A combined cladistic analysis of angiosperms using rbcL and non-molecular data sets. Ann. MO Bot. Gard. 85, 137–212
9 Chase, M.W. and Cox, A.V. (1998) Gene sequences, collaboration and analysis of large data sets. Aust. Syst. Bot. 11, 215–229
10 Felsenstein, J. (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401–410
11 Kim, J. (1996) General inconsistency conditions for maximum parsimony: effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45, 363–374
12 Hillis, D.M. (1996) Inferring complex phylogenies. Nature 383, 130–131
13 Graybeal, A. (1998) Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47, 9–17
14 Rannala, B. et al. (1998) Taxon sampling and the accuracy of large phylogenies. Syst. Biol. 47, 702–710
15 Soltis, D.E. et al. (1998) Inferring complex phylogenies using parsimony: an empirical approach using three large DNA datasets for angiosperms. Syst. Biol. 47, 32–42
16 Nixon, K. (1999) The parsimony Ratchet: a rapid means for analysing large data sets. Cladistics 15, 407–414
17 Kellogg, E.A. and Juliano, N.D. (1997) The structure and function of Rubisco and their implications for systematic studies. Am. J. Bot. 84, 413–428
18 Mathews, S. and Donoghue, M.J. (1999) The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286, 947–949
19 Soltis, P.S. et al. (1999) Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402–404
20 Qiu, Y-L. et al. (1999) The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402, 404–407
21 Parkinson, C.L. et al. (1999) Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9, 1485–1488
22 Graham, S.W. and Olmstead, R.G. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. (in press)
23 Savolainen, V. et al. (2000) Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49, 306–362
24 Soltis, D.E. et al. Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot. J. Linn. Soc. (in press)
25 Farris, J.S. et al. (1996) Parsimony jackknifing outperforms neighbor-joining. Cladistics 12, 99–124
26 Goloboff, P. (1998) NONA, Computer Program and Software, Version 2.0, Tucuman, Argentina, P. Goloboff
27 Swofford, D.L. (1998) PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods), Version 4.0, Sinauer
28 Huelsenbeck, J.P. and Hillis, D.M. (1996) Parametric bootstrapping in molecular phylogenetics: applications and performance. In Molecular Zoology: Advances, Strategies, and Protocols (Ferraris, J.D. and Palumbi, S.R., eds), pp. 19–45, Wiley–Liss
29 Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791
30 Huelsenbeck, J.P. and Rannala, B. (1997) Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276, 227–232
31 Yang, Z. (1997) How often do wrong models produce better phylogenies? Mol. Biol. Evol. 14, 105–108
32 Lewis, P.O. (1998) Alternatives to parsimony for inferring phylogeny using nucleotide sequence data. In Molecular Systematics of Plants (Vol. II) (Soltis, D.E. et al., eds), pp. 132–163, Kluwer
33 Doyle, J.A. and Endress, P.K. Morphological phylogenetic analysis of basal angiosperms: comparison and combination with molecular data. Int. J. Plant Sci. (in press)
34 Maddison, W.P. and Maddison, D.R. (1992) MacClade: Analysis of Phylogeny and Character Evolution, Version 3.07, Sinauer
35 Doyle, J.A. (1998) Phylogeny of vascular plants. Annu. Rev. Ecol. Syst. 29, 567–599
36 Donoghue, M.J. and Doyle, J.A. (2000) Seed plant phylogeny: demise of the anthophyte hypothesis? Curr. Biol. 10, R106–R109
37 Frohlich, M.W. and Parker, D.S. (2000) The mostly male theory of flower evolutionary origins: from genes to fossils. Syst. Bot. 25, 155–170
38 Lawton-Rauh, A.L. et al. (2000) Molecular evolution of flower development. Trends Ecol. Evol. 15, 144–149
39 Kramer, E.M. et al. (1998) Molecular evolution of genes controlling petal and stamen development: duplication and divergence within the APETALA3 and PISTILLATA MADS-box gene lineages. Genetics 149, 765–783
40 Kramer, E.M. and Irish, V.F. (1999) Evolution of genetic mechanisms controlling petal development. Nature 399, 144–148
41 Ambrose, B.A. et al. (2000) Molecular and genetic analyses of the Silky1 gene reveal conservation in floral organ specification between eudicots and monocots. Mol. Cell 5, 569–579
42 Soltis, D.E. and Soltis, P.S. (2000) Contributions of plant molecular systematics to studies of molecular evolution. Plant Mol. Biol. 42, 45–75
43 Villanueva, J.M. et al. (1999) INNER NO OUTER regulates abaxial–adaxial patterning in Arabidopsis ovules. Genes Dev. 13, 3160–3169
44 Cho, Y. et al. (1998) Explosive invasion of plant mitochondria by a group I intron. Proc. Natl. Acad. Sci. U. S. A. 95, 14244–14249
45 Bennetzen, J.L. and Kellogg, E.A. (1997) Do plants have a one-way ticket to genomic obesity? Plant Cell 9, 1509–1514
46 Angiosperm Phylogeny Group (1998) An ordinal classification for families of flowering plants. Ann. MO Bot. Gard. 85, 531–553
47 Kuzoff, R.K. et al. (1998) The phylogenetic potential of entire 26S rDNA sequences in plants. Mol. Biol. Evol. 15, 251–263
48 Grass Phylogeny Working Group A phylogeny of the grass family (Poaceae), as inferred from eight character sets. In Proceedings of the Second International Conference on the Comparative Biology of the Monocots: Grasses – Systematics and Evolution (Vol. 2) (Jacobs, S.W.L. and Everett, J.E., eds), CSIRO (in press)
49 Mishler, B.D. (1994) Cladistic analysis of molecular and morphological data. Am. J. Phys. Anthropol. 94, 143–156
50 Bininda-Emonds, O.R.P. et al. (1998) Supraspecific taxa as terminals in cladistic analysis: implicit assumptions of monophyly and a comparison of methods. Biol. J. Linn. Soc. 64, 101–133
51 Mabberley, D.J. (1997) The Plant Book: A Portable Dictionary of the Vascular Plants (2nd edn), Cambridge University Press
52 Sanderson, M.J. et al. (1998) Phylogenetic supertrees: assembling the trees of life. Trends Ecol. Evol. 13, 105–109

Robert K. Kuzoff* and Charles S. Gasser are at the Section of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA. *Author for correspondence (tel +1 530 752 3111; fax +1 530 752 3085; e-mail [email protected]).