Molecular Plant • Volume 2 • Number 4 • Pages 738–754 • July 2009
RESEARCH ARTICLE
Molecular Evolution of VEF-Domain-Containing PcG Genes in Plants
Ling-Jing Chen, Zhao-Yan Diao, Chelsea Specht and Z. Renee Sung1
Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720–3102, USA
ABSTRACT
Arabidopsis VERNALIZATION2 (VRN2), EMBRYONIC FLOWER2 (EMF2), and FERTILIZATION-INDEPENDENT SEED2 (FIS2) are involved in vernalization-mediated flowering, vegetative development, and seed development, respectively. Together with Arabidopsis VEF-L36, they share a VEF domain that is conserved in plants and animals. To investigate the evolution of VEF-domain-containing genes (VEF genes), we analyzed sequences related to VEF genes across land plants. To date, 24 full-length sequences from 11 angiosperm families and 54 partial sequences from another nine families have been identified. The majority of the full-length sequences share greatest sequence similarity with, and possess the same major domain structure as, Arabidopsis EMF2. EMF2-like sequences are not only widespread among angiosperms, but are also found in genomic sequences of gymnosperms, a lycophyte, and a moss. No FIS2- or VEF-L36-like sequences were recovered from plants other than Arabidopsis, including rice and poplar, for which whole genomes have been sequenced. Phylogenetic analysis of the full-length sequences showed a high degree of amino acid sequence conservation in EMF2 homologs of closely related taxa. VRN2 homologs are recovered as a clade nested within the larger EMF2 clade. FIS2 and VEF-L36 are recovered in the VRN2 clade. The VRN2 clade may have evolved from an EMF2 duplication event that occurred in the rosids prior to the divergence of the eurosid I and eurosid II lineages. We propose that dynamic changes in genome evolution contribute to the generation of the family of VEF-domain-containing genes. Phylogenetic analysis of the VEF domain alone showed that VEF sequences continued to evolve following EMF2/VRN2 divergence in accordance with species relationships. The existence of EMF2-like sequences in animals and across land plants suggests that a prototype form of EMF2 was present prior to the divergence of the plant and animal lineages.
A proposed sequence of events, based on domain organization and occurrence of intermediate sequences throughout angiosperms, could explain VRN2 evolution from an EMF2-like ancestral sequence, possibly following duplication of the ancestral EMF2. Available data further suggest that VEF-L36 and FIS2 were derived from a VRN2-like ancestral sequence. Thus, the presence of VEF-L36 and FIS2 in a genome may ultimately be dependent upon the presence of a VRN2-like sequence.
Key words: VEF; EMF2; FIS2; VRN2; VEF-L36; Arabidopsis; PcG; phylogeny; evolution.
INTRODUCTION
Identifying genes that act in developmental pathways and determining how they or their interactions are modified throughout organismal evolution is a major focus of the field of evolutionary developmental biology. Understanding how genes and gene networks function during the development of the model plant Arabidopsis thaliana provides a starting point for investigating how characterized developmental pathways may have played a role in the evolution of diverse plant body plans (Irish and Benfey, 2004). The Polycomb Group (PcG) genes play a major role in epigenetic regulation of gene expression. Originally characterized in Drosophila, they encode a conserved group of chromatin proteins found in animals and plants. Structurally different Drosophila PcG proteins form complexes that maintain the repression of target genes. A PcG protein complex composed of four core proteins, Suppressor of zeste 12 (Su(z)12), Extra sex combs (Esc), P55, and Enhancer of zeste (E(z)) (Kuzmichev et al., 2002; Muller et al., 2002), can methylate histone H3 at lysine 27 through the E(z) SET domain, providing a methyl mark for subsequent transcriptional repression and gene silencing (Cao et al., 2002; Czermin et al., 2002;
1 To whom correspondence should be addressed. E-mail zrsung@nature.berkeley.edu, fax (510) 642-4995, tel. (510) 642-6966.
© The Author 2009. Published by the Molecular Plant Shanghai Editorial Office in association with Oxford University Press on behalf of CSPP and IPPE, SIBS, CAS. doi: 10.1093/mp/ssp032, Advance Access publication 19 June 2009. Received 10 March 2009; accepted 25 April 2009.
Muller et al., 2002). Arabidopsis genes structurally similar to Drosophila PcG genes have been reported and their mutants characterized: CURLY LEAF (CLF) (Goodrich et al., 1997), FERTILIZATION-INDEPENDENT SEED DEVELOPMENT1 (FIS1)/MEDEA (MEA) (Grossniklaus et al., 1998; Luo et al., 1999), SWINGER (SWN) (Chanvivattana et al., 2004), FIS3/FERTILIZATION-INDEPENDENT ENDOSPERM (FIE) (Ohad et al., 1999), FERTILIZATION-INDEPENDENT SEED2 (FIS2) (Luo et al., 1999), EMBRYONIC FLOWER2 (EMF2) (Yoshida et al., 2001), VERNALIZATION2 (VRN2) (Gendall et al., 2001), and MULTICOPY SUPPRESSOR OF IRA1 (MSI1) (Hennig et al., 2003). Evidence indicates that these genes encode proteins that form putative PcG complexes involved in maintaining the silencing of Arabidopsis MADS-box genes (Chanvivattana et al., 2004; Sung and Amasino, 2004; Wood et al., 2006). Some PcG genes can be grouped into families based on sequence homology, such as CLF, MEA, and SWN (Chanvivattana et al., 2004) and EMF2, VRN2, and FIS2 (Yoshida et al., 2001). It is possible that these gene families are the result of gene duplication and subsequent diversification from ancestral sequences that were present prior to the divergence of the lineages that ultimately led to plants and animals. Duplication and diversification of nucleotide sequences have been shown to lead to functional innovation across the tree of life (Kim et al., 2004). EMF2 is a core component of the putative PcG complex that represses flowering (Chanvivattana et al., 2004). Loss-of-function mutation in the EMF2 gene leads to elimination of vegetative growth in Arabidopsis (Yang et al., 1995), resulting in early flowering. EMF2 thus may have played a major role in plant survival and the evolution of phenological variability. Protein interactions between EMF2 and three other proteins, CLF (Goodrich et al., 1997), FIE (Kinoshita et al., 2001), and MSI1 (Hennig et al., 2003), suggest that they function as a protein complex in mediating floral repression.
The putative EMF2/CLF or SWN/FIE/MSI1 complex represses the flower MADS-box genes AGAMOUS (AG), APETALA3 (AP3), and PISTILLATA (PI) during vegetative development (Moon et al., 2003; Calonje et al., 2008). CLF also represses flowering time genes, such as FLOWERING LOCUS T (FT) and AGAMOUS-LIKE 19 (AGL19) (Schonrock et al., 2006; Jiang et al., 2008). FIS2 is a core component of the putative PcG complex FIS2/MEDEA (MEA)/FIE/MSI1 that regulates Arabidopsis seed development via repression of PHERES1 during gametophyte and endosperm development (Kohler et al., 2003). VRN2 is a core component of another putative PcG complex, VRN2/CLF or SWN/FIE/MSI1, that induces flowering in response to vernalization via the regulation of FLOWERING LOCUS C (FLC) (Sung and Amasino, 2004; Wood et al., 2006). It appears that the two groups of plant PcG genes, CLF-MEA-SWN and EMF2-VRN2-FIS2, have co-evolved to form multi-protein complexes that target different gene regulatory networks (Calonje and Sung, 2006). The molecular similarity of the VEF genes suggests that they are related and may be the result of a historic gene duplication event followed by diversification. To understand how the Arabidopsis VEF gene family evolved, we investigated homologs of this gene family in Arabidopsis and other land plants. In this paper, we identified 85 partial and full-length sequences from land plants with a taxonomic focus on flowering plants. Our results suggest that EMF2 is the most plesiomorphic form of the gene and may have acted as a prototype in the generation of the VEF gene family. Intragenic sequence duplication, deletion/insertion, and intergenic exon shuffling could account for the structural and functional diversification of the VEF genes from an EMF2-like ancestor. We propose that VRN2 evolved from an EMF2-like ancestor, and that VEF-L36 and FIS2 were derived from a VRN2-like ancestral sequence in Arabidopsis and possibly in other angiosperms.
RESULTS
Domain Organization in Arabidopsis VEF Family Proteins
Using a deduced EMF2 amino acid sequence to BLAST against GenBank, four full-length Arabidopsis proteins, EMF2 (At5g51230), FIS2 (At2g35670), VRN2 (At4g16845), and VEF-L36 (At4g16810), were recovered with significant e-values (<2e–12). In addition to the common VEF domain that defines this gene family (Figure 1), EMF2, VRN2, and FIS2 share a C2H2 domain. EMF2 and VRN2 further share an N-terminal domain (N-ter) that is present in the Drosophila homolog, Su(z)12, but is absent in FIS2 and VEF-L36. However, VRN2 differs from EMF2 in lacking sequence corresponding to EMF2 exon 5
Figure 1. Domain Organization of VEF-Domain-Containing Proteins of Arabidopsis. Blue block: EMF2 N-terminal domain (N-ter), which is composed of two parts: an N-terminal cap (cap) and the remaining part (N-ter Δcap) as seen in VRN2. Orange block: EMF2-specific E5–10 domain. Green block: C2H2 zinc finger domain. Red block: VEF domain, which is uniquely located at the N-terminus of VEF-L36. Pink block: EMF2/VRN2-specific E15–17 domain. Light-blue block: VEF-L36-specific repeat domain. Dark-green block: VEF-L36-specific L36 domain. Yellow block: FIS2-specific S-rich domain. Purple block: FIS2 C-terminal tail.
through exon 10 (E5–10), as well as a stretch of sequence at the N-terminus called the N-terminal cap (N-ter cap). VRN2 also has a 52-aa repeat in the C-terminus that is absent in EMF2. Despite these differences, VRN2 and EMF2 share a similar overall domain organization and 45% amino acid sequence identity. First reported as EMF2-like 1 by Yoshida et al. (2001), VEF-L36 is a hypothetical protein, based on its predicted gene structure from TAIR (www.Arabidopsis.org/servlets/TairObject?id=128616&type=locus). It shares only the VEF domain with the other VEF proteins (Figure 1). Unlike in EMF2, VRN2, and FIS2, its VEF domain is located at the N-terminus, and its C-terminus comprises a sequence with low similarity to ribosomal protein L36. There is also a stretch of repeat sequence in the middle region that is not found in any of the other VEF genes.
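The domain comparison in this section can be restated as simple set operations. The sketch below is illustrative only: the domain labels are transcribed from the Figure 1 description, the dictionary `DOMAINS` and the helper `shared_domains` are hypothetical names introduced here, and treating each protein as an unordered set ignores domain order and position.

```python
# Domain content of the four Arabidopsis VEF proteins, as described for Figure 1.
# Simplification: "N-ter" denotes the N-terminal domain without its cap.
DOMAINS = {
    "EMF2": {"N-ter cap", "N-ter", "E5-10", "C2H2", "E15-17", "VEF"},
    "VRN2": {"N-ter", "C2H2", "E15-17", "VEF", "52-aa repeat"},
    "FIS2": {"C2H2", "S-rich", "VEF", "C-terminal tail"},
    "VEF-L36": {"VEF", "repeat", "L36"},
}


def shared_domains(*names):
    """Return the domains present in every one of the named proteins."""
    return set.intersection(*(DOMAINS[name] for name in names))


# Domains common to the whole family, and those present in EMF2 but not VRN2.
family_core = shared_domains(*DOMAINS)
emf2_only_vs_vrn2 = DOMAINS["EMF2"] - DOMAINS["VRN2"]

print(sorted(family_core))
print(sorted(emf2_only_vs_vrn2))
```

Under these assumptions, the only domain common to all four proteins is VEF, and the regions present in EMF2 but absent from VRN2 are the N-ter cap and E5–10, in line with the description above.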
Widespread Occurrence of EMF2/VRN2 Homologs among Land Plants
To investigate the distribution of homologs of VEF genes in plants, we used VEF-containing proteins to perform BLAST searches against several sequence databases (see Methods). Using the Arabidopsis EMF2 amino acid sequence to BLAST against GenBank, 10 full-length homologs were returned: eight from grasses (Poaceae), one from Carica (Caricaceae), and one from Silene (Caryophyllaceae) (Table 1). The grass homologs included one from wheat (Triticum aestivum), three from barley (Hordeum vulgare), two from maize (Zea mays), and two from rice (Oryza sativa). The Silene homolog is from Silene latifolia, a member of the core eudicots. The Chromatin Database (www.chromdb.org/) identifies three full-length sequences from poplar (Populus trichocarpa: VEF901, 902, and 904) and one partial sequence (VEF903). The full-length sequences are hereafter referred to as PtEMF2_1 for VEF901, PtEMF2_2 for VEF902, and PtEMF2_4 for VEF904 (see Table 1A). We also sequenced six full-length cDNAs from species in five different angiosperm families representing early-diverging monocots (Acorus; Acorales), higher monocots (Asparagus, Yucca; Asparagales), basal eudicots (Eschscholzia; Papaveraceae), and the asterids (Solanum; Solanaceae) (see Methods). The Kazusa DNA Research Institute provided one full-length sequence from Lotus japonicus (Fabaceae). Using deduced amino acid sequences of these cDNAs to BLAST against GenBank, the same homologs were returned as when using the Arabidopsis EMF2 sequence. Using full-length VRN2, VEF-L36, and FIS2 to BLAST against GenBank, we found mostly the same sequences as described above, likely due to sequence homology in the VEF domain. Pair-wise identity scores of these full-length sequences indicate that non-Arabidopsis sequences display higher identity to Arabidopsis EMF2 and VRN2 than to FIS2 and VEF-L36 (Table 2).
Among these homologs, VEF-L36 shows lowest pair-wise identity to other members (average score: 8), followed by FIS2 (average score: 17). Both show higher identity to VRN2 than to other EMF2/VRN2 homologs.
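For readers unfamiliar with pair-wise identity scores, the following sketch shows one common way to compute percent identity from two rows of an alignment. It is not the procedure used for Table 2, which is described in Methods; the function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs must be rows from the same alignment (equal length).
    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments (hypothetical, not taken from any VEF protein).
score = percent_identity("MKT-LA", "MKSALA")
print(round(score, 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which yields lower scores for pairs with large insertions or deletions.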
Sequence alignment of the 24 full-length proteins was performed using MUSCLE (www.ebi.ac.uk). All non-Arabidopsis full-length sequences possess N-terminal (N-ter), C2H2, and VEF domains homologous to those of EMF2/VRN2 sequences (Figure 2), indicating a high conservation of domain organization. These sequences are not likely to be orthologs of FIS2 or VEF-L36, due to both the presence of the EMF2/VRN2-characteristic N-ter domain and the absence of either the S-rich domain found in FIS2 or the L36 domain characteristic of VEF-L36 (Figure 1). Sixteen full-length, non-Arabidopsis sequences contain the complete N-ter, including the N-ter cap: ZmEMF2_1, ZmEMF2_2, HvEMF2_4, HvEMF2_5, LjEMF2, OsEMF2_4, AaEMF2, YfEMF2, AoEMF2, LeEMF2_1, SlEMF2, LjEMF2, PtEMF2_1, PtEMF2_2, TaEMF2_3, CpEMF2. Five sequences, EcEMF2_2, OsEMF2_9, HvEMF2_1, ZmEMF2-2, and PtEMF2_4, lack the N-ter cap. One sequence from barley, HvEMF2_1, lacks both the N-ter cap and the VEF domain. Together with Arabidopsis EMF2 and VRN2, these full-length EMF2/VRN2 sequences represent 14 species from 11 angiosperm families (Acoraceae, Asparagaceae, Agavaceae, Poaceae, Caryophyllaceae, Fabaceae, Brassicaceae, Solanaceae, Salicaceae, Caricaceae, and Papaveraceae). No discernible FIS2 or VEF-L36 orthologs were recovered from rice or poplar, despite the availability of full genomic sequences. In addition to the full-length sequences, we found ~140 incomplete sequences showing significant homology to three EMF2 domains in various genomic databases (see Methods). After the elimination of identical sequences, 54 new sequences homologous to one or more EMF2 domains were identified (Table 1B): (1) 9 ESTs possess N-terminal domain sequences, (2) 16 possess C2H2 domain sequences, and (3) 36 possess VEF domain sequences, from nine additional angiosperm families (Malvaceae, Vitaceae, Liliaceae, Vitaceae, Nymphaeaceae, Ranunculaceae, Asteraceae, Bromeliaceae, and Euphorbiaceae) (Table 1 and Supplemental Figure 1).
Altogether, 78 sequences (24 full-length and 54 partial sequences) were identified from 20 angiosperm families. Outside of the angiosperms, we identified two gymnosperm ESTs from Pinaceae sharing homology with the EMF2 C-terminal domain (Supplemental Figure 2B and 2C), one each in Pinus taeda (pine) and Picea engelmannii (spruce), and two individual ESTs from the lycophyte species Selaginella moellendorffii (Table 1C). One Selaginella partial sequence (SdEMF2p_1) contained both N-ter and C2H2 domains, showing 44% and 39% identity, respectively, to the corresponding domains of EMF2. The other Selaginella sequence (SdEMF2p) contained only the VEF domain, showing 58% identity to the VEF domain of EMF2 in a 145-aa region of overlap (Table 1C and Supplemental Figure 2A). The Chromatin Database yielded three full-length sequences homologous to EMF2 from Physcomitrella patens (Bryophyta; moss), PpEMF2_1, _2, _3 (Table 1C). Despite low sequence similarity to Arabidopsis EMF2 (~25%), the moss sequences possess N-ter, C2H2, and VEF domains. These findings that EMF2/VRN2 homologs exist in lycophytes and mosses and have similar domain structure to modern
Table 1. Full-Length and Partial Sequences of VEF Gene Homologs.
(A) Full-length VEF gene homologs from Angiosperm.
Name | Family | Plant | Accession #
AaEMF2 | Acoraceae | Acorus americanus, sweet flag | GenBank: ABI99480
AoEMF2 | Asparagaceae | Asparagus officinalis, asparagus | GenBank: ABD85301
AtEMF2 | Brassicaceae | Arabidopsis thaliana | TAIR: AT5G51230
CpEMF2 | Caricaceae | Carica papaya | CoGe: Chr Supercontig_13 21118352–2159309
EcEMF2_1 | Papaveraceae | Eschscholzia californica, California poppy | GenBank: ABD98790
EcEMF2_2 | Papaveraceae | Eschscholzia californica, California poppy | GenBank: ABD98791
FIS2_692 | Brassicaceae | Arabidopsis thaliana | TAIR: AT2G35670
HvEMF2_1 | Poaceae | Hordeum vulgare, barley | GenBank: BAD99132
HvEMF2_4 | Poaceae | Hordeum vulgare, barley | GenBank: BAD99131
HvEMF2_5 | Poaceae | Hordeum vulgare, barley | GenBank: BAD99131
LeEMF2_1 | Solanaceae | Lycopersicon esculentum | GenBank: ABI99480
LjEMF2 | Fabaceae | Lotus japonicus | Legume database
OsEMF2_4 | Poaceae | Oryza sativa, rice | TIGR: LOC_Os04g08034
OsEMF2_9 | Poaceae | Oryza sativa, rice | TIGR: LOC_Os09g13630
PtEMF2_1 | Salicaceae | Populus trichocarpa, cottonwood | ChromDB: VEF901
PtEMF2_2 | Salicaceae | Populus trichocarpa, cottonwood | ChromDB: VEF902
PtEMF2_4 | Salicaceae | Populus trichocarpa, cottonwood | ChromDB: VEF904
SlEMF2 | Caryophyllaceae | Silene latifolia, white campion | GenBank: BAD93353
TaEMF2_3 | Poaceae | Triticum aestivum, wheat | GenBank: AAX78232
VRN2_445 | Brassicaceae | Arabidopsis thaliana | TAIR: AT4G16845
VEF_L36 | Brassicaceae | Arabidopsis thaliana | TAIR: AT4G16810
YfEMF2 | Yuccaceae | Yucca filamentosa, Yucca | GenBank: ABD85300
ZmEMF2_1 | Poaceae | Zea mays, maize | ChromDB: VEF101
ZmEMF2_2 | Poaceae | Zea mays, maize | ChromDB: VEF102
(B) EMF2/VRN2-related ESTs from Angiosperm.
N-terminal (nine ESTs)
Name | Family | Plant | Accession #
CaEMF2p | Solanaceae | Capsicum annuum, pepper | GenBank: CA847455
GmEMF2p_3 | Fabaceae | Glycine max, soybean | TIGR: TC221104
GmEMF2p_4 | Fabaceae | Glycine max, soybean | TIGR: TC211671
GrEMF2p | Malvaceae | Gossypium barbadense, cotton | TIGR: TC40052
LsEMF2p_1 | Asteraceae | Lactuca saligna, lettuce | TIGR: TA10917_4236
MtEMF2p | Fabaceae | Medicago truncatula | TIGR: TC108897
SbEMF2p_2 | Poaceae | Sorghum bicolor, sorghum | TIGR: TA29013_4558
VvEMF2p_3 | Vitaceae | Vitis vinifera, grape | GenBank: CF609577
ZmEMF2p_3 | Poaceae | Zea mays, maize | TIGR: CD436196
C2H2 zinc finger (16 ESTs)
Name | Family | Plant | Accession #
CcEMF2p_1 | Rubiaceae | Coffea canephora | TIGR: TA7702_49390
CsEMF2p_1 | Asteraceae | Centaurea solstitialis | TIGR: TA4722_347529
CtEMF2p | Asteraceae | Carthamus tinctorius | TIGR: TA2823_4222
EeEMF2p | Euphorbiaceae | Euphorbia esula | TIGR: TA17942_3993
GmEMF2p_3 | Fabaceae | Glycine max, soybean | TIGR: TC221104
GtrEMF2p | Asteraceae | Gerbera hybrid cv. Terra Regina | GenBank: AJ759904
Table 1. Continued
C2H2 zinc finger (16 ESTs)
Name | Family | Plant | Accession #
LeEMF2p_2 | Solanaceae | Lycopersicon esculentum, tomato | GenBank: AW038171
SbEMF2p_2 | Poaceae | Sorghum bicolor, sorghum | TIGR: TA29013_4558
ScEMF2p | Poaceae | Secale cereale, cereal rye | GenBank: BE587348
SoEMF2p_2 | Poaceae | Saccharum officinarum, sugarcane | TIGR: TA38345_4547
SoEMF2p_3 | Poaceae | Saccharum officinarum, sugarcane | TIGR: TC71329
SoEMF2p_1 | Poaceae | Saccharum officinarum, sugarcane | GenBank: CA098901
TaEMF2p_2 | Poaceae | Triticum aestivum, wheat | GenBank: BJ211655
ToEMF2p_1 | Asteraceae | Taraxacum officinale | TIGR: TA5836_50225
VvEMF2p_1 | Vitaceae | Vitis vinifera, grape | GenBank: CN006883
ZmEMF2p_4 | Poaceae | Zea mays, maize | TIGR: TA193846_4577
VEF domain (36 ESTs)
Name | Family | Plant | Accession #
AcEMF2p | Liliaceae | Allium cepa | GenBank: CF443745
AfEMF2p | Ranunculaceae | Aquilegia formosa | TIGR: TA14166_338618
AnanasEMF2p | Bromeliaceae | Ananas comosus | GenBank: DT339533
BnEMF2p_1 | Brassicaceae | Brassica napus | GenBank: CX194398
BnEMF2p_2 | Brassicaceae | Brassica napus | GenBank: CX188412
CcEMF2p | Rubiaceae | Coffea canephora | TIGR: TA7701_49390
CiEMF2p_1 | Asteraceae | Cichorium intybus | GenBank: EH708467
CiEMF2p_2 | Asteraceae | Cichorium intybus | TIGR: TA5136_13427
CsEMF2p | Asteraceae | Centaurea solstitialis | GenBank: EH782846
EeEMF2p | Euphorbiaceae | Euphorbia esula | TIGR: TA17942_3993
GhEMF2p_1 | Malvaceae | Gossypium hirsutum, cotton | GenBank: DW229901
GhEMF2p_2 | Malvaceae | Gossypium hirsutum, cotton | TIGR: TA37052_3635
GhEMF2p_3 | Malvaceae | Gossypium hirsutum, cotton | TIGR: TA35411_3635
GmEMF2p_1 | Fabaceae | Glycine max, soybean | TIGR: TA61896_3847
HaEMF2p_1 | Asteraceae | Helianthus annuus, sunflower | GenBank: CD848472
HpEMF2p_1 | Asteraceae | Helianthus paradoxus, sunflower | GenBank: EL487885
LeEMF2p_3 | Solanaceae | Lycopersicon esculentum, tomato | GenBank: BI932726
LsEMF2p | Asteraceae | Lactuca saligna, lettuce | TIGR: TA3490_75948
LvEMF2p | Asteraceae | Lactuca virosa, wild lettuce | GenBank: DW160707
MeEMF2p | Euphorbiaceae | Manihot esculenta, cassava | GenBank: DV449784
MtEMF2p | Fabaceae | Medicago truncatula | TIGR: TC108897
NaEMF2p | Nymphaeaceae | Nuphar advena, yellow pondlily | FGP: nad03-13ms2-e08
NtEMF2p | Solanaceae | Nicotiana tabacum, tobacco | GenBank: EB678277
PsEMF2p | Fabaceae | Pisum sativum, pea | GenBank: AAX47184
SbEMF2p_1 | Poaceae | Sorghum bicolor, sorghum | TIGR: TA34517_4558
SbEMF2p_3 | Poaceae | Sorghum bicolor, sorghum | TIGR: TA35158_4558
LeEMF2p | Solanaceae | Solanum lycopersicum, tomato | GenBank: AW038171
SoEMF2p_1 | Poaceae | Saccharum officinarum, sugarcane | GenBank: CA098901
SoEMF2p_2 | Poaceae | Saccharum officinarum, sugarcane | TIGR: TA38345_4547
SoEMF2p_4 | Poaceae | Saccharum officinarum, sugarcane | GenBank: CA098901
StEMF2p_2 | Solanaceae | Solanum tuberosum | TIGR: TA35890_4113
StEMF2p_3 | Solanaceae | Solanum tuberosum | GenBank: BQ505017
TaEMF2p_1 | Poaceae | Triticum aestivum, wheat | TIGR: TA70383_4565
VvEMF2p_2 | Vitaceae | Vitis vinifera, grape | TIGR: TA47215_29760
Table 1. Continued
VEF domain (36 ESTs)
Name | Family | Plant | Accession #
VvEMF2p_4 | Vitaceae | Vitis vinifera, grape | GenBank: AM447481
ZmEMF2p_4 | Poaceae | Zea mays, maize | TIGR: TA193846_4577
(C) EMF2/VRN2 homologs from Gymnosperm, Spikemoss, and moss.
Gymnosperm
Name | Family | Plant | Accession #
PeEMF2p | Pinaceae | Picea engelmannii, spruce | TIGR: TA1969_373101
PlEMF2p | Pinaceae | Pinus taeda, pine | GenBank: CO368996
Lycophyte
Name | Family | Plant | Accession #
SdEMF2p | Selaginellaceae | Selaginella moellendorffii, Spikemoss | gnl|050718cr339|1588846_1
SdEMF2p_1 | Selaginellaceae | Selaginella moellendorffii, Spikemoss | gnl|050718cr339|1588846_2
Moss
Name | Family | Plant | Accession #
PpEMF2_1 | Funariaceae | Physcomitrella patens, moss | ChromDB: VEF1501
PpEMF2_2 | Funariaceae | Physcomitrella patens, moss | ChromDB: VEF1502
PpEMF2_3 | Funariaceae | Physcomitrella patens, moss | ChromDB: VEF1503
Note: ‘p’ in the sequence name stands for partial sequence. The letters in the sequence name stand for the following plants: Aa: Acorus americanus, Ac: Allium cepa, Af: Aquilegia formosa, Ao: Asparagus officinalis, At: Arabidopsis thaliana, Bn: Brassica napus, Ca: Capsicum annuum, Cc: Coffea canephora, Ci: Cichorium intybus, Cp: Carica papaya, Cs: Centaurea solstitialis, Ct: Carthamus tinctorius, Ec: Eschscholzia californica, Ee: Euphorbia esula, Gh: Gossypium hirsutum, Gm: Glycine max, Gr: Gossypium barbadense, Gtr: Gerbera, Ha: Helianthus annuus, Hp: Helianthus paradoxus, Hv: Hordeum vulgare, Le: Lycopersicon esculentum, Lj: Lotus japonicus, Ls: Lactuca saligna, Lv: Lactuca virosa, Me: Manihot esculenta, Mt: Medicago truncatula, Na: Nuphar, Nt: Nicotiana tabacum, Os: Oryza sativa, Pe: Picea engelmannii, Pl: Pinus taeda, Pp: Physcomitrella patens, Ps: Pisum sativum, Pt: Populus trichocarpa, Sb: Sorghum bicolor, Sc: Secale cereale, Sd: Spikemoss, Sl: Silene latifolia, So: Saccharum officinarum, St: Solanum tuberosum, Ta: Triticum aestivum, To: Taraxacum officinale, Vv: Vitis vinifera, Yf: Yucca filamentosa, Zm: Zea mays.
angiosperm EMF2 indicate that EMF2 was likely to have been present in the genomes of early land plants (Supplemental Figure 2D).
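The sequence counts reported in this section can be cross-checked against the total of 85 sequences cited in the Introduction. The short tally below is only a bookkeeping sketch; the dictionary layout and variable names are introduced here for illustration, while the counts themselves are those stated in the text.

```python
# Sequence counts as stated in the text, keyed by (plant group, sequence type).
counts = {
    ("angiosperm", "full-length"): 24,
    ("angiosperm", "partial"): 54,
    ("gymnosperm", "partial"): 2,
    ("lycophyte", "partial"): 2,
    ("moss", "full-length"): 3,
}

# Angiosperm subtotal and overall land-plant total.
angiosperm_total = sum(n for (group, _), n in counts.items() if group == "angiosperm")
land_plant_total = sum(counts.values())

print(angiosperm_total, land_plant_total)
```

The angiosperm subtotal reproduces the 78 sequences given above, and adding the gymnosperm, lycophyte, and moss sequences gives 85.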
Sequence Comparison of EMF2/VRN2 Class Homologs
Predicted full-length and partial EMF2/VRN2 protein homologs were aligned using MUSCLE (see Methods).
N-terminal (N-ter) Domain
An N-terminal domain for Arabidopsis EMF2 was defined by Yoshida et al. (2001) as a fragment spanning amino acids 47 to 81 (Figure 2A). The domain is also found in VRN2 and in the Drosophila Su(z)12 protein. Our alignment of the full-length sequences from all identified EMF2/VRN2 class homologs shows that a larger area is conserved across land plants, starting from the first amino acid of EMF2 to the end of exon 4 (aa 81), and is hereafter referred to as the N-ter domain (Figure 2A). Relative to EMF2, VRN2 has an abbreviated N-ter domain, starting translation from a methionine (M) corresponding to the second M of EMF2. The sequence between the two methionines of EMF2 is referred to as the N-ter cap. EMF2/VRN2 homologs of the monocots Acorus, Yucca, and Asparagus, and of the basal eudicot Eschscholzia, all contain the N-ter cap (Figure 2A), suggesting that the angiosperm ancestral sequence may have had both M sites, similar to Arabidopsis EMF2. Indeed, Selaginella SdEMF2p_1 and the Physcomitrella sequence PpEMF2_3 (VEF1503) have an N-ter cap (Supplemental Figure 2D), although the sequences and lengths are divergent from those found in the identified angiosperm sequences. In some sequences, the second M of the N-ter cap is replaced with a different amino acid; for example, the second M of SlEMF2 is replaced by an S (Figure 2A). In species with two or more EMF2 class homologous sequences found so far, at least one sequence has such a cap, such as rice (OsEMF2_4 vs. OsEMF2_9), maize (ZmEMF2_1 vs. ZmEMF2_2), poplar (PtEMF2_1 and PtEMF2_2 vs. PtEMF2_4), and California poppy (EcEMF2_1 vs. EcEMF2_2) (Figure 2A and Supplemental Figure 1A). The early land plants also possess at least one sequence with the N-ter cap (Supplemental Figure 2A and 2D).
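The definition of the N-ter cap as the stretch between the two methionines can be illustrated with a toy example. The sketch below is a deliberate simplification: it treats the next methionine in the raw sequence as the "second M", whereas in the analysis above that position is identified by alignment with the VRN2 start. The function name `split_nter_cap` and the example peptide are hypothetical and not taken from any VEF protein.

```python
def split_nter_cap(protein):
    """Split a protein into (cap, remainder) at its second methionine.

    Assumption: the cap runs from the initial M up to, but not including,
    the next M in the sequence. Returns an empty cap if no second M exists.
    """
    if not protein.startswith("M"):
        raise ValueError("sequence is expected to start with methionine")
    second_m = protein.find("M", 1)
    if second_m == -1:
        return "", protein
    return protein[:second_m], protein[second_m:]


# Hypothetical peptide used only to show the behaviour.
cap, remainder = split_nter_cap("MASGKLMPEQVR")
print(cap, remainder)
```

A sequence in which the second M has been substituted, as described for SlEMF2, would not be split correctly by this naive approach, which is one reason an alignment-based definition is preferable.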
E5–10 Domain
VRN2 is missing most of the amino acid sequence corresponding to EMF2 exon 5 through exon 10 (E5–10). Comparison of
Table 2. Pair-Wise Alignment Scores of Full-Length VEF Protein Homologs.
Sequence name [1] | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27
1. AaEMF2 | -
2. AoEMF2 | 61 | -
3. AtEMF2 | 54 | 55 | -
4. CpEMF2 | 57 | 58 | 68 | -
5. EcEMF2_1 | 53 | 55 | 53 | 54 | -
6. EcEMF2_2 | 52 | 52 | 51 | 51 | 57 | -
7. FIS2_692 | 18 | 19 | 16 | 18 | 19 | 18 | -
8. HvEMF2_1 | 42 | 49 | 40 | 43 | 40 | 43 | 12 | -
9. HvEMF2_4 | 52 | 60 | 49 | 52 | 50 | 49 | 17 | 78 | -
10. HvEMF2_5 | 42 | 46 | 41 | 40 | 37 | 39 | 16 | 54 | 58 | -
11. LeEMF2_1 | 58 | 59 | 58 | 62 | 53 | 52 | 18 | 41 | 50 | 42 | -
12. LjEMF2 | 40 | 40 | 42 | 46 | 40 | 38 | 17 | 32 | 37 | 34 | 43 | -
13. OsEMF2_4 | 45 | 50 | 42 | 45 | 45 | 43 | 17 | 53 | 61 | 54 | 43 | 32 | -
14. OsEMF2_9 | 53 | 61 | 50 | 52 | 52 | 52 | 18 | 71 | 82 | 61 | 51 | 37 | 62 | -
15. PpEMF2_1 | 25 | 25 | 27 | 26 | 25 | 25 | 13 | 17 | 27 | 23 | 27 | 21 | 22 | 24 | -
16. PpEMF2_2 | 24 | 23 | 26 | 23 | 23 | 20 | 10 | 15 | 21 | 20 | 24 | 15 | 22 | 23 | 69 | -
17. PpEMF2_3 | 26 | 28 | 29 | 28 | 29 | 26 | 15 | 20 | 29 | 22 | 28 | 22 | 28 | 29 | 55 | 53 | -
18. PtEMF2_1 | 57 | 56 | 63 | 71 | 53 | 51 | 17 | 42 | 49 | 39 | 59 | 41 | 44 | 51 | 23 | 25 | 28 | -
19. PtEMF2_2 | 56 | 58 | 63 | 70 | 53 | 50 | 19 | 40 | 50 | 42 | 60 | 42 | 42 | 51 | 27 | 21 | 32 | 84 | -
20. PtEMF2_4 | 61 | 60 | 53 | 54 | 52 | 58 | 30 | 25 | 49 | 43 | 57 | 44 | 43 | 50 | 27 | 31 | 30 | 56 | 55 | -
21. SlEMF2 | 53 | 53 | 57 | 62 | 50 | 47 | 18 | 35 | 45 | 39 | 58 | 44 | 39 | 47 | 23 | 19 | 27 | 57 | 58 | 49 | -
22. TaEMF2_3 | 51 | 58 | 49 | 51 | 51 | 48 | 18 | 80 | 93 | 57 | 49 | 35 | 60 | 81 | 27 | 23 | 29 | 48 | 50 | 48 | 45 | -
23. VEF_L36 | 8 | 8 | 8 | 8 | 8 | 7 | 7 | 5 | 8 | 7 | 8 | 6 | 8 | 8 | 8 | 9 | 7 | 8 | 8 | 12 | 8 | 8 | -
24. VRN2_445 | 46 | 47 | 45 | 45 | 43 | 46 | 31 | 20 | 44 | 34 | 48 | 36 | 37 | 44 | 29 | 29 | 31 | 45 | 44 | 51 | 42 | 44 | 14 | -
25. YfEMF2 | 61 | 82 | 56 | 58 | 57 | 52 | 18 | 50 | 58 | 47 | 59 | 40 | 49 | 60 | 25 | 24 | 28 | 57 | 59 | 60 | 53 | 57 | 9 | 47 | -
26. ZmEMF2_1 | 51 | 57 | 48 | 51 | 48 | 47 | 16 | 68 | 75 | 58 | 50 | 36 | 60 | 77 | 23 | 23 | 27 | 47 | 48 | 48 | 46 | 78 | 8 | 43 | 55 | -
27. ZmEMF2_2 | 55 | 59 | 51 | 51 | 51 | 52 | 17 | 71 | 80 | 60 | 53 | 39 | 62 | 81 | 24 | 22 | 29 | 51 | 51 | 48 | 49 | 80 | 8 | 44 | 59 | 80 | -
Average score | 46 | 49 | 46 | 48 | 44 | 43 | 17 | 42 | 51 | 41 | 47 | 35 | 43 | 51 | 26 | 25 | 28 | 47 | 47 | 46 | 43 | 51 | 8 | 40 | 49 | 49 | 51
Note: 1. The number listed in the top line represents sequence with same number that is listed in the first column. Calculation of pair-wise alignment scores was described in Methods. Average scores were calculated as the sum of the individual score in one category divided by 26. Among these homologs, VEF-L36 showed lowest identity to other members (average score: 8), followed by FIS2 (average score: 17). On the other hand, both showed higher identity to VRN2 than to other EMF2/VRN2 homologs (pair-wise alignment score between VEF-L36 and VRN2: 14, pair-wise alignment score between FIS2 and VRN2: 31). The average pair-wise alignment score of other EMF2/VRN2 members was ~44, calculated as the sum of the average scores (excluding 8 and 17) divided by 25.
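The averaging rule in the note to Table 2 (sum of a sequence's pair-wise scores divided by the number of other sequences) can be sketched as follows. For brevity the example uses only the first four sequences of Table 2, so the divisor is 3 rather than 26 and the resulting averages are not those of the table; the function name `average_scores` and the lower-triangular list layout are assumptions made for this illustration.

```python
# First four sequences of Table 2 and their lower-triangular pair-wise scores.
names = ["AaEMF2", "AoEMF2", "AtEMF2", "CpEMF2"]
lower = [
    [],            # AaEMF2
    [61],          # AoEMF2 vs AaEMF2
    [54, 55],      # AtEMF2 vs AaEMF2, AoEMF2
    [57, 58, 68],  # CpEMF2 vs AaEMF2, AoEMF2, AtEMF2
]


def average_scores(names, lower):
    """Average each sequence's scores against all other sequences.

    Each score in the lower triangle is shared by two sequences, so it is
    added to both totals; the divisor is the number of other sequences (n - 1).
    """
    n = len(names)
    totals = [0] * n
    for i, row in enumerate(lower):
        for j, score in enumerate(row):
            totals[i] += score
            totals[j] += score
    return {name: totals[k] / (n - 1) for k, name in enumerate(names)}


averages = average_scores(names, lower)
for name, value in averages.items():
    print(name, round(value, 1))
```

Applied to the full 27 x 27 matrix, the same procedure divides by 26, as stated in the note.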
EMF2 and VRN2 genomic DNA revealed no conserved corresponding sequence in VRN2 in this region, excluding the possibility of alternative mRNA splicing as the cause of the difference. One full-length sequence, PtEMF2_4 from Populus trichocarpa (poplar) (Figure 2 and Supplemental Figure 1B), and three partial sequences, MtEMF2p from Medicago truncatula, GmEMF2p_3 from Glycine max, and CaEMF2p from Capsicum annuum, lack both the E5–10 domain and the N-ter cap, like VRN2. Within the E5–10 domain, amino acids encoded by EMF2 exons 5 (Figure 2A), 6, and 8 were highly conserved among the non-Arabidopsis EMF2 homologs, suggesting a potentially conserved function of the E5–10 region. Alignment analysis suggested the presence of E5–10 in all three Physcomitrella sequences, though with divergent sequences (Supplemental Figure 2D).
C2H2 Zinc Finger Domain
Unlike most Arabidopsis C2H2-domain-containing genes, which have multiple C2H2 motifs in tandem (Englbrecht et al., 2004), the three VEF proteins, VRN2, FIS2, and EMF2, contain a single C2H2 motif that is encoded by exons 12 and 13 in EMF2. Previous studies found a conserved 37-aa C2H2 domain sequence in EMF2 homologs from Drosophila, human, and Arabidopsis (Yoshida et al., 2001). Alignment of the full-length sequences shows a conserved region extending from EMF2 exon 11 through 14 in the EMF2 homologs, covering a range of ~77 aa. In the EMF2/VRN2 class, the C2H2 domain of VRN2 is the most divergent from that of other members (Figure 2B). Selaginella SdEMF2p_1 has a C2H2 region with 39% identity to that of EMF2 (Supplemental Figure 2A). Physcomitrella PpEMF2_3 has a region corresponding to the C2H2 domain of EMF2; its two
Figure 2. Alignment of Three Domains of Predicted Full-Length Plant VEF Proteins.
Cs line up with those in EMF2, but the two Hs are absent (Supplemental Figure 2D).
ilar to EMF2 in that it possesses the N-ter cap, E5–10, C2H2-like, and VEF regions.
E15–17 Domain
Phylogenetic Analysis of Full-Length and VEF Sequences
E15–17 is a region encoded by EMF2 exon 15 to 17, connecting the C2H2 and VEF domains of EMF2, VRN2, and FIS2. Alignment of the EMF2 homologs shows that this region has the highest variability of all EMF2 domains in both amino acid sequence composition and in total length, suggesting intensive diversification including multiple insertion and/or deletion events during the evolution of this region (Supplemental Figure 1D). All three Physcomitrella sequences, including PpEMF2_3, appear to possess this region.
Phylogenetic analysis of the full-length sequences using maximum likelihood and Bayesian methods recovered various lineages reflecting organismal evolution (Figures 3 and 4). Using human and Drosophila sequences as outgroups, phylogenetic analyses of full-length sequences (Figure 3) and VEF domain alone (Figure 4) both recovered a monophyletic angiosperm lineage with monophyletic monocot and eudicot clades. Within the monocots, the grasses (Poales) were also recovered as monophyletic in both full-length and VEF-based gene trees. For VEF domain analyses containing greater sampling of land plant diversity, gymnosperms were found to be monophyletic and sister to angiosperms, Selaginella sister to an angiosperm plus gymnosperm clade, and Physcomitrella sequences sister to remaining land plants. As with full-length sequences, monocots are recovered as monophyletic; however, Eschscholzia, unresolved in the full-length analysis, groups with Aquilegia VEF domain (Figure 4), forming a basal eudicot clade sister to monocots. This clade is unresolved with respect to monocots and core eudicots. Within monophyletic core eudicots, the asterids and rosids are roughly falling out as separate clades, with a few exceptions (e.g. Silene within rosid clade, two sequences of Gossypium recovered as sister to the rosid plus asterid sister group, Lotus japonicus within an otherwise monophyletic asterid clade, and one Helianthus sequence falling within the rosids rather than the asterids). In addition, several sequences from core eudicot species are resolved in a clade containing VRN2, FIS2, and VEF-L36 (Figure 4). This clade is distant from AtEMF2, indicating a different evolutionary history for the VEF domain of VRN2, FIS2, and VEF-L36. In the full-length analyses, PtEMF2_4 or VEF904, a proposed VRN2 ortholog from Populus, is strongly supported within a VRN2 clade reflecting potential homology (or full-length sequence conversion) of the Populus sequence with VRN2. 
In the VEF domain analyses, this Populus sequence groups with other Populus sequences rather than with the VRN2 clade, indicating that the VEF domain itself is not converging on a VRN2-like VEF domain, despite full sequence and domain-level similarity. Another potential VRN2 ortholog, Medicago truncatula’s MtEMF2p, lacking the E5–10 domain and the N-ter cap, is grouped in the VRN2 clade. It remains
VEF (C-terminal) Domain
Alignment of C-terminal sequences of EMF2, VRN2, FIS2, Su(z)12, and the human KIAA0160 led Yoshida et al. (2001) to define an acidic-W/M domain, ~130 aa from exons 18–22 in Arabidopsis EMF2, which is characterized by an acidic cluster and a sequence rich in tryptophan and methionine. A smaller region was later called the VEF domain, derived from the initials of VRN2, EMF2, and FIS2 (Birve et al., 2001); it did not include sequences in exon 18, but extended beyond the acidic-W/M domain (Figure 2). In this paper, we adopt a broader sense of the VEF domain, encompassing both the acidic-W/M domain defined by Yoshida et al. (2001) and the VEF domain defined by Birve et al. (2001) (Supplemental Figure 1E–1G). VRN2 has an additional 52 aa C-terminal to the VEF domain (Supplemental Figure 1G and Figures 1 and 2C) that is not shared with other EMF2-class proteins, including VRN2-like sequences, full-length or partial, from plants other than Arabidopsis. Analysis using RADAR (www.ebi.ac.uk/Radar/) suggests that this 52-aa region is a duplication of a stretch of amino acids found within the VEF domain (Supplemental Figure 1G). Selaginella SdEMF2p corresponds to the VEF domain (Supplemental Figure 2A). All three Physcomitrella sequences and the two partial gymnosperm sequences possess the VEF domain (Supplemental Figure 2B–2D). None of the VEF domains found in Physcomitrella, Selaginella, pine, or spruce possesses the VRN2-characteristic repeat sequence at its C-terminus, indicating that this repeat likely evolved in angiosperms after the divergence of the gymnosperm lineage. Among the three moss sequences, PpEMF2_3 is the most sim-
The T-COFFEE (Version 4.85) program was used for the sequence alignment. Vertical lines on top of the sequence mark the boundaries of EMF2 exons, and the arrows and numbers prefixed with an E on top of the sequence indicate EMF2 exons. (A) N-ter domain. Light-blue bar on top of the sequence marks the N-ter domain. Colorless horizontal bar marks the N-ter cap. Dark-blue bar marks the N-terminal domain defined by Yoshida et al. (2001). (B) C2H2 domain. Green bar on top of the sequence marks the C2H2 domain defined by Yoshida et al. (2001). Numbers –1, +3, and +6 denote the position relative to the start site of the α-helix of the C2H2 domain. (C) VEF domain. Red and yellow horizontal bars on top of the sequence mark the C-terminal domain defined by Yoshida et al. (2001) and the VEF domain defined by Birve et al. (2001), respectively. Because VEF-L36 only shares VEF with other homologs, its middle and C-terminal sequences were cut off.
Figure 3. Phylogenetic Analysis of Full-Length VEF Protein Homologs. Phylogeny of EMF2/VRN2 using Bayesian inference; average branch lengths are shown. Measures of support are given at the nodes; Bayesian posterior probability (PP)/maximum likelihood bootstrap support (BS). Support values less than 50 are shown as hyphen "-" and support values of 100 are shown as "+".
to be tested whether other homologs with VRN2-like domain structure will have their VEF sequence converge with AtVRN2 VEF amino acid sequence.
Sequence Relationship between VEF-L36 and EMF2/VRN2
VEF-L36 cDNA was deduced from Arabidopsis genomic sequence (TAIR: www.Arabidopsis.org/servlets/TairObject?id=128616&type=locus) but has not been assayed for function. The 1872-bp open reading frame encodes a predicted 623-aa protein, with the 125-aa VEF located at the N-terminus and a 113-aa C-terminus with only low sequence similarity to L36. The RADAR program detected three types of repeat sequence in the middle region of VEF-L36 (Figure 1 and Supplemental Figure 3A). Except for the VEF domain, VEF-L36 shares no other domains with the other three Arabidopsis VEF proteins. Using its 495-aa sequence without the VEF domain as a BLAST query against GenBank, we found three Arabidopsis fragments and one rice homolog, as well as a few sequences in non-plant organisms, such as Drosophila, Dictyostelium, Danio, and Trypanosoma, all lacking the VEF domain (Supplemental Figure 3B). The rice homolog encodes a 410-aa protein with low global homology to the non-VEF part of VEF-L36 (22% identity and 37% similarity, Supplemental Figure 3C). To date, VEF-L36 is the only gene found with both VEF and L36 domains. The VEF domain of VEF-L36 is more closely related to that of VRN2 than to that of EMF2, as indicated by phylogenetic analyses of both the VEF domain alone and of full-length sequences
(Figures 3 and 4). Among the divergent amino acids between EMF2 and VRN2, VEF-L36 shares nine with VRN2 and only three with EMF2 (Table 3). Moreover, VRN2 (AT4G16845) and VEF-L36 (AT4G16810) are closely linked on Arabidopsis chromosome 4. Among the VEF-domain-containing proteins, VEF-L36 is the only one in which the VEF domain is located at the N-terminus. Together, these observations suggest that the VEF domain of VEF-L36 may have been transferred from VRN2 on a sister chromatid through an accidental intronic recombination event during meiosis (Figure 5C). This would imply that only plants with VRN2 could generate VEF-L36. So far, VEF-L36 has only been identified in Arabidopsis.
Sequence Relationship between FIS2 and EMF2/VRN2
FIS2 is similar to EMF2/VRN2 in possessing a single C2H2 domain and the VEF domain, which are connected by a 459-aa region containing 70 serines, called the S-rich domain. In addition to the two types of repeats previously identified (Luo et al., 1999), RADAR identified a third type of repeat in the S-rich domain (Supplemental Figure 4A). Sequences homologous to the S-rich domain have been found in plants, fungi, bacteria, and animals, but none shares the C2H2 or VEF domains with FIS2. Despite the abundance of S-rich homologous domains in nature, the rarity of the S-rich domain in the VEF-domain-containing protein family suggests that FIS2 may represent a unique evolutionary event within the Arabidopsis lineage. The C2H2 domain of FIS2 has greater sequence similarity to VRN2 than to EMF2 (Table 3). The VEF domain of FIS2 shows
Figure 4. Phylogenetic Analysis of VEF Domain Sequences. Phylogeny of VEF domain using maximum likelihood as implemented in RAxML. Measures of support are given at the nodes; Bayesian posterior probability (PP)/maximum likelihood bootstrap support (BS). Support values less than 50 are shown as hyphens (-). Taxonomic groups indicated at right, with exceptions described in text.
Table 3. Number of Amino Acids Shared between FIS2/VEF-L36 and VRN2 or EMF2*.
Domain | Identical aa between FIS2 and VRN2 | Identical aa between FIS2 and EMF2 | Identical aa between VEF-L36 and VRN2 | Identical aa between VEF-L36 and EMF2
C2H2 domain | 20/131 | 8/131 | na | na
VEF domain | 20/116 | 5/116 | 9/98 | 3/98
* Among the divergent amino acids between EMF2 and VRN2, the number of aa shared with EMF2 or VRN2 out of the total number of aa in the domain. na, not applicable.
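The counting rule behind Table 3 can be illustrated with a short Python sketch. It assumes three sequences taken from the same alignment; the toy strings are invented for illustration and are not real EMF2, VRN2, or FIS2 sequences. The function name is hypothetical, and skipping gap columns is our assumption, since the gap treatment is not stated.

```python
def count_shared_at_divergent_sites(emf2, vrn2, query, gap="-"):
    """Count query residues that match VRN2 or EMF2 at columns where
    EMF2 and VRN2 differ. Returns (shared_with_vrn2, shared_with_emf2)."""
    if not (len(emf2) == len(vrn2) == len(query)):
        raise ValueError("sequences must come from the same alignment")
    shared_vrn2 = 0
    shared_emf2 = 0
    for e, v, q in zip(emf2, vrn2, query):
        if gap in (e, v, q):
            continue  # assumption: columns containing a gap are ignored
        if e == v:
            continue  # only columns where EMF2 and VRN2 diverge are informative
        if q == v:
            shared_vrn2 += 1
        elif q == e:
            shared_emf2 += 1
    return shared_vrn2, shared_emf2


# Invented toy alignment (not real protein sequences).
toy_emf2 = "MKLSAWDE"
toy_vrn2 = "MRLTAWEE"
toy_query = "MRLTGWDE"
print(count_shared_at_divergent_sites(toy_emf2, toy_vrn2, toy_query))
```

In the toy example, EMF2 and VRN2 differ at three columns; the query matches VRN2 at two of them and EMF2 at one.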
DISCUSSION The VEF domain is found in chromatin proteins required for gene silencing throughout eukaryotic organisms. In addition to the universal VEF domain, the VEF proteins possess other characteristic domains that distinguish them from one another. Based on domain organization, four Arabidopsis VEF proteins were grouped into three classes: EMF2/VRN2, FIS2, and VEF-L36 (Figure 1). Our analysis of homologous sequences throughout land plants indicates the existence of EMF2 in early diverging lineages of land plants (bryophytes and lycophytes) and suggests the presence of an ancestral EMF2-like gene in early land plants. Phylogenetic results (Figures 3 and 4) are consistent with the hypothesis that VRN2 was likely derived from an EMF2-like ancestor within the angiosperms, and that FIS2 and VEF-L36 were secondarily derived from a VRN2-like ancestral sequence in Arabidopsis. Current phylogenetic hypotheses are limited in taxon sampling and in character sampling, constrained by currently available sequences that are not equally distributed across angiosperm evolution and may not represent complete genomic data for all species sampled. Such limitations reduce overall phylogenetic resolution and make it difficult to assign orthology and paralogy to the available sequences in the face of multiple gene and genome duplication events spanning angiosperm evolution. However, given current sampling, our phylogenetic results indicate that EMF2-like genes in angiosperms demonstrate an evolutionary history largely consistent with the taxonomic history of the plants in which they are found.
Proposed Evolution of VEF Genes
Figure 5. Model of VRN2, FIS2, and VEF-L36 Evolution. (A) Proposed VRN2 evolution from EMF2. (B) FIS2 evolution from VRN2. (C) VEF-L36 evolution from VRN2.
a closer phylogenetic relationship to the VEF domain of VRN2 than to EMF2 (Figure 4), forming a clade with the VRN2 sequence indicating common ancestry to the exclusion of EMF2. Among the amino acids diverged between EMF2 and VRN2, FIS2 shares 20 identical amino acid residues with VRN2 and only eight with EMF2 in the VEF domain (Table 3). Globally, FIS2 shared a higher pair-wise alignment score with VRN2 than EMF2 (29 vs. 18%; Table 2).
The EMF2/VRN2 class proteins show strong sequence similarity despite modified domain structure. Sequences with the EMF2-like domain structure are widespread, found in animals and most vascular plants. Sequences with the VRN2-like domain structure, that is, sequences that lack the N-ter cap and E5–10 as VRN2 does, have only been identified in poplar (PtEMF2_4), pepper (CaEMF2p), alfalfa (MtEMF2p), and soybean (GmEMF2_3) (Table 1). In Arabidopsis, EMF2 is an essential gene, as evidenced by the short-lived and sterile nature of the emf2 mutants. VRN2 promotes vernalization-mediated flowering and vrn2 mutants flower late, but the loss of VRN2 is not lethal (Gendall et al., 2001). Alternative vernalization mechanisms that do not utilize a putative Arabidopsis VRN2 ortholog have evolved in other species (Yan et al., 2004) and may be present in
Arabidopsis as well. While every plant sequenced thus far has at least one copy of EMF2, VRN2 is found only infrequently. The dispensable nature of VRN2 may result in its lower frequency of occurrence throughout land plants. Based on our data, it is likely that VRN2 can arise from a duplication of an EMF2-like ancestor. Once an additional EMF2 copy is present, one of the copies is no longer under strong selection and is able to diverge, potentially resulting in a VRN2-like sequence. Under this scenario, VRN2-like sequences could arise multiple times and independently following any duplication event that included the EMF2 gene. Similarity in domain structure and amino acid composition could then be the result of convergent evolution. Genes possessing all domains found in EMF2 exist in insects and mammals (Yoshida et al., 2001; Schuettengruber et al., 2007). It can be argued, based on the presence of EMF2-like genes in animals, lycophytes, bryophytes, gymnosperms, and angiosperms, that early land plants shared an ancestral sequence having the domain structure found in modern copies of EMF2. As the gene or genome duplicated, VRN2 may have arisen from a duplication of the ancestral EMF2 (Figure 5A), followed by subsequent loss of the N-ter cap and the E5–10 domain, and the acquisition of the 52-aa C-terminal repeat. The presence of intermediary forms with partial domain structure suggests a potential step-wise evolution of VRN2 from an EMF2-like sequence. Among the full-length and partial sequences from 20 angiosperm families used in this analysis, 20 sequences contain complete N-ter domain (Figure 2A and Supplemental Figure 1A), nine lack the N-ter cap only (Intermediary molecule #1 in Figure 5A) and four lack both the N-ter cap and the E5–10 domain (Intermediary #2 in Figure 5A; Figure 2 and Supplemental Figure 1B) but do not contain a VEF repeat. 
So far, no sequence that lacks E5–10 but contains the N-ter cap has been found, suggesting that the N-ter cap may need to be lost before the E5–10 domain can be lost. Finally, only one VRN2-like sequence, Arabidopsis VRN2, possesses the C-terminal repeat (Supplemental Figure 1G). Based on the frequency of the intermediary forms and results from phylogenetic analyses, we propose a three-step hypothesis for the evolution of VRN2 from a parental EMF2 following gene duplication (Figure 5A). In the first step, EMF2 loses the N-ter cap, resulting in Intermediary molecule #1. This could be achieved by mutation of the first ATG, making the second ATG the translation start site. In the second step, Intermediary #1 loses the E5–10 domain, resulting in Intermediary molecule #2. This could be achieved by mutation of the splice sites within exons 5–10, resulting in exon skipping (Hayashi et al., 1991). In the third step, Intermediary #2 gains a C-terminal repeat, resulting in the backbone of VRN2. Currently, this third step has only been observed in Arabidopsis. The importance of the 52-aa VEF repeat to VRN2 function remains to be tested, but the intermediary sequences may represent forms that are in the process of evolving the VRN2 function. Comparison of structure and function between these sequences and VRN2 will be required to better understand the relationships of these genes.
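The three-step hypothesis can be restated as a small decision rule over domain architecture. The Python sketch below is illustrative only: it assumes that the presence or absence of the N-ter cap, the E5–10 domain, and the C-terminal VEF repeat has already been called from an alignment, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainCall:
    """Presence/absence calls for one sequence (assumed to be made beforehand)."""
    has_nter_cap: bool
    has_e5_10: bool
    has_vef_repeat: bool


def classify_architecture(call):
    """Map a domain architecture onto the stages of the proposed model."""
    cap, e5_10, repeat = call.has_nter_cap, call.has_e5_10, call.has_vef_repeat
    if cap and e5_10 and not repeat:
        return "EMF2-like"
    if not cap and e5_10 and not repeat:
        return "Intermediary #1"       # step 1: N-ter cap lost
    if not cap and not e5_10 and not repeat:
        return "Intermediary #2"       # step 2: E5-10 also lost
    if not cap and not e5_10 and repeat:
        return "VRN2 backbone"         # step 3: C-terminal repeat gained
    return "not described by the model"


print(classify_architecture(DomainCall(True, True, False)))
print(classify_architecture(DomainCall(False, False, True)))
```

An architecture that retains the N-ter cap but lacks E5–10, which has not been observed so far, falls into the final category.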
The proposed process could happen sequentially, resulting in independent derivations of a VRN2-like sequence from an EMF2-like ancestor multiple times throughout plant evolution. Convergence of the VEF domain among the VRN2-like sequences may occur concurrently with the losses of domains during steps 1 and 2, or may occur following these structural changes due to selection on the resulting gene sequence. This latter case assumes that independently evolved VRN2 sequences would converge upon a particular function, with selection then acting in a similar manner on the individual VEF domains. Studies demonstrating the function of VRN2-like sequences in the plants in which they are found would be required to understand the selection events leading to sequence convergence. More complete genomic and taxonomic sampling focused on VRN2-like sequences will enable us to test for possible differences in selection on the VRN2 clade in comparison with the various recovered EMF2 clades. The presence of the VEF repeat only in Arabidopsis VRN2 indicates that it may be a lineage-specific event. In this case, the ancestral VRN2 in the most recent common ancestor of Arabidopsis and Populus would not have had the VEF repeat, and the repeat was subsequently gained in the lineage leading to Arabidopsis after its divergence from the eudicot lineage leading to Populus. Phylogenetic analysis showed that the full-length Populus and Arabidopsis VRN2-like sequences are in the same clade, despite the lack of the VEF repeat in PtEMF2_4. However, in the analysis of the VEF domain alone, the VEF of PtEMF2_4 remained in the same clade as that of PtEMF2_1 and PtEMF2_2, suggesting stabilizing selection on the VEF domain in Populus since the duplication event leading to the Populus EMF2/VRN2-like divergence. This indicates that overall domain architecture of the EMF2 gene is evolving independently from within-domain protein structure, at least for the VEF domain. 
Studies investigating evidence for directional selection on the VEF domain following duplication of EMF2 will be helpful to assess the likelihood of VRN2 evolution following gene or genome duplication. Phylogenetic analysis and sequence similarity comparison clearly demonstrate that the VEF domain of VEF-L36 is more closely related to that of VRN2 than to EMF2 (Table 3 and Figures 3 and 4). Similarly, both the C2H2 and VEF domains of FIS2 are more closely related to those of VRN2 than EMF2 (Table 3 and Figures 3 and 4). These findings support the derivation of FIS2 and VEF-L36 from VRN2; only plants that have evolved VRN2 could generate sequences like Arabidopsis FIS2 and VEF-L36. FIS2 is an essential gene in Arabidopsis, but has not yet been identified in other plants, including plants with full genome sequences. FIS2 is specifically expressed in the gametophyte of Arabidopsis and prevents endosperm development prior to fertilization (Luo et al., 1999, 2000). A search against cDNA libraries constructed from various angiosperm flowers did not result in any FIS2-like homologs. In plants that did not evolve VRN2, EMF2-like or alternative sequences may have evolved to prevent endosperm development without fertilization. Alternatively, genes with functional but without
sequence conservation (Calonje et al., 2008) may have evolved to take the place of FIS2. The presence of FIS2 and VEF-L36 should be investigated across Brassicaceae and its sister family, Capparaceae (Hall et al., 2002), in order to localize the potential duplication events leading to the evolution of these sequences from a hypothetical VRN2-like ancestral sequence. FIS2 may have diverged from a duplicated VRN2, while VEF-L36 may have evolved via a translocation of a VEF domain donated by VRN2 (Figure 5B and 5C). PRC2 components play important roles in animal development, notably in insects and mammals (Schuettengruber et al., 2007). Some animal VEF protein sequences in the database possess all domains found in Su(z)12; others possess only the VEF and C2H2, or only the VEF domain. Indeed, nematode has a sequence that shares C2H2 and VEF domain with Su(z)12 (see GenBank’s protein databases). Protein sequence alignment based on identity/similarity did not identify any animal protein with the VEF domain linked to FIS2’s S-rich or VEF-L36’s L36 domain, despite the abundance of S-rich and L36 in nature. A comprehensive evolutionary analysis of animal VEF-containing proteins is beyond the scope of the present study. However, gene duplication, domain deletion/insertion/ rearrangement apparently occurred during the evolution of animal VEF proteins as well. For example, mouse has one, chimpanzee has three, and zebra fish has two VEF protein homologs. Some animal homologs possess the N-terminal sequence, while others do not; and some domains specific to certain animals can be identified (data not shown). Gene fusion involving the human VEF homolog would lead to neoplastic tumor growth (Li et al., 2008). Future investigation of domain architectures in animal VEF proteins would provide insights into the evolutionary trends of VEF proteins in plants versus those in animals.
Dynamic Changes during VEF Gene Evolution The evolution of the VEF genes in plants is characterized by the mobility of the VEF domain, duplication, and functional divergence of homologous sequences. In addition to its diverse location in the genome, a VEF domain can be located in the N- or C-terminus within a genetic locus. A VEF domain-containing gene may even lose the VEF domain, as in the case of HvEMF2_1. These phenomena indicate that the VEF domain functions like a mobile functional module that plays a major role in protein evolution, facilitated by intronic recombination or exon shuffling (Patthy, 1996; Kolkman and Stemmer, 2001). The dynamic genetic changes that occurred during the evolution of this small gene family caused varying degrees of divergence in sequences located between the conserved domains. For example, a region encoded by EMF2 exon 15 through exon 17 (E15–17), flanked by the highly conserved C2H2 zinc finger and VEF domains, is a region with the lowest conservation among the EMF2/VRN2 class homologs (Supplemental Figure 1D). While the ends use identical or similar amino acids and have almost no length variation, the center
of the gene region encoded by exon 16 and the 5’ end of exon 17 requires indels for multiple sequence alignment representing up to 20–70 aa in length difference. The gradient in the degree of similarity, from highly divergent at the center region to highly conserved at the 5’ and 3’ ends, may be informative in plant phylogenetics. Finally, we note that the VEF gene tree reflected our best understanding of the organismal tree for included taxa (Grass Phylogeny Working Group, 2001). Regions with high levels of variability combined with low copy number may render EMF2, particularly the E15–17 domain, a useful phylogenetic tool for evaluating the evolutionary relationships of plants across both deep and shallow nodes.
METHODS
Identification of Sequences and Domains of VEF Genes across Land Plants
The full-length EMF2 putative protein sequence was used as a BLAST (Basic Local Alignment Search Tool) query against the following databases: GenBank (www.ncbi.nih.gov/), TIGR/JCVI (www.tigr.org/), the Floral Genome Project (http://fgp.bio.psu.edu/fgp/), Plant Genome DataBase (www.plantgdb.org/), the moss genome (www.cosmoss.org/, http://genome.jgi-psf.org/Phypa1_1/Phypa1_1.home.html), the papaya genome (http://tinyurl.com/3ua95v), the pine EST database (http://fungen.botany.uga.edu/), the Plant Genome Network (http://pgn.cornell.edu/cgi-bin/blast/blast_search.pl), Brassica (http://ukcrop.net/), SOL Genomics Network (www.sgn.cornell.edu/), the poplar genome (http://genome.jgi-psf.org), the Chromatin database (www.chromdb.org/), and the Selaginella genome (http://selaginella.genomics.purdue.edu). Sequences with an e-value greater than 0.001 (non-significant homology) were eliminated, thereby eliminating all non-plant sequences. Plant sequences containing intact EMF2-like N-terminal, C2H2, and/or C-terminal domains were selected for further analysis. For identification of homologs of FIS2’s S-rich domain and VEF-L36’s L36 domain, the S-rich domain and L36 domain amino acid sequences were used as BLAST queries against the same databases listed above with an e-value cut-off of 0.001.
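The e-value filter described above amounts to a single comparison per hit. The following Python sketch assumes that BLAST hits have already been parsed into (subject identifier, e-value) pairs; the hit names and values are invented for illustration, and the function name is hypothetical rather than part of any BLAST toolkit.

```python
E_VALUE_CUTOFF = 0.001  # cut-off stated in the Methods


def keep_significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose e-value does not exceed the cutoff.

    `hits` is an iterable of (subject_id, e_value) pairs.
    """
    return [(subject, e_value) for subject, e_value in hits if e_value <= cutoff]


# Invented example hits (not real search results).
toy_hits = [
    ("toy_plant_seq_1", 3e-85),
    ("toy_plant_seq_2", 1e-4),
    ("toy_other_seq", 0.5),
]
print(keep_significant_hits(toy_hits))
```

Hits with an e-value greater than 0.001 are dropped; a hit exactly at the cut-off is retained, which matches the wording "greater than 0.001 ... were eliminated".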
Sequencing EMF2 Homolog cDNA
Plasmid cDNAs were extracted from bacterial culture according to the manufacturer’s protocols (QIAGEN Inc., Valencia, CA 91355, USA). M13 rev (5′-GGAAACAGCTATGACCATG-3′) and M13 (–20) (5′-GTAAAACGACGGCCAG-3′) primers were used for sequencing, with the following internal primers used as necessary to obtain full sequences:
Acorus: 5′-CTCAGTAGAGCATGTCTGCTG-3′, 5′-CCCATGCAATCGTGAGAATGC-3′, 5′-TGACACGCTGAAAGATGATG-3′, 5′-CATTAACTGCCTGATACTCTTC-3′
Asparagus: 5′-CAATACGGAATCCATCATTTCTGC-3′, 5′-CTTGCTCCAATGCCATTGGC-3′
Nuphar: 5′-GATGAGGTCGATGATGATATTGC-3′, 5′-CTGCCAAAACCCGCTGTTTC-3′
Yucca: 5′-GTCAATCGGGCATGTATACTG-3′, 5′-CTTGCTCCAACGCCATTGGC-3′
Eschscholzia 8.1: 5′-GCTGATTACAAGGAACAGACTG-3′, 5′-CACGGAACATGACCATCTGC-3′
Eschscholzia 8.2: 5′-GAGGAATGACAGGGTGGAAGC-3′, 5′-GTTCCAGAGATGCATAATCCTTG-3′
Tomato: 5′-GCTTTGCCGAACTTGCCAG-3′, 5′-CCCTATGAGAATGAAAGAATTGCC-3′
Sequence Alignment
T-coffee (www.ebi.ac.uk/t-coffee/) was used to produce a global amino acid alignment using the default values for protein alignment. RADAR (www.ebi.ac.uk/Radar/) was used to detect de novo repeat regions in EMF2 homologous sequences. Classification of VEF subgroups was based on domain organization in the aligned sequences. The full-length VEF homologs were aligned using T-coffee and pair-wise distance scores were calculated with ClustalW (version 1.83, http://www.ebi.ac.uk/Tools/Radar/) as the number of identities in the best alignment divided by the number of residues compared (gap positions excluded). Scores were initially calculated as percent identity scores and were converted to distances by dividing by 100 and subtracting from 1.0 to give the number of differences per site. No correction for multiple substitutions was performed.
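The identity-to-distance conversion described above can be written out explicitly. The Python sketch below is a minimal illustration under the stated definition (identities divided by residues compared, gap positions excluded, then 1.0 minus identity/100); it is not the ClustalW implementation, and the function names and toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap positions are excluded from the comparison
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def identity_to_distance(pid):
    """Uncorrected distance: differences per site, with no multiple-hit correction."""
    return 1.0 - pid / 100.0


# Invented toy pair: 6 ungapped columns, 5 of them identical.
pid = percent_identity("MK-LSAW", "MKTLTAW")
print(round(pid, 2), round(identity_to_distance(pid), 4))
```

Because no correction for multiple substitutions is applied, this distance is bounded by 1.0 and will tend to underestimate divergence between distantly related sequences.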
VEF Domain Sequence Data
The VEF domain, a region held in common by EMF2, VRN2, FIS2, and VEF-L36, was used to estimate the phylogenetic relationships among VEF gene sequences across land plants. Protein alignment of the VEF domain was performed with MUSCLE, resulting in a multiple sequence alignment of about 130 aa. ProtTest 2.0 was also used to determine the model of evolution that best fits the VEF domain alignment. The best-scoring model for the VEF alignment was also JTT+G, and global rearrangements were sampled with a random order of input sequences. Bayesian and maximum likelihood methods of phylogenetic inference were conducted on the VEF domain alignment using MrBayes (tree not shown) and RAxML-VI-HPC (Stamatakis, 2006), respectively. The analyses were performed on the computer cluster of the Cyber-Infrastructure for Phylogenetic Research project (CIPRES, www.phylo.org) at the San Diego Supercomputer Center. Clade support, which was assessed with nonparametric bootstrapping (Felsenstein, 1985) as implemented in RAxML-VI-HPC, was based on 100 replicates. The tree with the highest log-likelihood score from the RAxML analysis was chosen for representation here.
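Nonparametric bootstrapping (Felsenstein, 1985) builds each pseudo-replicate by resampling alignment columns with replacement. The Python sketch below shows only that resampling step, assuming the alignment is held as a mapping from taxon name to aligned sequence; tree inference on each replicate, which RAxML performs internally, is not shown. The taxon names and sequences are invented, and the function name is hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Invented toy alignment (not real VEF domain sequences).
toy_alignment = {"taxonA": "MKLSAW", "taxonB": "MRLTAW", "taxonC": "MKLTGW"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(100)]
print(len(replicates), len(replicates[0]["taxonA"]))
```

Bootstrap support for a clade is then the percentage of the 100 replicate trees in which that clade is recovered.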
Phylogenetic Analysis
EMF2/VRN2 Full-Length Sequence Data
Accession Numbers
The T-coffee alignment was used for phylogenetic analysis based on its superior prediction of primary homology statements as compared with prior knowledge of functional domain architecture; for example, the N-terminus-located C2H2 domain of FIS2 aligned with the EMF2 N-terminal domain when using MegAlign or ClustalW, while, in T-coffee, the annotated C2H2 domains aligned with one another across all sequences. Bayesian phylogenetic analyses on aligned full-length sequences were performed with MrBayes v. 3.1.2 (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003). The model of protein evolution that best fit the protein sequence data was selected using the Akaike information criterion (AIC) as implemented in ProtTest 2.0 (Abascal et al., 2005). The best-scoring model for the EMF2/VRN2 full-length alignment was the Jones-Taylor-Thornton (JTT) probability model (Jones et al., 1992), with rate variation among sites calculated as a gamma distribution (+G), and global rearrangements were sampled with a random order of input sequences. Posterior probabilities of the generated trees were approximated using an MCMC algorithm with four incrementally heated chains (T = 0.2) for 5 000 000 generations, with trees sampled every 100 generations. Two independent runs were conducted for each dataset simultaneously, the default setting in MrBayes v. 3.1.2. Following completion, the sampled trees from each analysis were plotted against their log-likelihood score to identify the point at which log-likelihood scores reached a maximum value. All trees prior to this point were discarded as the burn-in phase, all post-burn-in trees from each run were pooled, and a 50% majority-rule consensus tree was calculated to obtain a topology with average branch lengths as well as posterior probabilities as indicators of support for all resolved nodes.
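Sampling every 100 generations over 5 000 000 generations yields roughly 50 000 trees per run. The burn-in and consensus steps can be illustrated with a Python sketch that assumes each sampled tree has already been reduced to the set of clades it contains (each clade a frozenset of taxon names) and that the burn-in point has been chosen by inspecting the log-likelihood trace. This is not the MrBayes implementation; the function names and toy trees are hypothetical, and branch lengths are ignored.

```python
from collections import Counter


def clade_support(sampled_trees, burn_in):
    """Posterior probability of each clade: its frequency among post-burn-in trees."""
    kept = sampled_trees[burn_in:]
    if not kept:
        raise ValueError("burn-in discards every sampled tree")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(support, threshold=0.5):
    """Clades retained in a majority-rule consensus (support above the threshold)."""
    return {clade: p for clade, p in support.items() if p > threshold}


# Invented toy sample: each tree is a set of clades over taxa A-D.
tree_1 = {frozenset("AB"), frozenset("ABC")}
tree_2 = {frozenset("AB"), frozenset("ABD")}
early = {frozenset("AC")}  # stands in for a pre-convergence sample
samples = [early, tree_1, tree_1, tree_2, tree_1]

support = clade_support(samples, burn_in=1)
consensus = majority_rule(support)
print(sorted((sorted(c), p) for c, p in consensus.items()))
```

To pool two independent runs, the post-burn-in lists from each run would simply be concatenated before counting.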
Novel full-length protein sequences generated for this study were deposited in GenBank with the following accession numbers: Yucca filamentosa EMF2 (YfEMF2, GenBank accession number (acc. #) ABD85300); Asparagus officinalis EMF2 (AoEMF2, acc. # ABD85301); Eschscholzia californica EMF2 (EcEMF2_2, acc. # ABD98791); Eschscholzia californica EMF2 (EcEMF2_1, acc. # ABD98790); Tomato EMF2 (LeEMF2_1, acc. # ABI99480); Acorus americanus EMF2 (AaEMF2, acc. # ABI99481).
SUPPLEMENTARY DATA Supplementary Data are available at Molecular Plant Online.
FUNDING This work is supported by NSF grant #IBN 0236399 and USDA grant #03–35301–13244 to Z.R.S.
ACKNOWLEDGMENTS
The authors thank Dr Hong Ma (Pennsylvania State University), the Floral Genome Project, and the SOL Genomics Network (www.sgn.cornell.edu/) for providing EMF2 homologous cDNA clones, Kazusa DNA Research Institute for providing Lotus japonicus EMF2 sequence to Dr Rieko Nishimura, Dr Jo Ann Banks (National Science Foundation/Purdue University) for providing Selaginella EMF2 homologous EST sequences, Dr Ralph Quatrano (Washington University) for providing access to the Physcomitrella website, Drs Hong Ma and Damon R. Lisch (UC Berkeley) for comments on the manuscript, Steve Ruzin and Denise Schichnes (Bioimaging Facility, CNR, UC Berkeley) for image processing, and our laboratory members Myriam Calonje, Tiffany Tirtadinata, Robert Luan, Heather
Driscoll, and Rosario Sanchez for help and support in preparation of this work. No conflict of interest declared.
REFERENCES Abascal, F., Zardoya, R., and Posada, D. (2005). ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 21, 2104–2105. Birve, A., Sengupta, A.K., Beuchle, D., Larsson, J., Kennison, J.A., Rasmuson-Lestander, A., and Muller, J. (2001). Su(z)12, a novel Drosophila Polycomb group gene that is conserved in vertebrates and plants. Development. 128, 3371–3379. Calonje, M., and Sung, Z.R. (2006). Complexity beneath the silence. Curr. Opin. Plant Biol. 9, 530–537. Calonje, M., Sanchez, R., Chen, L., and Sung, Z.R. (2008). EMBRYONIC FLOWER1 participates in Polycomb group-mediated AG gene silencing in Arabidopsis. Plant Cell. 20, 277–291. Cao, R., Wang, L., Wang, H., Xia, L., Erdjument-Bromage, H., Tempst, P., Jones, R.S., and Zhang, Y. (2002). Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science. 298, 1039–1043. Chanvivattana, Y., Bishopp, A., Schubert, D., Stock, C., Moon, Y.H., Sung, Z.R., and Goodrich, J. (2004). Interaction of Polycomb-group proteins controlling flowering in Arabidopsis. Development. 131, 5263–5276. Czermin, B., Melfi, R., McCabe, D., Seitz, V., Imhof, A., and Pirrotta, V. (2002). Drosophila enhancer of Zeste/ESC complexes have a histone H3 methyltransferase activity that marks chromosomal Polycomb sites. Cell. 111, 185–196. Englbrecht, C.C., Schoof, H., and Bohm, S. (2004). Conservation, diversification and expansion of C2H2 zinc finger proteins in the Arabidopsis thaliana genome. BMC Genomics. 5, 39. Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 39, 783–791. Gendall, A.R., Levy, Y.Y., Wilson, A., and Dean, C. (2001). The VERNALIZATION 2 gene mediates the epigenetic regulation of vernalization in Arabidopsis. Cell. 107, 525–535. Goodrich, J., Puangsomlee, P., Martin, M., Long, D., Meyerowitz, E.M., and Coupland, G. (1997). A Polycomb-group gene regulates homeotic gene expression in Arabidopsis. Nature. 386, 44–51. 
Grass Phylogeny Working Group (Nigel P. Barker, Lynn G. Clark, Jerrold I. Davis, Melvin R. Duvall, Gerald F. Guala, Catherine Hsiao, Elizabeth A. Kellogg, and H. Peter Linder) (2001). Phylogeny and subfamilial classification of the grasses (Poaceae). Annals of the Missouri Botanical Garden. 88, 373–457. Grossniklaus, U., Vielle-Calzada, J.P., Hoeppner, M.A., and Gagliano, W.B. (1998). Maternal control of embryogenesis by MEDEA, a polycomb group gene in Arabidopsis. Science. 280, 446–450.
Hennig, L., Taranto, P., Walser, M., Schonrock, N., and Gruissem, W. (2003). Arabidopsis MSI1 is required for epigenetic maintenance of reproductive development. Development. 130, 2555–2565. Huelsenbeck, J.P., and Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 17, 754–755. Irish, V.F., and Benfey, P.N. (2004). Beyond Arabidopsis: translational biology meets evolutionary developmental biology. Plant Physiol. 135, 611–614. Jiang, D., Wang, Y., Wang, Y., and He, Y. (2008). Repression of Flowering Locus C and Flowering Locus T by the Arabidopsis Polycomb Repressive Complex 2 components. PLoS One. 3, e3404. Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992). The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282. Kim, S., Yoo, M., Albert, V., Farris, J., Soltis, P.S., and Soltis, D.E. (2004). Phylogeny and diversification of B-function MADS-box genes in angiosperms: evolutionary and functional implications of a 260-million-year-old duplication. Amer. J. Bot. 91, 2102–2118. Kinoshita, T., Harada, J.J., Goldberg, R.B., and Fischer, R.L. (2001). Polycomb repression of flowering during early plant development. Proc. Natl Acad. Sci. U S A. 98, 14156–14161. Kohler, C., Hennig, L., Spillane, C., Pien, S., Gruissem, W., and Grossniklaus, U. (2003). The Polycomb-group protein MEDEA regulates seed development by controlling expression of the MADS-box gene PHERES1. Genes Dev. 17, 1540–1553. Kolkman, J.A., and Stemmer, W.P. (2001). Directed evolution of proteins by exon shuffling. Nat. Biotechnol. 19, 423–428. Kuzmichev, A., Nishioka, K., Erdjument-Bromage, H., Tempst, P., and Reinberg, D. (2002). Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev. 16, 2893–2905. Li, J., Wang, J., Mor, G., and Sklar, J. (2008). A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science. 321, 1357–1361. 
Luo, M., Bilodeau, P., Koltunow, A., Dennis, E.S., Peacock, W.J., and Chaudhury, A.M. (1999). Genes controlling fertilization-independent seed development in Arabidopsis thaliana. Proc. Natl Acad. Sci. U S A. 96, 296–301.
Luo, M., Bilodeau, P., Dennis, E.S., Peacock, W.J., and Chaudhury, A. (2000). Expression and parent-of-origin effects for FIS2, MEA, and FIE in the endosperm and embryo of developing Arabidopsis seeds. Proc. Natl Acad. Sci. U S A. 97, 10637–10642.
Moon, Y.H., Chen, L., Pan, R.L., Chang, H.S., Zhu, T., Maffeo, D.M., and Sung, Z.R. (2003). EMF genes maintain vegetative development by repressing the flower program in Arabidopsis. Plant Cell. 15, 681–693.
Hall, J.C., Sytsma, K.J., and Iltis, H.H. (2002). Phylogeny of Capparaceae and Brassicaceae based on chloroplast sequence data. Amer. J. Bot. 89, 1826–1842.
Muller, J., Hart, C.M., Francis, N.J., Vargas, M.L., Sengupta, A., Wild, B., Miller, E.L., O’Connor, M.B., Kingston, R.E., and Simon, J.A. (2002). Histone methyltransferase activity of a Drosophila Polycomb group repressor complex. Cell. 111, 197–208.
Hayashi, S.I., Kunisada, T., Ogawa, M., Yamaguchi, K., and Nishikawa, S.I. (1991). Exon skipping by mutation of an authentic splice site of c-kit gene in W/W mouse. Nucleic Acids Res. 19, 1267–1271.
Ohad, N., Yadegari, R., Margossian, L., Hannon, M., Michaeli, D., Harada, J.J., Goldberg, R.B., and Fischer, R.L. (1999). Mutations in FIE, a WD Polycomb group gene, allow endosperm development without fertilization. Plant Cell. 11, 407–416.
Ronquist, F., and Huelsenbeck, J.P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19, 1572–1574.
Wood, C.C., Robertson, M., Tanner, G., Peacock, W.J., Dennis, E.S., and Helliwell, C.A. (2006). The Arabidopsis thaliana vernalization response requires a Polycomb-like protein complex that also includes VERNALIZATION INSENSITIVE 3. Proc. Natl Acad. Sci. U S A. 103, 14631–14636.
Schonrock, N., Bouveret, R., Leroy, O., Borghi, L., Kohler, C., Gruissem, W., and Hennig, L. (2006). Polycomb-group proteins repress the floral activator AGL19 in the FLC-independent vernalization pathway. Genes Dev. 20, 1667–1678.
Yan, L., Loukoianov, A., Blechl, A., Tranquilli, G., Ramakrishna, W., SanMiguel, P., Bennetzen, J., Echenique, V., and Dubcovsky, J. (2004). The wheat VRN2 gene is a flowering repressor downregulated by vernalization. Science. 303, 1640–1644.
Schuettengruber, B., Chourrout, D., Vervoort, M., Leblanc, B., and Cavalli, G. (2007). Genome regulation by Polycomb and Trithorax proteins. Cell. 128, 735–745.
Yang, C.H., Chen, L.J., and Sung, Z.R. (1995). Genetic regulation of shoot development in Arabidopsis: role of the EMF genes. Dev. Biol. 169, 421–435.
Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22, 2688–2690.
Yoshida, N., Yanai, Y., Chen, L., Kato, Y., Hiratsuka, J., Miwa, T., Sung, Z.R., and Takahashi, S. (2001). EMBRYONIC FLOWER2, a novel Polycomb group protein homolog, mediates shoot development and flowering in Arabidopsis. Plant Cell. 13, 2471–2481.
Patthy, L. (1996). Exon shuffling and other ways of module exchange. Matrix Biol. 15, 301–310; discussion 311–312.
Sung, S., and Amasino, R.M. (2004). Vernalization and epigenetics: how plants remember winter. Curr. Opin. Plant Biol. 7, 4–10.