Biochimie 119 (2015) 125–136
Contents lists available at ScienceDirect
Biochimie journal homepage: www.elsevier.com/locate/biochi
Research paper
Structural evolution of the 4/1 genes and proteins in non-vascular and lower vascular plants
Sergey Y. Morozov a,b,*, Irina A. Milyutina b, Vera K. Bobrova b, Dmitry Y. Ryazantsev c, Tatiana N. Erokhina c, Sergey K. Zavriev c, Alexey A. Agranovsky a, Andrey G. Solovyev b, Alexey V. Troitsky b
a Department of Virology, Biological Faculty, Lomonosov Moscow State University, Moscow 119992, Russia
b A. N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia
c M. M. Shemyakin and Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10 Miklukho-Maklaya Str., Moscow 117997, Russia
Article info
Article history: Received 24 August 2015; accepted 29 October 2015; available online 2 November 2015
Keywords: Plant 4/1 gene; Cloning; Protein structure; Protein phylogeny; Gene evolution

Abstract
The 4/1 protein of unknown function is encoded by a single-copy gene in most higher plants. The 4/1 protein of Nicotiana tabacum (Nt-4/1 protein) has been shown to be alpha-helical and predominantly expressed in conductive tissues. Here, we report the analysis of 4/1 genes and the encoded proteins of lower land plants. Sequences of a number of 4/1 genes from liverworts, lycophytes, ferns and gymnosperms were determined and analyzed together with sequences available in databases. Most of the vascular plants were found to encode Magnoliophyta-like 4/1 proteins exhibiting the previously described gene structure and protein properties. Identification of the 4/1-like proteins in hornworts, liverworts and charophyte algae (sister lineage to all land plants) but not in mosses suggests that 4/1 proteins are likely important for plant development but not required for a primary metabolic function of the plant cell.
© 2015 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
1. Introduction
The 4/1 proteins are encoded by single-copy genes in most flowering plants. Initially, the 4/1 protein (At-4/1) was identified in Arabidopsis thaliana in a yeast two-hybrid screen of a cDNA library with the movement protein of Tomato spotted wilt tospovirus [1,2]. The 4/1 proteins in Magnoliophyta are characterized by a highly conserved C-terminal domain of 30–37 amino acids. Previous analyses have identified a tripartite domain structure of the tobacco Nt-4/1. The mostly α-helical structure of the 4/1 protein, with pronounced coiled-coil elements covering more than two-thirds of its length, implies a potential for self-interaction and binding to protein ligands [3,4]. The function of 4/1 proteins in plants is not clearly understood [2,4–7]. Data available in the NCBI gene expression database (GEO) show that levels of Arabidopsis and rice 4/1 mRNAs increase in response to several biotic and abiotic stress factors, suggesting that
* Corresponding author. A. N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia. E-mail address:
[email protected] (S.Y. Morozov).
4/1 expression may be controlled to maintain plant homeostasis [6,7]. Fluorescent protein-tagged Nt-4/1 localized to the cytoplasm and the nucleus, and At-4/1 showed a polarized localization to one side of leaf epidermal cells [2,7]. On the other hand, the pattern of GUS expression driven by the Nt-4/1 promoter in transgenic Nicotiana tabacum plants indicates that transcription of the 4/1 gene is associated with the conducting tissues in the hypocotyl and leaf veins at certain developmental stages. Nicotiana 4/1 proteins are RNA-binding proteins targeted specifically to imperfect duplexes, including viroid RNAs, whose long-distance movement is affected in 4/1-silenced plants. Several deletion and point mutants of Nt-4/1 were constructed, and the RNA-binding site was mapped to the positively charged region of the C-terminal domain of the protein [3,4,6]. The 4/1 proteins showed an obvious similarity in their folding to the yeast protein She2p. This protein is a member of a class of nucleic acid-binding proteins that contain a single globular domain with a five α-helix bundle and form a symmetric homodimer. Specific yeast mRNAs contain cis-acting hairpin regions, termed zip-code elements, that mediate mRNA binding by She2p, which is required for subsequent directed intracellular RNA transport [4]. Exon/intron analysis showed that the gene structure of 4/1
http://dx.doi.org/10.1016/j.biochi.2015.10.019
0300-9084/© 2015 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
orthologs exhibits conservation in intron number, and that the intron positions and phases are highly conserved across different species of flowering plants. Although the 4/1 genes of many flowering plants have 8 exons and 7 introns, remarkable exceptions were found, particularly in the order Rosales, where all sequenced representatives have lost introns 3 and 4 and thus contain only 6 exons [3,4]. An even more profound intron loss was revealed in Beta vulgaris (order Caryophyllales), where the 4/1 gene (locus NC_025813) contains no introns although it codes for a protein with the characteristic 4/1 features. The high degree of structural conservation in the Magnoliophyta 4/1 proteins suggests that strong selection maintains their functions. This in turn raises questions about the specific structural features of non-Magnoliophyta 4/1 proteins and their genes. Most of the evolutionary changes optimizing 4/1 function have likely been subject to selection driven by the direct relationship between this function and the fitness of the plant. Thus, investigation of the early evolution of 4/1 genes and proteins can aid in identifying key structural features related to their functions. Indeed, there are several lines of evidence for the adaptive character of 4/1 evolution in Magnoliophyta: i) rather low rates of evolution of the protein-coding regions of the corresponding genes; ii) a low propensity of 4/1 genes to undergo duplication; and iii) the probable lack of 4/1 genes in the genus Solanum, which may be linked to specific adaptive changes in these plants [3,4].
To better understand the evolution of the 4/1 protein and identify potential associations between 4/1 structure and morphological evolution in plants, we have undertaken a comparative analysis of lower land plant 4/1 genes in conjunction with a protein structural analysis. We have partly or completely sequenced 4/1 genes in some representatives of the liverworts, lycophytes, ferns and gymnosperms.
In this analysis, we have also mined the transcriptomic resources accumulated in the 1KP project [8] and NCBI to uncover the putative evolutionary history of 4/1 proteins. Our structural analysis shows that most 4/1 proteins have a characteristic C-terminal coiled-coil domain that dates back to the charophytes (sister lineage to the land plants). This domain could not be found, however, in the more distantly related chlorophyte algae, suggesting that 4/1-like proteins arose in the lineage that gave rise to a common ancestor of all land plants [4]. The conserved C-terminal coiled-coil 4/1 domain is also encoded in hornworts and liverworts but not in mosses. We show that non-canonical 4/1 proteins with divergent structures have evolved in the bryophytes, whereas the vast majority of vascular plant groups have typical Magnoliophyta-like 4/1 proteins. In general, our results support previous models of the 4/1 protein organization [4,9] and make an important contribution to the understanding and future study of structure–function relationships in the 4/1 proteins.

2. Materials and methods

2.1. Plant material and nucleic acid extraction
Plant material was taken from the collections of the Tsytsin Main Botanical Garden of the Russian Academy of Sciences and the Biological Faculty of Lomonosov Moscow State University. Total DNA was isolated from 200 mg of fresh or dried plant material using a DNA extraction kit (Macherey-Nagel, Germany) according to the manufacturer's protocol.

2.2. Genome Walker PCR
Genome Walker libraries were constructed using the Universal Genome Walker kit according to the manufacturer's instructions (BD Biosciences Clontech). Plant genomic DNA (2.5–5 μg) in each
reaction was digested at 37 °C overnight with a restriction enzyme that generates blunt ends. Five enzymes (DraI, EcoRV, PvuII, ScaI, and StuI) were used in five separate reactions. After purification by phenol and chloroform extraction and ethanol precipitation, the mixed digested DNA was ligated to the partly double-stranded Genome Walker adapter GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT (double stranded region is underlined) at 16 °C overnight. Primers for PCR-based DNA walking in Genome Walker libraries were the gene-specific Min and Mout (Table 1), located on the complementary strand 3–50 bases from the translational initiation site, and the adapter primers AP1 (GTAATACGACTCACTATAGGGC) and AP2 (ACTATAGGGCACGCGTGGT). Two Genome Walker PCR reactions were carried out successively using PCR Kit, which is a Taq-based system with Pfu-like high fidelity and efficiency in the amplification of DNA template, and Advantage 2 PCR Kit (BD Biosciences Clontech). Mout and AP1 were used in the primary PCR, and Min and AP2 were used in the secondary (nested) PCR. In a 50-μL PCR mix, 1 μL of each Genome Walker DNA library was used as the template in the primary PCR, and 2 μL of the primary PCR products was used as the template in the secondary PCR. The PCR was started at 95 °C for 1 min, followed by 35 cycles consisting of 95 °C for 15 s and 68 °C for 4 min, and a final extension at 68 °C for 6 min. To amplify the next fragment of the 4/1 gene from genomic DNA, new pairs of primers were synthesized after sequencing of the first amplified fragment (data not shown). In total, we performed 11 sequence reads for the Marchantia polymorpha 4/1 gene (with 19 new primers), 9 sequence reads for the Ceratopteris richardii 4/1 gene, 7 sequence reads for the Selaginella pulcherrima 4/1 gene, 8 sequence reads for the Selaginella kraussiana 4/1 gene and 4 sequence reads for the Picea pungens 4/1 gene.
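The nested arrangement of the adapter primers can be checked directly from the sequences quoted above. The following is a minimal sketch, not part of the published protocol; the helper names primer_site and is_nested are hypothetical and introduced only for illustration. It locates AP1 and AP2 on the adapter by exact string matching and confirms that AP2 begins downstream of AP1, which is what a nested (secondary) PCR relies on.

```python
# Adapter and adapter-primer sequences as quoted in the text.
ADAPTER = "GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT"
AP1 = "GTAATACGACTCACTATAGGGC"  # outer adapter primer (primary PCR)
AP2 = "ACTATAGGGCACGCGTGGT"     # inner adapter primer (nested PCR)


def primer_site(template, primer):
    """Return the 0-based (start, end) of an exact primer match, or (-1, -1)."""
    start = template.find(primer)
    if start < 0:
        return (-1, -1)
    return (start, start + len(primer))


def is_nested(template, outer, inner):
    """True if both primers match and the inner one starts downstream of the outer one."""
    outer_start, _ = primer_site(template, outer)
    inner_start, _ = primer_site(template, inner)
    return outer_start >= 0 and inner_start > outer_start


if __name__ == "__main__":
    print("AP1 site:", primer_site(ADAPTER, AP1))
    print("AP2 site:", primer_site(ADAPTER, AP2))
    print("AP2 nested within AP1 product:", is_nested(ADAPTER, AP1, AP2))
```

Exact matching is sufficient here because both primers are designed from the adapter itself; gene-specific primers on genomic templates would need a mismatch-tolerant search.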
The nucleotide sequences reported in this article have been deposited in the DDBJ, EMBL, and GenBank nucleotide databases with accession numbers KP781978 (Marchantia polymorpha 4/1 gene), KP781979 (Ceratopteris richardii 4/1 gene), KP781980 (S. pulcherrima 4/1 gene), KP781981 (S. kraussiana 4/1 gene), and KP781982 (Picea pungens 4/1 gene, partial sequence).

Table 1
Primers used for DNA walking of the 4/1 genes.
Species                       Primer name   Primer sequence
Picea glauca                  Pigla-Mout    5′-GTCTAAGAGGAAGCAGCTGC
                              Pigla-Min     5′-CTTTTCAGGTCTTCGTTTTCCTCTTCT
Ceratopteris richardii        Cri-Mout      5′-AACGTATTTTTCATTCTGTGT
                              Cri-Min       5′-AGCTCCGCGTAATTGACGTCTCAA
Selaginella moellendorffii    Smo-Mout      5′-TTCTTCGCTTGAGATCTTCGTTCTCCT
                              Smo-Min       5′-TTCGGAGCTTCATGAGCTTTTGTTTGAGA
Marchantia polymorpha         Mpo-Mout      5′-CAGGTTCTGTCTGCATTTGCCGCC
                              Mpo-Min       5′-CTAAAACTTCATTGTCCTTTCTGA

2.3. Bioinformatic analysis
Sequences for comparative analysis were retrieved from NCBI (http://www.ncbi.nlm.nih.gov/), Dendrome (http://dendrome.ucdavis.edu/resources/blast/), TGI (http://compbio.dfci.harvard.edu/tgi/), SGN (http://sgn.cornell.edu/), the PlantGDB website (http://www.plantgdb.org), the PLAZA website (http://bioinformatics.psb.ugent.be/plaza/) and Phytozome (http://www.phytozome.net) databases. COBALT, the constraint-based alignment tool for multiple protein sequences (http://www.ncbi.nlm.nih.gov/tools/cobalt/), was used for multiple sequence alignments and phylogenetic analyses; a Fast Minimum Evolution tree was obtained with the use of default
Table 2
Total gene size and intron length (bp) of the selected 4/1 genes.
Taxonomy and species name (accession number)        Gene     1st     2nd     3rd     4th     5th     6th     7th
Angiosperms (dicots)
  Nicotiana benthamiana (EU117390)                  4182      66    1804     177    1065      88      80     144
  Arabidopsis lyrata (AL7G16040*)                   1724     339      97     100     150      81      90     123
  Carica papaya (CP00095G00030*)                    3189     563     183     165     391      86      81     967
  Populus trichocarpa (PT18G12850*)                 2198      67     145     143     738      84      94     153
  Ricinus communis (RC29670G00030*)                 1723     213      93      94     199      73      94     198
  Vitis vinifera (VV11G09770*)                      9567      95    7893     172     307      80      94     174
Angiosperms (monocots)
  Zea mays (ZM05G19620*)                            4331     614     133    2258      78     119     101     275
  Sorghum bicolor (SB04G003530*)                    2564     595     117     519      74     121     107     278
  Oryza sativa (OSINDICA_02G05060*)                 2267     349     152     495      85      80      96     381
  Brachypodium distachyon (BD3G03830*)              2574     750     142     480      69     122     100     155
Gymnosperms
  Picea glauca (PRJNA83435)                         nd**    nd**    nd**     100     234     351     107     423
  Picea abies (MA_878127***)                          nd    5505      nd     101     246     402     113     422
  Pinus taeda (scaffold5038***)                   213214  101383  109876     103     211     349     115     397
  Pinus lambertiana (scaffold10971***)                nd      nd      nd     100     216     351     102     395
Monilophytes
  Ceratopteris richardii (this paper)                 nd      nd      nd      nd      nd     133     130      84
Lycophytes
  Selaginella moellendorffii (SM00001G01600***)     1215      59      96     106      62      47      53      55
  Selaginella pulcherrima (this paper)              1230      54     116      93      61      52      46      59
  Selaginella kraussiana (this paper)               1142      58      58      62      54      64      64      47
Bryophytes
  Marchantia polymorpha (this paper)                2938     455     592     218     302     352     127     145
  Anthoceros agrestis (ERX714368, ERX714369)        2059     383     101      95      86      88      86     350
See text for details. * – accession according to the 1KP project database; ** – full sequence is unavailable; *** – accession according to the Dendrome project (http://dendrome.ucdavis.edu); nd – not determined.
parameters. Independently, a neighbor-joining (NJ) phylogenetic tree was constructed using the TREECON 1.3b package (http://bioinformatics.psb.ugent.be/downloads/psb/Userman/treeconw.html) for building a tree from aligned sequences by the unweighted pair-group method using arithmetic averages (UPGMA). For computer-assisted secondary structure prediction, the PSSFinder server (http://linux1.softberry.com/berry.phtml), the PCOIL server (http://toolkit.tuebingen.mpg.de/pcoils) and the MARCOIL server (http://www.isrec.isb-sib.ch/webmarcoil/webmarcoilC1.html) were used. Prediction of disordered/unstructured protein regions was carried out using the DisEMBL software (http://dis.embl.de/cgiDict.py). Conserved blocks in the 4/1 proteins were detected using WebLogo3 (http://weblogo.threeplusone.com/). Sequence logos are constructed from a set of aligned sequences and graphed as columns of stacked symbols. The height of each column corresponds to the information content of the corresponding position in the alignment, and the size of the individual symbols within each column reflects the frequency of the corresponding residue at this position. For proteins, the logo is calculated from the frequencies of the amino acids at each position. The NCBI accession numbers for the annotated 4/1 sequences used for the sequence logo construction are as follows: Populus trichocarpa XM_002325222; N. tabacum EU117386; Ricinus communis XM_002532486; Arabidopsis thaliana NM_118735; Arabidopsis lyrata XM_002867538; Oryza sativa NM_001052428; Sorghum bicolor XM_002451499; Zea mays NM_001137007; Hordeum vulgare AK359619; Brassica rapa AC189360; Capsicum annuum GD066605; Gossypium hirsutum ES798773; Malus domestica GO550560; Glycine max FK638939; Fragaria vesca EX659494.
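The relationship between column height, information content and residue frequency described for sequence logos can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not a reimplementation of WebLogo3: it uses the uncorrected information content log2(20) minus the Shannon entropy of the column, ignores gaps, and applies no small-sample correction. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column; gaps are ignored.

    Assumption: no small-sample correction is applied.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(alignment):
    """Per-column symbol heights: residue frequency times column information."""
    heights = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        info = column_information("".join(column))
        counts = Counter(residues)
        heights.append({aa: (n / len(residues)) * info for aa, n in counts.items()})
    return heights


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, for illustration only).
    toy_alignment = ["LKRK", "LKRR", "LRRK", "LKRK"]
    for position, symbols in enumerate(logo_heights(toy_alignment), start=1):
        print(position, {aa: round(h, 2) for aa, h in symbols.items()})
```

An invariant column reaches the maximum of log2(20) bits for a 20-letter alphabet, whereas a column split evenly between two residues loses exactly one bit.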
3. Results and discussion

3.1. Screening of possible orthologous 4/1 sequences in non-flowering plants
Previously, we identified orthologous single-copy nuclear 4/1 genes in Arabidopsis thaliana and N. benthamiana. Our previous sequence analysis of 4/1 proteins from a number of dicotyledonous plants revealed that their C-terminal region is the most conserved [2,3]. In order to identify At-4/1 and Nb-4/1 orthologs in non-vascular and lower vascular plants, we performed database searches using as queries the full-length protein sequences and their highly conserved C-terminal regions. Initially, putative 4/1-like cDNA sequences were identified by TBLASTN searches of the NCBI database (version of April 2011). For non-annotated sequences derived from EST data sets, translations across all six reading frames were searched for significant ORFs at http://web.expasy.org/translate/. A total of five unique 4/1-like cDNA nucleotide sequences were found in the NCBI database for the following species: Picea glauca (accession number EX326890), Picea sitchensis (GH290407), Ceratopteris richardii (CV735868), Selaginella moellendorffii (XM_002961390) and Marchantia polymorpha (BJ865277).
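The six-frame translation step used for non-annotated EST sequences can be sketched as follows. This is an illustrative stand-in for the ExPASy Translate tool, not the procedure actually run by the authors; it assumes the standard genetic code, treats an ORF as any stop-free stretch (no start codon is required, since ESTs are often partial), and uses a made-up test sequence.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


def longest_orf(seq):
    """Return (frame, peptide) for the longest stop-free stretch in any frame."""
    best = ("", "")
    for frame, protein in six_frames(seq).items():
        for segment in protein.split("*"):
            if len(segment) > len(best[1]):
                best = (frame, segment)
    return best


if __name__ == "__main__":
    toy = "ATGGCTACTTCTGATGAAGAACTTTAA"  # hypothetical sequence
    for frame, protein in six_frames(toy).items():
        print(frame, protein)
    print("longest:", longest_orf(toy))
```

On very short inputs the longest stop-free stretch may fall in a frame other than the biologically meaningful one, which is why candidate ORFs are normally also checked for similarity to a known 4/1 protein.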
3.2. 4/1 gene sequence analyses in conifers
The almost full-length 4/1 mRNAs from EST libraries of P. glauca and P. sitchensis exhibited 99% identity and contained open reading frames of 250 and 248 codons in length, respectively, encoding polypeptides significantly similar to the At-4/1 protein (E-value = 1e-34, identity = 38%) and the Nt-4/1 protein (E-value = 9e-33, identity = 38%). To analyze 4/1 genes in conifer species, primers corresponding to the 5′- and 3′-most portions of the Picea glauca 4/1 cDNA ORF were used for amplification carried out on genomic DNA from Picea pungens, P. glauca, Pinus mugo, and Pseudotsuga menziesii. Initial PCR analysis, however, gave rise to a negative
Fig. 1. Synoptic consensus view on phylogeny of major plant taxa based on recent molecular analyses and discussed in this paper (compiled from Ref. [21] with modifications).
result (data not shown). Similarly, no PCR products were obtained when primers corresponding to the most conserved portions of the putative exon 2 and the 3′-most ORF region were used for amplification (data not shown). Previously, we were able to amplify the almost full-length 4/1 genes by PCR using DNA templates from several Nicotiana species [3]. Since, in contrast to tobacco and its close relatives, no PCR products could be obtained for the conifers, we proposed that this may be due to experimental shortcomings or to as yet unrecognized genomic complexity of the latter species, and we therefore performed cloning by Genome Walker PCR (see Materials and methods). The first pair of gene-specific primers for PCR-based DNA walking in Genome Walker libraries was positioned on the complementary strand, 48 to 22 nt and 22 to 6 nt from the translational termination site of the P. glauca ORF (Table 1). After sequencing the first series of cloned PCR walking products, new gene-specific primers were synthesized. The resulting 1372-bp nucleotide sequence from P. pungens (Colorado spruce) contained the two 3′-most exons and introns as well as 604 bp of the 3′-terminal non-translated region. Comparison of the nucleotide and predicted polypeptide sequences of the Colorado
spruce 4/1 gene with that of P. glauca in the NCBI EST library revealed that they share more than 99% identity at the amino acid level and more than 97% identity at the nucleotide level in the protein-encoding region of the gene (data not shown). Using the early releases of the white spruce (P. glauca), Norway spruce (P. abies) and loblolly pine (Pinus taeda) genome sequences [10–12] in public databases (NCBI and http://dendrome.ucdavis.edu/treegenes/), we performed a BLASTN search with the 1372-residue nucleotide sequence from Picea pungens as a query. Genomic nucleotide sequences of the white spruce and Norway spruce 4/1 genes showed 98–99% identity to the Colorado spruce gene, even in the region corresponding to the last intron of 422 bp in length (the 7th intron according to the numbering in Nicotiana and Arabidopsis species) (data not shown). This estimate is very close to the level of similarity between the last introns in Nicotiana species (95–97% identity) such as N. tabacum, N. benthamiana, Nicotiana otophora, Nicotiana hesperis and Nicotiana clevelandii [3]. The last introns of the 4/1 genes in loblolly pine and Pinus lambertiana are somewhat smaller (397 and 395 bp in length, respectively) and showed 85% identity with several indels (Table 2 and data not shown). Generally, the 7th
Fig. 2. Sequence logos of 4/1 conserved domains. These sequence logos, which visualize the distribution of amino acids at each position of conserved motifs, are based on the aligned 4/1 sequences identified in the genomes of selected flowering plants. Amino acids are colored according to chemical properties; polar (brown), hydrophobic (blue), negatively charged (green), positively charged (red). All amino acids are represented as standard, single-letter abbreviations.
introns of the 4/1 gene in Pinaceae are not significantly larger than those of flowering plants (Table 2). Pinaceae species show tremendous variation in genome size. Pines (Pinus) diverged from spruces (Picea), their closest relatives, more than 80 million years ago and possess larger genomes on average, estimated at between 22 and 32 Gbp. The extremely large genome size in conifers has primarily been attributed to an extensive contribution of very long introns and interspersed repetitive content [11]. In particular, the assembly of the Norway spruce genome has shown that LTR retrotransposons are frequently nested within the long introns of some gene families. The huge size of these genomes has previously complicated their whole-genome sequencing [10]. The same cause could explain the failure of our experiments to amplify the whole 4/1 genes from conifer species (see above). Indeed, a bioinformatic search revealed that the 1st and 2nd introns of the P. taeda 4/1 gene are 101,369 bp and 109,906 bp, respectively, although the remaining introns are much smaller (Table 2). These 100-kbp introns also contain sequence elements of retrotransposons (data not shown). Mapping the loblolly pine transcriptome against the genome identified 10,991 introns, with an average of 3.28 introns per gene. Their maximum reported length is approximately 150 kbp [11]. However, the 1st intron of the P. abies 4/1 gene is only 5480 bp, whereas the 2nd intron is more than 6200 bp (Table 2). This correlates with the somewhat larger genomes in pines. In plants, increasing intron length is positively correlated with gene expression [13,14]. The first intron in a gene is also generally longer than distal introns [13,14]. This is largely true for loblolly pine as well [11]. Introns that are known to enhance expression have been observed in diverse organisms including plants, insects, and mammals [13]. This positive effect on gene expression has been named intron-mediated enhancement (IME) [15].
It is important to note that IME is not due to the presence of intronic enhancers, although some enhancing introns can also contain such enhancer elements. While enhancers may be located upstream or downstream of a gene, introns involved in IME must be located in transcribed sequences in order to increase expression [15]. Typically, introns that are located nearer to the 5'-end of a gene have more enhancing power than those at the 3'-end. It remains to be
established how the 1st and 2nd introns of the P. taeda 4/1 gene, being more than 100 kbp in length, contribute to gene expression.

3.3. 4/1 gene sequence analyses in non-seed vascular plants
Extant vascular plants are divided into two major clades: lycophytes and euphyllophytes. Euphyllophytes are divided into spermatophytes and monilophytes, and the latter include horsetails, psilophytes, eusporangiate and leptosporangiate ferns (Fig. 1). Over the past twenty years, molecular phylogenetic approaches have confirmed that ferns form a clade sister to seed plants. Ferns are one of the great vascular plant radiations; only the angiosperm clade has more extant species [16]. Lycophytes have features typical of vascular plants, including a dominant and complex sporophyte generation and vascular tissues with lignified cell types [17]. Because the lycophytes are an ancient lineage that diverged shortly after land plants evolved vascular tissues, their genomes may provide a resource for identifying genes that may have been important in the early evolution of developmental and metabolic processes unique to vascular plants. Only three extant orders of lycophytes exist: the Lycopodiales (club mosses), the Isoetales (quillworts), and the Selaginellales. Importantly, S. moellendorffii has one of the smallest genome sizes of any plant reported [17]. The first pair of gene-specific primers for PCR-based DNA walking in Genome Walker libraries was positioned on the complementary strand, 50 to 24 nt and 23 to 3 nt from the translational termination site of the fern Ceratopteris richardii 4/1-like cDNA ORF. To clone and sequence 4/1-like genes in S. kraussiana and S. pulcherrima, we designed a pair of Genome Walker primers positioned on the complementary strand, 55 to 28 nt and 27 to 5 nt from the translational termination site of the S. moellendorffii cDNA ORF (Table 1). As a result, a partial sequence of the fern C.
richardii 4/1-like gene and complete sequences of the 4/1 genes in two Selaginella species were obtained after subsequent rounds of sequencing and synthesis of new primers (see Materials and methods). Interestingly, the very condensed genome size in S. moellendorffii has partially been attributed to a short median intron length, which
Table 3 Comparison of the C-terminal sequences of the 4/1 proteins from some flowering plants and 4/1-like proteins of other plants.
Fig. 3. Predicted secondary structure and positions of the predicted NES and NLS signals in selected 4/1 proteins from a dicot plant (Nicotiana tabacum, bipartite NLS score 6.1), Gymnosperms (Pinus taeda, bipartite NLS score 7.5; Podocarpus coriaceus, bipartite NLS score 7.0), Lycophytes (Selaginella willdenowii, bipartite NLS score 7.0; Selaginella moellendorffii, bipartite NLS score 7.3) and Marchantiophyta (Marchantia polymorpha, monopartite NLS score 7.0). Numbers indicate amino acid positions relative to the N terminus. Secondary structure elements are denoted as α (α-helix) and β (β structure). NESs are in yellow; bipartite NLSs are in green; the monopartite NLS is in blue.
is even smaller than in Arabidopsis species [11,17]. The phenomenon of minimal intron length was also observed in this study for S. kraussiana and S. pulcherrima (Table 2). Ferns tend to have much larger genome sizes and chromosome numbers compared with Selaginellaceae for reasons that are not fully understood. Our comparative data on C. richardii and Selaginella 4/1-like genes indicate that the median intron length in the fern is also larger (Table 2).

3.4. 4/1 gene sequence in Marchantia polymorpha
Bryophytes are a very diverse group of non-vascular land plants with over 800 genera and more than ten thousand species, which include liverworts, mosses and hornworts. After colonization of land by the ancestors most closely related to modern-day charophycean algae, bryophytes arose during the Ordovician, ca. 480 million years ago [18]. Their phylogeny is based on molecular sequence data and the morphology of the extant species (Fig. 1). The liverworts (Marchantiophyta) are resolved as the earliest-diverging land plant group, while the mosses (Bryophyta) represent the sister group to a clade formed by hornworts (Anthocerotophyta) and vascular plants (Tracheophyta). An alternative, controversial hypothesis resolves hornworts as sister to mosses and liverworts plus vascular plants [19]. In this paper we studied the structure of the 4/1-like gene from the liverwort Marchantia polymorpha as a representative of the most basal land plants. The first pair of gene-specific primers for PCR-based DNA walking in Genome Walker libraries was positioned (Table 1) on the complementary strand, 50 to 24 nt and 23 to 3 nt from the translational termination site of the liverwort M. polymorpha 4/1-like ORF from the NCBI EST library (see above). As a result of subsequent steps of sequencing and new primer synthesis, a 3150-nt sequence of the Marchantia 4/1-like gene was obtained. Using recently published transcriptome sequence data for this liverwort, we were able to precisely map the exon/intron structure of this gene (Table 2).
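The comparison of median intron lengths can be reproduced from the intron sizes listed in Table 2. The snippet below is a minimal sketch that covers only the 4/1 gene of each species (it says nothing about genome-wide medians); the values are transcribed from Table 2, and the dictionary layout and function name are illustrative assumptions.

```python
from statistics import median

# Intron lengths (bp) of the 4/1 gene, introns 1-7, transcribed from Table 2.
# For Ceratopteris richardii only introns 5-7 were determined.
INTRONS = {
    "Arabidopsis lyrata": [339, 97, 100, 150, 81, 90, 123],
    "Ceratopteris richardii (introns 5-7 only)": [133, 130, 84],
    "Selaginella moellendorffii": [59, 96, 106, 62, 47, 53, 55],
    "Selaginella pulcherrima": [54, 116, 93, 61, 52, 46, 59],
    "Selaginella kraussiana": [58, 58, 62, 54, 64, 64, 47],
    "Marchantia polymorpha": [455, 592, 218, 302, 352, 127, 145],
}


def median_intron_lengths(table):
    """Return the median intron length (bp) for each species in the table."""
    return {species: median(lengths) for species, lengths in table.items()}


if __name__ == "__main__":
    for species, value in median_intron_lengths(INTRONS).items():
        print(f"{species}: {value} bp")
```

With these values the three Selaginella genes have medians below 60 bp, the partial fern data give a median above 100 bp, and the Marchantia gene has a median of about 300 bp.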
Similar to the fern C. richardii, the median intron length was found to be significantly larger in M. polymorpha than in the Selaginella 4/1 genes. Importantly, comparative analysis showed that the intron positions relative to codons (and the respective amino acids in the proteins) in At-4/1, Nt-4/1, Rc-4/1 and the full-length 4/1-like genes of conifers, Selaginella and Marchantia are very similar (Supplementary Fig. S1 and data not shown). Moreover, we have recently finished the sequence assembly of the complete 4/1-like gene of the hornwort Anthoceros agrestis on the basis of NCBI short read archive (SRA) data (Table 2). It was found that the intron positions of the hornwort gene are also highly conserved in comparison with liverworts and flowering plants (Supplementary Fig. S1). These findings strongly suggest that the 4/1 gene arose very early in the evolution of land plants, probably before the separation of liverworts and hornworts, or during the radiation of charophycean algae (see below).

3.5. 4/1 protein sequences in lower vascular plants
Conclusive evidence suggests that protein function correlates well with the presence of local patterns of amino acid residues, or motifs, shared by proteins with similar function. Motifs are highly conserved sets of residues that form similar patterns and often represent functionally important regions such as active or binding sites, or regions defining the overall protein fold. Throughout the course of evolution, functionally important parts of proteins, such as active site residues in the case of enzymes, have remained conserved. Thus, analysis of similarity in local regions of a protein or in structural motifs could be useful for identifying functionally significant sites. Typically, sequence motifs are derived from multiple sequence alignments of proteins belonging to the same family
or having similar function. Our previous phylogenetic analysis based on alignments of 4/1-like proteins from 15 flowering plants [3, see also Materials and methods] has allowed us to use the motif-finding tool WebLogo 3 (http://weblogo.threeplusone.com/) to find the characteristic motifs of Magnoliophyta 4/1 proteins. Five patterns were revealed for 4/1 proteins (Fig. 2): the N-terminal signature (motif I), three internal signature motifs and the C-terminal motif (motif V). The latter motif represents the most conserved sequence signature, including a coiled-coil structure with three absolutely conserved Leu residues in positions ‘d’ of heptads [2–4]. This region is positively charged and involved in RNA binding [4,9] (Table 3 and Fig. 2). To identify orthologs of the 4/1 protein among lower vascular plants, we used as queries both the full-length protein sequences from N. tabacum, P. glauca and S. moellendorffii and the C-terminal signature (motif V). Dendrome (http://dendrome.ucdavis.edu/), NCBI, and the 1KP project (www.onekp.com) were the main bioinformatic sources used to retrieve 4/1-like cDNA ORFs and protein sequences. The P. glauca 4/1 protein sequence was used as a query in searches employing the TBLASTN option. For non-annotated sequences derived from EST data sets of Lycophytes, Gymnosperms, Leptosporangiate ferns and Eusporangiate ferns (Fig. 1), translations across all six reading frames were searched for 4/1-like ORFs, and the longest open reading frames (ORFs) were taken for further analysis. In cases when two or more partial sequences from the same species were independently assigned to the 4/1-like protein and exhibited significant sequence overlap, the sequences were combined into a single consensus sequence.
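The combination of overlapping partial sequences into a single sequence can be illustrated with a simple suffix-prefix merge. This is a hedged sketch only: the text does not state which tool or overlap criterion was used, so the exact-match rule, the min_overlap parameter and the function name are assumptions, and the fragments are toy strings rather than real data.

```python
def merge_overlapping(left, right, min_overlap=10):
    """Join two fragments if a suffix of `left` exactly equals a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` residues is found. The longest overlap is preferred.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


if __name__ == "__main__":
    # Toy fragments (hypothetical) sharing the six-residue overlap KMKLRK.
    fragment_a = "VETLKMKLRK"
    fragment_b = "KMKLRKENELKRK"
    print(merge_overlapping(fragment_a, fragment_b, min_overlap=4))
```

Real EST-derived fragments often differ at a few positions, so a practical workflow would align the fragments and tolerate mismatches rather than require an exact overlap.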
In total, full-length, nearly complete and partial 4/1-like protein sequences were revealed in 11 species from Gymnosperms (9 species in Coniferophyta and one species each in Ginkgophyta and Cycadophyta), 11 species from Lycophytes, 6 species from Eusporangiate ferns and 22 species from Leptosporangiate ferns (Table 3 and data not shown). Assuming that functionally important regions in 4/1-like proteins may also be recognized as signatures of specific domains, we attempted to define class-wise signature residues of the newly retrieved 4/1 sequences in lower vascular plants according to each of the five conserved signatures of flowering plants (Fig. 2). The analysis of the whole diversity of vascular plants revealed that a general conserved pattern could be found only for the C-terminal motif V (Fig. 2) but not for the four other signatures (data not shown), except for motif I in Pinaceae species. In P. glauca and P. taeda, sequences belonging to motif I contain three conserved amino acid blocks, namely, A-T-(S/G)-D-E-E-(L/M), L-L, and F-(D/H)-(Q/R)-I at positions 4, 13, and 17, respectively. The highly conserved sequence belonging to motif V contains V-E-T-L-K, M-K-L-R-K-E-N-E, and L-K-R-(K/R) at positions 224, 232, and 241, respectively (Fig. 2). Among these, K-228, K-233, R-235, and K-236 are required for the RNA-binding activity of Nt-4/1 [4,9]. Generally, motif V in all plants studied is considerably more positively charged than the other motifs and the complete 4/1 protein (Table 3). In particular, the predicted pI values of motifs I–V in Nt-4/1 are 3.66, 6.28, 7.03, 5.02, and 9.82. Prediction of the secondary structure of 4/1 proteins revealed that the Nt-4/1 protein consists of approximately 80% α-helical regions [3]. Predictions made for 4/1-like proteins in Gymnosperms, Lycophytes and ferns revealed a similar percentage of α-helical residues (Fig. 3 and data not shown).
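The contrast in charge between motif I and motif V can be made concrete with a crude residue count on the conserved blocks quoted above. This is only a rough sketch, not a pI calculation: it counts Lys and Arg as +1 and Asp and Glu as −1, ignores His, the termini and pKa values, and picks one variant where the text gives alternatives such as (S/G) or (K/R). The function name is hypothetical.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(peptide):
    """Crude net charge near neutral pH: (K + R) minus (D + E)."""
    peptide = peptide.upper()
    positives = sum(aa in POSITIVE for aa in peptide)
    negatives = sum(aa in NEGATIVE for aa in peptide)
    return positives - negatives


if __name__ == "__main__":
    # Conserved blocks quoted in the text for Pinaceae (one variant chosen
    # where alternatives are given).
    motif_i_blocks = ["ATSDEEL", "LL", "FDQI"]
    motif_v_blocks = ["VETLK", "MKLRKENE", "LKRK"]
    print("motif I:", sum(net_charge(block) for block in motif_i_blocks))
    print("motif V:", sum(net_charge(block) for block in motif_v_blocks))
```

Under these simplifications the motif I blocks carry a net negative charge and the motif V blocks a net positive charge, in line with the acidic and basic pI values predicted for motifs I and V of Nt-4/1.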
Most of the predicted α-helical regions potentially form coiled-coil (CC) domains (data not shown), as was found for Nt-4/1 [3]. Importantly, the high level of similarity between the secondary structures of all 4/1 proteins suggests that these proteins have evolutionarily preserved amino acid clusters that could be involved in functionally important protein–protein interactions [20].
Fig. 4. Pairwise sequence comparisons of the available amino acid sequence of the 4/1-like protein from Chaetosphaeridium globosum with the 4/1 proteins of selected flowering plants (Nicotiana tabacum and Oryza brachyantha), a fern (Ceratopteris richardii) and the lycophyte Selaginella moellendorffii. BLASTP was used at the NCBI BLAST site.
Fig. 5. The phylogenetic tree based on analysis of the aligned 4/1 proteins from land plants. Taxonomic positions of the plant species are indicated on the right. The Fast Minimum Evolution tree was obtained at http://www.ncbi.nlm.nih.gov/tools/cobalt/ with the use of default parameters.
The ability of Nt-4/1 protein to localize both to the nucleus and the cytoplasm as well as accumulation of the protein in the nucleus in the presence of leptomycin B implied that the protein could possess both a nuclear localization signal (NLS) and a nuclear export signal (NES) [7]. Analysis of possible importin α-dependent NLSs in 4/1 proteins of non-flowering vascular plants using ‘cNLS Mapper’ (http://nlsmapper.iab.keio.ac.jp) predicted in most cases medium-score bipartite NLSs (Fig. 3 and data not shown). However, positions of NLSs in the individual proteins significantly varied. These observations suggested that the transport of the 4/1 protein into the nucleus could be importin α-dependent. Strikingly, we revealed no NLS motifs with significant scores in fern 4/1 proteins (data not shown). While no canonical NES could be predicted for the Nt-4/1 protein and 4/1 proteins of other flowering plants, the algorithm based on a combination of neural networks and hidden Markov models
(http://www.cbs.dtu.dk/services/NetNES/) was successfully used in the previous paper [7] and in this study. NESs were found in all 4/1 proteins from vascular plants. However, the positions of NESs in individual 4/1 proteins vary somewhat (Fig. 3 and data not shown).
3.6. 4/1 protein sequences in non-vascular land plants
To identify 4/1 protein orthologs among non-vascular land plants, we used the same approach as described above for lower vascular plants. However, the only full-length sequence experimentally retrieved in this study was that for Marchantiopsida (M. polymorpha) (see above). Additionally, 4/1-like protein sequences were found for a single member of class Jungermanniopsida (Porella navicularis) and four members of class Anthocerotopsida (Anthoceros agrestis, Anthoceros punctatus, Phaeoceros carolinianus and Paraphymatoceros hallii) (Table 3 and data not shown). Analysis of the M. polymorpha 4/1-like protein revealed the typical 4/1 sequence features described above. It contains 84.1% α-helices forming, as predicted, four anti-parallel coiled-coil domains in the C-terminal area [3,4] (Fig. 3). Unlike 4/1 proteins of vascular plants, the M. polymorpha 4/1-like protein contains a single canonical NLS and a single NES (Fig. 3). A search for 4/1 amino acid coding sequences in transcriptomes demonstrated that among the three bryophyte divisions (Bryophyta, Anthocerotophyta, and Marchantiophyta) 4/1-like proteins are encoded in Anthocerotophyta and Marchantiophyta only (Table 3). This observation was further supported by the fact that no 4/1-like sequences were found in the fully sequenced and annotated genome of the moss Physcomitrella patens. The absence of 4/1 in mosses suggests that 4/1-related processes could be regulated differently in distinct plant lineages, so that the lack of 4/1 may be compensated by alternative molecular mechanism(s).
3.7. Proteins with 4/1-like signatures in charophycean algae
Finding the 4/1-coding sequences with conserved intron positions in the genomes of Anthocerotophyta and Marchantiophyta prompted us to speculate that 4/1-like sequences first appeared in charophycean algae. Indeed, transcriptome sequencing of charophytes [18,19] also revealed proteins clearly related to the 4/1 polypeptides of flowering plants. Despite the lack of full-length sequences, we identified five algal species coding for the most conserved C-terminal 4/1 signature (Table 3). Moreover, pairwise BLAST comparisons strongly support the relatedness of the extended C-terminal portion of the 4/1-like sequence from Chaetosphaeridium globosum (order Coleochaetales) to 4/1 proteins from seed and non-seed vascular plants (Fig. 4). Order Coleochaetales occupies a medial position in the phylogeny of charophycean algae [18,19] (Fig. 1). However, the finding that the basal charophyte Mesostigma viride (order Mesostigmatales) [21] codes for a 4/1-like signature-containing polypeptide (Table 3) suggests that 4/1 genes appeared early in the evolution of charophycean algae. Importantly, our search of transcriptomes and genomes from many species of Chlorophyta (green algae) (Fig. 1) showed no encoded 4/1-related protein signatures (data not shown).
4. Concluding remarks
In the phylogenetic tree, which is based on comparisons of 62 aligned full-length and nearly complete 4/1 protein sequences, Lycophytes, Gymnosperms, Leptosporangiate ferns and Angiosperms cluster as monophyletic groups (Fig. 5). Importantly, the tree topology inferred from COBALT (see Materials and
methods) is very similar to the dendrogram revealed by TREECON analysis (see Fig. 1 in Ref. [26]). Moreover, the phylogenetic tree based on comparisons of 134 individual 4/1 proteins retains the general topological properties of the above trees (see Fig. 2 in Ref. [26]). The M. polymorpha 4/1-like protein is an outgroup for vascular land plants. The Amborella trichopoda sequence represents the first branching lineage in the flowering plant subtree of 4/1 proteins, where monocots and dicots form distinct subtrees (Fig. 5; see Figs. 1 and 2 in Ref. [26]). Importantly, the observations on the phylogeny based on the proteins encoded by the single-copy (rarely double-copy) 4/1 gene [4] are in good agreement with the branching order of land plant evolution trees based on multiple proteins [19,22]. Because of their high phylogenetic potential, single-copy nuclear genes are increasingly being used in systematic studies. It should be noted that a combined data matrix of many single-copy nuclear genes yielded highly supported trees consistent with commonly accepted plant phylogenies, and individual single-copy gene trees often gave similar topologies [22]. Despite exhaustive searches, no 4/1 genes could be identified in P. patens or other mosses. Likewise, we found no evidence of 4/1 genes in potato and tomato using the Sol Genomics Network [http://www.sgn.cornell.edu/] and GenBank [4 and data not shown]. Based on the presence of a 4/1 gene in tobacco, pepper and petunia [3], we assume that the common ancestor of Solanaceae plants must have contained a 4/1 gene, which was subsequently lost in an immediate ancestor of potato and tomato. The lack of a 4/1 gene in the above-mentioned species suggests that 4/1 proteins are likely not required for a primary metabolic function. Rather, this finding points to important ecological or secondary metabolic functions for 4/1 proteins.
Alternatively, there may be functional redundancy, and the lack of 4/1 genes in such plants as Solanum sp. and mosses may be compensated by other proteins. Identification of proteins with 4/1-like signatures in charophycean algae (see above) provides additional support for the hypothesis that the lack of 4/1 in mosses could be a result of gene loss in the immediate ancestor of Bryophyta. Strikingly, charophycean algae encode proteins with a highly conserved C-terminal 4/1-like signature (Table 3). Indeed, all algal proteins include a terminal coiled-coil element showing an absolutely conserved Lys residue in the heptad position ‘e’ and highly conserved Gln and Lys in positions ‘f’ and ‘g’ of the first heptad, respectively. Moreover, Arg and Lys are highly conserved in the positions ‘c’, ‘e’ and ‘f’, respectively, whereas Glu is absolutely conserved in the position ‘g’ of the second heptad. The third heptad contains an absolutely conserved Asn in the position ‘a’ and highly conserved Lys/Arg in the position ‘e’ (Table 3). It can be speculated that this specific amino acid cluster could be involved in functionally important coiled-coil interactions with evolutionarily highly conserved protein(s) [20]. What are the specific features of charophycean algae in comparison with Chlorophyta that may relate to the evolution of 4/1-like genes? Charophyte algae have long been recognized as the closest algal relatives of land plants based on cellular and morphological features [23]. This view has been corroborated and refined using molecular phylogenetics, which has begun to clarify relationships within charophyte algae and with their sister clade, the land plants (embryophytes) [18]. It has become clear that the evolution of 4/1-like genes is not directly connected to the appearance of multicellularity in land plants, since not only many Chlorophyta but also some charophytes coding for 4/1 genes (such as Mesostigma viride and Closterium peracerosum) are unicellular.
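The heptad-by-heptad description of the algal C-terminal signature given above can be reproduced from an alignment with a short Python sketch. It is a minimal illustration, not the analysis behind Table 3: the function names are hypothetical, the register of the first aligned residue must be supplied by the user, and gaps are treated like any other character.

```python
from collections import Counter

HEPTAD = "abcdefg"


def heptad_labels(length, first="a"):
    """Label consecutive residues with heptad positions, starting at `first`."""
    start = HEPTAD.index(first)
    return [HEPTAD[(start + i) % 7] for i in range(length)]


def column_conservation(aligned):
    """Return (most common residue, fraction of sequences) per alignment column."""
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result


def conserved_positions(aligned, first="a", threshold=1.0):
    """List (heptad number, heptad position, residue) for conserved columns."""
    start = HEPTAD.index(first)
    labels = heptad_labels(len(aligned[0]), first)
    rows = []
    for i, (residue, fraction) in enumerate(column_conservation(aligned)):
        if fraction >= threshold:
            rows.append(((start + i) // 7 + 1, labels[i], residue))
    return rows
```

Called with `threshold=1.0`, the function reports only absolutely conserved residues; a lower threshold would also report the "highly conserved" positions, with the cutoff being a matter of choice.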
Nevertheless, a number of land plant traits originated within the multicellular charophyte algae. These include apical meristems, cells that undergo asymmetric division and differentiation, complex branching patterns, and plasmodesmata that form symplastic connections between cells.
Additional cellular and biochemical traits that are streptophyte-specific include a hexameric cellulose synthase complex that synthesizes the primary cell wall, the phragmoplast as a mechanism for cytokinesis, synthesis and transport of phytohormones, and several other metabolic specializations [24,25].
Funding
Experimental design, database searches, phylogenetic analysis and the article writing were performed by A.S. and S.M. at Moscow State University with financial support of the Russian Science Foundation (grant 14-14-00053). Cloning was performed by T.E. and D.R. at the Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry with support of the Russian Foundation for Basic Research (grant 14-04-00997a). DNA isolation, primer design and genome walking experiments were carried out at Moscow State University by A.T., V.B. and I.M. with support of the Russian Foundation for Basic Research (grant 15-04-06027). PCR product isolation was carried out at Moscow State University by A.A. with support of the Russian Foundation for Basic Research (grant 13-04-01094a). S.Z. participated in manuscript preparation at the Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry with support of the Russian Foundation for Basic Research (grant 15-29-02527).
Acknowledgments
We thank the researchers who contributed samples used in this study to the 1KP initiative.
Appendix A. Supplementary data
Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.biochi.2015.10.019.
References
[1] S. von Bargen, K. Salchert, M. Paape, B. Piechulla, J.-W. Kellmann, Interactions between Tomato spotted wilt virus movement protein and plant proteins showing homologies to myosin, kinesin and DnaJ-like chaperones, Plant Physiol. Biochem. 39 (2001) 1083–1093.
[2] M. Paape, A.G. Solovyev, T.N. Erokhina, E.A. Minina, M.V. Schepetilnikov, D.E. Lesemann, J. Schiemann, S.Y. Morozov, J.W. Kellmann, At-4/1, an interactor of the Tomato spotted wilt virus movement protein, belongs to a new family of plant proteins capable of directed intra- and intercellular trafficking, Mol. Plant Microbe Interact. 19 (2006) 874–883.
[3] S.S. Makarova, E.A. Minina, V.V. Makarov, P.I. Semenyuk, L. Kopertekh, J. Schiemann, M.V. Serebryakova, T.N. Erokhina, A.G. Solovyev, S.Y. Morozov, Orthologues of a plant-specific At-4/1 gene in the genus Nicotiana and the structural properties of bacterially expressed 4/1 protein, Biochimie 93 (2011) 1770–1778.
[4] S.Y. Morozov, S.S. Makarova, T.N. Erokhina, L. Kopertekh, J. Schiemann, R.A. Owens, A.G. Solovyev, Plant 4/1 protein: potential player in intracellular, cell-to-cell and long-distance signaling, Front. Plant Sci. 5 (2014) 26, http://dx.doi.org/10.3389/fpls.2014.00026.
[5] E.A. Minina, T.N. Erokhina, N.V. Soshnikova, A.G. Solovyev, S.Y. Morozov, Immunological detection of plant protein At-4/1 capable of interaction with viral movement proteins, Dokl. Biochem. Biophys. 411 (2006) 351–355.
[6] A.G. Solovyev, S.S. Makarova, M.V. Remizowa, H.S. Lim, J. Hammond, R.A. Owens, L. Kopertekh, J. Schiemann, S.Y. Morozov, Possible role of the Nt-4/1 protein in macromolecular transport in vascular tissue, Plant Signal. Behav. 8 (2013), http://dx.doi.org/10.4161/psb.25784.
[7] A.G. Solovyev, E.A. Minina, S.S. Makarova, T.N. Erokhina, V.V. Makarov, I.B. Kaplan, L. Kopertekh, J. Schiemann, K.R. Richert-Pöggeler, S.Y. Morozov, Subcellular localization and self-interaction of plant-specific Nt-4/1 protein, Biochimie 95 (2013) 1360–1370.
[8] N. Matasci, L.H. Hung, Z. Yan, E.J. Carpenter, N.J. Wickett, S. Mirarab, et al., Data access for the 1,000 plants (1KP) project, Gigascience 3 (2014), http://dx.doi.org/10.1186/2047-217X-3-17.
[9] S.S. Makarova, A.G. Solovyev, S.Y. Morozov, RNA-binding properties of the plant protein Nt-4/1, Biochemistry (Mosc.) 79 (2014) 717–726.
[10] B. Nystedt, N.R. Street, A. Wetterbom, A. Zuccolo, Y.-C. Lin, et al., The Norway spruce genome sequence and conifer genome evolution, Nature 497 (2013) 579–584.
[11] J.L. Wegrzyn, J.D. Liechty, K.A. Stevens, L.-S. Wu, C.A. Loopstra, et al., Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation, Genetics 196 (2014) 891–909.
[12] I. Birol, A. Raymond, S.D. Jackman, S. Pleasance, R. Coope, G.A. Taylor, et al., Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data, Bioinformatics 29 (2013) 1492–1497.
[13] A.B. Rose, Intron-mediated regulation of gene expression, Curr. Top. Microbiol. Immunol. 326 (2008) 277–290.
[14] J.E. Gallegos, A.B. Rose, The enduring mystery of intron-mediated enhancement, Plant Sci. 237 (2015) 8–15.
[15] G. Parra, K. Bradnam, A.B. Rose, I. Korf, Comparative and functional analysis of intron-mediated enhancement signals reveals conserved features among plants, Nucleic Acids Res. 39 (2011) 5328–5337.
[16] K.M. Pryer, H. Schneider, A.R. Smith, R. Cranfill, P.G. Wolf, J.S. Hunt, S.D. Sipes, Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants, Nature 409 (2001) 618–622.
[17] J.A. Banks, T. Nishiyama, M. Hasebe, et al., The Selaginella genome identifies genetic changes associated with the evolution of vascular plants, Science 332 (2011) 960–963.
[18] R.E. Timme, T.R. Bachvaroff, C.F. Delwiche, Broad phylogenomic sampling and the sister lineage of land plants, PLoS One 7 (2012) e29696, http://dx.doi.org/10.1371/journal.pone.0029696.
[19] N.J. Wickett, S. Mirarab, N. Nguyen, T. Warnow, E. Carpenter, N. Matasci, S. Ayyampalayam, et al., Phylotranscriptomic analysis of the origin and early diversification of land plants, Proc. Natl. Acad. Sci. U. S. A. 111 (2014) E4859–E4868.
[20] D.E. Gordon, M. Mirza, D.A. Sahlender, J. Jakovleska, A.A. Peden, Coiled-coil interactions are required for post-Golgi R-SNARE trafficking, EMBO Rep. 10 (2009) 851–856.
[21] C. Finet, R.E. Timme, C.F. Delwiche, F. Marlétaz, Multigene phylogeny of the green lineage reveals the origin and diversification of land plants, Curr. Biol. 20 (2010) 2217–2222.
[22] J.M. Duarte, P.K. Wall, P.P. Edger, L.L. Landherr, H. Ma, J.C. Pires, J. Leebens-Mack, C.W. dePamphilis, Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels, BMC Evol. Biol. 10 (2010) 61, http://dx.doi.org/10.1186/1471-2148-10-61.
[23] K.D. Stewart, K.R. Mattox, Some aspects of mitosis in primitive green algae: phylogeny and function, Biosystems 7 (1975) 310–315.
[24] L.E. Graham, M.E. Cook, J.S. Busse, The origin of plants: body plan changes contributing to a major evolutionary radiation, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 4535–4540.
[25] B. Becker, B. Marin, Streptophyte algae and the origin of embryophytes, Ann. Bot. 103 (2009) 999–1004.
[26] S.Y. Morozov, A.G. Solovyev, A.V. Troitsky, Phylogeny of the plant 4/1 proteins, Data Brief (2015) (submitted for publication).