Accepted Manuscript
Phylogenetic analysis of plant calreticulin homologs
Piotr Wasąg, Tomasz Grajkowski, Anna Suwińska, Marta Lenartowska, Robert Lenartowski
PII: S1055-7903(18)30276-8
DOI: https://doi.org/10.1016/j.ympev.2019.01.014
Reference: YMPEV 6402
To appear in: Molecular Phylogenetics and Evolution
Received Date: 30 April 2018
Revised Date: 17 January 2019
Accepted Date: 18 January 2019
Please cite this article as: Wasąg, P., Grajkowski, T., Suwińska, A., Lenartowska, M., Lenartowski, R., Phylogenetic analysis of plant calreticulin homologs, Molecular Phylogenetics and Evolution (2019), doi: https://doi.org/10.1016/j.ympev.2019.01.014
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Phylogenetic analysis of plant calreticulin homologs
Piotr Wasąg (a), Tomasz Grajkowski (b), Anna Suwińska (a), Marta Lenartowska (a), Robert Lenartowski (b,*)
(a) Laboratory of Developmental Biology, Department of Cellular and Molecular Biology, Faculty of Biology and Environmental Protection, Nicolaus Copernicus University in Toruń, Lwowska 1, 87-100 Toruń, Poland
(b) Laboratory of Molecular and Isotope Methods, Department of Cellular and Molecular Biology, Faculty of Biology and Environmental Protection, Nicolaus Copernicus University in Toruń, Lwowska 1, 87-100 Toruń, Poland
(*) e-mail: [email protected]
Abstract
Calreticulin (CRT) is a multifunctional resident endoplasmic reticulum (ER) luminal protein implicated in regulating a variety of cellular processes, including Ca2+ storage/mobilization and protein folding. These multiple functions may be carried out by different CRT genes and protein isoforms. The plant CRT family consists of three genes: CRT1 and CRT2, classified in a common subclass (CRT1/2), and CRT3. These genes are highly conserved during evolution and encode three different protein products (CRT1, 2 and 3). The aim of the current study was to conduct a comparative analysis and sequence-based classification of the plant CRT genes. We used nucleotide and amino acid sequences to phylogenetically cluster the genes and examine potential glycosylation patterns. Additionally, we analyzed phylogenetic relationships within the CRT subclasses. Finally, we analyzed intraspecific CRT duplication events among mono- and dicotyledon species. Our results confirm that each of the CRT genes exists in multiple copies in plant genomes, and that CRT gene duplication is a widespread process in plants.
Keywords
CRT1/2, CRT3, molecular phylogeny, duplication, monocotyledons, dicotyledons
Abbreviations
APG: Angiosperm Phylogeny Group
Ca2+: Calcium ions
cDNA: Complementary DNA
CRT: Calreticulin gene
CRT: Calreticulin protein
ER: Endoplasmic reticulum
KEGG: Kyoto Encyclopedia of Genes and Genomes
NCBI: National Center for Biotechnology Information
UTR: Untranslated region in mRNA
1. Introduction

CRT is an evolutionarily conserved protein localized mainly in the ER. Although CRT is primarily known to regulate Ca2+ homeostasis and act as a chaperone, it also plays roles in many intra- and extracellular processes (Jia et al., 2009). The multifunctionality of CRT could be due to the existence of multiple genes. Two CRT genes, CRT1 and CRT2, have been identified in animals (Persson et al., 2002). In contrast, the plant CRT gene family comprises at least three members in two subclasses: CRT1/CRT2 (also known as CRT1a/CRT1b) and CRT3 (Persson et al., 2003; Jia et al., 2008). CRT1/CRT2 genes correspond to the animal CRTs, whereas CRT3 is plant-specific (Jia et al., 2009). Phylogenetic analysis of nucleotide sequences revealed that CRT3 genes are the most highly conserved and are most closely related to the ancestral gene in plants (Del Bem, 2011). In contrast, CRT1 and CRT2 are similar to each other and diverged evolutionarily between monocots and dicots. Therefore, CRT1 and CRT2 are paralogs of one another. In an evolutionary context, two duplication events of the CRT gene in plants took place at different times, with the early duplication generating two distinct CRT subgroups: CRT1/CRT2 and CRT3 (Persson et al., 2003).

The CRT sequence length and number of exons are gene- and species-specific (Jia et al., 2009). For example, Zea mays and Oryza sativa CRTs have 14 exons, as does Arabidopsis thaliana CRT3. In contrast, Arabidopsis thaliana CRT1 contains 12 exons and CRT2 contains 13 exons due to fusions within exons 4 through 6. Despite these differences, the overall sizes of the exons are conserved, except for exons 1, 11, 12, and 13 (Persson et al., 2003).

The molecular structure of the CRT protein is similar in animals and plants (Michalak et al., 2009; Jia et al., 2009). All plant CRTs identified so far contain three distinct domains (N, P, and C). The globular N-domain, the longest and most conserved region of the protein, contains a hydrophobic ER signal peptide sequence followed by two evolutionarily conserved CRT signature motifs, KHEQKLDCGGGYVKLL and IMFGPDICG. Proper folding of the N-terminal region depends on three cysteine residues that form a disulphide bridge. The P-domain, which is enriched in proline, serine, and threonine residues, contains a putative nuclear targeting sequence (PPKXIKDPX) and two characteristic triplicate motifs called A (PXXIXDPXXKKPEXWDD) and B (GXWXAXXIXNPXYK). These motifs seem to be critical for the lectin-like chaperone activity of CRT (Persson et al., 2003; Jia et al., 2008; An et al., 2011). The high proportion of acidic amino acids in the P-domain is required for high-affinity but low-capacity Ca2+ binding (Jia et al., 2009). The polyacidic C-domain is the most variable region of CRT. It contains negatively charged residues that participate in low-affinity but high-capacity Ca2+ binding, and a typical ER-retention signal (mostly HDEL in plants and KDEL in animals) (Persson et al., 2002, 2003).

Unlike animal CRTs, the plant proteins are commonly glycosylated and phosphorylated in the N- and C-domains (Li and Komatsu, 2000; Persson et al., 2003). The most common sites of glycosylation in plants are near amino acids 50 and 60 in CRT1/2, and near amino acid 96 in CRT3 (Jia et al., 2009). Three putative glycosylation sites were identified in Arabidopsis thaliana CRT1, whereas the other CRT isoforms in this species contain only one glycosylation site. These results indicate that plant CRTs exhibit species-specific glycosylation patterns (Persson et al., 2003; Christensen et al., 2010).

Here, our objective was to use sequences in online databases to determine the evolutionary relationships between the nucleotide and predicted amino acid sequences of the plant CRT genes. In this analysis, we paid particular attention to the CRT duplication events across plant species. Additionally, we used phylogenetic analysis and glycosylation patterns to assign the available CRT sequences to subclasses.
2. Materials and methods
The nucleotide and predicted amino acid sequences of plant CRTs were obtained from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov) and Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg) databases. Sequences were first identified from their annotations, and Arabidopsis thaliana CRT was then used as the query in BLAST homology searches. The inclusion criterion for all cDNA sequences identified as CRT was the presence of an open reading frame and both 5' and 3' untranslated regions (UTRs). Amino acid sequences were predicted from the cDNAs; they began with a methionine residue at the N-terminus and ended with an HDEL motif. ClustalW was used to align the nucleotide and amino acid sequences (http://www.genome.jp/tools/clustalW), and then a rooted guide tree was constructed (data not shown). The phylogenetic trees were built with MEGA software (version 6; http://www.megasoftware.net; Tamura et al., 2013) using the Neighbor-Joining method with the Maximum Composite Likelihood and Poisson models for nucleotide and amino acid sequences, respectively. In both cases, bootstrap analysis (1000 replications) was performed with a 70% bootstrap support threshold. Additionally, trees were rooted to the Selaginella moellendorffii CRT sequence (GenBank accession number XM_002968285.2). The predicted amino acid sequences were screened for the N-glycosylation consensus sequence NXS/T, where X could be any amino acid except for asparagine or proline (Liu and Howell, 2010). The Angiosperm Phylogeny Group (APG) III classification was used to compare relationships between CRT gene trees and species trees (http://www.mobot.org/MOBOT/research/APweb; Chase and Reveal, 2009).
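The N-glycosylation screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual procedure: the function names, the synthetic test peptides, and the position windows (45 to 65 for CRT1/2 and 90 to 102 for CRT3, chosen here as loose readings of "near 50 and 60" and "near 96" from Jia et al., 2009) are all hypothetical.

```python
import re

# Sequon N-X-S/T, where X is any residue except N or P (definition used in the text).
SEQUON = re.compile(r"N[^NP][ST]")

def find_sequons(protein):
    """Return the 1-based positions of the asparagine of each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# Hypothetical position windows; the text only says "near" these positions.
WINDOWS = {"CRT1/2": (45, 65), "CRT3": (90, 102)}

def subclass_hint(protein):
    """Suggest a subclass from sequon positions.

    Returns "undefined" when no sequon falls in a window or when sequons
    fall in both windows, mirroring the unresolved cases in the text.
    """
    hits = {
        name
        for pos in find_sequons(protein)
        for name, (low, high) in WINDOWS.items()
        if low <= pos <= high
    }
    return hits.pop() if len(hits) == 1 else "undefined"

# Synthetic peptides (poly-alanine with one sequon), not real CRT sequences.
crt12_like = "A" * 54 + "NGS" + "A" * 40   # sequon at position 55
crt3_like = "A" * 95 + "NGT" + "A" * 10    # sequon at position 96
```

Sequences whose sequons fall outside both windows, or that lack a sequon altogether, come back as "undefined", so such a screen would only complement, not replace, the phylogenetic clustering.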
3. Results and Discussion
In searching the NCBI and KEGG databases, we identified a total of 200 cDNAs, including defined and unclassified CRT homologs (CRT1, CRT2, CRT3), and defined and unclassified sequences similar to CRT (e.g. CRT1-like). All of the genes have between 12 and 16 exons, and four sequences encode the KDEL retention signal, which is characteristic of animal CRT genes (Table 1). Our first step was to assign unclassified CRT and CRT-like sequences to one of the previously described subclasses (CRT1/2 or CRT3). Among the 35 cDNAs from monocotyledons, 3 were annotated as members of the CRT1/2 subclass, 5 were CRT3, and 7 were CRT3-like. Twenty cDNAs were not annotated as a specific CRT gene, including 7 reported as CRT and 13 as CRT-like. For dicotyledons, 15 out of the 165 analyzed sequences were annotated as CRT1/2, 13 were annotated as CRT1/2-like, 30 as CRT3, and 47 as CRT3-like. Among the other 60 sequences, 50 were defined as CRT and 10 as CRT-like. As we expected, the 80 unclassified cDNA sequences of both mono- and dicotyledons clearly clustered into the CRT1/2 and CRT3 subclasses (Figs. S1 and S2). Moreover, results of the preliminary nucleotide and amino acid sequence alignments were similar (data not shown). For monocotyledons, our cluster analysis assigned 17 previously non-categorized amino acid sequences to the CRT1/2 subclass, and 3 to the CRT3 subclass. For dicotyledons, our cluster analysis assigned 49 sequences to the CRT1/2 subgroup and 11 to the CRT3 subgroup (Figs. 1 and 2).

Given that the location of the N-glycosylation sites within amino acid sequences is correlated with the CRT gene/isoform (near positions 50–60 aa for CRT1/2 and near position 96 aa for CRT3; Jia et al., 2009), we examined the location of potential glycosylation sites to confirm the results of our preliminary cluster analysis. We were unable to define the CRT gene/protein subclass in 4 monocotyledon (Brachypodium distachyon, Oryza sativa, Sorghum bicolor, Zea mays) and 11 dicotyledon (Brassica napus, Brassica oleracea, Glycine max – 2 seq., Phaseolus vulgaris, Vigna angularis, Vigna radiata, Sesamum indicum, Jatropha curcas, Eucalyptus grandis – 2 seq.; Table 1) sequences because they lacked an N-glycosylation consensus sequence or had potential glycosylation sites outside of the regions we had defined for classification. However, glycosylation site analysis of the remaining amino acid sequences confirmed the CRT gene assignments we had made by phylogenetic clustering. Consistent with earlier findings that three potential glycosylation sites are present in the CRT1 sequence but only one each in the CRT2 and CRT3 sequences (Persson et al., 2003; Christensen et al., 2010), we also identified additional potential glycosylation sites within other CRT domains (Table 1).

After classifying the sequences into subclasses according to their degree of similarity, we further analyzed each CRT subclass separately. Among monocotyledon CRT1/2 cDNAs, the highest sequence identities were between Zea mays1_Z46772.1 and Zea mays2_NM_001309255.1 (97.62%) and Zea mays_AF190454.1 (97.17%). The first two also showed the highest level of amino acid identity (97.86%). Furthermore, the amino acid sequence of Setaria italica_XM_004987369.2 was 97.38% identical with both Zea mays2_NM_001309255.1 and Zea mays1_Z46772.1. The lowest degrees of identity were found in comparing nucleotide sequences of Elaeis guineensisL_XM_010923203.1 with Zea mays2_NM_001309255.1 (54.32%) and Oryza sativaL_XM_015773252.1 (54.39%).
However, in alignments of deduced amino acid sequences, the lowest identity was between Brachypodium distachyon_XM_003563118.3 and Elaeis guineensisL_XM_010923203.1 (64.82%).

Among 15 monocotyledon CRT3 cDNAs, the highest identity was between Phoenix dactylifera3L_XM_008792368.2 and Elaeis guineensis3_XM_010929591.1 (89.1%), while the highest identity in predicted amino acid sequences was between Zea mays_XM_001174987.1 and Sorghum bicolor_XM_002439993.1 (97.86%). The lowest level of nucleotide identity was between Phoenix dactylifera3L_XM_008792368.2 and Sorghum bicolor_XM_002439993.1 (57.87%). The lowest levels of deduced amino acid sequence identity were found in comparisons of Musa acuminata3L_XM_009385531.1 with Brachypodium distachyon3L_XM_003566015.3 and Sorghum bicolor_XM_002439993.1 (both 75.36%).

Among dicotyledons, the CRT1/2 nucleotide and amino acid sequences of Sesamum indicum_XM_011093838.1 and Sesamum indicum_XM_011071083.1 were 100% identical. Three additional pairs of amino acid sequences were also 100% identical: Brassica napus1_XM_013832261.1 and Brassica oleracea1_XM_013755944.1, Prunus mume_XM_008247679.2 and Prunus mume_XM_008247687.2, and Solanum lycopersicum_XM_004230251.2 and Solanum pennellii_XM_015200012.1. We noted the lowest degree of identity for nucleotide sequences when comparing Camelina sativa1L_XM_010420638.1 with Nicotiana tomentosiformis_XM_009631851.1 (55.97%), and for amino acid sequences when comparing Camelina sativa1L_XM_010420638.1 with Brassica napus2L_NM_001316226.1 (55.8%).

Among the dicotyledon CRT3 sequences, two groups of nucleotide and amino acid sequences were 100% identical: Pyrus bretschneideri3L_XM_009363311.1, Pyrus bretschneideri3L_XM_009342088.1, and Pyrus bretschneideri3L_XM_009342102.1; and Brassica napus3_XM_013823232.1 and Brassica napus3_XM_013823267.1. These two Brassica napus amino acid sequences were also 100% identical with Brassica oleracea3_XM_013750040.1, and the Citrus sinensis3_XM_006478683.2 and Citrus clementina_XM_006442909.1 amino acid sequences were 100% identical. We noted the lowest nucleotide sequence identity when comparing Theobroma cacao3_XM_007029130.1 with Theobroma cacao3_XM_007033822.1 (46.93%), and the lowest amino acid identity when comparing Pyrus bretschneideri3L_XM_009363311.1, Pyrus bretschneideri3L_XM_009342088.1, and Pyrus bretschneideri3L_XM_009342102.1 to Brassica napus3_XM_013823232.1, Brassica napus3_XM_013823267.1, and Brassica oleracea3_XM_013750040.1 (68.87%).

We next used our multiple nucleotide and amino acid sequence alignments to determine evolutionary relationships within the CRT subclasses. Within the CRT1/2 subclass, we observed that amino acid and nucleotide sequences belonging to the order Poales were arranged differently within the group according to their phylogenetic relationships, except for paraphyletic group C (Figs. 1 and S1, respectively). In contrast, the other monocotyledon amino acid and nucleotide sequences consistently clustered into clades XV, XVI, and XVII. We also noted that clades XVI-XVII (amino acid analysis) and XVI (nucleotide analysis) represent outgroups relative to all others (Figs. 1 and S1).

We performed a similar analysis of dicotyledon CRTs. Within the CRT1/2 subclass we identified 14 monophyletic groups (I–XIV) and two paraphyletic groups (A and B), and these assignments were identical for amino acid and nucleotide sequences (Figs. 1 and S1, respectively). Additionally, the cladograms showed that some predicted amino acid sequences clustered differently from the corresponding nucleotide sequences, which were clustered together into monophyletic groups. At the nucleotide level, Myrtales sequences formed an outgroup relative to the other dicotyledon orders (Fig. S1). Moreover, the amino acid and nucleotide sequences of Vitis vinifera (Vitales) fall into different clades, and one of them clusters with Ziziphus jujuba (Rosales) (Figs. 1 and S1, respectively).

In the CRT3 subclass, amino acid and cDNA sequences from the order Poales formed two monophyletic groups, XX and XXI (Figs. 2 and S2, respectively). Analysis of the connections between CRT3 sequences indicated that Poales is a sister group to all other monocotyledons (Fig. S2). In contrast, amino acid sequences from the Arecales and Zingiberales orders appear to form outgroups relative to Poales. Our comparison of dicotyledon CRT3 amino acid and nucleotide sequences revealed 17 constant clades (I–XVII) and one paraphyletic group (A). Paraphyletic group A contains sequences from Arachis ipaensis, Arachis duranensis and Beta vulgaris, from the Fabales and Malpighiales orders, respectively (Figs. 2 and S2, respectively). We note that whereas sequences from the orders Brassicales, Cucurbitales, Lamiales, Proteales, and Solanales were all assigned to only one clade each, sequences from species belonging to the other orders were assigned to two different clades. Moreover, all monocotyledon amino acid sequences form an outgroup to the dicotyledon clade, in contrast to the cDNAs (Figs. 2 and S2).

The differences we note between the nucleotide and amino acid alignments likely reflect the lower variability of amino acid sequences, which results from degeneracy of the genetic code and silent mutations. Additionally, the larger number of variables (20 amino acids vs. 4 nucleotides) means that the probability of a match by chance is lower for amino acid sequences than for nucleotide sequences (Michu, 2007). Finally, heterogeneity of nucleotide composition can lead to grouping together unrelated clades that have similar nucleotide frequencies (Anup, 2015). Therefore, some authors suggest that amino acid sequences are more appropriate for constructing trees for distantly related species (Russo et al., 1996; Michu, 2007).
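The two arguments above (silent mutations and chance matches) can be made concrete with a small Python sketch. It is a toy illustration rather than the authors' analysis: the two nine-nucleotide sequences are invented, the codon table is only the alanine and glycine subset of the standard genetic code, gaps are not handled, and the chance-match figure assumes uniform residue frequencies, which real sequences do not have.

```python
def percent_identity(a, b):
    """Percent of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def chance_match(alphabet_size):
    """Probability that two random residues match, assuming uniform frequencies."""
    return 1.0 / alphabet_size

# Subset of the standard genetic code: all GCN codons encode alanine (A)
# and all GGN codons encode glycine (G).
CODONS = {codon: "A" for codon in ("GCT", "GCC", "GCA", "GCG")}
CODONS.update({codon: "G" for codon in ("GGT", "GGC", "GGA", "GGG")})

def translate(cdna):
    """Translate a cDNA made only of the codons listed in CODONS."""
    return "".join(CODONS[cdna[i:i + 3]] for i in range(0, len(cdna), 3))

# Invented toy sequences that differ only at third codon positions.
seq1 = "GCTGGAGCC"
seq2 = "GCAGGTGCG"

nucleotide_identity = percent_identity(seq1, seq2)
protein_identity = percent_identity(translate(seq1), translate(seq2))
print(nucleotide_identity, protein_identity)
print(chance_match(4), chance_match(20))
```

Here the two toy cDNAs differ at three of nine positions, yet their translations are identical, which is the effect of silent mutations. Under the uniform-frequency assumption, a chance match is five times less likely per position for amino acids than for nucleotides.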
To compare the CRT gene trees and the species trees, we used the APG III classification (http://www.mobot.org/MOBOT/research/APweb; Chase and Reveal, 2009). In contrast to the APG III system, our analysis of monocotyledons could not confirm that the Poales and Zingiberales orders are more closely related to each other than to Arecales. The results showed that only the CRT1/2 amino acid sequences of Zingiberales and Arecales preferentially clustered together, separately from those of Poales (Fig. 1). We also noted discrepancies between CRT sequence assignments and APG III classifications of dicotyledons. Additionally, the trees generated for the CRT1/2 homologs (Fig. 1) differed from those for the CRT3 homologs (Fig. 2). Generally, the APG III classification distinguishes two large monophyletic groups (rosids and asterids) containing most of the dicotyledons, with a few phylogenetically older taxa, e.g. the Proteales and Vitales orders, as sister clades. Our phylograms based on amino acid sequences did not reflect these relationships. Instead, we observed no clear phylogenetic correlations and noted wide separation of sequences belonging to common species or orders that should cluster together. For example, sequences from Beta vulgaris were clustered into two separate clades and were separated from other asterids (Fig. 2). Unfortunately, we could not unambiguously determine the reasons for these discrepancies, though it is likely that genes, and even regions within genes, evolve at different rates (Anup, 2015). Additionally, lineage sorting, horizontal gene transfer, and hidden paralogy may contribute to these discrepancies (Anup, 2015).

CRT evolution has clearly involved cases of gene duplication and the existence of paralogs. An et al. (2011) described numerous CRT homologs in human, rat, rice, corn, and Arabidopsis thaliana genomes. Persson et al. (2003) revealed that all CRT genes are located on chromosome 1 in Arabidopsis thaliana; the CRT1 and CRT2 sequences were found in a region that was duplicated, but the CRT3 gene locus lies in a region without any major duplication activity. Our phylogenetic analysis of plant CRT homologs clearly showed that analogous duplication events took place in many other di- and monocotyledons. However, in all other plant species the number of CRT genes has been multiplied, including through intra- and interchromosomal duplication events. Generally, plant genomes contain three CRT homologs that are classified into CRT1/2 and CRT3 subgroups. Sequence homology suggests that CRT1/2 are similar to each other and correspond to the animal CRT1, whereas the plant-specific CRT3 genes are more highly conserved across different plant species (Persson et al., 2003; Jia et al., 2009; Thelin et al., 2011). Different CRTs exhibit distinct expression patterns and post-transcriptional and post-translational modifications, indicating that they have diverse roles in plants. The CRT1/2 family members appear to work as primary proteins within a general ER chaperone network related to regulation of Ca2+ homeostasis. In contrast, CRT3 does not have high-capacity Ca2+-binding ability (Qiu et al., 2012), and the CRT3 gene seems to be co-expressed with pathogen- and signal transduction-related genes, suggesting functional specialization (Jin et al., 2009; Li et al., 2009; Saijo et al., 2009; Christensen et al., 2010). Unfortunately, very limited data are available concerning the preferential expression of different CRT isoforms. Thus, further investigation of the diversity of plant CRT family members is needed to address the structure-function relationships among plant CRTs.

As mentioned in the introduction, plant CRT genes are thought to have arisen as a result of at least two duplication events (Persson et al., 2003). For this reason, and because about 72% of duplicate gene copies are not subject to sequence elimination (Lawton-Rauh, 2003), we decided to analyze gene duplication among plant CRT genes. To do so, we used the Graphical Sequence Viewer (NCBI software) to analyze the chromosomal location of CRT genes. We identified the locations of 197 duplicated CRTs in mono- and dicotyledons. In monocotyledons, we identified loci of 33 out of 35 genes and chromosomal locations for 25 CRTs (Table 1). Zea mays and Phoenix dactylifera contained the largest number of duplicated CRTs: 4 CRT1/2 and 2 CRT3 genes. Most duplicated genes were on different chromosomes, except in the Brachypodium distachyon and Musa acuminata genomes. In dicotyledons, we identified loci of 164 out of 165 genes and chromosomal locations for 107 CRTs. Brassica napus contained the most CRT genes: six in each of the CRT subclasses. Camelina sativa had 6 CRT1/2 gene duplicates and 3 CRT3 duplicates. In dicotyledons, duplicated CRTs were located either on a single chromosome (Table 1, e.g. Arabidopsis thaliana, chromosome 1) or on different chromosomes (Table 1, e.g. Brassica napus).
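The bookkeeping behind such a duplication survey can be sketched as follows. This is a hypothetical illustration only: the study read chromosomal locations from the NCBI Graphical Sequence Viewer, whereas the record layout, the function name, and the placeholder species and chromosome labels below are assumptions made for the example and do not reproduce Table 1.

```python
from collections import defaultdict

def summarize_duplicates(records):
    """Group CRT loci by (species, subclass) and describe where the copies lie.

    records: iterable of (species, subclass, chromosome) tuples, where
    chromosome is None when only the locus, not the chromosome, is known.
    Returns {(species, subclass): (copy_count, placement)}.
    """
    groups = defaultdict(list)
    for species, subclass, chromosome in records:
        groups[(species, subclass)].append(chromosome)

    summary = {}
    for key, chromosomes in groups.items():
        if len(chromosomes) == 1:
            placement = "single copy"
        elif None in chromosomes:
            placement = "location unknown"
        elif len(set(chromosomes)) == 1:
            placement = "same chromosome"          # intrachromosomal copies
        else:
            placement = "different chromosomes"    # interchromosomal copies
        summary[key] = (len(chromosomes), placement)
    return summary

# Placeholder records, not data from the study.
toy_records = [
    ("species_A", "CRT1/2", "1"),
    ("species_A", "CRT1/2", "1"),
    ("species_A", "CRT3", "1"),
    ("species_B", "CRT3", "2"),
    ("species_B", "CRT3", "5"),
    ("species_C", "CRT1/2", None),
    ("species_C", "CRT1/2", "3"),
]

for key, value in sorted(summarize_duplicates(toy_records).items()):
    print(key, value)
```

Keeping unplaced copies in a separate "location unknown" category avoids counting a gene with a known locus but no assigned chromosome as either an intra- or an interchromosomal duplicate.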
Taken together, our analysis indicates that CRT gene duplication is widespread in the plant kingdom. Thus, it is striking that we identified only a single copy of each CRT gene in the Arabidopsis thaliana genome, especially given that the Arabidopsis thaliana genome shows evidence of two to four independent duplication events that duplicated 90% of loci (Lawton-Rauh, 2003; Moore and Purugganan, 2005). These duplications are probably reflected by the presence of a putative CRT pseudogene containing four potential exons (Persson et al., 2003). As mentioned by Lehti-Shiu et al. (2017), pseudogenization may cause duplicate genes to be lost.

It should be noted that CRT expression is essential for proper embryo development and ontogenesis in animals. For example, homozygous CRT knockout is embryonic lethal in mice due to cardiovascular and brain defects (Mesaeli et al., 1999; Rauch et al., 2000). Similarly, insertion of a P-element (potpS114307) that disrupted the Drosophila melanogaster calreticulin gene (Crc) caused loss of neurons, disorganization of the peripheral nervous system, and neuronal pathfinding defects during embryogenesis (Salzberg et al., 1997; Prokopenko et al., 2000). In plants, a crucial role for CRT in growth and development is still under debate. We found that post-transcriptional silencing of Petunia hybrida CRT1/2 expression strongly impairs pollen tube elongation, which prevents sexual reproduction (Suwińska et al., 2017). However, the latest paper by Vu et al. (2017) indicates that an Arabidopsis thaliana CRT1/2/3 triple mutant grows and develops almost normally, while CNX1 gene expression is strictly required for normal pollen development and pollen tube growth. Nevertheless, these authors suggest that a complex relationship between the CRT and calnexin chaperones in the ER is crucial for generative reproduction of this plant. In contrast, Wakasa et al. (2018) clearly showed that transcriptional gene silencing of endogenous CNX genes had no impact on obtaining viable progeny in rice. These findings could reflect evolutionary differences between plants and animals. Alternatively, calnexin family proteins could compensate for loss of CRT expression in plants. Finally, it is possible that some Arabidopsis thaliana CRT genes have not been identified, which would be consistent with our observation that CRT gene duplication is common in plant genomes.

In conclusion, we confirmed that multiple CRT gene duplication is common among plant genomes. However, such duplications have not occurred in the Arabidopsis thaliana genome, because the three CRT homologs (CRT1, CRT2, and CRT3) are each present in only one copy. If the Arabidopsis thaliana genome does not contain unidentified CRT duplications, then we should carefully consider whether Arabidopsis thaliana should be used as a universal model in studies of the roles of CRT genes in plants.
Acknowledgements
The authors thank Michał Opas (University of Toronto, CA), Marek Michalak (University of Alberta, CA), and Deborah J. Frank (Washington University in St. Louis, US) for critical reading of the manuscript. This work was supported by statutory funds from the Ministry of Science and Higher Education (PL) for the research program of the Laboratory of Molecular and Isotope Methods (Department of Cellular and Molecular Biology, Nicolaus Copernicus University in Toruń, PL).
Conflict of interest
The authors declare that they have no conflict of interest.
References
An, Y.Q., Lin, R.M., Wang, F.T., Feng, J., Xu, Y.F., Xu, S.C., 2011. Molecular cloning of a new wheat calreticulin gene TaCRT1 and expression analysis in plant defense responses and abiotic stress resistance. Genet. Mol. Res. 10, 3576-3585.
Anup, S., 2015. Causes, consequences and solutions of phylogenetic incongruence. Brief. Bioinform. 16, 536-548.
Chase, M.W., Reveal, J.L., 2009. A phylogenetic classification of the land plants to accompany APG III. Bot. J. Linn. Soc. 161, 122-127.
Christensen, A., Svensson, K., Thelin, L., Zhang, W., Tintor, N., Prins, D., Funke, N., Michalak, M., Schulze-Lefert, P., Saijo, Y., Sommarin, M., Widell, S., Persson, S., 2010. Higher plant calreticulins have acquired specialized functions in Arabidopsis. PLoS One 5, e11342.
Del Bem, L.E., 2011. The evolutionary history of calreticulin and calnexin genes in green plants. Genetica 139, 255-259.
Jia, X.Y., He, L.H., Jing, R.L., Li, R.Z., 2009. Calreticulin: conserved protein and diverse functions in plants. Physiol. Plant. 136, 127-138.
Jia, X.Y., Xu, C.Y., Jing, R.L., Li, R.Z., Mao, X.G., Wang, J.P., Chang, X.P., 2008. Molecular cloning and characterization of wheat calreticulin (CRT) gene involved in drought-stressed responses. J. Exp. Bot. 59, 739-751.
Jin, H., Hong, Z., Su, W., Li, J., 2009. A plant-specific calreticulin is a key retention factor for a defective brassinosteroid receptor in the endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 106, 13612-13617.
Lawton-Rauh, A., 2003. Evolutionary dynamics of duplicated genes in plants. Mol. Phylogenet. Evol. 29, 396-409.
Lehti-Shiu, M.D., Panchy, N., Wang, P., Uygun, S., Shiu, S.H., 2017. Diversity, expansion, and evolutionary novelty of plant DNA-binding transcription factor families. Biochim. Biophys. Acta 1860, 3-20.
Li, J., Zhao, H.C., Batoux, M., Nekrasov, V., Roux, M., Chinchilla, D., Zipfel, C., Jones, J.D., 2009. Specific ER quality control components required for biogenesis of the plant innate immune receptor EFR. Proc. Natl. Acad. Sci. USA 106, 15973-15978.
Li, Z., Komatsu, S., 2000. Molecular cloning and characterization of calreticulin, a calcium-binding protein involved in the regeneration of rice cultured suspension cells. Eur. J. Biochem. 267, 737-745.
Liu, J.X., Howell, S.H., 2010. Endoplasmic reticulum protein quality control and its relationship to environmental stress responses in plants. Plant Cell 22, 2930-2942.
Mesaeli, N., Nakamura, K., Zvaritch, E., Dickie, P., Dziak, E., Krause, K.H., Opas, M., MacLennan, D.H., Michalak, M., 1999. Calreticulin is essential for cardiac development. J. Cell Biol. 144, 857-868.
Michalak, M., Groenendyk, J., Szabo, E., Gold, L.I., Opas, M., 2009. Calreticulin, a multi-process calcium-buffering chaperone of the endoplasmic reticulum. Biochem. J. 417, 651-666.
Michu, E., 2007. A short guide to phylogeny reconstruction. Plant Soil Environ. 53, 442-446.
Moore, R.C., Purugganan, M.D., 2005. The evolutionary dynamics of plant duplicate genes. Curr. Opin. Plant Biol. 8, 122-128.
Persson, S., Rosenquist, M., Sommarin, M., 2002. Identification of a novel calreticulin isoform (Crt2) in human and mouse. Gene 297, 151-158.
Persson, S., Rosenquist, M., Svensson, K., Galvão, R., Boss, W.F., Sommarin, M., 2003. Phylogenetic analyses and expression studies reveal two distinct groups of calreticulin isoforms in higher plants. Plant Physiol. 133, 1385-1396.
Prokopenko, S.N., He, Y., Lu, Y., Bellen, H.J., 2000. Mutations affecting the development of the peripheral nervous system in Drosophila: a molecular screen for novel proteins. Genetics 156, 1691-1715.
Qiu, Y., Xi, J., Du, L., Roje, S., Poovaiah, B.W., 2012. A dual regulatory role of Arabidopsis calreticulin-2 in plant innate immunity. Plant J. 69, 489-500.
Rauch, F., Prud'homme, J., Arabian, A., Dedhar, S., St-Arnaud, R., 2000. Heart, brain, and body wall defects in mice lacking calreticulin. Exp. Cell Res. 256, 105-111.
Russo, C.A., Takezaki, N., Nei, M., 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol. 13, 525-536.
Saijo, Y., Tintor, N., Lu, X., Rauf, P., Pajerowska-Mukhtar, K., Häweker, H., Dong, X., Robatzek, S., Schulze-Lefert, P., 2009. Receptor quality control in the endoplasmic reticulum for plant innate immunity. EMBO J. 28, 3439-3449.
Salzberg, A., Prokopenko, S.N., He, Y., Tsai, P., Pál, M., Maróy, P., Glover, D.M., Deák, P., Bellen, H.J., 1997. P-element insertion alleles of essential genes on the third chromosome of Drosophila melanogaster: mutations affecting embryonic PNS development. Genetics 147, 1723-1741.
Suwińska, A., Wasąg, P., Zakrzewski, P., Lenartowska, M., Lenartowski, R., 2017. Calreticulin is required for calcium homeostasis and proper pollen tube tip growth in Petunia. Planta 245, 909-926.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S., 2013. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725-2729.
Thelin, L., Mutwil, M., Sommarin, M., Persson, S., 2011. Diverging functions among calreticulin isoforms in higher plants. Plant Signal. Behav. 6, 905-910.
Vu, K.V., Nguyen, N.T., Jeong, C.Y., Lee, Y.H., Lee, H., Hong, S.W., 2017. Systematic deletion of the ER lectin chaperone genes reveals their roles in vegetative growth and male gametophyte development in Arabidopsis. Plant J. 89, 972-983.
Wakasa, Y., Kawakatsu, T., Harada, T., Takaiwa, F., 2018. Transgene-independent heredity of RdDM-mediated transcriptional gene silencing of endogenous genes in rice. Biotechnol. J. 16, 2007-2015.
Figure legends
Figure 1 Phylogenetic tree of monocotyledon and dicotyledon CRT1/2 amino acid sequences. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 97 amino acid sequences obtained from the NCBI and KEGG databases. The NCBI accession number of each sequence is shown after an underscore. A number or ‘L’ (like) in the species name denotes a specific CRT homolog or a CRT homolog-like sequence. The absence of a number or letter indicates an unclassified CRT homolog. The Selaginella moellendorffii CRT sequence was used as an outgroup (GenBank accession number XM_002968285.2). Accession numbers for the predicted amino acid sequences were identical to those of the corresponding cDNAs. The numbers at nodes indicate the percentage bootstrap support based on 1000 replicates. No values are given for groups with bootstrap support below 70%. The colored circles/squares indicate plant orders, as shown in the key.
Figure 2 Phylogenetic tree of monocotyledon and dicotyledon CRT3 amino acid sequences. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 103 amino acid sequences obtained from the NCBI and KEGG databases. The NCBI accession number of each sequence is shown after an underscore. A number or ‘L’ (like) in the species name denotes a specific CRT homolog or a CRT homolog-like sequence. The absence of a number or letter indicates an unclassified CRT homolog. The Selaginella moellendorffii CRT sequence was used as an outgroup (GenBank accession number XM_002968285.2). Accession numbers for the predicted amino acid sequences were identical to those of the corresponding cDNAs. The numbers at nodes indicate the percentage bootstrap support based on 1000 replicates. No values are given for groups with bootstrap support below 70%. The colored circles/squares indicate plant orders, as shown in the key.
Supplementary Figure 1 Comparative phylogenetic analysis of monocotyledon and dicotyledon CRT1/2 nucleotide sequences. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 97 cDNAs obtained from the NCBI and KEGG databases. The numbers at nodes indicate the percentage bootstrap support (1000 replicates); only values above 70% are shown. The colored circles/squares indicate plant orders, as shown in the key.
Supplementary Figure 2 Comparative phylogenetic analysis of monocotyledon and dicotyledon CRT3 cDNAs. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 103 cDNAs obtained from the NCBI and KEGG databases. The numbers at nodes indicate the percentage bootstrap support (1000 replicates); only values above 70% are shown. The colored circles/squares indicate plant orders, as shown in the key.
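The bootstrap convention stated in the legends (percentage support over the replicates, with no value printed below 70%) can be sketched as follows. This is a minimal illustration only, not the software procedure used to build the figures: the function name `bootstrap_support`, the representation of a tree as a set of clades (each clade a frozenset of tip names), and the ten toy replicates (standing in for 1000) are all assumptions made for the example.

```python
from collections import Counter


def bootstrap_support(reference_clades, replicate_trees, threshold=70.0):
    """Return {clade: percent support}, or None where support is below threshold.

    reference_clades: clades (frozensets of tip names) of the tree being annotated.
    replicate_trees:  one set of clades per bootstrap replicate (hypothetical format).
    """
    counts = Counter()
    for clades in replicate_trees:
        for clade in reference_clades:
            if clade in clades:
                counts[clade] += 1

    total = len(replicate_trees)
    support = {}
    for clade in reference_clades:
        percent = 100.0 * counts[clade] / total
        # Mirror the legend: values below the threshold are not reported.
        support[clade] = percent if percent >= threshold else None
    return support


# Toy data: tips A-D, ten replicates instead of 1000.
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
replicates = [{ab, cd}] * 6 + [{ab}] * 2 + [{ac}] * 2

labels = bootstrap_support([ab, cd], replicates)
```

In this toy case the clade {A, B} is recovered in 8 of 10 replicates and would be labelled 80, whereas {C, D} is recovered in only 6 of 10 and would be left without a value at its node.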
Table 1 CRT genes from monocotyledons and dicotyledons.
[Figure 1: phylogenetic tree of monocotyledon and dicotyledon CRT1/2 amino acid sequences. Tip labels give the species name, homolog designation and accession number of each sequence; the outgroup tip is labeled Selaginella moellendorffii NW_003314271.1. Clades are numbered I-XVII and grouped into A, B and C. Key to plant orders: Poales, Zingiberales, Arecales, Brassicales, Caryophyllales, Cucurbitales, Fabales, Lamiales, Malpighiales, Malvales, Myrtales, Proteales, Rosales, Sapindales, Solanales, Vitales.]
[Figure 2: phylogenetic tree of monocotyledon and dicotyledon CRT3 amino acid sequences. Tip labels give the species name, homolog designation and accession number of each sequence; the outgroup tip is labeled Selaginella moellendorffii NW_003314271.1. Clade labels: I-XXI and A. Key to plant orders: Poales, Zingiberales, Arecales, Brassicales, Caryophyllales, Cucurbitales, Fabales, Lamiales, Malpighiales, Malvales, Myrtales, Proteales, Rosales, Sapindales, Solanales, Vitales.]
Table 1 CRT genes from monocotyledons and dicotyledons.
NUCLEOTIDE SEQUENCE | PROTEIN SEQUENCE | PROTEIN ISOFORM | CHROMOSOME | LOCUS | NUMBER OF EXONS | POTENTIAL GLYCOSYLATION SITE (aa)
E. guineensis (number of duplications: CRT1/2 4, CRT3 1)
XM_010916251.1 | XP_010914553.1 | * CRT L | 2 | 105039926 | 15 | 58, 128, 158
XM_010923203.1 | XP_010921505.1 | * CRT L | 5 | 105045043 | 14 | 62
XM_010933456.1 | XP_010931758.1 | * CRT L | 10 | 105052598 | 14 | 58
XM_010941814.1 | XP_010940116.1 | * CRT L | 15 | 105058770 | 14 | 53
XM_010929591.1 | XP_010927893.1 | * CRT3 | 8 | 105049829 | 14 | 100
P. dactylifera (number of duplications: CRT1/2 4, CRT3 2)
XM_008782872.2 | XP_008781094.1 | * CRT L | UN | 103700964 | 14 | 62
XM_008789997.2 | XP_008788219.1 | * CRT3 L | UN | 103706042 | 14 | 99
XM_008797745.2 | XP_008795967.1 | * CRT L | UN | 103711552 | 14 | 53
XM_008799239.2 | XP_008797461.1 | * CRT L | UN | 103712656 | 14 | 58
XM_008810437.2 | XP_008808659.1 | * CRT L | UN | 103720634 | 15 | 58, 128, 158
XM_008792368.2 | XP_008790590.1 | * CRT3 L | UN | 103707753 | 14 | 100
B. distachyon (number of duplications: CRT1/2 1, CRT3 2)
XM_003563118.3 | XP_003563166.1 | * CRT | 1 | 100841311 | 13 | 56, 391
XM_003566015.3 | XP_003566063.1 | * CRT3 L | 2 | 100838375 | 14 | 93
XM_003564777.3 | XP_003564825.1 | * CRT3 | 2 | 100837158 | 14 | -
O. sativa (number of duplications: CRT1/2 1, CRT3 2)
XM_015765355.1 | XP_015620841.1 | * CRT3 | 1 | 9267167 | 14 | -
XM_015773252.1 | XP_015628738.1 | * CRT L | 3 | 4334675 | 13 | 57
XM_015785313.1 | XP_015640799.1 | * CRT3 | 5 | 4339262 | 14 | 96
XM_015791681.1 | XP_015647167.1 | * CRT1 | 7 | 4342826 | 13 | 61
S. bicolor (number of duplications: CRT1/2 -, CRT3 2)
XM_002456729.1 | XP_002456774.1 | * CRT | 3 | SORBIDRAFT_03g042500 | 14 | 94
XM_002439993.1 | XP_002440038.1 | * CRT | 9 | SORBIDRAFT_09g024930 | 14 | -
S. italica (number of duplications: CRT1/2 2, CRT3 2)
XM_004961520.2 | XP_004961577.1 | * CRT3 L | III | 101777678 | 14 | 90
XM_004970829.3 | XP_004970886.2 | * CRT3 L | V | 101761253 | 14 | 96
XM_004981102.1 | XP_004981159.1 | * CRT L | IX | 101762276 | 14 | 61
XM_004987369.2 | XP_004987426.2 | * CRT | UN | 101774614 | 13 | 57
Z. mays (number of duplications: CRT1/2 4, CRT3 2)
XM_008671858.1 | XP_008670080.1 | * CRT L | 2 | 103647314 | 14 | 57
NM_001147729.1 | NP_001141201.1 | CRT3 | 8 | 100273288 | 14 | 98
NM_001309255.1 | NP_001296184.1 | CRT2 | 7 | ZEAMMB73_853300 | 15 | 57
Z46772.1 | CAA86728.1 | CRT1 | - | - | - | 57
AF190454.1 | AAF01470.1 | CRT | - | - | - | 57
NM_001174987.1 | NP_001168458.1 | * CRT | 6 | 100382232 | 14 | -
M. acuminata (number of duplications: CRT1/2 3, CRT3 2)
XM_009394905.1 | XP_009393180.1 | * CRT | 3 | 103978936 | 14 | 57
XM_009404234.1 | XP_009402509.1 | * CRT L | 5 | 103986277 | 14 | 57
XM_009385531.1 | XP_009383806.1 | * CRT3 L | 11 | 103971503 | 14 | 102
XM_009386726.1 | XP_009385001.1 | * CRT L | 11 | 103972398 | 14 | 57, 407
XM_009387175.1 | XP_009385450.1 | * CRT3 L | UN | 103972813 | 14 | 100
Key to plant orders: Poales, Zingiberales, Arecales, Brassicales, Caryophyllales, Cucurbitales, Fabales, Lamiales, Malpighiales, Malvales, Myrtales, Proteales, Rosales, Sapindales, Solanales, Vitales
Table 1 cont.
NUCLEOTIDE SEQUENCE | PROTEIN SEQUENCE | PROTEIN ISOFORM | CHROMOSOME | LOCUS | NUMBER OF EXONS | POTENTIAL GLYCOSYLATION SITE (aa)
A. lyrata (number of duplications: CRT1/2 1, CRT3 1)
XM_002892411.1 | XP_002892457.1 | * CRT | UN | ARALYDRAFT_888076 | 14 | 96
XM_002892446.1 | XP_002892492.1 | CRT2 | UN | ARALYDRAFT_888161 | 13 | 59
A. thaliana (number of duplications: CRT1/2 2, CRT3 1)
NM_104513.4 | NP_176030.1 | CRT1 | 1 | AT1G56340 | 12 | 59, 154, 399
NM_100718.4 | NP_563816.1 | CRT3 | 1 | AT1G08450 | 14 | 97
NM_100791.3 | NP_172392.1 | CRT2 | 1 | AT1G09210 | 13 | 59
B. napus (number of duplications: CRT1/2 6, CRT3 6)
XM_013826395.1 | XP_013681849.1 | * CRT1 L | A2 | 106386552 | 12 | 59, 154
XM_013895419.1 | XP_013750873.1 | * CRT2 L | A5 | 106453163 | 12 | 54, 59
XM_013785569.1 | XP_013641023.1 | * CRT2 | A6 | 106346289 | 12 | 59
XM_013802156.1 | XP_013657610.1 | * CRT3 L | A8 | 106362288 | 14 | 96
XM_013802190.1 | XP_013657644.1 | * CRT3 L | A8 | 106362312 | 14 | 96
XM_013802231.1 | XP_013657685.1 | * CRT2 | A8 | 106362348 | 12 | 54, 59
XM_013817189.1 | XP_013672643.1 | * CRT3 L | C1 | 106377035 | 14 | 96
XM_013823232.1 | XP_013678686.1 | * CRT3 | C3 | 106383113 | 14 | 100
XM_013823267.1 | XP_013678721.1 | * CRT3 | C3 | 106383132 | 14 | 100
XM_013832261.1 | XP_013687715.1 | * CRT1 | C4 | 106391579 | 12 | 59, 154
XM_013860963.1 | XP_013716417.1 | * CRT3 L | UN | 106420124 | 14 | 96
NM_001316226.1 | NP_001303155.1 | CRT2 L | C8 | 106415836 | 12 | 40 (out of range)
B. oleracea (number of duplications: CRT1/2 4, CRT3 2)
XM_013755944.1 | XP_013611398.1 | * CRT1 | C9 | 106318094 | 12 | 59, 154
XM_013750040.1 | XP_013605494.1 | * CRT3 | C8 | 106312492 | 14 | 100
XM_013748922.1 | XP_013604376.1 | * CRT3 | C8 | 106311670 | 14 | 96
XM_013744509.1 | XP_013599963.1 | * CRT2 | C8 | 106307530 | 12 | 54, 59
XM_013730198.1 | XP_013585652.1 | * CRT2 L | C5 | 106294595 | 12 | 59
XM_013741554.1 | XP_013597008.1 | * CRT1 L | C1 | 106305155 | 12 | 15, 69, 164 (out of range)
B. rapa (number of duplications: CRT1/2 4, CRT3 2)
XM_009149924.1 | XP_009148172.1 | * CRT2 L | A6 | 103871649 | 12 | 59
XM_009112594.1 | XP_009110842.1 | * CRT2 | A8 | 103836348 | 12 | 54, 59
XM_009112645.1 | XP_009110893.1 | * CRT3 | A8 | 103836390 | 14 | 96
XM_009115141.1 | XP_009113389.1 | * CRT1 L | A9 | 103838690 | 12 | 59, 154
XM_009120127.1 | XP_009118375.1 | * CRT3 L | A9 | 103843398 | 14 | 96
XM_009124734.1 | XP_009122982.1 | * CRT1 | UN | 103847648 | 12 | 5, 59, 154
Table 1 cont.
NUCLEOTIDE SEQUENCE | PROTEIN SEQUENCE | PROTEIN ISOFORM | CHROMOSOME | LOCUS | NUMBER OF EXONS | POTENTIAL GLYCOSYLATION SITE (aa)
C. sativa (number of duplications: CRT1/2 6, CRT3 3)
XM_010481938.1 | XP_010480240.1 | * CRT1 | 17 | 104758963 | 12 | 59, 399
XM_010477342.1 | XP_010475644.1 | * CRT3 L | 17 | 104755026 | 14 | 96
XM_010459780.1 | XP_010458082.1 | * CRT3 L | 14 | 104739432 | 14 | 96
XM_010420638.1 | XP_010418940.1 | * CRT1 L | 7 | 104704579 | 11 | 59, 256
XM_010416676.1 | XP_010414978.1 | * CRT1 L | 7 | 104701045 | 12 | 59, 399
XM_010512972.1 | XP_010511274.1 | * CRT1 L | 5 | 104787394 | 12 | 59, 399
XM_010512970.1 | XP_010511272.1 | * CRT1 L | 5 | 104787392 | 12 | 59, 399
XM_010512969.1 | XP_010511271.1 | * CRT1 L | 5 | 104787391 | 12 | 59, 399
XM_010490578.1 | XP_010488880.1 | * CRT3 | 3 | 104766654 | 14 | 96
E. salsugineum (number of duplications: CRT1/2 2, CRT3 1)
XM_006392374.1 | XP_006392436.1 | * CRT | UN | EUTSA_v10023481mg | 13 | 54, 59, 154
XM_006417533.1 | XP_006417596.1 | * CRT | UN | EUTSA_v10007717mg | 13 | 54, 59
XM_006417626.1 | XP_006417689.1 | * CRT | UN | EUTSA_v10007710mg | 14 | 98
T. hassleriana (number of duplications: CRT1/2 3, CRT3 1)
XM_010551233.1 | XP_010549535.1 | * CRT1 | UN | 104820667 | 13 | 5, 54, 154, 399
XM_010545897.1 | XP_010544199.1 | * CRT | UN | 104816886 | 13 | 54, 154, 395
XM_010546033.1 | XP_010544335.1 | * CRT3 | UN | 104816983 | 15 | 99, 270
XM_010555420.1 | XP_010553722.1 | * CRT1 L | UN | 104823722 | 13 | 55, 155, 401
B. vulgaris (number of duplications: CRT1/2 1, CRT3 2)
XM_010673956.1 | XP_010672258.1 | * CRT3 L | 3 | 104888853 | 14 | 95
XM_010675591.1 | XP_010673893.1 | * CRT3 L | 4 | 104890198 | 14 | 101
NM_001303065.1 | NP_001289994.1 | CRT | 4 | 104890403 | 14 | 7, 57, 157
C. melo (number of duplications: CRT1/2 1, CRT3 1)
XM_008455449.2 | XP_008453671.1 | * CRT | UN | 103494317 | 13 | 56, 156
XM_008455899.2 | XP_008454121.1 | * CRT3 | UN | 103494621 | 14 | 105
C. sativus (number of duplications: CRT1/2 1, CRT3 1)
XM_004144709.2 | XP_004144757.1 | * CRT | 2 | 101205515 | 13 | 56, 156
XM_004152104.2 | XP_004152152.1 | * CRT3 | 4 | 101203114 | 14 | 105
A. duranensis (number of duplications: CRT1/2 1, CRT3 2)
XM_016084233.1 | XP_015939719.1 | * CRT | A09 | 107465251 | 14 | 56, 156
XM_016106472.1 | XP_015961958.1 | * CRT3 L | A04 | 107485938 | 14 | 97
XM_016096631.1 | XP_015952117.1 | * CRT3 | A03 | 107476754 | 15 | 6, 99
A. ipaensis (number of duplications: CRT1/2 1, CRT3 2)
XM_016331622.1 | XP_016187108.1 | * CRT3 | B03 | 107628970 | 14 | 6, 99
XM_016338558.1 | XP_016194044.1 | * CRT3 L | B04 | 107635173 | 14 | 97
XM_016319753.1 | XP_016175239.1 | * CRT | B09 | 107617877 | 14 | 56, 156
C. arietinum (number of duplications: CRT1/2 1, CRT3 2)
XM_004492800.2 | XP_004492857.1 | * CRT3 L | Ca3 | 101514432 | 14 | 4, 94
XM_004495609.1 | XP_004495666.1 | * CRT | Ca4 | 101511865 | 14 | 56, 156, 290, 397
XM_004507292.2 | XP_004507349.1 | * CRT3 | Ca6 | 101501628 | 14 | 91
Table 1 cont.
NUCLEOTIDE SEQUENCE | PROTEIN SEQUENCE | PROTEIN ISOFORM | CHROMOSOME | LOCUS | NUMBER OF EXONS | POTENTIAL GLYCOSYLATION SITE (aa)
G. max (number of duplications: CRT1/2 2, CRT3 3)
XM_003534452.3 | XP_003534500.1 | * CRT3 | 9 | 100776652 | 14 | 4, 95
XM_003537880.3 | XP_003537928.1 | * CRT3 L | 11 | 100776524 | 14 | 6 (out of range)
XM_003541101.3 | XP_003541149.1 | * CRT3 L | 12 | 100802428 | 15 | 5 (out of range)
XM_003555759.2 | XP_003555807.1 | * CRT | 20 | 100811997 | 14 | 58, 158, 399
NM_001249422.2 | NP_001236351.1 | CRT1 | 10 | 100037475 | 14 | 56, 156, 397
M. truncatula (number of duplications: CRT1/2 1, CRT3 2)
XM_003624156.2 | XP_003624204.1 | CRT | 7 | MTR_7g080370 | 14 | 4, 94
XM_003606771.2 | XP_003606819.1 | CRT | 4 | MTR_4g068080 | 14 | 93
XM_003591164.2 | XP_003591212.1 | CRT | 1 | MTR_1g083960 | 13 | 56, 156, 290
P. vulgaris (number of duplications: CRT1/2 1, CRT3 2)
XM_007131872.1 | XP_007131934.1 | * CRT | 11 | PHAVU_011G053000g | 14 | 5 (out of range)
XM_007144933.1 | XP_007144995.1 | * CRT | 7 | PHAVU_007G200800g | 14 | 58, 158, 399
XM_007139646.1 | XP_007139708.1 | * CRT | 8 | PHAVU_008G052500g | 14 | 4, 94
V. angularis (number of duplications: CRT1/2 1, CRT3 2)
XM_017559813.1 | XP_017415302.1 | * CRT | 2 | 108326352 | 14 | 58, 158, 399
XM_017578410.1 | XP_017433899.1 | * CRT3 | 8 | 108340819 | 14 | 3 (out of range)
XM_017564087.1 | XP_017419576.1 | * CRT3 L | 3 | 108329735 | 14 | 4, 94
V. radiata (number of duplications: CRT1/2 1, CRT3 2)
XM_014639044.1 | XP_014494530.1 | * CRT3 | 2 | 106756570 | 14 | 3 (out of range)
XM_014659207.1 | XP_014514693.1 | * CRT | 8 | 106772667 | 14 | 58, 158, 399
XM_014667712.1 | XP_014523198.1 | * CRT3 L | UN | 106779579 | 14 | 4, 94
E. guttatus (number of duplications: CRT1/2 2, CRT3 1)
XM_012979176.1 | XP_012834630.1 | * CRT3 | UN | 105955452 | 14 | 96
XM_012978362.1 | XP_012833816.1 | * CRT L | UN | 105954683 | 14 | 63
XM_012978359.1 | XP_012833813.1 | * CRT L | UN | 105954682 | 14 | 63
S. indicum (number of duplications: CRT1/2 3, CRT3 1)
XM_011078845.1 | XP_011077147.1 | * CRT3 | LG4 | 105161224 | 14 | 97
XM_011093838.1 | XP_011092140.1 | * CRT | LG10 | 105172424 | 14 | 63, 163
XM_011093839.1 | XP_011092141.1 | * CRT L | LG10 | 105172425 | 14 | 163 (out of range)
XM_011071083.1 | XP_011069385.1 | * CRT | UN | 105155213 | 14 | 63, 163
J. curcas (number of duplications: CRT1/2 1, CRT3 2)
XM_012209823.1 | XP_012065213.1 | * CRT3 | UN | 105628416 | 14 | -
XM_012230400.1 | XP_012085790.1 | * CRT3 L | UN | 105644897 | 14 | 97
XM_012233481.1 | XP_012088871.1 | * CRT | UN | 105647415 | 16 | 52, 152
P. euphratica (number of duplications: CRT1/2 2, CRT3 4)
XM_011025463.1 | XP_011023765.1 | * CRT | UN | 105125151 | 14 | 52, 152
XM_011013222.1 | XP_011011524.1 | * CRT3 L | UN | 105116058 | 13 | 103
XM_011009804.1 | XP_011008106.1 | * CRT3 L | UN | 105113578 | 14 | 93
XM_011009105.1 | XP_011007407.1 | * CRT3 L | UN | 105113086 | 13 | 103
XM_011038829.1 | XP_011037131.1 | * CRT3 L | UN | 105134424 | 13 | 103
XM_011038643.1 | XP_011036945.1 | * CRT L | UN | 105134292 | 14 | 52, 152
Table 1 cont.
NUCLEOTIDE SEQUENCE | PROTEIN SEQUENCE | PROTEIN ISOFORM | CHROMOSOME | LOCUS | NUMBER OF EXONS | POTENTIAL GLYCOSYLATION SITE (aa)
P. trichocarpa (number of duplications: CRT1/2 1, CRT3 2)
XM_002318921.2 | XP_002318957.1 | CRT | LGXIII | POPTR_0013s01090g | 14 | 52
XM_002325873.2 | XP_002325909.1 | * CRT | LGXIX | POPTR_0019s08290g | 14 | 93
XM_006372771.1 | XP_006372833.1 | * CRT | LGXVII | POPTR_0017s05490g | 13 | 103
R. communis (number of duplications: CRT1/2 1, CRT3 1)
XM_002514771.2 | XP_002514817.1 | * CRT3 | UN | 8274259 | 14 | 98
NM_001323723.1 | NP_001310652.1 | CRT | UN | 8269812 | 14 | 52, 152
G. raimondii (number of duplications: CRT1/2 4, CRT3 2)
XM_012607467.1 | XP_012462921.1 | * CRT | 13 | 105782615 | 14 | 60, 160
XM_012608554.1 | XP_012464008.1 | * CRT L | 13 | 105783227 | 13 | 56
XM_012607469.1 | XP_012462923.1 | * CRT L | 13 | 105782616 | 14 | 60, 160
XM_012578814.1 | XP_012434268.1 | * CRT L | 7 | 105761112 | 14 | 56, 156
XM_012626068.1 | XP_012481522.1 | * CRT3 | 5 | 105796374 | 14 | 7, 97
XM_012615148.1 | XP_012470602.1 | * CRT3 L | 3 | 105788317 | 14 | 94
T. cacao (number of duplications: CRT1/2 1, CRT3 2)
XM_007029130.1 | XP_007029191.1 | CRT3 | 5 | TCM_025085 | 14 | 93
XM_007031074.1 | XP_007031136.1 | CRT2 | 5 | TCM_026754 | 14 | 7, 56, 156, 398
XM_007033822.1 | XP_007033884.1 | CRT3 | 4 | TCM_019985 | 16 | 97
E. grandis (number of duplications: CRT1/2 2, CRT3 3)
XM_010048763.1 | XP_010047065.1 | * CRT3 L | UN | 104435995 | 14 | -
XM_010061735.1 | XP_010060037.1 | * CRT3 L | UN | 104447947 | 14 | -
XM_010040093.1 | XP_010038395.1 | * CRT3 L | UN | 104426918 | 14 | 105
XM_010025820.1 | XP_010024122.1 | * CRT | UN | 104414665 | 14 | 60, 160
XM_010041253.1 | XP_010039555.1 | * CRT L | UN | 104428253 | 14 | 62, 162
N. nucifera (number of duplications: CRT1/2 1, CRT3 2)
XM_010260212.1 | XP_010258514.1 | * CRT3 L | UN | 104598256 | 14 | 99
XM_010269711.1 | XP_010268013.1 | * CRT | UN | 104605095 | 14 | 57, 157
XM_010244439.1 | XP_010242741.1 | * CRT3 L | UN | 104587015 | 14 | 97
F. vesca (number of duplications: CRT1/2 1, CRT3 2)
XM_004302130.2 | XP_004302178.1 | * CRT3 L | LG6 | 101294184 | 14 | 97
XM_004302196.2 | XP_004302244.1 | * CRT | LG6 | 101315062 | 12 | 7, 56, 156, 304
XM_004306542.2 | XP_004306590.1 | * CRT3 L | LG7 | 101312475 | 15 | 96
M. domestica (number of duplications: CRT1/2 2, CRT3 3)
XM_008390596.2 | XP_008388818.1 | * CRT3 L | 12 | 103451159 | 14 | 99
XM_008390264.2 | XP_008388486.1 | * CRT L | 12 | 103450863 | 13 | 8, 57, 157
XM_008372162.2 | XP_008370384.1 | * CRT | 4 | 103433875 | 13 | 7, 57, 157
XM_008372617.2 | XP_008370839.1 | * CRT3 L | 4 | 103434283 | 14 | 97
XM_008370955.2 | XP_008369177.1 | * CRT3 | 4 | 103432753 | 14 | 104
Table 1 cont.
NUCLEOTIDE SEQUENCE | PROTEIN SEQUENCE | PROTEIN ISOFORM | CHROMOSOME | LOCUS | NUMBER OF EXONS | POTENTIAL GLYCOSYLATION SITE (aa)
P. bretschneideri (number of duplications: CRT1/2 2, CRT3 4)
XM_009363311.1 | XP_009361586.1 | * CRT3 L | UN | 103951844 | 14 | 97
XM_009375002.1 | XP_009373277.1 | * CRT | UN | 103962308 | 13 | 8, 57, 157
XM_009339914.1 | XP_009338189.1 | * CRT L | UN | 103930566 | 13 | 8, 57, 157
XM_009342088.1 | XP_009340363.1 | * CRT3 L | UN | 103932466 | 14 | 97
XM_009342102.1 | XP_009340377.1 | * CRT3 L | UN | 103932480 | 14 | 97
XM_009342892.1 | XP_009341167.1 | * CRT3 | UN | 103933225 | 14 | 104
P. mume (number of duplications: CRT1/2 2, CRT3 2)
XM_008247679.2 | XP_008245901.1 | * CRT | LG1 | 103344047 | 13 | 7, 56, 156
XM_008247687.2 | XP_008245909.1 | * CRT | LG1 | 103344056 | 13 | 7, 56, 156
XM_008220248.2 | XP_008218470.1 | * CRT3 L | LG1 | 103318812 | 14 | 105
XM_008224203.1 | XP_008222425.1 | * CRT3 | LG2 | 103322299 | 14 | 97
P. persica (number of duplications: CRT1/2 1, CRT3 1)
XM_007207409.1 | XP_007207471.1 | * CRT | UN | ppa006226mg | 13 | 7, 56, 156
XM_007222486.1 | XP_007222548.1 | * CRT | UN | ppa006217mg | 14 | 97
Z. jujuba (number of duplications: CRT1/2 1, CRT3 2)
XM_016019436.1 | XP_015874922.1 | * CRT | 2 | 107411779 | 14 | 57, 157
XM_016031115.1 | XP_015886601.1 | * CRT3 | 7 | 107421787 | 14 | 99
XM_016020267.1 | XP_015875753.1 | * CRT3 L | 2 | 107412492 | 14 | 96, 237
C. clementina (number of duplications: CRT1/2 1, CRT3 2)
XM_006433460.1 | XP_006433523.1 | * CRT | UN | CICLE_v10001298mg | 14 | 5, 55, 155
XM_006428587.1 | XP_006428650.1 | * CRT | UN | CICLE_v10011848mg | 14 | 86
XM_006442909.1 | XP_006442972.1 | * CRT | UN | CICLE_v10020358mg | 14 | 90
C. sinensis (number of duplications: CRT1/2 1, CRT3 2)
XM_006472123.1 | XP_006472186.1 | * CRT | 3 | 102618403 | 14 | 5, 55, 155
XM_006478683.2 | XP_006478746.1 | * CRT3 | 5 | 102608719 | 14 | 90
XM_006480403.2 | XP_006480466.1 | * CRT3 L | 6 | 102611212 | 14 | 86
N. sylvestris (number of duplications: CRT1/2 1, CRT3 2)
XM_009806291.1 | XP_009804593.1 | * CRT | UN | 104249802 | 14 | 8, 59, 159
XM_009785800.1 | XP_009784102.1 | * CRT3 L | UN | 104232563 | 14 | 108
XM_009787923.1 | XP_009786225.1 | * CRT3 L | UN | 104234366 | 14 | 102
N. tomentosiformis (number of duplications: CRT1/2 1, CRT3 2)
XM_009631851.1 | XP_009630146.1 | * CRT | UN | 104120134 | 14 | 8, 59, 159
XM_009605316.1 | XP_009603611.1 | * CRT3 L | UN | 104098551 | 14 | 105
XM_009600582.1 | XP_009598877.1 | * CRT3 L | UN | 104094618 | 15 | 102
P. hybrida (number of duplications: CRT1/2 1, CRT3 -)
HG738129.1 | CDJ26237.1 | CRT1 | - | - | - | 8, 59, 159
S. lycopersicum (number of duplications: CRT1/2 1, CRT3 2)
XM_004230251.2 | XP_004230299.1 | * CRT | 1 | 101246093 | 13 | 59, 159
XM_004237507.2 | XP_004237555.1 | * CRT3 L | 4 | 101262304 | 14 | 88
XM_004239608.2 | XP_004239656.1 | * CRT3 | 5 | 101248677 | 14 | 105
S. pennellii (number of duplications: CRT1/2 1, CRT3 2)
XM_015200012.1 | XP_015055498.1 | * CRT | 1 | 107002089 | 13 | 59, 159
XM_015216285.1 | XP_015071771.1 | * CRT3 L | 4 | 107015866 | 14 | 88
XM_015221015.1 | XP_015076501.1 | * CRT3 | 5 | 107020585 | 14 | 105
Table 1 cont.
NUCLEOTIDE SEQUENCE | PROTEIN SEQUENCE | PROTEIN ISOFORM | CHROMOSOME | LOCUS | NUMBER OF EXONS | POTENTIAL GLYCOSYLATION SITE (aa)
S. tuberosum (number of duplications: CRT1/2 1, CRT3 2)
XM_006339981.2 | XP_006340043.1 | * CRT3 L | UN | 102606145 | 14 | 87
XM_006344690.2 | XP_006344752.1 | * CRT | UN | 102603479 | 14 | 59, 159
XM_006345720.2 | XP_006345782.1 | * CRT3 L | UN | 102581294 | 14 | 105
V. vinifera (number of duplications: CRT1/2 2, CRT3 2)
XM_002276397.2 | XP_002276433.2 | * CRT3 L | 4 | 100265011 | 15 | 6, 102
XM_010652325.1 | XP_010650627.1 | * CRT3 L | 5 | 100267984 | 14 | 95
XM_002270318.3 | XP_002270354.1 | * CRT | 7 | 100264203 | 14 | 53, 392
XM_002282365.3 | XP_002282401.1 | * CRT | 14 | 100256319 | 15 | 58
An asterisk in the table marks sequences at one of the following identification levels: predicted, hypothetical or uncharacterized. The letter ‘L’ (like) in the protein isoform designation denotes a CRT homolog-like sequence. The absence of a number or letter indicates an unclassified CRT homolog. The potential glycosylation sites indicative of an isoform type are shown in bold. The colored circles/squares indicate plant orders, as shown in the key; UN: uncharacterized.
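The per-species copy numbers reported in Table 1 can be illustrated with a short tally over the isoform labels. The sketch below is a simplification, not the authors' procedure: it classifies a row purely from its label, so entries labelled only CRT or CRT L (which the study assigns to a subclass by their position in the trees) are reported as unclassified. The function names `subclass` and `tally` are hypothetical; the records are the A. thaliana and B. rapa rows of Table 1.

```python
from collections import defaultdict


def subclass(isoform):
    """Map a Table 1 isoform label to a subclass using the label text only."""
    label = isoform.replace("*", "").strip()
    if label.startswith("CRT3"):
        return "CRT3"
    if label.startswith(("CRT1", "CRT2")):
        return "CRT1/2"
    # Plain "CRT" or "CRT L": subclass not stated in the label itself.
    return "unclassified"


def tally(records):
    """Count sequences per species and subclass."""
    counts = defaultdict(lambda: defaultdict(int))
    for species, _accession, isoform in records:
        counts[species][subclass(isoform)] += 1
    return {species: dict(by_class) for species, by_class in counts.items()}


# (species, nucleotide accession, protein isoform) taken from Table 1.
RECORDS = [
    ("A. thaliana", "NM_104513.4", "CRT1"),
    ("A. thaliana", "NM_100718.4", "CRT3"),
    ("A. thaliana", "NM_100791.3", "CRT2"),
    ("B. rapa", "XM_009149924.1", "* CRT2 L"),
    ("B. rapa", "XM_009112594.1", "* CRT2"),
    ("B. rapa", "XM_009112645.1", "* CRT3"),
    ("B. rapa", "XM_009115141.1", "* CRT1 L"),
    ("B. rapa", "XM_009120127.1", "* CRT3 L"),
    ("B. rapa", "XM_009124734.1", "* CRT1"),
]

copy_numbers = tally(RECORDS)
```

For these two species the label-based tally agrees with the values given in the table: two CRT1/2 and one CRT3 sequence for A. thaliana, and four CRT1/2 and two CRT3 sequences for B. rapa.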
HIGHLIGHTS
CRT gene duplication is widespread in plant genomes.
The Arabidopsis genome, in contrast to other plant genomes, has only three CRT homologs.
The CRT isoforms reveal functional specialization.
[Schematic of CRT gene evolution. Labels: ancestral gene; duplication; multiplication; CRT1/2; CRT1; CRT2; CRT3; subfunctionalization; neofunctionalization; pseudogenization; gene loss.]