Phylogenetic analysis of plant calreticulin homologs


Accepted Manuscript

Piotr Wasąg, Tomasz Grajkowski, Anna Suwińska, Marta Lenartowska, Robert Lenartowski

PII: S1055-7903(18)30276-8
DOI: https://doi.org/10.1016/j.ympev.2019.01.014
Reference: YMPEV 6402

To appear in: Molecular Phylogenetics and Evolution

Received Date: 30 April 2018
Revised Date: 17 January 2019
Accepted Date: 18 January 2019

Please cite this article as: Wasąg, P., Grajkowski, T., Suwińska, A., Lenartowska, M., Lenartowski, R., Phylogenetic analysis of plant calreticulin homologs, Molecular Phylogenetics and Evolution (2019), doi: https://doi.org/10.1016/j.ympev.2019.01.014

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Piotr Wasąg (a), Tomasz Grajkowski (b), Anna Suwińska (a), Marta Lenartowska (a), Robert Lenartowski (b)*

(a) Laboratory of Developmental Biology, Department of Cellular and Molecular Biology, Faculty of Biology and Environmental Protection, Nicolaus Copernicus University in Toruń, Lwowska 1, 87-100 Toruń, Poland

(b) Laboratory of Molecular and Isotope Methods, Department of Cellular and Molecular Biology, Faculty of Biology and Environmental Protection, Nicolaus Copernicus University in Toruń, Lwowska 1, 87-100 Toruń, Poland

*e-mail: [email protected]

Abstract

Calreticulin (CRT) is a multifunctional resident endoplasmic reticulum (ER) luminal protein implicated in regulating a variety of cellular processes, including Ca2+ storage/mobilization and protein folding. These multiple functions may be carried out by different CRT genes and protein isoforms. The plant CRT family consists of three genes: CRT1 and CRT2, classified in a common subclass (CRT1/2), and CRT3. These genes are highly conserved during evolution and encode three different protein products (CRT1, 2 and 3). The aim of the current study was to conduct a comparative analysis and sequence-based classification of the plant CRT genes. We used nucleotide and amino acid sequences to phylogenetically cluster the genes and examine potential glycosylation patterns. Additionally, we analyzed phylogenetic relationships within the CRT subclasses. Finally, we analyzed intraspecific CRT duplication events among mono- and dicotyledon species. Our results confirm that each of the CRT genes exists in multiple copies in plant genomes, and that CRT gene duplication is a widespread process in plants.

Keywords

CRT1/2, CRT3, molecular phylogeny, duplication, monocotyledons, dicotyledons

Abbreviations

APG: Angiosperm Phylogeny Group
Ca2+: Calcium ions
cDNA: Complementary DNA
CRT: Calreticulin gene
CRT: Calreticulin protein
ER: Endoplasmic reticulum
KEGG: Kyoto Encyclopedia of Genes and Genomes
NCBI: National Center for Biotechnology Information
UTR: Untranslated region in mRNA

1. Introduction

CRT is an evolutionarily conserved protein localized mainly in the ER. Although CRT is primarily known to regulate Ca2+ homeostasis and act as a chaperone, it also plays roles in many intra- and extracellular processes (Jia et al., 2009). The multifunctionality of CRT could be due to the existence of multiple genes. Two CRT genes, CRT1 and CRT2, have been identified in animals (Persson et al., 2002). In contrast, the plant CRT gene family comprises at least three members in two subclasses: CRT1/CRT2 (also known as CRT1a/CRT1b) and CRT3 (Persson et al., 2003; Jia et al., 2008). CRT1/CRT2 genes correspond with the animal CRTs, whereas CRT3 is plant-specific (Jia et al., 2009). Phylogenetic analysis of nucleotide sequences revealed that CRT3 genes are the most highly conserved and are most closely related to the ancestral gene in plants (Del Bem, 2011). In contrast, CRT1 and CRT2 are similar to each other and diverged evolutionarily between monocots and dicots. Therefore, CRT1 and CRT2 are paralogs of one another. In an evolutionary context, two duplication events of the CRT gene in plants took place at different times, wherein the early duplication generated two distinct CRT subgroups: CRT1/CRT2 and CRT3 (Persson et al., 2003).

The CRT sequence length and number of exons are gene- and species-specific (Jia et al., 2009). For example, Zea mays and Oryza sativa CRTs have 14 exons, as does Arabidopsis thaliana CRT3. In contrast, Arabidopsis thaliana CRT1 contains 12 exons and CRT2 contains 13 exons due to fusions within exons 4 through 6. Despite these differences, the overall sizes of exons are conserved, except for exons 1, 11, 12, and 13 (Persson et al., 2003).

The molecular structure of the CRT protein is similar in animals and plants (Michalak et al., 2009; Jia et al., 2009). All plant CRTs identified so far contain three distinct domains (N, P, and C). The globular N-domain, the longest and most conserved region of the protein, contains a hydrophobic ER signal peptide sequence followed by two evolutionarily conserved CRT signature motifs, KHEQKLDCGGGYVKLL and IMFGPDICG. Proper folding of the N-terminal region depends on three cysteine residues that form a disulphide bridge. The P-domain, which is enriched in proline, serine, and threonine residues, contains a putative nuclear targeting sequence (PPKXIKDPX) and two characteristic triplicate motifs called A (PXXIXDPXXKKPEXWDD) and B (GXWXAXXIXNPXYK). These motifs seem to be critical for the CRT lectin-like chaperone activity (Persson et al., 2003; Jia et al., 2008; An et al., 2011). A high percentage of acidic amino acids in the P-domain is required for high-affinity but low-capacity Ca2+ binding (Jia et al., 2009). The polyacidic C-domain is the most variable region of CRT. It contains negatively charged residues that participate in low-affinity but high-capacity Ca2+ binding and a typical ER-retention signal (mostly HDEL in plants and KDEL in animals) (Persson et al., 2002, 2003). Unlike animal CRTs, the plant proteins are commonly glycosylated and phosphorylated in the N- and C-domains (Li and Komatsu, 2000; Persson et al., 2003). The most common sites of glycosylation in plants are near amino acids 50 and 60 in CRT1/2, and near amino acid 96 in CRT3 (Jia et al., 2009). Three putative glycosylation sites were identified in Arabidopsis thaliana CRT1, whereas the other CRT isoforms in this species contain only one glycosylation site. These results indicate that plant CRTs exhibit species-specific glycosylation patterns (Persson et al., 2003; Christensen et al., 2010).

Here, our objective was to use sequences in online databases to determine the evolutionary relationships between the nucleotide and predicted amino acid sequences of the plant CRT genes. In this analysis, we paid particular attention to the CRT duplication events across plant species. Additionally, we used phylogenetic analysis and glycosylation patterns to assign the available CRT sequences to subclasses.
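The motifs quoted above are written with X as a wildcard, so they translate directly into regular expressions. The following is a minimal Python sketch of such a motif scan; it is not part of the original analysis. The function names and the toy sequence are hypothetical, and the toy sequence is synthetic (it is not a real CRT protein).

```python
import re

# Motifs quoted in the Introduction; X stands for any residue.
MOTIFS = {
    "signature_1": "KHEQKLDCGGGYVKLL",
    "signature_2": "IMFGPDICG",
    "repeat_A": "PXXIXDPXXKKPEXWDD",
    "repeat_B": "GXWXAXXIXNPXYK",
}
RETENTION_SIGNALS = ("HDEL", "KDEL")


def motif_to_regex(motif: str) -> str:
    """Turn a motif with X wildcards into a regular-expression string."""
    return "".join("." if ch == "X" else ch for ch in motif)


def find_motifs(protein: str) -> dict:
    """Return the 0-based start positions of every motif occurrence."""
    hits = {}
    for name, motif in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are reported too.
        pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
        hits[name] = [m.start() for m in pattern.finditer(protein)]
    return hits


def retention_signal(protein: str):
    """Return the C-terminal ER-retention tetrapeptide, or None if absent."""
    tail = protein[-4:]
    return tail if tail in RETENTION_SIGNALS else None


# Synthetic toy sequence (NOT a real CRT), built only to exercise the functions.
toy = ("MAAA" + "KHEQKLDCGGGYVKLL" + "SS" + "IMFGPDICG"
       + "PAAIADPAAKKPEAWDD" + "GAWAAAAIANPAYK" + "EEHDEL")

print(find_motifs(toy))
print(retention_signal(toy))
```

Reporting start positions rather than a yes/no answer makes it possible to check that the motifs occur in the expected domain order.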

2. Materials and methods

The nucleotide and predicted amino acid sequences of plant CRTs were obtained from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov) and Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg) databases. Sequences were first identified by their annotation, and the Arabidopsis thaliana CRT sequence was then used as the query in BLAST homology searches. The inclusion criterion for all cDNA sequences identified as CRT was the presence of an open reading frame and both 5' and 3' untranslated regions (UTRs). Amino acid sequences were predicted from cDNAs; they began with a methionine residue at the N-terminus and ended with an HDEL motif. ClustalW was used to align the nucleotide and amino acid sequences (http://www.genome.jp/tools/clustalW), and then a rooted guide tree was constructed (data not shown). The phylogenetic trees were built with MEGA software (version 6; http://www.megasoftware.net; Tamura et al., 2013) using the Neighbor-Joining method with the Maximum Composite Likelihood and Poisson models for nucleotide and amino acid sequences, respectively. In both cases, bootstrap analysis (1000 replications) was performed, with 70% taken as the bootstrap support threshold. Additionally, trees were rooted to the Selaginella moellendorffii CRT sequence (GenBank accession number XM_002968285.2). The predicted amino acid sequences were screened for the N-glycosylation consensus sequence NXS/T, where X could be any amino acid except for asparagine or proline (Liu and Howell, 2010). The Angiosperm Phylogeny Group (APG) III classification was used to compare relationships between CRT gene trees and species trees (http://www.mobot.org/MOBOT/research/APweb; Chase and Reveal, 2009).
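The N-glycosylation screen described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used in the study: the consensus follows the definition given in this section (X is neither asparagine nor proline), the reference positions (50-60 for CRT1/2, 96 for CRT3) are those cited from Jia et al. (2009), and the `tolerance` window and both function names are hypothetical choices made only for this sketch.

```python
import re

# Consensus as stated in the Methods: N-X-S/T, with X any residue except N or P.
# The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^NP][ST])")


def glycosylation_sites(protein: str) -> list:
    """Return 1-based positions of the asparagine in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def suggest_subclass(sites, tolerance=5):
    """Suggest a subclass from the positions of potential glycosylation sites.

    Positions 50-60 (CRT1/2) and 96 (CRT3) come from the text; `tolerance`
    is a hypothetical window chosen only for this sketch.
    """
    near_crt12 = any(50 - tolerance <= s <= 60 + tolerance for s in sites)
    near_crt3 = any(abs(s - 96) <= tolerance for s in sites)
    if near_crt12 and not near_crt3:
        return "CRT1/2"
    if near_crt3 and not near_crt12:
        return "CRT3"
    return "unresolved"


# Synthetic toy sequences (not real CRTs) with a single sequon each.
toy_crt12 = "A" * 52 + "NGT" + "A" * 45   # asparagine at position 53
toy_crt3 = "A" * 95 + "NVS" + "A" * 10    # asparagine at position 96

print(glycosylation_sites(toy_crt12), suggest_subclass(glycosylation_sites(toy_crt12)))
print(glycosylation_sites(toy_crt3), suggest_subclass(glycosylation_sites(toy_crt3)))
```

Sequences with no sequon, or with sites in both windows or in neither, are returned as "unresolved" rather than forced into a subclass.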

3. Results and Discussion

In searching the NCBI and KEGG databases, we identified a total of 200 cDNAs, including defined and unclassified CRT homologs (CRT1, CRT2, CRT3), and defined and unclassified sequences similar to CRT (e.g. CRT1-like). All of the genes have between 12 and 16 exons, and four sequences encode the KDEL retention signal, which is characteristic of animal CRT genes (Table 1). Our first step was to assign unclassified CRT and CRT-like sequences to one of the previously described subclasses (CRT1/2 or CRT3). Among the 35 cDNAs from monocotyledons, 3 were annotated as members of the CRT1/2 subclass, 5 were CRT3, and 7 were CRT3-like. Twenty cDNAs were not annotated as specific CRT genes, including 7 reported as CRT and 13 as CRT-like. For dicotyledons, 15 out of the 165 analyzed sequences were annotated as CRT1/2, 13 were annotated as being CRT1/2-like, 30 as CRT3, and 47 as CRT3-like. Among the other 60 sequences, 50 were defined as CRT and 10 as CRT-like. As we expected, the 80 unclassified cDNA sequences of both mono- and dicotyledons clearly clustered into the CRT1/2 and CRT3 subclasses (Figs. S1 and S2). Moreover, results of the preliminary nucleotide and amino acid sequence alignments were similar (data not shown). For monocotyledons, our cluster analysis assigned 17 previously non-categorized amino acid sequences to the CRT1/2 subclass, and 3 to the CRT3 subclass. For dicotyledons, our cluster analysis assigned 49 sequences to the CRT1/2 subgroup and 11 to the CRT3 subgroup (Figs. 1 and 2).

Given that localization of the N-glycosylation sites within amino acid sequences is correlated with the CRT gene/isoform (near position 50–60 aa for CRT1/2 and near position 96 aa for CRT3; Jia et al., 2009), we examined the location of potential glycosylation sites to confirm the results of our preliminary cluster analysis. We were unable to define the CRT gene/protein subclass in 4 monocotyledon (Brachypodium distachyon, Oryza sativa, Sorghum bicolor, Zea mays) and 11 dicotyledon (Brassica napus, Brassica oleracea, Glycine max (2 sequences), Phaseolus vulgaris, Vigna angularis, Vigna radiata, Sesamum indicum, Jatropha curcas, Eucalyptus grandis (2 sequences); Table 1) sequences because of a lack of an N-glycosylation consensus sequence or the presence of potential glycosylation sites outside of the regions we had defined for classification. However, glycosylation site analysis of the remaining amino acid sequences confirmed the CRT gene assignments we had made by phylogenetic clustering. Consistent with others' findings that three potential glycosylation sites are present in the CRT1 sequence but only one each in the CRT2 and CRT3 sequences (Persson et al., 2003; Christensen et al., 2010), we also identified additional potential glycosylation sites within other CRT domains (Table 1).

After classifying the sequences into subclasses according to their degree of similarity, we further analyzed each CRT subclass separately. Among monocotyledon CRT1/2 cDNAs, the highest sequence identities were between Zea mays1_Z46772.1 and Zea mays2_NM_001309255.1 (97.62%) and Zea mays_AF190454.1 (97.17%). The first two also showed the highest level of amino acid identity (97.86%). Furthermore, the amino acid sequence of Setaria italica_XM_004987369.2 was 97.38% identical with both Zea mays2_NM_001309255.1 and Zea mays1_Z46772.1. The lowest degrees of identity were found in comparing nucleotide sequences of Elaeis guineensisL_XM_010923203.1 with Zea mays2_NM_001309255.1 (54.32%) and Oryza sativaL_XM_015773252.1 (54.39%).

However, in alignments of deduced amino acid sequences, the lowest identity was between Brachypodium distachyon_XM_003563118.3 and Elaeis guineensisL_XM_010923203.1 (64.82%).

Among 15 monocotyledon CRT3 cDNAs, the highest identity was between Phoenix dactylifera3L_XM_008792368.2 and Elaeis guineensis3_XM_010929591.1 (89.1%), while the highest identity in predicted amino acid sequences was between Zea mays_XM_001174987.1 and Sorghum bicolor_XM_002439993.1 (97.86%). The lowest level of nucleotide identity was between Phoenix dactylifera3L_XM_008792368.2 and Sorghum bicolor_XM_002439993.1 (57.87%). The lowest levels of deduced amino acid sequence identity were found in comparisons of Musa acuminata3L_XM_009385531.1 with Brachypodium distachyon3L_XM_003566015.3 and Sorghum bicolor_XM_002439993.1 (both 75.36%).

Among dicotyledons, the CRT1/2 nucleotide and amino acid sequences of Sesamum indicum_XM_011093838.1 and Sesamum indicum_XM_011071083.1 were 100% identical. Three additional pairs of amino acid sequences were also 100% identical: Brassica napus1_XM_013832261.1 and Brassica oleracea1_XM_013755944.1, Prunus mume_XM_008247679.2 and Prunus mume_XM_008247687.2, and Solanum lycopersicum_XM_004230251.2 and Solanum pennellii_XM_015200012.1. We noted the lowest degree of identity for nucleotide sequences when comparing Camelina sativa1L_XM_010420638.1 with Nicotiana tomentosiformis_XM_009631851.1 (55.97%), and for amino acid sequences when comparing Camelina sativa1L_XM_010420638.1 with Brassica napus2L_NM_001316226.1 (55.8%).

Among the dicotyledon CRT3 genes and proteins, two groups of nucleotide and amino acid sequences were 100% identical: Pyrus bretschneideri3L_XM_009363311.1, Pyrus bretschneideri3L_XM_009342088.1, and Pyrus bretschneideri3L_XM_009342102.1; and Brassica napus3_XM_013823232.1 and Brassica napus3_XM_013823267.1. These two Brassica napus amino acid sequences were also 100% identical with Brassica oleracea3_XM_013750040.1, and the Citrus sinensis3_XM_006478683.2 and Citrus clementina_XM_006442909.1 amino acid sequences were 100% identical. We noted the lowest nucleotide sequence identity when comparing Theobroma cacao3_XM_007029130.1 with Theobroma cacao3_XM_007033822.1 (46.93%), and the lowest amino acid identity when comparing Pyrus bretschneideri3L_XM_009363311.1, Pyrus bretschneideri3L_XM_009342088.1, and Pyrus bretschneideri3L_XM_009342102.1 to Brassica napus3_XM_013823232.1, Brassica napus3_XM_013823267.1, and Brassica oleracea3_XM_013750040.1 (68.87%).

We next used our multiple nucleotide and amino acid sequence alignments to determine evolutionary relationships within the CRT subclasses. Within the CRT1/2 subclass, we observed that amino acid and nucleotide sequences belonging to the Poales order aligned differently within the group according to their phylogenetic relationships, except for paraphyletic group (C) (Figs. 1 and S1, respectively). In contrast, the other monocotyledon amino acid and nucleotide sequences consistently clustered into clades XV, XVI, and XVII. We also noted that clades XVI-XVII (amino acid analysis) and XVI (nucleotide analysis) represent outgroups relative to all others (Figs. 1 and S1).

We performed a similar analysis of dicotyledon CRTs. Within the CRT1/2 subclass, we identified 14 monophyletic groups (I–XIV) and two paraphyletic groups (A and B), and these assignments were identical when comparing both amino acid and nucleotide sequences (Figs. 1 and S1, respectively). Additionally, the cladograms obtained showed different clustering of predicted amino acid sequences which, at the nucleotide level, were clustered together into the monophyletic groups. At the nucleotide level, Myrtales sequences formed an outgroup relative to the other dicotyledon orders (Fig. S1). Moreover, amino acid and nucleotide sequences of Vitis vinifera (Vitales) fall into different clades, and one of them clusters with Ziziphus jujuba (Rosales) (Figs. 1 and S1, respectively).

With reference to the CRT3 subclass, amino acid and cDNA sequences from the Poales order formed two monophyletic groups, XX and XXI (Figs. 2 and S2, respectively). Analysis of connections between CRT3 sequences indicated that Poales is a sister group to all other monocotyledons (Fig. S2). In contrast, amino acid sequences from the Arecales and Zingiberales orders appear to form outgroups relative to Poales. Our comparison of dicotyledon CRT3 amino acid and nucleotide sequences revealed 17 constant clades (I–XVII) and a paraphyletic group (A). Paraphyletic group (A) contains sequences of Arachis ipaensis, Arachis duranensis and Beta vulgaris from the Fabales and Malpighiales orders, respectively (Figs. 2 and S2, respectively). We note that whereas sequences from the orders Brassicales, Cucurbitales, Lamiales, Proteales, and Solanales were all assigned to only one clade each, sequences from species belonging to the other orders were assigned to two different clades.
Moreover, all amino acid sequences from the monocotyledons form the outgroup to the dicotyledon clade, in contrast to the cDNAs (Figs. 2 and S2). The differences we note between the nucleotide and amino acid alignments likely reflect the lower variability of amino acid sequences, which results from degeneracy of the genetic code and silent mutations. Additionally, the larger number of variables (20 amino acids vs. 4 nucleotides) means that the probability of a match by chance is lower for amino acid sequences than for nucleotide sequences (Michu, 2007). Finally, heterogeneity of nucleotide composition can lead to grouping together unrelated clades that have similar nucleotide frequencies (Anup, 2015). Therefore, some authors suggest that amino acid sequences are more appropriate for constructing trees for distantly related species (Russo et al., 1996; Michu, 2007).
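The effect of code degeneracy on identity values can be illustrated with a small Python sketch. It is an illustration only: the two sequences are made-up toy data (not CRT sequences), the codon table is deliberately partial, and the identity formula (matching positions divided by alignment length) is an assumption, since the text does not state how the reported percentages were calculated.

```python
# Partial codon table: only the codons used in the toy sequences below.
CODONS = {
    "ATG": "M", "AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E",
    "CTT": "L", "CTC": "L", "GAT": "D", "GAC": "D",
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using the partial table above."""
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, len(cds), 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Made-up toy sequences (not CRT data) that differ only at third codon positions.
seq1 = "ATGAAAGAACTTGAT"
seq2 = "ATGAAGGAGCTCGAC"

print(round(percent_identity(seq1, seq2), 2))                 # nucleotide level
print(percent_identity(translate(seq1), translate(seq2)))     # amino acid level
```

Here four silent substitutions lower the nucleotide identity while both sequences translate to the same peptide, which is the pattern the paragraph above attributes to degeneracy of the genetic code.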


To compare the CRT gene trees and the species trees, we used the APG III classification (http://www.mobot.org/MOBOT/research/APweb; Chase and Reveal, 2009). In contrast to the APG III system, our analysis of monocotyledons could not confirm that the Poales and Zingiberales orders are more closely related to each other than to Arecales. Our results showed that only the CRT1/2 amino acid sequences of Zingiberales and Arecales preferentially clustered together, separately from those of Poales (Fig. 1). We also noted discrepancies between CRT sequence assignments and APG III classifications of dicotyledons. Additionally, the trees generated for the CRT1/2 homologs (Fig. 1) differed from those for the CRT3 homologs (Fig. 2). Generally, the APG III classification distinguishes two large monophyletic groups (rosids and asterids) containing most of the dicotyledons, and a few phylogenetically older taxa alongside them as sister clades, e.g. the Proteales and Vitales orders. Our phylograms based on amino acid sequences did not reflect these relationships. Instead, we observed no clear phylogenetic correlations and noted wide separation of sequences belonging to common species or orders that should cluster together. For example, sequences from Beta vulgaris were clustered into two separate clades and were separated from other asterids (Fig. 2). Unfortunately, we could not unambiguously determine the reasons for these discrepancies, though it is likely that genes, and even regions within genes, evolve at different rates (Anup, 2015). Additionally, lineage sorting, horizontal gene transfer, and hidden paralogy may contribute to these discrepancies (Anup, 2015).

CRT evolution has clearly involved cases of gene duplication and the existence of paralogs. An et al. (2011) described numerous CRT homologs in human, rat, rice, corn, and Arabidopsis thaliana genomes. Persson et al. (2003) revealed that all CRT genes are located on chromosome 1 in Arabidopsis thaliana; the CRT1 and CRT2 sequences were found in a region that was duplicated, but the CRT3 gene locus lies in a region without any major duplication activity. Our phylogenetic analysis of plant CRT homologs clearly showed that analogous duplication events took place in many other di- and monocotyledons. However, the number of CRT genes has been multiplied in all other plant species, including through intra- and interchromosomal duplication events. Generally, plant genomes contain three CRT homologs that are classified into CRT1/2 and CRT3 subgroups, wherein sequence homology suggests that CRT1/2 are similar to each other and correspond with the animal CRT1, whereas the plant-specific CRT3 genes are more highly conserved across different plant species (Persson et al., 2003; Jia et al., 2009; Thelin et al., 2011). Different CRTs exhibit distinct expression patterns and post-transcriptional and post-translational modifications, indicating that they have diverse roles in plants. The CRT1/2 family members appear to work as primary proteins within a general ER chaperone network related to regulation of Ca2+ homeostasis. In contrast, CRT3 does not have high-capacity Ca2+-binding ability (Qiu et al., 2012), and the CRT3 gene seems to be co-expressed with pathogen- and signal transduction-related genes, suggesting functional specialization (Jin et al., 2009; Li et al., 2009; Saijo et al., 2009; Christensen et al., 2010). Unfortunately, very limited data are available concerning the preferential expression of different CRT isoforms. Thus, further investigation of the diversity of plant CRT family members is needed to address the structure-function relationships among plant CRTs.

As mentioned in the introduction, plant CRT genes are thought to have arisen as a result of at least two duplication events (Persson et al., 2003). For this reason, and because about 72% of duplicate gene copies are not subject to sequence elimination (Lawton-Rauh, 2003), we decided to analyze gene duplication among plant CRT genes. To do so, we used the Graphical Sequence Viewer (NCBI software) to analyze the chromosomal location of CRT genes. We identified the locations of 197 duplicated CRTs in mono- and dicotyledons. In monocotyledons, we identified loci of 33 out of 35 genes and chromosomal locations for 25 CRTs (Table 1). Zea mays and Phoenix dactylifera contained the largest number of duplicated CRTs: 4 CRT1/2 and 2 CRT3 genes. Most of the duplicated genes were on different chromosomes, except in the Brachypodium distachyon and Musa acuminata genomes. In dicotyledons, we identified loci of 164 out of 165 genes and chromosomal locations for 107 CRTs. Brassica napus contained the most CRT genes: six in each of the CRT subclasses. Camelina sativa had 6 CRT1/2 gene duplicates and 3 CRT3 duplicates. In dicotyledons, duplicated CRTs were located either on a single chromosome (Table 1, e.g. Arabidopsis thaliana, chromosome 1) or on different chromosomes (Table 1, e.g. Brassica napus).
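The bookkeeping behind such a duplication survey (copies per subclass, and whether the located copies share a chromosome) can be sketched in Python as follows. The records are hypothetical placeholders, not values from Table 1, and the function names are illustrative assumptions rather than part of the original workflow.

```python
from collections import defaultdict

# Hypothetical placeholder records (species, subclass, chromosome); these are
# NOT values from Table 1 and only illustrate the bookkeeping.
records = [
    ("species_A", "CRT1/2", "1"),
    ("species_A", "CRT1/2", "1"),
    ("species_A", "CRT3", "1"),
    ("species_B", "CRT1/2", "2"),
    ("species_B", "CRT1/2", "5"),
    ("species_B", "CRT3", None),  # locus known, chromosome not assigned
]


def summarize(rows):
    """Count copies per subclass and collect assigned chromosomes per species."""
    summary = defaultdict(lambda: {"copies": defaultdict(int), "chromosomes": set()})
    for species, subclass, chromosome in rows:
        entry = summary[species]
        entry["copies"][subclass] += 1
        if chromosome is not None:
            entry["chromosomes"].add(chromosome)
    return summary


def placement(entry):
    """Label whether the located copies share one chromosome or several."""
    n = len(entry["chromosomes"])
    if n == 0:
        return "unknown"
    return "single chromosome" if n == 1 else "different chromosomes"


summary = summarize(records)
for species, entry in summary.items():
    print(species, dict(entry["copies"]), placement(entry))
```

Keeping unassigned chromosomes as `None` separates genes with a known locus from genes with a known chromosomal location, mirroring the distinction drawn in the text.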
Taken together, our analysis indicates that CRT gene duplication is widespread in the plant kingdom. Thus, it is striking that we identified only a single copy of each CRT gene in the Arabidopsis thaliana genome, especially given that the Arabidopsis thaliana genome shows evidence of two to four independent duplication events that duplicated 90% of loci (Lawton-Rauh, 2003; Moore and Purugganan, 2005). These duplications are probably reflected by the presence of a putative CRT pseudogene containing four potential exons (Persson et al., 2003). As mentioned by Lehti-Shiu et al. (2017), pseudogenization may cause duplicate genes to be lost.

It should be noted that CRT expression is essential for proper embryo development and ontogenesis in animals. For example, homozygous CRT knockout mice are embryonic lethal due to cardiovascular and brain defects (Mesaeli et al., 1999; Rauch et al., 2000). Similarly, insertion of a P-element (potpS114307) that disrupted the Drosophila melanogaster calreticulin gene (Crc) caused loss of neurons, disorganization of the peripheral nervous system, and neuronal pathfinding defects during embryogenesis (Salzberg et al., 1997; Prokopenko et al., 2000). In plants, a crucial role for CRT in growth and development is still under debate. We found that post-transcriptional silencing of Petunia hybrida CRT1/2 expression strongly impairs pollen tube elongation, which prevents sexual reproduction (Suwińska et al., 2017). However, the latest paper by Vu et al. (2017) indicates that the Arabidopsis thaliana CRT1/2/3 triple mutant grows and develops almost normally, while CNX1 gene expression is strictly required for normal pollen development and pollen tube growth. Nevertheless, these authors suggest that the complex relationship between the CRT and calnexin chaperones in the ER is crucial for generative reproduction of this plant. In contrast, Wakasa et al. (2018) clearly showed that transcriptional gene silencing of endogenous CNX genes did not have any impact on obtaining viable progeny in rice. These findings could reflect evolutionary differences between plants and animals. Alternatively, calnexin family proteins could compensate for loss of CRT expression in plants. Finally, it is possible that some Arabidopsis thaliana CRT genes have not been identified, which would be consistent with our observation that CRT gene duplication is common in plant genomes.

In conclusion, we confirmed that multiple CRT gene duplication is common among plant genomes. However, such duplications have not occurred in the Arabidopsis thaliana genome, because the three CRT homologs (CRT1, CRT2, and CRT3) are each present in only one copy. If the Arabidopsis thaliana genome does not contain unidentified CRT duplications, then we should carefully consider whether Arabidopsis thaliana should be used as a universal model in studies of the roles of CRT genes in plants.

Acknowledgements

The authors thank Michał Opas (University of Toronto, CA), Marek Michalak (University of Alberta, CA), and Deborah J. Frank (Washington University in St. Louis, US) for critical reading of the manuscript. This work was supported by statutory funds from the Ministry of Science and Higher Education (PL) for the research program of the Laboratory of Molecular and Isotope Methods (Department of Cellular and Molecular Biology, Nicolaus Copernicus University in Toruń, PL).

Conflict of interest


The authors declare that they have no conflict of interest.

References

An, Y.Q., Lin, R.M., Wang, F.T., Feng, J., Xu, Y.F., Xu, S.C., 2011. Molecular cloning of a new wheat calreticulin gene TaCRT1 and expression analysis in plant defense responses and abiotic stress resistance. Genet. Mol. Res. 10, 3576-3585.
Anup, S., 2015. Causes, consequences and solutions of phylogenetic incongruence. Brief. Bioinform. 16, 536-548.
Chase, M.W., Reveal, J.L., 2009. A phylogenetic classification of the land plants to accompany APG III. Bot. J. Linn. Soc. 161, 122-127.
Christensen, A., Svensson, K., Thelin, L., Zhang, W., Tintor, N., Prins, D., Funke, N., Michalak, M., Schulze-Lefert, P., Saijo, Y., Sommarin, M., Widell, S., Persson, S., 2010. Higher plant calreticulins have acquired specialized functions in Arabidopsis. PLoS One 5, e11342.
Del Bem, L.E., 2011. The evolutionary history of calreticulin and calnexin genes in green plants. Genetica 139, 255-259.
Jia, X.Y., He, L.H., Jing, R.L., Li, R.Z., 2009. Calreticulin: conserved protein and diverse functions in plants. Physiol. Plant. 136, 127-138.
Jia, X.Y., Xu, C.Y., Jing, R.L., Li, R.Z., Mao, X.G., Wang, J.P., Chang, X.P., 2008. Molecular cloning and characterization of wheat calreticulin (CRT) gene involved in drought-stressed responses. J. Exp. Bot. 59, 739-751.
Jin, H., Hong, Z., Su, W., Li, J., 2009. A plant-specific calreticulin is a key retention factor for a defective brassinosteroid receptor in the endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 106, 13612-13617.
Lawton-Rauh, A., 2003. Evolutionary dynamics of duplicated genes in plants. Mol. Phylogenet. Evol. 29, 396-409.
Lehti-Shiu, M.D., Panchy, N., Wang, P., Uygun, S., Shiu, S.H., 2017. Diversity, expansion, and evolutionary novelty of plant DNA-binding transcription factor families. Biochim. Biophys. Acta 1860, 3-20.
Li, J., Zhao, H.C., Batoux, M., Nekrasov, V., Roux, M., Chinchilla, D., Zipfel, C., Jones, J.D., 2009. Specific ER quality control components required for biogenesis of the plant innate immune receptor EFR. Proc. Natl. Acad. Sci. USA 106, 15973-15978.
Li, Z., Komatsu, S., 2000. Molecular cloning and characterization of calreticulin, a calcium-binding protein involved in the regeneration of rice cultured suspension cells. Eur. J. Biochem. 267, 737-745.
Liu, J.X., Howell, S.H., 2010. Endoplasmic reticulum protein quality control and its relationship to environmental stress responses in plants. Plant Cell 22, 2930-2942.
Mesaeli, N., Nakamura, K., Zvaritch, E., Dickie, P., Dziak, E., Krause, K.H., Opas, M., MacLennan, D.H., Michalak, M., 1999. Calreticulin is essential for cardiac development. J. Cell Biol. 144, 857-868.
Michalak, M., Groenendyk, J., Szabo, E., Gold, L.I., Opas, M., 2009. Calreticulin, a multi-process calcium-buffering chaperone of the endoplasmic reticulum. Biochem. J. 417, 651-666.
Michu, E., 2007. A short guide to phylogeny reconstruction. Plant Soil Environ. 53, 442-446.
Moore, R.C., Purugganan, M.D., 2005. The evolutionary dynamics of plant duplicate genes. Curr. Opin. Plant Biol. 8, 122-128.
Persson, S., Rosenquist, M., Sommarin, M., 2002. Identification of a novel calreticulin isoform (Crt2) in human and mouse. Gene 297, 151-158.
Persson, S., Rosenquist, M., Svensson, K., Galvão, R., Boss, W.F., Sommarin, M., 2003. Phylogenetic analyses and expression studies reveal two distinct groups of calreticulin isoforms in higher plants. Plant Physiol. 133, 1385-1396.
Prokopenko, S.N., He, Y., Lu, Y., Bellen, H.J., 2000. Mutations affecting the development of the peripheral nervous system in Drosophila: a molecular screen for novel proteins. Genetics 156, 1691-1715.
Qiu, Y., Xi, J., Du, L., Roje, S., Poovaiah, B.W., 2012. A dual regulatory role of Arabidopsis calreticulin-2 in plant innate immunity. Plant J. 69, 489-500.
Rauch, F., Prud'homme, J., Arabian, A., Dedhar, S., St-Arnaud, R., 2000. Heart, brain, and body wall defects in mice lacking calreticulin. Exp. Cell Res. 256, 105-111.
Russo, C.A., Takezaki, N., Nei, M., 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol. 13, 525-536.
Saijo, Y., Tintor, N., Lu, X., Rauf, P., Pajerowska-Mukhtar, K., Häweker, H., Dong, X., Robatzek, S., Schulze-Lefert, P., 2009. Receptor quality control in the endoplasmic reticulum for plant innate immunity. EMBO J. 28, 3439-3449.
Salzberg, A., Prokopenko, S.N., He, Y., Tsai, P., Pál, M., Maróy, P., Glover, D.M., Deák, P., Bellen, H.J., 1997. P-element insertion alleles of essential genes on the third chromosome of Drosophila melanogaster: mutations affecting embryonic PNS development. Genetics 147, 1723-1741.
Suwińska, A., Wasąg, P., Zakrzewski, P., Lenartowska, M., Lenartowski, R., 2017. Calreticulin is required for calcium homeostasis and proper pollen tube tip growth in Petunia. Planta 245, 909-926.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S., 2013. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725-2729.
Thelin, L., Mutwil, M., Sommarin, M., Persson, S., 2011. Diverging functions among calreticulin isoforms in higher plants. Plant Signal. Behav. 6, 905-910.
Vu, K.V., Nguyen, N.T., Jeong, C.Y., Lee, Y.H., Lee, H., Hong, S.W., 2017. Systematic deletion of the ER lectin chaperone genes reveals their roles in vegetative growth and male gametophyte development in Arabidopsis. Plant J. 89, 972-983.
Wakasa, Y., Kawakatsu, T., Harada, T., Takaiwa, F., 2018. Transgene-independent heredity of RdDM-mediated transcriptional gene silencing of endogenous genes in rice. Biotechnol. J. 16, 2007-2015.

Figure legends

Figure 1 Phylogenetic tree of monocotyledon and dicotyledon CRT1/2 amino acid sequences. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 97 amino acid sequences obtained from the NCBI and KEGG databases. The NCBI accession number of each sequence is shown after an underscore. A number or ‘L’ (like) in the species name denotes a specific CRT homolog or a CRT homolog-like sequence. The absence of a number or letter indicates an unclassified CRT homolog. The Selaginella moellendorffii CRT sequence was used as an outgroup (GenBank accession number XM_002968285.2). Accession numbers for the predicted amino acid sequences were identical to those of the corresponding cDNAs. The numbers at nodes indicate percentage levels of bootstrap support based on 1000 replicates. No values are given for groups with bootstrap values less than 70%. The colored circles/squares indicate plant orders as indicated in the key.
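As an illustration of the Neighbor-Joining criterion named in the legend, the following is a minimal sketch of a single joining step. It assumes a pairwise distance matrix has already been computed; it is not the MEGA6 implementation, and the function names `symmetric` and `nj_step` and the five-taxon toy matrix are hypothetical, chosen only for this example.

```python
from itertools import combinations


def symmetric(pairs):
    """Expand {(a, b): d} into a symmetric dict-of-dicts distance matrix."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def nj_step(dist):
    """Pick the pair of taxa to join in one neighbor-joining step.

    Returns (a, b, len_a, len_b), where len_a and len_b are the branch
    lengths from a and b to the new internal node.
    """
    taxa = sorted(dist)
    n = len(taxa)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # Sum of distances from each taxon to all the others.
    totals = {t: sum(dist[t].values()) for t in taxa}

    def q(pair):
        a, b = pair
        # Q criterion: the pair with the smallest value is joined.
        return (n - 2) * dist[a][b] - totals[a] - totals[b]

    a, b = min(combinations(taxa, 2), key=q)
    len_a = 0.5 * dist[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
    len_b = dist[a][b] - len_a
    return a, b, len_a, len_b


# Hypothetical toy distances, not taken from the CRT data set.
toy = symmetric({
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
})
print(nj_step(toy))
```

A full tree would repeat this step, replacing the joined pair with the new node and updating the distances, until three taxa remain.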

Figure 2 Phylogenetic tree of monocotyledon and dicotyledon CRT3 amino acid sequences. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 103 amino acid sequences obtained from the NCBI and KEGG databases. The NCBI accession number of each sequence is shown after an underscore. A number or ‘L’ (like) in the species name denotes a specific CRT homolog or a CRT homolog-like sequence. The absence of a number or letter indicates an unclassified CRT homolog. The Selaginella moellendorffii CRT sequence was used as an outgroup (GenBank accession number XM_002968285.2). Accession numbers for the predicted amino acid sequences were identical to those of the corresponding cDNAs. The numbers at nodes indicate percentage levels of bootstrap support based on 1000 replicates. No values are given for groups with bootstrap values less than 70%. The colored circles/squares indicate plant orders as indicated in the key.
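The tip-label convention described in the legends (species name, optional homolog number, optional ‘L’, then the accession number) can be split mechanically. The sketch below assumes labels of the form `Camelina sativa1L XM_010416676.1`, with either spaces or underscores as separators; the names `TipLabel` and `parse_tip_label` are hypothetical, introduced only for this illustration.

```python
import re
from typing import NamedTuple, Optional


class TipLabel(NamedTuple):
    species: str            # e.g. "Camelina sativa"
    homolog: Optional[int]  # 1, 2 or 3; None for an unclassified CRT homolog
    like: bool              # True when the label carries 'L' (homolog-like)
    accession: str          # accession number, e.g. "XM_010416676.1"


# Assumed layout: Genus, lowercase epithet, optional digit, optional 'L',
# then the accession. The epithet is lowercase only, so a trailing 'L'
# cannot be confused with part of the species name.
_LABEL = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)[ _]"
    r"(?P<epithet>[a-z]+)"
    r"(?P<homolog>[0-9])?"
    r"(?P<like>L)?[ _]"
    r"(?P<accession>\S.*)$"
)


def parse_tip_label(label: str) -> TipLabel:
    """Split one tree tip label into species, homolog, like flag, accession."""
    m = _LABEL.match(label.strip())
    if m is None:
        raise ValueError(f"unrecognized tip label: {label!r}")
    homolog = int(m["homolog"]) if m["homolog"] else None
    accession = m["accession"].strip().replace(" ", "_")
    return TipLabel(f"{m['genus']} {m['epithet']}", homolog,
                    m["like"] is not None, accession)


print(parse_tip_label("Camelina sativa1L XM 010416676.1"))
```

Labels that do not follow the assumed layout raise `ValueError` rather than being guessed at.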

Supplementary Figure 1 Comparative phylogenetic analysis of monocotyledon and dicotyledon CRT1/2 nucleotide sequences. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 97 cDNAs obtained from the NCBI and KEGG databases. The numbers at nodes indicate percentage levels of bootstrap support (1000 replicates) greater than 70%. The colored circles/squares indicate plant orders as indicated in the key.

Supplementary Figure 2 Comparative phylogenetic analysis of monocotyledon and dicotyledon CRT3 cDNAs. The tree was constructed using the Neighbor-Joining method with the Maximum Composite Likelihood correction model and included 103 cDNAs obtained from the NCBI and KEGG databases. The numbers at nodes indicate percentage levels of bootstrap support (1000 replicates) greater than 70%. The colored circles/squares indicate plant orders as indicated in the key.
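The node-labelling rule stated in the legends (support as a percentage of 1000 bootstrap replicates, with no value shown below 70%) can be written as a short helper. This is a sketch under the assumption that support is counted as the number of replicates recovering a group; the function name `support_label` is hypothetical, and the cutoff follows the wording of the Figure 1 and Figure 2 legends (values of exactly 70% are shown).

```python
def support_label(times_recovered: int, replicates: int = 1000,
                  threshold: float = 70.0) -> str:
    """Return the node label: a percentage string, or '' below the threshold."""
    if replicates <= 0:
        raise ValueError("replicates must be positive")
    if not 0 <= times_recovered <= replicates:
        raise ValueError("times_recovered must lie between 0 and replicates")
    percent = 100.0 * times_recovered / replicates
    # Groups with support below the threshold are left unlabelled.
    return str(round(percent)) if percent >= threshold else ""


for count in (1000, 700, 699):
    print(count, repr(support_label(count)))
```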

Table 1 CRT genes from monocotyledons and dicotyledons.


[Figure 1. Phylogenetic tree of monocotyledon and dicotyledon CRT1/2 amino acid sequences (image not reproduced). Labels in the image: clades I to XVII and groups A to C. Key to plant orders: Poales, Zingiberales, Arecales, Brassicales, Caryophyllales, Cucurbitales, Fabales, Lamiales, Malpighiales, Malvales, Myrtales, Proteales, Rosales, Sapindales, Solanales, Vitales.]

[Figure 2. Phylogenetic tree of monocotyledon and dicotyledon CRT3 amino acid sequences (image not reproduced). Labels in the image: clades I to XXI and group A.]

Table 1 CRT genes from monocotyledons and dicotyledons.

Species | Accession number (nucleotide sequence) | Accession number (protein sequence) | Protein isoform | Chromosome | Locus | Number of exons | Potential glycosylation site (aa)
E. guineensis | XM_010916251.1 | XP_010914553.1 | * CRT L | 2 | 105039926 | 15 | 58, 128, 158
E. guineensis | XM_010923203.1 | XP_010921505.1 | * CRT L | 5 | 105045043 | 14 | 62
E. guineensis | XM_010933456.1 | XP_010931758.1 | * CRT L | 10 | 105052598 | 14 | 58
E. guineensis | XM_010941814.1 | XP_010940116.1 | * CRT L | 15 | 105058770 | 14 | 53
E. guineensis | XM_010929591.1 | XP_010927893.1 | * CRT3 | 8 | 105049829 | 14 | 100
P. dactylifera | XM_008782872.2 | XP_008781094.1 | * CRT L | UN | 103700964 | 14 | 62
P. dactylifera | XM_008789997.2 | XP_008788219.1 | * CRT3 L | UN | 103706042 | 14 | 99
P. dactylifera | XM_008797745.2 | XP_008795967.1 | * CRT L | UN | 103711552 | 14 | 53
P. dactylifera | XM_008799239.2 | XP_008797461.1 | * CRT L | UN | 103712656 | 14 | 58
P. dactylifera | XM_008810437.2 | XP_008808659.1 | * CRT L | UN | 103720634 | 15 | 58, 128, 158
P. dactylifera | XM_008792368.2 | XP_008790590.1 | * CRT3 L | UN | 103707753 | 14 | 100
B. distachyon | XM_003563118.3 | XP_003563166.1 | * CRT | 1 | 100841311 | 13 | 56, 391
B. distachyon | XM_003566015.3 | XP_003566063.1 | * CRT3 L | 2 | 100838375 | 14 | 93
B. distachyon | XM_003564777.3 | XP_003564825.1 | * CRT3 | 2 | 100837158 | 14 | -
O. sativa | XM_015765355.1 | XP_015620841.1 | * CRT3 | 1 | 9267167 | 14 | -
O. sativa | XM_015773252.1 | XP_015628738.1 | * CRT L | 3 | 4334675 | 13 | 57
O. sativa | XM_015785313.1 | XP_015640799.1 | * CRT3 | 5 | 4339262 | 14 | 96
O. sativa | XM_015791681.1 | XP_015647167.1 | * CRT1 | 7 | 4342826 | 13 | 61
S. bicolor | XM_002456729.1 | XP_002456774.1 | * CRT | 3 | SORBIDRAFT_03g042500 | 14 | 94
S. bicolor | XM_002439993.1 | XP_002440038.1 | * CRT | 9 | SORBIDRAFT_09g024930 | 14 | -
S. italica | XM_004961520.2 | XP_004961577.1 | * CRT3 L | III | 101777678 | 14 | 90
S. italica | XM_004970829.3 | XP_004970886.2 | * CRT3 L | V | 101761253 | 14 | 96
S. italica | XM_004981102.1 | XP_004981159.1 | * CRT L | IX | 101762276 | 14 | 61
S. italica | XM_004987369.2 | XP_004987426.2 | * CRT | UN | 101774614 | 13 | 57
Z. mays | XM_008671858.1 | XP_008670080.1 | * CRT L | 2 | 103647314 | 14 | 57
Z. mays | NM_001147729.1 | NP_001141201.1 | CRT3 | 8 | 100273288 | 14 | 98
Z. mays | NM_001309255.1 | NP_001296184.1 | CRT2 | 7 | ZEAMMB73_853300 | 15 | 57
Z. mays | Z46772.1 | CAA86728.1 | CRT1 | - | - | - | 57
Z. mays | AF190454.1 | AAF01470.1 | CRT | - | - | - | 57
Z. mays | NM_001174987.1 | NP_001168458.1 | * CRT | 6 | 100382232 | 14 | -
M. acuminata | XM_009394905.1 | XP_009393180.1 | * CRT | 3 | 103978936 | 14 | 57
M. acuminata | XM_009404234.1 | XP_009402509.1 | * CRT L | 5 | 103986277 | 14 | 57
M. acuminata | XM_009385531.1 | XP_009383806.1 | * CRT3 L | 11 | 103971503 | 14 | 102
M. acuminata | XM_009386726.1 | XP_009385001.1 | * CRT L | 11 | 103972398 | 14 | 57, 407
M. acuminata | XM_009387175.1 | XP_009385450.1 | * CRT3 L | UN | 103972813 | 14 | 100

Species | Number of duplications, CRT1/2 | Number of duplications, CRT3
E. guineensis | 4 | 1
P. dactylifera | 4 | 2
B. distachyon | 1 | 2
O. sativa | 1 | 2
S. bicolor | - | 2
S. italica | 2 | 2
Z. mays | 4 | 2
M. acuminata | 3 | 2

Table 1 cont.

Species | Accession number (nucleotide sequence) | Accession number (protein sequence) | Protein isoform | Chromosome | Locus | Number of exons | Potential glycosylation site (aa)
A. lyrata | XM_002892411.1 | XP_002892457.1 | * CRT | UN | ARALYDRAFT_888076 | 14 | 96
A. lyrata | XM_002892446.1 | XP_002892492.1 | CRT2 | UN | ARALYDRAFT_888161 | 13 | 59
A. thaliana | NM_104513.4 | NP_176030.1 | CRT1 | 1 | AT1G56340 | 12 | 59, 154, 399
A. thaliana | NM_100718.4 | NP_563816.1 | CRT3 | 1 | AT1G08450 | 14 | 97
A. thaliana | NM_100791.3 | NP_172392.1 | CRT2 | 1 | AT1G09210 | 13 | 59
B. napus | XM_013826395.1 | XP_013681849.1 | * CRT1 L | A2 | 106386552 | 12 | 59, 154
B. napus | XM_013895419.1 | XP_013750873.1 | * CRT2 L | A5 | 106453163 | 12 | 54, 59
B. napus | XM_013785569.1 | XP_013641023.1 | * CRT2 | A6 | 106346289 | 12 | 59
B. napus | XM_013802156.1 | XP_013657610.1 | * CRT3 L | A8 | 106362288 | 14 | 96
B. napus | XM_013802190.1 | XP_013657644.1 | * CRT3 L | A8 | 106362312 | 14 | 96
B. napus | XM_013802231.1 | XP_013657685.1 | * CRT2 | A8 | 106362348 | 12 | 54, 59
B. napus | XM_013817189.1 | XP_013672643.1 | * CRT3 L | C1 | 106377035 | 14 | 96
B. napus | XM_013823232.1 | XP_013678686.1 | * CRT3 | C3 | 106383113 | 14 | 100
B. napus | XM_013823267.1 | XP_013678721.1 | * CRT3 | C3 | 106383132 | 14 | 100
B. napus | XM_013832261.1 | XP_013687715.1 | * CRT1 | C4 | 106391579 | 12 | 59, 154
B. napus | XM_013860963.1 | XP_013716417.1 | * CRT3 L | UN | 106420124 | 14 | 96
B. napus | NM_001316226.1 | NP_001303155.1 | CRT2 L | C8 | 106415836 | 12 | 40 (out of range)
B. oleracea | XM_013755944.1 | XP_013611398.1 | * CRT1 | C9 | 106318094 | 12 | 59, 154
B. oleracea | XM_013750040.1 | XP_013605494.1 | * CRT3 | C8 | 106312492 | 14 | 100
B. oleracea | XM_013748922.1 | XP_013604376.1 | * CRT3 | C8 | 106311670 | 14 | 96
B. oleracea | XM_013744509.1 | XP_013599963.1 | * CRT2 | C8 | 106307530 | 12 | 54, 59
B. oleracea | XM_013730198.1 | XP_013585652.1 | * CRT2 L | C5 | 106294595 | 12 | 59
B. oleracea | XM_013741554.1 | XP_013597008.1 | * CRT1 L | C1 | 106305155 | 12 | 15, 69, 164 (out of range)
B. rapa | XM_009149924.1 | XP_009148172.1 | * CRT2 L | A6 | 103871649 | 12 | 59
B. rapa | XM_009112594.1 | XP_009110842.1 | * CRT2 | A8 | 103836348 | 12 | 54, 59
B. rapa | XM_009112645.1 | XP_009110893.1 | * CRT3 | A8 | 103836390 | 14 | 96
B. rapa | XM_009115141.1 | XP_009113389.1 | * CRT1 L | A9 | 103838690 | 12 | 59, 154
B. rapa | XM_009120127.1 | XP_009118375.1 | * CRT3 L | A9 | 103843398 | 14 | 96
B. rapa | XM_009124734.1 | XP_009122982.1 | * CRT1 | UN | 103847648 | 12 | 5, 59, 154

Species | Number of duplications, CRT1/2 | Number of duplications, CRT3
A. lyrata | 1 | 1
A. thaliana | 2 | 1
B. napus | 6 | 6
B. oleracea | 4 | 2
B. rapa | 4 | 2

Table 1 cont.

Species | Accession number (nucleotide sequence) | Accession number (protein sequence) | Protein isoform | Chromosome | Locus | Number of exons | Potential glycosylation site (aa)
C. sativa | XM_010481938.1 | XP_010480240.1 | * CRT1 | 17 | 104758963 | 12 | 59, 399
C. sativa | XM_010477342.1 | XP_010475644.1 | * CRT3 L | 17 | 104755026 | 14 | 96
C. sativa | XM_010459780.1 | XP_010458082.1 | * CRT3 L | 14 | 104739432 | 14 | 96
C. sativa | XM_010420638.1 | XP_010418940.1 | * CRT1 L | 7 | 104704579 | 11 | 59, 256
C. sativa | XM_010416676.1 | XP_010414978.1 | * CRT1 L | 7 | 104701045 | 12 | 59, 399
C. sativa | XM_010512972.1 | XP_010511274.1 | * CRT1 L | 5 | 104787394 | 12 | 59, 399
C. sativa | XM_010512970.1 | XP_010511272.1 | * CRT1 L | 5 | 104787392 | 12 | 59, 399
C. sativa | XM_010512969.1 | XP_010511271.1 | * CRT1 L | 5 | 104787391 | 12 | 59, 399
C. sativa | XM_010490578.1 | XP_010488880.1 | * CRT3 | 3 | 104766654 | 14 | 96
E. salsugineum | XM_006392374.1 | XP_006392436.1 | * CRT | UN | EUTSA_v10023481mg | 13 | 54, 59, 154
E. salsugineum | XM_006417533.1 | XP_006417596.1 | * CRT | UN | EUTSA_v10007717mg | 13 | 54, 59
E. salsugineum | XM_006417626.1 | XP_006417689.1 | * CRT | UN | EUTSA_v10007710mg | 14 | 98
T. hassleriana | XM_010551233.1 | XP_010549535.1 | * CRT1 | UN | 104820667 | 13 | 5, 54, 154, 399
T. hassleriana | XM_010545897.1 | XP_010544199.1 | * CRT | UN | 104816886 | 13 | 54, 154, 395
T. hassleriana | XM_010546033.1 | XP_010544335.1 | * CRT3 | UN | 104816983 | 15 | 99, 270
T. hassleriana | XM_010555420.1 | XP_010553722.1 | * CRT1 L | UN | 104823722 | 13 | 55, 155, 401
B. vulgaris | XM_010673956.1 | XP_010672258.1 | * CRT3 L | 3 | 104888853 | 14 | 95
B. vulgaris | XM_010675591.1 | XP_010673893.1 | * CRT3 L | 4 | 104890198 | 14 | 101
B. vulgaris | NM_001303065.1 | NP_001289994.1 | CRT | 4 | 104890403 | 14 | 7, 57, 157
C. melo | XM_008455449.2 | XP_008453671.1 | * CRT | UN | 103494317 | 13 | 56, 156
C. melo | XM_008455899.2 | XP_008454121.1 | * CRT3 | UN | 103494621 | 14 | 105
C. sativus | XM_004144709.2 | XP_004144757.1 | * CRT | 2 | 101205515 | 13 | 56, 156
C. sativus | XM_004152104.2 | XP_004152152.1 | * CRT3 | 4 | 101203114 | 14 | 105
A. duranensis | XM_016084233.1 | XP_015939719.1 | * CRT | A09 | 107465251 | 14 | 56, 156
A. duranensis | XM_016106472.1 | XP_015961958.1 | * CRT3 L | A04 | 107485938 | 14 | 97
A. duranensis | XM_016096631.1 | XP_015952117.1 | * CRT3 | A03 | 107476754 | 15 | 6, 99
A. ipaensis | XM_016331622.1 | XP_016187108.1 | * CRT3 | B03 | 107628970 | 14 | 6, 99
A. ipaensis | XM_016338558.1 | XP_016194044.1 | * CRT3 L | B04 | 107635173 | 14 | 97
A. ipaensis | XM_016319753.1 | XP_016175239.1 | * CRT | B09 | 107617877 | 14 | 56, 156
C. arietinum | XM_004492800.2 | XP_004492857.1 | * CRT3 L | Ca3 | 101514432 | 14 | 4, 94
C. arietinum | XM_004495609.1 | XP_004495666.1 | * CRT | Ca4 | 101511865 | 14 | 56, 156, 290, 397
C. arietinum | XM_004507292.2 | XP_004507349.1 | * CRT3 | Ca6 | 101501628 | 14 | 91

Species | Number of duplications, CRT1/2 | Number of duplications, CRT3
C. sativa | 6 | 3
E. salsugineum | 2 | 1
T. hassleriana | 3 | 1
B. vulgaris | 1 | 2
C. melo | 1 | 1
C. sativus | 1 | 1
A. duranensis | 1 | 2
A. ipaensis | 1 | 2
C. arietinum | 1 | 2

Table 1 cont.

Species | Accession number (nucleotide sequence) | Accession number (protein sequence) | Protein isoform | Chromosome | Locus | Number of exons | Potential glycosylation site (aa)
G. max | XM_003534452.3 | XP_003534500.1 | * CRT3 | 9 | 100776652 | 14 | 4, 95
G. max | XM_003537880.3 | XP_003537928.1 | * CRT3 L | 11 | 100776524 | 14 | 6 (out of range)
G. max | XM_003541101.3 | XP_003541149.1 | * CRT3 L | 12 | 100802428 | 15 | 5 (out of range)
G. max | XM_003555759.2 | XP_003555807.1 | * CRT | 20 | 100811997 | 14 | 58, 158, 399
G. max | NM_001249422.2 | NP_001236351.1 | CRT1 | 10 | 100037475 | 14 | 56, 156, 397
M. truncatula | XM_003624156.2 | XP_003624204.1 | CRT | 7 | MTR_7g080370 | 14 | 4, 94
M. truncatula | XM_003606771.2 | XP_003606819.1 | CRT | 4 | MTR_4g068080 | 14 | 93
M. truncatula | XM_003591164.2 | XP_003591212.1 | CRT | 1 | MTR_1g083960 | 13 | 56, 156, 290
P. vulgaris | XM_007131872.1 | XP_007131934.1 | * CRT | 11 | PHAVU_011G053000g | 14 | 5 (out of range)
P. vulgaris | XM_007144933.1 | XP_007144995.1 | * CRT | 7 | PHAVU_007G200800g | 14 | 58, 158, 399
P. vulgaris | XM_007139646.1 | XP_007139708.1 | * CRT | 8 | PHAVU_008G052500g | 14 | 4, 94
V. angularis | XM_017559813.1 | XP_017415302.1 | * CRT | 2 | 108326352 | 14 | 58, 158, 399
V. angularis | XM_017578410.1 | XP_017433899.1 | * CRT3 | 8 | 108340819 | 14 | 3 (out of range)
V. angularis | XM_017564087.1 | XP_017419576.1 | * CRT3 L | 3 | 108329735 | 14 | 4, 94
V. radiata | XM_014639044.1 | XP_014494530.1 | * CRT3 | 2 | 106756570 | 14 | 3 (out of range)
V. radiata | XM_014659207.1 | XP_014514693.1 | * CRT | 8 | 106772667 | 14 | 58, 158, 399
V. radiata | XM_014667712.1 | XP_014523198.1 | * CRT3 L | UN | 106779579 | 14 | 4, 94
E. guttatus | XM_012979176.1 | XP_012834630.1 | * CRT3 | UN | 105955452 | 14 | 96
E. guttatus | XM_012978362.1 | XP_012833816.1 | * CRT L | UN | 105954683 | 14 | 63
E. guttatus | XM_012978359.1 | XP_012833813.1 | * CRT L | UN | 105954682 | 14 | 63
S. indicum | XM_011078845.1 | XP_011077147.1 | * CRT3 | LG4 | 105161224 | 14 | 97
S. indicum | XM_011093838.1 | XP_011092140.1 | * CRT | LG10 | 105172424 | 14 | 63, 163
S. indicum | XM_011093839.1 | XP_011092141.1 | * CRT L | LG10 | 105172425 | 14 | 163 (out of range)
S. indicum | XM_011071083.1 | XP_011069385.1 | * CRT | UN | 105155213 | 14 | 63, 163
J. curcas | XM_012209823.1 | XP_012065213.1 | * CRT3 | UN | 105628416 | 14 | -
J. curcas | XM_012230400.1 | XP_012085790.1 | * CRT3 L | UN | 105644897 | 14 | 97
J. curcas | XM_012233481.1 | XP_012088871.1 | * CRT | UN | 105647415 | 16 | 52, 152
P. euphratica | XM_011025463.1 | XP_011023765.1 | * CRT | UN | 105125151 | 14 | 52, 152
P. euphratica | XM_011013222.1 | XP_011011524.1 | * CRT3 L | UN | 105116058 | 13 | 103
P. euphratica | XM_011009804.1 | XP_011008106.1 | * CRT3 L | UN | 105113578 | 14 | 93
P. euphratica | XM_011009105.1 | XP_011007407.1 | * CRT3 L | UN | 105113086 | 13 | 103
P. euphratica | XM_011038829.1 | XP_011037131.1 | * CRT3 L | UN | 105134424 | 13 | 103
P. euphratica | XM_011038643.1 | XP_011036945.1 | * CRT L | UN | 105134292 | 14 | 52, 152

Species | Number of duplications, CRT1/2 | Number of duplications, CRT3
G. max | 2 | 3
M. truncatula | 1 | 2
P. vulgaris | 1 | 2
V. angularis | 1 | 2
V. radiata | 1 | 2
E. guttatus | 2 | 1
S. indicum | 3 | 1
J. curcas | 1 | 2
P. euphratica | 2 | 4

Table 1 cont.

Species | Accession number (nucleotide sequence) | Accession number (protein sequence) | Protein isoform | Chromosome | Locus | Number of exons | Potential glycosylation site (aa)
P. trichocarpa | XM_002318921.2 | XP_002318957.1 | CRT | LGXIII | POPTR_0013s01090g | 14 | 52
P. trichocarpa | XM_002325873.2 | XP_002325909.1 | * CRT | LGXIX | POPTR_0019s08290g | 14 | 93
P. trichocarpa | XM_006372771.1 | XP_006372833.1 | * CRT | LGXVII | POPTR_0017s05490g | 13 | 103
R. communis | XM_002514771.2 | XP_002514817.1 | * CRT3 | UN | 8274259 | 14 | 98
R. communis | NM_001323723.1 | NP_001310652.1 | CRT | UN | 8269812 | 14 | 52, 152
G. raimondii | XM_012607467.1 | XP_012462921.1 | * CRT | 13 | 105782615 | 14 | 60, 160
G. raimondii | XM_012608554.1 | XP_012464008.1 | * CRT L | 13 | 105783227 | 13 | 56
G. raimondii | XM_012607469.1 | XP_012462923.1 | * CRT L | 13 | 105782616 | 14 | 60, 160
G. raimondii | XM_012578814.1 | XP_012434268.1 | * CRT L | 7 | 105761112 | 14 | 56, 156
G. raimondii | XM_012626068.1 | XP_012481522.1 | * CRT3 | 5 | 105796374 | 14 | 7, 97
G. raimondii | XM_012615148.1 | XP_012470602.1 | * CRT3 L | 3 | 105788317 | 14 | 94
T. cacao | XM_007029130.1 | XP_007029191.1 | CRT3 | 5 | TCM_025085 | 14 | 93
T. cacao | XM_007031074.1 | XP_007031136.1 | CRT2 | 5 | TCM_026754 | 14 | 7, 56, 156, 398
T. cacao | XM_007033822.1 | XP_007033884.1 | CRT3 | 4 | TCM_019985 | 16 | 97
E. grandis | XM_010048763.1 | XP_010047065.1 | * CRT3 L | UN | 104435995 | 14 | -
E. grandis | XM_010061735.1 | XP_010060037.1 | * CRT3 L | UN | 104447947 | 14 | -
E. grandis | XM_010040093.1 | XP_010038395.1 | * CRT3 L | UN | 104426918 | 14 | 105
E. grandis | XM_010025820.1 | XP_010024122.1 | * CRT | UN | 104414665 | 14 | 60, 160
E. grandis | XM_010041253.1 | XP_010039555.1 | * CRT L | UN | 104428253 | 14 | 62, 162
N. nucifera | XM_010260212.1 | XP_010258514.1 | * CRT3 L | UN | 104598256 | 14 | 99
N. nucifera | XM_010269711.1 | XP_010268013.1 | * CRT | UN | 104605095 | 14 | 57, 157
N. nucifera | XM_010244439.1 | XP_010242741.1 | * CRT3 L | UN | 104587015 | 14 | 97
F. vesca | XM_004302130.2 | XP_004302178.1 | * CRT3 L | LG6 | 101294184 | 14 | 97
F. vesca | XM_004302196.2 | XP_004302244.1 | * CRT | LG6 | 101315062 | 12 | 7, 56, 156, 304
F. vesca | XM_004306542.2 | XP_004306590.1 | * CRT3 L | LG7 | 101312475 | 15 | 96
M. domestica | XM_008390596.2 | XP_008388818.1 | * CRT3 L | 12 | 103451159 | 14 | 99
M. domestica | XM_008390264.2 | XP_008388486.1 | * CRT L | 12 | 103450863 | 13 | 8, 57, 157
M. domestica | XM_008372162.2 | XP_008370384.1 | * CRT | 4 | 103433875 | 13 | 7, 57, 157
M. domestica | XM_008372617.2 | XP_008370839.1 | * CRT3 L | 4 | 103434283 | 14 | 97
M. domestica | XM_008370955.2 | XP_008369177.1 | * CRT3 | 4 | 103432753 | 14 | 104

Species | Number of duplications, CRT1/2 | Number of duplications, CRT3
P. trichocarpa | 1 | 2
R. communis | 1 | 1
G. raimondii | 4 | 2
T. cacao | 1 | 2
E. grandis | 2 | 3
N. nucifera | 1 | 2
F. vesca | 1 | 2
M. domestica | 2 | 3

Table 1 cont.

Species | Accession number (nucleotide sequence) | Accession number (protein sequence) | Protein isoform | Chromosome | Locus | Number of exons | Potential glycosylation site (aa)
P. bretschneideri | XM_009363311.1 | XP_009361586.1 | * CRT3 L | UN | 103951844 | 14 | 97
P. bretschneideri | XM_009375002.1 | XP_009373277.1 | * CRT | UN | 103962308 | 13 | 8, 57, 157
P. bretschneideri | XM_009339914.1 | XP_009338189.1 | * CRT L | UN | 103930566 | 13 | 8, 57, 157
P. bretschneideri | XM_009342088.1 | XP_009340363.1 | * CRT3 L | UN | 103932466 | 14 | 97
P. bretschneideri | XM_009342102.1 | XP_009340377.1 | * CRT3 L | UN | 103932480 | 14 | 97
P. bretschneideri | XM_009342892.1 | XP_009341167.1 | * CRT3 | UN | 103933225 | 14 | 104
P. mume | XM_008247679.2 | XP_008245901.1 | * CRT | LG1 | 103344047 | 13 | 7, 56, 156
P. mume | XM_008247687.2 | XP_008245909.1 | * CRT | LG1 | 103344056 | 13 | 7, 56, 156
P. mume | XM_008220248.2 | XP_008218470.1 | * CRT3 L | LG1 | 103318812 | 14 | 105
P. mume | XM_008224203.1 | XP_008222425.1 | * CRT3 | LG2 | 103322299 | 14 | 97
P. persica | XM_007207409.1 | XP_007207471.1 | * CRT | UN | ppa006226mg | 13 | 7, 56, 156
P. persica | XM_007222486.1 | XP_007222548.1 | * CRT | UN | ppa006217mg | 14 | 97
Z. jujuba | XM_016019436.1 | XP_015874922.1 | * CRT | 2 | 107411779 | 14 | 57, 157
Z. jujuba | XM_016031115.1 | XP_015886601.1 | * CRT3 | 7 | 107421787 | 14 | 99
Z. jujuba | XM_016020267.1 | XP_015875753.1 | * CRT3 L | 2 | 107412492 | 14 | 96, 237
C. clementina | XM_006433460.1 | XP_006433523.1 | * CRT | UN | CICLE_v10001298mg | 14 | 5, 55, 155
C. clementina | XM_006428587.1 | XP_006428650.1 | * CRT | UN | CICLE_v10011848mg | 14 | 86
C. clementina | XM_006442909.1 | XP_006442972.1 | * CRT | UN | CICLE_v10020358mg | 14 | 90
C. sinensis | XM_006472123.1 | XP_006472186.1 | * CRT | 3 | 102618403 | 14 | 5, 55, 155
C. sinensis | XM_006478683.2 | XP_006478746.1 | * CRT3 | 5 | 102608719 | 14 | 90
C. sinensis | XM_006480403.2 | XP_006480466.1 | * CRT3 L | 6 | 102611212 | 14 | 86
N. sylvestris | XM_009806291.1 | XP_009804593.1 | * CRT | UN | 104249802 | 14 | 8, 59, 159
N. sylvestris | XM_009785800.1 | XP_009784102.1 | * CRT3 L | UN | 104232563 | 14 | 108
N. sylvestris | XM_009787923.1 | XP_009786225.1 | * CRT3 L | UN | 104234366 | 14 | 102
N. tomentosiformis | XM_009631851.1 | XP_009630146.1 | * CRT | UN | 104120134 | 14 | 8, 59, 159
N. tomentosiformis | XM_009605316.1 | XP_009603611.1 | * CRT3 L | UN | 104098551 | 14 | 105
N. tomentosiformis | XM_009600582.1 | XP_009598877.1 | * CRT3 L | UN | 104094618 | 15 | 102
P. hybrida | HG738129.1 | CDJ26237.1 | CRT1 | - | - | - | 8, 59, 159
S. lycopersicum | XM_004230251.2 | XP_004230299.1 | * CRT | 1 | 101246093 | 13 | 59, 159
S. lycopersicum | XM_004237507.2 | XP_004237555.1 | * CRT3 L | 4 | 101262304 | 14 | 88
S. lycopersicum | XM_004239608.2 | XP_004239656.1 | * CRT3 | 5 | 101248677 | 14 | 105
S. pennellii | XM_015200012.1 | XP_015055498.1 | * CRT | 1 | 107002089 | 13 | 59, 159
S. pennellii | XM_015216285.1 | XP_015071771.1 | * CRT3 L | 4 | 107015866 | 14 | 88
S. pennellii | XM_015221015.1 | XP_015076501.1 | * CRT3 | 5 | 107020585 | 14 | 105

Species | Number of duplications, CRT1/2 | Number of duplications, CRT3
P. bretschneideri | 2 | 4
P. mume | 2 | 2
P. persica | 1 | 1
Z. jujuba | 1 | 2
C. clementina | 1 | 2
C. sinensis | 1 | 2
N. sylvestris | 1 | 2
N. tomentosiformis | 1 | 2
P. hybrida | 1 | -
S. lycopersicum | 1 | 2
S. pennellii | 1 | 2

Table 1 cont.

Species | Accession number (nucleotide sequence) | Accession number (protein sequence) | Protein isoform | Chromosome | Locus | Number of exons | Potential glycosylation site (aa)
S. tuberosum | XM_006339981.2 | XP_006340043.1 | * CRT3 L | UN | 102606145 | 14 | 87
S. tuberosum | XM_006344690.2 | XP_006344752.1 | * CRT | UN | 102603479 | 14 | 59, 159
S. tuberosum | XM_006345720.2 | XP_006345782.1 | * CRT3 L | UN | 102581294 | 14 | 105
V. vinifera | XM_002276397.2 | XP_002276433.2 | * CRT3 L | 4 | 100265011 | 15 | 6, 102
V. vinifera | XM_010652325.1 | XP_010650627.1 | * CRT3 L | 5 | 100267984 | 14 | 95
V. vinifera | XM_002270318.3 | XP_002270354.1 | * CRT | 7 | 100264203 | 14 | 53, 392
V. vinifera | XM_002282365.3 | XP_002282401.1 | * CRT | 14 | 100256319 | 15 | 58

Species | Number of duplications, CRT1/2 | Number of duplications, CRT3
S. tuberosum | 1 | 2
V. vinifera | 2 | 2

Stars (*) in the table mark sequences at the following identification levels: predicted, hypothetical, or uncharacterized. The letter ‘L’ (like) in the protein isoform type denotes a CRT homolog-like sequence. The absence of a number or letter indicates an unclassified CRT homolog. The potential glycosylation sites that indicate an isoform type are shown in bold. The colored circles/squares indicate plant orders as indicated in the key; UN: uncharacterized.
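The cell conventions described in this note (a leading star, an optional isoform number, a trailing ‘L’, and comma-separated site positions with an optional "(out of range)" remark) lend themselves to a small parser. The following is a minimal sketch, assuming cell strings such as `* CRT3 L` and `15, 69, 164 (out of range)` as they appear in Table 1; the names `IsoformField`, `parse_isoform` and `parse_sites` are hypothetical, and the meaning of the "(out of range)" remark is not interpreted, only carried along as a flag.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class IsoformField(NamedTuple):
    starred: bool           # True when the cell starts with '*'
    isoform: Optional[int]  # 1, 2 or 3; None for an unclassified CRT homolog
    like: bool              # True when the cell ends with 'L' (homolog-like)


_ISOFORM = re.compile(r"(?P<star>\*\s*)?CRT(?P<num>[0-9])?(?P<like>\s+L)?")


def parse_isoform(cell: str) -> IsoformField:
    """Parse a 'protein isoform' cell such as '* CRT3 L' or 'CRT2'."""
    m = _ISOFORM.fullmatch(cell.strip())
    if m is None:
        raise ValueError(f"unrecognized isoform cell: {cell!r}")
    number = int(m["num"]) if m["num"] else None
    return IsoformField(m["star"] is not None, number, m["like"] is not None)


def parse_sites(cell: str) -> Tuple[List[int], bool]:
    """Parse a glycosylation-site cell into (positions, out_of_range_flag)."""
    text = cell.strip()
    remark = "(out of range)"
    flagged = text.endswith(remark)
    if flagged:
        text = text[: -len(remark)].strip()
    if text in ("", "-"):
        # '-' marks a cell with no site listed.
        return [], flagged
    return [int(part) for part in text.split(",")], flagged


print(parse_isoform("* CRT3 L"))
print(parse_sites("15, 69, 164 (out of range)"))
```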


HIGHLIGHTS
CRT gene duplication is widespread in plant genomes.
The Arabidopsis genome, in contrast to other plant genomes, has only 3 CRT homologs.
The CRT isoforms show functional specialization.


[Diagram not reproduced. Labels in the image: Ancestral gene, DUPLICATION, MULTIPLICATION, CRT1/2, CRT1, CRT2, CRT3, Subfunctionalization, Neofunctionalization, PSEUDOGENIZATION, GENE LOSS.]