International Journal of Biological Macromolecules 147 (2020) 513–520
International Journal of Biological Macromolecules journal homepage: http://www.elsevier.com/locate/ijbiomac
Functional and structural features of proteins associated with alternative splicing
Polina Savosina a, Dmitry Karasev a, Alexander Veselovsky a,b, Yuliana Miroshnichenko a, Boris Sobolev a,⁎
a Department of Bioinformatics, Institute of Biomedical Chemistry, Pogodinskaya street 10, 119121 Moscow, Russia
b Institute of Physiologically Active Substances, Chernogolovka 142432, Russia
Article info
Article history: Received 27 June 2019; received in revised form 16 September 2019; accepted 21 September 2019; available online 10 January 2020
Abstract
Alternative splicing is a mechanism that increases the number of expressed proteins and the diversity of their functions. We identified the protein domains most frequently lost or gained in splice variants. Proteins represented by several isoforms participate in processes such as transcription regulation and the immune response. Our results show an association of alternative splicing with branched regulatory pathways. By considering published data on the proteins encoded by the 18th human chromosome, we noted that alternative products differ in several functional features, such as phosphorylation, subcellular location, ligand specificity, and protein-protein interactions. Alternative variants affecting the protein kinase domain were investigated by comparing the alternative sequences with 3D structures. We showed that fairly large insertions/deletions can be compatible with the kinase fold if they are located between the conserved secondary structure elements. Using 3D data on human proteins, we showed that conformational flexibility could accommodate fold alterations in splice variants. Investigations of the structural and functional differences between splice isoforms are required to understand how to distinguish isoforms expressed as functioning proteins from transcripts that are not realized as proteins. Such studies help fill the gap between genomic and proteomic data.
© 2020 Elsevier B.V. All rights reserved.
1. Introduction
The study of splicing regulation is required both to investigate fundamental biological questions and to solve biotechnological tasks [1]. Alternative splicing allows the same gene to encode several mature mRNAs (transcripts), whose translation yields proteins with different amino acid sequences. Owing to this process, the number of protein forms significantly exceeds the number of protein-coding genes. It is assumed that the majority of multi-exon genes give rise to alternatively spliced isoforms [2,3]. However, the problem of precisely detecting splice variants from experimental data (such as RNA-seq) has not yet been completely solved [4].
At present, most information on alternative splicing is obtained at the transcript level. However, many mRNAs may never be realized as proteins. A recent study [5] suggests that a large fraction of predicted alternative transcripts may not be translated into proteins. Protein products translated from "incorrect" transcripts are degraded by the mechanisms of the unfolded protein response (UPR), including endoplasmic-reticulum-associated protein degradation (ERAD) [6]. Some of the predicted mature mRNAs may result from prediction errors [3]. At the same time, high-throughput proteomics studies allow the detection of proteotypic peptides related to splice variants [7]. However, the number of functioning splice isoforms remains unknown, since data on alternative splice variants at the protein level are quite limited [8]. This complicates the analysis of existing protein isoforms, including the patterns related to functional differences between products encoded by the same gene. The differences that preserve the structural stability of proteins while conferring specific functional properties need to be investigated. Detection of alternative proteoforms is necessary to fill the gap between genomic and proteomic data [9].
Analyzing a few examples in which the 3D structures of isoforms were already known, researchers described structural differences that do not disturb the protein fold [3]. The authors concluded that alternative splicing sites tend to be located in disordered regions of proteins and that the differences between splice isoforms do not significantly alter the protein. The largest insertion observed in the 3D alignment of isoforms was 44 residues long. Summarizing their study, the authors suggested rules for recognizing splice isoforms that are realized as expressed proteins. Bisele et al. showed that proteins appear to be much more resistant to structural deletions, insertions, and replacements associated with alternative splicing [10]. In our opinion, a more detailed analysis of splice isoforms related to a particular fold will help clarify which variations are permitted without losing this fold.
⁎ Corresponding author. E-mail address: [email protected] (B. Sobolev).
https://doi.org/10.1016/j.ijbiomac.2019.09.241
0141-8130/© 2020 Elsevier B.V. All rights reserved.
Higher mobility of the protein chain seems to facilitate the accommodation of altered stretches in splice isoforms [3]. As reported by several authors, the protein regions deleted, inserted, or replaced in splice variants are enriched in conformationally disordered segments [3,11–14]. Since experimental data on disorder are available for only a limited set of proteins, large-scale studies typically use predicted disorder scores. To investigate the relationship between alternative splicing and conformational instability, we compared information on splice variants with solved 3D protein structures. In this study, we analyzed the following:
- the representation of alternatively spliced proteins in regulatory pathways and enzyme classes;
- differences in the number of the most represented domains in gene products related to alternative splicing;
- functional differences between splice isoforms detected at the protein level;
- alterations in splice variants of a particular protein family in the context of the overall fold;
- the relationship between the flexibility of protein chains and alternative splicing.
This approach should help to clarify the determinants of the functional peculiarities of splice variants.
2. Materials and methods
We selected 10,525 reviewed entries from the UniProt Knowledgebase [15] containing at least two alternative sequences encoded by the same human gene. The presence of protein domains in the so-called canonical isoforms and in other splice variants was determined with the PfamScan tool [16,17]. Data on the expression of particular isoforms at the protein level and on their functional characteristics were retrieved from published articles describing isoform expression from genetically engineered constructs. Information about the regulatory pathways in which the studied proteins participate was retrieved from the Reactome Pathway Knowledgebase [18]. Data on 3D protein structures were retrieved from the Protein Data Bank (RCSB PDB) [19]. The studied proteins were assigned to enzyme classes according to the Enzyme Nomenclature (EC numbers; http://www.sbcs.qmul.ac.uk/iubmb/enzyme/index.html) given in the UniProt entries. 3D protein structures were examined and structurally aligned using the PyMOL system [20]. Data on the phosphorylation of both canonical and non-canonical isoforms were obtained by inspecting the UniProt entries, including links to published articles. The structural mapping of canonical isoforms was based on multiple alignments performed with PROMALS3D [21] for the protein regions corresponding to the Pfam domains Pkinase (PF00069) and Pkinase_Tyr (PF07714). This program uses both sequence and 3D structural information, allowing a more precise matching of sequence regions to conserved structural units. The structural mapping of the other isoforms was based on their pairwise alignment with the corresponding canonical variants.
2.1. Flexibility of splice regions
We compared the splice regions and the remaining parts of protein structures in terms of flexibility (including disorder) and secondary structure.
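Transferring residue positions through a pairwise alignment underlies both the isoform-to-canonical mapping described above and the mapping of UniProt splice regions onto PDB chains. The following is a minimal sketch of that step, assuming the alignment is already available as two gapped strings of equal length (for example, exported from PROMALS3D or any other aligner). The function names `map_positions`, `map_region`, and `unaligned_target_positions` are hypothetical illustrations, not code from the original study.

```python
def map_positions(aln_query: str, aln_target: str) -> dict:
    """Map 1-based residue numbers of the query onto the target via a gapped alignment.

    Both strings come from the same pairwise alignment ('-' marks a gap).
    Query residues aligned to a gap in the target stay unmapped.
    """
    if len(aln_query) != len(aln_target):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    pos_q = pos_t = 0  # residue counters, advanced only on non-gap characters
    for cq, ct in zip(aln_query, aln_target):
        if cq != "-":
            pos_q += 1
        if ct != "-":
            pos_t += 1
        if cq != "-" and ct != "-":
            mapping[pos_q] = pos_t
    return mapping


def map_region(start: int, end: int, mapping: dict) -> list:
    """Target positions covered by the query region [start, end] (inclusive)."""
    return [mapping[p] for p in range(start, end + 1) if p in mapping]


def unaligned_target_positions(aln_query: str, aln_target: str) -> list:
    """Target residues facing a gap in the query, e.g. canonical residues deleted in an isoform."""
    mapped = set(map_positions(aln_query, aln_target).values())
    n_target = len(aln_target.replace("-", ""))
    return [p for p in range(1, n_target + 1) if p not in mapped]


# Toy example (invented sequences): an isoform lacking three residues of its canonical variant.
isoform = "MKT---AKQR"
canonical = "MKTAYIAKQR"
iso_to_can = map_positions(isoform, canonical)
print(iso_to_can)                                      # {1: 1, 2: 2, 3: 3, 4: 7, 5: 8, 6: 9, 7: 10}
print(map_region(3, 5, iso_to_can))                    # [3, 7, 8]
print(unaligned_target_positions(isoform, canonical))  # [4, 5, 6]
```

In the same way, a splice region given in UniProt coordinates can be carried onto the residue numbering of a PDB chain once the chain sequence has been aligned to the UniProt sequence.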
The sequences of the regions replaced or deleted in splice variants were retrieved from the UniProt entries and mapped onto 3D structures solved by X-ray crystallography. We selected 1894 UniProt sequences of human proteins, which were represented by 25,293 polypeptide chains in the PDB. The structural category of each residue was obtained from all PDB chains corresponding to the given UniProt sequence. Because a residue can adopt different states in different chains, each residue was assigned one to four of the following conformational states: missing (indicating disorder), alpha-helix, beta-sheet, and coil. The B-factor values (for the alpha carbon of each residue) retrieved from the PDB entries were used to estimate flexibility; data for missing residues were ignored. To allow comparison of B-factors from different structures, the values within each protein chain were normalized by Z-score, and these normalized scores were used in all further calculations. If a residue was represented in more than one chain, its scores were averaged. We then selected 1090 UniProt sequences in which the number of splice-region residues observed in PDB chains was at least ten (maximum 439) and the number of residues outside the splice regions (maximum 3629) was not less than the number of "spliced" ones.
3. Results
3.1. Statistics on alternative splicing events in regulatory pathways and protein classes
Considering the data on 10,525 human genes encoding splice variants, we noted that the products of genes encoding at least two isoforms participate in several key processes, such as transcription control, antigen processing, and mRNA splicing. The data for the most represented processes are shown in Fig. 1A. Thus, alternative splicing may be responsible for the branching of regulatory processes. The considered UniProt entries contained information on proteins belonging to several enzyme classes; the most represented classes are shown in Fig. 1B. For comparison, entries assigned to these classes and containing information on a single variant are also shown. In general, the number of genes encoding two or more splice variants is 1.5–2 times higher than that of genes encoding a single transcript. Overall, these data are consistent with the results of large-scale studies of protein characteristics associated with alternative splicing [22].
However, these studies did not compare alternative transcripts with known 3D protein structures. Such an investigation is necessary to characterize the structural and functional differences that are realized in protein products [3].
3.2. The domain composition and alternative splicing
The absence or presence of entire domains can evidently be compatible with the folding of multi-domain proteins. This was confirmed by scanning the UniProt sequences against the protein domain library using the PfamScan tool [Li et al., 2015]. We determined how often various protein domains are absent from or present in alternative splice variants relative to the canonical isoforms (Fig. 2). The zinc-finger double domain (PF13465) showed the highest number of cases in which domain units were missing from isoforms. Most variants lacking this domain were found at the transcript level; however, splice forms were also confirmed at the protein level for several genes. For example, four isoforms encoded by the ZNF509 gene contain different numbers of zinc fingers. The canonical form has been shown to activate the expression of cyclin-dependent protein kinase 1 (CDK1), whereas a shortened isoform activates the expression of the retinoblastoma protein, which inhibits the cell cycle [23]. Thus, the absence or presence of a protein domain may influence the functional diversity of proteins.
Fig. 1. Multiple splicing and regulatory processes. (A) Representation of UniProt entries with multiple splice variants in regulatory pathways. (B) Distribution of UniProt entries presenting several (dashed) or single (blank) splice isoforms across enzyme classes.
Fig. 2. The number of cases in which splice variants acquire or lose the entire sequences of certain domain types.
3.3. Functional differences between splice variants confirmed at the protein level (products of a single chromosome)
How many of the splice variants reported in UniProt entries can actually be functioning proteins? To address this question, we analyzed the data on the proteins encoded by genes of the 18th human chromosome. According to UniProtKB, the 276 genes located on this chromosome encode 636 transcripts. Among the published articles, we found only 19 genes whose splice variants were experimentally confirmed at the protein level [24–44]. As can be seen from Fig. 3 and Supplementary Table 1, only a minority of splice variants have been described in published papers as proteins with specific functional features that distinguish them from the canonical forms. For these variants, specific functional differences were reported: known phosphorylation sites were lost, and new experimentally confirmed phosphosites were identified in the non-canonical variants. Phosphorylation is well known to define the functionality of proteins to a large extent. Despite their very limited volume, these data reveal a fairly wide range of functional consequences of alternative splicing:
• The presence or absence of specific signal sequences defines the cellular localization of splice variants. For example, the phosphatase TCPTP (UniProt AC P17706) exists in two isoforms. One of them, P48TC (P17706-1), includes the long non-catalytic region 316–415 ending with the hydrophobic stretch 396–415, which is responsible for association with the endoplasmic reticulum (ER). Replacement of the C-terminal part of this region (382–415) by a 6-residue segment results in the loss of the ER signal in the P45TC variant (P17706-2). The latter isoform retains the nuclear localization signal at positions 371–381 close to its C-terminus [24]. Ser304, present in both protein products, was shown to be phosphorylated by cyclin-dependent kinases in P45TC but not in P48TC [25], which may be due to their different subcellular locations.
• The absence of a domain can change the ability of a protein to interact with its partners. The β isoform of the PHLPP1 phosphatase (O60346-1) contains the N-terminal RAS-association domain responsible for binding the RAS proteins involved in cell signaling. The first 512 residues, including this functional region, are deleted in the α isoform (O60346-2) [26,27].
• Higher expression in tumor tissues has been described for several splice isoforms. For example, the canonical form of the CABYR protein (O75952-1) is expressed in spermatids and spermatozoa, whereas its isoforms CBP86-II (O75952-3) and CBP86-III (O75952-5) are not testis-specific. They are found in tumor cells and interact with the protein kinase GSK3B, which does not bind the canonical CABYR [28]. The splice variants of rTS (Q7L5Y1) differ drastically in their expression in tumor cells [29].
• A splice variant can differ in the number of repeated domains, which affects its ability to interact with ligands or protein partners. The MBD1 isoform MBD1v3 (Q9UIS9-4) contains two CXXC repeats and inhibits transcription when the promoter is methylated, whereas the isoforms MBD1v1 (Q9UIS9-1) and MBD1v2 (Q9UIS9-2), which contain three repeats, repress transcription from non-methylated promoters [30].
• The presence of different domains performing a similar function can influence substrate binding. The isoforms of the transcription factor TCF4 (P15884) contain the transcription activation domain AD2, and some variants, such as P15884-1, also include a second activation domain, AD1, at the N-terminus. Isoforms with either domain bind to E-box DNA sequences and regulate reporter gene transcription, but AD1 and AD2 act synergistically when combined [31].
• Differences in enzyme substrate specificity can be illustrated by the SUMO-protein ligase PIAS2 (O75928). The splice form PIAS2-β (O75928-1) controls the SUMOylation of MDM2 and NCOA2, while the isoform PIAS2-α (O75928-2), which lacks the Ser-rich region, participates in the SUMOylation of PARK7 and PML [32].
• Binding of different effectors has been shown for isoforms of TNFRSF11A (Q9Y6Q6). The canonical form interacts only with the cRANK protein, but the variant in which 13 residues are excised from the extracellular part binds another tumor necrosis factor [33].
Fig. 3. Transcripts of 19 genes located on the 18th human chromosome and encoding at least two splice isoforms. Dashed areas correspond to the variants confirmed at the protein level. Asterisks mark the items with lost or newly acquired (experimentally detected) phosphorylation sites. The accession numbers of the corresponding UniProt entries are given as subscripts.
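The domain tallies summarized in Fig. 2 and the lost or acquired phosphosites flagged in Fig. 3 both reduce to comparing the feature content of a splice variant with that of its canonical isoform. The sketch below shows one way to do this for domains, assuming each isoform's PfamScan hits have already been reduced to a list of Pfam accessions (one item per domain instance, so repeated units such as zinc fingers are counted individually). Counting one "case" per variant that loses or gains a domain type is our assumption and not necessarily the exact counting rule behind Fig. 2; the toy domain lists are invented.

```python
from collections import Counter


def domain_gain_loss(canonical, isoform):
    """Per-domain-type losses and gains of a splice variant relative to its canonical isoform.

    Arguments are lists of Pfam accessions, one item per domain instance.
    Counter subtraction keeps only positive differences, so each result
    holds the number of missing (or extra) copies of each domain type.
    """
    can, iso = Counter(canonical), Counter(isoform)
    lost = can - iso
    gained = iso - can
    return lost, gained


def tally_cases(pairs):
    """Count, for each domain type, how many splice variants lose or gain it.

    `pairs` is an iterable of (canonical_domains, isoform_domains) lists.
    Each variant contributes at most one case per domain type.
    """
    lost_cases, gained_cases = Counter(), Counter()
    for canonical, isoform in pairs:
        lost, gained = domain_gain_loss(canonical, isoform)
        lost_cases.update(lost.keys())
        gained_cases.update(gained.keys())
    return lost_cases, gained_cases


# Toy input (hypothetical domain lists, not data from this study).
pairs = [
    (["PF13465"] * 3, ["PF13465"]),         # variant with fewer zinc-finger units
    (["PF00069", "PF00433"], ["PF00433"]),  # variant lacking the kinase domain
    (["PF07714"], ["PF07714", "PF00017"]),  # variant with an extra domain
]
lost_cases, gained_cases = tally_cases(pairs)
print(lost_cases)    # Counter({'PF13465': 1, 'PF00069': 1})
print(gained_cases)  # Counter({'PF00017': 1})
```

The same multiset difference applies to phosphosites, provided the sites of both isoforms are first expressed in a common (for example, canonical) residue numbering.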
Thus, the data on the proteins encoded by a single chromosome show that alternative products can differ in several functional features, such as phosphorylation, subcellular location, ligand specificity, and protein-protein interactions.
3.4. Alternative splicing and the domain fold
In many cases, transcripts of the same gene encode putative proteins with alterations in the sequence of a particular domain. Do these proteins adopt a 3D structure that allows them to function? To find out how splice alterations are consistent with the domain fold, we considered the large superfamily of protein kinases, which is well represented by 3D structures in the Protein Data Bank. Although these enzymes diverge considerably in sequence, they maintain a common fold, which is subdivided into the N- and C-lobes and several conserved secondary structure elements (Fig. 4). The UniProt Knowledgebase provides 476 reviewed entries for human proteins that together contain 490 domains of the protein kinase families PF00069 or PF07714. These entries describe 1319 splice isoforms, including those chosen as the "canonical" sequences. Alterations in the kinase domain were found in 357 isoforms, including 136 splice variants in which PfamScan did not recognize the kinase pattern. The remaining 221 isoforms contained deletions, insertions, or amino acid exchanges (mutations). We selected proteins in which at least one isoform (generally the canonical one) had a resolved 3D structure. Splice variants with notes indicating uncertain detection, such as "no experimental confirmation", were excluded. Thus, we analyzed 13 groups of splice variants presented in the corresponding UniProt entries (Table 1). The collected data were compared with the arrangement of conserved elements derived for the protein kinase fold [45]. This analysis was intended to clarify which splice variants can be realized as functioning proteins.
As can be seen from Table 1, the isoforms of seven entries (O43353, P04626, P11362, P16234, Q16620, Q6VAB6, and Q59H18) lack the structural regions needed to maintain the protein kinase fold that provides enzymatic activity. In these cases, the shortened kinase domain may form an inactive catalytic unit. Comparing the splice variant P36888-2 with the 3D structure of the canonical product of the FLT3 gene (PDB ID 1RJB), we noted the absence of the activation loop (Fig. 5A). Mutations in the activation loop are known to switch the kinase into a constitutively active form, which promotes the regulatory disorders observed in myeloid leukemia and other diseases [46,47]. In the closed conformation, the activation loop blocks activity, so its absence could lead to permanent kinase activity of this protein. The isoform Q08345-5 encoded by the DDR1 gene contains a short insertion (five residues), which should slightly elongate the aC helix without altering the fold (Fig. 5B). It is less clear how the insertion of a 41-residue stretch in the aC helix of Q05397-2 and -3 affects the structure of this helix. A 13-residue insertion distinguishes the P49841-2 isoform from the canonical glycogen synthase kinase-3 beta (gene GSK3B). The inserted stretch is located between two structural elements, aG and aH. Visual inspection of the 3D structure (PDB ID 1J1B) showed that the insertion is compatible with the fold (Fig. 5C). P49841-2 is known to be a neuron-specific splice variant that plays a specific role in axon growth [48]. The canonical variant of MNK1 kinase (Q9BUB5) contains a long insertion (42 residues) compared with Q9BUB5-2. The 3D structure (PDB ID 2HW6) corresponds to the sequence variant MNK1a (Q9BUB5-2), which lacks this insertion. The insertion is located between the fold elements aE and b6.
Examination of the PDB entry confirms that the long insertion is permissible, since it does not touch the conserved structural units.
Fig. 4. Secondary structure elements of the protein kinase fold: location in the consensus amino acid sequence. The fold elements are named according to the nomenclature suggested by Hanks & Hunter [40]. The beta-strands (b1 to b11) and alpha-helices (aC to aI) are shown by lower and higher bars, respectively.
Table 1. Alterations in splice isoforms relative to the conserved elements of the kinase fold.
| UniProt AC | Start^a | Stop^a | Alterations | Isoform identifiers | PDB ID | PDB chain |
|---|---|---|---|---|---|---|
| O43353 | 18 | 294 | Deleted b1-aE | O43353-2 | 5NG0 | A |
| P04626 | 720 | 987 | Deleted b1-aC | P04626-6 | 3PP0 | A |
| P06241 | 271 | 524 | Conservative exchanges in b1 and b2 | P06241-2 | 2DQ7 | X |
| P11362 | 478 | 767 | Deleted b1-aE | P11362-8 to P11362-13 | 5EW8 | A |
| P16234 | 593 | 954 | Deleted b1-aD | P16234-3 | 5GRN | A |
| P36888 | 610 | 943 | Deleted b6-b10 (activation loop) | P36888-2 | 1RJB | A |
| Q05397 | 422 | 680 | 41-residue insertion in aC | Q05397-2, -3 | 4I4E | A |
| Q08345 | 610 | 905 | 5-residue insertion in aC | Q08345-5 | 4BKJ | A |
| Q16620 | 538 | 807 | Deleted aEF-aI | Q16620-6 | 4ASZ | A |
| Q6VAB6 | 666 | 931 | Deleted b1-b5 | Q6VAB6-2 | 2Y4I | C |
| Q59H18 | 463 | 723 | Deleted aI | Q59H18-4 | 4YFI | A |
| P49841 | 56 | 340 | 13-residue insertion between aG and aH | P49841-2 | 1J1B | A |
| Q9BUB5 | 50 | 374 | 42-residue deletion in the region between aE and b6 | Q9BUB5-2 | 2HW6 | A |
^a Terminal positions of the protein kinase domain.
In some Ser/Thr kinases, the kinase domain of the canonical variant is significantly longer than the typical kinase domain (about 250 residues). In these cases, PfamScan recognizes the N- and C-terminal parts of the domain, and the intermediate region is predicted to be disordered. For example, the kinase region of the canonical variant Q9UPE1 spans about 480 residues and contains more than 120 residues between b7 and b8. A 34-residue deletion within this region is observed in the other splice variant (Q9UPE1-2). 3D structures are available for several such "long" kinase domains, suggesting that very long insertions between the conserved fold elements are possible. For example, the kinase domain of MASTL
(Q96GX5) comprises about 800 residues and contains a 500-residue insertion. To obtain a crystal structure, researchers had to design a shortened variant in which a 4-residue stretch substituted the unusually long region [49]. Interestingly, the N-terminal fragment taken from the original sequence covered the N-lobe, ending after the DFG motif, and part of the C-lobe. The C-terminal fragment corresponded to the rest of the C-lobe, including the activation segment joined to the aF helix. It completed the structure (Fig. 5D), allowing the construct to adopt the typical kinase fold (PDB ID 5LOH) and to bind an inhibitor molecule. Our observations show that the kinase fold can be resistant to significant alterations related to alternative splicing.
Fig. 5. The 3D structures of protein kinase domains, indicating the regions in which deletions or insertions were found in splice variants. (A) Deletion of the activation segment (shown in red) in isoform P36888-2 of FLT3 kinase can switch the enzyme into a constitutively active form (PDB ID 1RJB). (B) 5-residue insertion in the aC helix (isoform Q08345-5) between residues Ala665 and Arg666 of the DDR1 kinase (PDB ID 4BKJ). (C) The insertion of a 13-residue stretch (isoform P49841-2) between Lys303 and Val304 of kinase GSK-3β is located in the long disordered region joining the helices aG and aH (PDB ID). (D) The shortened construct derived from MASTL kinase (PDB ID 5LOH). The N-terminal and C-terminal fragments are shown in red and cyan, respectively. The 4-residue stretch substituting the 500-residue region is shown in blue.
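The reasoning applied to Table 1, that an insertion or deletion is more likely to be tolerated when it falls between the conserved secondary structure elements rather than inside them, can be written as a simple first-pass check. The sketch below assumes the element boundaries are already known in canonical residue numbering; the `ELEMENTS` values are invented for illustration and do not correspond to any particular kinase. As the DDR1 example (a 5-residue insertion inside the aC helix) shows, an alteration inside an element is not necessarily incompatible with the fold, so this check only flags cases for closer structural inspection.

```python
# Hypothetical element boundaries (inclusive, canonical numbering), for illustration only.
ELEMENTS = {
    "b1": (10, 17),
    "b3": (34, 41),
    "aC": (50, 64),
    "b4": (70, 76),
    "aG": (200, 212),
    "aH": (225, 235),
}


def deleted_elements(start, end, elements=ELEMENTS):
    """Elements removed fully or partly by a deletion of residues start..end."""
    return [name for name, (s, e) in elements.items() if s <= end and start <= e]


def insertion_element(after_residue, elements=ELEMENTS):
    """Element split by an insertion placed after `after_residue`, or None.

    The insertion splits an element only if both flanking residues
    (after_residue and after_residue + 1) belong to that element.
    """
    for name, (s, e) in elements.items():
        if s <= after_residue < e:
            return name
    return None


def classify(kind, start, end=None):
    """Return 'between elements' or a note naming the affected elements."""
    if kind == "deletion":
        hit = deleted_elements(start, end)
    elif kind == "insertion":
        element = insertion_element(start)
        hit = [element] if element else []
    else:
        raise ValueError("kind must be 'deletion' or 'insertion'")
    return "between elements" if not hit else "affects " + ", ".join(hit)


print(classify("deletion", 66, 69))   # between elements
print(classify("deletion", 60, 72))   # affects aC, b4
print(classify("insertion", 57))      # affects aC
print(classify("insertion", 215))     # between elements
```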
3.5. Alternative splicing and conformational flexibility
The average frequencies of missing residues inside and outside the alternatively spliced regions did not differ significantly. In addition, the splice-affected regions contained a notable number of residues in secondary structures: in 1383 sequences, such regions contained residues adopting a stable alpha-helical conformation (conserved in all corresponding chains). Therefore, we compared the splice regions and the remaining parts of protein structures in terms of conformational flexibility using the temperature factor (B-factor). The average normalized B-factors and SD values were calculated for two subsets representing the "spliced" and "non-spliced" residues of each sequence (Fig. 6). Despite considerable variation, the results show a clear trend. As can be seen from Fig. 6, higher "splice" B-factor scores are concentrated in the left part of the plot (lower "non-splice" scores). The 1SD and 2SD intervals of the "splice" and "non-splice" subsets within a given sequence do not overlap at 22 and 2 points in the left part, respectively. The lowest "splice" B-factor scores are found in the right part of the plot, with two cases of non-overlapping 1SD intervals. Considering the above, we suggest that the regions of the protein molecule affected by alternative splicing tend to display greater conformational flexibility than the more stable non-exchanged parts. On the other hand, the more conformationally stable splice regions seem to be close to relatively flexible areas. This may be explained by the stable substructure eliminated in the splice variant being located within a sufficiently mobile environment.
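The per-sequence comparison described above, together with the normalization and selection rules of Section 2.1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: per-chain Z-scores use the population standard deviation, subset spreads use the sample standard deviation, and a sequence is kept only if it has at least ten observed splice residues and no fewer non-splice residues. The input format (one dictionary per PDB chain mapping UniProt residue number to the Cα B-factor, with missing residues simply absent) is hypothetical.

```python
import statistics


def zscore_chain(bfactors):
    """Z-score normalize the C-alpha B-factors of one PDB chain.

    `bfactors` maps UniProt residue number -> B-factor; missing residues are absent.
    """
    values = list(bfactors.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:  # uniform B-factors carry no relative information
        return {res: 0.0 for res in bfactors}
    return {res: (b - mean) / sd for res, b in bfactors.items()}


def residue_scores(chains):
    """Average each residue's normalized score over all chains in which it is observed."""
    per_residue = {}
    for chain in chains:
        for res, z in zscore_chain(chain).items():
            per_residue.setdefault(res, []).append(z)
    return {res: statistics.fmean(zs) for res, zs in per_residue.items()}


def compare_subsets(scores, splice_residues, k=1, min_splice=10):
    """Compare 'splice' and 'non-splice' residues of one sequence.

    Returns (mean_splice, mean_non_splice, disjoint), where `disjoint` tells whether
    the mean +/- k*SD intervals of the two subsets do not overlap, or None if the
    sequence fails the selection criteria.
    """
    splice = [z for res, z in scores.items() if res in splice_residues]
    rest = [z for res, z in scores.items() if res not in splice_residues]
    if len(splice) < min_splice or len(rest) < len(splice):
        return None
    m_s, sd_s = statistics.fmean(splice), statistics.stdev(splice)
    m_r, sd_r = statistics.fmean(rest), statistics.stdev(rest)
    disjoint = m_s - k * sd_s > m_r + k * sd_r or m_r - k * sd_r > m_s + k * sd_s
    return m_s, m_r, disjoint


# Toy chain (invented values): residues 21-30, the "splice" region, are more mobile.
chain = {res: (30.0 if res > 20 else 10.0) + (res % 3) for res in range(1, 31)}
scores = residue_scores([chain])
print(compare_subsets(scores, set(range(21, 31))))       # higher splice mean, disjoint is True
print(compare_subsets(scores, set(range(21, 31)), k=2))  # still disjoint at 2SD
print(compare_subsets(scores, set(range(21, 25))))       # None: fewer than 10 splice residues
```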
4. Discussion
The products of alternative splicing are known to be involved in key regulatory processes. This was confirmed by the information retrieved from the Reactome Pathway Knowledgebase. Despite the incompleteness of this manually curated resource, Fig. 1 provides a picture of the pathways in which genes encoding alternative forms participate. Notably, proteins represented by several isoforms are involved in transcription regulation, mRNA processing, and immune response reactions. Evidently, alternative splicing contributes to the branching of pathways, including processes that directly involve transcription and mRNA themselves.
As can be seen from Fig. 2, alternative variants can contain different numbers of repeat units of relatively small domains. This was observed for the zinc-finger, immunoglobulin, fibronectin, and other domains. Multi-domain proteins can also lose entire larger domains, such as protein kinase domains, which changes their functional characteristics. Possibly, such events block or switch regulatory pathways.
The majority of splice isoforms are not confirmed at the protein level. Literature data on the expression of alternative protein products of genes located on a single (18th human) chromosome show that this information, although limited, supports the general findings on the functional role of alternative products. Various functional changes due to alternative splicing were noted. Differences in phosphorylation profiles indicate alterations in regulatory pathways. The presence or absence of regions responsible for different subcellular locations deserves particular attention. The detection of specific isoforms in tumor tissues is also related to disorders of cellular regulation [50].
The protein 3D structures were inspected to analyze the locations of insertions and deletions.
The location of alterations between the conserved secondary structures agrees with the results of earlier studies. Nevertheless, the investigation of structures related to the protein kinase domain revealed significant alterations due to alternative splicing. Sufficiently long truncated isoforms of these proteins may maintain a stable 3D structure according to the rules suggested by Hegyi et al. [3], preserving the overall structure of a multi-domain protein while losing enzymatic activity. For example, removal of the C-lobe in receptor tyrosine kinases would yield a receptor composed of an extracellular part, a transmembrane helix, and the kinase N-lobe. Such a protein could bind its partners but could not transduce the signal mediated by the kinase domain. The absence of the region corresponding to the activation loop in isoform P36888-2 could switch the enzyme to a constitutively active form; however, this suggestion needs experimental confirmation. Comparing the sequences of splice variants with the 3D structures of "canonical" isoforms, we suggested that the fold architecture is compatible with fairly large insertions between the conserved structural elements. Preservation of the kinase fold despite large insertions is supported by the existence of several large kinase domains (about 800 residues) expressed at the protein level. In these cases, PfamScan recognizes the N- and C-terminal parts of the domain at both ends of the sequence, and the intermediate regions are predicted to be disordered. Substitution of such a long "insertion" by a short stretch made it possible to obtain a 3D structure (PDB ID 5LOH) adopting the classical kinase fold. Considering various members of the protein kinase superfamily at the sequence and 3D levels, we observe a large variability of these proteins. At the same time, the stable architecture of the kinase fold seems to permit fairly large alterations due to alternative splicing.
Conformationally mobile segments (detected as protein disorder) are frequently located in the regions affected by alternative splicing. These findings are often obtained with prediction tools, which provide reliable results [51]. In this study, we considered the residues eliminated or exchanged in splice variants by analyzing solved 3D structures. Comparing the "splice" residues with the unchanged regions of the same structure showed that B-factor scores revealed the structural differences more clearly than the content of missing residues or secondary structures. The most notable results showed higher flexibility in the splice regions. Conversely, when the splice residues had lower scores, the unchanged parts of the molecule demonstrated higher flexibility. The latter may indicate that a mobile environment allows the fold to accommodate the loss or alteration of a stable ordered substructure in splice variants.
Fig. 6. The average normalized B-factor scores obtained for residues inside and outside splice regions. The values for the "splice" and "non-splice" subsets are shown in light gray and dark, respectively. The data for 1090 UniProt sequences are ordered by ascending "non-splice" scores. White-filled diamonds mark the points at which the 1SD intervals of the subsets do not overlap; gray-filled diamonds mark non-overlapping 2SD intervals.
5. Conclusion
A key aspect of the study of alternative splicing is the question of isoform expression at the protein level. It is useful to develop general principles that help distinguish “erroneous” transcripts or inactive protein products from functioning proteins [3]. Detailed knowledge of the peculiarities of particular domain folds, including data on their sequence, 3D structure, and functional characteristics, is also required. Our findings do not contradict the previously proposed rules but show that significant alterations are possible if the fold is present in a diverged protein family. Using 3D data on various human proteins, we showed the possible role of conformational flexibility in accommodating fold alterations in splice variants. There is uncertainty about whether most splice isoforms are expressed at the protein level [5]. However, statistics derived from the “raw” data retrieved directly from UniProt are in good agreement with the results obtained by analyzing scientific articles on expressed protein isoforms. Further research is needed to explore alternative splicing in particular protein families, incorporating data on structural and functional features.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijbiomac.2019.09.241.
Acknowledgments
The work was performed in the framework of the Program for Basic Research of State Academies of Sciences for 2013–2020.
References
[1] A. Dobrin, P. Saxena, M. Fussenegger, Synthetic biology: applying biological circuits beyond novel therapies, Integr. Biol. (Camb.) 8 (2016) 409–430, https://doi.org/10.1039/c5ib00263j.
[2] E.R. Gamazon, B.E. Stranger, Genomics of alternative splicing: evolution, development and pathophysiology, Hum. Genet. 133 (2014) 679–687, https://doi.org/10.1007/s00439-013-1411-3.
[3] H. Hegyi, L. Kalmar, T. Horvath, P. Tompa, Verification of alternative splicing variants based on domain integrity, truncation length and intrinsic protein disorder, Nucleic Acids Res. 39 (2011) 1208–1219, https://doi.org/10.1093/nar/gkq843.
[4] D. Aguiar, L.F. Cheng, B. Dumitrascu, F. Mordelet, A.A. Pai, B.E. Engelhardt, Bayesian nonparametric discovery of isoforms and individual specific quantification, Nat. Commun. 9 (2018) 1681, https://doi.org/10.1038/s41467-018-03402-w.
[5] M.L. Tress, F. Abascal, A. Valencia, Alternative splicing may not be the key to proteome complexity, Trends Biochem. Sci. 42 (2017) 98–110, https://doi.org/10.1016/j.tibs.2016.08.008.
[6] M. Kaneko, K. Imaizumi, A. Saito, S. Kanemoto, R. Asada, K. Matsuhisa, Y. Ohtake, ER stress and disease: toward prevention and treatment, Biol. Pharm. Bull. 40 (2017) 1337–1343, https://doi.org/10.1248/bpb.b17-00342.
[7] R. Tavares, G. Wajnberg, N.M. Scherer, B.A. Pauletti, J.S. Cassoli, C.G. Ferreira, A.F. Paes Leme, P.S. de Araujo-Souza, D. Martins-de-Souza, F. Passetti, Unveiling alterative splice diversity from human oligodendrocyte proteome data, J. Proteomics 151 (2017) 293–301, https://doi.org/10.1016/j.jprot.2016.05.023.
[8] S.A. Bhuiyan, S. Ly, M. Phan, B. Huntington, E. Hogan, C.C. Liu, J. Liu, P. Pavlidis, Systematic evaluation of isoform function in literature reports of alternative splicing, BMC Genomics 19 (2018) 637, https://doi.org/10.1186/s12864-018-5013-2.
[9] C.M. Overall, Can proteomics fill the gap between genomics and phenotypes?, J. Proteomics 100 (2014) 1–2, https://doi.org/10.1016/j.jprot.2014.02.025.
[10] F. Birzele, G. Csaba, R. Zimmer, Alternative splicing and protein structure evolution, Nucleic Acids Res. 36 (2008) 550–558, https://doi.org/10.1093/nar/gkm1054.
[11] P.R. Romero, P.R. Zaidi, Y.Y. Fang, V.N. Uversky, P. Radivojac, C.J. Oldfield, M.S. Cortese, M. Sickmeier, T. LeGall, Z. Obradovic, A.K. Dunker, Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms, Proc. Natl. Acad. Sci. U. S. A. 103 (2006) 8390–8395, https://doi.org/10.1073/pnas.0507916103.
[12] K.J. Niklas, S.E. Bondos, A.K. Dunker, S.A. Newman, Rethinking gene regulatory networks in light of alternative splicing, intrinsically disordered protein domains, and post-translational modifications, Front. Cell Dev. Biol. 3 (2015) 8, https://doi.org/10.3389/fcell.2015.00008.
[13] P.E. Wright, H.J. Dyson, Intrinsically disordered proteins in cellular signalling and regulation, Nat. Rev. Mol. Cell Biol. 16 (2015) 18–29, https://doi.org/10.1038/nrm3920.
[14] J. Zhou, S. Zhao, A.K. Dunker, Intrinsically disordered proteins link alternative splicing and post-translational modifications to complex cell signaling and regulation, J. Mol. Biol. 430 (2018) 2342–2359, https://doi.org/10.1016/j.jmb.2018.03.028.
[15] The UniProt Consortium, UniProt: the universal protein knowledgebase, Nucleic Acids Res. 46 (2018) 2699, https://doi.org/10.1093/nar/gky092.
[16] W. Li, A. Cowley, M. Uludag, T. Gur, H. McWilliam, S. Squizzato, Y.M. Park, N. Buso, R. Lopez, The EMBL-EBI bioinformatics web and programmatic tools framework, Nucleic Acids Res. 43 (2015) W580–W584, https://doi.org/10.1093/nar/gkv279.
[17] R.D. Finn, A. Bateman, J. Clements, P. Coggill, R.Y. Eberhardt, S.R. Eddy, A. Heger, K. Hetherington, L. Holm, J. Mistry, E.L. Sonnhammer, J. Tate, M. Punta, Pfam: the protein families database, Nucleic Acids Res. 42 (Database issue) (2014) D222–D230, https://doi.org/10.1093/nar/gkt1223.
[18] A. Fabregat, S. Jupe, L. Matthews, K. Sidiropoulos, M. Gillespie, P. Garapati, R. Haw, B. Jassal, F. Korninger, B. May, M. Milacic, C.D. Roca, K. Rothfels, C. Sevilla, V. Shamovsky, S. Shorser, T. Varusai, G. Viteri, J. Weiser, G. Wu, L. Stein, H. Hermjakob, P. D’Eustachio, The Reactome pathway knowledgebase, Nucleic Acids Res. 46 (D1) (2018) D649–D655, https://doi.org/10.1093/nar/gkx1132.
[19] P.W. Rose, A. Prlić, A. Altunkaya, C. Bi, A.R. Bradley, C.H. Christie, L.D. Costanzo, J.M. Duarte, S. Dutta, Z. Feng, R.K. Green, D.S. Goodsell, B. Hudson, T. Kalro, R. Lowe, E. Peisach, C. Randle, A.S. Rose, C. Shao, Y.P. Tao, Y. Valasatava, M. Voigt, J.D. Westbrook, J. Woo, H. Yang, J.Y. Young, C. Zardecki, H.M. Berman, S.K. Burley, The RCSB protein data bank: integrative view of protein, gene and 3D structural information, Nucleic Acids Res. 45 (D1) (2017) D271–D281, https://doi.org/10.1093/nar/gkw1000.
[20] G. Janson, C. Zhang, M.G. Prado, A. Paiardini, PyMod 2.0: improvements in protein sequence-structure analysis and homology modeling within PyMOL, Bioinformatics 33 (2017) 444–446, https://doi.org/10.1093/bioinformatics/btw638.
[21] J. Pei, N.V. Grishin, PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information, Methods Mol. Biol. 1079 (2014) 263–271, https://doi.org/10.1007/978-1-62703-646-7_17.
[22] O. Kelemen, P. Convertini, Z. Zhang, Y. Wen, M. Shen, M. Falaleeva, S. Stamm, Function of alternative splicing, Gene 514 (2013) 1–30, https://doi.org/10.1016/j.gene.2012.07.083.
[23] B.N. Jeon, M.K. Kim, J.H. Yoon, M.Y. Kim, H. An, H.J. Noh, W.I. Choi, D.I. Koh, M.W. Hur, Two ZNF509 (ZBTB49) isoforms induce cell-cycle arrest by activating transcription of p21/CDKN1A and RB upon exposure to genotoxic stress, Nucleic Acids Res. 42 (2014) 11447–11461, https://doi.org/10.1093/nar/gku857.
[24] J.A. Lorenzen, C.Y. Dadabay, E.H. Fischer, COOH-terminal sequence motifs target the T cell protein tyrosine phosphatase to the ER and nucleus, J. Cell Biol. 131 (1995) 631–643, https://doi.org/10.1083/jcb.131.3.631.
[25] P. Bukczynska, M. Klingler-Hoffmann, K.I. Mitchelhill, M.H. Lam, M. Ciccomancini, N.K. Tonks, B. Sarcevic, B.E. Kemp, T. Tiganis, The T-cell protein tyrosine phosphatase is phosphorylated on Ser-304 by cyclin-dependent protein kinases in mitosis, Biochem. J. 380 (2004) 939–949, https://doi.org/10.1042/bj20031780.
[26] J. Brognard, A.C. Newton, PHLiPPing the switch on Akt and protein kinase C signaling, Trends Endocrinol. Metab. 19 (2008) 223–230, https://doi.org/10.1016/j.tem.2008.04.001.
[27] J.R. Molina, N.K. Agarwal, F.C. Morales, Y. Hayashi, K.D. Aldape, G. Cote, M.M. Georgescu, PTEN, NHERF1 and PHLPP form a tumor suppressor network that is disabled in glioblastoma, Oncogene 31 (2012) 1264–1274, https://doi.org/10.1038/onc.2011.324.
[28] H.C. Hsu, Y.L. Lee, T.S. Cheng, S.L. Howng, L.K. Chang, P.J. Lu, Y.R. Hong, Characterization of two non-testis-specific CABYR variants that bind to GSK3beta with a proline-rich extensin-like domain, Biochem. Biophys. Res. Commun. 329 (2005) 1108–1117, https://doi.org/10.1016/j.bbrc.2005.02.089.
[29] B.J. Dolnick, A.R. Black, Alternate splicing of the rTS gene product and its overexpression in a 5-fluorouracil-resistant cell line, Cancer Res. 56 (1996) 3207–3210.
[30] N. Fujita, S. Takebayashi, K. Okumura, S. Kudo, T. Chiba, H. Saya, M. Nakao, Methylation-mediated transcriptional silencing in euchromatin by methyl-CpG binding protein MBD1 isoforms, Mol. Cell. Biol. 19 (1999) 6415–6426, https://doi.org/10.1128/MCB.19.9.6415.
[31] M. Sepp, K. Kannike, A. Eesmaa, M. Urb, T. Timmusk, Functional diversity of human basic helix-loop-helix transcription factor TCF4 isoforms generated by alternative 5′ exon usage and splicing, PLoS One 6 (2011) e22138, https://doi.org/10.1371/journal.pone.0022138.
[32] S.H. Yang, A.D. Sharrocks, PIASx acts as an Elk-1 coactivator by facilitating derepression, EMBO J. 24 (2005) 2161–2171.
[33] C. Sirinian, A.D. Papanastasiou, I.K. Zarkadis, H.P. Kalofonos, Alternative splicing generates a truncated isoform of human TNFRSF11A (RANK) with an altered capacity to activate NF-κB, Gene 525 (2013) 124–129, https://doi.org/10.1016/j.gene.2013.04.075.
[34] S. Eichmuller, D. Usener, R. Dummer, A. Stein, D. Thiel, D. Schadendorf, Serological detection of cutaneous T-cell lymphoma-associated antigens, Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 629–634, https://doi.org/10.1073/pnas.98.2.629.
[35] F. Gasparri, F. Sola, G. Locatelli, M. Muzio, The death domain protein p84N5, but not the short isoform p84N5s, is cell cycle-regulated and shuttles between the nucleus and the cytoplasm, FEBS Lett. 574 (2004) 13–19, https://doi.org/10.1016/j.febslet.2004.07.074.
[36] T. Iskratsch, S. Lange, J. Dwyer, A.L. Kho, C. dos Remedios, E. Ehler, Formin follows function: a muscle-specific isoform of FHOD3 is regulated by CK2 phosphorylation and promotes myofibril maintenance, J. Cell Biol. 191 (2010) 1159–1172, https://doi.org/10.1083/jcb.201005060.
[37] A. Miranda-Vizuete, J. Ljung, A.E. Damdimopoulos, J.A. Gustafsson, R. Oko, M. Pelto-Huikko, G. Spyrou, Characterization of Sptrx, a novel member of the thioredoxin family specifically expressed in human spermatozoa, J. Biol. Chem. 276 (2001) 31567–31574, https://doi.org/10.1074/jbc.M101760200.
[38] R. Nawrotzki, N.Y. Loh, M.A. Ruegg, K.E. Davies, D.J. Blake, Characterisation of alpha-dystrobrevin in muscle, J. Cell Sci. 111 (1998) 2595–2605.
[39] H.M. Sadoulet-Puccio, L.M. Kunkel, Dystrophin and its isoforms, Brain Pathol. 6 (1996) 25–35, https://doi.org/10.1111/j.1750-3639.1996.tb00780.x.
[40] K. Satoh, H. Yanai, T. Senda, K. Kohu, T. Nakamura, N. Okumura, A. Matsumine, S. Kobayashi, S. Toyoshima, T. Akiyama, DAP-1, a novel protein that interacts with the guanylate kinase-like domains of hDLG and PSD-95, Genes Cells 2 (1997) 415–424, https://doi.org/10.1046/j.1365-2443.1997.1310329.x.
[41] F. Vidal, F. Baudoin, C. Miquel, M.F. Galliano, A.M. Christiano, J. Uitto, J.P. Ortonne, G. Meneguzzi, Cloning of the laminin alpha 3 chain gene (LAMA3) and identification of a homozygous deletion in a patient with Herlitz junctional epidermolysis bullosa, Genomics 30 (1995) 273–280, https://doi.org/10.1006/geno.1995.9877.
[42] Y. Wu, L. Yu, G. Bi, K. Luo, G. Zhou, S. Zhao, Identification and characterization of two novel human SCAN domain-containing zinc finger genes ZNF396 and ZNF397, Gene 310 (2003) 193–201, https://doi.org/10.1016/S0378-1119(03)00551-1.
[43] A. Zhang, P.L. Yeung, C.W. Li, S.C. Tsai, G.K. Dinh, X. Wu, H. Li, J.D. Chen, Identification of a novel family of ankyrin repeats containing cofactors for p160 nuclear receptor coactivators, J. Biol. Chem. 279 (2004) 33799–33805, https://doi.org/10.1074/jbc.M403997200.
[44] P. Zhang, D.W. Chan, Y. Zhu, J.J. Li, I.O. Ng, D. Wan, J. Gu, Identification of carboxypeptidase of glutamate like-B as a candidate suppressor in cell growth and metastasis in human hepatocellular carcinoma, Clin. Cancer Res. 12 (2006) 6617–6625, https://doi.org/10.1158/1078-0432.CCR-06-1307.
[45] S.K. Hanks, T. Hunter, Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification, FASEB J. 9 (1995) 576–596, https://doi.org/10.1096/fasebj.9.8.7768349.
[46] M. Mizuki, R. Fenski, H. Halfter, I. Matsumura, R. Schmidt, C. Müller, W. Grüning, K. Kratz-Albers, S. Serve, C. Steur, T. Büchner, J. Kienast, Y. Kanakura, W.E. Berdel, H. Serve, Flt3 mutations from patients with acute myeloid leukemia induce transformation of 32D cells mediated by the Ras and STAT5 pathways, Blood 96 (2000) 3907–3914.
[47] T. Taketani, T. Taki, K. Sugita, Y. Furuichi, E. Ishii, R. Hanada, M. Tsuchida, K. Sugita, K. Ida, Y. Hayashi, FLT3 mutations in the activation loop of tyrosine kinase domain are frequently found in infant ALL with MLL rearrangements and pediatric ALL with hyperdiploidy, Blood 103 (2004) 1085–1088, https://doi.org/10.1182/blood-2003-02-0418.
[48] Z. Castaño, P.R. Gordon-Weeks, R.M. Kypta, The neuron-specific isoform of glycogen synthase kinase-3beta is required for axon growth, J. Neurochem. 113 (2010) 117–130, https://doi.org/10.1111/j.1471-4159.2010.06581.x.
[49] C.A. Ocasio, M.B. Rajasekaran, S. Walker, D. Le Grand, J. Spencer, F.M. Pearl, S.E. Ward, V. Savic, L.H. Pearl, H. Hochegger, A.W. Oliver, A first generation inhibitor of human Greatwall kinase, enabled by structural and functional characterization of a minimal kinase domain construct, Oncotarget 7 (2016) 71182–71197, https://doi.org/10.18632/oncotarget.11511.
[50] V. Gonçalves, J.F.S. Pereira, P. Jordan, Signaling pathways driving aberrant splicing in cancer cells, Genes (Basel) 9 (2017) E9, https://doi.org/10.3390/genes9010009.
[51] V.N. Uversky, A.K. Dunker, Multiparametric analysis of intrinsically disordered proteins: looking at intrinsic disorder through compound eyes, Anal. Chem. 84 (2012) 2096–2104, https://doi.org/10.1021/ac203096k.