CHAPTER ONE
Protein intrinsic disorder and structure-function continuum
Vladimir N. Uversky*
Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, United States
Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Moscow, Russia
*Corresponding author: e-mail address:
[email protected]
Contents
1. Introduction
2. Locks, keys, and protein functionality
3. Intrinsic disorder, multifunctionality, and “moonlighting” vs. structure-function paradigm
4. Proteoforms against “one gene–one enzyme” hypothesis
5. Intrinsic disorder and proteoforms
6. Proteoforms and structural flexibility of ordered proteins
7. Proteoforms and protein-structure continuum
References
Abstract
The functional proteome of a given organism noticeably exceeds its corresponding genome due to various events at the DNA (genetic variations), mRNA (alternative splicing, alternative promoter usage, alternative initiation of translation, and mRNA editing), and protein levels (post-translational modifications) that result in the appearance of various proteoforms, i.e., different molecular forms in which the protein product of a single gene can be found. In addition to these induced proteoforms, basic (or intrinsic, or conformational) proteoforms are generated due to the presence of intrinsically disordered or structurally flexible regions in a protein. Furthermore, protein functionality can affect the structural ensemble of both conformational and induced proteoforms, and hence serves as a factor generating functioning proteoforms. Therefore, a single gene encodes a wide array of different proteoforms, which represents the foundation for protein multifunctionality. In other words, instead of the classical protein structure-function paradigm rooted in the “one-gene–one-protein–one-function” model, the correlation between protein structure and function is described by a more general “protein structure-function continuum” model, where a given protein exists as a dynamic conformational ensemble containing multiple proteoforms (conformational/basic, inducible/modified, and functioning) characterized by a broad spectrum of structural features and possessing various functional potentials.
Progress in Molecular Biology and Translational Science, Volume 166 ISSN 1877-1173 https://doi.org/10.1016/bs.pmbts.2019.05.003
© 2019 Elsevier Inc. All rights reserved.
1. Introduction
The overall importance of proteins cannot be overestimated, since they serve as the basis of structure and function of all living cells and play a number of crucial roles in the maintenance of life. At the molecular level, (almost) all biological processes in a living cell are associated with various activities of different proteins, thus “life is the mode of existence of proteinaceous bodies.”1 The diversity of protein functions is endless: some serve as simple structural blocks supporting cellular shape; others (enzymes) act as biological catalysts, which lower the activation energy of various biochemical reactions generating a whole host of biogenic compounds and thereby dramatically accelerate these reactions; still other proteins participate in the transduction of different signals, interconversion of various forms of energy, and transmission of information. Proteins may act alone or in complexes with other proteins, membranes, nucleic acids, polysaccharides, different small molecules, and ions. Many proteins undergo extensive post-translational modifications that increase the chemical diversity of a polypeptide chain and serve as important, often reversible, regulators of protein functionality. Therefore, it is not surprising that proteins are the most abundant biopolymers of a cell, accounting for 15–20% of cell wet weight (70% is water) and constituting 40–55% of cell dry mass.2–4 Estimation of the expected total number of proteins per unit of cell volume suggested that a cubic micron (i.e., 1 fL) of bacterial, yeast, and mammalian cells might contain 2–4 million proteins.5 Therefore, since the total protein content of a cell scales roughly linearly with cell mass and volume,5 there are 3–4 million proteins per E. coli cell of 1 μm³ volume, a haploid budding yeast cell of characteristic volume 40 μm³ contains 100–150 million proteins per cell, and mammalian cells with characteristic volumes of 2000–4000 μm³ might contain 10¹⁰ proteins per cell.5 Obviously, the amount of total protein is not the only thing that increases in the transition from bacteria to mammals; protein variety increases as well. In fact, there are 5000–10,000 different proteins in a bacterial cell, whereas this number increases to 10,000–20,000 in a typical mammalian cell.6 Of course, not all these different proteins are present in a cell in equal quantities, and the fraction of the proteome covered by the top 1000 proteins accounts for more than 90% of protein copies.7,8
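The scaling argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming only the density (2–4 million proteins per μm³) and the characteristic cell volumes quoted in the text; the function and variable names are illustrative and not taken from the cited work. Simple multiplication reproduces the orders of magnitude, whereas the narrower ranges quoted above (e.g., 3–4 million for E. coli) come from the cited estimates and are not derived here.

```python
# Density range quoted in the text: 2-4 million proteins per cubic micron (1 fL).
DENSITY_PER_UM3 = (2e6, 4e6)

# Characteristic cell volumes (cubic microns) quoted in the text, as (low, high).
CELL_VOLUMES_UM3 = {
    "E. coli": (1.0, 1.0),
    "haploid budding yeast": (40.0, 40.0),
    "mammalian cell": (2000.0, 4000.0),
}


def protein_count_range(volume_range_um3, density_range=DENSITY_PER_UM3):
    """Return (low, high) protein copies per cell.

    Assumes protein content scales linearly with cell volume, so the
    estimate is simply volume multiplied by density.
    """
    v_low, v_high = volume_range_um3
    d_low, d_high = density_range
    return v_low * d_low, v_high * d_high


if __name__ == "__main__":
    for name, volumes in CELL_VOLUMES_UM3.items():
        low, high = protein_count_range(volumes)
        print(f"{name}: {low:.1e} to {high:.1e} proteins per cell")
```

For the mammalian volume range, the upper bound of this estimate is 1.6 × 10¹⁰, consistent with the 10¹⁰ proteins per cell mentioned above.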
2. Locks, keys, and protein functionality
For a long time, the classic protein structure-function paradigm, which states that a unique 3D structure is a prerequisite to function, was unquestioned. This paradigm, rooted in the “lock and key” hypothesis formulated in 1894 by Fischer9,10 and subsequently supported by the crystal structures of proteins solved by X-ray diffraction, careful analysis of protein denaturation and unfolding, and many other observations, dominated scientific minds for more than a century. Subsequent studies revealed that such a rigid picture of protein-substrate interaction is an oversimplification, and that this interaction is better described by the induced fit model, where binding of an enzyme to the appropriate substrate is accompanied by subtle changes in the protein’s active site.11 It was believed that the specific functionality of a given protein is defined by a unique spatial positioning of its amino acid side chains and prosthetic groups. This idea was strongly supported by the first 3D crystal structure determined for an enzyme co-crystallized with its inhibitor, which showed that the active site of lysozyme was characterized by precise positioning of certain amino acid side chains, indicating that this positioning almost certainly facilitates catalysis.12 This specific spatial arrangement of functional groups in biologically active proteins was assumed to be defined by their unique 3D structures, predetermined by unique amino acid sequences encoded in unique genes.
Walking this sequence backward generates the famous “one gene–one enzyme” hypothesis, according to which each gene is responsible for producing a single enzyme that in turn affects a single step in a metabolic pathway.13 Understanding the relationships between protein structure and function remains a primary focus of structural biology and represents a key problem lying at the junction of modern biochemistry, biophysics, molecular biology, genetics, protein engineering, and bioinformatics. Since protein misfolding and dysfunction are associated with the pathogenesis of various diseases, understanding the protein structure-function relationship has important biomedical implications. It is recognized now that the aforementioned “one gene–one enzyme” hypothesis and the related protein structure-function paradigm, according to which the specific functionality of a given protein is determined by its unique 3D structure where a protein and substrate have to fit to each other like a lock and key in order to exert a chemical effect on each other,
are oversimplifications. In fact, there are numerous findings that cannot be fit into or explained by these widely accepted theoretical models.14 Although these paradigm-nonconcordant findings were originally treated as anomalies or rare exceptions, they continued to pile up and, over time, created enough grounds for considering the possibility that the paradigm should be changed.15–18
3. Intrinsic disorder, multifunctionality, and “moonlighting” vs. structure-function paradigm
Recent studies unequivocally revealed that not all functional proteins are structured throughout their entire lengths, and many proteins are highly flexible or structurally disordered as a whole or contain substantial disordered regions.15–26 Bioinformatics studies indicated that intrinsically disordered proteins (IDPs) and hybrid proteins with ordered domains and intrinsically disordered protein regions (IDPRs) are highly abundant in nature,27–31 with 25–30% of eukaryotic proteins being mostly disordered,28 more than half of eukaryotic proteins having long regions of disorder,27,28,30 and >70% of signaling proteins possessing long disordered regions.32 The functions of IDPs and IDPRs are typically complementary to the functions of ordered proteins and domains.15,16,18,26,33 Instead of possessing unique 3D structures, these IDPs and IDPRs exist as highly dynamic ensembles of rapidly interconverting conformations (or highly dynamic sets of short-lived structures) at either the secondary or tertiary structure level, and can be present in collapsed (molten globule-like) or extended (coil- or pre-molten globule-like) forms.17,19,23,24,34,35 Despite an obvious structure-function paradigm-based expectation that structural “floppiness” would be incompatible with protein functionality, many protein functions originate from the lack of ordered structure in a protein molecule.
In fact, many IDPs/IDPRs are known to be promiscuous binders involved in regulation and control of various cellular processes.15,17,23,24,34,36–40 Functionally, these IDPs/IDPRs can be grouped into several broad classes, such as molecular recognition, molecular assembly, protein modification, entropic chain activities, and RNA and protein chaperones.37,41 Furthermore, structural “floppiness” also defines the ability of IDPs/IDPRs to be controlled and regulated at multiple levels,35,38,40,42,43 with various post-translational modifications (PTMs) being one of the most important means of such disorder-centered regulation.44,45
Among several consequences of the presence of intrinsic disorder in proteins is their multifunctionality, which is mostly determined by the mosaic architecture of IDPs/IDPRs (see below), where multiple relatively short and differently folded functional elements are spread within the amino acid sequences.43 The aforementioned multifunctionality is a specific feature of “moonlighting” proteins, many of which were shown to be either completely disordered or possess long IDPRs.46 Often, IDPs and hybrid proteins with IDPRs are specifically compartmentalized within a cell, being found in different proteinaceous membrane-less organelles (PMLOs).47–51
4. Proteoforms against “one gene–one enzyme” hypothesis
Another important observation in strict contradiction to the influential “one gene–one enzyme” hypothesis is the fact that the number of functionally different proteins dramatically exceeds the number of protein-encoding genes. For example, although the number of protein-coding genes in the human genome approaches 20,700,52 the number of functionally different proteins is one to two orders of magnitude higher (in the range of several hundred thousand, if not a million). Humans are not an exception, and functional proteomes of many (if not all) organisms are noticeably larger than their corresponding genomes. Therefore, it seems that the major part of the complexity of the biological machinery is determined by protein variation rather than by a high number of distinct genes.53 The increased size of a functional proteome over the size of a corresponding genome is determined by multiple factors, such as allelic variations (i.e., single or multiple point mutations, indels, SNPs), different pre-translational mechanisms affecting genes (e.g., production of numerous mRNA variants by alternative splicing and mRNA editing), and changes induced in proteins by various post-translational modifications (PTMs).54–58 Therefore, allelic variations, pre-translational mechanisms, and post-translational modifications define the biological complexity at the DNA, mRNA, and protein levels, respectively.59 These events also ensure that multiple proteoforms, which are distinct protein molecules with different functions, are created from a single gene.59 Furthermore, in addition to the aforementioned mechanisms increasing the chemical variability of a polypeptide chain, protein structural diversity can be further increased by intrinsic disorder, structural flexibility, and functioning (see below).60 As a result, instead of the oversimplified “one-gene–one-protein” view, the actual gene-protein relationship is much more complex, often being in line with the “one-gene–many-proteins” or “one-gene–many-functions” concept (see Fig. 1).60,61
Fig. 1 Schematic representation of the classic “one-gene–one-protein–one-function” paradigm (top part, blue) and its modification by alternative splicing and PTMs when affected genes encode ordered proteins (middle part, pink) or intrinsically disordered and hybrid proteins containing ordered and intrinsically disordered domains (bottom part, red). Reproduced with permission from Uversky VN. (Intrinsically disordered) splice variants in the proteome: implications for novel drug discovery. Genes & Genomics. 2016a;38(7):577–594. doi:10.1007/s13258-015-0384-0.
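To illustrate why a modest number of genes can give rise to a much larger set of chemically distinct proteoforms, the following minimal sketch counts combinations of allelic variants, splice isoforms, and PTM states for a single gene. It rests on a strong simplifying assumption that is not made in the text, namely that each PTM site is independently either modified or unmodified and that all combinations can occur; the function name and the input numbers are purely hypothetical, so the result is an upper bound for illustration rather than a measured value.

```python
def max_proteoforms(n_alleles, n_splice_isoforms, n_ptm_sites):
    """Upper bound on chemically distinct proteoforms from one gene.

    Hypothetical model: every allele can combine with every splice isoform,
    and each PTM site is independently modified or unmodified (2 states),
    giving 2 ** n_ptm_sites modification patterns.
    """
    if min(n_alleles, n_splice_isoforms) < 1 or n_ptm_sites < 0:
        raise ValueError("need at least one allele and isoform, and non-negative PTM sites")
    return n_alleles * n_splice_isoforms * 2 ** n_ptm_sites


if __name__ == "__main__":
    # Illustrative inputs only: 2 alleles, 3 splice isoforms, 5 PTM sites.
    print(max_proteoforms(2, 3, 5))
```

With these illustrative inputs the bound is 2 × 3 × 2⁵ = 192 forms from one gene. The count covers only induced (chemical) variability and does not include the conformational and functioning proteoforms discussed in the following sections.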
5. Intrinsic disorder and proteoforms
Importantly, the original definition of proteoforms as different molecular forms in which the protein product of a single gene can be found, including changes due to genetic variations, alternatively spliced mRNA, and PTMs,59 was based on the “rigid” view of a protein molecule as a biological entity with a unique structure, which can be modified by mutations, alternative splicing, and PTMs. However, the major mechanisms leading to the generation of proteoforms, such as alternative splicing and PTMs, are
intimately linked to the phenomenon of protein intrinsic disorder. In fact, mRNA regions affected by alternative splicing predominantly encode IDPRs.62 Sites of many enzymatically catalyzed PTMs (e.g., phosphorylation, acetylation, methylation, and glycosylation) are known to be preferentially found within IDPRs.44,45 Furthermore, functionality-based proteoforms also have their roots in intrinsic disorder, since IDPs/IDPRs are known to be highly promiscuous binders15–19,21,22,26,36,37,42,43,63–71 that can fold differently upon interaction with different partners,72–74 and since IDPs/IDPRs can be involved in “binding chain reactions,” where binding-induced (partial) folding of an IDP/IDPR can generate a new conformation with a novel binding site capable of interaction with a new partner, binding of which leads to some additional structural rearrangements resulting in the appearance of a new binding site capable of interaction with yet another partner, and so on.75,76 All this indicates that IDPs/IDPRs represent a very rich source of proteoforms. One should keep in mind that the presence of intrinsic disorder adds further levels of complexity to the proteoform concept. A protein molecule is often considered as a semi-homogeneous entity, entirely ordered or disordered, or consisting of homogeneously ordered and/or disordered regions/domains. However, this picture is another oversimplification, as proteins typically comprise differently folded regions with different levels of conformational stability.
In fact, at the basic level, the inability of IDPs/IDPRs to have unique 3D structures is determined by the specific features of their amino acid sequences, such as compositional biases (depletion in order-promoting residues Trp, Tyr, Phe, Ile, Leu, Val, Cys, and Asn and enrichment in disorder-promoting residues Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro), low sequence complexity, presence of repeats, low overall hydrophobicity, high net charge, and many others.15,68,77–79 However, this inability to fold is unequally distributed within a protein molecule, with its different parts being under-folded to different degrees. This defines the mosaic structure and astonishing multi-level spatiotemporal heterogeneity of IDPs/IDPRs, which represent a complex combination of foldons (independently foldable units), inducible foldons (disordered regions that can (partially) fold upon interaction with binding partners), morphing inducible foldons (disordered regions that can fold differently upon interaction with different binding partners), non-foldons (non-foldable protein regions), semi-foldons (regions that are always in a semi-folded form), and unfoldons (ordered regions that have to undergo an order-to-disorder transition to
become functional).35,43,76,80,81 One should also keep in mind that it is extremely unlikely that different foldons in a given protein would possess identical conformational stability. Furthermore, foldons are known to continually unfold and refold even under native conditions.82–84 As a result, at any given moment, even a well-folded, ordered protein would have a mosaic structure, possessing a set of temporarily folded and unfolded foldons.81 Fig. 2 illustrates that these differently under-folded pieces of the protein structural mosaic might have well-defined and specific functions.81 Such a structural mosaic defines the complex “anatomy” of an IDP, which might contain multiple relatively short, differently ordered/disordered functional elements defining its complex molecular “physiology,” reflected in its multifunctionality and its ability to interact with, regulate, and be controlled by multiple structurally unrelated partners.81 Therefore, IDPs/IDPRs are structurally and functionally heterogeneous complex systems, which, due to their highly dynamic nature, exist as basic (or conformational, or intrinsic) proteoforms.60,81
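The compositional bias mentioned above (depletion in order-promoting and enrichment in disorder-promoting residues) can be quantified for any sequence by simple counting. The sketch below is a minimal illustration that uses only the two residue lists given in the text, written in one-letter code; the function name and the toy input sequences are hypothetical. It is a crude descriptor of composition, not a disorder predictor, and it ignores the other sequence features listed above (sequence complexity, repeats, hydrophobicity, and net charge).

```python
# Residue classes exactly as listed in the text, in one-letter code.
ORDER_PROMOTING = set("WYFILVCN")     # Trp, Tyr, Phe, Ile, Leu, Val, Cys, Asn
DISORDER_PROMOTING = set("ARGQSEKP")  # Ala, Arg, Gly, Gln, Ser, Glu, Lys, Pro


def composition_bias(sequence):
    """Return (order_fraction, disorder_fraction) for an amino acid sequence.

    Fractions are computed over the full sequence length; residues in
    neither class (e.g., His, Met, Thr, Asp) count toward neither fraction.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    order = sum(residue in ORDER_PROMOTING for residue in seq) / n
    disorder = sum(residue in DISORDER_PROMOTING for residue in seq) / n
    return order, disorder


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not real proteins.
    for toy in ("WYFILVCN", "AAGGSSKKWW"):
        print(toy, composition_bias(toy))
```

Applying the same counting in a sliding window along a sequence would give a rough picture of how unevenly such bias is distributed within one protein, in the spirit of the mosaic view described above.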
Fig. 2 Schematic representation of the mosaic nature of the protein structure-function space. One should keep in mind that “Dormant disorder” is different from the other “outer-ring” functional groupings, since the corresponding segment does not describe a particular functional group, but rather represents the means by which the functionality is achieved. Reproduced with permission from Uversky VN. Functional roles of transiently and intrinsically disordered regions within proteins. FEBS J. 2015;282(7):1182–1189. doi:10.1111/febs.13202.
6. Proteoforms and structural flexibility of ordered proteins
One should also keep in mind that even though “normal” (ordered) proteins are characterized by unique 3D structures, they cannot be considered as completely rigid, crystal-like entities. On the contrary, the importance of conformational flexibility and the need for structural dynamics for the successful functionality of globular proteins (even enzymes) have been emphasized in many studies over the past 55 years.11,85–96 As a matter of fact, biological functions of enzymes are critically dependent on their internal dynamics, where individual amino acid residues, groups of amino acids, and even entire domains move relative to each other over a wide range of time-scales, from femtoseconds to seconds, to facilitate catalytic activity.85,88,89 It was also emphasized that functional conformational changes and allosteric behavior of globular proteins can rely on the existence of conformational substates, which can be described as the atomic displacements leading to the formation and interconversion of different local configurations of the same overall protein structure.97–103 This idea is illustrated by Fig. 3, representing potential energy landscapes of ordered and disordered proteins.104 The energy landscape of an ordered protein is characterized by a specific funnel-like shape, where a broad mouth at the top represents a set of unfolded conformations and the narrow end at the bottom shows the lowest energy state that corresponds to the native structure.105–109 On the other hand, the energy landscape of an IDP is relatively flat, but rough, with multiple local energy minima separated by small barriers.43,104,110,111 Finally, careful analysis of the bottom of the funnel-shaped energy landscape revealed that for many proteins, the surface of the energy minimum is actually not smooth, and is instead rough because of the presence of many smaller minima corresponding to different states sampled by a protein (see Fig. 3).
This is definitely the case for so-called hybrid proteins containing ordered domains and IDPRs and even for “normal” ordered proteins whose structures have been solved via X-ray crystallography or by NMR and which are considered to be folded, but still often contain both ordered regions and intrinsically disordered regions lacking a stable tertiary structure.104,112 Therefore, conformational flexibility of ordered and hybrid proteins represents an important source of structural heterogeneity that can serve as a foundation for generating basic/conformational proteoforms.
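The qualitative contrast between a funnel-like landscape with one dominant minimum and a flat but rough landscape with many shallow minima can be made concrete with a one-dimensional toy model. The sketch below is only a cartoon under stated assumptions: the two energy functions (`funnel` and `rough_flat`), the coordinate range, and the grid resolution are invented for illustration and are not derived from any real protein or from the cited work. It simply counts the local minima of each curve on a grid.

```python
import math


def count_local_minima(energy, x_min=-3.0, x_max=3.0, n_points=601):
    """Count interior grid points that lie below both of their neighbors."""
    step = (x_max - x_min) / (n_points - 1)
    values = [energy(x_min + i * step) for i in range(n_points)]
    return sum(
        1
        for i in range(1, n_points - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    )


def funnel(x):
    """Toy 'ordered protein' landscape: a single smooth well."""
    return x ** 2


def rough_flat(x):
    """Toy 'IDP' landscape: a shallow trend plus ripples giving small barriers."""
    return 0.05 * x ** 2 + 0.3 * math.cos(2 * math.pi * x)


if __name__ == "__main__":
    print("funnel minima:", count_local_minima(funnel))
    print("rough, flat minima:", count_local_minima(rough_flat))
```

In this cartoon the funnel has a single minimum, whereas the rippled curve has six minima separated by low barriers, each of which would correspond to a distinct member of a conformational ensemble. Real landscapes are high-dimensional, so the picture is meant only as an aid to intuition.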
Fig. 3 Schematic representation of the energy landscapes for (A) an ordered protein; (B) an IDP; and (C) a close-up view of the bottom of the funnel-like energy landscape of a hybrid protein containing ordered domains (shown in white) and IDPRs (shown in red). Reproduced with permission from Burger VM, Gurry T, Stultz CM. Intrinsically disordered proteins: where computation meets experiment. Polymers. 2014;6(10):2684–2719. doi:10.3390/polym6102684.
7. Proteoforms and protein-structure continuum
The data in the previous sections suggest that even without mutations, PTMs, or alternative splicing, any given protein can be considered as a basic (or intrinsic, or conformational) proteoform, since structurally it represents a dynamic conformational ensemble whose members have different structures. Structural differences between the members of these conformational ensembles could be rather subtle, as in the case of ordered proteins, or substantial, as in the case of IDPs/IDPRs and hybrid proteins. Potentially, these differently structured members of a dynamic conformational ensemble may have different functions. Such conformational proteoforms are different from the inducible (or modified) proteoforms generated by the various alterations at the DNA, RNA, or protein level because of mutations, alternative splicing, mRNA editing, and PTMs,
respectively. Obviously, any inducible (or modified) proteoform (i.e., any mutated, alternatively spliced, or post-translationally modified form of a given protein) is a conformational proteoform itself, since it also represents a structural ensemble. Furthermore, protein function, interaction with specific partners, or even just placement of a protein inside its natural cellular environment, which is extremely crowded and characterized by the presence of high concentrations of various biological macromolecules,113–115 has limited available volume,116 and contains restricted amounts of free water,115,117–121 can affect the structural ensembles of both basic and induced proteoforms. Therefore, protein functionality per se can be considered as a factor responsible for the generation of new functioning proteoforms.60,81 Taken together, these considerations suggest that the “one-gene–one-protein–one-function” paradigm is outdated and should be replaced by the “one-gene–many-proteins–many-functions” paradigm. In other words, instead of the old protein structure-function model, according to which a unique function can only be conducted by a protein with a unique 3D structure, information about which is encoded in a unique amino acid sequence, a more general model should be introduced for global linkage of protein structure and function. This more general “protein structure-function continuum” model postulates that a given protein might exist as a dynamic conformational ensemble containing multiple proteoforms of different origin (conformational/basic, inducible/modified, and functioning) characterized by a broad spectrum of structural features and possessing different functional potentials.
References
1. Engels F, Dutt CP, Haldane JBS. Dialektik der Natur. International Publishers; 1940.
2. Feijo Delgado F, Cermak N, Hecht VC, et al. Intracellular water exchange for measuring the dry mass, water mass and changes in chemical composition of living cells. PLoS One. 2013;8(7):e67590. https://doi.org/10.1371/journal.pone.0067590.
3. Neidhardt FC, Ingraham JL, Schaechter M. Physiology of the Bacterial Cell: A Molecular Approach. Sunderland, Mass.: Sinauer Associates Inc.; 1990.
4. Neidhardt FC, Umbarger HE. Chemical composition of Escherichia coli. In: Neidhardt FC, Curtiss IR, Ingraham JL, Lin ECC, Low KB, Magasanik B, Reznikoff WS, Riley M, Schaechter M, Umbarger HE, eds. Escherichia coli and Salmonella: Cellular and Molecular Biology. 2nd ed. Washington, D.C: American Society of Microbiology (ASM) Press; 1996: vol. 1.
5. Milo R. What is the total number of protein molecules per cell volume? A call to rethink some published values. Bioessays. 2013;35(12):1050–1055. https://doi.org/10.1002/bies.201300066.
6. Hartl FU. Protein misfolding diseases. Annu Rev Biochem. 2017;86:21–26. https://doi.org/10.1146/annurev-biochem-061516-044518.
7. Geiger T, Velic A, Macek B, et al. Initial quantitative proteomic map of 28 mouse tissues using the SILAC mouse. Mol Cell Proteomics. 2013;12(6):1709–1722. https://doi.org/10.1074/mcp.M112.024919.
8. Nagaraj N, Wisniewski JR, Geiger T, et al. Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol. 2011;7:548. https://doi.org/10.1038/msb.2011.81.
9. Fischer E. Einfluss der configuration auf die wirkung der enzyme. Ber Dt Chem Ges. 1894;27:2985–2993.
10. Lemieux UR, Spohr U. How Emil Fischer was led to the lock and key concept for enzyme specificity. Adv Carbohydr Chem Biochem. 1994;50:1–20.
11. Koshland DE. Application of a theory of enzyme specificity to protein synthesis. Proc Natl Acad Sci U S A. 1958;44(2):98–104.
12. Blake CC, Koenig DF, Mair GA, North AC, Phillips DC, Sarma VR. Structure of hen egg-white lysozyme. A three-dimensional Fourier synthesis at 2 Angstrom resolution. Nature. 1965;206(986):757–761.
13. Beadle GW, Tatum EL. Genetic control of biochemical reactions in Neurospora. Proc Natl Acad Sci U S A. 1941;27(11):499–506.
14. Bussard AE. A scientific revolution? The prion anomaly may challenge the central dogma of molecular biology. EMBO Rep. 2005;6(8):691–694. https://doi.org/10.1038/sj.embor.7400497.
15. Dunker AK, Lawson JD, Brown CJ, et al. Intrinsically disordered protein. J Mol Graph Model. 2001;19(1):26–59.
16. Tompa P. Intrinsically unstructured proteins. Trends Biochem Sci. 2002;27(10):527–533. https://doi.org/10.1016/S0968-0004(02)02169-2.
17. Uversky VN, Gillespie JR, Fink AL. Why are “natively unfolded” proteins unstructured under physiologic conditions? Proteins. 2000;41(3):415–427. https://doi.org/10.1002/1097-0134(20001115)41:3<415::AID-PROT130>3.0.CO;2-7.
18. Wright PE, Dyson HJ. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol. 1999;293(2):321–331. https://doi.org/10.1006/jmbi.1999.3110.
19. Dunker AK, Obradovic Z. The protein trinity—linking function and disorder. Nat Biotechnol. 2001;19(9):805–806.
20. Dunker AK, Oldfield CJ, Meng J, et al. The unfoldomics decade: an update on intrinsically disordered proteins. BMC Genomics. 2008;9(Suppl 2):S1. https://doi.org/10.1186/1471-2164-9-S2-S1.
21. Dunker AK, Silman I, Uversky VN, Sussman JL. Function and structure of inherently disordered proteins. Curr Opin Struct Biol. 2008;18(6):756–764. https://doi.org/10.1016/j.sbi.2008.10.002.
22. Dyson HJ, Wright PE. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol. 2005;6(3):197–208. https://doi.org/10.1038/nrm1589.
23. Uversky VN. Natively unfolded proteins: a point where biology waits for physics. Protein Sci. 2002;11(4):739–756. https://doi.org/10.1110/ps.4210102.
24. Uversky VN. What does it mean to be natively unfolded? Eur J Biochem. 2002;269(1):2–12. https://doi.org/10.1046/j.0014-2956.2001.02649.x.
25. Uversky VN. The mysterious unfoldome: structureless, underappreciated, yet vital part of any given proteome. J Biomed Biotechnol. 2010;2010:568068. https://doi.org/10.1155/2010/568068.
26. Uversky VN, Dunker AK. Understanding protein non-folding. Biochim Biophys Acta. 2010;1804(6):1231–1264. https://doi.org/10.1016/j.bbapap.2010.01.017.
27. Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ. Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform. 2000;11:161–171.
28. Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK. Comparing and combining predictors of mostly disordered proteins. Biochemistry. 2005;44(6):1989–2000.
29. Peng Z, Yan J, Fan X, et al. Exceptionally abundant exceptions: comprehensive characterization of intrinsic disorder in all domains of life. Cell Mol Life Sci. 2015;72(1):137–151. https://doi.org/10.1007/s00018-014-1661-9.
30. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004;337(3):635–645. https://doi.org/10.1016/j.jmb.2004.02.002.
31. Xue B, Dunker AK, Uversky VN. Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life. J Biomol Struct Dyn. 2012;30(2):137–149. https://doi.org/10.1080/07391102.2012.675145.
32. Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol. 2002;323(3):573–584.
33. Oldfield CJ, Dunker AK. Intrinsically disordered proteins and intrinsically disordered protein regions. Annu Rev Biochem. 2014;83:553–584. https://doi.org/10.1146/annurev-biochem-072711-164947.
34. Uversky VN. Protein folding revisited. A polypeptide chain at the folding-misfolding-nonfolding cross-roads: which way to go? Cell Mol Life Sci. 2003;60(9):1852–1871.
35. Uversky VN. A decade and a half of protein intrinsic disorder: biology still waits for physics. Protein Sci. 2013;22(6):693–724. https://doi.org/10.1002/pro.2261.
36. Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z. Intrinsic disorder and protein function. Biochemistry. 2002;41(21):6573–6582.
37. Dunker AK, Brown CJ, Obradovic Z. Identification and functions of usefully disordered proteins. Adv Protein Chem. 2002;62:25–49.
38. Habchi J, Tompa P, Longhi S, Uversky VN. Introducing protein intrinsic disorder. Chem Rev. 2014;114(13):6561–6588. https://doi.org/10.1021/cr400514h.
39. Uversky VN. The most important thing is the tail: multitudinous functionalities of intrinsically disordered protein termini. FEBS Lett. 2013;587(13):1891–1901. https://doi.org/10.1016/j.febslet.2013.04.042.
40. van der Lee R, Buljan M, Lang B, et al. Classification of intrinsically disordered regions and proteins. Chem Rev. 2014;114(13):6589–6631. https://doi.org/10.1021/cr400525m.
41. Tompa P, Csermely P. The role of structural disorder in the function of RNA and protein chaperones. FASEB J. 2004;18(11):1169–1175.
42. Uversky VN. Multitude of binding modes attainable by intrinsically disordered proteins: a portrait gallery of disorder-based complexes. Chem Soc Rev. 2011;40(3):1623–1634. https://doi.org/10.1039/c0cs00057d.
43. Uversky VN. Unusual biophysics of intrinsically disordered proteins. Biochim Biophys Acta. 2013;1834(5):932–951. https://doi.org/10.1016/j.bbapap.2012.12.008.
44. Iakoucheva LM, Radivojac P, Brown CJ, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004;32(3):1037–1049. https://doi.org/10.1093/nar/gkh253.
45. Pejaver V, Hsu WL, Xin F, Dunker AK, Uversky VN, Radivojac P. The structural and functional signatures of proteins that undergo multiple events of post-translational modification. Protein Sci. 2014;23(8):1077–1093. https://doi.org/10.1002/pro.2494.
46. Tompa P, Szasz C, Buday L. Structural disorder throws new light on moonlighting. Trends Biochem Sci. 2005;30(9):484–489. https://doi.org/10.1016/j.tibs.2005.07.008.
47. Darling AL, Liu Y, Oldfield CJ, Uversky VN. Intrinsically disordered proteome of human membrane-less organelles. Proteomics. 2018;18(5-6):e1700193. (1700112 pages). https://doi.org/10.1002/pmic.201700193.
48. Meng F, Na I, Kurgan L, Uversky VN. Compartmentalization and functionality of nuclear disorder: intrinsic disorder and protein-protein interactions in intra-nuclear compartments. Int J Mol Sci. 2015;17(1). https://doi.org/10.3390/ijms17010024.
49. Uversky VN. Intrinsically disordered proteins in overcrowded milieu: membrane-less organelles, phase separation, and intrinsic disorder. Curr Opin Struct Biol. 2017;44:18–30. https://doi.org/10.1016/j.sbi.2016.10.015.
50. Uversky VN. Protein intrinsic disorder-based liquid-liquid phase transitions in biological systems: complex coacervates and membrane-less organelles. Adv Colloid Interface Sci. 2017;239:97–114. https://doi.org/10.1016/j.cis.2016.05.012.
51. Uversky VN. The roles of intrinsic disorder-based liquid-liquid phase transitions in the “Dr. Jekyll-Mr. Hyde” behavior of proteins involved in amyotrophic lateral sclerosis and frontotemporal lobar degeneration. Autophagy. 2017;13(12):2115–2162. https://doi.org/10.1080/15548627.2017.1384889.
52. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74. https://doi.org/10.1038/nature11247.
53. Schluter H, Apweiler R, Holzhutter HG, Jungblut PR. Finding one’s way in proteomics: a protein species nomenclature. Chem Cent J. 2009;3:11. https://doi.org/10.1186/1752-153X-3-11.
54. Farrah T, Deutsch EW, Hoopmann MR, et al. The state of the human proteome in 2012 as viewed through peptideatlas. J Proteome Res. 2013;12(1):162–171. https://doi.org/10.1021/Pr301012j.
55. Farrah T, Deutsch EW, Omenn GS, et al. State of the human proteome in 2013 as viewed through peptideatlas: comparing the kidney, urine, and plasma proteomes for the biology- and disease-driven human proteome project. J Proteome Res. 2014;13(1):60–75. https://doi.org/10.1021/Pr4010037.
56. Kim MS, Pinto SM, Getnet D, et al. A draft map of the human proteome. Nature. 2014;509(7502):575–581. https://doi.org/10.1038/Nature13302.
57. Reddy PJ, Ray S, Srivastava S. The quest of the human proteome and the missing proteins: digging deeper. OMICS. 2015;19(5):276–282. https://doi.org/10.1089/omi.2015.0035.
58. Uhlen M, Bjorling E, Agaton C, et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell Proteomics. 2005;4(12):1920–1932. https://doi.org/10.1074/mcp.M500279-MCP200.
59. Smith LM, Kelleher NL, Consortium for Top Down Proteomics. Proteoform: a single term describing protein complexity. Nat Methods. 2013;10(3):186–187. https://doi.org/10.1038/nmeth.2369.
60. Uversky VN. p53 proteoforms and intrinsic disorder: an illustration of the protein structure-function continuum concept. Int J Mol Sci. 2016;17(11):1874. https://doi.org/10.3390/ijms17111874.
61. Uversky VN. (Intrinsically disordered) splice variants in the proteome: implications for novel drug discovery. Genes & Genomics. 2016;38(7):577–594. https://doi.org/10.1007/s13258-015-0384-0.
62. Romero PR, Zaidi S, Fang YY, et al. Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proc Natl Acad Sci U S A. 2006;103(22):8390–8395. https://doi.org/10.1073/pnas.0507916103.
63. Daughdrill GW, Pielak GJ, Uversky VN, Cortese MS, Dunker AK. Natively disordered proteins. In: Buchner J, Kiefhaber T, eds. Handbook of Protein Folding. Weinheim, Germany: Wiley-VCH, Verlag GmbH & Co. KGaA; 2005:271–353.
64. Dunker AK, Uversky VN. Signal transduction via unstructured protein conduits. Nat Chem Biol. 2008;4(4):229–230.
65. Dyson HJ, Wright PE. Coupling of folding and binding for unstructured proteins. Curr Opin Struct Biol. 2002;12(1):54–60.
66. Mohan A, Oldfield CJ, Radivojac P, et al. Analysis of molecular recognition features (MoRFs). J Mol Biol. 2006;362(5):1043–1059. https://doi.org/10.1016/j.jmb.2006.07.087.
67. Oldfield CJ, Cheng Y, Cortese MS, Romero P, Uversky VN, Dunker AK. Coupled folding and binding with alpha-helix-forming molecular recognition elements. Biochemistry. 2005;44(37):12454–12470. https://doi.org/10.1021/bi050736e.
68. Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK. Intrinsic disorder and functional proteomics. Biophys J. 2007;92(5):1439–1456.
69. Uversky VN. Disordered competitive recruiter: fast and foldable. J Mol Biol. 2012;418(5):267–268. https://doi.org/10.1016/j.jmb.2012.02.034.
70. Uversky VN, Dunker AK. The case for intrinsically disordered proteins playing contributory roles in molecular recognition without a stable 3D structure. F1000 Biol Rep. 2013;5:1. https://doi.org/10.3410/B5-1.
71. Vacic V, Oldfield CJ, Mohan A, et al. Characterization of molecular recognition features, MoRFs, and their binding partners. J Proteome Res. 2007;6(6):2351–2366. https://doi.org/10.1021/pr0701411.
72. Hsu WL, Oldfield C, Meng J, et al. Intrinsic protein disorder and protein-protein interactions. Pac Symp Biocomput. 2012;2012:116–127. https://doi.org/10.1142/9789814366496_0012.
73. Hsu WL, Oldfield CJ, Xue B, et al. Exploring the binding diversity of intrinsically disordered proteins involved in one-to-many binding. Protein Sci. 2013;22(3):258–273. https://doi.org/10.1002/pro.2207.
74. Oldfield CJ, Meng J, Yang JY, Yang MQ, Uversky VN, Dunker AK. Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics. 2008;9(Suppl 1):S1.
75. Fuxreiter M, Toth-Petroczy A, Kraut DA, et al. Disordered proteinaceous machines. Chem Rev. 2014;114(13):6806–6843. https://doi.org/10.1021/cr4007329.
76. Uversky VN. Intrinsic disorder-based protein interactions and their modulators. Curr Pharm Des. 2013;19(23):4191–4213.
77. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK. Sequence complexity of disordered protein. Proteins. 2001;42(1):38–48.
78. Vacic V, Uversky VN, Dunker AK, Lonardi S. Composition profiler: a tool for discovery and visualization of amino acid composition differences. BMC Bioinf. 2007;8:211. https://doi.org/10.1186/1471-2105-8-211.
79. Williams RM, Obradovic Z, Mathura V, et al. The protein non-folding problem: amino acid determinants of intrinsic order and disorder. Pac Symp Biocomput. 2001;89–100.
80. Jakob U, Kriwacki R, Uversky VN. Conditionally and transiently disordered proteins: awakening cryptic disorder to regulate protein function. Chem Rev. 2014;114(13):6779–6805. https://doi.org/10.1021/cr400459c.
81. Uversky VN. Functional roles of transiently and intrinsically disordered regions within proteins. FEBS J. 2015;282(7):1182–1189. https://doi.org/10.1111/febs.13202.
82. Maity H, Maity M, Englander SW. How cytochrome c folds, and why: submolecular foldon units and their stepwise sequential stabilization. J Mol Biol. 2004;343(1):223–233. https://doi.org/10.1016/j.jmb.2004.08.005.
83. Maity H, Maity M, Krishna MM, Mayne L, Englander SW. Protein folding: the stepwise assembly of foldon units. Proc Natl Acad Sci U S A. 2005;102(13):4741–4746. https://doi.org/10.1073/pnas.0501043102.
84. Maity H, Rumbley JN, Englander SW. Functional role of a protein foldon—an Omega-loop foldon controls the alkaline transition in ferricytochrome c. Proteins. 2006;63(2):349–355. https://doi.org/10.1002/prot.20757.
85. Agarwal PK. Role of protein dynamics in reaction rate enhancement by enzymes. J Am Chem Soc. 2005;127(43):15248–15256. https://doi.org/10.1021/ja055251s.
86. Agarwal PK, Billeter SR, Rajagopalan PT, Benkovic SJ, Hammes-Schiffer S. Network of coupled promoting motions in enzyme catalysis. Proc Natl Acad Sci U S A. 2002;99(5):2794–2799. https://doi.org/10.1073/pnas.052005999.
87. Agarwal PK, Geist A, Gorin A. Protein dynamics and enzymatic catalysis: investigating the peptidyl-prolyl cis-trans isomerization activity of cyclophilin A. Biochemistry. 2004;43(33):10605–10618. https://doi.org/10.1021/bi0495228.
88. Eisenmesser EZ, Bosco DA, Akke M, Kern D. Enzyme dynamics during catalysis. Science. 2002;295(5559):1520–1523. https://doi.org/10.1126/science.1066176.
89. Eisenmesser EZ, Millet O, Labeikovsky W, et al. Intrinsic dynamics of an enzyme underlies catalysis. Nature. 2005;438(7064):117–121. https://doi.org/10.1038/nature04105.
90. Frauenfelder H, Chen G, Berendzen J, et al. A unified model of protein dynamics. Proc Natl Acad Sci U S A. 2009;106(13):5129–5134. https://doi.org/10.1073/pnas.0900336106.
91. Olsson MH, Parson WW, Warshel A. Dynamical contributions to enzyme catalysis: critical tests of a popular hypothesis. Chem Rev. 2006;106(5):1737–1756. https://doi.org/10.1021/cr040427e.
92. Rajagopalan PT, Benkovic SJ. Preorganization and protein dynamics in enzyme catalysis. Chem Rec. 2002;2(1):24–36. https://doi.org/10.1002/tcr.10009.
93. Sutcliffe MJ, Scrutton NS. A new conceptual framework for enzyme catalysis. Hydrogen tunnelling coupled to enzyme dynamics in flavoprotein and quinoprotein enzymes. Eur J Biochem. 2002;269(13):3096–3102. https://doi.org/10.1046/j.1432-1033.2002.03020.x.
94. Tousignant A, Pelletier JN. Protein motions promote catalysis. Chem Biol. 2004;11(8):1037–1042. https://doi.org/10.1016/j.chembiol.2004.06.007.
95. Villa J, Strajbl M, Glennon TM, Sham YY, Chu ZT, Warshel A. How important are entropic contributions to enzyme catalysis? Proc Natl Acad Sci U S A. 2000;97(22):11899–11904. https://doi.org/10.1073/pnas.97.22.11899.
96. Yang LW, Bahar I. Coupling between catalytic site and collective dynamics: a requirement for mechanochemical activity of enzymes. Structure. 2005;13(6):893–904. https://doi.org/10.1016/j.str.2005.03.015.
97. Artymiuk PJ, Blake CC, Grace DE, Oatley SJ, Phillips DC, Sternberg MJ. Crystallographic studies of the dynamic properties of lysozyme. Nature. 1979;280(5723):563–568.
98. Austin RH, Beeson KW, Eisenstein L, Frauenfelder H, Gunsalus IC. Dynamics of ligand binding to myoglobin. Biochemistry. 1975;14(24):5355–5373.
99. Beece D, Eisenstein L, Frauenfelder H, et al. Solvent viscosity and protein dynamics. Biochemistry. 1980;19(23):5147–5157.
100. Frauenfelder H, Petsko GA. Structural dynamics of liganded myoglobin. Biophys J. 1980;32(1):465–483. https://doi.org/10.1016/S0006-3495(80)84984-8.
101. Frauenfelder H, Petsko GA, Tsernoglou D. Temperature-dependent X-ray diffraction as a probe of protein structural dynamics. Nature. 1979;280(5723):558–563.
102. Hartmann H, Parak F, Steigemann W, Petsko GA, Ponzi DR, Frauenfelder H. Conformational substates in a protein: structure and dynamics of metmyoglobin at 80 K. Proc Natl Acad Sci U S A. 1982;79(16):4967–4971.
103. Parak F, Frolov EN, Mossbauer RL, Goldanskii VI. Dynamics of metmyoglobin crystals investigated by nuclear gamma resonance absorption. J Mol Biol. 1981;145(4):825–833. https://doi.org/10.1016/0022-2836(81)90317-X.
104. Burger VM, Gurry T, Stultz CM. Intrinsically disordered proteins: where computation meets experiment. Polymers. 2014;6(10):2684–2719. https://doi.org/10.3390/polym6102684.
105. Leopold PE, Montal M, Onuchic JN. Protein folding funnels: a kinetic approach to the sequence-structure relationship. Proc Natl Acad Sci U S A. 1992;89(18):8721–8725.
106. Onuchic JN, Luthey-Schulten Z, Wolynes PG. Theory of protein folding: the energy landscape perspective. Annu Rev Phys Chem. 1997;48:545–600. https://doi.org/10.1146/annurev.physchem.48.1.545.
107. Onuchic JN, Socci ND, Luthey-Schulten Z, Wolynes PG. Protein folding funnels: the nature of the transition state ensemble. Fold Des. 1996;1(6):441–450. https://doi.org/10.1016/S1359-0278(96)00060-0.
108. Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opin Struct Biol. 2004;14(1):70–75. https://doi.org/10.1016/j.sbi.2004.01.009.
109. Socci ND, Onuchic JN, Wolynes PG. Protein folding mechanisms and the multidimensional folding funnel. Proteins. 1998;32(2):136–158. https://doi.org/10.1002/(SICI)1097-0134(19980801)32:2<136::AID-PROT2>3.0.CO;2-J.
110. Turoverov KK, Kuznetsova IM, Uversky VN. The protein kingdom extended: ordered and intrinsically disordered proteins, their folding, supramolecular complex formation, and aggregation. Prog Biophys Mol Biol. 2010;102(2–3):73–84. https://doi.org/10.1016/j.pbiomolbio.2010.01.003.
111. Uversky VN, Oldfield CJ, Dunker AK. Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu Rev Biophys. 2008;37:215–246. https://doi.org/10.1146/annurev.biophys.37.032807.125924.
112. Tompa P. Unstructural biology coming of age. Curr Opin Struct Biol. 2011;21(3):419–425. https://doi.org/10.1016/j.sbi.2011.03.012.
113. Rivas G, Ferrone F, Herzfeld J. Life in a crowded world. EMBO Rep. 2004;5(1):23–27. https://doi.org/10.1038/sj.embor.7400056.
114. van den Berg B, Ellis RJ, Dobson CM. Effects of macromolecular crowding on protein folding and aggregation. EMBO J. 1999;18(24):6927–6933. https://doi.org/10.1093/emboj/18.24.6927.
115. Zimmerman SB, Trach SO. Estimation of macromolecule concentrations and excluded volume effects for the cytoplasm of Escherichia coli. J Mol Biol. 1991;222(3):599–620.
116. Ellis RJ, Minton AP. Cell biology: join the crowd. Nature. 2003;425(6953):27–28. https://doi.org/10.1038/425027a.
117. Ellis RJ. Macromolecular crowding: obvious but underappreciated. Trends Biochem Sci. 2001;26(10):597–604.
118. Fulton AB. How crowded is the cytoplasm? Cell. 1982;30(2):345–347.
119. Minton AP. Influence of excluded volume upon macromolecular structure and associations in ’crowded’ media. Curr Opin Biotechnol. 1997;8(1):65–69.
120. Minton AP. Protein folding: thickening the broth. Curr Biol. 2000;10(3):R97–R99.
121. Zimmerman SB, Minton AP. Macromolecular crowding: biochemical, biophysical, and physiological consequences. Annu Rev Biophys Biomol Struct. 1993;22:27–65.