Chaos, Solitons and Fractals 35 (2008) 960–966 www.elsevier.com/locate/chaos
Fractal aspects of calcium binding protein structures
Adriana Isvoran a,*, Laura Pitulice a, Constantin T. Craescu b, Adrian Chiriac a
a West University of Timisoara, Department of Chemistry, Pestalozzi 16, 300115 Timisoara, Romania
b INSERM U759/Institute Curie-Recherche, Centre Universitaire Paris-Sud, Batiment 112, 91405 Orsay, France
Accepted 24 May 2006
Communicated by Prof. M.S. El Naschie
Abstract

The structures of EF-hand calcium binding proteins may be classified into two distinct groups: extended and compact structures. In this paper we studied 20 different structures of calcium binding proteins using fractal analysis. Nine structures show extended shapes, one is semi-compact and the other 10 have compact shapes. Our study reveals different fractal characteristics for protein backbones belonging to different structural classes, and these observations may be correlated with the physicochemical forces governing protein folding. © 2006 Elsevier Ltd. All rights reserved.
1. Introduction

The function of many proteins requires structural interaction with calcium ions. For this reason these biomolecules are called calcium binding proteins. They may function either as calcium transport or regulator proteins, both inside and outside the cell. These functions are crucially important for the normal morphology and metabolism of the cell and play a significant role in disease mechanisms.

A large family of calcium binding proteins is characterized by a structural motif called the EF-hand, which includes a highly conserved sequence providing the metal ligands. Calmodulin and troponin C are the best known and most studied examples of such proteins. The EF-hand proteins can be divided into two broad categories: those that only bind calcium to regulate its concentration (calcium-buffering and calcium-transporting proteins) and those that bind calcium to decode its signal (calcium-sensor proteins) [1]. Calcium-buffering and calcium-transporting proteins usually have a compact tertiary structure and are not conformationally sensitive to calcium binding, in contrast to calcium-sensor proteins, which have an extended tertiary structure.

Most of the calcium binding proteins have four EF-hand structural motifs, organised in two domains, N- and C-terminal (two EF-hand motifs in each domain), connected by a more or less flexible region. The EF-hand motif generally consists of a 12-residue calcium-binding loop flanked by two α-helices. In the compact structures the two domains are in close contact, whereas in the extended structures they are clearly separated [2]. The physical forces that stabilize the extended or the compact structures are not completely understood.
* Corresponding author. Tel.: +40 745 901 850; fax: +40 256 190 333. E-mail address: [email protected] (A. Isvoran).
0960-0779/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.chaos.2006.05.098
Recent studies showed that extended proteins have a large net electric charge, a high charge density and an even balance of charge between the terminal domains, indicating that electrostatic repulsion is a dominant factor in the stabilization of these structures [3]. Also, the central flexible region contains many amphiphilic residues. By contrast, globular proteins are stabilized by a hydrophobic core contributed by residues from the two domains [4]. Nevertheless, these general rules do not account for all the experimental observations. For example, a recent study showed that bovine calmodulin, known to have an extended shape, may also adopt a compact tertiary structure, despite its high charge density and large net electric charge, which would strongly predict an extended structure [5].

Complete understanding of the physicochemical bases of the folding and interaction of complex biomolecules requires various complementary theoretical and experimental approaches. One of the theoretical methods is fractal analysis. From this point of view, proteins have an intrinsic self-similarity with regard to the compactness and packing of their structures that suggests a simple form of fractal behavior, but with important consequences for the morphology of the protein. The concept of fractals has been applied recently to a number of properties of proteins. As a result, there has been a rapid accumulation of new information concerning fractal aspects of protein backbones [6–10], of protein surfaces [7,11–13] and some other fractal features of protein structures [14] or dynamics [8,10,15].

In the previous studies concerning fractal properties of the protein backbone for a large number of protein structures, the fractal length of the backbone has been calculated as a sum of consecutive segments containing an increasing number of residues (m, varying from 1 to n, the number of residues in the protein). Almost all the plots of length versus amino acid interval in log-log scales were bilinear, with a slope of 0.38 for m < 10 and 0.65 for m > 10, and the fractal dimensions were 1.62 for m < 10 and 2.85 for m > 10, respectively [6–10]. It is considered that the first linear region is related to the local folding and the second linear region is related to the global folding of the protein. For proteins presenting quaternary structures, the plots usually show three linear regions, and the third region is related to the interactions between the units of tertiary structure [10]. In this work we apply the same algorithm in order to reveal the fractal aspects of calcium binding protein backbones and to correlate these aspects with the protein structure and function.
2. Method

The fractal dimension of the protein backbone is determined using the algorithm presented by Dewey [7], which has been used in the previously quoted studies. This algorithm is based on the following steps:

(i) it calculates the length of the protein backbone, L_m, for different sequence intervals, m:

$$L_m = \sum_{i}^{j} \sqrt{(x_{i+m} - x_i)^2 + (y_{i+m} - y_i)^2 + (z_{i+m} - z_i)^2} \qquad (1)$$

where x_i, y_i, z_i refer to the spatial coordinates of the ith carbon alpha atom, the sum runs over j consecutive segments, and j is the integer part of the ratio N/m, with N the total number of amino acids in the sequence;

(ii) it plots the length versus this interval using a double-logarithmic representation, ln(L_m) = f(ln(m));

(iii) the slope of the linear regions of this plot gives the fractal dimension according to the equation:

$$s = 1 - \frac{1}{D} \qquad (2)$$

where s denotes the slope of the linear region and D is the fractal dimension [7].

The length of the backbone is taken as the distance between the carbon-alpha (Cα) atoms of the protein. In order to calculate it, we used the Cartesian coordinates of these atoms obtained from the Protein Data Bank [16]. The entry codes of the proteins considered in this study and some of their structural properties are given in Table 1.
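The three steps of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used in this study: the function names (`backbone_length`, `loglog_slope`, `fractal_dimension`) are hypothetical, the slope is obtained by an ordinary least-squares fit, and a segment whose end index would fall beyond the last Cα atom is simply skipped, a boundary convention that the text does not specify. Eq. (2) is read as s = 1 - 1/D, which is consistent with the literature values quoted in the Introduction (slopes of 0.38 and 0.65 give dimensions close to 1.62 and 2.85).

```python
import math

def backbone_length(coords, m):
    """Backbone length L_m for sequence interval m, following Eq. (1)."""
    n = len(coords)
    j = n // m  # integer part of N/m
    total = 0.0
    for k in range(j):
        start, end = k * m, (k + 1) * m  # zero-based indices of atoms i and i+m
        if end >= n:
            break  # assumption: drop a segment that would run past the last atom
        total += math.dist(coords[start], coords[end])
    return total

def loglog_slope(coords, m_values):
    """Least-squares slope of ln(L_m) versus ln(m) over the given intervals."""
    xs = [math.log(m) for m in m_values]
    ys = [math.log(backbone_length(coords, m)) for m in m_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def fractal_dimension(slope):
    """Invert Eq. (2), s = 1 - 1/D, to obtain D = 1 / (1 - s)."""
    return 1.0 / (1.0 - slope)

# Toy input (not a real protein): 101 points on a straight line, 1 unit apart.
line = [(float(i), 0.0, 0.0) for i in range(101)]
s = loglog_slope(line, [1, 2, 4, 5, 10])
print(s, fractal_dimension(s))
```

For the straight-line toy chain every L_m equals the end-to-end length, so the slope is close to 0 and the dimension close to 1, as expected for a one-dimensional object. For a real structure, `coords` would hold the Cα coordinates read from a Protein Data Bank entry, and the fit would be restricted to the m range of each linear region.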
3. Results

The fractal diagrams obtained with the described algorithm present different aspects for extended and compact structures. In the extended structures, the fractal diagram usually presents three regions (Fig. 1). The first region (marked with 1 in Fig. 1) is linear and includes the maximal interval of the amino acids which can be defined within the N-terminal domain. The second and the third regions (marked with 2 and 3, respectively, in Fig. 1) correspond to larger intervals of the amino acids made possible by the inclusion of central linker and C-terminal residues, respectively. The third region is linear with a high slope and a small fractal dimension.
Table 1
Proteins considered in this study and some structural properties

| No. | Protein | Code entry | Spatial structure type | N-terminal domain | Flexible linker limits | C-terminal domain |
|---|---|---|---|---|---|---|
| 1 | Pig calpain domain VI | 1alv | Compact | 1–81 | 82–99 | 100–172 |
| 2 | Neurocalcin | 1bjf | Compact | 1–90 | 91–100 | 101–193 |
| 3 | Human calcium and integrin binding protein (CIB) | 1dgu | Compact | 1–88 | 89–99 | 100–183 |
| 4 | Calerythrin | 1nya | Compact | 1–85 | 86–97 | 98–176 |
| 5 | Amphioxus sarcoplasmic calcium binding protein | 2sas | Compact | 1–90 | 91–101 | 102–185 |
| 6 | Sandworm sarcoplasmic calcium binding protein | 2scp | Compact | 1–82 | 83–93 | 94–174 |
| 7 | Guanylate cyclase activating proteins | 1jba | Compact | 1–88 | 89–113 | 114–204 |
| 8 | Entamoeba histolytica calcium binding protein (EhCABP) | 1jfk | Semi-compact | 1–61 | 62–71 | 72–134 |
| 9 | Bovine calmodulin | 1prw | Compact | 1–75 | 76–85 | 86–149 |
| 10 | Worm calmodulin | 1ooj | Extended | 1–73 | 74–87 | 88–149 |
| 11 | Potato calmodulin | 1rfj | Extended | 1–73 | 74–87 | 88–149 |
| 12 | African frog calmodulin | 1cfd | Extended | 1–74 | 75–86 | 87–148 |
| 13 | Xenopus laevis calmodulin | 1dmo | Extended | 1–74 | 75–86 | 87–148 |
| 14 | Human calmodulin | 1cll | Extended | 1–73 | 74–87 | 88–148 |
| 15 | Paramecium calmodulin | 1clm | Extended | 1–74 | 75–86 | 87–148 |
| 16 | Yeast calmodulin | 1lkj | Extended | 1–73 | 74–87 | 88–146 |
| 17 | Rat calmodulin | 3cln | Extended | 1–73 | 74–87 | 88–148 |
| 18 | Drosophila calmodulin | 4cln | Extended | 1–73 | 74–87 | 88–148 |
| 19 | Chicken troponin | 4tnc | Extended | 1–66 | 67–95 | 96–162 |
| 20 | Turkey troponin | 5tnc | Extended | 1–65 | 66–97 | 98–162 |
Fig. 1. Fractal diagram of rat calmodulin (code entry 3CLN).
We have to mention here that for intervals larger than half of the total number of residues in the sequence, the accuracy of the determination is lower because, when we determine the length of the backbone, there is a single term in the sum given in Eq. (1). Taking this observation into account, we also note that the second and the third regions in Fig. 1 reflect the fact that the distance between the first residue of the protein and the residues within the linker and the beginning of the C-terminal domain increases monotonically. This reflects the tendency of the residues in this protein class to lie farther away from the N-terminal domain. It is in good agreement with structural data for calmodulins, which are known to present a long and almost straight helix between the two domains. For the rest of the C-terminal domain residues (from 110 to 148), the distance relative to the first residue decreases, reflecting their orientation toward the N-terminal domain.
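The loss of accuracy at large intervals can be seen by counting the terms that enter the sum in Eq. (1). The short Python sketch below assumes j is the integer part of N/m, as defined in the Method section, and takes N = 148 from the last residue listed for rat calmodulin in Table 1; the helper name `terms_in_sum` is hypothetical.

```python
def terms_in_sum(n_residues, m):
    """Number of terms j in Eq. (1): the integer part of N/m."""
    return n_residues // m

N = 148  # last residue listed for rat calmodulin (3CLN) in Table 1
for m in (1, 10, 74, 75, 147):
    print(m, terms_in_sum(N, m))
```

For every m above N/2 the count is 1, so L_m reduces to a single atom-to-atom distance rather than a sum over several segments.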
Fig. 2. Fractal diagram of the sandworm sarcoplasmic calcium binding protein (code entry 2SCP).
In the compact structure case, the fractal diagram may generally be divided into two regions (Fig. 2). The first region (marked with 1 in Fig. 2) corresponds to a maximum interval of about 40 amino acids belonging to the N-terminal domain. It is linear but shorter than that encountered for the extended structures. The second region (marked with 2 in Fig. 2) corresponds to the intervals of amino acids that take into account the linker and the C-terminal residues. This region is nonlinear and presents many peaks, which reflect an irregular evolution of the distance of these residues to the N-terminal residue, with no monotonic increase as observed for the extended structures. As expected, the distance between the last and first residues in the sequence is lower for the compact structures than for the extended ones. For the sandworm sarcoplasmic calcium binding protein the last value of log(L) is 3.13, which corresponds to a distance of 22.8 Å between the first and the last residue (Fig. 2). For the rat calmodulin, the last value of log(L) is 3.89, which corresponds to a distance of 48.9 Å.

We can also note two interesting results on the proteins studied here. Even though the bovine calmodulin (1prw) and the EhCABP (1jfk) are considered to have compact structures, their fractal diagrams (Fig. 3) closely resemble those corresponding to the extended structures (Fig. 1).

The fractal characteristics of the investigated protein backbones are presented in Table 2. It may be noted from this table that the fractal dimension for the first linear region is almost the same for all the investigated proteins, but the length of this region is greater for the extended proteins. The mean value of the slope of this line is 0.389 and the mean value of the associated fractal dimension is 1.66. These values are very close to other data presented in the literature [6–10].
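The correspondence quoted above between the last value of log(L) and the end-to-end distance can be checked directly. The sketch below assumes that log denotes the natural logarithm, as in step (ii) of the Method, and that the last point of each diagram contains a single term of Eq. (1); the dictionary name is illustrative only.

```python
import math

# Last log(L) values quoted in the text (natural logarithm assumed).
last_log_length = {"2SCP": 3.13, "3CLN": 3.89}

for code, value in last_log_length.items():
    distance = math.exp(value)  # single-term L_m, in angstroms
    print(f"{code}: {distance:.1f} A")
```

The exponentials agree with the quoted distances of 22.8 Å and 48.9 Å to within the rounding of log(L) to two decimals.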
Fig. 3. Fractal diagram for bovine calmodulin and EhCABP.
Table 2
Fractal characteristics of the proteins studied in this work

| Code entry | Limits of the first linear region in the diagram | Slope of the first linear region | Fractal dimension associated to the first linear region | Limits of the "well" in the diagram | Limits of the second linear region in the diagram | Slope of the second linear region | Fractal dimension associated to the second linear region |
|---|---|---|---|---|---|---|---|
| 1alv | 1–32 | 0.367 | 1.58 | 82–109 | – | – | – |
| 1bjf | 1–46 | 0.388 | 1.63 | 78–109 | – | – | – |
| 1dgu | 1–38 | 0.337 | 1.55 | 86–106 | – | – | – |
| 1nya | 1–40 | 0.430 | 1.75 | 82–109 | – | – | – |
| 2sas | 1–42 | 0.408 | 1.69 | 78–108 | – | – | – |
| 2scp | 1–52 | 0.440 | 1.78 | 82–106 | – | – | – |
| 1jba | 1–46 | 0.389 | 1.64 | 74–110 | – | – | – |
| 1jfk | 1–39 | 0.408 | 1.69 | 62–106 | 110–133 | 5.37 | 0.153 |
| 1prw | 1–68 | 0.425 | 1.74 | 68–104 | 115–147 | 3.814 | 0.207 |
| 1ooj | 1–60 | 0.390 | 1.64 | 66–106 | 113–141 | 3.104 | 0.243 |
| 1rfj | 1–65 | 0.370 | 1.58 | 68–101 | 113–144 | 2.672 | 0.272 |
| 1cfd | 1–68 | 0.384 | 1.62 | 69–108 | 110–146 | 3.109 | 0.243 |
| 1dmo | 1–67 | 0.399 | 1.66 | 71–106 | 113–146 | 3.458 | 0.223 |
| 1cll | 1–69 | 0.410 | 1.69 | 61–105 | 111–140 | 3.478 | 0.222 |
| 1clm | 1–63 | 0.417 | 1.71 | 83–106 | 113–144 | 4.404 | 0.185 |
| 1lkj | 1–62 | 0.373 | 1.59 | 72–108 | 113–143 | 2.794 | 0.263 |
| 3cln | 1–62 | 0.460 | 1.85 | 66–104 | 113–141 | 4.03 | 0.198 |
| 4cln | 1–65 | 0.412 | 1.72 | 71–108 | 113–146 | 4.201 | 0.192 |
| 4tnc | 1–68 | 0.405 | 1.68 | 69–101 | 115–159 | 2.350 | 0.285 |
| 5tnc | 1–65 | 0.408 | 1.69 | 69–110 | 113–159 | 2.700 | 0.270 |
4. Discussions and conclusions

In the studies published so far, the fractal diagrams of proteins generally present two linear regions: one for m smaller than 10, which reflects the local folding, and another for m larger than 10, which reflects the global folding [6–10]. In the case of the EF-hand proteins studied in this work, the fractal diagram of the extended structures presents three regions, of which only two are linear. The first linear region has a slope of about 0.38, similar to that observed in other studies, but in this case it extends towards higher m values, including all the amino acids from the N-terminal domain. This suggests that the N-terminal sequence constitutes a compact structural unit. For the compact structures, the m interval corresponds to a shorter sequence (about 40 amino acids). In this case we may speculate that the "local folding" excludes those amino acids of the N-terminal domain which have strong interactions with the C-terminal domain. The second region (Fig. 2), which presents many peaks, is considered to reflect the involvement of these amino acids in strong interactions between the two domains. Such a region is not present in the fractal diagrams corresponding to the extended structures, which show no contacts between the domains.

We support our observations by the fact that proteins are known to fold cooperatively [4]. Cooperative folding means that the behavior of the parts depends on each other and the overall behavior is a result of the properties of the entire system and not of the sum of the properties of its components. This cooperative nature arises from the fact that the residues come closer together during folding and interact with each other. For the extended structures, there are interactions only between residues belonging to the same domain, but in the case of compact structures there are also interactions between residues belonging to both domains and situated at their interface.
Analysis of the residues situated at the interface between the two domains shows that they are mainly hydrophobic, suggesting that their clustering inside the protein is the main driving force in protein folding.

Interesting results were obtained for EhCABP and bovine calmodulin. The tertiary structure of EhCABP was obtained using NMR experiments, and the structural file considered in this study (1jfk) is the minimum energy representative structure. It shows a compact spatial organization of EhCABP [17]. The coordinates in the 1jfk file were obtained by averaging the coordinates of 20 structural models comprised in the 1jfj file. The first model in this file (considered to be the best of the 20 NMR structures) clearly suggests a fractal profile characteristic of extended structures. With regard to the bovine calmodulin, whose spatial structure was determined using X-ray crystallography (1prw), experimental data also suggest a compact organization of the protein [5]. In spite of that, the fractal diagrams obtained within this study for these two proteins are similar to those corresponding to the extended
structures. Sequence alignment using CLUSTAL W [18] shows that there is a higher sequence similarity between these two proteins and extended EF-hand proteins than between them and compact proteins. The sequence identity of bovine calmodulin is higher than 60% with sequences of extended structures, while it is only 20% with sequences of compact structures. In the case of EhCABP, there is about 28% sequence identity with sequences of extended structures and only 15% with those of compact ones. Usually, proteins with high sequence identity possess structural and functional similarity. High sequence identity and low structural similarity may occur due to conformational plasticity, mutations and solvent effects [19]. Calmodulins are known to present unexpected dissimilarity between structure and sequence, which is explained by a highly flexible linker region, mutations in this linker or interactions with drugs [19]. Probably, these two proteins present low intrinsic propensities to form compact structures and they may be shifted easily towards extended forms.

These observations support the hypothesis that there is a strong dependence between the packing of the tertiary structure and the physicochemical properties of the amino acid sequences [3], and also that the solution environment of the protein can play an important role [19]. This last remark is supported by the fact that the crystallization conditions for bovine calmodulin (which has a compact tertiary structure) were different from the crystallization conditions for other calmodulins. This bovine calmodulin was crystallized in a solution containing an organic compound (glycerol) [5], while the extended calmodulin structures were obtained by crystallization in solutions with no organic compounds. In addition, the EhCABP structure was determined using the NMR method, in an aqueous solution.

The results presented in this paper reveal for the first time the fractal aspects of calcium binding protein structures.
The differences between the fractal diagrams of the extended and compact structures indicate that the mechanisms responsible for building the tertiary structure of the extended proteins are not the same as those responsible for building the compact proteins. Two general models have been developed to explain how the tertiary structure of a protein is encoded in its sequence [19]: (1) the local model, in which fold specificity is determined only by a few critical residues, and (2) the global model, in which the fold is obtained by interactions involving the entire sequence. The results presented here offer considerable support for the global model because: (i) folding of each domain of the extended structures implies interactions between residues which are close in sequence, but the global shape of these proteins is determined by electrostatic repulsions between the two domains [3]; (ii) folding of the compact structures implies interactions between residues belonging to different domains, some of them being situated far apart in the linear sequence of the protein. These results do not explain the molecular mechanism by which the calcium binding proteins adopt their tertiary structures, but they suggest the possibility of establishing theoretical models derived from the concepts of fractal geometry in order to describe the three-dimensional architecture of proteins.
Acknowledgements

This paper is a part of project 00864VK, Program Brancusi, Bilateral Collaboration Romania – France.
References

[1] Carafoli E. The calcium signaling saga: tap water and protein crystals. Nature 2003;4:326–32.
[2] Vijar-Kuman S, Kuman DV. Crystal structure of recombinant bovine neurocalcin. Nat Struct Biol 1999;6:80–8.
[3] Uchikoga N, Takahashi SY, Ke R, Sonoyama M, Mitaku S. Electric charge balance mechanism of extended soluble proteins. Protein Sci 2005;14:74–80.
[4] Kesnin O, Ma B, Rogale K, Gunasekaran K, Nussinov R. Protein-protein interactions: organization, cooperativity and mapping in a bottom-up: System Biology Approach. Phys Biol 2005;2:S24–35.
[5] Fallon JL, Quiocho FA. A closed compact structure of native Ca(2+)-calmodulin. Structure 2003;11:1303–7.
[6] Wang CX, Huang FH. Fractal study of tertiary structure of proteins. Phys Rev A 1990;41:7043–8.
[7] Dewey GT. Fractals in molecular biophysics. New York: Oxford University Press; 1997. p. 25–37.
[8] Daniel M, Baskar Sand, Latha MM. Fractal dimension and tertiary structure of proteins. Phys Script 1999;60:270–6.
[9] Isvoran A, Licz A, Unipan L, Morariu VV. Determination of the fractal dimension of the lysozyme backbone for three different organisms. Chaos, Solitons & Fractals 2001;12:757–60.
[10] Isvoran A. Describing some properties of adenylat kinase using fractal concepts. Chaos, Solitons & Fractals 2004;19:141–5.
[11] Lewis M, Rees DC. Fractal surfaces of proteins. Science 1985;230:1163–5.
[12] Krasnogorskaya N, Legushs EF, Tsvileneva NJ. Fractal structure theory application in crystallography researches. In: Proceedings of the 2nd international workshop on computer science and information technologies CSIT 2000, Ufa, Russia, vol. 3, 2000. p. 57–9.
[13] Shenzhen YT, Ioerger TR. Extracting fractal features for analyzing protein structure. In: 16th International conference on pattern recognition (ICPR'02), vol. 2, 2002. p. 20482.
[14] Tamburro AM, De Stradis A, D'Alessio L. Fractal aspects of elastin supramolecular organization. J Biomol Struct Dyn 1995;12:1161–72.
[15] Morariu VV, Isvoran A, Zainea O. A nonlinear approach to the structure–mobility relationship in protein main chains. Chaos, Solitons & Fractals, in press, doi:10.1016/j.chaos.2005.12.023.
[16] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucl Acids Res 2000;28:235–42.
[17] Atreya HS, Sahu SC, Bhattacharya A, Chary KV, Govil G. NMR derived solution structure of an EF-hand calcium-binding protein from Entamoeba Histolytica. Biochemistry 2001;40:14392–403.
[18] Higgins D, Thompson J, Gibson T, Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res 1994;22:4673–80.
[19] Gan HK, Perlow AR, Roy S, Ko Y, Wu M, Huang J, et al. Analysis of protein sequence/structure similarity relationships. Biophys J 2002;83:2781–91.