Biochemical and Biophysical Research Communications 380 (2009) 581–585
Stability and solubility of proteins from extremophiles
Richard B. Greaves, Jim Warwicker *
Faculty of Life Sciences, Michael Smith Building, University of Manchester, Oxford Road, Manchester M13 9PT, UK
Article info
Article history: Received 6 January 2009; Available online 29 January 2009
Keywords: Extremophiles; Thermostability; Solubility; Protein structure; Charge interactions; Sugar binding
Abstract
Charges are important for hyperthermophile protein structure and function. However, the number of charges and their predicted contributions to folded state stability are not correlated, indicating that more charge does not mean greater stability. The charge properties that distinguish hyperthermophile proteins also differentiate psychrophile proteins from mesophile proteins, but in the opposite direction and to a smaller extent. We conclude that charge number relates to solubility, whereas protein stability is determined by charge location. Most other structural properties are poorly separated over the ambient temperature range, apart from the burial of certain amino acids. Of particular interest are large non-polar sidechains that tend towards increased exposure in proteins evolved to function at higher temperatures. Looking at tryptophan in more detail, this increase is often located close to the termini of secondary structure elements, and is discussed in terms of a novel potential role in protein thermostabilisation.
© 2009 Elsevier Inc. All rights reserved.
* Corresponding author. Fax: +44 (0) 161 275 5082. E-mail address: [email protected] (J. Warwicker).
0006-291X/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.bbrc.2009.01.145
Among the habitats colonised by specially adapted organisms are environments of extreme salinity and of low temperature. Psychrophiles are adapted to life at low temperatures, typically defined as below 15 °C [1]. Such organisms are generally found in the Arctic, the Antarctic and the ocean depths, as well as in alpine and glacial conditions [2]. Halophiles function in conditions of extreme salinity (salt concentrations up to 5 M) such as those found in salterns and hypersaline lakes [3,4]. Moderate halophiles grow at salt concentrations of between 2% and 20% (0.3–3.4 M), whereas extreme halophiles grow at greater than 15% (2.6 M) salt [5].
It has been argued that proteins from psychrophiles exhibit heightened catalytic efficiency and also greater thermolability at room temperature than proteins from mesophiles and thermophiles [1,6]. Study of psychrophile proteins should be aligned with that of mesophile and thermophile proteins to extend the ambient temperature range. In previous work [7], we found that whereas ionisable group (charge) differences were extensive between proteins from hyperthermophiles and mesophiles, variation was not so clear for features based on atomic packing.
Several groups have investigated the nature and extent of stabilising interactions within proteins from halophiles and psychrophiles. The simplest approach is to compare directly the sequences or structures of proteins from halophilic (or psychrophilic) organisms with those of mesophile-derived homologues [6,8]. However, such studies are typically based on rather limited numbers of structures. Other work removes the homologue pair restriction, employing computational
methods to compare a larger number of sequences and structures [9,10], or to analyse entire genomes [4,5,11]. There has also been protein engineering aimed at investigating thermostability in psychrophile proteins [12]. Factors discriminating psychrophile proteins are, as expected, the reverse of those thought to account for the stability of hyperthermophile proteins [8–10].
The current work looks at discrimination between datasets of 20 protein structures from psychrophiles, 22 protein structures from halophiles, an expanded set (742) of proteins from non-extremophiles (mesophiles), 143 proteins from hyperthermophiles, and 147 proteins from moderate thermophiles. The use of structures and energy calculations allows us to investigate mechanisms of thermo-adaptation, adding to the observation of amino acid preferences [13,14]. Charge interactions and the substitution of charged for polar amino acids [15] are found to be key features, but it is evident that their number and predicted contribution to stability do not correlate. We investigate another intriguing finding, the relatively high exposure of some non-polar sidechains in proteins from organisms adapted to higher temperature. For tryptophan, this increased non-polar surface area is more evident close to the termini of secondary structure elements, consistent with the hypothesis that such effects could play a role in stability against unfolding.
Materials and methods
Datasets of extremophile and mesophile protein structures. We previously [7] obtained a list, culled at 25% sequence identity, of all monomeric mesophile proteins in the Protein Data Bank (PDB) [16]. From this list, a ‘290 set’ was made through pairing
with thermophile proteins [7]. The remainder of the monomer mesophile set (742 proteins) is now used to expand the mesophile data. Feature calculations gave very similar results for the 290 and 742 protein mesophile sets (e.g. Fig. 1). The original 290 set of mesophile proteins was used to establish a new set of 69 pairings with hyperthermophile protein homologues, with the conditions of BLAST [17] E-value < 10⁻² and chain length difference of ≤30 amino acids. Homologue pair sets were not made from the much smaller numbers of halophile and psychrophile proteins. A total of 22 halophile-derived protein chains (excluding transmembrane proteins) were returned from a cull at 25% sequence identity [18] of a list of monomeric proteins from the PDB. A similar procedure returned a list of 20 protein chains from psychrophilic organisms. None of the psychrophiles in our dataset live at temperatures more than 20 °C away from a mesophile classification.
Calculated properties. The methodology follows previous work [7]. GminN is the predicted contribution of ionisable groups to folded state stability, normalised to a per amino acid value, and is estimated with Debye-Hückel electrostatics (uniform relative dielectric of 78.4 and ionic strength of 0.15 M). The percentage of ionisable groups in proteins includes those most commonly ionised at neutral pH (Asp, Glu, His, Lys, Arg, N-t, C-t). The number of contacts per atom was calculated using a centre to centre contact distance of 6 Å, for non-hydrogen atoms. Solvent accessible surface area was calculated with a solvent probe radius of 1.4 Å. Statistical significance of differences between distributions of properties was calculated at the 5% level with t-tests.
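As a rough illustration of the Debye-Hückel term mentioned above, the sketch below sums screened Coulomb energies between point charges using the quoted dielectric (78.4) and ionic strength (0.15 M), then normalises per amino acid. It is not the authors' GminN procedure, whose details are given in [7]; the function names, the use of full unit charges at fixed coordinates, and the Debye length approximation (a 1:1 salt in water near 25 °C) are all assumptions made for this example.

```python
import math

COULOMB_KJ_MOL_A = 1389.35   # Coulomb constant, kJ/mol * Angstrom / e^2
RELATIVE_DIELECTRIC = 78.4   # uniform dielectric quoted in the text
IONIC_STRENGTH_M = 0.15      # ionic strength quoted in the text


def debye_length(ionic_strength_m):
    """Debye length in Angstrom; 3.04/sqrt(I) assumes a 1:1 salt in water near 25 C."""
    return 3.04 / math.sqrt(ionic_strength_m)


def pair_energy(q1, q2, distance):
    """Screened Coulomb energy (kJ/mol) between two point charges (in units of e)."""
    screening = math.exp(-distance / debye_length(IONIC_STRENGTH_M))
    return COULOMB_KJ_MOL_A * q1 * q2 * screening / (RELATIVE_DIELECTRIC * distance)


def charge_energy_per_residue(charges, n_residues):
    """Sum pair energies over all charge pairs and normalise per amino acid.

    charges: list of (charge, (x, y, z)) tuples, coordinates in Angstrom
    (a hypothetical input format chosen for this sketch).
    """
    total = 0.0
    for i, (qi, pos_i) in enumerate(charges):
        for qj, pos_j in charges[i + 1:]:
            total += pair_energy(qi, qj, math.dist(pos_i, pos_j))
    return total / n_residues
```

In this simplified picture an opposite-charge pair gives a negative (favourable) energy and a like-charge pair a positive one, so the per-residue total depends on where charges sit relative to each other and not only on how many there are.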
Results
Datasets
Our analysis gives the following sets and numbers of proteins, which have been culled at 25% sequence identity within each set: original mesophile (290); remaining mesophile (742); moderate thermophile (147); hyperthermophile (143); psychrophile (20); halophile (22). The psychrophile, halophile and larger mesophile (742) protein sets are new to this work. The two mesophile protein sets do not overlap, so that consistency of results can be judged from variation between them.
Charged group features
Fig. 1A shows the predicted contribution of ionisable group interactions to folded state stability, GminN, for the various datasets. As in other features studied, the smaller and larger mesophile protein sets give essentially the same results. The large separation of hyperthermophile proteins is recapitulated from previous work [7]. Halophile proteins are not significantly different from mesophile proteins (see also Table 1) and psychrophile proteins appear to have the smallest magnitude of GminN, although this is not significant at the 5% t-test level (Table 1). In Fig. 1B, comparing the proportions of ionisable groups, both the halophile and psychrophile protein sets now separate significantly (Table 1) from mesophile proteins, in the opposite sense to thermophile proteins. Of particular interest (Fig. 1C) is confirmation that changes in ionisable group proportion are not correlated with the predicted contribution of these groups to folded state stability. Fig. 2A and Table 1 demonstrate the significant separation of the various datasets using the ratio of accessible surface areas for charged and polar amino acids (CvP, [19]), including the psychrophile and halophile proteins. This ratio correlates with the proportion of ionisable groups (not shown).
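The proportion of ionisable groups compared in Fig. 1B can be approximated from sequence alone. The following is a minimal sketch, assuming the groups listed in the methods (Asp, Glu, His, Lys, Arg plus the two termini) and assuming that the percentage is taken relative to the number of residues; the text does not state the denominator, and the function name is hypothetical.

```python
IONISABLE_RESIDUES = set("DEHKR")  # Asp, Glu, His, Lys, Arg (one-letter codes)


def percent_ionisable(sequence):
    """Percentage of ionisable groups, relative to chain length (assumed denominator)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n_sidechain = sum(1 for residue in seq if residue in IONISABLE_RESIDUES)
    n_groups = n_sidechain + 2  # add the N-terminus and C-terminus
    return 100.0 * n_groups / len(seq)
```

A count of this kind captures charge number only; it says nothing about charge location, which is the quantity the energy calculation is sensitive to.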
It should be noted that, whilst CvP is an excellent discriminator of the datasets and leads to discussion of the role of charged groups [20], our analysis of both the ionisable group proportion and predicted contribution to stability suggests that charges have more than one role. Solubility (charge number) as well as folded state stability (charge location) should be considered.
Packing features
In Fig. 2B, the average contacts per atom show relatively small variation. The only significant separation at the 5% level is halophile from hyperthermophile proteins, with fewer contacts in the hyperthermophile set. These data support the view that packing differences in protein structures across the ambient temperature range are not clear-cut [21].
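The contacts-per-atom measure can be sketched as a simple neighbour count with the 6 Å centre to centre cutoff given in the methods. This is an illustrative version only: the function name and input format are hypothetical, each contacting pair is counted once for each of its two atoms, and covalently bonded neighbours are not excluded, since the text does not say how the authors treated them.

```python
import math


def contacts_per_atom(coords, cutoff=6.0):
    """Mean number of atoms within `cutoff` Angstrom of each atom.

    coords: list of (x, y, z) tuples for non-hydrogen atoms.
    """
    n_atoms = len(coords)
    if n_atoms == 0:
        raise ValueError("no atoms supplied")
    n_pairs = 0
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            if math.dist(coords[i], coords[j]) <= cutoff:
                n_pairs += 1
    # Every pair contributes one contact to each of its two atoms.
    return 2.0 * n_pairs / n_atoms
```

The double loop is quadratic in the number of atoms, which is adequate for single protein chains; a spatial grid would be the usual choice for larger systems.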
Fig. 1. Plots of cumulative distributions over proteins for charged group properties in mesophile and extremophile proteins. (A) Predicted ionisable group contribution to folded state stability per amino acid, GminN. (B) Proportion of ionisable groups. (C) Scatter plot of GminN versus proportion of ionisable groups.
Exposed non-polar surface area of amino acids
Composition and amino acid substitutions have been well studied (recent examples include [13,14]). Preferences for our structural datasets (not shown) are consistent with the previous work. Of note are negative correlations between organism growth temperature (OGT) and content for Asn, Gln, Ser and Thr, consistent with the CvP indicator. Although we do not see a large rise in Trp content with OGT, Trp shows interesting behaviour when non-polar exposed area is analysed. The average non-polar surface area of some large sidechains (notably Trp) is bigger in hyperthermophile proteins (Fig. 3). This is counter-intuitive, since one might expect more non-polar burial to increase stability at higher OGT. In contrast to the larger non-polar sidechains, Ala
Table 1
t-Test statistics for pairwise comparisons.
Fig.  Property                     Halo-M290  Halo-M742  Halo-hT  Halo-modT  Psych-M290  Psych-M742  Psych-hT  Psych-modT
1A    GminN (kJ/mole per residue)  0.633      1.403      3.907    1.116      1.010       1.878       4.566     1.503
1B    % ionisables                 3.477      2.975      9.437    4.425      2.916       2.538       7.349     3.708
2A    Polar ASA/Charged ASA        2.725      2.363      5.051    3.004      2.487       2.192       4.347     2.734
2B    Contacts per atom            1.709      0.453      2.947    0.536      0.587       0.713       1.492     0.768
Student’s t-tests are employed for the pairings of each property/dataset. These are applied to the underlying populations of each property, and given in bold where they are different at the 95% confidence level. Protein datasets are mesophile (M290, M742), moderate thermophile (modT), hyperthermophile (hT), halophile (halo), and psychrophile (psych).
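For readers who want to reproduce statistics of the kind listed in Table 1, a two-sample Student's t statistic can be computed as sketched below. The pooled-variance form is an assumption, since the text does not say whether equal variances were assumed, and the function name is hypothetical. Table 1 appears to report magnitudes, so the absolute value of the returned statistic would be the comparable quantity.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance (equal variances assumed).
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    std_error = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, dof
```

Significance at the 5% level is then judged by comparing |t| with the two-sided critical value for the returned degrees of freedom, which approaches 1.96 for large samples.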
Fig. 2. Cumulative distributions for packing, and the charged versus polar ratio, in mesophile and extremophile proteins. (A) Ratio of polar group ASA to charged group ASA. (B) Contacts per (non-hydrogen) atom.
and Pro show more burial for the proteins from higher growth temperature organisms. We find (Fig. 3) that halophile and psychrophile proteins are close to the mesophile protein datasets in their properties, emphasising the novelty of the result for hyperthermophile proteins. The observation that non-polar sidechains with no entropic penalty for burial (Ala, Pro) are more buried in hyperthermophile proteins is consistent with a stabilising contribution, i.e. non-polar burial without rotameric entropy cost. On the other hand, the specific exposure of large sidechains such as Trp, which in principle can bury a lot of non-polar surface area with relatively small rotameric entropy cost, is a surprise and is now examined more closely.
Analysis of tryptophan non-polar surface area by OGT and location
Fig. 4 shows the distribution of non-polar surface area for Trp residues in α-helix and β-strand elements of the hyperthermophile/mesophile protein datasets. In general there is little difference between the 290 and 742 mesophile protein sets. We next looked in more detail at where Trp residues with relatively high (>60 Å²) non-polar accessible surface area (ASA) were located relative to the ends of secondary structure elements, reasoning that Trp sidechain involvement in stabilising processes may be more associated with the termini of secondary structure elements. This analysis focussed on differences between just the homologue pairs of hyperthermophile and mesophile proteins (a total of 69 pairs). Proximity to the end of a secondary structure element is defined as within four amino acids for helix and within one amino acid for strand, where the termini are assigned from PDBsum [22]. As expected, the majority of Trp residues are in the lower ASA category (i.e. mostly buried). For more exposed residues in the 69 homologue pairs, 18.3% of Trp in the mesophile proteins and 24.8% in hyperthermophile proteins are above the 60 Å² non-polar ASA threshold. The differences seen therefore apply to fractions of the Trp population, as indicated in Fig. 4. A substantial part of the difference translates to an increased number of Trp residues in α-helices and near a helical terminus (4.4% of overall Trp for the mesophile protein homologues and 7.9% for thermophile proteins).
Discussion
Charges and amino acid composition in proteins from extremophiles
With regard to temperature variation, it is apparent that the number of charged groups in our datasets of structures is significantly lower for psychrophile proteins than for mesophile, moderate thermophile or hyperthermophile proteins (Fig. 1B). Lower OGT correlates with lower ionisable group proportion. This result fits with the well-known variation of charged vs polar group ratio [15,19].
Fig. 3. Non-polar surface area per residue for amino acids in mesophile and extremophile proteins.
Fig. 4. Cumulative plots of non-polar surface exposure for tryptophan residues in mesophile and hyperthermophile proteins. (A) α-helix. (B) β-sheet.
It is generally assumed that higher numbers of charged groups imply larger contributions to folded state stability, but our calculations indicate that this is not the case. Higher OGT does, on average, give both a higher proportion of ionisable groups and bigger GminN, but these features are not correlated (Fig. 1C). We infer that the increase in numbers of ionisable groups may be related to enhancing solubility at temperatures where aggregation tendency due to non-specific interaction of hydrophobic patches would be stronger. The halophile and psychrophile protein sets have lower numbers of charged groups and less predicted GminN contribution to folding stability. These observations fit with proteins for which electrostatic interactions are less important due to salt screening. Relative to other datasets, including psychrophile proteins, halophile proteins are more acidic, a well-known factor [3]. In structural terms we see no evidence that the specific arrangement of acidic groups increases folded state stability compared with ionisable groups in mesophile proteins (Fig. 1A), and the role of an acidic surface could relate to hydration properties in a dehydrating environment [23]. Most of the larger changes in amino acid composition seen in our protein structure datasets are related to charged residues and the CvP feature.
Packing
A basic measure of packing, the average number of contacts per atom (Fig. 2B), shows more similarity between the datasets in fractional terms than do the charge properties of Fig. 1. In addition, the only significant difference between halophile or psychrophile sets and other sets, at the 5% level, is between halophile and hyperthermophile proteins, and mesophile proteins have a similar distribution of contacts per atom to psychrophile proteins.
Nevertheless, it is possible that relatively small contact changes could lead to substantial folding energy consequences, and it is interesting that the most contacts (presumably favourable energetically) are apparent for proteins of the halophile set that lose electrostatic interactions due to both salt screening and a relatively small magnitude of (predicted) GminN (Fig. 1A). Equally, hyperthermophile proteins exhibit looser packing, perhaps suggesting that the drive for folding stability is counteracted by adaptation to cope with larger thermal fluctuations.
We observe increased non-polar ASA for aromatic amino acids in hyperthermophile proteins (Fig. 3), and in a specific study of tryptophan this increase is biased towards the termini of secondary structure elements, particularly α-helix. Carbohydrate binding sites are known to have a preference for aromatic residues, and tryptophan in particular [24]. Furthermore, analysis of the environment of N-glycosylation sites in protein structures reveals preferences for glycan–aromatic interactions and for sites that are adjacent to secondary structure termini, presumably for fold stabilisation [25]. This comparison is interesting, since many extremophiles use sugars as compatible solutes to ease osmotic stress, and there is evidence that such solutes also act to stabilise protein structure [26]. Many of the hyperthermophile proteins in the subset of 69 homologue pairs are from Thermotoga maritima, an organism within which organic solutes are known to accumulate [27]. An advantage of using the homologue pairs subset is that any functional associations with non-polar ASA (e.g. enzymes that act on sugar molecules) should approximately cancel between the mesophile and hyperthermophile proteins, so that differences are more likely to relate to structural constraints. An amylase from Pyrococcus woesei, for example, has many tryptophan residues, at varying exposures. The structure of this amylase shows at least two non-active-site sugar binding locations involving tryptophan, but with unknown roles in thermostabilisation [28]. We suggest that non-polar surface area could be used by some proteins in some hyperthermophilic organisms to bind (non-covalently) organic solutes as capping molecules that aid thermostabilisation, particularly close to the termini of secondary structural elements. This hypothesis can be tested with mutagenesis and could lead to a new strategy for engineering thermostability that has parallels to the introduction of metal ion binding sites.
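The criteria used in the tryptophan analysis (non-polar ASA above 60 Å², and proximity defined as within four residues of a helix terminus or one residue of a strand terminus) can be expressed compactly. The sketch below is illustrative only: the function names and the representation of a secondary structure element by its first and last residue numbers are hypothetical, and it assumes that residues flanking the element are also eligible, which the text does not specify.

```python
EXPOSED_THRESHOLD = 60.0                          # non-polar ASA in square Angstrom
TERMINUS_WINDOW = {"helix": 4, "strand": 1}       # residues, from the text


def near_terminus(position, start, end, element_type):
    """True if `position` lies within the window of either terminus of the element."""
    if element_type not in TERMINUS_WINDOW:
        raise ValueError("element_type must be 'helix' or 'strand'")
    distance = min(abs(position - start), abs(position - end))
    return distance <= TERMINUS_WINDOW[element_type]


def classify_trp(position, nonpolar_asa, start, end, element_type):
    """Return (is_exposed, is_near_terminus) for one Trp residue."""
    exposed = nonpolar_asa > EXPOSED_THRESHOLD
    return exposed, near_terminus(position, start, end, element_type)
```

Applying such a classification to each Trp in the paired hyperthermophile and mesophile structures would give fractions of the kind quoted in the results, provided the termini are taken from the same secondary structure assignment.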
Acknowledgments
We thank the UK BBSRC for funding.
References
[1] C. Gerday, M. Aittaleb, J.L. Arpigny, E. Baise, J.-P. Chessa, G. Garsoux, I. Petrescu, G. Feller, Psychrophilic enzymes: a thermodynamic challenge, Biochim. Biophys. Acta 1342 (1997) 119–131.
[2] D.N. Thomas, G.S. Dieckman, Antarctic sea ice – a habitat for extremophiles, Science 295 (2002) 641–644.
[3] J.K. Lanyi, Salt-dependent properties of proteins from extremely halophilic bacteria, Bacteriol. Rev. 38 (1974) 272–290.
[4] S. DasSarma, B.R. Berquist, J.A. Coker, P. DasSarma, J.A. Müller, Post-genomics of the model haloarchaeon Halobacterium sp. NRC-1, Saline Syst. 2 (2006) 3.
[5] Y.A. Goo, J. Roach, G. Glusman, N.S. Baliga, K. Deutsch, M. Pan, S. Kennedy, S. DasSarma, W.V. Ng, L. Hood, Low-pass sequencing for microbial comparative genomics, BMC Genomics 5 (2004) 3.
[6] E. Toyota, K.S. Ng, S. Kuninaga, H. Sekizaki, K. Itoh, K. Tanizawa, M.N.G. James, Crystal structure and nucleotide sequence of an anionic trypsin from chum salmon (Oncorhynchus keta) in comparison with Atlantic salmon (Salmo salar) and bovine trypsin, J. Mol. Biol. 324 (2002) 391–397.
[7] R.B. Greaves, J. Warwicker, Mechanisms for stabilization and the maintenance of solubility in proteins from thermophiles, BMC Struct. Biol. 7 (2007) 18.
[8] I. Korndörfer, B. Steipe, R. Huber, A. Tomschy, R. Jaenicke, The crystal structure of holo-glyceraldehyde-3-phosphate dehydrogenase from the hyperthermophilic bacterium Thermotoga maritima at 2.5 Å resolution, J. Mol. Biol. 246 (1995) 511–521.
[9] G. Gianese, P. Argos, S. Pascarella, Structural adaptation of enzymes to low temperatures, Protein Eng. 14 (2001) 141–148.
[10] G. Gianese, F. Bossa, S. Pascarella, Comparative structural analysis of psychrophilic and meso- and thermophilic enzymes, Proteins Struct. Funct. Genet. 47 (2002) 236–249.
[11] S.P. Kennedy, W.V. Ng, S.L. Salzberg, L. Hood, S. DasSarma, Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence, Genome Res. 11 (2001) 1641–1650.
[12] E. Narinx, E. Baise, C. Gerday, Subtilisin from psychrophilic Antarctic bacteria: characterization and site-directed mutagenesis of residues possibly involved in the adaptation to cold, Protein Eng. 10 (1997) 1271–1279.
[13] K.B. Zeldovich, I.N. Berezovsky, E.I. Shakhnovich, Protein and DNA sequence determinants of thermophilic adaptation, PLoS Comput. Biol. 3 (2007) e5.
[14] G. Saelensminde, O. Halskau Jr., R. Helland, N.-P. Willassen, I. Jonassen, Structure-dependent relationships between growth temperature of prokaryotes and the amino acid frequency in their proteins, Extremophiles 11 (2007) 585–596.
[15] K. Suhre, J.-M. Claverie, Genomic correlates of hyperthermostability, an update, J. Biol. Chem. 278 (2003) 17198–17202.
[16] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank, Nucleic Acids Res. 28 (2000) 235–242.
[17] S.F. Altschul, T.L. Madden, A.A. Schaffer, J.H. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.
[18] G. Wang, R.L. Dunbrack Jr., PISCES: a protein sequence culling server, Bioinformatics 19 (2003) 1589–1591.
[19] C. Cambillau, J.-M. Claverie, Structural and genomic correlates of hyperthermostability, J. Biol. Chem. 275 (2000) 32383–32386.
[20] E. Alsop, M. Silver, D.R. Livesay, Optimized electrostatic surfaces parallel increased thermostability: a structural bioinformatics analysis, Protein Eng. 16 (2003) 871–874.
[21] A. Karshikoff, R. Ladenstein, Proteins from thermophilic and mesophilic organisms do not differ in packing, Protein Eng. 11 (1998) 867–872.
[22] R.A. Laskowski, V.V. Chistyakov, J.M. Thornton, PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids, Nucleic Acids Res. 33 (2005) D266–D268.
[23] F. Frolow, M. Harel, J.L. Sussman, M. Mevarech, M. Shoham, Insights into protein adaptation to a saturated salt environment from the crystal structure of a halophilic 2Fe-2S ferredoxin, Nat. Struct. Biol. 3 (1996) 452–458.
[24] C. Taroni, S. Jones, J.M. Thornton, Analysis and prediction of carbohydrate binding sites, Protein Eng. 13 (2000) 89–98.
[25] A.-J. Petrescu, A.-L. Milac, S.M. Petrescu, R.A. Dwek, M.R. Wormald, Statistical analysis of the protein environment of N-glycosylation sites: implications for occupancy, structure folding, Glycobiology 14 (2004) 103–114.
[26] N. Borges, A. Ramos, N.D.H. Raven, R.J. Sharp, H. Santos, Comparative study of the thermostabilizing properties of mannosylglycerate and other compatible solutes on model enzymes, Extremophiles 6 (2002) 209–216.
[27] L.O. Martins, L.S. Carreto, M.S. Da Costa, H. Santos, New compatible solutes related to di-myo-inositol-phosphate in members of the order Thermotogales, J. Bacteriol. 178 (1996) 5644–5651.
[28] A. Linden, O. Mayans, W. Meyer-Klaucke, G. Antranikian, M. Wilmanns, Differential regulation of a hyperthermophilic α-amylase with a novel (Ca, Zn) two-metal by zinc, J. Biol. Chem. 11 (2003) 9875–9884.