Calculation of the standard molal thermodynamic properties of crystalline peptides



Available online at www.sciencedirect.com

Geochimica et Cosmochimica Acta 80 (2012) 70–91 www.elsevier.com/locate/gca

Douglas E. LaRowe a,*, Jeffrey M. Dick b,1

a School of Earth and Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA 30332, United States
b School of Earth and Space Exploration, Arizona State University, Tempe, AZ 85287, United States

Received 7 June 2011; accepted in revised form 22 November 2011; available online 7 December 2011

Abstract

To augment the relatively sparse set of thermodynamic data available for high molecular weight biopolymers, group additivity algorithms have been developed to estimate the heat capacity power function coefficients and standard molal thermodynamic properties of crystalline peptides in the many biogeochemical environments in which they are found. Group contributions for each coefficient and property, representing the 20 common amino acids plus 5-hydroxylysine and 4-hydroxyproline, were generated using the thermodynamic properties of crystalline amino acids, polypeptides and other organic compounds. These group contributions were in turn used to compute the thermodynamic properties of naturally occurring proteins that are found in a crystalline state in cells. The coefficients and properties of the model compounds, group contributions and proteins are tabulated. To demonstrate the uncertainty of the thermodynamic properties of the groups generated in this study, experimentally determined heat capacities and entropies of crystalline homopolypeptides and proteins taken from the literature have been compared to estimates of these quantities. Additionally, standard molal volumes for 24 amino acids have been recalculated in light of inconsistencies in an earlier analysis, and the standard molal thermodynamic properties of aqueous and crystalline methionine at 25 °C and 0.1 MPa have been reassessed. Calculations of this kind can be used to describe, in thermodynamic terms, the biogeochemical interactions of peptides and proteins across the broad range of environmental settings in which they are known to occur.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. Present address: Department of Earth Sciences, University of Southern California, Zumberge Hall of Science, 3651 Trousdale Pkwy, Los Angeles, CA 90089-0740, United States. E-mail address: [email protected] (D.E. LaRowe).
1 Present address: Western Australia Organic and Isotope Geochemistry Centre, Department of Chemistry, Curtin University of Technology, Perth, WA 6845, Australia.

0016-7037/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.gca.2011.11.041

1. INTRODUCTION

1.1. Overview

Quantifying the response of biomolecules to changing environmental conditions such as oxygen concentration, pH, temperature, pressure and water content is crucial to understanding the degradation of organic matter, the stability and thus functionality of biomacromolecules in living organisms, and the origin and early evolution of organic molecules. Towards this end, the thermodynamic properties of a large number of organic compounds, and the equation of state parameters required to calculate these properties as a function of temperature and pressure, have been reported in the literature (see LaRowe and Van Cappellen, 2011). Despite these efforts, there is a relative lack of internally consistent thermodynamic data and equation of state parameters for biopolymers such as nucleic acids and polysaccharides, even though they play critical roles in the biosphere. Notable exceptions include those for aqueous unfolded proteins (Amend and Helgeson, 2000; Dick et al., 2006) and aqueous nicotinamide coenzymes (LaRowe and Helgeson, 2006b). The goal of the present communication is to help close this gap by establishing a thermodynamic database for crystalline polypeptides and the corresponding properties of groups that can be used to estimate the thermodynamic properties of any molecule containing a crystalline peptide.

1.2. Crystalline and amorphous proteins in natural settings

Performing essential structural, catalytic and signaling functions, proteins and shorter polypeptides are omnipresent in the biosphere. While carrying out these roles, polypeptides can be fully or partially in the aqueous state, exist as amorphous solids, or take on a fully crystalline state analogous to minerals (Morris et al., 2009). Unfolded enzymes are probably among the most common fully aqueous polypeptide molecules. However, when enzymes fold into the complex three-dimensional structures they require to perform their catalytic functions, some of their interior amino acids can be effectively secluded from the solvent, i.e., from liquid water with a dielectric constant consistent with that of the bulk-phase medium. Amino acids that are not solvated by water in this way can be considered to be in the crystalline state (Klapper, 1971; Bello, 1978; Chothia, 1984; Murphy et al., 1990; Murphy and Gill, 1991). Proteins serving a structural function are likely to be in a solid phase that can be viewed as crystalline rather than amorphous, given the regular fibril patterns in which they are typically found (Morris et al., 2009; Shoulders and Raines, 2009). For instance, the protein families known as actin and tubulin, though soluble as monomers, can polymerize into long filaments; tubulin polymers form the microtubules that make up cytoskeletal structures such as cilia, flagella and centrioles (Wade, 2009), and cytoskeletal polymers of this kind occur even in bacteria (Löwe et al., 2004). Solid proteins are also critical components of extracellular matrices, organic complexes that provide structural support, adhesive sites and substrates for microbes (Wierzbicka-Patynowski and Schwarzbauer, 2003; Hynes, 2009).
Furthermore, many Bacteria and Archaea produce cell-surface proteins that form a crystalline or semi-crystalline lattice that is resilient to a wide range of temperature, pH and chemical conditions (Akca et al., 2002). On a larger scale, fibril-forming proteins such as collagens are major constituents of mammalian guts, connective tissues such as cartilage, tendons, ligaments and intervertebral disks, the cornea (Hassell and Birk, 2010) and skin (Di Lullo et al., 2002; Shoulders and Raines, 2009). Crystalline proteins give biomineralized structures such as bones, teeth, shells and spicules their strength and flexibility (Sudo et al., 1997; Weiner and Addadi, 1997; Perry and Keeling-Tucker, 2000; Margolis et al., 2006; Zhang and Zhang, 2006; Olszta et al., 2007; Killian and Wilt, 2008; Mann et al., 2010) and help control their nucleation, orientation and morphology (Campbell, 1999). Additionally, solid proteins constitute commonly used natural fibers such as wool and silk, and make up large proportions of hair, fingernails and toenails, feathers, hooves, horns, quills, beaks, claws, baleen, scales, viral capsids and some exoskeletons (Marshall et al., 1991; Brush, 1996; Alibardi, 2003, 2006).

Because of the vast quantities of polypeptides, both aqueous and crystalline, that have been produced over geologic time, some fossil proteins persist for long periods in a metastable state. For instance, because nucleic acids are so rarely preserved, fossil proteins preserved between biomineral layers can yield evolutionary information, and intracrystalline proteins, analogous to fluid inclusions that survive for millions of years, can provide information on fossil stratigraphy (Mitterer, 1993; Robbins et al., 1993; Mayer, 1994a,b; Hedges and Keil, 1995; Knicker and Hatcher, 1997; Mongenot et al., 2001; Riboulleau et al., 2002). The thermodynamic properties reported in the present communication can be used to better understand how chemical variables have influenced fossil protein preservation, because the potential for preservation or degradation of polypeptides in sediments is at least partly a function of amino acid composition (Knicker et al., 2001). The following text summarizes the relevant thermodynamic relations, data and techniques used to generate the properties of crystalline polypeptides.

2. APPROACH AND THERMODYNAMIC RELATIONS

2.1. Group additivity

To extend the relatively sparse thermodynamic data set available in the literature for crystalline peptides (Hutchens et al., 1969a,b; Domalski, 1972; Finegold and Kumar, 1981; Roles and Wunderlich, 1991; Diaz et al., 1992; Roles and Wunderlich, 1993; Zhang et al., 1996; Di Lorenzo et al., 1999; Drebushchak et al., 2008), the group additivity method for estimating thermodynamic properties of organic compounds (e.g., Benson and Buss, 1958; Benson, 1968; Domalski and Hearing, 1988, 1993; Amend and Helgeson, 1997b; Richard and Helgeson, 1998) has been adopted. In this approach, the contribution of a group of atoms in a molecule to the thermodynamic properties of the entire compound is determined. These groups can in turn be added together to estimate the thermodynamic properties of larger species.
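As a minimal sketch of this bookkeeping, assuming (as in the present study) that nearest-neighbor effects are ignored, a property of a species can be treated as a count-weighted sum of group contributions, and the contribution of a single group can be backed out of a reference species built from identical groups. The function names `group_sum` and `single_group_contribution` and the dictionary layout are illustrative assumptions; the numbers are the ΔG°f of aqueous benzene quoted in the next paragraph and the group volumes listed in Table 1, combined according to the leucine side-chain decomposition [leu] = 2[CH3] + [CH2] + [CH] used in Section 3.

```python
from typing import Mapping


def group_sum(counts: Mapping[str, float], contributions: Mapping[str, float]) -> float:
    """Estimate a property as the count-weighted sum of group contributions."""
    return sum(n * contributions[group] for group, n in counts.items())


def single_group_contribution(species_value: float, n_groups: int) -> float:
    """Back out one group's contribution from a species made of n identical groups."""
    return species_value / n_groups


# [CH_aro] from aqueous benzene, C6H6 = 6 [CH_aro]; dG_f(benzene, aq) = 133,800 J/mol
dg_ch_aro = single_group_contribution(133_800.0, 6)

# Standard molal volumes of crystalline groups (cm3/mol), as listed in Table 1
V_GROUPS = {"CH3": 16.5, "CH2": 15.3, "CH": 12.7}

# Leucine side chain assembled from smaller groups: [leu] = 2[CH3] + [CH2] + [CH]
v_leu_side_chain = group_sum({"CH3": 2, "CH2": 1, "CH": 1}, V_GROUPS)

print(f"[CH_aro] dG_f ~ {dg_ch_aro:.0f} J/mol")
print(f"[leu] V ~ {v_leu_side_chain:.1f} cm3/mol")
```

The same `group_sum` call works for any property (ΔH°f, S°, a, b) as long as every group in `counts` has an entry in the contribution table; a missing group raises a `KeyError` rather than being silently treated as zero.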
For instance, the group contribution to the Gibbs energy of formation of a carbon atom bonded to a hydrogen atom in an aqueous aromatic ring compound, or [CHaro], can be estimated by dividing the Gibbs energy of formation of aqueous benzene, an aromatic six-carbon hexagonal ring compound (C6H6), by 6. Schematically, this can be represented by

$$\Delta G^\circ_{f,[\mathrm{CH}_{aro}]} = \frac{\Delta G^\circ_{f,\,\mathrm{C_6H_6(aq)}}}{6} \qquad (1)$$

For a ΔG°f of benzene(aq) equal to 133,800 J mol−1 (Amend and Helgeson, 1997b), the [CHaro] group can be said to contribute 1/6 of the Gibbs energy of formation of benzene, or 22,300 J mol−1. The thermodynamic properties of this aqueous aromatic carbon–hydrogen group can then be used to estimate the thermodynamic properties of other species containing this group. This method has been applied successfully to a large number of organic species, beginning with the ideal gas state (Benson and Buss, 1958), then extended to condensed (Domalski and Hearing, 1988, 1993) and aqueous states (Cabani et al., 1981), and later to compounds more relevant to geochemistry and biogeochemistry (Amend and Helgeson, 1997c; Helgeson et al., 1998; Richard and Helgeson, 1998). Throughout the text, the abbreviations and chemical formulas of groups are enclosed in brackets, e.g., [AABB] for the amino acid backbone. It should be noted that the groups considered in this study are distinct from the types of groups defined in earlier studies (Benson and Buss, 1958; Domalski and Hearing, 1988, 1993; Richard and Helgeson, 1998), which take into account the effect that neighboring atoms have on the thermodynamic properties of groups. For instance, the contribution of a methyl group to the thermodynamic properties of aqueous xylene depends on the ring carbon to which it is attached:

[Structures of the three isomers of aqueous xylene, with ΔG°f = 126,630, 123,990 and 126,340 J mol−1]

It can be seen from the different Gibbs energies of formation of these three isomers of aqueous xylene (Plyasunova et al., 2004) that, although they have the same chemical formula and are composed of the same bond types, the proximity of the methyl groups on the benzene ring to one another affects, however slightly, the contribution that each methyl group makes to the ΔG°f of xylene. When a relatively robust thermodynamic dataset is available, these nearest-neighbor effects can be taken into account. However, because of the scarcity of thermodynamic data for crystalline polypeptides, the group properties developed in this study do not take nearest-neighbor effects into account. Although the uncertainty associated with ignoring the impact of one nearest neighbor versus another is difficult to quantify in the absence of a sufficiently large data set, the effect of polypeptide molecular structure on values of heat capacity is discussed further in Section 5.

2.2. Thermodynamic relations

The temperature and pressure dependence of the standard molal Gibbs energy and enthalpy of formation (ΔG°f and ΔH°f) and standard molal entropy (S°) of a given crystalline, liquid, ideal gas or aqueous organic compound can be expressed in terms of their apparent standard molal counterparts, which are defined by (Helgeson et al., 1978, 1998)

$$\Delta G^\circ_{T,P} = \Delta G^\circ_f - S^\circ_{T_r,P_r}(T - T_r) + \int_{T_r}^{T} C^\circ_{P_r}\,dT - T\int_{T_r}^{T} C^\circ_{P_r}\,d\ln T + \int_{P_r}^{P} V^\circ\,dP \qquad (2)$$

$$\Delta H^\circ_{T,P} = \Delta H^\circ_f + \int_{T_r}^{T} C^\circ_{P_r}\,dT + \int_{P_r}^{P} V^\circ\,dP - T\int_{P_r}^{P}\left(\frac{\partial V^\circ}{\partial T}\right)_P dP \qquad (3)$$

and

$$S^\circ_{T,P} = S^\circ_{T_r,P_r} + \int_{T_r}^{T} C^\circ_{P_r}\,d\ln T - \int_{P_r}^{P}\left(\frac{\partial V^\circ}{\partial T}\right)_P dP \qquad (4)$$

where ΔG°f, ΔH°f and S°Tr,Pr stand for the standard molal Gibbs energy and enthalpy of formation and the standard molal entropy at the reference temperature and pressure, Tr and Pr, of 298.15 K and 0.1 MPa; C°Pr refers to the standard molal isobaric heat capacity at the reference pressure; V° designates the standard molal volume at the temperature and pressure of interest; and (∂V°/∂T)P represents the standard molal thermal expansivity at constant pressure. In the absence of lambda phase transitions, values of C°Pr as a function of temperature for crystalline organic compounds, including amino acids, can be represented by the Maier–Kelley power function (Helgeson et al., 1998):

$$C^\circ_{P_r} = a + bT + cT^{-2} \qquad (5)$$

where a, b and c correspond to temperature-independent coefficients of the compound of interest (Maier and Kelley, 1932). Combining Eq. (5) with Eqs. (2)–(4) and integrating the heat capacity terms leads to

$$\Delta G^\circ_{T,P} = \Delta G^\circ_f - S^\circ_{T_r,P_r}(T - T_r) + a\left(T - T_r - T\ln\frac{T}{T_r}\right) - \frac{(c + bT_r^2 T)(T - T_r)^2}{2T_r^2 T} + \int_{P_r}^{P} V^\circ\,dP \qquad (6)$$

$$\Delta H^\circ_{T,P} = \Delta H^\circ_f + a(T - T_r) + \frac{b}{2}\left(T^2 - T_r^2\right) - c\left(\frac{1}{T} - \frac{1}{T_r}\right) + \int_{P_r}^{P} V^\circ\,dP - T\int_{P_r}^{P}\left(\frac{\partial V^\circ}{\partial T}\right)_P dP \qquad (7)$$

and

$$S^\circ_{T,P} = S^\circ_{T_r,P_r} + a\ln\frac{T}{T_r} + b(T - T_r) - \frac{c}{2}\left(\frac{1}{T^2} - \frac{1}{T_r^2}\right) - \int_{P_r}^{P}\left(\frac{\partial V^\circ}{\partial T}\right)_P dP \qquad (8)$$
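Eqs. (6)–(8) are simple to evaluate numerically. The sketch below assumes, as this study does in a first approximation, that V° is independent of temperature and pressure, so that the volume integral reduces to V°(P − Pr) and the (∂V°/∂T)P terms vanish; with V° in cm3 mol−1 and P in MPa, the product V°(P − Pr) is already in J mol−1. The function and parameter names are illustrative, energies are handled in J mol−1 rather than kJ mol−1, and the usage lines take the values for crystalline leucine from Table 5 (negative ΔG°f and ΔH°f, as for all amino acids).

```python
import math

T_R = 298.15  # reference temperature, K
P_R = 0.1     # reference pressure, MPa


def heat_capacity(T, a, b, c=0.0):
    """Maier-Kelley power function, Eq. (5), in J/(K mol)."""
    return a + b * T + c / T**2


def apparent_gibbs(T, P, dGf, S_r, a, b, c=0.0, V=0.0):
    """Apparent standard molal Gibbs energy of formation, Eq. (6), in J/mol.

    V is treated as a constant standard molal volume in cm3/mol (cm3 * MPa = J).
    """
    heat_terms = (a * (T - T_R - T * math.log(T / T_R))
                  - (c + b * T_R**2 * T) * (T - T_R) ** 2 / (2 * T_R**2 * T))
    return dGf - S_r * (T - T_R) + heat_terms + V * (P - P_R)


def apparent_enthalpy(T, P, dHf, a, b, c=0.0, V=0.0):
    """Apparent standard molal enthalpy of formation, Eq. (7), with dV/dT = 0."""
    return (dHf + a * (T - T_R) + b / 2 * (T**2 - T_R**2)
            - c * (1 / T - 1 / T_R) + V * (P - P_R))


def entropy(T, S_r, a, b, c=0.0):
    """Standard molal entropy, Eq. (8); with dV/dT = 0 it has no P dependence."""
    return (S_r + a * math.log(T / T_R) + b * (T - T_R)
            - c / 2 * (1 / T**2 - 1 / T_R**2))


# Crystalline leucine (Table 5): dGf = -356.7 kJ/mol, dHf = -646.8 kJ/mol,
# S = 211.8 J/(K mol), V = 115.1 cm3/mol, a = 28.01, b = 0.5539, c = 0
leu = dict(a=28.01, b=0.5539, c=0.0)
T, P = 350.0, 50.0  # K, MPa
print(apparent_gibbs(T, P, -356.7e3, 211.8, V=115.1, **leu))
print(apparent_enthalpy(T, P, -646.8e3, V=115.1, **leu))
print(entropy(T, 211.8, **leu))
```

A quick consistency check on any such implementation is that the temperature derivative of Eq. (6) equals −S° from Eq. (8), and that of Eq. (7) equals C°P from Eq. (5); at Tr and Pr all three functions return their reference-state inputs unchanged.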

As a first approximation, values of V° for the crystalline amino acids, peptides, amino acid and protein backbones and amino acid side chains considered in this study are taken to be equal to those of the standard molal volume at the reference temperature and pressure of 298.15 K and 0.1 MPa, V°Tr,Pr, because the standard molal volumes of minerals and organic crystals do not change appreciably over the range of temperatures and pressures encountered in the Earth's crust (Helgeson et al., 1978, 1998; Richard and Helgeson, 1998). The accuracy of this approximation is discussed further in Section 6. The standard state adopted in this study for solid species is one of unit activity of the pure stoichiometric solid at any pressure and temperature. Values of ΔG°f for nearly all groups and species considered in this study were calculated using

$$\Delta G^\circ_f = \Delta H^\circ_f - T\,\Delta S^\circ_f \qquad (9)$$

where ΔS°f refers to the standard molal entropy of formation, which is computed using

$$\Delta S^\circ_f = S^\circ_i - \sum m\,S^\circ_{\mathrm{elements}} \qquad (10)$$

The symbols S°i and S°elements in Eq. (10) refer to the standard molal entropy of the ith species and of the elements that constitute this species in their stable form at 25 °C and 0.1 MPa, respectively, and m represents the number of moles of a given element in one mole of the ith species. Values of S°elements were taken from Cox et al. (1989).

3. AMINO ACID BACKBONE AND FUNCTIONAL GROUPS

The amino acid backbone, [AABB], refers to the amine, α-carbon and carboxylic acid groups that constitute all amino acids without a functional group (also known as a side chain) attached. That is, an idealized structural formula for [AABB] is

[Idealized structural formula of [AABB]: not reproduced]

Idealized structures and chemical formulas of the functional groups considered in this study are shown in Fig. 1, along with the names of the corresponding amino acids. The amino acids 5-hydroxylysine and 4-hydroxyproline are not shown in this figure, but the atoms in lysine and proline are numbered such that the locations of the hydroxyl groups can be easily envisioned.

Fig. 1. Idealized structures and chemical formulas of the functional groups considered in this study and their corresponding amino acids.

Values of ΔH°f, S°, V°, a and b for the crystalline [AABB] group were calculated using a group additivity scheme written as

$$N_{[\mathrm{AABB}]} = N_{\mathrm{AA}_i} - N_{[R_i]} \qquad (11)$$

where N represents any thermodynamic property or parameter, and AAi and [Ri] stand for the ith amino acid and the functional group of that amino acid, respectively. Although group additivity contributions have already been determined for some of the amino acid functional groups, e.g., the [CH3] group that constitutes the side chain of alanine, this is not the case for many of the amino acids. In order to use Eq. (11) to determine the properties of [AABB], the properties of the functional groups must first be calculated. As an example, the group contributions of the leucine side chain, [leu], have not been reported in the literature, but they can be calculated from other groups that are available. The leucine side chain, as shown in Fig. 1, consists of two [CH3] groups, one [CH2] group and one [CH] group or, in the notation used in this manuscript, [leu] = 2[CH3] + [CH2] + [CH]. To distinguish groups such as [CH2] that are part of an algorithm used to calculate the properties of a side chain, [Ri], these groups will be designated by [ri]. To be clear, the distinction between [Ri] and [ri] is that the former is composed of the latter:

$$[R_i] = \sum [r_i] \qquad (12)$$

Values of N_AAi used in Eq. (11), except for V° for all amino acids and a, b and c for leucine, were taken from Helgeson et al. (1998). Although Helgeson et al. (1998) report values of V° for amino acids, these values were incorrectly calculated from crystallographic unit cell parameters reported elsewhere in the literature; apparently, the unit cell volumes were not correctly converted into molar units. Consequently, the standard molal volumes of the amino acids used in this study, listed in Table 1, were calculated from density data taken from the sources indicated in this table. These densities were themselves calculated by the authors of the cited references from measurements of the crystallographic properties of amino acids. Since such data are not available for all the amino acids considered in this study, the group additivity algorithms shown in Table 1 were used to estimate values of V° where necessary.

Table 1
Standard molal volumes and group additivity algorithms or sources of experimental data for selected crystalline amino acids, reference compounds and groups at 25 °C and 0.1 MPa.

Compound or [group] | V° (cm3 mol−1) | Source
[AABB] | 40.98 | See text
[Caro] | 12.9 | Richard and Helgeson (1998)
[CH] | 12.7 | Richard and Helgeson (1998)
[CH2] | 15.3 | Richard and Helgeson (1998)
[CH3] | 16.5 | Richard and Helgeson (1998)
[NH2] | 13.3 | Richard (personal communication)
[CHaro] | 10.5 | Richard and Helgeson (1998)
Alanine | 63.64 | Density data in Simpson and Marsh (1966)
Arginine | 113.0 | = (Arginine–HCl) − (Glutamic acid–HCl − Glutamic acid)
Arginine–HCl | 146.8 | Density data in Sridhar et al. (2002)
Asparagine | 79.60 | = Glutamine − [CH2]
Aspartic acid | 80.91 | Density data in Derissen et al. (1968)
Cysteine | 81.70 | Density data in Harding and Long (1968)
Deoxyribose | 84.36 | LaRowe and Helgeson (2006a)
Glutamic acid | 93.71 | Density data in Hirokawa (1955)
Glutamic acid–HCl | 128.7 | Density data in Dawson (1953)
Glutamine | 94.90 | Density data in Cochran and Penfold (1952)
Glycine | 46.63 | Density data in Marsh (1958)
Histidine | 118.4 | Density data in Madden et al. (1972)
Hydroxylysine | 133.3 | = Lysine + (Ribose − Deoxyribose)
Hydroxyproline | 95.34 | = Proline + (Ribose − Deoxyribose)
Indole | 98.78 | Density data in Roychowdhury and Basak (1975)
Isoleucine | 109.1 | Density data in Torii and Iitaka (1971)
Leucine | 115.1 | Density data in Harding and Howieson (1976)
Lysine | 123.2 | = [AABB] + 4[CH2] + [NH2]
Methionine | 110.9 | = Cysteine + [CH] + [CH3]
Phenylalanine | 129.8 | = [AABB] + [CH2] + [Caro] + 5[CHaro]
Proline | 85.28 | Density data in Wright and Cole (1949)
Ribose | 94.4 | LaRowe and Helgeson (2006a)
Serine | 68.38 | Density data in Shoemaker et al. (1953)
Threonine | 81.37 | Density data in Shoemaker et al. (1950)
Tyrosine | 128.5 | Density data in Mostad et al. (1972)
Tryptophan | 160.6 | = [AABB] + [CH] + Indole
Valine | 92.75 | Density data in Torii and Iitaka (1970)

aro, aromatic.

The values of a, b and c for leucine reported in Helgeson et al. (1998) were not adopted in this study for several reasons. Helgeson et al. (1998) regressed all of the heat capacity data for crystalline leucine shown in Fig. 2a with Eq. (5), yielding the following values for the heat capacity power function coefficients: a = 44.261 J K−1 mol−1, b = 1.1376 J K−2 mol−1 and c = 4,198,600 J K mol−1. However, it can be seen in this figure that the six highest-temperature C°P measurements depart from the linear trend observed for all of the other measurements in a manner that is consistent with a crystallographic lambda transition. Furthermore, values of C°P as a function of temperature for all of the other crystalline amino acids (Helgeson et al., 1998) and polypeptides (see below) for which data have been reported in the literature exhibit a linear trend. As a result, a value of c equal to 0 J K mol−1 was assumed for all of these compounds. In light of this, experimental values of C°P as a function of temperature for leucine have been regressed without the six highest-temperature points shown in Fig. 2a. The line shown in this figure corresponds to a linear fit of this truncated data set, resulting in the following values of the heat capacity power function coefficients for leucine: a = 28.01 J K−1 mol−1, b = 0.5539 J K−2 mol−1 and c = 0 J K mol−1.

[Fig. 2 plot not reproduced. Panels: (a) leucine, with the highest-temperature points marked "Not included in regression"; (b) leucylglycine and alanylglycine; (c) glycylglycine and glycylglycylglycine. Axes: C°P (J K−1 mol−1) versus temperature (K).]

Fig. 2. Standard molal heat capacities (C°P) of crystalline (a) leucine, (b) leucylglycine and alanylglycine, and (c) glycylglycine and glycylglycylglycine as a function of temperature at 0.1 MPa. The symbols denote values of C°P taken from (a) Hutchens et al. (1963), (b) Huffman (1941) and (c) Hutchens et al. (1969a) (glycylglycine) and Drebushchak et al. (2008) (diglycylglycine). The regression lines represent fits of Eq. (5) to the data.

Because values of N[Ri] cannot be calculated for all of the above-mentioned properties for all of the amino acid side chains considered in the present study, values of the thermodynamic properties and parameters of the amino acid backbone, N[AABB], were computed as averages over five amino acids and their respective functional groups: alanine, valine, leucine, isoleucine and phenylalanine. All five were used for every property and parameter except V°, for which phenylalanine was excluded because its own volume was estimated. The algorithms used to calculate values of N[AABB] are summarized in Table 2, and the resulting values of ΔH°f, S°, V°, a and b for [AABB] are given in Table 3.

Table 2
Summary of group additivity algorithms used to calculate values of heat capacity power function coefficients and standard molal thermodynamic properties at 25 °C and 0.1 MPa for [AABB].

Generic algorithm: N[AABB] = N_AAi − Σ N[ri]
= Alanine − [CH3]
= Valine − [CH] − 2[CH3]
= Leucine − [CH2] − [CH] − 2[CH3]
= Isoleucine − [CH2] − [CH] − 2[CH3]
= Phenylalanine − [CH2] − 5[CHaro] − [Caro] *

All [ri] values taken from Richard and Helgeson (1998). Those for [CH2] and [CH] are averages of values given for even and odd carbon number. The values for [Caro] correspond to those for the first group listed in Table 11 in Richard and Helgeson (1998).
* Not used to calculate V° for [AABB].

Because all but one of the groups taken from Richard and Helgeson (1998) that form part of the amino acid side chains, [ri], have non-zero values of c (the exception being [CHaro]), values of a and b for these groups were revised to reflect the observation that values of C°P for all of the crystalline amino acids and polypeptides reported in the literature exhibit linear trends as a function of temperature. This was accomplished by calculating values of C°P as a function of temperature using Eq. (5) and the values of a, b and c reported in Richard and Helgeson (1998), and fitting these C°P values with linear functions (c = 0 J K mol−1) from 240 to 400 K. The resulting linearized heat capacity power function coefficients for these groups reproduce the values of C°P generated from the original ones to within 1.5 J K−1 mol−1 at 298.15 K. These values of a and b are listed in Table 4. Similarly, values of ΔH°f, S°, V°, a and b for the amino acid functional groups, or side chains, [Ri], were calculated using

$$N_{[R_i]} = N_{\mathrm{AA}_i} - N_{[\mathrm{AABB}]} \qquad (13)$$

where values of N_AAi were taken from Helgeson et al. (1998), except for V° for all amino acids and a and b for leucine, as noted above, and ΔH°f for methionine, which was based on a suspect enthalpy of combustion (Tsuzuki et al., 1958). A different experimental value for ΔH°f of crystalline methionine has been adopted instead (Sabbah and Minadakis, 1981) and was used to recalculate ΔG°f using Eqs. (9) and (10). These values for crystalline methionine are listed in Table 5, along with revised values of ΔG°f, ΔH°f and S° for the neutral, positively charged and negatively charged aqueous species of methionine, which were reported in Amend and Helgeson (1997a) and Dick et al. (2006) and were based on the erroneous value of ΔH°f(cr). These changes do not affect calculated values of ΔG° of solution for methionine, but they do increase the values of ΔG°f for both crystalline and aqueous methionine. Though Sabbah and Minadakis (1981) do not discuss why the enthalpy of combustion they measured differs from that reported by Tsuzuki et al. (1958), the values of ΔH°f derived from these measurements can be compared to a value of ΔH°f for methionine estimated with the group additivity method used in this study. Although not all of the groups required to estimate the thermodynamic properties of crystalline methionine are available, a close approximation can be written as methionine = cysteine + 2[CH2]. It should be noted that the structural locations, and thus the bonding characteristics, of the sulfur atom and of a carbon and a hydrogen atom are not the same in cysteine and methionine (see Fig. 1). However, using the value of ΔH°f for cysteine in Table 5 and the average of the [CH2] groups reported in Richard and Helgeson (1998), this approximation yields an enthalpy of formation for methionine of −588.52 kJ mol−1. This compares favorably with the value reported by Sabbah and Minadakis (1981), −577.48 kJ mol−1, whereas ΔH°f for methionine derived from Tsuzuki et al. (1958) is −758.14 kJ mol−1. This approximation therefore supports using the ΔH°f for methionine reported by Sabbah and Minadakis (1981). Additionally, values of ΔG°f, ΔH°f and S° for the aqueous methionine functional group, [met], have been revised using Eq. (13) and the corresponding properties of aqueous [AABB] taken from Dick et al. (2006). These properties are also listed in Table 5.

Because the hydroxylated versions of proline and lysine, 4-hydroxyproline and 5-hydroxylysine, are common components of the ubiquitous crystalline protein collagen, the thermodynamic properties and parameters for these species have also been estimated.

Table 3
Summary of heat capacity power function coefficients and values of the standard molal thermodynamic properties at 25 °C and 0.1 MPa for crystalline amino acid functional, amino acid backbone and polypeptide backbone groups.

Group | ΔG°f (kJ mol−1) | ΔH°f (kJ mol−1) | S° (J K−1 mol−1) | V° (cm3 mol−1) | C°P (J K−1 mol−1) (d) | a (J K−1 mol−1) | b (J K−2 mol−1)
[ala] | 9.669 | 55.52 | 47.99 | 14.55 | 53.23 | −12.77 | 0.2213
[arg] | 120.5 | 116.4 | 169.4 | 63.93 | 164.3 | −14.26 | 0.5988
[asn] | 168.8 | 281.5 | 93.30 | 30.51 | 91.47 | −7.883 | 0.3332
[asp] | 369.6 | 466.1 | 88.91 | 31.82 | 86.18 | −7.602 | 0.3146
[cys] | 20.32 | 22.88 | 88.66 | 32.61 | 93.20 | −32.18 | 0.4205
[glu] | 370.6 | 502.5 | 107.0 | 44.62 | 106.0 | −7.376 | 0.3803
[gln] | 171.0 | 318.3 | 113.8 | 45.81 | 115.1 | −2.305 | 0.3937
[gly] | 17.16 | 30.00 | 22.30 | −2.460 | 30.24 | −22.02 | 0.1753
[his] | 154.6 | 40.71 | 159.2 | 69.27 | 152.7 | −18.69 | 0.5748
[ile] | 13.55 | 130.8 | 126.8 | 60.04 | 119.3 | −6.632 | 0.4225
[leu] | 3.63 | 139.6 | 130.6 | 65.97 | 124.2 | −8.904 | 0.4463
[lys] | 14.77 | 171.5 | 147.6 | 74.11 | 142.1 | −22.88 | 0.5532
[met] | 35.95 | 70.25 | 150.2 | 61.81 | 128.3 | 2.050 | 0.4233
[phe] | 149.2 | 40.30 | 132.4 | 80.71 | 134.0 | −28.10 | 0.5437
[pro] | 59.14 | 18.70 | 82.84 | 36.19 | 82.36 | −21.74 | 0.3492
[ser] | 155.0 | 225.5 | 67.95 | 19.29 | 66.55 | −7.623 | 0.2488
[thr] | 141.5 | 251.6 | 71.50 | 32.28 | 88.79 | −20.34 | 0.3660
[trp] | 241.4 | 92.18 | 169.8 | 111.5 | 169.0 | −27.83 | 0.6602
[tyr] | 38.52 | 177.9 | 132.8 | 79.41 | 147.5 | −27.55 | 0.5872
[val] | 1.644 | 110.7 | 97.65 | 43.66 | 99.84 | −5.674 | 0.3539
[hyd-pro] | 89.42 | 196.6 | 86.94 | 46.25 | 140.3 | −193.7 | 1.120
[hyd-lys] | 133.8 | 349.4 | 151.7 | 84.21 | 199.9 | −194.9 | 1.324
[AABB] | 355.8 | 502.9 | 80.51 | 45.92 | 69.00 | 36.92 | 0.1076
[PBB] | 101.7 | 187.1 | 53.95 | 40.98 | 32.94 | 39.19 | −0.02095

(d) Calculated using values of a and b and Eq. (5).

Table 4
Linearized values of the heat capacity power function coefficients for the (ribose − deoxyribose) difference and crystalline groups.

Group | a (J K−1 mol−1) | b (J K−2 mol−1)
[CH3] | 15.20 | 0.1469
[CH2] | 9.745 | 0.1088
[CH] | 41.22 | 0.08276
[CHaro] | 7.866 | 0.02684
(Ribose − deoxyribose) | 172.0 | 0.7711

aro, aromatic.

Table 5
Summary of heat capacity power function coefficients and values of the standard molal thermodynamic properties at 25 °C and 0.1 MPa for selected amino acids and peptides. Unless noted otherwise, values of a and b were regressed using the experimental data shown in Fig. 2.

Compound | Formula | ΔG°f (kJ mol−1) | ΔH°f (kJ mol−1) | S° (J K−1 mol−1) | C°P (J K−1 mol−1) | V° (cm3 mol−1) | a (J K−1 mol−1) | b (J K−2 mol−1)
Alanylglycine (cr) | C5H10N2O3 | −488.0 (f) | −776.72 (g) | 213 (h) | 182.4 (i) | 101.8 (j) | 42.82 | 0.4682
Cysteine (cr) | C3H7NO2S | −340.0 (f, k) | −530.1 (l) | 169.9 (l) | 162.2 (l) | 81.70 (m) | 4.732 (l) | 0.5281 (l)
Glycine (cr) | C2H5NO2 | −377.8 (l) | −537.2 (l) | 103.5 (l) | 99.24 (l) | 46.63 (n) | 14.90 (l) | 0.2829 (l)
Glycylglycine (cr) | C4H8N2O3 | −489.1 (f) | −746.89 (o) | 180.3 (p) | 163.8 (i) | 87.07 (q) | 35.44 | 0.4304
Diglycylglycine (cr) | C6H11N3O4 | −608.8 (f) | −965.7 (o) | 253.85 (r) | 229.19 (i) | 120.5 (s) | 43.943 | 0.6212
4-Hydroxyproline (cr) | C5H9NO3 | −449.8 (f) | −703.8 | 168.2 | 209.3 (i) | 95.34 | −156.8 | 1.228
5-Hydroxylysine (cr) | C6H14N2O3 | −494.1 (f) | −856.6 | 232.9 | 269.0 (i) | 132.5 | −158.0 | 1.432
Leucine (cr) | C6H13NO2 | −356.7 (l) | −646.8 (l) | 211.8 (l) | 193.2 (i) | 115.1 (t) | 28.01 | 0.5539
Leucylglycine (cr) | C8H16N2O3 | −470.1 (f) | −860.6 (o) | 281 (h) | 256.2 (i) | 153.6 (u) | 42.94 | 0.7154
Methionine (cr) | C5H11NO2S | −324.4 (f) | −577.48 (v) | 231.5 (w) | 197.3 (l) | 110.9 | 38.97 (l) | 0.53091 (l)
Methionine (aq) | C5H11NO2S | −322.0 (x) | −565.72 (y) | 262.8 (z) | | | |
(Methionine)+ (aq) | C5H12NO2S+ | −335.0 (aa) | −567.8 (bb) | 299.3 (z) | | | |
(Methionine)− (aq) | C5H10NO2S− | −269.4 (aa) | −521.62 (bb) | 234.3 (z) | | | |
[Met] (aq) (cc) | [C3H7S] | 33.10 | 66.95 | 170.8 | | | |

(f) Calculated from ΔH°f, S° and S° of the elements given in Cox et al. (1989).
(g) Diaz et al. (1992).
(h) Huffman (1941).
(i) Calculated from values of a, b and c and Eq. (5).
(j) Calculated from density data given in Koch and Germain (1970).
(k) Value in Helgeson et al. (1998) appears to be a typographical error.
(l) Helgeson et al. (1998).
(m) Calculated using density data in Harding and Long (1968).
(n) Calculated from density data given in Marsh (1958).
(o) Domalski (1972).
(p) Domalski and Hearing (1996).
(q) Calculated from density given in Hughes and Moore (1949).
(r) Drebushchak et al. (2008).
(s) Calculated from unit cell parameters given in Drebushchak et al. (2008).
(t) Calculated from density data in Harding and Howieson (1976).
(u) Estimated using V° = V°[PBB] + V°[AABB] + V°[leu] + V°[gly] and values of group volumes listed in Table 1.
(v) Sabbah and Minadakis (1981).
(w) Hutchens et al. (1964).
(x) Calculated from ΔG°f(cr) and a solubility measurement given in Cabani and Gianni (1986).
(y) Calculated from ΔH°f(cr) and ΔH°sol, given in Rodante (1989).
(z) Calculated from ΔG°f, ΔH°f and S° of the elements given in Cox et al. (1989).
(aa) Calculated from ΔG°f(aq) and ΔH°ioniz, given in Rodante (1989).
(bb) Calculated from ΔH°f(aq) and ΔH°ioniz, given in Rodante (1989).
(cc) Calculated using [met](aq) = methionine(aq) − [AABB](aq) and the corresponding properties for methionine(aq), given above, and [AABB](aq) from Dick et al. (2006).
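Footnote (f) of Table 5, which derives ΔG°f from ΔH°f, S° and the entropies of the elements, is a direct application of Eqs. (9) and (10). The sketch below repeats that calculation for crystalline methionine (C5H11NO2S), using ΔH°f from Sabbah and Minadakis (1981) and S° from Table 5. The elemental entropies are assumed to be the CODATA key values compiled by Cox et al. (1989), expressed per mole of atoms (graphite, rhombic sulfur, and half the entropy of H2, N2 and O2 gas); they should be checked against that compilation before reuse. The function names are illustrative.

```python
# Standard molal entropies of the elements in their stable forms at 298.15 K and
# 0.1 MPa, in J/(K mol) per mole of atoms. Assumed CODATA values (Cox et al., 1989).
S_ELEMENTS = {
    "C": 5.74,          # graphite
    "H": 130.680 / 2,   # 1/2 H2(g)
    "N": 191.609 / 2,   # 1/2 N2(g)
    "O": 205.152 / 2,   # 1/2 O2(g)
    "S": 32.054,        # rhombic sulfur
}

T_R = 298.15  # K


def entropy_of_formation(S_species, composition):
    """Eq. (10): dSf = S_i - sum(m * S_element), in J/(K mol)."""
    return S_species - sum(m * S_ELEMENTS[el] for el, m in composition.items())


def gibbs_of_formation(dHf_kJ, S_species, composition, T=T_R):
    """Eq. (9): dGf = dHf - T * dSf, returned in kJ/mol."""
    return dHf_kJ - T * entropy_of_formation(S_species, composition) / 1000.0


# Crystalline methionine: dHf = -577.48 kJ/mol, S = 231.5 J/(K mol)
methionine = {"C": 5, "H": 11, "N": 1, "O": 2, "S": 1}
dSf = entropy_of_formation(231.5, methionine)
dGf = gibbs_of_formation(-577.48, 231.5, methionine)
print(f"dSf = {dSf:.2f} J/(K mol), dGf = {dGf:.1f} kJ/mol")
```

With these assumed element entropies the result, about −324.4 kJ mol−1, agrees to rounding with the ΔG°f tabulated for crystalline methionine in Table 5.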
Because the ith hydroxylated amino acid differs from the ith amino acid by the replacement of a [CH2] group with a [HCOH] group, values of ΔH°f, S°, V°, a and b for crystalline 4-hydroxyproline and 5-hydroxylysine were calculated from the corresponding properties and parameters for proline and lysine plus the difference in these values between ribose and deoxyribose, $N_{\mathrm{ribose}}$ and $N_{\mathrm{deoxyribose}}$, two organic species that likewise differ by a [HCOH] group in place of a [CH2] group. Schematically,

$$\text{hydroxylated amino acid}_i - \text{amino acid}_i = \text{ribose} - \text{deoxyribose} \qquad (14)$$

and, rearranged in equation form,

$$N_{\mathrm{hydAA}_i} = N_{\mathrm{AA}_i} + \left(N_{\mathrm{ribose}} - N_{\mathrm{deoxyribose}}\right) \qquad (15)$$

where $N_{\mathrm{hydAA}_i}$ refers to the ith hydroxylated amino acid. These values are given in Table 5. The thermodynamic properties of ribose and deoxyribose required to evaluate Eq. (15) were taken from LaRowe and Helgeson (2006a). The corresponding properties of the functional groups of these hydroxylated amino acids, $N_{[R_i]}$, were calculated as described above, using Eq. (13). As with leucine and the [Ri] groups, the differences in the values of a, b and c between ribose and deoxyribose were linearized by regressing the difference in C°P between them as a function of temperature with a straight line (c = 0 J K mol⁻¹) over 260–400 K; the resulting differences are within 1.9 J K⁻¹ mol⁻¹ of those calculated using the original a, b and c parameters. The linearized values of a and b used in Eq. (15) are listed in Table 4. The values of ΔG°f shown in Tables 3 and 5 for 4-hydroxyproline and 5-hydroxylysine and [AABB] and [Ri] were calculated using the corresponding values of ΔH°f and S° and Eqs. (9) and (10), while the values of C°P at 25 °C and 0.1 MPa, also given in Tables 3 and 5, were calculated with the appropriate values of a and b and Eq. (5).

4. PEPTIDE BACKBONE

Naturally occurring proteins commonly consist of more than 100 amino acids joined by peptide bonds; the order of these amino acids is referred to as the primary structure of the protein. Some portions of a protein are not linear but are folded into secondary geometries, including helices and β-sheets (pleated, sheet-like patterns). These secondary structures can in turn be arranged so as to give proteins an even more complex tertiary structure. Furthermore, protein aggregates can form superstructures and microcompartments that carry out complex functions. A notable example is the carboxysome, a microcompartment found in cyanobacteria and some chemotrophs that enhances CO2 fixation (Yeates et al., 2008). To estimate the thermodynamic properties of polypeptides, the contribution of the repeating peptide unit, the feature common to all polypeptides, must be determined. Although secondary and tertiary structures can affect the properties of crystalline peptides, there are not enough data available in the literature to assess how they affect the thermodynamic properties of a given polypeptide; this topic is discussed further in Section 5. The approach used in this study requires the corresponding properties of the peptide backbone, [PBB], which is the equivalent of the amino acid backbone minus the [–H] and [–OH] groups.

Values of ΔH°f, S°, V°, a and b for the crystalline peptide backbone, $N_{[\mathrm{PBB}]}$, were calculated using

$$N_{[\mathrm{PBB}]} = N_{[\mathrm{AABB}]} + \frac{N_{P_i} - \sum N_{\mathrm{AA}_i}}{n-1} \qquad (16)$$

where $P_i$ refers to the ith polypeptide, the $\sum N_{\mathrm{AA}_i}$ term refers to the sum of the amino acids that comprise it, and n stands for the number of amino acids in the ith polypeptide. As an example of how the algorithm summarized by Eq. (16) works, let $P_i$ = alanylalanine, a dipeptide. From a group additivity perspective, alanylalanine = [AABB] + [PBB] + 2[ala]. This additivity scheme assigns alanylalanine one [PBB] rather than two because there is only one peptide bond in this compound: one set of structural [–H] and [–OH] groups is lost in the formation of this peptide bond. Thus, every polypeptide contains (n − 1) peptide bonds and one [AABB] group. The value of ΔH°f for [PBB] was calculated by averaging the results of applying Eq. (16) to values of ΔH°f for 12 polypeptides, listed in Tables 5 and 6, combined with data for the amino acids from Helgeson et al. (1998) and for [AABB]. Similarly, though with a smaller data set, values of S°, a and b for [PBB] were calculated using Eq. (16) and the average of the corresponding values for glycylglycine, diglycylglycine, leucylglycine and alanylglycine listed in Tables 5 and 6. Values of a and b for these di- and tripeptides were determined by regressing experimental heat capacity data as a function of temperature with Eq. (5); the regressions are shown in Fig. 2b and c, and the sources of these data are indicated in the caption. The value of V° for [PBB] was calculated using Eq. (16) and values of V° for glycine, glycylglycine, diglycylglycine and [AABB] taken from Tables 4 and 5. The value of ΔG°f shown in Table 3 for [PBB] was calculated using the corresponding values of ΔH°f and S° and Eqs. (9) and (10), while the value of C°P at 25 °C and 0.1 MPa, also given in Table 3, was calculated with the appropriate values of a and b and Eq. (5).

Table 6
Standard molal enthalpies of formation for selected peptides at 25 °C and 0.1 MPa.

Compound | Formula | ΔH°fᵃ
Dipeptides | |
Alanylalanine | C6H12N2O3 | −807.32ᵇ
Alanylphenylalanine | C12H16N2O3 | −712.1ᶜ
Glycylphenylalanine | C11H14N2O3 | −685.8ᶜ
Glycylvaline | C7H14N2O3 | −836.8ᶜ
Valylphenylalanine | C14H20N2O3 | −767.8ᶜ
Tripeptides | |
Glycylalanylphenylalanine | C14H19N3O4 | −928.8ᶜ
Leucylglycylglycine | C10H19N3O4 | −1086ᶜ
Tetrapeptide | |
Triglycylglycine | C8H14N4O5 | −1191ᶜ

a kJ mol⁻¹. b Diaz et al. (1992). c Domalski (1972).

5. ESTIMATING THERMODYNAMIC PROPERTIES OF POLYPEPTIDES

As alluded to in the previous section, the group additivity scheme adopted in this study represents the ith polypeptide, $P_i$, as consisting of one amino acid backbone unit, [AABB], (n − 1) peptide backbone units, [PBB], where n is the number of amino acids in the polypeptide, and the sum of all of the side chain functional groups, [Ri]. An idealized polypeptide structure containing n polymerized amino acids can be represented in this way, and values of ΔG°f, ΔH°f, S°, V°, a and b for the ith crystalline polypeptide are calculated using

$$N_{P_i} = N_{[\mathrm{AABB}]} + (n-1)\,N_{[\mathrm{PBB}]} + \sum_{i=1}^{n} N_{[R_i]} \qquad (17)$$

and the corresponding values of these properties and parameters for [AABB], [PBB] and [Ri], given in Table 3. Because this algorithm ignores nearest-neighbor effects and potential secondary and tertiary structural features, Eq. (17) should be regarded as an estimate of the sum of the thermodynamic properties of polymerized crystalline amino acids rather than of the properties of complex, three-dimensional proteins. However, because secondary and tertiary geometries are determined by the primary structure of a polypeptide, and these structures generally result from hydrogen bonds between amino acid functional groups, Eq. (17) captures the bulk of the features that contribute to the thermodynamic properties of crystalline polypeptides. This is illustrated in the first two subsections below, in which experimental values of C°P for crystalline homopolypeptides and proteins taken from the literature are compared with values of C°P estimated for the same polypeptides using Eq. (17) and the group contributions in Table 3. Values of S° for two crystalline proteins, derived from experimental values of C°P, are similarly compared with estimated values of S° for these compounds. In the last subsection, values of C°P and ΔG°f are tabulated for a number of proteins found in extant organisms.

5.1. Comparing experimental and estimated homopolypeptide C°P

Although few thermodynamic data are available for crystalline polypeptides, the heat capacities of a number of homopolypeptides have been reported in the literature. Homopolypeptides are polypeptides consisting of just one type of amino acid, or of a repeating set of amino acids. For example, polycysteine, a homopolypeptide, is a polymer of the amino acid cysteine only.
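Eqs. (16) and (17) are bookkeeping over group contributions, and a short Python sketch makes the counting explicit for a generic property N. The group values below are hypothetical placeholders, not the Table 3 values. The rule that a free amino acid equals [AABB] + [Ri] follows the relation used for [met] in footnote cc of Table 5. Building polypeptides with Eq. (17) and then applying Eq. (16) to them should recover the assumed [PBB] value, which gives a simple consistency check on the algebra.

```python
"""Sketch of the group additivity bookkeeping in Eqs. (16) and (17).

Group values are hypothetical placeholders for a generic property N
(e.g. an enthalpy in kJ/mol); they are NOT the values listed in Table 3."""
from statistics import mean
from typing import Mapping, Sequence


def polypeptide(groups: Mapping[str, float], side_chains: Sequence[str]) -> float:
    """Eq. (17): N_P = N[AABB] + (n - 1) N[PBB] + sum_i N[R_i]."""
    n = len(side_chains)
    if n < 1:
        raise ValueError("a polypeptide needs at least one residue")
    return groups["AABB"] + (n - 1) * groups["PBB"] + sum(groups[r] for r in side_chains)


def amino_acid(groups: Mapping[str, float], side_chain: str) -> float:
    """A free amino acid is one backbone [AABB] plus its side chain [R_i]."""
    return groups["AABB"] + groups[side_chain]


def pbb_from_polypeptide(n_p: float, n_aabb: float, amino_acids: Sequence[float]) -> float:
    """Eq. (16): N[PBB] = N[AABB] + (N_P - sum N_AA) / (n - 1)."""
    n = len(amino_acids)
    if n < 2:
        raise ValueError("Eq. (16) needs at least one peptide bond (n >= 2)")
    return n_aabb + (n_p - sum(amino_acids)) / (n - 1)


if __name__ == "__main__":
    g = {"AABB": -500.0, "PBB": -200.0, "ala": -60.0, "cys": -40.0}  # placeholders
    ala_ala = polypeptide(g, ["ala", "ala"])  # [AABB] + [PBB] + 2[ala]
    poly_cys = polypeptide(g, ["cys"] * 5)    # [AABB] + 4[PBB] + 5[cys]
    # Apply Eq. (16) to each "measured" polypeptide and average, as in the text.
    estimates = [
        pbb_from_polypeptide(ala_ala, g["AABB"], [amino_acid(g, "ala")] * 2),
        pbb_from_polypeptide(poly_cys, g["AABB"], [amino_acid(g, "cys")] * 5),
    ]
    print(ala_ala, poly_cys, mean(estimates))
```

With these placeholders the script prints `-820.0 -1500.0 -200.0`, so Eq. (16) recovers the assumed [PBB] value from both the dipeptide and the five-residue polycysteine. In the study, Eq. (16) was applied to measured properties of real polypeptides, so the individual estimates scatter and their average is used.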
In the group additivity model adopted in this study, a polycysteine molecule consisting of n cysteine functional groups would be represented by [AABB] + (n − 1)[PBB] + n[cys], where [cys] stands for cysteine's functional group. Experimental values of C°P for a number of homopolypeptides are shown in Fig. 3 as a function of temperature. The lines in these plots are regressions of these data using Eq. (5). The resulting a and b coefficients for each repeating unit in these polypeptides are listed in Table 7, along with the sources of the data. With the exception of poly(proline-glycine-proline), Fig. 3e, all of the C°P values refer to just one amino acid-peptide backbone combination,

[Fig. 3 image: panels (a)–(f) plot C°P (J K⁻¹ mol⁻¹) against TEMPERATURE (K) for the crystalline homopolypeptides named in the caption below.]

Fig. 3. Standard molal heat capacities (C°P) of crystalline (a) polymethionine, polyvaline and polyalanine, (b) β-sheet polyglycine, 3₁-helix polyglycine and polyglycine, (c) polyphenylalanine, polyasparagine and polyserine, (d) polytyrosine, polyhistidine and polyproline, (e) poly(proline–glycine–proline) and polyleucine and (f) polytryptophan as a function of temperature at 0.1 MPa. The symbols denote values of C°P taken from the sources listed in Table 7, and the regression lines represent fits of Eq. (5) to the data. The circular and square symbols in (f) refer to data from Di Lorenzo et al. (1999) and Roles et al. (1993), respectively. The solid circles in (b) are values of C°P not included in the regression.

Table 7
Summary of heat capacity power function coefficients and experimental and estimated values of C°P at 25 °C and 0.1 MPa for selected polypeptides.

Compound | Amino acid/polymerᵃ | Experimental C°Pᵇ,ᶜ | aᵇ | bᵈ | Estimated C°Pᵇ | Source of regressed C°P data
Poly-alanine | 265 | 93.33 | 19.97 | 0.246 | 86.17 | Roles and Wunderlich (1991)
Poly-asparagine | 91 | 138.7 | 20.67 | 0.396 | 119.1 | Roles et al. (1993)
Poly-glycine | 263 | 66.59 | 4.122 | 0.2095 | 63.18 | Roles and Wunderlich (1991)
Poly-glycine (β-sheet)* | 88–175 | 101.6 | 37.90ᵉ | 0.2138ᵉ | 63.18 | Finegold and Kumar (1981)
Poly-glycine (3₁-helix)* | 88–175 | 93.82 | 32.17ᵉ | 0.2056ᵉ | 63.18 | Finegold and Kumar (1981)
Poly-histidine | 320 | 147.1 | −20.47 | 0.5619 | 185.6 | Roles et al. (1993)
Poly-leucine | 1326 | 170.3 | 26.14 | 0.4835 | 157.1 | Roles et al. (1993)
Poly-methionine | 1220 | 174.7 | −25.80 | 0.6695 | 161.2 | Roles et al. (1993)
Poly-phenylalanine | 156 | 169.3 | −21.34 | 0.6393 | 167.0 | Roles et al. (1993)
Poly-proline | 453 | 115.9 | 9.773 | 0.3558 | 115.3 | Roles et al. (1993)
Poly-(proline-glycine-proline) | 21ᶠ | 299.2 | −3.040 | 1.014 | 293.8 | Roles and Wunderlich (1993)
Poly-serine | 68 | 113.3 | −9.267 | 0.4110 | 99.49 | Roles et al. (1993)
Poly-tryptophan | 183 | 191.7 | −47.20 | 0.8012 | 202.0 | Roles et al. (1993)
Poly-tryptophan | Not given | 207 | −4.84 | 0.711 | 202.0 | Di Lorenzo et al. (1999)
Poly-tyrosine | 766 | 184.2 | −29.46 | 0.7168 | 180.5 | Roles et al. (1993)
Poly-valine | 73 | 144.1 | 33.67 | 0.3703 | 132.8 | Roles and Wunderlich (1991)
Anhydrous bovine chymotrypsinogen A | 241ᵍ | 33,190 | 2268 | 103.7 | 30,550 | Hutchens et al. (1969a)
Bovine α-chymotrypsinogen type II | 241ᵍ | 34,780 | 6590 | 94.55 | 30,550 | Zhang et al. (1996)
Bovine β-lactoglobulin | 162ᵍ | 28,550 | −3499 | 107.5 | 22,370 | Di Lorenzo et al. (1999)
Bovine β-lactoglobulin | 162ᵍ | 24,980 | 4324 | 69.28 | 22,370 | Zhang et al. (1996)
Chicken ovalbumin | 385ᵍ | 60,070 | −2779 | 210.8 | 51,860 | Di Lorenzo et al. (1999)
Ovalbumin | 385ᵍ | 55,710 | 4848 | 170.6 | 51,860 | Zhang et al. (1996)
Bovine pancreatic ribonuclease A | 124ᵍ | 17,530 | 2883 | 49.12 | 16,490 | Zhang et al. (1996)
Anhydrous bovine zinc insulin dimer | 102ᵍ | 14,390 | 1255 | 44.07 | 13,900 | Hutchens et al. (1969a)
Chicken lysozyme | 129ᵍ | 18,580 | 532.5 | 60.52 | 17,160 | Di Lorenzo et al. (1999)
Horse myoglobin | 154ᵍ | 22,150 | 4093 | 60.57 | 21,000 | Di Lorenzo et al. (1999)

For the homopolypeptides, all values are given per mole of repeating unit, or functional group; for the rest, per mole of protein. a Calculated from the average molecular weight of the polymer. b J K⁻¹ mol⁻¹. c Calculated from Eq. (5) and values of a, b and c. d J K⁻² mol⁻¹. e See text. f Number of (proline–glycine–proline) triplets per molecule. g Polymer length and composition taken from the active, final form of each enzyme according to the UniProt knowledgebase available from the Swiss Institute of Bioinformatics website: http://uniprot.org; the enzyme names used in the respective papers are retained here to help readers locate the original data. * a and b determined by regressing values of C°P plotted from a function given in the indicated reference.

[Ri] + [PBB]. The heat capacities of all of these molecules, including helical polyglycine, could be represented by linear forms of Eq. (5), i.e., with c = 0 J K mol⁻¹. Only the higher-temperature heat capacity values for helical polyglycine, the open circles in Fig. 3b, were included in its regression, because the crystal structure of this compound is likely to undergo a broad lambda transition from at least 150 K to about 250 K. Furthermore, there is no experimental evidence that values of C°P for proteins containing helical structures deviate strongly from a linear trend as a function of temperature. Values of C°P at 298.15 K for all the polypeptides shown in Fig. 3 have been calculated per repeating polymer unit using the corresponding values of a and b and Eq. (5), and are listed in Table 7. In the group additivity scheme adopted here, the repeating polymer unit consists of the peptide backbone unit [PBB] and a functional group [Ri]. For example, the repeating polymer unit in polycysteine, [–cys–], is defined by [–cys–] = [PBB] + [cys], which in general form for other homopolypeptides is

$$[\text{–AA–}] = [\mathrm{PBB}] + [R_i] \qquad (18)$$

where [–AA–] stands for any amino acid residue. To test the accuracy of the group additivity algorithm adopted in this study, the experimental heat capacities of the repeating homopolypeptide units, $C^\circ_{P,[\text{–AA–}]}$, are compared with estimated values for these repeating units, $C^\circ_{P,[\text{–AA–}]\,\mathrm{est}}$, at 25 °C and 0.1 MPa. These estimated values of C°P were calculated using the groups listed in Table 3 and

$$C^\circ_{P,[\text{–AA–}]\,\mathrm{est}} = C^\circ_{P,[R_i]} + C^\circ_{P,[\mathrm{PBB}]} \qquad (19)$$

These values are also listed in Table 7 and are compared with the experimentally derived heat capacities in Fig. 4, which shows that the estimated values of C°P for the repeating amino acid units in these polymers are similar to, and in general less than, the experimental values. It is remarkable that the estimates are as close to the experimental values as they are, given that the samples used in these experimental studies are not well characterized. Specifically, in the experiments carried out to measure the heat capacities of these homopolypeptides (Finegold and Kumar, 1981; Roles and Wunderlich, 1991, 1993; Roles et al., 1993), some of the samples contained residual salts, all samples contained some water, with variable amounts driven off during sample preparation, and the lengths of the homopolypeptides were calculated from the average molecular weights of the compounds used in the experiments. Furthermore, none of the polypeptide structures were determined, except for β-polyglycine and helical polyglycine, even though crystalline homopolypeptides are known to adopt an array of structures, including linear, β-sheet and helical (Finegold and Kumar, 1981; Roles and Wunderlich, 1991, 1993; Roles et al., 1993). The effects of sample impurities and characterization are illustrated in Fig. 3f, where values of C°P for polytryptophan from two different studies are plotted as a function of temperature. The value of C°P at 25 °C from Di Lorenzo et al. (1999), 207 J K⁻¹ mol⁻¹, exceeds the value reported in Roles et al. (1993), 191.7 J K⁻¹ mol⁻¹, by 15.3 J K⁻¹ mol⁻¹ (7.4% of the larger value). By comparison, the estimated C°P for polytryptophan generated in this study, 202.0 J K⁻¹ mol⁻¹, lies between these two values. The polypeptides listed in Tables 5 and 6, rather than the homopolypeptides, were chosen as the model compounds used to generate the groups in Table 3 because they were better characterized and contained fewer impurities.

[Fig. 4 image: Estimated C°P versus Experimental C°P (both in J K⁻¹ mol⁻¹, 50–300) at 25 °C and 0.1 MPa; the 3₁-helix and β-sheet polyglycine points are labeled.]
Fig. 4. Comparison of experimental and estimated standard molal heat capacities (C°P) of the crystalline polypeptides shown in Fig. 3 and listed in Table 7 at 25 °C and 0.1 MPa. The line is a 1:1 reference showing that the estimated values tend to be lower than the experimental values.
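Because c = 0 for all of these regressions, each experimental C°P in Table 7 is simply a + b × 298.15 K, and the same numbers give the signed percent error defined in footnote a of Table 8. The short Python sketch below checks a few homopolypeptide rows of Table 7 in this way. The row selection is arbitrary and the script is only an illustration. Note that a is negative for several of the rows, and those negative values are what reproduce the tabulated C°P.

```python
"""C_P at 298.15 K from the linear (c = 0) form of Eq. (5), using a and b
from Table 7, plus the percent error definition of Table 8 (footnote a)."""

T_REF = 298.15  # K

# (name, a [J/K/mol], b [J/K^2/mol], estimated C_P at 25 C [J/K/mol]) from Table 7.
# Several rows have negative a; a + b*T then matches the tabulated C_P.
ROWS = [
    ("poly-alanine", 19.97, 0.246, 86.17),
    ("poly-glycine", 4.122, 0.2095, 63.18),
    ("poly-histidine", -20.47, 0.5619, 185.6),
    ("poly-tryptophan (Roles et al., 1993)", -47.20, 0.8012, 202.0),
    ("poly-tryptophan (Di Lorenzo et al., 1999)", -4.84, 0.711, 202.0),
]


def cp_linear(a: float, b: float, t: float = T_REF) -> float:
    """Eq. (5) with c = 0: C_P = a + b*T."""
    return a + b * t


def percent_error(cp_expt: float, cp_est: float) -> float:
    """Table 8, footnote a: 100 * (expt - est) / expt; positive = underestimate."""
    return 100.0 * (cp_expt - cp_est) / cp_expt


if __name__ == "__main__":
    for name, a, b, est in ROWS:
        expt = cp_linear(a, b)
        err = percent_error(expt, est)
        print(f"{name:42s} expt {expt:7.2f}  est {est:7.2f}  error {err:6.1f} %")
```

For poly-alanine this gives about 93.31 J K⁻¹ mol⁻¹ (Table 7: 93.33, the difference reflecting rounding of a and b) and an error of roughly 7.7%. Poly-histidine gives a negative error because its estimate exceeds the experimental value.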

5.2. Comparing experimental and estimated protein C°P and S°

Experimental values of C°P for a number of crystalline proteins are shown in Fig. 5 as a function of temperature. The solid lines in these plots are regressions of these heat capacity data using Eq. (5). The resulting a and b coefficients for these proteins are listed in Table 7, along with the sources of the data. All of the C°P values for these polypeptides are simple linear functions of temperature over the temperature range considered here, so c was set to 0 J K mol⁻¹ in Eq. (5). The dashed lines represent estimates of C°P for each of these proteins calculated using Eqs. (5) and (17), the values of a and b for the groups listed in Table 3 and the composition of the proteins according to the UniProt knowledgebase, available from the Swiss Institute of Bioinformatics website: http://uniprot.org. The estimated values of a and b yield values of C°P as a function of temperature that are similar to, though consistently lower than, the experimental values. This deviation could arise because the estimation algorithm accounts only for the types of amino acids in the folded polypeptides, not for their secondary and tertiary structural features. Furthermore, as with the homopolypeptides, the crystalline proteins used in the experimental determinations of C°P contain unknown impurities, including water molecules, depending on the source of the protein and its preparation prior to experimentation. For instance, the values of C°P for chymotrypsinogen, β-lactoglobulin and ovalbumin shown in Figs. 5a–c have each been measured in two separate studies, and the two sets of values differ from one another by an amount similar to the difference between the estimated and experimental values. The percent errors between the estimated values of C°P for proteins at 25 °C and 0.1 MPa, C°P,Pi,est, and the experimental values at the same pressure and temperature, C°P,Pi,expt, are tabulated in Table 8.

[Fig. 5 image, panels (a)–(d): C°P (kJ K⁻¹ mol⁻¹) against TEMPERATURE (K) for chymotrypsinogen (Hutchens et al., 1969; Zhang et al., 1996), β-lactoglobulin (Di Lorenzo et al., 1999; Zhang et al., 1996), ovalbumin (Di Lorenzo et al., 1999; Zhang et al., 1996) and lysozyme (Di Lorenzo et al., 1999).]
Fig. 5. Standard molal heat capacities (C°P) of crystalline (a) chymotrypsinogen, (b) β-lactoglobulin, (c) ovalbumin, (d) lysozyme, (e) ribonuclease, (f) myoglobin and (g) insulin from different organisms as a function of temperature at 0.1 MPa. The symbols refer to experimental data taken from the indicated sources, while the solid lines represent fits of these data using Eq. (5). The dashed lines refer to estimated values of C°P generated in this study.

Perhaps most important in explaining the discrepancies between the estimated and measured heat capacities of the proteins listed in Table 7 is the extent to which they form secondary structures such as helices and β-sheets, which the algorithm adopted in this study does not take into account. Secondary structures in proteins, which are formed through non-covalent bonds, mostly hydrogen bonds, are very common. Stickle et al. (1992) estimate that the average protein has 1.1 intramolecular hydrogen bonds per amino acid and conclude that there is a significant network of hydrogen bonds, mostly in helices and β-sheets. Although the energetic contribution of hydrogen bonds in the solid phase cannot be determined (Steiner, 2002), it

cannot be ignored that this network of secondary bonding structures will have some impact on the thermodynamic properties of polypeptides. Table 8 shows that the proteins for which the percent error of the heat capacity estimate is low tend to have a lower proportion of amino acids in β-sheets. This is shown graphically in Fig. 6. Although the correlation is not sharp, there is a general trend in which the estimated values of C°P, whose group contributions are taken from molecules without secondary structure, are closer to the experimental values for proteins containing relatively few amino acids in the β-sheet configuration. This suggests that the systematic underestimates of C°P for proteins are due, at least in part, to the contribution of the secondary structure of the proteins, in particular β-sheets. It should be noted that the anhydrous bovine pancreatic insulin protein is a dimer that contains a Zn atom. The values of a and b for Zn required to calculate C°P for this protein were taken to be equal to those of elemental, metallic zinc (Kelley, 1934). Because only one zinc atom occurs

[Fig. 5 (continued) image: panels (e) ribonuclease (Zhang et al., 1996), (f) myoglobin (Di Lorenzo et al., 1999) and (g) insulin (Hutchens et al., 1969a), C°P (kJ K⁻¹ mol⁻¹) against TEMPERATURE (K).]

Table 8
Summary of percent error in estimated values of heat capacity for selected polypeptides at 25 °C and 0.1 MPa and their secondary structural composition.

Polypeptide | % error C°Pᵃ | % AA in helicesᵇ | % AA in β-sheetsᵇ | % AA in secondary structures
Chymotrypsinogenᶜ | 8.0 | 9.96 | 35.7 | 45.7
Chymotrypsinogenᵈ | 12 | 9.96 | 35.7 | 45.7
Lactoglobulinᵉ | 22 | 16.7 | 50.6 | 67.3
Lactoglobulinᵈ | 10 | 16.7 | 50.6 | 67.3
Ovalbuminᵉ | 14 | 27.5 | 39.5 | 67
Ovalbuminᵈ | 6.9 | 27.5 | 39.5 | 67
Ribonucleaseᵈ | 5.9 | 21 | 41.1 | 62.1
Insulinᶜ | 3.4 | 24.5 | 0 | 24.5
Lysozymeᵉ | 7.6 | 31.8 | 9.3 | 41.1
Myoglobinᵉ | 5.2 | 74 | 0 | 74

AA stands for amino acid. a Calculated using % error = 100 × (C°P,Pi,expt − C°P,Pi,est)/C°P,Pi,expt. b Calculated from structural data taken from the Swiss Institute of Bioinformatics website: http://uniprot.org. c Hutchens et al. (1969a). d Zhang et al. (1996). e Di Lorenzo et al. (1999).

per molecule of this protein, which contains 102 amino acids and has a molecular weight of 11,544 g mol⁻¹, any uncertainty associated with using elemental zinc to represent the group contribution of Zn to the C°P of crystalline anhydrous bovine pancreatic insulin, rather than a Zn group derived from a polyatomic compound, is trivial compared to the uncertainty arising from using the groups in Table 3 and Eq. (17) to estimate the thermodynamic contributions of the amino acids to the properties of anhydrous bovine pancreatic insulin. For instance, the measured heat capacity of this protein is 14,390 J K⁻¹ mol⁻¹ at 298.15 K and 0.1 MPa, and that of elemental zinc is 25.47 J K⁻¹ mol⁻¹, 0.18% of the value for the protein. The percent error between the estimated and measured heat capacities of anhydrous bovine pancreatic insulin at 298.15 K and 0.1 MPa is 3.4%. As a result, even if the C°P used to represent Zn in this protein differed by 50% from that of elemental Zn, its effect on the uncertainty in C°P would still be minuscule.

[Fig. 6 image: % error between C°P,Pi,exp. and C°P,Pi,est. at 25 °C and 0.1 MPa (0–20%) plotted against % AA in β-sheets (0–60%).]
Fig. 6. Percent error of estimated values of C°P for the proteins shown in Fig. 5 as a function of the percentage of amino acids in those proteins that occur in β-sheets. See Tables 7 and 8. The line is a least-squares correlation line.

The heat capacities of two of the proteins discussed above, chymotrypsinogen and insulin, have been measured over a temperature range broad enough that their standard molal entropies have also been determined. In both cases the values of these entropies at 25 °C and 0.1 MPa were derived from experimental C°P measurements per gram of protein (Hutchens et al., 1969a), but the authors of that study had only estimates of the amino acid composition, and thus of the molecular weights, of these polypeptides (Brown et al., 1955; Dayhoff and Eck, 1968). Using compositional data from the UniProt knowledgebase, more precise values of S° for these proteins have been computed. The resulting revised values of S° for anhydrous bovine chymotrypsinogen A and insulin

Table 9
Standard deviations of group contributions for crystalline [AABB] and [PBB], and representative uncertainties associated with the amino acids used to calculate values of [Ri] in Table 3.

Species | ΔH°fᵃ | S°ᵇ | C°Pᵇ | V°ᶜ
Amino acids | ≤±2100ᵈ | ≤±2.1ᵈ | ≤±0.8ᵈ | –
[AABB] | ±11,460 (2.26%) | ±5.4 (1.29%) | ±9.4 (13.2%) | ±3.4 (6.8%)
[PBB] | ±3190 (1.72%) | ±6.0 (11.1%) | ±4.7 (15.6%) | ±2.5 (6.0%)

The parenthetical numbers give the uncertainties as percentages of the accepted values. a J mol⁻¹. b J K⁻¹ mol⁻¹. c cm³ mol⁻¹. d Helgeson et al. (1998).

Table 10
Summary of heat capacity power function coefficients and values of the standard molal thermodynamic properties at 25 °C and 0.1 MPa for selected proteins.

No. | Protein | Formula | Length | ΔG°fᵃ | ΔH°fᵃ | S°ᵇ | V°ᶜ | C°Pᵇ,ᵈ | aᵉ | bᶠ
1 | FTSZ1_METJA | C1717H2873N459O535S13 | 364 | −62371.3 | −134276 | 55631.1 | 7268.0 | 47900.3 | 9670.00 | 128.221
2 | FTSZ2_METJA | C1785H2973N499O545S16 | 380 | −61489.7 | −136242 | 57955.2 | 7510.1 | 49858.1 | 10059.14 | 133.486
3 | CSG_METJA | C2555H4032N640O865S14 | 530 | −105199 | −208857 | 80868.2 | 10646.6 | 69814.8 | 13964.47 | 187.321
4 | RBL_METJA | C2151H3412N574O627S16 | 425 | −67880.0 | −153640 | 67420.4 | 8847.3 | 58246.3 | 10906.60 | 158.777
5 | SLAP_GEOSE | C5676H9113N1489O1863S3 | 1198 | −215983 | −449700 | 177841.0 | 23551.5 | 155404.9 | 30521.57 | 418.851
6 | CSOA_HALNC | C427H712N128O131S3 | 97 | −14434.7 | −32550.0 | 13997.1 | 1798.9 | 12092.7 | 2575.99 | 31.918

The length refers to the number of polymerized amino acids per protein. a kJ mol⁻¹. b J K⁻¹ mol⁻¹. c cm³ mol⁻¹. d Calculated from the values of a and b using Eq. (5). e J K⁻¹ mol⁻¹. f J K⁻² mol⁻¹.

[Fig. 7 image: four panels plotting C°P and S° (J K⁻¹ mol⁻¹), V° (cm³ mol⁻¹) and ΔH°f (kJ mol⁻¹) against the number of glycines per molecule.]

Fig. 7. Standard molal (a) heat capacities (C°P), (b) entropies (S°), (c) volumes (V°) and (d) enthalpies of formation (ΔH°f) of crystalline glycine and polyglycine compounds. The lines represent linear fits to these data, which were taken from Tables 5 and 6.

are 34,080 and 15,190 J K⁻¹ mol⁻¹. As with the C°P values for homopolypeptides and proteins, values of S° for these two proteins have been estimated using Eq. (17), values of S° for the groups listed in Table 3 and the compositions of these proteins taken from the UniProt knowledgebase. The estimated values of S° for chymotrypsinogen and insulin are 35,200 and 15,910 J K⁻¹ mol⁻¹, respectively, slightly more positive (by 3.3% and 4.8%, respectively) than their experimentally determined counterparts. Again, the samples used in the experiments contained impurities that could explain part of this difference, and the groups used in the estimation algorithm do not take into account the secondary structural features of the folded proteins. As with the values of a and b for insulin, the value of S° for metallic, elemental zinc was used in the molecular weight and S° calculations (Cox et al., 1989).

5.3. Estimating protein C°P and ΔG°f

Using the approach summarized by Eq. (17), the thermodynamic properties in Table 3 and compositional data taken from the UniProt knowledgebase, heat capacity power function coefficients and standard molal thermodynamic properties at 25 °C and 0.1 MPa have been computed for six proteins and are listed in Table 10. The prokaryotic organisms from which these proteins originate and their functions are summarized in Table 11. These proteins were chosen because, with the exception of RBL_METJA, a protein involved in carbon fixation in a hyperthermophile, they have structural roles, a common function of crystalline proteins. RBL_METJA was chosen not only for the role it plays in the carbon cycle but also as a point of comparison for the other proteins. Five of the six proteins listed in Tables 10 and 11 are from thermophiles or hyperthermophiles, and the sixth, CSOA_HALNC, is from a halophile. Additionally, CSG_METJA and SLAP_GEOSE are proteins that exist on the outer surfaces of organisms (Akca et al., 2002) and are thus in direct contact with their geochemical environments. Values of the standard molal heat capacities and Gibbs energies of formation of the six proteins in Tables 10 and 11 are plotted as a function of temperature in Fig. 8. These values of C°P and ΔG°f were calculated using the thermodynamic properties shown in Table 10, Eqs. (5) and (6) and the CHNOSZ software package (Dick, 2008). The values of C°P and ΔG°f in panels (a) and (b) of Fig. 8 are given per mole of protein, and those in (c) and (d) are normalized per amino acid in the respective proteins. Rather than abbreviated names, the numbers on the curves in these plots identify the proteins as listed in Table 10. Not surprisingly, the values of C°P, Fig. 8a, are lowest for the smallest protein, CSOA_HALNC, and highest for the largest protein, SLAP_GEOSE, with the rest falling in between in order of size. A similar, though inverted, trend is seen in Fig. 8b for values of ΔG°f. However, the values of C°P and ΔG°f that are normalized per amino acid

[Fig. 8 image: (a) C°P (kJ K⁻¹ (mol protein)⁻¹), (b) ΔG°f (MJ (mol protein)⁻¹), (c) C°P (J K⁻¹ (mol residue)⁻¹) and (d) ΔG°f (kJ (mol residue)⁻¹), each plotted against TEMPERATURE (°C) from 0 to 140 °C, with curves numbered 1–6 as in the caption.]

Fig. 8. Standard molal heat capacities (C°P), (a) and (c), and Gibbs energies of formation (ΔG°f), (b) and (d), as a function of temperature for the six proteins listed in Tables 10 and 11. The values in (a) and (b) are given per mole of protein, and those in (c) and (d) are normalized per amino acid residue in the respective proteins. The curves are numbered 1 – FTSZ1_METJA, 2 – FTSZ2_METJA, 3 – CSG_METJA, 4 – RBL_METJA, 5 – SLAP_GEOSE and 6 – CSOA_HALNC.
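The temperature dependence plotted in Fig. 8 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the CHNOSZ implementation: it assumes that Eq. (5) is the power function C°P = aT^b suggested by the "heat capacity power function coefficients" a and b used in this paper, and that Eq. (6) is the usual apparent Gibbs energy of formation at the reference pressure, ΔG°(T) = ΔG°f(Tr) − S°(Tr)(T − Tr) + ∫C°P dT − T∫(C°P/T) dT. The function names and the coefficients in the usage lines are hypothetical placeholders, not values from Table 10.

```python
# Sketch: C°P(T) and apparent ΔG°f(T) for an ASSUMED power-function heat capacity,
# C°P = a * T**b. The closed-form integrals below require b != 0 and b != -1.

T_REF = 298.15  # reference temperature Tr, K


def heat_capacity(t_k: float, a: float, b: float) -> float:
    """C°P in J K-1 mol-1 for the assumed power function a * T**b."""
    return a * t_k ** b


def entropy(t_k: float, s_ref: float, a: float, b: float, t_ref: float = T_REF) -> float:
    """S°(T) = S°(Tr) + integral of (C°P / T) dT from Tr to T, in closed form."""
    return s_ref + a / b * (t_k ** b - t_ref ** b)


def gibbs_apparent(t_k: float, dgf_ref: float, s_ref: float, a: float, b: float,
                   t_ref: float = T_REF) -> float:
    """Apparent standard Gibbs energy of formation (J mol-1) at T and the reference pressure.

    ΔG°(T) = ΔG°f(Tr) - S°(Tr)(T - Tr) + ∫C°P dT - T ∫(C°P/T) dT
    """
    int_cp = a / (b + 1.0) * (t_k ** (b + 1.0) - t_ref ** (b + 1.0))
    int_cp_over_t = a / b * (t_k ** b - t_ref ** b)
    return dgf_ref - s_ref * (t_k - t_ref) + int_cp - t_k * int_cp_over_t


if __name__ == "__main__":
    # Hypothetical coefficients and reference values, for illustration only.
    a, b = 2.0, 0.9
    dgf_ref, s_ref = -500_000.0, 300.0  # J mol-1 and J K-1 mol-1
    for t_c in range(0, 141, 20):  # 0 to 140 °C, the range plotted in Fig. 8
        t_k = t_c + 273.15
        cp = heat_capacity(t_k, a, b)
        dg_kj = gibbs_apparent(t_k, dgf_ref, s_ref, a, b) / 1000.0
        print(f"{t_c:4d} °C  C°P = {cp:8.1f} J/K/mol  ΔG° = {dg_kj:9.2f} kJ/mol")
```

Differentiating gibbs_apparent with respect to temperature returns −S°(T) from entropy(), which is a quick check on the two closed-form integrals. For b = 0 or b = −1 one of the integrals becomes logarithmic, so those cases would need separate handling.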

Table 11
Identification numbers, source organisms and functions of the proteins shown in Table 10.

Nameᵃ | Accession no.ᵇ | Organism | Protein common name | Function(s)
FTSZ1_METJA | Q57816 | Methanocaldococcus jannaschii | Cell division protein | Prokaryotic homologue of tubulin
FTSZ2_METJA | Q58039 | Methanocaldococcus jannaschii | Cell division protein | Prokaryotic homologue of tubulin
CSG_METJA | Q58232 | Methanocaldococcus jannaschii | Cell surface glycoprotein | Protein/cell recognition
RBL_METJA | Q58632 | Methanocaldococcus jannaschii | RuBisCOᶜ | Carbon fixation
SLAP_GEOSE | P35825 | Geobacillus stearothermophilus | Surface-layer protein | Protection; mechanical stability
CSOA_HALNC | P45689 | Halothiobacillus neapolitanus | Carboxysome shell protein | Structural; carbon fixation

ᵃ Swiss-Prot name for protein.
ᵇ Unique identification for proteins.
ᶜ Ribulose-1,5-bisphosphate carboxylase/oxygenase.

in each protein (Fig. 8c and d) do not follow the same progression as those calculated per mole of protein. This reflects the different proportions of amino acids in these proteins and the different amounts that each polymerized amino acid contributes to the thermodynamic properties of the entire protein. Although beyond the scope of this study, the calculated values of ΔG°f for crystalline proteins shown in Fig. 8d could be combined with aqueous values of ΔG°f for the same proteins to estimate their standard Gibbs energies of solvation. Such calculations could help clarify the energetics of precipitation reactions in various subcellular compartments as well as in abiotic settings involving polypeptides.
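The per-protein and per-residue quantities discussed above can be illustrated with a short sketch. Eq. (17) is not reproduced in this section, so the form used here is an assumption to be checked against it: a protein property is taken as one amino acid backbone group [AABB], (n − 1) peptide backbone groups [PBB] and one side-chain group [Ri] for each of the n residues, which reduces to [AABB] + [Ri] for a free amino acid. The group values are hypothetical placeholders, not the Table 3 values, and the solvation helper simply takes the aqueous-minus-crystalline difference mentioned in the text. All function names are illustrative.

```python
# Sketch of a group-additivity estimate and its per-residue normalization.
# ASSUMED form (check against Eq. (17)): X = [AABB] + (n - 1)[PBB] + sum_i n_i [R_i]

# Hypothetical group contributions (NOT the Table 3 values), e.g. S° in J K-1 mol-1.
HYPOTHETICAL_GROUPS = {
    "[AABB]": 100.0,
    "[PBB]": 40.0,
    "[Gly]": 5.0,
    "[Ala]": 35.0,
}


def protein_property(composition: dict[str, int], groups: dict[str, float]) -> float:
    """Sum group contributions for a protein given residue counts, e.g. {"Gly": 3}."""
    n = sum(composition.values())
    if n < 1:
        raise ValueError("composition must contain at least one residue")
    total = groups["[AABB]"] + (n - 1) * groups["[PBB]"]
    for residue, count in composition.items():
        total += count * groups[f"[{residue}]"]
    return total


def per_residue(value: float, composition: dict[str, int]) -> float:
    """Normalize a per-protein property by the number of residues."""
    return value / sum(composition.values())


def solvation_gibbs(dgf_aqueous: float, dgf_crystalline: float) -> float:
    """Gibbs energy of taking the protein from the crystal into solution: ΔG°f(aq) - ΔG°f(cr)."""
    return dgf_aqueous - dgf_crystalline


composition = {"Gly": 3, "Ala": 2}  # hypothetical pentapeptide
total = protein_property(composition, HYPOTHETICAL_GROUPS)
print(total, per_residue(total, composition))  # 345.0 69.0
```

Under this assumed form the per-residue value equals [PBB] + ([AABB] − [PBB])/n plus the composition-weighted average of the [Ri] groups, so it is governed mainly by the residue proportions rather than by protein size.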

6. UNCERTAINTIES

The combined uncertainties associated with the measurements and group contributions of ΔH°f, S°, V° and C°P at 25 °C and 0.1 MPa for the crystalline amino acids, the amino acid backbone group [AABB] and the peptide backbone group [PBB] are shown in Table 9. The uncertainties for the amino acids are listed because of their importance in calculating the groups representing the amino acid functional groups, [Ri]. The uncertainties for ΔH°f, S° and C°P of the amino acids were taken directly from Helgeson et al. (1998). No uncertainty is given for V° because nearly all of the V° values for the amino acids were calculated in the present study from density measurements


D.E. LaRowe, J.M. Dick / Geochimica et Cosmochimica Acta 80 (2012) 70–91

and their molecular weights. Any uncertainty in these V° values would stem from experimental error, which was not reported in the sources of the density data. The uncertainties shown in Table 9 for the thermodynamic properties of [AABB] and [PBB] were determined by calculating the standard deviation of the values obtained with the different model-compound algorithms used to calculate these properties, as discussed in Sections 3 and 4. Representative uncertainties for the a and b parameters generated in this study are particularly difficult to assess. Examination of the fits of Eq. (5) in Fig. 2 reveals that the associated values of a and b closely approximate the C°P values of these compounds as a function of temperature. The uncertainties in a and b for [AABB] and [PBB] are best summarized by the uncertainty given in Table 9 for C°P, because most of the C°P values for these groups are based on the heat capacities of the amino acids given in Helgeson et al. (1998). In addition to the close agreement between the estimated and measured values of C°P and S° for the polypeptides discussed in Section 5, the group additivity approach adopted in this study can also be justified as a valid method for estimating the thermodynamic properties of peptides by examining the trends shown in Fig. 7 for C°P, S°, V° and ΔH°f among glycine, glycylglycine, diglycylglycine and triglycylglycine at 25 °C and 0.1 MPa. The incremental changes in each of these properties follow a nearly linear trend, analogous to the patterns observed in homologous series and carbon-number systematics of organic molecules (Domalski and Hearing, 1993; Helgeson et al., 1998; Richard and Helgeson, 1998). The accuracy of the approximation that $V^{\circ} = V^{\circ}_{T_r,P_r}$, as stated in Section 2, can be assessed by calculating the energetic consequence of a change in polypeptide volume over a pressure difference of three orders of magnitude.
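The arithmetic behind this kind of estimate can be sketched as follows, assuming that the pressure term in the Gibbs energy is the integral of V° over pressure (as in Eq. (2)) and that V° decreases monotonically with pressure. Under those assumptions, holding V° at its reference-pressure value overestimates that integral by at most |ΔV°|(P − Pr), and the identity 1 cm3 × 1 MPa = 10−6 m3 × 106 Pa = 1 J converts the product directly into joules. The function names are illustrative, not taken from the paper.

```python
# Sketch: upper bound on the Gibbs energy error from taking V° = V°(Tr, Pr).
# ASSUMES V° changes monotonically between p_low and p_high, so the neglected
# part of the integral of V° dP is no larger than |ΔV°| * (p_high - p_low).

J_PER_CM3_MPA = 1.0  # 1 cm3 * 1 MPa = 1e-6 m3 * 1e6 Pa = 1 J


def constant_volume_error_bound_kj(delta_v_cm3_per_mol: float,
                                   p_low_mpa: float, p_high_mpa: float) -> float:
    """Upper bound, in kJ mol-1, on the error in ΔG°f from neglecting the volume change."""
    return abs(delta_v_cm3_per_mol) * (p_high_mpa - p_low_mpa) * J_PER_CM3_MPA / 1000.0


def relative_error(bound_kj: float, dgf_kj: float) -> float:
    """Express the bound as a fraction of the magnitude of ΔG°f."""
    return bound_kj / abs(dgf_kj)
```

If V° instead fell linearly with pressure, the actual error would be about half of this bound, so the bound is deliberately conservative.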
For example, the difference between the V° values of chicken lysozyme, a 129-residue polypeptide, at 0.1 and 100 MPa is 201.4 cm3 mol−1, or 1.8%. Taking into account that 1 cm3 = 1 J MPa−1 and that the Gibbs energy of formation of chicken egg-white lysozyme is −18,216 kJ mol−1, Eq. (2) shows that the volume difference between 0.1 and 100 MPa leads to a difference of about 20 kJ mol−1 in ΔG°f for this polypeptide, or 0.1%. The value of ΔG°f for chicken lysozyme was calculated using the composition of the polypeptide taken from the UniProt database, the groups listed in Table 3 and Eq. (17). The volume change was calculated using measured densities of crystalline chicken lysozyme at 0.1 and 100 MPa (Kundrot and Richards, 1988) and a molecular weight of 14,313.1 g mol−1. Given this analysis, the approximation that $V^{\circ} = V^{\circ}_{T_r,P_r}$ contributes negligibly to the uncertainty in calculated values of ΔG°f at pressures up to at least 100 MPa.

7. CONCLUDING REMARKS

The equation-of-state parameters and standard molal thermodynamic properties of the groups that constitute crystalline peptides computed above make it possible to quantify the thermodynamic properties of any polypeptide at both ambient and elevated temperatures and pressures. As a result, biogeochemical reactions involving crystalline polypeptides can be quantified in any of the environments

in which organic compounds are known to exist. Calculations of this kind can be used to better understand how chemical variables influence the precipitation of proteins and how mineral surfaces interact with the polypeptides that constitute cell walls and extracellular matrices, to constrain the persistence of fossil proteins, and to explore how amino acids polymerize in abiotic settings.

ACKNOWLEDGEMENTS

The research reported above was enabled by financial support from the Georgia Institute of Technology (DEL) and the National Science Foundation Postdoctoral Fellowship program, Grant 0847616 (JMD). This contribution was substantially improved by constructive reviews by Johnson Haas and an anonymous reviewer. Kathleen Johnson and the Department of Earth System Science at the University of California, Irvine deserve special recognition for supporting DEL during the revision of this manuscript.

REFERENCES

Akca E., Claus H., Schultz N., Karbach G., Schlott B., Debaerdemaeker T., Declercq J.-P. and König H. (2002) Genes and derived amino acid sequences of S-layer proteins from mesophilic, thermophilic, and extremely thermophilic methanococci. Extremophiles 6, 351–358.
Alibardi L. (2003) Adaptation to the land: the skin of reptiles in comparison to that of amphibians and endotherm amniotes. J. Exp. Zool. B - Mol. Dev. Evol. 298, 12–41.
Alibardi L. (2006) Structural and immunocytochemical characterization of keratinization in vertebrate epidermis and epidermal derivatives. Int. Rev. Cytol. 253, 177.
Amend J. P. and Helgeson H. C. (1997a) Calculation of the standard molal thermodynamic properties of aqueous biomolecules at elevated temperatures and pressures. 1. L-α-amino acids. J. Chem. Soc. Faraday Trans. 93, 1927–1941.
Amend J. P. and Helgeson H. C. (1997b) Group additivity equations of state for calculating the standard molal thermodynamic properties of aqueous organic species at elevated temperatures and pressures. Geochim. Cosmochim. Acta 61, 11–46.
Amend J. P. and Helgeson H. C. (1997c) Solubilities of the common L-α-amino acids as a function of temperature and solution pH. Pure Appl. Chem. 69, 935–942.
Amend J. P. and Helgeson H. C. (2000) Calculation of the standard molal thermodynamic properties of aqueous biomolecules at elevated temperatures and pressures II. Unfolded proteins. Biophys. Chem. 84, 105–136.
Bello J. (1978) Tight packing of protein cores and interfaces: relation to conservative amino acid sequences and stability of protein–protein interaction. Int. J. Peptide Protein Res. 12, 38–41.
Benson S. W. (1968) Thermochemical Kinetics: Methods for the Estimation of Thermochemical Data and Rate Parameters. John Wiley & Sons, New York.
Benson S. W. and Buss J. H. (1958) Additivity rules for the estimation of molecular properties. Thermodynamic properties. J. Chem. Phys. 29, 546–572.
Brown H., Sanger F. and Kitai R. (1955) The structure of pig and sheep insulins. Biochem. J. 60, 556–564.
Brush A. H. (1996) On the origin of feathers. J. Evol. Biol. 9, 131–142.
Cabani S. and Gianni P. (1986) Gas–liquid and solid–liquid phase equilibria in binary aqueous systems of nonelectrolytes. In Thermodynamic Data for Biochemistry and Biotechnology (ed. H.-J. Hinz). Springer-Verlag, Berlin. pp. 259–275.
Cabani S., Gianni P., Mollica V. and Lepori L. (1981) Group contributions to the thermodynamic properties of non-ionic organic solutes in dilute aqueous solution. J. Sol. Chem. 10, 563–595.
Campbell A. A. (1999) Interfacial regulation of crystallization in aqueous environments. Curr. Opin. Colloid Interface Sci. 4, 40–45.
Chothia C. (1984) Principles that determine the structure of proteins. Ann. Rev. Biochem. 53, 537–572.
Cochran W. and Penfold B. R. (1952) The crystal structure of L-glutamine. Acta Crystallogr. 5, 644–653.
Cox J. D., Wagman D. D. and Medvedev V. A. (1989) CODATA Key Values for Thermodynamics. Hemisphere, New York.
Dawson B. (1953) The crystal structure of DL-glutamic acid hydrochloride. Acta Crystallogr. 6, 81–87.
Dayhoff M. O. and Eck R. V. (1968) Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Spring, Maryland.
Derissen J. L., Endeman H. J. and Peerdeman A. F. (1968) The crystal and molecular structure of L-aspartic acid. Acta Crystallogr. B 24, 1349–1354.
Di Lorenzo M. L., Zhang G., Pyda M., Lebedev B. V. and Wunderlich B. (1999) Heat capacity of solid-state biopolymers by thermal analysis. J. Polymer Sci. B Polymer Phys. 37, 2093–2102.
Di Lullo G. A., Sweeney S. M., Körkkö J., Ala-Kokko L. and San Antonio J. D. (2002) Mapping the ligand-binding sites and disease-associated mutations on the most abundant protein in the human, type I collagen. J. Biol. Chem. 277, 4223–4231.
Diaz E. L., Domalski E. S. and Colbert J. C. (1992) Enthalpies of combustion of glycylglycine and DL-alanyl-DL-alanine. J. Chem. Thermodyn. 24, 1311–1318.
Dick J. M. (2008) Calculation of the relative metastabilities of proteins using the CHNOSZ software package. Geochem. Trans. 9, 10.
Dick J. M., LaRowe D. E. and Helgeson H. C. (2006) Temperature, pressure and electrochemical constraints on protein speciation: group additivity calculation of the standard molal thermodynamic properties of ionized unfolded proteins. Biogeosciences 3, 311–336.
Domalski E. S. (1972) Selected values of heats of combustion and heats of formation of organic compounds containing the elements C, H, N, O, P, and S. J. Phys. Chem. Ref. Data 1, 221–277.
Domalski E. S. and Hearing E. D. (1988) Estimation of the thermodynamic properties of hydrocarbons at 298.15 K. J. Phys. Chem. Ref. Data 17, 1637–1678.
Domalski E. S. and Hearing E. D. (1993) Estimation of the thermodynamic properties of C–H–N–O–S–halogen compounds at 298.15 K. J. Phys. Chem. Ref. Data 22, 805–1159.
Domalski E. S. and Hearing E. D. (1996) Heat capacities and entropies of organic compounds in the condensed phase. J. Phys. Chem. Ref. Data 25, 1–525.
Drebushchak V. A., Kovalevskaya Y. A., Paukov I. E. and Boldyreva E. V. (2008) Low-temperature heat capacity of diglycylglycine: some summaries and forecasts for the heat capacity of amino acids and peptides. J. Therm. Anal. Calorim. 93, 865–869.
Finegold L. and Kumar P. K. (1981) Specific heat of polyglycine I and II in the temperature interval 150–375 K. Thermochim. Acta 48, 51–59.

Harding M. M. and Howieson R. M. (1976) L-leucine. Acta Crystallogr. B 32, 633–634.
Harding M. M. and Long H. A. (1968) The crystal and molecular structure of L-cysteine. Acta Crystallogr. B 24, 1096–1102.
Hassell J. R. and Birk D. E. (2010) The molecular basis of corneal transparency. Exp. Eye Res. 91, 326–335.
Hedges J. I. and Keil R. G. (1995) Sedimentary organic matter preservation: an assessment and speculative hypothesis. Mar. Chem. 49, 81–115.
Helgeson H. C., Delany J. M., Nesbitt H. W. and Bird D. K. (1978) Summary and critique of the thermodynamic properties of rock-forming minerals. Am. J. Sci. 278, 1–229.
Helgeson H. C., Owens C. E., Knox A. M. and Richard L. (1998) Calculation of the standard molal thermodynamic properties of crystalline, liquid, and gas organic molecules at high temperatures and pressures. Geochim. Cosmochim. Acta 62, 985–1081.
Hirokawa S. (1955) A new modification of L-glutamic acid and its crystal structure. Acta Crystallogr. 8, 637–641.
Huffman H. M. (1941) Thermal data XIV. The heat capacities and entropies of some compounds having the peptide bond. J. Am. Chem. Soc. 63, 688–689.
Hughes E. W. and Moore W. J. (1949) The crystal structure of β-glycylglycine. J. Am. Chem. Soc. 71, 2618–2623.
Hutchens J. O., Cole A. G. and Stout J. W. (1963) Heat capacities from 11 to 305 K, entropies, and free energies of formation of L-valine, L-isoleucine, and L-leucine. J. Phys. Chem. 67, 1128–1130.
Hutchens J. O., Cole A. G. and Stout J. W. (1964) Heat capacities and entropies of L-cystine and L-methionine. J. Biol. Chem. 239, 591–595.
Hutchens J. O., Cole A. G. and Stout J. W. (1969a) Heat capacities from 11 to 305 K and entropies of hydrated and anhydrous bovine zinc insulin and bovine chymotrypsinogen A. J. Biol. Chem. 244, 26–32.
Hutchens J. O., Cole A. G. and Stout J. W. (1969b) Heat capacities from 11 to 305 K, entropies, and free energies of formation of glycylglycine. J. Biol. Chem. 244, 33–35.
Hynes R. O. (2009) The extracellular matrix: not just pretty fibrils. Science 326, 1216–1219.
Kelley K. K. (1934) Contributions to the Data on Theoretical Metallurgy II. High-Temperature Specific-Heat Equations for Inorganic Substances. United States Bureau of Mines, Washington, DC.
Killian C. E. and Wilt F. H. (2008) Molecular aspects of biomineralization of the echinoderm endoskeleton. Chem. Rev. 108, 4463–4474.
Klapper M. H. (1971) On the nature of the protein interior. Biochim. Biophys. Acta 229, 557–566.
Knicker H., del Río J. C., Hatcher P. G. and Minard R. D. (2001) Identification of protein remnants in insoluble geopolymers using TMAH thermochemolysis/GC-MS. Org. Geochem. 32, 397–409.
Knicker H. and Hatcher P. G. (1997) Survival of protein in an organic-rich sediment: possible protection by encapsulation in organic matter. Naturwissenschaften 84, 231–234.
Koch M. H. J. and Germain G. (1970) Structure cristalline de dérivés d'acides aminés I. L-alanylglycine. Acta Crystallogr. B 26, 410–417.
Kundrot C. E. and Richards F. M. (1988) Effect of hydrostatic pressure on the solvent in crystals of hen egg-white lysozyme. J. Mol. Biol. 200, 401–410.
LaRowe D. E. and Helgeson H. C. (2006a) Biomolecules in hydrothermal systems: calculation of the standard molal thermodynamic properties of nucleic-acid bases, nucleosides, and nucleotides at elevated temperatures and pressures. Geochim. Cosmochim. Acta 70, 4680–4724.
LaRowe D. E. and Helgeson H. C. (2006b) The energetics of metabolism in hydrothermal systems: calculation of the standard molal thermodynamic properties of magnesium-complexed adenosine nucleotides and NAD and NADP at elevated temperature and pressures. Thermochim. Acta 448, 82–106.
LaRowe D. E. and Van Cappellen P. (2011) Degradation of natural organic matter: a thermodynamic analysis. Geochim. Cosmochim. Acta 75, 2030–2042.
Löwe J., van den Ent F. and Amos L. A. (2004) Molecules of the bacterial cytoskeleton. Ann. Rev. Biophys. Biomol. Struct. 33, 177–198.
Madden J. J., McGandy E. L., Seeman N. C., Harding M. M. and Hoy A. (1972) The crystal structure of the monoclinic form of L-histidine. Acta Crystallogr. B 28, 2382–2389.
Maier C. G. and Kelley K. K. (1932) An equation for the representation of high-temperature heat content data. J. Am. Chem. Soc. 54, 3243–3246.
Mann K., Wilt F. H. and Poustka A. J. (2010) Proteomic analysis of sea urchin (Strongylocentrotus purpuratus) spicule matrix. Proteome Sci. 8, 1–12.
Margolis H. C., Beniash E. and Fowler C. E. (2006) Role of macromolecular assembly of enamel matrix proteins in enamel formation. J. Dental Res. 85, 775–793.
Marsh R. E. (1958) A refinement of the crystal structure of glycine. Acta Crystallogr. 11, 654–663.
Marshall R. C., Orwin D. F. G. and Gillespie J. M. (1991) Structure and biochemistry of mammalian hard keratin. Electron Microsc. Rev. 4, 47–83.
Mayer L. M. (1994a) Relationships between mineral surfaces and organic carbon concentrations in soils and sediments. Chem. Geol. 114, 347–363.
Mayer L. M. (1994b) Surface area control of organic carbon accumulation in continental shelf sediments. Geochim. Cosmochim. Acta 58, 1271–1284.
Mitterer R. M. (1993) The diagenesis of proteins and amino acids in fossil shells. In Organic Geochemistry (eds. M. H. Engel and S. A. Macko). Plenum, New York, pp. 739–753.
Mongenot T., Riboulleau A., Garcette-Lepecq A., Derenne S., Pouet Y., Baudin F. and Largeau C. (2001) Occurrence of proteinaceous moieties in S- and O-rich Late Tithonian kerogen (Kashpir Oil Shales, Russia). Org. Geochem. 32, 199–203.
Morris A. M., Watzky M. A. and Finke R. G. (2009) Protein aggregation kinetics, mechanism, and curve-fitting: a review of the literature. Biochim. Biophys. Acta 1794, 375–397.
Mostad A., Nissen H. M. and Rømming C. (1972) The crystal structure of L-tyrosine. Acta Chem. Scand. 26, 3819–3833.
Murphy K. P. and Gill S. J. (1991) Solid model compounds and the thermodynamics of protein unfolding. J. Mol. Biol. 222, 699–709.
Murphy K. P., Privalov P. L. and Gill S. J. (1990) Common features of protein unfolding and dissolution of hydrophobic compounds. Science 247, 559–561.
Olszta M. J., Cheng X., Jee S. S., Kumar R., Kim Y.-Y., Kaufman M. J., Douglas E. P. and Gower L. B. (2007) Bone structure and formation: a new perspective. Mater. Sci. Eng. R58, 77–116.
Perry C. C. and Keeling-Tucker T. (2000) Biosilicification: the role of the organic matrix in structure control. J. Biol. Inorg. Chem. 5, 537–550.
Plyasunova N., Plyasunov A. and Shock E. L. (2004) Database of thermodynamic properties for aqueous organic compounds. Int. J. Thermophys. 25, 351–360.

Riboulleau A., Mongenot T., Baudin F., Derenne S. and Largeau C. (2002) Factors controlling the survival of proteinaceous material in Late Tithonian kerogens (Kashpir Oil Shales, Russia). Org. Geochem. 33, 1127–1130.
Richard L. and Helgeson H. C. (1998) Calculation of the thermodynamic properties at elevated temperatures and pressures of saturated and aromatic high molecular weight solid and liquid hydrocarbons in kerogen, bitumen, petroleum, and other organic matter of biogeochemical interest. Geochim. Cosmochim. Acta 62, 3591–3636.
Robbins L. L., Muyzer G. and Brew K. (1993) Macromolecules from living and fossil biominerals: implications for the establishment of molecular phylogenies. In Organic Geochemistry (eds. M. H. Engel and S. A. Macko). Plenum, New York. pp. 799–816.
Rodante F. (1989) Thermodynamics of the "standard" α-amino acids in water at 25 °C. Thermochim. Acta 149, 157–171.
Roles K. A. and Wunderlich B. (1991) Heat capacities of solid poly(amino acids). I. Polyglycine, poly(L-alanine), and poly(L-valine). Biopolymers 31, 477–487.
Roles K. A. and Wunderlich B. (1993) Heat capacities of solid copoly(amino acid)s. J. Polymer Sci. B Polymer Phys. 31, 279–285.
Roles K. A., Xenopoulos A. and Wunderlich B. (1993) Heat capacities of solid poly(amino acid)s. II. The remaining polymers. Biopolymers 33, 753–768.
Roychowdhury P. and Basak B. S. (1975) The crystal structure of indole. Acta Crystallogr. B 31, 1559–1563.
Sabbah R. and Minadakis C. (1981) Thermodynamique de substances soufrées. II. Étude thermochimique de la L-cystéine et de la L-méthionine. Thermochim. Acta 43, 269–277.
Shoemaker D. P., Barieau R. E., Donohue J. and Lu C. (1953) The crystal structure of DL-serine. Acta Crystallogr. 6, 241–256.
Shoemaker D. P., Donohue J., Schomaker V. and Corey R. B. (1950) The crystal structure of LS-threonine. J. Am. Chem. Soc. 72, 2328–2349.
Shoulders M. D. and Raines R. T. (2009) Collagen structure and stability. Ann. Rev. Biochem. 78, 929–958.
Simpson H. J. and Marsh R. E. (1966) The crystal structure of L-alanine. Acta Crystallogr. 20, 550–555.
Sridhar B., Srinivasan N., Dalhus B. and Rajaram R. K. (2002) A triclinic polymorph of L-argininium chloride. Acta Crystallogr. Sect. E58, 747–749.
Steiner T. (2002) The hydrogen bond in the solid state. Angew. Chem. Int. Edit. 41, 48–76.
Stickle D. F., Presta L. G., Dill K. A. and Rose G. D. (1992) Hydrogen bonding in globular proteins. J. Mol. Biol. 226, 1143–1159.
Sudo S., Fujikawa T., Nagakura T., Ohkubo T., Sakaguchi K., Tanaka M., Nakashima K. and Takahashi T. (1997) Structures of mollusc shell framework proteins. Nature 387, 563–564.
Torii K. and Iitaka Y. (1970) The crystal structure of L-valine. Acta Crystallogr. B 26, 1317–1326.
Torii K. and Iitaka Y. (1971) The crystal structure of L-isoleucine. Acta Crystallogr. B 27, 2237–2246.
Tsuzuki T., Harper D. O. and Hunt H. (1958) Heats of combustion. VII. The heats of combustion of some amino acids. J. Phys. Chem. 62, 1594.
Wade R. H. (2009) On and around microtubules: an overview. Mol. Biotechnol. 43, 177–191.
Weiner S. and Addadi L. (1997) Design strategies in mineralized biological materials. J. Mater. Chem. 7, 689–702.
Wierzbicka-Patynowski I. and Schwarzbauer J. E. (2003) The ins and outs of fibronectin matrix assembly. J. Cell Sci. 116, 3269–3276.
Wright B. A. and Cole P. A. (1949) Preliminary examination of the crystal structure of L-proline. Acta Crystallogr. 2, 129–130.
Yeates T. O., Kerfeld C. A., Heinhorst S., Cannon G. C. and Shively J. M. (2008) Protein-based organelles in bacteria: carboxysomes and related microcompartments. Nat. Rev. Microbiol. 6, 681–691.
Zhang C. and Zhang R. (2006) Matrix proteins in the outer shells of molluscs. Mar. Biotech. 8, 572–586.
Zhang G., Gerdes S. and Wunderlich B. (1996) Heat capacities of solid, globular proteins. Macromol. Chem. Phys. 197, 3791–3806.

Associate editor: Dimitri A. Sverjensky