Energy functions for protein design
D Benjamin Gordon*, Shannon A Marshall* and Stephen L Mayo†

Recent successes in protein design have illustrated the promise of computational approaches. These methods rely on energy expressions to evaluate the quality of different amino acid sequences for target protein structures. The force fields optimized for design differ from those typically used in molecular mechanics and molecular dynamics calculations.

Addresses
*Division of Chemistry and Chemical Engineering, California Institute of Technology, MC 147-75, Pasadena, CA 91125, USA
†Howard Hughes Medical Institute, Division of Biology, California Institute of Technology, MC 147-75, Pasadena, CA 91125, USA; e-mail: [email protected]

Current Opinion in Structural Biology 1999, 9:509-513
Ltd
ISSN
cvise
dccomposahle
pr0tcin
;cpproach for
for
3 desired
that
protein
represents
tleties. c;ich
the
iTrtictIIrc.
[ 1’1. A potential
;ts tarxct
of stochastic
of amino energy
factors.
as well
h;i\c with
detail sidechain the :md
fiscd
two
flcxibiiitv
function.
function
arc
of amino as ranked
the
expcrimen-
inti)
terms fi\,e
describing lentiy
that
broad the bonded.
considered
have
been
categories.
nest.
‘I%r finally
briefly
energies
and
which are mechanics
computed differently force fields.
Force-field
used First.
pac~king among Nonbonded examine
designed to the
thus we
t>r.
the
cerrns
to correlate
field
must
experimental
model
the
the
design
problems energy
free
and proteins
state.
he pair-
with that
are
the
internal
fall COWare
coordinate and
in typical
stability
of proteins.
entropy, molecular
Design
II;II
strucTure:
are
considered
assiinied
nonbonded to
to
be
eclualiy
are
[7]
indicate
Hecause
and
of
of the
demands
[ 111, are the
Similarly, cluite
not
de\ eloped
effective
for
the
are 3s
llnfolded
years,
been
constructed.
hccn ha\xe
used been
design
by the the
first Very
in these evaluvted and
potential
\Tili
experimenr:ll
the
b:liancr for
enerfiy
results.
fields
studies is ncccsmust be
function.
tailored
for
potentials
that fields
properly
force fields, however, throu,@ ;I comparison
force
systematic
comprising
fields
force
mcchan[9.10]
compatibility
energy
experimental
design
pair
aensitivitv new force that
few
the
appropriate
derived
potential force
design, molecular [8], AMHER
in srructiirc
protein
described
Ed.
sidechains 3re modeled
necessarily
statisrically
1121 do not manifest rhc strllcrllrai sary for protein design. Instead.
few
rcsid-
sidechains
in
by protein
used to perform as (XIAKM~l
design.
p;Ist
nc~
among
all sequences
posed
widely such
Lund I~KKIDIN(~ 3rc
the
however.
isoenerge-tic.
fields that are its calculations,
that
strnc-
properties
inrcrdctions
probable
force folded
residual
be insignificant, the solvated. 2Ii rnc;imerY
fully
intend-
as the
have
can
the
studies
alter
large
terms
as well
sometimes
may
mosT
terms
of folding,
theoretical
can
computational
potentials
energy
Laria-
rotamers ‘l-he force
example.
energy
recluire
unfolded
mutations
and
the l:or
by design
the
Experimental
protein
requirements
the
produced with
Ilnfolded
dictions
energies
that :uc not interactions
uolvation than
‘The
discuss
atoms polar stirl.ey
development the potential
Protein design presents a demanding task for a potential energy function. Design potentials must be sensitive to the subtle changes in amino acid identity that are known to perturb
rhat
and
energies
cd
factor
implemcntcd.
The purpose of this review is to discuss the of protein design force fields and to survey energy
;Icid.
Agorithms
tally determined stability and structure of the proteins are analyzed and rational improvements potential
iising called
amino
se:lrch
I;in;tlly,
exccpby
ofcach
deterministic
hv
energy
sub-
protein
notable
is inrrodiiced conformations.
the optimal combination for the target structure,
potenti
demand
quickly.
c:llculated
with design.
tx
being
function 3s the
used
[6] are then used to find acid sidechain rotamers the
:tcids
is used to predict the energy of sequence for a target protein
structures.
IS], to represent
dosed-kmp
sequence
efforts
tions [.2,.1,-l”~. Atomic-lcvcl statisticallysignificant roramers
general.
fold
design
.I varier);
;I
optimal
dominant
(1urrent
backbones
is
rho
of protein stability possible :mino acid
to small
unfolded state ensemble. In design calculations, the unfolded state is commonly assumed to have
design
finding
sensitive
compatible
complexity
that 0959-440X
be o\crl?;
protein
c~ombinatorial
state.
Introduction Computational
be of
algorithms
turc Science
not
must
search
As the
http://biomednet.com/elecref/0959440X00900509 © Elsevier
should
irccluiremcnrs
used in calculations.
Engineering, CA 91125,
fields
L Mayo’
tions in rotamer geometry, however, as discrete are used to model sidechain conformations. field
fields
and Stephen
each
Over
the
design
have
terms
have
and e\‘en fewer of design pre-
t:uture
progress
be realized
by
\,alidation
of
the
in
contin-
the
terms
function.
p(Jtentid
van der Waals Packing core
specificity
is critical
calclliations,
stlldics. sufficient
which
;I force field to design
packing
can
distance
restraints
be
der Waals potential. basis for sidechain native-like selecting
to protein
comprise
that models well-folded
evaluated ‘I’his packing
folded states against disordered
majority
For
protein
of
design
only packing specificity proteins [l.%-16j. Although
exclusively
[ 171, most
design.
the
design potential specificity,
using programs provides thereby
with well-organized or molten-globule
is
interatomic utilize
a van
a physical favoring cores states.
and The
van der Waals energy, $E_{vdW}$, is typically calculated using a Lennard-Jones 12-6 expression:

$$E_{vdW} = D_0 \left[ \left( \frac{R_0}{R} \right)^{12} - 2 \left( \frac{R_0}{R} \right)^{6} \right] \qquad (1)$$

The interatomic distance, $R$, is computed from atomic coordinates. The equilibrium radii, $R_0$, and well depths, $D_0$, are parameters that are defined within each force field. Two examinations of van der Waals parameters underscore the need to tune molecular mechanics potential functions to protein design. Lazar and co-workers [16] compared the predictive ability of variations of Hagler and AMBER van der Waals parameters for a set of ubiquitin variants with redesigned cores. United-atom parameters from AMBER were markedly superior to the other variations when used in conjunction with a detailed rotamer library. Dahiyat and Mayo [15] generated sequences by systematically varying the scale of the atomic radii, based on the DREIDING parameter set, and by using rotamers with explicit hydrogen atoms. Scaling the radii by a factor of 0.9 achieved the optimal balance between packing specificity and hydrophobic collapse, as represented by a solvation term (discussed below).

Hydrogen bonding
Hydrogen bonds are typically represented by an angle-dependent, 12-10 hydrogen-bond potential:

$$E_{HB} = D_0 \left[ 5 \left( \frac{R_0}{R} \right)^{12} - 6 \left( \frac{R_0}{R} \right)^{10} \right] F(\theta) \qquad (2)$$

where $R_0$ is the equilibrium distance, $D_0$ is the well depth and $R$ is the interatomic distance between the donor and acceptor heavy atoms. The angle-dependence term, $F(\theta)$, is typically $\cos^4\theta$, where $\theta$ is the donor-hydrogen-acceptor angle.

We have observed that calculations performed with the above potential will allow rotameric arrangements with nonphysical hydrogen-bond geometries, as shown in Figure 1. To circumvent this problem, we employ more restrictive hybridization-dependent angle-dependence terms that enforce reasonable geometries [18]:

sp3 donor - sp3 acceptor: $F = \cos^2\theta \, \cos^2(\phi - 109.5^\circ)$, with $\theta > 90^\circ$ and $\phi - 109.5^\circ < 90^\circ$ (3)
sp3 donor - sp2 acceptor: $F = \cos^2\theta \, \cos^2\phi$, with $\phi > 90^\circ$ (4)
sp2 donor - sp3 acceptor: $F = \cos^4\theta$ (5)
sp2 donor - sp2 acceptor: $F = \cos^2\theta \, \cos^2(\max[\phi, \varphi])$ (6)

The angles $\phi$ and $\varphi$ refer to the hydrogen-acceptor-base angle (where the base is the atom covalently attached to the acceptor) and the angle between the normals of the planes defined by the six atoms attached to the two sp2 centers, respectively.

A potential energy term based on the above equations allows only physically reasonable sidechain-sidechain and sidechain-backbone hydrogen bonds. Unfortunately, using a highly restrictive energy term in combination with a discrete rotamer library causes the force field to predict poor energies for some sequences that may actually form good hydrogen-bond interactions.

As the majority of computational protein design studies have focused on protein cores, electrostatic and hydrogen-bonding terms have not been as thoroughly validated by experiment. Nevertheless, initial forays have proven these terms to be useful for the design of helical surfaces [18] and for full sequence design [19••].

Figure 1. An example of a nonphysical hydrogen-bond geometry that can be selected when a hydrogen-bond potential that is dependent only on $\theta$ is used for protein design. A more restrictive hydrogen-bond potential, described in Equations (3) through (6), correctly predicts that no favorable interaction is present because $\phi = 90^\circ$.

Electrostatics
The role of electrostatics in protein stability is subject to debate. At moderate temperatures, favorable electrostatic interactions are not thought to be strong enough to compensate for the energy of desolvation [20]. In more extreme conditions, however, salt bridges may stabilize proteins [21,22]. Moreover, electrostatics may play a more significant role in defining the specificity, rather than the stability, of folding and of functional interactions [23-26].

Computational protein design efforts have not yet developed an electrostatic term intended to represent these considerations. Rather, electrostatics are used sparingly, primarily to guard against destabilizing interactions between like-charged residues. The simplest treatment of electrostatic interactions is based on Coulomb's law, which describes the energy of two charges, $Q_i$ and $Q_j$, separated by distance $R$ in a medium with dielectric constant $\varepsilon$:

$$E_{elec} = \frac{Q_i Q_j}{\varepsilon R} \qquad (7)$$
Our laboratory uses a distance-attenuated version of Coulomb's law, with an effective dielectric constant value of 40R and partial atomic charges that give a total coulombic energy of approximately ±1 kcal/mol for the interaction between juxtaposed charged residues. Thus, electrostatic contributions to the total energy are only significant when charged atoms are in close proximity. In sharp contrast, electrostatic energy is often the largest contributor to the total energy in potentials used for molecular mechanics and dynamics calculations.
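As a rough illustration of the nonbonded terms discussed in this section, the sketch below writes the Lennard-Jones 12-6 expression, the angle-dependent 12-10 hydrogen-bond potential and a distance-attenuated Coulomb term as small Python functions. The function names, the conventional well-depth normalization (coefficients 1 and 2 for the 12-6 form, 5 and 6 for the 12-10 form), the unit-conversion constant of about 332 kcal Å/(mol e²) and every numerical parameter in the usage lines are assumptions made for illustration; they are not taken from any of the force fields cited here. Only the 40R dielectric follows the text.

```python
import math

# Assumed conversion factor so that charges in units of e and distances in
# angstroms give energies in kcal/mol (a standard value, not from the text).
COULOMB_KCAL = 332.06


def lennard_jones_12_6(r, r0, d0):
    """12-6 van der Waals energy with its minimum of -d0 at r == r0."""
    x = r0 / r
    return d0 * (x ** 12 - 2.0 * x ** 6)


def hbond_12_10(r, r0, d0, theta_deg):
    """12-10 hydrogen-bond energy scaled by F(theta) = cos^4(theta).

    theta_deg is the donor-hydrogen-acceptor angle in degrees.
    """
    x = r0 / r
    f_theta = math.cos(math.radians(theta_deg)) ** 4
    return d0 * (5.0 * x ** 12 - 6.0 * x ** 10) * f_theta


def coulomb_distance_attenuated(qi, qj, r, eps_scale=40.0):
    """Coulomb energy with a distance-dependent dielectric eps = eps_scale * r."""
    return COULOMB_KCAL * qi * qj / (eps_scale * r * r)


# Illustrative parameter values only (hypothetical, not a published set).
e_vdw = lennard_jones_12_6(r=4.0, r0=4.0, d0=0.1)
e_hb_linear = hbond_12_10(r=2.8, r0=2.8, d0=8.0, theta_deg=180.0)
e_hb_bent = hbond_12_10(r=2.8, r0=2.8, d0=8.0, theta_deg=90.0)
e_like_charges = coulomb_distance_attenuated(1.0, 1.0, 3.0)
```

With unit charges 3 Å apart, the attenuated Coulomb term evaluates to a little under 1 kcal/mol, the same order as the magnitude quoted above, and it falls off as the inverse square of the distance, which is why the term matters only for atoms in close proximity.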
Internal coordinate terms
Typical molecular mechanics force fields have terms that evaluate bonds, angles, torsions and inversions among atoms that are covalently attached. These internal coordinate or 'bonded' energies must be considered when generating rotamers or modifying the protein backbone and, in some cases, have been used for protein design [4••,16]. The usefulness of these terms for design, however, has not been rigorously demonstrated. As rotamers derived from the statistical analysis of protein structure databases generally have good internal coordinate energies, many design potential functions do not include them at all.

Solvation
Because the hydrophobic effect drives protein folding [7], modeling solvation effects is critical to a protein design force field. The computational expense of explicitly modeling protein-solvent interactions for all the sequences under consideration is, however, prohibitively expensive. Therefore, several groups have employed approximate methods utilizing octanol-water and gas-water free energy of transfer data for each amino acid [28,29]. The experimentally measured free energies of transfer are correlated with the molecular surface area [30], as shown in Figure 2. These energies are either used directly for residues in the protein core [31] or they are scaled by the change in the solvent-exposed surface area that is associated with protein folding [14,.X].
Figure 2. Pairwise calculation of buried surface areas. (a) Unfolded or reference exposed surface areas of two sidechain rotamers. (b) Folded, exposed surface area for the rotamer pair. (c) Buried surface area for the rotamer pair, calculated by subtracting (b) from (a).

The energy required to transfer a sidechain from a solvated, unfolded protein to a partially or completely desolvated position in the folded protein is not necessarily the same as the transfer energy from water to gas or to a nonpolar solvent. But the approximate linear relationship between transfer energy and the change in surface area should be correct for both cases. Dahiyat and Mayo [14] determined the optimal values for polar and nonpolar atomic solvation parameters by fitting them to the experimentally determined stability of designed proteins. The inclusion of a hydrophobic burial benefit and a polar burial penalty in the protein design force field provides a significant improvement in the predictive power compared with a force field with only a van der Waals term. Two other considerations have affected the formulation of a protein design solvation potential. First, a negative design term that penalizes the exposure of nonpolar surface area is sometimes used [18,33]. Although nonpolar exposure should not destabilize a protein, it can lead to aggregation or misfolding. Therefore, a nonpolar exposure penalty is required to limit the amount of exposed, nonpolar surface area at boundary and surface positions [34•]. Second, many optimization algorithms require that energy terms be pairwise decomposable, but the pairwise calculation of buried surface areas leads to significant overcounting. Street and Mayo [35] have developed a pairwise expression with one scalable parameter that closely reproduces both the true buried area and the true exposed solvent-accessible surface areas.
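The surface-area bookkeeping described here can be sketched in a few lines of Python. The first function follows the pairwise scheme in the Figure 2 caption (buried area equals the summed reference exposed areas of two rotamers minus the exposed area of the folded pair); the second combines a hydrophobic burial benefit, a polar burial penalty and a nonpolar exposure penalty into one energy. The function names and all three coefficient values are hypothetical placeholders, not the fitted parameters of any cited study, and the sketch does not attempt the scalable correction for overcounting mentioned above.

```python
def buried_area_pair(exposed_i_ref, exposed_j_ref, exposed_pair_folded):
    """Buried area (Å^2) for a rotamer pair: reference exposure minus folded exposure."""
    return (exposed_i_ref + exposed_j_ref) - exposed_pair_folded


def solvation_energy(nonpolar_buried, polar_buried, nonpolar_exposed,
                     burial_benefit=0.05, polar_penalty=0.10,
                     exposure_penalty=0.02):
    """Surface-area solvation energy in kcal/mol (areas in Å^2).

    The coefficients (kcal/mol per Å^2) are hypothetical placeholders:
    burying nonpolar area is rewarded, while burying polar area and
    exposing nonpolar area are both penalized.
    """
    return (-burial_benefit * nonpolar_buried
            + polar_penalty * polar_buried
            + exposure_penalty * nonpolar_exposed)


# Two rotamers expose 120 and 100 Å^2 in the reference state and 180 Å^2
# together in the folded state, so 40 Å^2 is buried on forming the pair.
pair_buried = buried_area_pair(120.0, 100.0, 180.0)
energy = solvation_energy(nonpolar_buried=100.0, polar_buried=20.0,
                          nonpolar_exposed=30.0)
```

Because each call depends only on one or two rotamers, terms written this way stay pairwise decomposable, which is the property the optimization algorithms require.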
Entropy
A simple entropy term is sometimes incorporated into protein design potential functions [31,32]. The change in sidechain entropy upon folding is modeled as the change in the number of rotatable bonds, making the assumption that conformational freedom is completely restricted in the folded state. The unfolded state entropies are either calculated by assuming that all the rotamers are equally populated or fixed to semi-empirical estimates [36]. The inclusion of an entropy term based on the number of rotatable bonds did not significantly improve the correlation between the predicted and observed stabilities of the GCN4-p1 coiled-coil core [14]. This simple model for entropy may have failed because it neglects residual sidechain entropy in folded proteins and possible residual structure in the unfolded state.
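A minimal sketch of the rotatable-bond model is given below, assuming that the entropic cost of folding is simply proportional to the number of rotatable sidechain bonds frozen in the folded state. The per-residue bond counts (taken here as the number of sidechain chi angles, for a few residue types only), the function name and the cost per bond are all assumptions for illustration rather than values from the cited work.

```python
# Assumed rotatable sidechain bond counts (number of chi angles) for a
# handful of residue types, keyed by one-letter code.
ROTATABLE_BONDS = {
    "A": 0, "G": 0, "V": 1, "L": 2, "I": 2,
    "F": 2, "M": 3, "E": 3, "K": 4,
}


def entropy_penalty(sequence, kcal_per_bond=0.5):
    """Entropic cost of folding (kcal/mol) for a sequence of one-letter codes.

    Assumes every rotatable bond is fully restricted in the folded state;
    kcal_per_bond is a hypothetical scale factor.
    """
    return kcal_per_bond * sum(ROTATABLE_BONDS[aa] for aa in sequence)


penalty = entropy_penalty("LKV")  # 2 + 4 + 1 = 7 bonds
```

A model of this kind assigns the same cost to every bond regardless of its environment, which is one way to see why it cannot capture residual sidechain entropy in the folded protein.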
Looking forward
Protein design force fields have been successful, in part, because of their stringency. Restrictive functions, such as the van der Waals and the hybridization-dependent hydrogen-bond potentials, in particular, result in a very high rejection rate and a significant false-negative rate. Fortunately, many design force fields also show a low false-positive rate. Therefore, sequences that are selected in protein design studies tend to fold properly, even though many other equally acceptable sequences are rejected.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. • Street AG, Mayo SL: Computational protein design. Structure 1999, 7:R105-R109.
This review surveys the computational principles of the protein design cycle, including residue classification, negative design and discretization of sidechain conformations.

2. Ponder JW, Richards FM: Tertiary templates for proteins - use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 1987, 193:775-791.

3. Desjarlais JR, Clarke ND: Computer search algorithms in protein modification and design. Curr Opin Struct Biol 1998, 8:471-475.

4. •• Harbury PB, Plecs JJ, Tidor B, Alber T, Kim PS: High-resolution protein design with backbone freedom. Science 1998, 282:1462-1467.
The de novo design of an unnatural fold, a right-handed coiled coil, was accomplished using a computation that incorporates both sidechain and backbone flexibility.

5. Harbury PB, Tidor B, Kim PS: Repacking protein cores with backbone freedom: structure prediction for coiled coils. Proc Natl Acad Sci USA 1995, 92:8408-8412.

6. Su A, Mayo SL: Coupling backbone flexibility and amino acid sequence selection in protein design. Protein Sci 1997, 6:1701-1707.

7. Dill KA, Shortle D: Denatured states of proteins. Annu Rev Biochem 1991, 60:795-825.

8. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M: CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 1983, 4:187-217.

9. Weiner SJ, Kollman PA, Case DA, Singh UC, Ghio C, Alagona G, Profeta S Jr, Weiner P: A new force field for molecular mechanical simulation of nucleic acids and proteins. J Am Chem Soc 1984, 106:765-784.

10. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA: A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J Am Chem Soc 1995, 117:5179-5197.

11. Mayo SL, Olafson BD, Goddard WA III: Dreiding - a generic force-field for molecular simulations. J Phys Chem 1990, 94:8897-8909.

12. Bowie JU, Luthy R, Eisenberg D: A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253:164-170.

13. Desjarlais JR, Handel TM: De novo design of the hydrophobic cores of proteins. Protein Sci 1995, 4:2006-2018.

14. Dahiyat BI, Mayo SL: Protein design automation. Protein Sci 1996, 5:895-903.

15. Dahiyat BI, Mayo SL: Probing the role of packing specificity in protein design. Proc Natl Acad Sci USA 1997, 94:10172-10177.

16. Lazar GA, Desjarlais JR, Handel TM: De novo design of the hydrophobic core of ubiquitin. Protein Sci 1997, 6:1167-1178.

17. Jiang X, Bishop EJ, Farid RS: A de novo designed protein with properties that characterize natural hyperthermophilic proteins. J Am Chem Soc 1997, 119:838-839.

18. Dahiyat BI, Gordon DB, Mayo SL: Automated design of the surface positions of protein helices. Protein Sci 1997, 6:1333-1337.

19. •• Dahiyat BI, Mayo SL: De novo protein design: fully automated sequence selection. Science 1997, 278:82-87.
This paper describes the first fully automated design and experimental validation of a novel sequence for an entire protein. The force field employed uses several of the energy functions described in this review.

20. Hendsch ZS, Tidor B: Do salt bridges stabilize proteins - a continuum electrostatic analysis. Protein Sci 1994, 3:211-226.

21. Elcock AH: The stability of salt bridges at high temperatures: implications for hyperthermophilic proteins. J Mol Biol 1998, 284:489-502.

22. de Bakker PIW, Hünenberger PH, McCammon JA: Molecular dynamics simulations of the hyperthermophilic protein Sac7d from Sulfolobus acidocaldarius: contribution of salt bridges to thermostability. J Mol Biol 1999, 285:1811-1830.

23. Lumb KJ, Kim PS: A buried polar interaction imparts structural uniqueness in a designed heterodimeric coiled coil. Biochemistry 1995, 34:8642-8648.

24. Schneider JP, Lear JD, DeGrado WF: A designed buried salt bridge in a heterodimeric coiled coil. J Am Chem Soc 1997, 119:5742-5743.

25. Sindelar CV, Hendsch ZS, Tidor B: Effects of salt bridges on protein structure and design. Protein Sci 1998, 7:1898-1914.

26. Spek EJ, Bui AH, Lu M, Kallenbach NR: Surface salt bridges stabilize the GCN4 leucine zipper. Protein Sci 1998, 7:2431-2437.

27. Dill KA: Dominant forces in protein folding. Biochemistry 1990, 29:7133-7155.

28. Fauchère J-L, Pliska V: Hydrophobic parameters of amino-acid side-chains from the partitioning of N-acetyl-amino-acid amides. Eur J Med Chem 1983, 18:369-375.

29. Ooi T, Oobatake M, Némethy G, Scheraga HA: Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides. Proc Natl Acad Sci USA 1987, 84:3086-3090.

30. Wesson L, Eisenberg D: Atomic solvation parameters applied to molecular dynamics of proteins in solution. Protein Sci 1992, 1:227-235.

31. Hellinga HW, Richards FM: Optimal sequence selection in proteins of known structure by simulated evolution. Proc Natl Acad Sci USA 1994, 91:5803-5807.

32. Kono H, Nishiyama M, Tanokura M, Doi J: Designing the hydrophobic core of Thermus flavus malate dehydrogenase based on side-chain packing. Protein Eng 1998, 11:47-52.

33. Sun S, Brem R, Chan HS, Dill KA: Designing amino acid sequences to fold with good hydrophobic cores. Protein Eng 1995, 8:1205-1213.

34. • Malakauskas SM, Mayo SL: Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol 1998, 5:470-475.
The authors impart hyperthermal stability to a mesophilic protein by applying principles of computational protein design to amino acid residues positioned at the boundary between the protein core and the surface.

35. Street AG, Mayo SL: Pairwise calculation of protein solvent-accessible surface areas. Fold Des 1998, 3:253-258.

36. Sternberg MJE, Chickos JS: Protein side-chain conformational entropy derived from fusion data - comparison with other empirical scales. Protein Eng 1994, 7:149-155.