Energy functions for protein design


D Benjamin Gordon*, Shannon A Marshall* and Stephen L Mayo†

Recent successes in protein design have illustrated the promise of computational approaches. These methods rely on energy expressions to evaluate the quality of different amino acid sequences for target protein structures. The force fields optimized for design differ from those typically used in molecular mechanics and molecular dynamics calculations.

Addresses
*Division of Chemistry and Chemical Engineering, California Institute of Technology, MC 147-75, Pasadena, CA 91125, USA
†Howard Hughes Medical Institute, Division of Biology, California Institute of Technology, MC 147-75, Pasadena, CA 91125, USA; e-mail: [email protected]

Current Opinion in Structural Biology 1999, 9:509-513

© Elsevier Science Ltd, ISSN 0959-440X

cvise

dccomposahle

pr0tcin

;cpproach for

for

3 desired

that

protein

represents

tleties. c;ich

the

iTrtictIIrc.

[ 1’1. A potential

;ts tarxct

of stochastic

of amino energy

factors.

as well

h;i\c with

detail sidechain the :md

fiscd

two

flcxibiiitv

function.

function

arc

of amino as ranked

the

expcrimen-

inti)

terms fi\,e

describing lentiy

that

broad the bonded.

considered

have

been

categories.

nest.

‘I%r finally

briefly

energies

and

which are mechanics

computed differently force fields.

Force-field

used First.

pac~king among Nonbonded examine

designed to the

thus we

t>r.

the

cerrns

to correlate

field

must

experimental

model

the

the

design

problems energy

free

and proteins

state.

he pair-

with that

are

the

internal

fall COWare

coordinate and

in typical

stability

of proteins.

entropy, molecular

Design

II;II

strucTure:

are

considered

assiinied

nonbonded to

to

be

eclualiy


are

[7]

indicate

Hecause

and

of

of the

demands

[ 111, are the

Similarly, cluite

not

de\ eloped

effective

for

the

are 3s

llnfolded

years,

been

constructed.

hccn ha\xe

used been

design

by the the

first Very

in these evaluvted and

potential

\Tili

experimenr:ll

the

b:liancr for

enerfiy

results.

fields

studies is ncccsmust be

function.

tailored

for

potentials

that fields

properly

force fields, however, throu,@ ;I comparison

force

systematic

comprising

fields

force

mcchan[9.10]

compatibility

energy

experimental

design

pair

aensitivitv new force that

few

the

appropriate

derived

potential force

design, molecular [8], AMHER

in srructiirc

protein

described

Ed.

sidechains 3re modeled

necessarily

statisrically

1121 do not manifest rhc strllcrllrai sary for protein design. Instead.

few

rcsid-

sidechains

in

by protein

used to perform as (XIAKM~l

design.

p;Ist

nc~

among

all sequences

posed

widely such

Lund I~KKIDIN(~ 3rc

the

however.

isoenerge-tic.

fields that are its calculations,

that

strnc-

properties

inrcrdctions

probable

force folded

residual

be insignificant, the solvated. 2Ii rnc;imerY

fully

intend-

as the

have

can

the

studies

alter

large

terms

as well

sometimes

may

mosT

terms

of folding,

theoretical

can

computational

potentials

energy

Laria-

rotamers ‘l-he force

example.

energy

recluire

unfolded

mutations

and

the l:or

by design

the

Experimental

protein

requirements

the

produced with

Ilnfolded

dictions

energies

that :uc not interactions

uolvation than

‘The

discuss

atoms polar stirl.ey

development the potential

Protein design presents a demanding task for a potential energy function. Design potentials must be sensitive to the subtle changes in amino acid identity that are known to perturb

rhat

and

energies

cd

factor

implemcntcd.

The purpose of this review is to discuss the requirements of protein design force fields and to survey energy

;Icid.

Agorithms

tally determined stability and structure of the proteins are analyzed 3nd rational improvements potential

iising called

amino

se:lrch

I;in;tlly,

exccpby

ofcach

deterministic

hv

energy

sub-

protein

notable

is inrrodiiced conformations.

the optimal combination for the target structure,

potenti

demand

quickly.

c:llculated

with design.

tx

being

function 3s the

used

[61 arc then used to find :tcid sidechain rotnmer\ the

:tcids

is i~cctl to predict the energl; of secluence for 3 target protein

structures.

IS], to represent

dosed-kmp

sequence

efforts

tions [.2,.1,-l”~. Atomic-lcvcl statisticallysignificant roramers

general.

fold

design

.I varier);

;I

optimal

dominant

(1urrent

backbones

is

rho

of protein stability possible :mino acid

to small

unfolded state ensemble. In design calculations, the unfolded state is commonly assumed to have

design

finding

sensitive

compatible

complexity

that 0959-440X

be o\crl?;

protein

c~ombinatorial

state.

Introduction

Computational

be of

algorithms

turc Science

not

must

search

As the

http:/lbiomednet.comlelecref/095944OXOO900509 c Elsevler

should

irccluiremcnrs

used in calculations.

Engineering, CA 91125,

fields

L Mayo’

rions in rotamer geometry, however, as discrete ;Irc used to model sidech:iin conformations. field

fields

and Stephen

each

Over

the

design

have

terms

have

and e\‘en fewer of design pre-

t:uture

progress

be realized

by

\,alidation

of

the

in

contin-

the

terms

function.

p(Jtentid

van der Waals

Packing specificity is critical to protein design. For core design calculations, which comprise the majority of design studies, a force field that models only packing specificity can be sufficient to design well-folded proteins [13-16]. Although packing can be evaluated exclusively using interatomic distance restraints [17], most design programs utilize a van der Waals potential. This potential provides a physical basis for sidechain packing specificity, thereby favoring native-like folded states with well-organized cores against disordered or molten-globule states.

The van der Waals energy, $E_{vdW}$, is typically calculated using a Lennard-Jones 12-6 expression:

$$E_{vdW} = D_0\left[\left(\frac{R_0}{R}\right)^{12} - 2\left(\frac{R_0}{R}\right)^{6}\right] \qquad (1)$$

The interatomic distance, $R$, is computed from atomic coordinates. The equilibrium radii, $R_0$, and well depths, $D_0$, are parameters that are defined within each force field. Two examinations of van der Waals parameters underscore the need to tune molecular mechanics potential functions to protein design. Lazar and co-workers [16] compared the predictive ability of variations of Hagler and AMBER van der Waals parameters for a set of ubiquitin variants with redesigned cores. United-atom parameters from AMBER95 were markedly superior to the other variations when used in conjunction with a detailed rotamer library. Dahiyat and Mayo [15] generated sequences by systematically varying the scale of the atomic radii, based on the DREIDING parameter set, and by using rotamers with explicit hydrogen atoms. Scaling the radii by a factor of 0.9 achieved the optimal balance between packing specificity and hydrophobic collapse, as represented by a solvation term (discussed below).

Hydrogen bonding

As the majority of computational protein design studies have focused on protein cores, electrostatic and hydrogen-bonding terms have not been as thoroughly validated by experiment. Nevertheless, initial forays have proven these terms to be useful for the design of helical surfaces [18] and for full sequence design [19••].

Hydrogen bonds are typically represented by an angle-dependent, 12-10 hydrogen-bond potential:

$$E_{HB} = D_0\left[5\left(\frac{R_0}{R}\right)^{12} - 6\left(\frac{R_0}{R}\right)^{10}\right]F(\theta) \qquad (2)$$

where $R_0$ is the equilibrium distance, $D_0$ is the well depth and $R$ is the interatomic distance between the donor and acceptor heavy atoms. The angle-dependence term, $F(\theta)$, is typically $\cos^4\theta$, where $\theta$ is the donor-hydrogen-acceptor angle.

We have observed that calculations performed with the above potential will allow rotameric arrangements with nonphysical hydrogen-bond geometries, as shown in Figure 1. To circumvent this problem, we employ more restrictive hybridization-dependent angle-dependence terms that enforce reasonable geometries [18]:

$$sp^3\ \text{donor} - sp^3\ \text{acceptor}: \quad F = \cos^2\theta\,\cos^2(\phi - 109.5^\circ), \quad \theta > 90^\circ,\ \phi - 109.5^\circ < 90^\circ \qquad (3)$$

$$sp^3\ \text{donor} - sp^2\ \text{acceptor}: \quad F = \cos^2\theta\,\cos^2\phi, \quad \phi > 90^\circ \qquad (4)$$

$$sp^2\ \text{donor} - sp^3\ \text{acceptor}: \quad F = \cos^4\theta \qquad (5)$$

$$sp^2\ \text{donor} - sp^2\ \text{acceptor}: \quad F = \cos^2\theta\,\cos^2(\max[\phi, \varphi]) \qquad (6)$$

The angles $\phi$ and $\varphi$ refer to the hydrogen-acceptor-base angle (where the base is the atom covalently attached to the acceptor) and the angle between the normals of the planes defined by the six atoms attached to the two $sp^2$ centers, respectively.

Figure 1. An example of a nonphysical hydrogen-bond geometry that can be selected when a hydrogen-bond potential that is dependent only on $\theta$ is used for protein design. A more restrictive hydrogen-bond potential, described in Equations (3) through (6), correctly predicts that no favorable interaction is present because $\phi = 90^\circ$.

A potential energy term based on the above equations allows only physically reasonable sidechain-sidechain and sidechain-backbone hydrogen bonds. Unfortunately, using a highly restrictive energy term in combination with a discrete rotamer library causes the force field to predict poor energies for some sequences that may actually form good hydrogen-bond interactions.

Electrostatics

The role of electrostatics in protein stability is subject to debate. At moderate temperatures, favorable electrostatic interactions are not thought to be strong enough to compensate for the energy of desolvation [20]. In more extreme conditions, however, salt bridges may stabilize proteins [21,22]. Moreover, electrostatics may play a more significant role in defining the specificity, rather than the stability, of folding and of functional interactions [23-26].

Computational protein design efforts have not yet developed an electrostatic term intended to represent these considerations. Rather, electrostatics are used sparingly, primarily to guard against destabilizing interactions between like-charged residues. The simplest treatment of electrostatic interactions is based on Coulomb's law, which describes the energy of two charges, $Q_i$ and $Q_j$, separated by distance $R$ in a medium with dielectric constant $\varepsilon$:

$$E_{elec} = \frac{Q_i Q_j}{\varepsilon R} \qquad (7)$$

Our laboratory uses a distance-attenuated version of Coulomb's law, with an effective dielectric constant value of 40R and partial atomic charges that give a total coulombic energy of approximately ±1 kcal/mol for the interaction between juxtaposed charged residues. Thus, electrostatic contributions to the total energy are only significant when charged atoms are in close proximity. In sharp contrast, electrostatic energy is often the largest contributor to the total energy in potentials used for molecular mechanics and dynamics calculations.
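The three nonbonded terms discussed so far (a Lennard-Jones 12-6 van der Waals term, an angle-dependent 12-10 hydrogen-bond term and a Coulomb term with a distance-dependent dielectric) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, angles are taken in degrees, the angle term is assumed to be zero whenever an angular condition fails (with θ > 90° assumed for every hybridization pair), and the unit conversion constant 332.0637 kcal Å/(mol e²) is a standard value that the review does not state.

```python
import math

# Converts e^2/Angstrom to kcal/mol. Standard value, assumed here; the review
# does not state the unit convention behind its Coulomb expression.
COULOMB_KCAL = 332.0637


def lennard_jones_12_6(r, r0, d0):
    """Lennard-Jones 12-6 energy with its minimum, -d0, at r = r0."""
    x = r0 / r
    return d0 * (x ** 12 - 2.0 * x ** 6)


def hbond_angle_term(donor, acceptor, theta, phi=180.0, varphi=0.0):
    """Hybridization-dependent angle term F (angles in degrees).

    theta: donor-hydrogen-acceptor angle; phi: hydrogen-acceptor-base angle;
    varphi: angle between the normals of the two sp2 planes.
    Returning 0 outside the angular windows is an assumption of this sketch.
    """
    if theta <= 90.0:
        return 0.0
    cos2_theta = math.cos(math.radians(theta)) ** 2
    pair = (donor, acceptor)
    if pair == ("sp3", "sp3"):
        if phi - 109.5 >= 90.0:
            return 0.0
        return cos2_theta * math.cos(math.radians(phi - 109.5)) ** 2
    if pair == ("sp3", "sp2"):
        if phi <= 90.0:
            return 0.0
        return cos2_theta * math.cos(math.radians(phi)) ** 2
    if pair == ("sp2", "sp3"):
        return cos2_theta ** 2  # cos^4(theta)
    if pair == ("sp2", "sp2"):
        return cos2_theta * math.cos(math.radians(max(phi, varphi))) ** 2
    raise ValueError("hybridization must be 'sp2' or 'sp3'")


def hbond_12_10(r, r0, d0, angle_factor):
    """12-10 hydrogen-bond energy scaled by an angle term F."""
    x = r0 / r
    return d0 * (5.0 * x ** 12 - 6.0 * x ** 10) * angle_factor


def coulomb_distance_dependent(qi, qj, r, dielectric_per_angstrom=40.0):
    """Coulomb energy with an effective dielectric of 40R (epsilon grows with r)."""
    epsilon = dielectric_per_angstrom * r
    return COULOMB_KCAL * qi * qj / (epsilon * r)
```

With the distance-dependent dielectric the energy falls off as 1/R², which is why the electrostatic contribution only matters at short range. In this sketch the geometry of Figure 1 (an sp³ donor, an sp² acceptor and a hydrogen-acceptor-base angle of 90°) yields an angle term of zero, so the hydrogen-bond energy vanishes regardless of distance.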

Internal coordinate terms

Typical molecular mechanics force fields have terms that evaluate bonds, angles, torsions and inversions among atoms that are covalently attached. These internal coordinate or 'bonded' energies must be considered when generating rotamers or modifying the protein backbone and, in some cases, have been used for protein design [4••,16]. The usefulness of these terms for design, however, has not been rigorously demonstrated. As rotamers derived from the statistical analysis of protein structure databases generally have good internal coordinate energies, many design potential functions do not include them at all.

Solvation

Because the hydrophobic effect drives protein folding [27], modeling solvation effects is critical to a protein design force field. Explicitly modeling protein-solvent interactions for all the sequences under consideration is, however, prohibitively expensive. Therefore, several groups have employed approximate methods utilizing octanol-water and gas-water free energy of transfer data for each amino acid [28,29]. The experimentally measured free energies of transfer are correlated with the molecular surface area [30], as shown in Figure 2. These energies are either used directly for residues in the protein core [31] or they are scaled by the change in the solvent-exposed surface area that is associated with protein folding [14,32].

The energy required to transfer a sidechain from a solvated, unfolded protein to a partially or completely desolvated position in the folded protein is not necessarily the same as the transfer energy from water to gas or to a nonpolar solvent. The approximate linear relationship between transfer energy and the change in surface area should, however, be correct for both cases. Dahiyat and Mayo [14] determined the optimal values for polar and nonpolar atomic solvation parameters by fitting them to the experimentally determined stability of designed proteins. The inclusion of a hydrophobic burial benefit and a polar burial penalty in the protein design force field provides a significant improvement in the predictive power compared with a force field with only a van der Waals term.

Figure 2. Pairwise calculation of buried surface areas. (a) Unfolded or reference exposed surface areas of two sidechain rotamers. (b) Folded, exposed surface area for the rotamer pair. (c) Buried surface area for the rotamer pair, calculated by subtracting (b) from (a).

Two other considerations have affected the formulation of a protein design solvation potential. First, a negative design term that penalizes the exposure of nonpolar surface area is sometimes used [14,33]. Although nonpolar exposure should not destabilize a protein, it can lead to aggregation or misfolding. Therefore, a nonpolar exposure penalty is required to limit the amount of exposed, nonpolar surface area at boundary and surface positions [34•]. Second, many optimization algorithms require that energy terms be pairwise decomposable, but the pairwise calculation of buried surface areas leads to significant overcounting. Street and Mayo [35] have developed a pairwise expression with one scalable parameter that closely reproduces both the true buried area and the true exposed solvent-accessible surface areas.
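A surface-area-based solvation term of the kind described above can be sketched as follows. This is an illustration under stated assumptions rather than any published parameterization: the function and parameter names are hypothetical, the solvation parameters must be supplied by the caller (no fitted values are given in this review), and the single `scale` factor is only a stand-in for the overcounting correction, not the actual expression of Street and Mayo [35].

```python
def pairwise_buried_area(area_i_ref, area_j_ref, area_pair_folded, scale=1.0):
    """Buried area for a rotamer pair, following the scheme of Figure 2.

    The reference (unfolded) exposed areas of the two rotamers are summed and
    the exposed area of the folded pair is subtracted. `scale` is a
    hypothetical factor (0 to 1) standing in for an overcounting correction.
    """
    return scale * ((area_i_ref + area_j_ref) - area_pair_folded)


def solvation_energy(buried_nonpolar, buried_polar, exposed_nonpolar,
                     sigma_nonpolar_burial, sigma_polar_burial,
                     sigma_nonpolar_exposure):
    """Linear surface-area solvation energy (areas in A^2, sigmas per A^2).

    Nonpolar burial is rewarded, polar burial is penalized, and exposed
    nonpolar area carries a negative-design penalty. All sigma values are
    supplied as positive numbers.
    """
    benefit = -sigma_nonpolar_burial * buried_nonpolar
    burial_penalty = sigma_polar_burial * buried_polar
    exposure_penalty = sigma_nonpolar_exposure * exposed_nonpolar
    return benefit + burial_penalty + exposure_penalty
```

Because each call depends only on one rotamer pair, the resulting energies can be tabulated ahead of the sequence search, which is the practical reason for requiring pairwise decomposability.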

Entropy

A simple entropy term is sometimes incorporated into protein design potential functions [31,32]. The change in sidechain entropy upon folding is modeled as the change in the number of rotatable bonds, making the assumption that conformational freedom is completely restricted in the folded state. The unfolded state entropies are calculated either by assuming that all the rotamers are equally populated or by fixing them to semi-empirical estimates [36]. The inclusion of an entropy term based on the number of rotatable bonds did not significantly improve the correlation between the predicted and observed stabilities of the GCN4-p1 coiled-coil core [14]. This simple model for entropy may have failed because it neglects residual sidechain entropy in folded proteins and possible residual structure in the unfolded state.
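The two simple entropy models mentioned above can be written out explicitly. The sketch below assumes a folded state restricted to a single sidechain conformation, so the free energy cost of folding is $-T\Delta S = RT \ln N$ for $N$ equally populated unfolded-state rotamers. The function names, the default temperature and the per-bond cost are hypothetical; the review does not give numerical values.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol K)


def rotamer_entropy_cost(n_unfolded_rotamers, temperature=298.15):
    """-T*dS of folding (kcal/mol) for equally populated unfolded rotamers.

    Assumes the folded state is fixed in one rotamer, so dS = -R ln N.
    """
    if n_unfolded_rotamers < 1:
        raise ValueError("need at least one rotamer")
    return GAS_CONSTANT * temperature * math.log(n_unfolded_rotamers)


def rotatable_bond_cost(n_rotatable_bonds, cost_per_bond):
    """Entropy cost proportional to the number of rotatable sidechain bonds.

    `cost_per_bond` (kcal/mol) is a free parameter of this sketch.
    """
    return n_rotatable_bonds * cost_per_bond
```

Both models assign zero residual entropy to the folded sidechain, which is the simplification the text identifies as a likely reason for the term's limited predictive value.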

Looking forward

Protein design force fields have been successful, in part, because of their stringency. Restrictive functions, such as the van der Waals and the hybridization-dependent hydrogen-bond potentials, in particular, result in a very high rejection rate and a significant false-negative rate. Fortunately, many design force fields also show a low false-positive rate. Therefore, sequences that are selected in protein design studies tend to fold properly, even though many other equally acceptable sequences are rejected.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

• 1. Street AG, Mayo SL: Computational protein design. Structure 1999, 7:R105-R109.
This review surveys the computational principles of the protein design cycle, including residue classification, negative design and discretization of sidechain conformations.

2. Desjarlais JR, Clarke ND: Computer search algorithms in protein modification and design. Curr Opin Struct Biol 1998, 8:471-475.

3. Su A, Mayo SL: Coupling backbone flexibility and amino acid sequence selection in protein design. Protein Sci 1997, 6:1701-1707.

•• 4. Harbury PB, Plecs JJ, Tidor B, Alber T, Kim PS: High-resolution protein design with backbone freedom. Science 1998, 282:1462-1467.
The de novo design of an unnatural fold, a right-handed coiled coil, was accomplished using a computation that incorporates both sidechain and backbone flexibility.

5. Harbury PB, Tidor B, Kim PS: Repacking protein cores with backbone freedom: structure prediction for coiled coils. Proc Natl Acad Sci USA 1995, 92:8408-8412.

6. Ponder JW, Richards FM: Tertiary templates for proteins - use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 1987, 193:775-791.

7. Dill KA, Shortle D: Denatured states of proteins. Annu Rev Biochem 1991, 60:795-825.

8. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M: CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 1983, 4:187-217.

9. Weiner SJ, Kollman PA, Case DA, Singh UC, Ghio C, Alagona G, Profeta S Jr, Weiner P: A new force field for molecular mechanical simulation of nucleic acids and proteins. J Am Chem Soc 1984, 106:765-784.

10. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA: A second-generation force field for the simulation of proteins, nucleic acids, and organic molecules. J Am Chem Soc 1995, 117:5179-5197.

11. Mayo SL, Olafson BD, Goddard WA III: Dreiding - a generic force-field for molecular simulations. J Phys Chem 1990, 94:8897-8909.

12. Bowie JU, Lüthy R, Eisenberg D: A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253:164-170.

13. Desjarlais JR, Handel TM: De novo design of the hydrophobic cores of proteins. Protein Sci 1995, 4:2006-2018.

14. Dahiyat BI, Mayo SL: Protein design automation. Protein Sci 1996, 5:895-903.

15. Dahiyat BI, Mayo SL: Probing the role of packing specificity in protein design. Proc Natl Acad Sci USA 1997, 94:10172-10177.

16. Lazar GA, Desjarlais JR, Handel TM: De novo design of the hydrophobic core of ubiquitin. Protein Sci 1997, 6:1167-1178.

17. Jiang X, Bishop EJ, Farid RS: A de novo designed protein with properties that characterize natural hyperthermophilic proteins. J Am Chem Soc 1997, 119:838-839.

18. Dahiyat BI, Gordon DB, Mayo SL: Automated design of the surface positions of protein helices. Protein Sci 1997, 6:1333-1337.

•• 19. Dahiyat BI, Mayo SL: De novo protein design: fully automated sequence selection. Science 1997, 278:82-87.
This paper describes the first fully automated design and experimental validation of a novel sequence for an entire protein. The force field employed uses several of the energy functions described in this review.

20. Hendsch ZS, Tidor B: Do salt bridges stabilize proteins - a continuum electrostatic analysis. Protein Sci 1994, 3:211-226.

21. Elcock AH: The stability of salt bridges at high temperatures: implications for hyperthermophilic proteins. J Mol Biol 1998, 284:489-502.

22. de Bakker PIW, Hünenberger PH, McCammon JA: Molecular dynamics simulations of the hyperthermophilic protein Sac7d from Sulfolobus acidocaldarius: contribution of salt bridges to thermostability. J Mol Biol 1999, 285:1811-1830.

23. Lumb KJ, Kim PS: A buried polar interaction imparts structural uniqueness in a designed heterodimeric coiled coil. Biochemistry 1995, 34:8642-8648.

24. Schneider JP, Lear JD, DeGrado WF: A designed buried salt bridge in a heterodimeric coiled coil. J Am Chem Soc 1997, 119:5742-5743.

25. Sindelar CV, Hendsch ZS, Tidor B: Effects of salt bridges on protein structure and design. Protein Sci 1998, 7:1898-1914.

26. Spek EJ, Bui AH, Lu M, Kallenbach NR: Surface salt bridges stabilize the GCN4 leucine zipper. Protein Sci 1998, 7:2431-2437.

27. Dill KA: Dominant forces in protein folding. Biochemistry 1990, 29:7133-7155.

28. Fauchère J-L, Pliska V: Hydrophobic parameters of amino-acid side-chains from the partitioning of N-acetyl-amino-acid amides. Eur J Med Chem 1983, 18:369-375.

29. Ooi T, Oobatake M, Némethy G, Scheraga HA: Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides. Proc Natl Acad Sci USA 1987, 84:3086-3090.

30. Wesson L, Eisenberg D: Atomic solvation parameters applied to molecular dynamics of proteins in solution. Protein Sci 1992, 1:227-235.

31. Hellinga HW, Richards FM: Optimal sequence selection in proteins of known structure by simulated evolution. Proc Natl Acad Sci USA 1994, 91:5803-5807.

32. Kono H, Nishiyama M, Tanokura M, Doi J: Designing the hydrophobic core of Thermus flavus malate dehydrogenase based on side-chain packing. Protein Eng 1998, 11:47-52.

33. Sun S, Brem R, Chan HS, Dill KA: Designing amino acid sequences to fold with good hydrophobic cores. Protein Eng 1995, 8:1205-1213.

• 34. Malakauskas SM, Mayo SL: Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol 1998, 5:470-475.
The authors impart hyperthermal stability to a mesophilic protein by applying principles of computational protein design to amino acid residues positioned at the boundary between the protein core and the surface.

35. Street AG, Mayo SL: Pairwise calculation of protein solvent accessible surface areas. Fold Des 1998, 3:253-258.

36. Sternberg MJE, Chickos JS: Protein side-chain conformational entropy derived from fusion data - comparison with other empirical scales. Protein Eng 1994, 7:149-155.