Protein folding and deterministic chaos: Limits of protein folding simulations and calculations

Protein folding and deterministic chaos: Limits of protein folding simulations and calculations

096w3779/91s3.00+.00 Pcrgamon Fbss plc Chaos. Solikms & Fracfols Vol. 1. No. 4. pp. 375 - 382. 1991 Printed in Great Britain Protein Folding and Det...

574KB Sizes 0 Downloads 76 Views

096w3779/91s3.00+.00 Pcrgamon Fbss plc

Chaos. Solikms & Fracfols Vol. 1. No. 4. pp. 375 - 382. 1991 Printed in Great Britain

Protein Folding and Deterministic Limits of Protein Folding Simulations

Chaos:

and Calculations

Gerald Biihm

Institut fiir Biophysik und Physikalische Biochemie UniversiClt Regensburg.D-8400 Regensburg.F.R.G.

Abstract - The aim of the present work is to suggest that protein folding is a highly process

which generally

to the availability

it is obviously

algorithm

with

dynamics

simulations

show

on digital computers. This limitation

of compufi~rg resso~ces

suggested previously;

dimensional

cannot be simulated

sufficient

structure

accuracy

parameter(s)

a small

protein

with

is not due

as it has been

for any deterministic

of protein 46 amino

folding.

acids

Molecular

whose

is known, suggest the native state to be a ‘fxedarrracror’.

that any oh irtitio calculation

protein is controlled

to quantify

to describe the dynamics

on crambin.

parameters,

or exactforcejield

impossible

complex

The results

of protein structure must fail if the folding

process of a

by a kiwfic process, i.e. when the native state is the khetically

accessible

minimw~ on the energy hyperspace but not the fhermody/lurnicully possible gl&~J In this case only

non-dynamical

(knowledge-ba.sed

approaches)

computational valuable

methods

methods like puffer,1 recognition

for the design and description

promising

to date to deduce the structure of an unknown

information

may

of protein structures than the tmditional

methods

method

structure.

and aeurul network methods

based algorithmic

used to date. Knowledge-based

minimum.

or dutubuse processing

can provide a reasonable three-dimensional

generic ulgorihms

like

three-

modelling

force-field

is therefore

protein

from

Novel

be more

the most

the sequence

solely. INTRODUCTION

The

protein folding problem

is sometimes

considered to be the

‘second part of the genetic code”,

ing the transition from the translated protein sequence to the native tertiary structure. The information for this process is intrinsically conditions

embedded

in vivo. A ‘folding matrix’.

has been suggested that chaperones

relationship

preferrably however,

the it

have failed so far. The most

protein is knowledge-bused

(‘compurufive’)

of structures is guided by known structures from homologous 375

required

for in vivo folding’.

for the sequence-structure

method to deduce the structure of an unknown

ing’; here, the generation

conditions,

from external sources, is not neccessary;

may be of relevance

All attempts to extract the information promising

in the protein sequence and the solvent

e.g. information

defm-

proteins.

modell-

376

G. Bdm

Other

methods

which have been published

of the native structure ‘. Also, for a correct description

was directed by some knowledge es of the protein, the precision

for atom positions required

binding parts must be better than 1

In this

work

unpredictable,

the fundamental

of our mathematics;

events,

in the axiomatic

intrinsically

in the beginning)

One

principles

of events are based on decimal

the errors

negligible

way

ligand-

chaos’ was first suggested during

dynamical

underlying

investiga-

processes in nature have been shown

to be

those events are known’.

due to the problem of describing initial conditions as well as parameters for an algorithm in

This is mainly

descriptions

process-

site and/or

is a process of highly deterministic chaos, as it has been

events6; until today, many non-linear

although

of functional

A.

it is sugested that protein folding

tions on climate

or the calculation

to describe at least the active

for a multitude of natural events. The term ‘deterministic

shown

terms

so far do either lack general applicability’,

system of scientific

explanation

numbers with limited

precision.

On the pathways

the algorithmic

than the corresponding

are layers of ‘neurons’

with different

(usually

used, most of dynamical

the error

(that

was

value.

is to use a rrelrrul nemork

problem

on the fact that patterns can be learned by special programs basis for the learning

is currently

in any decimal number will add up, until finally

included

will be of greater magnitude

to circumvent

which

(NN)‘. This method

in a training-

weights (‘synupfic

is based

and a recall-phase).

strength’).

The

New patterns are

associated with the stored information. Another

of solutions.

algebra is the gencfic algoriflrm (GA).

new method in computer

lels in biological

evolution.

The

GAS find the solution to a problem

attempts

problem

space. GA have no apriori during the evolution

knowledge

of the problem

AND

described

ture is known so far. The resolution All molecular

dynamics

simulation

described

U.S.A.),

and DISCOVER

release 2.5 for the CRAY

equipped

3.3.2

on both machines

with a 12.5 MHz

ond of simulation.

of

were determined

R3000

whereas the CRAY

4/432.

with the program Graphics

Both machines

The numerical

methods used. A simulation

Inc.,

for

4/432.

equipped

under

identical

with 4 processors, took

on the IRIS

about

View, UNIX

calculations

are based mainly

of Crambin

RISC processor, took about 12,000 seconds CPU time Y-MP

struc-

DISCOVER

Mountain

were running

differences

to be below 0.5 %; these differences

and the processing MIPS

Y-MP

5.2, respectively).

bridges; it is

whose three-dimensional

on the 4D/?O,

per picosec60-80

seconds

picosecond.

Simulations ture. The

release 2.6 for the IRIS 4D series (Silicon

and UNICOS

data representation

per simulated

this is

the problem

A.

in this work were performed

U.S.A.),

performed

conditions:

a small plant seed protein with 46 amino

structural elements

of the crystal structure is I.0

Inc., San Diego,

internal

in the

METIIODS

later is crambin.

secondary

(Biosym

(IRIX

space or the environmental

located

uhyssirricu. It has two short helical regions. two P-strands, and three disulfide

one of the smallest stable proteins with defined

derivutes

of information

and energy minimization”.

protein used for the calculations

acids from Crumk

has its pamI-

to repeated attempts

‘fic~zy logic’ methods may circumvent

of the solution. Similarly,

MATERIALS

The

This development the feedback

toward the solution are called ~er~s - a sequence

scanned

fixed rules for dynamics

by analyzing

cutoff

were

performed

distance

mainly

under in vuc~~ conditions.

for intramolecular

interactions

All hydrogens

between atoms WJS 18.0

were present in the struc-

A. with

2.0

A

switching

377

‘sticchaos Protein folding and detenmm &stance. A was

leapfrog

ufgorithm for the startup of the dynamics was used. Initial equilibrium

of the dynamics

performed over 5.0 ps; the simulations took 80 ps or 100 ps. respectively, with an underlying

time step of

1 fs. Structures were collected each 0.1 ps. For comparison above calculations,

reasons,

one calculation was performed under identical

but with explicit representation

coditions

with respect to the

A

of water molecules. A solvent layer of 8.0

around the

molecule was modelled; for the calculation, periodic boundary conditions were applied. state was simulated by the following two methods: (1) the complete

The denatured proline residues)

was folded into a all-gstrand

conformation,

chain (except the

with Q and w angles of 120.0 degrees for each

amino acid, resulting in a long, stretched chain with maximum solvent accessibihly. (2) Only the a-helical and P-strand regions (e.g. residues 7 to 19.23 to 30; 1 to 4.32 to 35) were folded into the stretched conformation, whereas the regions of turns and loops were kept fixed relative to the native state. This resulted

in a more

compact starting conformation. The simulations described under (2) were divided in two different parts: (a) the fixation of the turns was released immediately after the equilibration

time of the dynamics; (b) turns were kept fixed for the first 40 ps

of the dynamics. Under the latter conditions the chain has already collapsed to a compact structure before the turns are allowed to decompose. Beginning

with the unfolded states described under (I), changes in the backbone dihedral angles of resi-

due Clu-23 (the central residue of the molecule) were introduced to examine the influence of small perturbotions on the resulting structure. An additional torsion of +30’ and -60’ for the q-angle of residue

Glu-23 re-

sultcd in three slightly different initial structures. All other angles in the three structures were identical.

RESULTS The simulations

AND DISCUSSION

performed during this work show that the linal state is reached after about 40 to 50 ps.

After this time a compact structure has emerged; the total energies of the structures are very similar to each other and to the energy of the native state. It may be argued that protein folding takes place within seconds instead of picoseconds. In fact, it cannot be ruled out that there may be major rearrangements on a time scale lo6 larger than the one used here, even under the same conditions.

of the structures

However, a close inspection

of the last 30 picoseconds of the simulations (Fig. 1) suggests that there is a high energy barrier (activation energy) for a significant transition between the final four structures. A comparison of the dynamic trajectories with explicit representation

of water molecules

calculations in vucuo (data not shown) revealed that the resulting conformations

and analogous

and trajectories are complete-

ly different, as it is expected. Electrosrufic shielding effects as well as solvent dumping of the motions protein atoms significantly

delay the collapse of the chain to a compact globule.

compete with molecule-solvent

Intramolecular

of the

interactions

interactions; therefore, the analysis of the simulation gets more complex

and

much more computer time is needed for a simulation. It was not the aim of this work to investigate the influence of explicit solvent representation dynamics

simulations lo but to examine the effect of small structural perturbations

identical conditions;

therefore,

the limitation to in vucuo conditions - whatever arguments

those calculations - was justifiable.

Further work will include (aside from other improvements

sentation) explicit water representation.

on molecular

on dynamical events under exist contrary

to

in model repre-

G. B&M

378

6000

I

I

50

60

I

Unfdlded

co;lformatibns

Native conformation 3000

lo

0

20

40

30

Time

70

80

(ps)

Fig. 1: Total cncrgy (kinetic and potcntid energy) during ‘refolding’ simulation of crambin.

Fig. during

1 demonstrates the simulated

(and the structure,

that the total energy of the system

folding

process.

see Fig. 2) take place. The energy

the native state of the molecule, terms

on the initial

structures

collapse

however,

structure,

up about

diverges;

8 to 10 %, rms deviation

the final structures

examination

different

Surprisingly. and P-sheets; patterns

reveals

native

proteins

these

of the energy fluctuation

to

This holds also true when several

this is indicated

in Fig. 2 and 3. Depending

which are far away from

away from the native

from the native state.

Fig. 3 demonstrates

deviation

the native

(reference)

This

describes

that the evolution

are about 8 to 10 A rms deviation that this is the maximal

structures

protein.

conformation; the collapse

of the structures

all of the

during the

away from each other. that compact

of the simulations

between

locally secondary-structure

Unfortunately

of those secondary

structures

10 ps and 25 ps; sidechain

on the tendency

bonds of the protein of subtle differences

contain

show more regular local geometries,

unambiguously.

of the formation

have a great influence

occurences

within

drops

globular

structures

of the

and the native state represent

four

protein conformations.

tures to occur origins

by hydrogen

globule.

the three final structures

the simulated

can be identified

An analysis

indistuingishable.

result in structures

is about 40 8, mls deviation

given size can adopt. Therefore, completely

in the final state is identical

during the simulation;

all three simulations

chain to a folded, compact

A closer

only minor fluctuations

energy)

boditr,~ energy, vm der Wuuls energy, Codotnhic energy. dispersiott etwrgy)

are divergent

conformation

end

stretched

and kinetic

respectively.

The structures,

The starting

potential

state is reached,

e.g. they are energetically

(hydrqerl

of the energy

are compared,

When a compact

(including

for the structures

backbones.

but the characteristic

they are in different demonstrate interactions to occur,

spatial regions

that the decision of spatially

although

The main reason for the distinct

which occur ‘by chotlce’.

like regions

of short

a-helices

hydrogen

bonding

in all structures.

for the local

neighboring

the final structures structures

struc-

amino acids

do

are stabilized

seem to be the sum of

Protein folding and deterministic chaos

10

20

30

50

60

70

80

Time (ps) Fig. 2: Root mean square dcvinlion of UIC three unfold-4 conformalions of crambin to the native state.

17.5 15.0 12.5 10.0

Y

7.5 5.0 2.5 t

0.0

’ 0

I

I

I

I

I

I

I

10

20

30

40

50

60

70

Time (ps) Fig. 3: Root mean square deviation of tie three unfolded conformations of cmmbin rclativc to each other.

80

G. B&M

380

CONCLUSIONS The divergence dependend results

of the trajectories

on the selected

are independent

term

of the molecules

parameters

during the simulations

and the algorithms

from ail guidelines

may be considered

used, but this seems

and rules examined

so far. The

structures

observed

of the calculated enthalpic terms @otentiul energy), e.g. the native state is energetically

ble from the computationally structures

do not converge

This does which

but diverge

molecules

too simple to be reliable

does really provide

between

sequence

in our mathematical

of modem

science.

systems

axiomatic

of the analogous

processes

via hi/lo-cation graphs; true bifurcation

e.g. the module the folding

methods,

without providing

recently.

finally,

known equivalent

system;

applicability

and characterizing

is negative,

between

several

transition

Lyapunov-exponent

procedure

folding

problem

of structure

they

reduce

calculation of dynamical

constint.

the information e.g.

Some mass

by

the built-up-proce-

graph and thus from another since

problems

to previously

is the description

Others,

rela-

can be

multi-body

by the Feigenbaum

hyperspace.

the conformational

It would be of great interest

statistical

parameters

magnitude

for this value is still controverssl.

(for example,

chemical

systems

the attractor

point

conformational

in hyperinfonnation

which may

also

be useful

the nature of the

for the trajectories

state’

is a fixed point

If it is zero, then the native

states (dynamic

model, e.g. equivalent

would characterize

for the

‘final

structure).

information

establishes

protein

of a unique

in

confonnasubstates

a ‘strange uttructor’

which

of

has no

theory so far.

dimension is a valuable

hyperspace.

that relate

system of mathemat-

imply that this relationship

is the classification

chaotic

(static model, e.g. crystal

parameter

devclopcd

mechanical

such as the simula-

The existence

The Lyuplorov-exponer’t chamcterizes

if the exponent

in protein structure

of proteins;

problem.

graphs are characterized

mational

the surface

that the

energy.

a positive

the chaotic

Also,

the

algorithms

are primitive process

axiomatic

of the protein

standard

point in the bifurcation

describing

be in equilibrium

hemoglobin);

A graphical

lack a general

space of conformations

tion would

formulation

different

the required

of the chaotic

the phase

however,

into account

this holds true for several

of the conformational

of the value of new methods.

(attractor)

is used

of a complex

may be shown to work only since they circumvent

however,

are properties

judgement

indistuingisha-

states;

it must lx taken

does not necessarily system;

to a section

from a completely

Both

There

method,

process

folding

chaos ? One achievement

of deterministic

which have been published

starts

in

converge

and for all deterministic

that our current

to the protein

methods

space.

Especially

for the description

of a protein

or physical

Now, what would be the benefit

dure,

conditions

reason to believe

a solution

and structure

expressed

ideas,

the

folding”.

ics and physics

reducing

different

since

states.

of the force fields and the Newtonian dynamics which

basis

described

different

discouraging.

On the other hand, there is no imminent

tionship

from three slightly

that for all starting

the results are similarly

models and are definitely tion of protein

starting

to completely

in no case proof, of course,

are possible

underlying

refolded

to be mainly

not to be the case

for the description

space for a molecule

reactivity

C,- atoms

to fold; it is equivalent

to find out correlations

the size of the protein, Thefructul

of dynamical

between

fractal

surfacesIs.

The

Also, theory

This

confor-

dimension

Even

may be used to characterize

is often related with a rough surface’3. and self-affine

to the so-called

the chaotic

or amino acid composition).

inda”.”

processes.

the order

the softness

methods

and

have

of of

been

of renormaiizarion

Protein folding and

groups connects dynamical

physical

dependence

different

results with molecular structure

(experimental)

Bifurcation

forces on completely

Ctenrrinistic

381

chaos

this is useful

scales;

states which are detected by experimental

bifurcation

of dynamical

trajectories

of transition

methods.

The folding

these schemes are based on thermodynamical

structural data and thus describe molecule. Now, bifurcation folding pathways. intermediates

the properties

of an ensemble

pathway

criteria

of molecules,

diagrams do provide a simple possibility sense these intermediates

described

via

of proteins which are rather than molecular-

not the fate of a single

for the description

of multiple protein

It has been observed that the refolding pathway and the occurrence

(in a thermodynamical

and parameter

states of the folding or

diagrams are not directly related to kinetical schemes for the refolding

gained by experiments;

thermo-

CdCdatiOd6.

diagrams are helpful schemes for the description

of processes and may be of value for the description

intermediate

for comparing

of the observable

are an ensemble of molecules exhibiting the

same behavior of a measurable parameter) may be different when refolding starts from different conditions. Finding out how the Feigenbaum

constant can be derived from the experimental

different denatured states should provide us with a better understanding In a recent work, El Nashie and Kapitaniak nonchaotic

behavior

simulationsi7.

may be performed

of the folding problem.

found that the distinction

by the Lyapunov

exponent

The authors stress the similarity of the experimental

values of refolding from

between

distribution

chaotic and strange

of symbolic

dynamic

results from nucleic acids research and

chaotic models of solitons in elastic strings. The current

work identities

analogous

facts between known highly chaotic systems and the protein

folding problem. However, at the current state we cannot rule out that the description theory is incomplete,

even if there is evidence. The general applicability

molecular

simulation

contradiction

dynamics

to cxpcrimcntal

is definitely

or computational

indices, chrroric dimensions,

inadequate.

However,

results in the literature.

bijirrcarion diugrums,

in terms of the chaos

of the methods described for the to my knowledge

there

is no

On the other hand, tools likefracfaf

and Lyupunov-exponenrs

may be of value for protein

structure prediction. If protein folding can be identified unambiguously

as a process of deterministic

chaos, then this would

imply that any fib inirio method must fail when the native state of a protein is the kinetically minimum but not the thermodynamically In this case, knowledge-based methods or other pattern sequence. Therefore,

possible minimum intrinsically

methods like homology

recognition

techniques

modelling,

contained in the amino acid chain. generic algorithms,

are the only way to calculate

future work should concentrate

accessible

protein

neural nenvork structure

on those methods rather than the improvement

from of

force field parameters or force field algorithmsr*s*9.

Acknowledgements

- The aid of many collcagucs is gntcfully

Raincr Rudolph. and mosfly Rincr

acknowlcdgcd. Spcchl

thanks IO Profs. Gustav Obcrmair.

Jacnickc for help and kind support; Drs. Mikl6s Cscrze). Ina Koch. and Cornelius

fruitful discussions. This work ~3s supprtcd

FrClmmcl for

by Dcutschc Forschungsgcmcinschaft grant Ja 78129-I.

REFERENCES 1

Jacnicke. R.. Is Thcrr: a Code for Frotcin Folding ? in Prorein Srrucrure and Prorein Engineering.

2

1988 (cd.: Winnrrkcr. E.-L., & Hubcr. R.). Springer Vcrhg. BcrlinIHcidclbcrg/Ncw York. pp. 16-36 (1988) Jacnickc. R.. Folding and Association of Proteins. Prog. Biophys. molec. ho/. 49. 117-237 (1987).

39. Colloquium - Mosbach

382

3

G. B~HM Blundell,

TL.,

Sibnnda. BL.,

Stemberg. MJ.E,

8 Thornton. J.M.. Knowledge-based

the Design of Novel Molecules. Nururc 326.347-352 4

V~uez.

M..

& Schemga

Prediction of Protein Structures and

(1987).

H.A.. Cslcukttion of Protein Conformation by the Build-up

Procedure.

Appliwtion

to Bovine

Pancreatic Trypsin Inhibitor Using Limited Dimulaten Nuclenr Magnetic Resommce Dam. 1. Biomol. Sfrucr. Dynam. 5. 705 755 (1988).

5

Skolnick. J., & Kolinski, A.. Simulntion of the Folding of ;LGlobular Protein. Science 250. 1121-I 125 (1990).

6

Lorenz. E.. Deterministic Nonperiodic Flow. 1. Afmosph. Sci. 20, 130-r14

I

Schuster, H.G., Dererminisric Chaos: An fnrroducrion. VCH Publishers. DeerfiehbWeinheim

8

Crick, F., The Recent Excitement about Neuml Networks. Nature 337, 129-132 (1989).

9

Goldberg. D.E.. AlgoriUuns in Searching, 0ptimi:arion.

IO

vsn Gunsteren. W.F.. & Berendsen. HJ.C., Perspectives in Chemistry. Anger.

1(1963). (1984).

and Cfachine Learning. Addison-Wesley

Computer Simulation of Molecule

Chem. Int. Ed. Engl. 29,992-

(1989).

Dynamics: Methodology,

Applications.

and

1023 (1990).

11 12

Fmuenfelder. H., Biomolecules. in Emerging Synfheses in Science Volume I (cd.: Pines. D.). Addison-Wesley Msndelbrot. B.. The Fracfal Geomefry of Nafure. Freeman, New York (1977).

13 14

Li. H.. Li, Y., & Zhno. H.. Fmctsl Analysis of Protein Chsin Conformation. Inf. /. Biol. Macromol. 12. p. 6-8 (1990). Brynnt. S.H., Ishm. S.A.. & Weaver. D.L.. The Surfxe Area of Monomeric Proteins: Significance of Power L3w Behavior.

15

Cse~o.

16

Fisher. M.E..

17

(1974). El Nzschie.

MS.

147,275-281

(1990)

Proteins S~ruct. Funcr. Genel. 6.4 18423

18 19

M.. & Vicsek. T., Self-Affine The Renonnaliution & Knpitnnti.

(1988).

(1989).

Fmcti

Analysis of Protein Structure Chaos. So/irons and Frucrols 1. in press (1991).

Group

in the Theory of Criticrtl Behavior.

T., Soliton Chaos Mod&

Reviews of Modern

Physics 46. 579-616

for Mechanical snd Biological Elnstic Chains. Physics Letters A

Gamier. J., Protein Structure Prediction. Biochimie 72.5 13-524 (1990). Fasman. G.D.. Predicrion of Protein Strucrure and rhe Principles of Prorein Conformorion.

Plenum Press. New York (1989).