096w3779/91s3.00+.00 Pcrgamon Fbss plc
Chaos. Solikms & Fracfols Vol. 1. No. 4. pp. 375 - 382. 1991 Printed in Great Britain
Protein Folding and Deterministic Limits of Protein Folding Simulations
Chaos:
and Calculations
Gerald Biihm
Institut fiir Biophysik und Physikalische Biochemie UniversiClt Regensburg.D-8400 Regensburg.F.R.G.
Abstract - The aim of the present work is to suggest that protein folding is a highly process
which generally
to the availability
it is obviously
algorithm
with
dynamics
simulations
show
on digital computers. This limitation
of compufi~rg resso~ces
suggested previously;
dimensional
cannot be simulated
sufficient
structure
accuracy
parameter(s)
a small
protein
with
is not due
as it has been
for any deterministic
of protein 46 amino
folding.
acids
Molecular
whose
is known, suggest the native state to be a ‘fxedarrracror’.
that any oh irtitio calculation
protein is controlled
to quantify
to describe the dynamics
on crambin.
parameters,
or exactforcejield
impossible
complex
The results
of protein structure must fail if the folding
process of a
by a kiwfic process, i.e. when the native state is the khetically
accessible
minimw~ on the energy hyperspace but not the fhermody/lurnicully possible gl&~J In this case only
non-dynamical
(knowledge-ba.sed
approaches)
computational valuable
methods
methods like puffer,1 recognition
for the design and description
promising
to date to deduce the structure of an unknown
information
may
of protein structures than the tmditional
methods
method
structure.
and aeurul network methods
based algorithmic
used to date. Knowledge-based
minimum.
or dutubuse processing
can provide a reasonable three-dimensional
generic ulgorihms
like
three-
modelling
force-field
is therefore
protein
from
Novel
be more
the most
the sequence
solely. INTRODUCTION
The
protein folding problem
is sometimes
considered to be the
‘second part of the genetic code”,
ing the transition from the translated protein sequence to the native tertiary structure. The information for this process is intrinsically conditions
embedded
in vivo. A ‘folding matrix’.
has been suggested that chaperones
relationship
preferrably however,
the it
have failed so far. The most
protein is knowledge-bused
(‘compurufive’)
of structures is guided by known structures from homologous 375
required
for in vivo folding’.
for the sequence-structure
method to deduce the structure of an unknown
ing’; here, the generation
conditions,
from external sources, is not neccessary;
may be of relevance
All attempts to extract the information promising
in the protein sequence and the solvent
e.g. information
defm-
proteins.
modell-
376
G. Bdm
Other
methods
which have been published
of the native structure ‘. Also, for a correct description
was directed by some knowledge es of the protein, the precision
for atom positions required
binding parts must be better than 1
In this
work
unpredictable,
the fundamental
of our mathematics;
events,
in the axiomatic
intrinsically
in the beginning)
One
principles
of events are based on decimal
the errors
negligible
way
ligand-
chaos’ was first suggested during
dynamical
underlying
investiga-
processes in nature have been shown
to be
those events are known’.
due to the problem of describing initial conditions as well as parameters for an algorithm in
This is mainly
descriptions
process-
site and/or
is a process of highly deterministic chaos, as it has been
events6; until today, many non-linear
although
of functional
A.
it is sugested that protein folding
tions on climate
or the calculation
to describe at least the active
for a multitude of natural events. The term ‘deterministic
shown
terms
so far do either lack general applicability’,
system of scientific
explanation
numbers with limited
precision.
On the pathways
the algorithmic
than the corresponding
are layers of ‘neurons’
with different
(usually
used, most of dynamical
the error
(that
was
value.
is to use a rrelrrul nemork
problem
on the fact that patterns can be learned by special programs basis for the learning
is currently
in any decimal number will add up, until finally
included
will be of greater magnitude
to circumvent
which
(NN)‘. This method
in a training-
weights (‘synupfic
is based
and a recall-phase).
strength’).
The
New patterns are
associated with the stored information. Another
of solutions.
algebra is the gencfic algoriflrm (GA).
new method in computer
lels in biological
evolution.
The
GAS find the solution to a problem
attempts
problem
space. GA have no apriori during the evolution
knowledge
of the problem
AND
described
ture is known so far. The resolution All molecular
dynamics
simulation
described
U.S.A.),
and DISCOVER
release 2.5 for the CRAY
equipped
3.3.2
on both machines
with a 12.5 MHz
ond of simulation.
of
were determined
R3000
whereas the CRAY
4/432.
with the program Graphics
Both machines
The numerical
methods used. A simulation
Inc.,
for
4/432.
equipped
under
identical
with 4 processors, took
on the IRIS
about
View, UNIX
calculations
are based mainly
of Crambin
RISC processor, took about 12,000 seconds CPU time Y-MP
struc-
DISCOVER
Mountain
were running
differences
to be below 0.5 %; these differences
and the processing MIPS
Y-MP
5.2, respectively).
bridges; it is
whose three-dimensional
on the 4D/?O,
per picosec60-80
seconds
picosecond.
Simulations ture. The
release 2.6 for the IRIS 4D series (Silicon
and UNICOS
data representation
per simulated
this is
the problem
A.
in this work were performed
U.S.A.),
performed
conditions:
a small plant seed protein with 46 amino
structural elements
of the crystal structure is I.0
Inc., San Diego,
internal
in the
METIIODS
later is crambin.
secondary
(Biosym
(IRIX
space or the environmental
located
uhyssirricu. It has two short helical regions. two P-strands, and three disulfide
one of the smallest stable proteins with defined
derivutes
of information
and energy minimization”.
protein used for the calculations
acids from Crumk
has its pamI-
to repeated attempts
‘fic~zy logic’ methods may circumvent
of the solution. Similarly,
MATERIALS
The
This development the feedback
toward the solution are called ~er~s - a sequence
scanned
fixed rules for dynamics
by analyzing
cutoff
were
performed
distance
mainly
under in vuc~~ conditions.
for intramolecular
interactions
All hydrogens
between atoms WJS 18.0
were present in the struc-
A. with
2.0
A
switching
377
‘sticchaos Protein folding and detenmm &stance. A was
leapfrog
ufgorithm for the startup of the dynamics was used. Initial equilibrium
of the dynamics
performed over 5.0 ps; the simulations took 80 ps or 100 ps. respectively, with an underlying
time step of
1 fs. Structures were collected each 0.1 ps. For comparison above calculations,
reasons,
one calculation was performed under identical
but with explicit representation
coditions
with respect to the
A
of water molecules. A solvent layer of 8.0
around the
molecule was modelled; for the calculation, periodic boundary conditions were applied. state was simulated by the following two methods: (1) the complete
The denatured proline residues)
was folded into a all-gstrand
conformation,
chain (except the
with Q and w angles of 120.0 degrees for each
amino acid, resulting in a long, stretched chain with maximum solvent accessibihly. (2) Only the a-helical and P-strand regions (e.g. residues 7 to 19.23 to 30; 1 to 4.32 to 35) were folded into the stretched conformation, whereas the regions of turns and loops were kept fixed relative to the native state. This resulted
in a more
compact starting conformation. The simulations described under (2) were divided in two different parts: (a) the fixation of the turns was released immediately after the equilibration
time of the dynamics; (b) turns were kept fixed for the first 40 ps
of the dynamics. Under the latter conditions the chain has already collapsed to a compact structure before the turns are allowed to decompose. Beginning
with the unfolded states described under (I), changes in the backbone dihedral angles of resi-
due Clu-23 (the central residue of the molecule) were introduced to examine the influence of small perturbotions on the resulting structure. An additional torsion of +30’ and -60’ for the q-angle of residue
Glu-23 re-
sultcd in three slightly different initial structures. All other angles in the three structures were identical.
RESULTS The simulations
AND DISCUSSION
performed during this work show that the linal state is reached after about 40 to 50 ps.
After this time a compact structure has emerged; the total energies of the structures are very similar to each other and to the energy of the native state. It may be argued that protein folding takes place within seconds instead of picoseconds. In fact, it cannot be ruled out that there may be major rearrangements on a time scale lo6 larger than the one used here, even under the same conditions.
of the structures
However, a close inspection
of the last 30 picoseconds of the simulations (Fig. 1) suggests that there is a high energy barrier (activation energy) for a significant transition between the final four structures. A comparison of the dynamic trajectories with explicit representation
of water molecules
calculations in vucuo (data not shown) revealed that the resulting conformations
and analogous
and trajectories are complete-
ly different, as it is expected. Electrosrufic shielding effects as well as solvent dumping of the motions protein atoms significantly
delay the collapse of the chain to a compact globule.
compete with molecule-solvent
Intramolecular
of the
interactions
interactions; therefore, the analysis of the simulation gets more complex
and
much more computer time is needed for a simulation. It was not the aim of this work to investigate the influence of explicit solvent representation dynamics
simulations lo but to examine the effect of small structural perturbations
identical conditions;
therefore,
the limitation to in vucuo conditions - whatever arguments
those calculations - was justifiable.
Further work will include (aside from other improvements
sentation) explicit water representation.
on molecular
on dynamical events under exist contrary
to
in model repre-
G. B&M
378
6000
I
I
50
60
I
Unfdlded
co;lformatibns
Native conformation 3000
lo
0
20
40
30
Time
70
80
(ps)
Fig. 1: Total cncrgy (kinetic and potcntid energy) during ‘refolding’ simulation of crambin.
Fig. during
1 demonstrates the simulated
(and the structure,
that the total energy of the system
folding
process.
see Fig. 2) take place. The energy
the native state of the molecule, terms
on the initial
structures
collapse
however,
structure,
up about
diverges;
8 to 10 %, rms deviation
the final structures
examination
different
Surprisingly. and P-sheets; patterns
reveals
native
proteins
these
of the energy fluctuation
to
This holds also true when several
this is indicated
in Fig. 2 and 3. Depending
which are far away from
away from the native
from the native state.
Fig. 3 demonstrates
deviation
the native
(reference)
This
describes
that the evolution
are about 8 to 10 A rms deviation that this is the maximal
structures
protein.
conformation; the collapse
of the structures
all of the
during the
away from each other. that compact
of the simulations
between
locally secondary-structure
Unfortunately
of those secondary
structures
10 ps and 25 ps; sidechain
on the tendency
bonds of the protein of subtle differences
contain
show more regular local geometries,
unambiguously.
of the formation
have a great influence
occurences
within
drops
globular
structures
of the
and the native state represent
four
protein conformations.
tures to occur origins
by hydrogen
globule.
the three final structures
the simulated
can be identified
An analysis
indistuingishable.
result in structures
is about 40 8, mls deviation
given size can adopt. Therefore, completely
in the final state is identical
during the simulation;
all three simulations
chain to a folded, compact
A closer
only minor fluctuations
energy)
boditr,~ energy, vm der Wuuls energy, Codotnhic energy. dispersiott etwrgy)
are divergent
conformation
end
stretched
and kinetic
respectively.
The structures,
The starting
potential
state is reached,
e.g. they are energetically
(hydrqerl
of the energy
are compared,
When a compact
(including
for the structures
backbones.
but the characteristic
they are in different demonstrate interactions to occur,
spatial regions
that the decision of spatially
although
The main reason for the distinct
which occur ‘by chotlce’.
like regions
of short
a-helices
hydrogen
bonding
in all structures.
for the local
neighboring
the final structures structures
struc-
amino acids
do
are stabilized
seem to be the sum of
Protein folding and deterministic chaos
10
20
30
50
60
70
80
Time (ps) Fig. 2: Root mean square dcvinlion of UIC three unfold-4 conformalions of crambin to the native state.
17.5 15.0 12.5 10.0
Y
7.5 5.0 2.5 t
0.0
’ 0
I
I
I
I
I
I
I
10
20
30
40
50
60
70
Time (ps) Fig. 3: Root mean square deviation of tie three unfolded conformations of cmmbin rclativc to each other.
80
G. B&M
380
CONCLUSIONS The divergence dependend results
of the trajectories
on the selected
are independent
term
of the molecules
parameters
during the simulations
and the algorithms
from ail guidelines
may be considered
used, but this seems
and rules examined
so far. The
structures
observed
of the calculated enthalpic terms @otentiul energy), e.g. the native state is energetically
ble from the computationally structures
do not converge
This does which
but diverge
molecules
too simple to be reliable
does really provide
between
sequence
in our mathematical
of modem
science.
systems
axiomatic
of the analogous
processes
via hi/lo-cation graphs; true bifurcation
e.g. the module the folding
methods,
without providing
recently.
finally,
known equivalent
system;
applicability
and characterizing
is negative,
between
several
transition
Lyapunov-exponent
procedure
folding
problem
of structure
they
reduce
calculation of dynamical
constint.
the information e.g.
Some mass
by
the built-up-proce-
graph and thus from another since
problems
to previously
is the description
Others,
rela-
can be
multi-body
by the Feigenbaum
hyperspace.
the conformational
It would be of great interest
statistical
parameters
magnitude
for this value is still controverssl.
(for example,
chemical
systems
the attractor
point
conformational
in hyperinfonnation
which may
also
be useful
the nature of the
for the trajectories
state’
is a fixed point
If it is zero, then the native
states (dynamic
model, e.g. equivalent
would characterize
for the
‘final
structure).
information
establishes
protein
of a unique
in
confonnasubstates
a ‘strange uttructor’
which
of
has no
theory so far.
dimension is a valuable
hyperspace.
that relate
system of mathemat-
imply that this relationship
is the classification
chaotic
(static model, e.g. crystal
parameter
devclopcd
mechanical
such as the simula-
The existence
The Lyuplorov-exponer’t chamcterizes
if the exponent
in protein structure
of proteins;
problem.
graphs are characterized
mational
the surface
that the
energy.
a positive
the chaotic
Also,
the
algorithms
are primitive process
axiomatic
of the protein
standard
point in the bifurcation
describing
be in equilibrium
hemoglobin);
A graphical
lack a general
space of conformations
tion would
formulation
different
the required
of the chaotic
the phase
however,
into account
this holds true for several
of the conformational
of the value of new methods.
(attractor)
is used
of a complex
may be shown to work only since they circumvent
however,
are properties
judgement
indistuingisha-
states;
it must lx taken
does not necessarily system;
to a section
from a completely
Both
There
method,
process
folding
chaos ? One achievement
of deterministic
which have been published
starts
in
converge
and for all deterministic
that our current
to the protein
methods
space.
Especially
for the description
of a protein
or physical
Now, what would be the benefit
dure,
conditions
reason to believe
a solution
and structure
expressed
ideas,
the
folding”.
ics and physics
reducing
different
since
states.
of the force fields and the Newtonian dynamics which
basis
described
different
discouraging.
On the other hand, there is no imminent
tionship
from three slightly
that for all starting
the results are similarly
models and are definitely tion of protein
starting
to completely
in no case proof, of course,
are possible
underlying
refolded
to be mainly
not to be the case
for the description
space for a molecule
reactivity
C,- atoms
to fold; it is equivalent
to find out correlations
the size of the protein, Thefructul
of dynamical
between
fractal
surfacesIs.
The
Also, theory
This
confor-
dimension
Even
may be used to characterize
is often related with a rough surface’3. and self-affine
to the so-called
the chaotic
or amino acid composition).
inda”.”
processes.
the order
the softness
methods
and
have
of of
been
of renormaiizarion
Protein folding and
groups connects dynamical
physical
dependence
different
results with molecular structure
(experimental)
Bifurcation
forces on completely
Ctenrrinistic
381
chaos
this is useful
scales;
states which are detected by experimental
bifurcation
of dynamical
trajectories
of transition
methods.
The folding
these schemes are based on thermodynamical
structural data and thus describe molecule. Now, bifurcation folding pathways. intermediates
the properties
of an ensemble
pathway
criteria
of molecules,
diagrams do provide a simple possibility sense these intermediates
described
via
of proteins which are rather than molecular-
not the fate of a single
for the description
of multiple protein
It has been observed that the refolding pathway and the occurrence
(in a thermodynamical
and parameter
states of the folding or
diagrams are not directly related to kinetical schemes for the refolding
gained by experiments;
thermo-
CdCdatiOd6.
diagrams are helpful schemes for the description
of processes and may be of value for the description
intermediate
for comparing
of the observable
are an ensemble of molecules exhibiting the
same behavior of a measurable parameter) may be different when refolding starts from different conditions. Finding out how the Feigenbaum
constant can be derived from the experimental
different denatured states should provide us with a better understanding In a recent work, El Nashie and Kapitaniak nonchaotic
behavior
simulationsi7.
may be performed
of the folding problem.
found that the distinction
by the Lyapunov
exponent
The authors stress the similarity of the experimental
values of refolding from
between
distribution
chaotic and strange
of symbolic
dynamic
results from nucleic acids research and
chaotic models of solitons in elastic strings. The current
work identities
analogous
facts between known highly chaotic systems and the protein
folding problem. However, at the current state we cannot rule out that the description theory is incomplete,
even if there is evidence. The general applicability
molecular
simulation
contradiction
dynamics
to cxpcrimcntal
is definitely
or computational
indices, chrroric dimensions,
inadequate.
However,
results in the literature.
bijirrcarion diugrums,
in terms of the chaos
of the methods described for the to my knowledge
there
is no
On the other hand, tools likefracfaf
and Lyupunov-exponenrs
may be of value for protein
structure prediction. If protein folding can be identified unambiguously
as a process of deterministic
chaos, then this would
imply that any fib inirio method must fail when the native state of a protein is the kinetically minimum but not the thermodynamically In this case, knowledge-based methods or other pattern sequence. Therefore,
possible minimum intrinsically
methods like homology
recognition
techniques
modelling,
contained in the amino acid chain. generic algorithms,
are the only way to calculate
future work should concentrate
accessible
protein
neural nenvork structure
on those methods rather than the improvement
from of
force field parameters or force field algorithmsr*s*9.
Acknowledgements
- The aid of many collcagucs is gntcfully
Raincr Rudolph. and mosfly Rincr
acknowlcdgcd. Spcchl
thanks IO Profs. Gustav Obcrmair.
Jacnickc for help and kind support; Drs. Mikl6s Cscrze). Ina Koch. and Cornelius
fruitful discussions. This work ~3s supprtcd
FrClmmcl for
by Dcutschc Forschungsgcmcinschaft grant Ja 78129-I.
REFERENCES 1
Jacnicke. R.. Is Thcrr: a Code for Frotcin Folding ? in Prorein Srrucrure and Prorein Engineering.
2
1988 (cd.: Winnrrkcr. E.-L., & Hubcr. R.). Springer Vcrhg. BcrlinIHcidclbcrg/Ncw York. pp. 16-36 (1988) Jacnickc. R.. Folding and Association of Proteins. Prog. Biophys. molec. ho/. 49. 117-237 (1987).
39. Colloquium - Mosbach
382
3
G. B~HM Blundell,
TL.,
Sibnnda. BL.,
Stemberg. MJ.E,
8 Thornton. J.M.. Knowledge-based
the Design of Novel Molecules. Nururc 326.347-352 4
V~uez.
M..
& Schemga
Prediction of Protein Structures and
(1987).
H.A.. Cslcukttion of Protein Conformation by the Build-up
Procedure.
Appliwtion
to Bovine
Pancreatic Trypsin Inhibitor Using Limited Dimulaten Nuclenr Magnetic Resommce Dam. 1. Biomol. Sfrucr. Dynam. 5. 705 755 (1988).
5
Skolnick. J., & Kolinski, A.. Simulntion of the Folding of ;LGlobular Protein. Science 250. 1121-I 125 (1990).
6
Lorenz. E.. Deterministic Nonperiodic Flow. 1. Afmosph. Sci. 20, 130-r14
I
Schuster, H.G., Dererminisric Chaos: An fnrroducrion. VCH Publishers. DeerfiehbWeinheim
8
Crick, F., The Recent Excitement about Neuml Networks. Nature 337, 129-132 (1989).
9
Goldberg. D.E.. AlgoriUuns in Searching, 0ptimi:arion.
IO
vsn Gunsteren. W.F.. & Berendsen. HJ.C., Perspectives in Chemistry. Anger.
1(1963). (1984).
and Cfachine Learning. Addison-Wesley
Computer Simulation of Molecule
Chem. Int. Ed. Engl. 29,992-
(1989).
Dynamics: Methodology,
Applications.
and
1023 (1990).
11 12
Fmuenfelder. H., Biomolecules. in Emerging Synfheses in Science Volume I (cd.: Pines. D.). Addison-Wesley Msndelbrot. B.. The Fracfal Geomefry of Nafure. Freeman, New York (1977).
13 14
Li. H.. Li, Y., & Zhno. H.. Fmctsl Analysis of Protein Chsin Conformation. Inf. /. Biol. Macromol. 12. p. 6-8 (1990). Brynnt. S.H., Ishm. S.A.. & Weaver. D.L.. The Surfxe Area of Monomeric Proteins: Significance of Power L3w Behavior.
15
Cse~o.
16
Fisher. M.E..
17
(1974). El Nzschie.
MS.
147,275-281
(1990)
Proteins S~ruct. Funcr. Genel. 6.4 18423
18 19
M.. & Vicsek. T., Self-Affine The Renonnaliution & Knpitnnti.
(1988).
(1989).
Fmcti
Analysis of Protein Structure Chaos. So/irons and Frucrols 1. in press (1991).
Group
in the Theory of Criticrtl Behavior.
T., Soliton Chaos Mod&
Reviews of Modern
Physics 46. 579-616
for Mechanical snd Biological Elnstic Chains. Physics Letters A
Gamier. J., Protein Structure Prediction. Biochimie 72.5 13-524 (1990). Fasman. G.D.. Predicrion of Protein Strucrure and rhe Principles of Prorein Conformorion.
Plenum Press. New York (1989).