New Monte Carlo algorithms for protein folding
Ulrich HE Hansmann* and Yuko Okamoto†
Over the past three decades, a number of powerful simulation algorithms have been introduced to the protein folding problem. For many years, the emphasis has been placed on how to both overcome the multiple minima problem and find the conformation with the global minimum potential energy. Since the new view of the protein folding mechanism (based on the free energy landscape of the protein system) arose in the past few years, however, it is now of interest to obtain a global knowledge of the phase space, including the intermediate and denatured states of proteins. Monte Carlo methods have proved especially valuable for these purposes. As well as new, powerful optimization techniques, novel algorithms that can sample a much wider phase space than conventional methods have been established.
Addresses
*Department of Physics, Michigan Technological University, Houghton, MI 49931-1295, USA; e-mail: [email protected]
†Department of Theoretical Studies, Institute for Molecular Science and Department of Functional Molecular Science, The Graduate University for Advanced Studies, Okazaki, Aichi 444-8585, Japan; e-mail: [email protected]
Current Opinion in Structural Biology 1999, 9:177-183
http://biomednet.com/elecref/0959440X00900177
© Elsevier Science Ltd ISSN 0959-440X
Abbreviations
HMC hybrid MC
MC Monte Carlo
MCM MC with minimization
MD molecular dynamics
PERM pruned-enriched Rosenbluth method
Introduction
The Holy Grail of computational biochemistry is to understand the kinetics of the folding of proteins and peptides, and to predict the folded conformation solely from amino acid sequence information by computer simulations. Describing the intramolecular interactions through force-fields, the equations of motion can, in principle, be solved numerically for each atom in a protein. Hence, in theory, one can follow the trajectory in time by a molecular dynamics (MD) simulation and study explicitly the process of folding. The final conformer is identified as the folded state and equilibrium properties can be calculated by computing averages over the sampled set of conformations.
An alternative approach is a Monte Carlo (MC) simulation at a relevant temperature. In this case, trial moves are generated randomly and are accepted or rejected according to the Boltzmann weight. If the chosen algorithm satisfies the ‘detailed balance’ condition and each configuration can be reached in a finite number of steps (ergodicity), the resulting Markov process will converge to the canonical distribution, which should contain the folded conformation of the protein.
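The accept/reject rule of a canonical MC simulation can be illustrated with a minimal sketch. It assumes a one-dimensional toy double-well potential in place of a real force field; the names `energy`, `metropolis` and `mean_energy`, the step size and the inverse temperature `beta` are illustrative choices and are not taken from the works reviewed here.

```python
import math
import random


def energy(x):
    """Toy double-well potential; a hypothetical stand-in for a force field."""
    return (x * x - 1.0) ** 2


def metropolis(n_steps, beta, step=0.5, x0=0.0, seed=0):
    """Sample x from exp(-beta * energy(x)) using symmetric random trial moves."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # symmetric proposal
        e_new = energy(x_new)
        # Metropolis rule: accept with probability min(1, exp(-beta * dE)).
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            x, e = x_new, e_new
        samples.append(x)  # a rejected move repeats the old state
    return samples


def mean_energy(samples):
    """Canonical average estimated as a plain average over the sample."""
    return sum(energy(x) for x in samples) / len(samples)
```

Calling `mean_energy(metropolis(20000, beta=5.0))` gives an estimate of the thermal average of the toy energy. Because the trial move is symmetric, the acceptance rule alone enforces detailed balance with respect to the Boltzmann weight.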
Thermodynamic quantities are again calculated by computing averages over the sampled conformations.
An obvious advantage of MD over MC is that MD simulations allow one to follow the classical trajectory of the system, whereas the dynamics in MC is artificial. Hence, with the exception of lattice models (see the review by Thirumalai and Klimov in this issue, pp 197-207), MD is the method of choice for the investigation of the kinetics of folding. It was claimed in early work [1], however, that MD is also more advantageous than MC in calculating equilibrium properties. Only recently has that claim been disputed. Bouzida et al. [2] reanalyzed the original data and their results indicate rather that the MC method is superior to MD. This conclusion seems to be supported by a more recent study by Senderowitz and Still [3].
There are various reasons for the long-lasting preference for MD over MC in computational protein folding studies. The complex form of the intramolecular interactions, containing both repulsive and attractive terms, yields a rough energy landscape and a huge number of local minima. Hence, at temperatures of experimental interest (≈ 300 K), conventional MC or MD simulations tend to get trapped in one of the local energy minima. Only small parts of the phase space are sampled and, thus, physical quantities cannot be calculated accurately. For that reason, protein folding simulations were long restricted to exploring the local neighborhood of a well-known state. For instance, the flexibility of a folded conformation and the temperature-induced unfolding of a protein have been studied (as recently reviewed in [4•]). In both cases, the interest is in kinetics and, for such problems, MD is obviously superior to MC.
Only recently has there been renewed interest in MC methods, motivated by two parallel developments.
Firstly, a new view of the folding process has been developed over the past few years that asserts that a full understanding of the folding process requires a global knowledge of the free energy landscape of the protein system ([5-9], see also the review by Thirumalai and Klimov in this issue, pp 197-207). Hence, there is a shift in interest towards the thermodynamics of protein folding. For the investigation of the equilibrium properties of proteins, MD is no longer the only choice. Secondly, a number of algorithms that allow an improved sampling of low-temperature configurations have been recently introduced. Although many of the new techniques can also be used for MD, they were usually first developed as MC methods.
In this review, we will discuss some of the new techniques that motivated, in part, the renewed interest in MC methods. We will focus on the algorithmic improvements of MC techniques over the past few years, neglecting a large
Theory and simulation
body of important work in which dramatically increased computational power allowed investigations into new and interesting problems with conventional techniques (see, for example, [10]). We will not review the recent advances in force-field or protein model development (see, for instance, [11•,12]) or new progress in theories as to how to include solvent effects in protein simulations (see, for example, [13]), although there is interplay between these developments and the algorithmic improvements. We first discuss the role of MC methods as global optimization algorithms. We then consider the role of MC methods as a means to calculate thermodynamic quantities.
Global optimization by Monte Carlo simulations
Over the past three decades, much effort has been invested into the development of novel simulation techniques to enhance the sampling of low-energy conformations. For many years, the emphasis in protein studies was on predicting the structure of proteins. Assuming that the native structure is thermodynamically stable, it is reasonable to identify the global minimum conformation in the free energy at T = 300 K with the lowest potential energy conformation and to search for this conformation using powerful optimization techniques. Apart from a small number of interesting attempts to use deterministic methods (for instance, the αBB algorithm [14] or the graph theoretic algorithm [15]), stochastic algorithms are usually employed.
Probably the most commonly used method is simulated annealing [16]. Its underlying idea of modeling the crystal growth process in nature is easy to understand and simple to implement. Any MC or MD technique can be converted into a simulated annealing algorithm by decreasing the temperature gradually during the run. It can be shown that a logarithmic annealing schedule ensures that the simulation will find the global minimum [17]. As a result of the limitations in available computer resources, however, one is often forced to use faster annealing schedules, for which success is no longer guaranteed. Still, ever since the first applications [18,19], MC simulated annealing has been the method of choice in many protein folding simulations and recent applications include [12,13,20,21]. A review of the applications of MC simulated annealing to the ab initio prediction of oligopeptide conformations is given in [22].
Various proposals have been made to increase the efficiency of simulated annealing in protein folding simulations. Weighted-ensemble simulated annealing [23] performs simulated annealing using multiple system copies. The distribution of copies is adjusted as the cooling proceeds, in order to take advantage of the hierarchical structure of the energy landscape of biomolecules. Another promising approach is the use of the Tsallis weight [24], $w_T(x) = [1 - (1-q)\beta E(x)]^{1/(1-q)}$, instead of the Boltzmann weight, $w_B(x) = \exp(-\beta E(x))$, in the simulated annealing simulation. As a result of the power-law form of the new weight, the probability of crossing energy barriers and escaping from local minima is increased in simulations with Tsallis weights. The deviation from the canonical distribution is controlled by the Tsallis factor $q \geq 1$ and, in the limit $q \to 1$, the Tsallis weight reduces to the usual Boltzmann weight. Applications to protein folding can be found in [25-27].
Another technique, which became popular over the past few years, is genetic algorithms [28]. The main idea is not to study a single configuration trajectory, but to study a population of configurations to which one assigns the potential energy as a fitness score. Choosing some kind of string representation, this population then evolves by a series of ‘biological operations’, such as mutations, recombination and selection, towards an optimal state. Unlike in the case of simulated annealing, however, there is no proof of asymptotic convergence to the global optimum. Nevertheless, genetic algorithms proved successful in finding low-energy conformations of small peptides (for a review, see [29]). Examples of recent applications include [30] and the ‘mining minima’ algorithm [31], which, in addition, attempts to construct an approximate partition function out of the distribution of collected local minima.
Also very successful is the MC with minimization (MCM) method [32], in which the current configuration is changed by a large, random move and is followed by local minimization with respect to the potential energy. The new configuration thus obtained is then accepted or rejected according to the Metropolis criterion. The method realizes only an approximate Markov chain, however, as the ‘detailed balance’ condition is not obeyed. Hence, the obtained sample of local minima is not a rigorous Boltzmann sample and the method cannot be used to calculate thermodynamic quantities. MCM proved to be a useful tool for finding the lowest-energy states of small peptides [33] or in docking simulations [34]. An interesting modification of this technique was proposed by Vásquez et al. [35], who replaced the potential energy in the Metropolis test by an approximate free energy.
A more recent, promising approach is the diffusion-process-controlled MC method [36•]. The concept behind this method is that, as a protein finds its native conformation without exploring all local minima, there is no need to search all conformations, only those that are reachable in a reasonable time. Using diffusion considerations, this kinetic condition is translated into a condition on the maximal angular deviations of all residues between one conformation and the next. The method was used to find low-energy states of an off-lattice simplified model of avian pancreatic polypeptide [36•]. Guiding the search process using additional knowledge of the system under consideration is also pursued by other groups (see, for instance, [37]).
The development of novel global optimization algorithms for the protein folding problem is still an active area of research. Techniques that attempt to ease the search process by transforming the energy landscape into a simpler form are another promising example [38,39••]. Progress can also be expected from emerging techniques such as taboo search [40] and quantum computing [41]. What is needed most today, however, are reliable criteria for comparison of the various methods. Quantitative investigations of the efficiency of various techniques and their comparison were only recently started. Such examples are found in [33,39••,42], which favor simulated annealing and MCM over genetic algorithms and other stochastic optimization methods. Other examples are presented in [43•], in which annealing versions of generalized ensemble techniques (which will be described later in this review) were found to be superior to regular simulated annealing.
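As an illustration of how an MC run is converted into simulated annealing by lowering the temperature during the run, the following sketch may help. It assumes a one-dimensional toy landscape (`rough_energy`, hypothetical) and a geometric cooling schedule; as noted in this section, a schedule faster than the logarithmic one carries no guarantee of finding the global minimum, so the best state visited is recorded separately.

```python
import math
import random


def rough_energy(x):
    """Toy rugged landscape: global minimum E = 0 at x = 0, plus local minima."""
    return 0.1 * x * x + 1.0 - math.cos(3.0 * x)


def simulated_annealing(n_steps, t_start=2.0, t_end=0.01, step=0.5, x0=6.0, seed=1):
    """Metropolis MC with a temperature lowered geometrically from t_start to t_end."""
    rng = random.Random(seed)
    x, e = x0, rough_energy(x0)
    best_x, best_e = x, e
    # Geometric cooling (an assumption of this sketch, not a logarithmic schedule).
    cool = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    temp = t_start
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = rough_energy(x_new)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e  # remember the lowest-energy state seen
        temp *= cool
    return best_x, best_e
```

For example, `simulated_annealing(20000)` returns the lowest-energy point visited together with its energy. Replacing the Boltzmann factor in the acceptance test by a ratio of Tsallis weights would give the generalized variant mentioned above.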
Calculation of thermodynamic quantities by Monte Carlo simulations
Optimization techniques are useful only when interest is restricted to predicting the structure of proteins, as they do not allow one to calculate thermodynamic averages or to study the thermal behavior of proteins. With the recognition of the energy landscape theory and funnel concept [5-9], there has been, over the past few years, an increasing interest in the thermodynamics of proteins. In order to investigate such questions by computer simulations, one has to sample a set of configurations from a canonical ensemble and take an average of the chosen quantity over this ensemble. In the following, we describe new MC methods in the light of the efficiency of sampling the phase space and for the purpose of studying the thermal behavior of proteins. It should be emphasized that the described new MC methods can also be used as optimizers, for instance, by using them in a simulated annealing run.
Chain growth methods
In chain growth methods, protein or polymer conformations are built up in such a way that, by construction, they are distributed according to the chosen ensemble [44,45]. A recent variant of this idea is the pruned-enriched Rosenbluth method (PERM) [46], whereby the polymer chain grows by placing monomers with a certain probability $p_n(i)$ on vacant sites. Each chain carries a weight $W_n$ that depends on the growth process, where $W_n = m_n \exp(-\beta \Delta E_n) W_{n-1}$, with $W_1 = 1$; $\Delta E_n$ is the energy gain for adding monomer $n$ and $m_n$ is the Rosenbluth factor. If the weight falls below a threshold $W_n^{(1)}$, the chain is, with probability of a half, either eliminated (‘pruned’) or kept and its weight doubled. If the weight exceeds another threshold $W_n^{(2)}$, the conformation is replaced (‘enriched’) by $m$ copies, each with weight $W_n/m$. Although the efficiency of PERM in studies of lattice heteropolymers [47•] is impressive, its application seems to be restricted to minimal protein models.
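The pruning and enrichment steps can be sketched for the simplest possible case. The code below assumes an athermal self-avoiding walk on a square lattice, so the Boltzmann factor is 1 and the weight is just the product of the numbers of free sites; the thresholds are taken as fixed multiples (`c_lo`, `c_hi`, hypothetical parameters) of a running estimate of the partition sum, and enrichment always makes two copies. Step counting starts from a weight of 1 for the zero-step chain, which is a convention of this sketch.

```python
import random

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def perm_saw_counts(n_max, n_tours, c_lo=0.3, c_hi=3.0, seed=0):
    """Estimate Z_n (here: the number of n-step self-avoiding walks) with PERM."""
    rng = random.Random(seed)
    sum_w = [0.0] * (n_max + 1)  # accumulated weights per chain length
    tours_done = 0

    def grow(pos, occupied, w, n):
        sum_w[n] += w
        if n == n_max:
            return
        free = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES
                if (pos[0] + dx, pos[1] + dy) not in occupied]
        if not free:
            return  # dead end: the chain cannot grow further
        nxt = rng.choice(free)
        # W_n = m_n * exp(-beta * dE_n) * W_{n-1}; athermal toy, so exp(...) = 1.
        w_new = w * len(free)
        z_est = sum_w[n + 1] / max(tours_done, 1)  # running estimate of Z_{n+1}
        copies = 1
        if z_est > 0.0:
            if w_new > c_hi * z_est:    # enrich: two copies, half the weight each
                copies, w_new = 2, 0.5 * w_new
            elif w_new < c_lo * z_est:  # prune with probability 1/2
                if rng.random() < 0.5:
                    return
                w_new *= 2.0            # the survivor carries doubled weight
        occupied.add(nxt)
        for _ in range(copies):
            grow(nxt, occupied, w_new, n + 1)
        occupied.remove(nxt)

    for _ in range(n_tours):
        grow((0, 0), {(0, 0)}, 1.0, 0)
        tours_done += 1
    return [s / n_tours for s in sum_w]
```

For short chains such as `perm_saw_counts(6, 5000)`, pruning and enrichment are rarely triggered and the estimate is close to plain Rosenbluth sampling. The population control becomes important only for long chains or at low temperature, where the weights fluctuate strongly.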
Improved updates
Although chain growth methods directly construct an equilibrium configuration, most MC algorithms start from a random conformation and realize (for instance, through the use of the Metropolis algorithm) a Markov chain with local updates, which guarantees that the simulation will converge
after a certain time to the chosen ensemble. To overcome the problem of poor sampling at low temperatures, local updates are not sufficient and some kind of improved updating scheme is preferred in order to enhance the sampling. An intuitive example is hybrid Monte Carlo (HMC) [48], which was first applied to the protein folding problem by Brass et al. [49]. In this case, a short MD run (with the momenta taken from a Maxwell distribution) is used to move the whole system from a given conformation to a new one, which is then accepted or rejected according to the Metropolis criterion. In this way, a collective move is realized. The advantage of HMC over regular MD is that the trajectory can be followed over a long period of time, with a large step size, because the Metropolis step corrects for the discretization errors (due to a finite step size) encountered during the time evolution of the MD run. The observed improvement in efficiency over regular MC was only modest, however, and HMC by itself does not seem to overcome the multiple minima problem.
For special cases, like homopolymers [50•] and the simulation of proteins in the contact map representation [11•], other nonlocal moves can be implemented naturally. All these moves depend strongly on the chosen model, however, and are often not known a priori. To develop appropriate updates, one has to carefully study the collective motions of a protein. Such analyses yielded, for instance, the scaled collective variable MC of Noguti and Go [51] and the Go and Scheraga update [52], whose efficiency was recently studied using simulated annealing and genetic algorithms [53]. The Go and Scheraga update is a cooperative change of dihedral angles in a short stretch of the protein backbone, so that the protein conformation remains unchanged outside of this window. As such a move does not obey the ‘detailed balance’ condition, its application to MC simulations [54] is limited when one tries to calculate physical quantities.
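The hybrid Monte Carlo update described earlier in this section (momenta drawn from a Maxwell distribution, a short MD trajectory, then a Metropolis test on the change in total energy) can be sketched as follows. The sketch assumes unit masses, a leapfrog integrator and a toy harmonic potential; `potential`, `gradient`, the time step `dt` and the trajectory length `n_leap` are illustrative and are not taken from [48,49].

```python
import math
import random


def potential(x):
    """Toy harmonic potential standing in for a molecular force field."""
    return 0.5 * sum(xi * xi for xi in x)


def gradient(x):
    """Gradient of the toy potential."""
    return list(x)


def hmc_step(x, beta, dt, n_leap, rng):
    """One HMC update: fresh momenta, leapfrog trajectory, Metropolis test."""
    p = [rng.gauss(0.0, 1.0 / math.sqrt(beta)) for _ in x]  # Maxwell, unit mass
    h_old = potential(x) + 0.5 * sum(pi * pi for pi in p)
    x_new, p_new = list(x), list(p)
    g = gradient(x_new)
    for _ in range(n_leap):
        p_new = [pi - 0.5 * dt * gi for pi, gi in zip(p_new, g)]
        x_new = [xi + dt * pi for xi, pi in zip(x_new, p_new)]
        g = gradient(x_new)
        p_new = [pi - 0.5 * dt * gi for pi, gi in zip(p_new, g)]
    h_new = potential(x_new) + 0.5 * sum(pi * pi for pi in p_new)
    dh = h_new - h_old  # nonzero only because of the finite step size
    if dh <= 0.0 or rng.random() < math.exp(-beta * dh):
        return x_new, True
    return x, False


def run_hmc(n_steps, dim=3, beta=1.0, dt=0.2, n_leap=10, seed=0):
    """Return the average potential energy and the acceptance rate."""
    rng = random.Random(seed)
    x = [0.0] * dim
    accepted, pot_sum = 0, 0.0
    for _ in range(n_steps):
        x, ok = hmc_step(x, beta, dt, n_leap, rng)
        accepted += ok
        pot_sum += potential(x)
    return pot_sum / n_steps, accepted / n_steps
```

For the harmonic toy system, `run_hmc(5000)` should give an average potential energy near the equipartition value of dim/(2·beta). The Metropolis test is what allows a comparatively large `dt`, since it removes the integration error from the sampled distribution.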
In a canonical simulation, the probability of crossing an energy barrier of height $\Delta E$ is proportional to $\exp(-\beta \Delta E)$. It is then obvious that the slow convergence of MC simulations of proteins is a problem only at low temperatures. Hence, a smart update is to choose the trial conformation from a distribution of the same system at a higher temperature. This trial conformation is then accepted or rejected according to the usual Metropolis criterion (corresponding to the low temperature). This method, called J-walking [55], suffers from the problem that the acceptance rate decreases rapidly with increasing temperature difference. To avoid the decreasing acceptance rate, Zhou and Berne [56] proposed to choose as a trial conformation the minimized high-temperature conformation, a technique they call smart walking (S-walking). It is, however, not clear whether S-walking realizes a Markov chain and yields the correct distribution. Yet another related approach exists, q-walking [57•]. In q-walking, the trial conformation is not chosen from a higher temperature distribution, but from a Tsallis generalized statistical mechanics [24] distribution. For suitable choices of the Tsallis factor q, the obtained distributions will resemble Boltzmann-Gibbs distributions with tails to higher energies.
It is obvious that such a distribution has a larger overlap with the low-temperature canonical distribution and, hence, the acceptance rate is higher in q-walking than in J-walking.
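A minimal sketch of a J-walking update may make the idea concrete. It assumes a toy double-well potential and a pool of conformations stored from an ordinary Metropolis run at the higher temperature. Because the jump proposal is itself drawn from the high-temperature distribution, the acceptance probability used here is min(1, exp(-(beta_low - beta_high)·ΔE)), which follows from the Metropolis-Hastings rule. The mixing probability `p_jump` and all function names are illustrative assumptions.

```python
import math
import random


def energy(x):
    """Toy double well with a barrier of height 5 between x = -1 and x = +1."""
    return 5.0 * (x * x - 1.0) ** 2


def metropolis_pool(n_steps, beta, rng, step=0.5, x0=-1.0):
    """Ordinary Metropolis run; used here to store a high-temperature sample."""
    x, e = x0, energy(x0)
    pool = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            x, e = x_new, e_new
        pool.append(x)
    return pool


def j_walk(n_steps, beta_low, beta_high, p_jump=0.1, step=0.1, seed=0):
    """Low-temperature walker that occasionally jumps to a high-temperature state."""
    rng = random.Random(seed)
    pool = metropolis_pool(50000, beta_high, rng)
    x, e = -1.0, energy(-1.0)
    samples = []
    for _ in range(n_steps):
        if rng.random() < p_jump:
            x_new = rng.choice(pool)  # trial state from the high-T distribution
            e_new = energy(x_new)
            log_acc = -(beta_low - beta_high) * (e_new - e)
        else:
            x_new = x + rng.uniform(-step, step)  # ordinary local move
            e_new = energy(x_new)
            log_acc = -beta_low * (e_new - e)
        if log_acc >= 0.0 or rng.random() < math.exp(log_acc):
            x, e = x_new, e_new
        samples.append(x)
    return samples
```

With `beta_low=5.0` and `beta_high=0.5`, the barrier is about 25 kT for the low-temperature walker, so purely local moves stay in the starting well, whereas the jumps let `j_walk` populate both wells. The acceptance rate of the jumps falls quickly as the gap between the two temperatures grows, which is the limitation noted above.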
Another technique, parallel tempering, is closely related [58,59]. In this method, one considers an artificial system built of N noninteracting copies of the molecule, each at a different temperature $T_i$. In addition to the standard MC or MD moves, which affect only one copy, parallel tempering introduces a new global update [58,59], the exchange of conformations between a pair of copies. This exchange changes the conformations of two copies at once and is accepted or rejected according to the Metropolis criterion. The exchange of conformations is a new and improved update, which increases the thermalization of the canonical simulation for each temperature. The first application of parallel tempering to the protein folding problem can be found in [60].
Generalized ensemble simulations
Whether one can find improved updates strongly depends on the system under investigation. A more general approach to enhance sampling in protein folding simulations is to perform simulations in so-called generalized ensembles, in which, through the construction of the ensemble, the probability of crossing an energy barrier does not decrease exponentially with barrier height. To be more specific, the weights are chosen so that an MC or MD simulation will lead to a uniform distribution of a prechosen physical quantity. Probably the earliest realization of this idea is umbrella sampling [61]; recent applications of this to protein folding simulations can be found in [62•]. This idea was lately revived and a variety of new algorithms were developed whose usefulness for simulations of biological molecules has been increasingly recognized. In passing, we note that many of the improved updates discussed above can also be used in generalized ensemble simulations (as was discussed for the parallel tempering update in [60]).
The most prominent example of these newer and more elaborate techniques is probably the multicanonical algorithm [63], which is sometimes also referred to as entropic sampling [64] (in [65], it was shown that both algorithms are mathematically identical). Here, conformations with energy E are assigned a weight $w_{mu}(E) \propto 1/n(E) = \exp(-S(E))$, where $n(E)$ is the density of states and $S(E)$ is the microcanonical entropy. It is obvious that a standard update scheme, such as the Metropolis algorithm, will again realize a Markov chain. The generated conformations, however, are in equilibrium not with respect to the canonical distribution, but with respect to the multicanonical distribution. The simulation will lead to a uniform distribution of energy, $P_{mu}(E) \propto n(E)\, w_{mu}(E) = \text{constant}$. Hence, a simulation with this weight factor, which has no temperature dependence, generates a one-dimensional random walk in the energy space, allowing the simulation to escape from any local energy minimum. Similarly, simulated tempering [66,67] leads to a uniform distribution in temperature and 1/k sampling [68] yields a uniform distribution in microcanonical entropy.
The greatest advantage of generalized ensemble algorithms lies in the fact that, from a single simulation run, one can not only obtain the lowest energy conformation, but can also obtain any thermodynamic quantity at any temperature. For the latter, one can use reweighting techniques [69] to construct canonical distributions at a given temperature. An overview of these algorithms and their application to the protein folding problem is given in [70].
It has to be noted that, unlike in the canonical ensemble, the weights are not known a priori in generalized ensembles and, hence, their calculation is usually done by an iterative procedure [43•]. For instance, for the calculations in [71], about 40% of the total central processing unit (CPU) time was spent on this task.
The first application of generalized ensemble techniques to the protein folding problem can be found in [71], in which the multicanonical MC technique was used. A formulation of the MD method was also developed [72,73] and, together with a multicanonical HMC algorithm [72], was applied to the protein folding problem. Recent applications include the study of the molecular mechanism of cooperative folding in proteins [74], the helix-coil transition in homopolymers [50•,75] and the structure prediction of the C peptide of ribonuclease A [76••]. Extensions to higher dimension generalized ensembles were proposed [77,78].
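The multicanonical weight and the reweighting step described in this section can be illustrated on a toy model whose density of states is known exactly. The sketch assumes N independent two-level units with total energy E equal to the number of excited units, so that n(E) is a binomial coefficient. In a real application the weights are not known beforehand and must be determined iteratively, a step this sketch deliberately skips.

```python
import math
import random


def multicanonical_histogram(n_units, n_steps, seed=0):
    """Random walk in energy using the weight w(E) = 1/n(E); returns H(E), n(E)."""
    rng = random.Random(seed)
    # Exact density of states of the toy model (an assumption of this sketch).
    dos = [math.comb(n_units, e) for e in range(n_units + 1)]
    state = [0] * n_units
    e = 0
    hist = [0] * (n_units + 1)
    for _ in range(n_steps):
        i = rng.randrange(n_units)
        e_new = e + (1 - 2 * state[i])  # flipping unit i changes E by +1 or -1
        # Metropolis test with w(E) = 1/n(E): accept with min(1, n(E)/n(E_new)).
        if rng.random() < dos[e] / dos[e_new]:
            state[i] ^= 1
            e = e_new
        hist[e] += 1
    return hist, dos


def reweight_mean_energy(hist, dos, beta):
    """Canonical <E> from the multicanonical histogram.

    P_beta(E) is proportional to H(E) * exp(-beta * E) / w(E)
    = H(E) * n(E) * exp(-beta * E).
    """
    num = den = 0.0
    for e, (h, n) in enumerate(zip(hist, dos)):
        p = h * n * math.exp(-beta * e)
        num += e * p
        den += p
    return num / den
```

A single run such as `multicanonical_histogram(20, 300000)` yields an approximately flat energy histogram, and `reweight_mean_energy` can then be evaluated at any `beta` without a new simulation. For this toy model, the exact canonical result N/(1 + exp(beta)) provides a check.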
Simulated tempering was extensively used by the Lund group for the simulation of minimal protein models (for a review, see [79]). Their most recent applications of the method include a study of both the folding properties in an off-lattice model [80] and the importance of local interactions for folding [81••].
Another variant of generalized ensemble techniques is motivated by the observation that one would like to sample most of the time in the low-energy region, but should visit high-energy states with finite probability. In this way, the simulation can overcome energy barriers and escape from local energy minima. It was proposed to realize such an ensemble by updating conformations according to a weight $w(E) = (1 + \beta(E - E_0)/n_F)^{-n_F}$ [82]. Here, $E_0$ is an estimator for the ground-state energy and $n_F$ is the number of degrees of freedom of the system. Note that this weight can be understood as a special case of the weights used in the Tsallis generalized statistical mechanics formalism [24] (the Tsallis parameter q is chosen as $q = 1 + 1/n_F$). In contrast to other generalized ensemble techniques, the weight of the new ensemble is explicitly given and one only needs to find an estimator for the ground-state energy $E_0$, which was found to be easier than determining the weights of other generalized ensembles [82]. The new ensemble was used to determine the characteristic temperatures of the folding of a small peptide [83••] and to investigate its free-energy landscape at various temperatures [84]. Similar ideas based on Tsallis weights were pursued by Andricioaei and Straub in [25,57•].
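The behavior of the explicitly given weight can be checked numerically with a short sketch. The function names and the sample values of `beta`, `e0` and `n_f` below are illustrative only; the acceptance function simply applies the Metropolis rule to the ratio of the new weights.

```python
import math


def generalized_weight(e, e0, beta, n_f):
    """w(E) = (1 + beta * (E - E0) / n_F) ** (-n_F), defined for E >= E0."""
    return (1.0 + beta * (e - e0) / n_f) ** (-n_f)


def boltzmann_weight(e, e0, beta):
    """Boltzmann factor measured from the same reference energy E0."""
    return math.exp(-beta * (e - e0))


def acceptance(e_old, e_new, e0, beta, n_f):
    """Metropolis acceptance probability when the ensemble weight is w(E)."""
    return min(1.0, generalized_weight(e_new, e0, beta, n_f)
               / generalized_weight(e_old, e0, beta, n_f))
```

For energies above `e0` the power-law weight is never smaller than the Boltzmann factor at the same `beta`, so high-energy states are visited with a much larger probability, while for very large `n_f` the two weights coincide. This is the mechanism by which such an ensemble crosses barriers yet still spends most of the time at low energy.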
Conclusions
In this review, we have discussed a number of MC techniques that overcome the multiple minima problem in protein folding simulations. We tried to show that some of the novel simulation techniques can sample a much wider phase space than conventional methods, which can be used in global optimization and/or in the calculation of thermal averages. It can be expected that the new algorithms, together with recent progress in force-field refinement, including the effects of solvent, will lead to an increased understanding of the protein folding problem, not only in minimal models, but also in realistic protein systems. The application of these powerful techniques to simulations of small proteins (with ≈ 50 amino acids) now seems to be feasible and should be undertaken.
Acknowledgements
Financial support from the Japanese Ministry of Education, Science, Sports and Culture, the Japanese Society for the Promotion of Science, and a Research Excellence Fund (E27448) from the State of Michigan is gratefully acknowledged.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
1. Northrup SH, McCammon JA: Simulation methods for protein structure fluctuations. Biopolymers 1980, 19:1001-1016.
2. Bouzida D, Kumar S, Swendsen RH: Efficient Monte Carlo methods for the computer simulation of biological molecules. Phys Rev A 1992, 45:8894-8901.
3. Senderowitz H, Still WC: Sampling potential energy surface of glycyl glycine peptide: comparison of Metropolis Monte Carlo and stochastic dynamics. J Comput Chem 1998, 19:1294-1299.
4. • Brooks CL III: Simulations of protein folding and unfolding. Curr Opin Struct Biol 1998, 8:222-226. The author provides a concise and comprehensive review of molecular dynamics simulations that are used in the investigation of free-energy landscapes for protein folding.
5. Onuchic JN, Luthey-Schulten Z, Wolynes PG: Theory of protein folding: the energy landscape perspective. Annu Rev Phys Chem 1997, 48:545-600.
6. Dill KA, Chan HS: From Levinthal to pathways to funnels. Nat Struct Biol 1997, 4:10-19.
7. Shakhnovich EI: Theoretical studies of protein folding thermodynamics and kinetics. Curr Opin Struct Biol 1997, 7:29-40.
8. Veitshans T, Klimov D, Thirumalai D: Protein folding kinetics: timescales, pathways and energy landscapes in terms of sequence-dependent properties. Fold Des 1997, 2:1-22.
9. Dobson CM, Sali A, Karplus M: Protein folding: a perspective from theory and experiment. Angew Chem Int Ed 1999, in press.
10. Duan Y, Kollman PA: Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 1998, 282:740-744.
11. • Vendruscolo M, Domany E: Efficient dynamics in the space of contact maps. Fold Des 1998, 3:329-338. In this paper, a Monte Carlo procedure for simulating proteins in a contact space representation is developed. It is shown that existing sets of pairwise contact energy parameters are not sufficient to single out the native state.
12. Pellegrini M, Grønbech-Jensen N, Doniach S: Simulations of the thermodynamic properties of a short polyalanine peptide using potentials of mean force. Physica A 1997, 239:244-254.
13. Kinoshita M, Okamoto Y, Hirata F: First-principle determination of peptide conformations in solvents: combination of Monte Carlo simulated annealing and RISM theory. J Am Chem Soc 1998, 120:1855-1863.
14. Androulakis IP, Maranas CD, Floudas CA: Prediction of oligopeptide conformations via deterministic global optimization. J Glob Opt 1997, 11:1-34.
15. Samudrala R, Moult J: A graph-theoretic algorithm for comparative modeling of protein structure. J Mol Biol 1998, 279:287-302.
16. Kirkpatrick S, Gelatt CD Jr, Vecchi MP: Optimization by simulated annealing. Science 1983, 220:671-680.
17. Geman S, Geman D: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Patt Anal Machine Intel 1984, 6:721-741.
18. Wilson SR, Cui W, Moskowitz JW, Schmidt KE: Conformational analysis of flexible molecules: location of the global minimum energy conformation by the simulated annealing method. Tetrahedron Lett 1988, 29:4373-4376.
19. Kawai H, Kikuchi T, Okamoto Y: A prediction of tertiary structures of peptide by the Monte Carlo simulated annealing method. Protein Eng 1989, 3:85-94.
20. Carlacci L: Conformational analysis of [Met5]-enkephalin: solvation and ionization considerations. J Comput Aided Mol Des 1998, 12:195-213.
21. Okamoto Y, Masuya M, Nabeshima M, Nakazawa T: β-Sheet formation in BPTI(16-36) by Monte Carlo simulated annealing. Chem Phys Lett 1999, 299:17-24.
22. Okamoto Y: Protein folding problem as studied by new simulation algorithms. Recent Research Devel Pure Applied Chem 1998, 2:1-23.
23. Huber GA, McCammon JA: Weighted-ensemble simulated annealing: faster optimization on hierarchical energy surfaces. Phys Rev E 1997, 55:4822-4825.
24. Tsallis C: Possible generalization of Boltzmann-Gibbs statistics. J Statist Phys 1988, 52:479-487.
25. Andricioaei I, Straub JE: Generalized simulated annealing algorithms using Tsallis statistics: application to conformational optimization of a tetrapeptide. Phys Rev E 1996, 53:R3055-R3058.
26. Hansmann UHE: Simulated annealing with Tsallis weights: a numerical comparison. Physica A 1997, 242:250-257.
27. Moret MA, Pascutti PG, Bisch PM, Mundim KC: Stochastic molecular optimization using generalized simulated annealing. J Comput Chem 1998, 19:647-657.
28. Holland J: Adaption in Natural and Artificial Systems. Ann Arbor: University of Michigan Press; 1975.
29. Pedersen JT, Moult J: Genetic algorithms for protein structure prediction. Curr Opin Struct Biol 1996, 6:227-231.
30. Dandekar T, Argos P: Identifying the tertiary fold of small proteins with different topologies from sequence and secondary structure using the genetic algorithm and extended criteria specific for strand regions. J Mol Biol 1996, 256:645-660.
31. Head MS, Given JA, Gilson MK: ‘Mining minima’: direct computation of conformational free energy. J Phys Chem A 1997, 101:1609-1618.
32. Li Z, Scheraga HA: Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proc Natl Acad Sci USA 1987, 84:6611-6615.
33. Meirovitch H, Vásquez M: Efficiency of simulated annealing and the Monte Carlo minimization method for generating a set of low energy structures of peptides. J Mol Struct (Theochem) 1997, 398-399:517-522.
34. Trosset J-Y, Scheraga HA: Reaching the global minimum in docking simulations: a Monte Carlo energy minimization approach using Bezier splines. Proc Natl Acad Sci USA 1998, 95:8011-8015.
35. Vásquez M, Meirovitch E, Meirovitch H: A free energy based Monte Carlo minimization procedure for biomolecules. J Phys Chem 1994, 98:9380-9382.
36. • Derreumaux P: Finding the low-energy forms of avian pancreatic polypeptide with the diffusion-process-controlled Monte Carlo method. J Chem Phys 1998, 109:1567-1574. The ab initio folding of the avian pancreatic polypeptide was studied using the diffusion-process-controlled Monte Carlo method. Starting from extended conformations, the simulation converges within a few thousand MC steps towards the structure obtained by an X-ray experiment.
37. Kolinski A, Skolnick J: Assembly of protein structure from sparse experimental data: an efficient Monte Carlo model. Proteins 1998, 32:475-494.
38. Kostrowicki J, Scheraga HA: Application of the diffusion equation method for global optimization to oligopeptides. J Phys Chem 1992, 96:7442-7449.
40. Cvijović D, Klinowski J: Taboo search: an approach to the multiple minima problem. Science 1995, 267:664-666.
41. Hogg T: Highly structured searches with quantum computers. Phys Rev Lett 1998, 80:2473-2476.
42. Westhead DR, Clark DE, Murray CW: A comparison of heuristic search algorithms for molecular docking. J Comput Aided Mol Des 1997, 11:209-228.
43. • Hansmann UHE, Okamoto Y: Numerical comparisons of three recently proposed algorithms in the protein folding problem. J Comput Chem 1997, 18:920-933. The authors show that the efficiency of three major generalized ensemble algorithms (multicanonical, simulated tempering and 1/k sampling) in finding the ground state of a small peptide differ little from each other and that the algorithms are superior to simulated annealing.
44. Meirovitch H, Vásquez M, Scheraga HA: Free energy and stability of macromolecules studied by the double scanning simulation procedure. J Chem Phys 1990, 92:1248-1257.
45. Velikson B, Garel T, Niel J-C, Orland H, Smith JC: Conformational distribution of heptaalanine: analysis using a new Monte Carlo chain growth method. J Comput Chem 1992, 13:1216-1233.
46. Grassberger P: Pruned-enriched Rosenbluth method: simulations of θ polymers of chain length up to 1 000 000. Phys Rev E 1997, 56:3682-3693.
47. • Bastolla U, Frauenkron H, Gerstner E, Grassberger P, Nadler W: Testing a new Monte Carlo algorithm for protein folding. Proteins 1998, 32:52-66. It is shown that the pruned-enriched Rosenbluth method (PERM) allows greatly enhanced sampling of equilibrium configurations for minimal protein models.
48. Duane S, Kennedy AD, Pendleton BJ, Roweth D: Hybrid Monte Carlo. Phys Lett 1987, B195:216-221.
49. Brass A, Pendleton BJ, Chen Y, Robson B: Hybrid Monte Carlo simulation theory and initial comparison with molecular dynamics. Biopolymers 1993, 33:1307-1315.
54. Hoffmann D, Knapp E-W: Folding pathways of a helix-turn-helix model protein. J Phys Chem B 1997, 101:6734-6740.
55. Frantz DD, Freeman DL, Doll JD: Reducing quasi-ergodic behavior in Monte Carlo simulations by J-walking: applications to atomic clusters. J Chem Phys 1990, 93:2769-2784.
56. Zhou R, Berne BJ: Smart walking: a new method for Boltzmann sampling of protein conformations. J Chem Phys 1997, 107:9185-9196.
Monte Carlo method of native proteins.
52.
Go N, Scheraga HA: Ring closure deformations of chain molecules. 3:178-l 87.
behavior to atomic
a new method for Boltzmann J Chem Phys 1997,
Geyer CJ, Thompson EA: Annealing applications to ancestral inference. 90:909-920.
Markov chain Monte Carlo J Am Statist Assn 1995,
with
59.
Hukushlma application 65:1604-l
60.
Hansmann UHE: Parallel tempering algorithm for conformational studies of biological molecules. Chem Phys Lett 1997, 281:140-i 50. -
61.
Torrie GM, Valleau JP: Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. J Comput Phys 1977, 23:187-l 99.
K, Nemoto K: Exchange Monte Carlo method and to spin glass simulations. J Phys Sot Jpn 1996, 608.
Bartels C, Karplus M: Probability distribution for complex systems: adaptive umbrella sampling of the potential energy. J Phys Chem 6 1998, 102:865-880. The authors present a vanant of umbrella sampling and study low energy conformations of Met-enkephalin. The close relationship between thetr ‘adaptive umbrella sampling’ and the multicanonical algorithm is pointed out. 62. .
63.
Berg BA, Neuhaus phase transitions.
T: Multicanonical Phys Left 1991,
64.
Lee J: New Monte Carlo Lett 1993, 71:211-214.
65.
Berg BA, Hansmann UHE, simulation of a first-order Chem 1995, 99:2236-2237.
66.
Lyubartsev AP, Martinovski AA, Shevkunov SV, Vorontsov-Velyaminov PN: New approach to Monte Carlo calculations of the free energy: method of expanded ensembles. J Chem Phys 1992, 96:1776-1783.
67.
Marinari scheme.
68.
Hesselbo B, Stinchcombe RB: Monte Carlo simulation and global optimization without parameters. Phys Rev Lett 1995, 74~2151-2155.
69.
Ferrenberg AM, Swendsen studying phase transitions.
70.
Hansmann UHE, Okamoto Y: The generalized-ensemble approach for protein folding simulations. In Annual Reviews in Computational Phys/cs VI. Edited by Stauffer D. Singapore: World Scientific; 1999, 9:129-157.
71.
Hansmann UHE, by multicanonical multiple-minima 14:1333-l 338.
72.
Hansmann Langevin ensemble.
73.
Nakaiima N. Nakamura H. Kidera A: Multicanonical ensemble genirated by molecula;dynamics simulation for enhanced conformational sampling of peptides. J Phys Chem 1997, 101:817-824.
algorithm: Okamoto transition
E, Parisi G: Simulated Europhys Lett 1992,
algorithms B267:249-253. entropic
sampling.
Y: Comment for protein
tempering: 19:451-458.
for first
order fhys
Rev
on ‘Monte Carlo folding: J Phys
a new Monte
Carlo
Monte RH: New Monte Carlo technique for Phys Rev Letf 1986, 61:2635-2638
Carlo
Hansmann UHE, Okamoto Y: Finite-size scaling of helix-coil transitions in poly-alanine studied by multicanonical simulations. I Chem fhys 1999, 110:1267-l 276. Results from multicanonical simulations of polyalanine chains of various chain lengths are reported. The finite-sze scaling analysis was used lo investigate the nature of the helix-coil transitions and to calculate estimates for the critical exponents. Noguti T, Go N: Efficient fluctuating conformations 24:527-546.
quasi-ergodic applications
an efficient 1995,
58.
50. .
51,
D: Local moves: folding. Proteins
Andricioaei I, Straub JE: On Monte Carlo and molecular dynamics methods inspired by Tsallis statistics: methodology, optimization, and application to atomic cluster. J Chem Phys 1997, 107:9117-9124. The authors promote the use of Tsallis weights in simulations of proteins and other complex systems. Of special interest is their q-walking algorithm, whereby the trial configuration of the J-walking method is chosen from a Tsallis distribution.
to the multiple
44.
A, Le Grand SM, Eisenberg for simulation of protein
57. .
Hamacher K, Wenzel W: Scaling behaviour of stochastic minimization algorithms in a perfect funnel landscape. Phys Rev E 1999, 59:938-941. This paper presents a careful inves6gabon into how the efficiency of stochastic minimization algorithms decreases with increasing size of the molecules. It is shown that, for MCM and a stochastic tunneling method, the computational effort increases according to a power law with system size, whereas genetic algorithms show an exponential increase. Cvijovlc minima
Elofsson algorithm 23~73-82.
from sparse Proteins 1998,
39 ..
40.
53.
for simulation of Biopolymefs 1985,
and local conformational Macromolecules 1970,
Okamoto Y: Prediction of peptide conformation algorithm: new approach to the problem. J Cornput Chem 1993,
UHE, Okamoto Y, Eisenmenger F: Molecular dynamics, and hybrid Monte Carlo simulations in a multicanonical Chem Phys Left 1996,259:321-330.
New
74. 75.
Ha0 M-H, Scheraga folding of proteins.
HA: klolecular J Mol Bioll998,
mechanisms 277:973-983.
Monte
of cooperative
Kemp JP, Chen ZY: Formation of helical states in wormlike polymer chains. Phys Rev Lett 1998,81:3880-3883.
Hansmann UHE, Okamoto Y: Tertiary structure prediction of Cpeptide of ribonuclease A by multicanonical algorithm. J Phys Chem l3 1998,102:653-656. ..The results of mulftcanonlcal slmulaflons of the c; pepftde from nbonuclease A are presented. It is shown that there is remarkable agreement between simulations and experiments. Kumar S, Payne P, V&squez using iterative techniques.
M: Method J Comput
for free-energy calculations Chem 1996, 17:1269-l 275.
78.
Higo J, Nakajima N, Shirai H, Kidera A, Nakamura H: Two-component multicanonical Monte Carlo method for effective conformational sampling. J Comput Chem 1997, 18:2086-2092.
79.
lrblck A: Dynamical-parameter algorithms for protein folding. In Monte Carlo Approach to Biopoiymers and Protein Folding. Edited by Grassberger P, Barkema GT, Nadler W. Singapore: World Scientific; 1998:98-l 10.
60.
lrblck A, Peterson C, Potthast sequences with good folding Phys Rev E 1997,55:860-667.
F: Identification of amino properties in an off-lattice
algorithms
for
protein
folding
Hansmann
and Okamoto
183
8 I.
Ir&ck A, Peterson C, Poiihast F, Sommelius 0: Local interactions and protein folding. A 3D off-lattice approach. J Chem Phys 1997, 1071273-282. The authors present a careful investigation into the influence of local interactions in the protein folding process using the simulated tempering algorithm. Unlike earlier lattice simulations, their off-lattice approach demonstrates that, in three dimensions, one can generate sequences with good folding properties using only two types of residues. l .
76. ..
77.
Carlo
acid model.
82.
Hansmann UHE, Okamoto method for systems with 1997, 56:2226-2233.
Y: Generalized-ensemble rough energy landscape.
Monte Carlo Phys Rev E
63. l *
Hansmann UHE, Masuya M, Okamoto Y: Characteristic temperatures of folding of a small peptide. Proc Nat/ Acad Sci USA 1997, 94:10652-l 0656. The power of the generalized ensemble approach was demonstrated by investigating the folding of a small peptide. The characteristic temperatures of folding are calculated from a single simulation run. It is shown that energy landscape theory and funnel concerif, which were developed in the context of minimal protein models, also describe folding In more realistic models. 84.
Hansmann landscape 34472-483.
UHE, Okamoto for the peptide
Y, Onuchic JN: The folding Met-enkepha1il)Protein.s
funnel 1999,