Enhanced sampling techniques in biomolecular simulations

Enhanced sampling techniques in biomolecular simulations

JBA-06869; No of Pages 11 Biotechnology Advances xxx (2014) xxx–xxx Contents lists available at ScienceDirect Biotechnology Advances journal homepag...

1MB Sizes 0 Downloads 101 Views

JBA-06869; No of Pages 11 Biotechnology Advances xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Biotechnology Advances journal homepage: www.elsevier.com/locate/biotechadv

1

Research review paper

2Q2

Enhanced sampling techniques in biomolecular simulations Vojtěch Spiwok, Zoran Šućur, Petr Hošek Department of Biochemistry and Microbiology, Institute of Chemical Technology, Prague, Technická 3, Prague 6 166 28, Czech Republic

5

a r t i c l e

6 7

Available online xxxx

8 9 10 11 12 13 14

Keywords: Molecular dynamic simulation Alchemistic simulations Parallel tempering Metadynamics Free energy surface Drug design

a b s t r a c t

R O

i n f o

O

4

D

P

Biomolecular simulations are routinely used in biochemistry and molecular biology research; however, they often fail to match expectations of their impact on pharmaceutical and biotech industry. This is caused by the fact that a vast amount of computer time is required to simulate short episodes from the life of biomolecules. Several approaches have been developed to overcome this obstacle, including application of massively parallel and special purpose computers or non-conventional hardware. Methodological approaches are represented by coarse-grained models and enhanced sampling techniques. These techniques can show how the studied system behaves in long time-scales on the basis of relatively short simulations. This review presents an overview of new simulation approaches, the theory behind enhanced sampling methods and success stories of their applications with a direct impact on biotechnology or drug design. © 2014 Elsevier Inc. All rights reserved.

R

N C O

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

C

. . . . . . . . . . . . . . . . . .

E

. . . . . . . . . . . . . . . . . .

R

Introduction. . . . . . . . . . . . . . . . . . . . . Hardware approaches to improve sampling . . . . . . Methodological approaches to improve sampling . . . . Coarse-graining . . . . . . . . . . . . . . . . Thermodynamic-based methods . . . . . . . . . Alchemistic methods . . . . . . . . . . . Parallel tempering . . . . . . . . . . . . Metadynamics . . . . . . . . . . . . . . Combination of methods . . . . . . . . . Success stories. . . . . . . . . . . . . . . . . . . . Lead optimization by Alchemistic methods . . . . Protein structure prediction by parallel tempering . Metadynamic design of peptide ligands . . . . . . Modelling of target conformation by metadynamics Ligand design by metadynamics . . . . . . . . . Conclusions and future prospects . . . . . . . . . . . Acknowledgements . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

U

49

Contents

T

27

31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48

15 16 17 18 19 20 21 22 23 24

E

28 26 25

30 29

F

Q83Q1

Abbreviations: 5-HT2a, 5-hydroxytryptamine receptor, type 2a; ABMD, adiabatic bias molecular dynamics; AMBER, assisted model building with energy refinement program; BPTI, bovine pancreatic trypsin inhibitor; COX, cyclooxygenase; CPU, central processing unit; CV, collective variable; EGFR, epidermal growth factor receptor; FGF, fibroblast growth factor; FGFR, fibroblast growth factor receptor; GPCR, G-protein coupled receptor; GPU, graphical processingunit; HIV,humanimmunodeficiency virus; mGluR2,metabotropic glutamate receptor 2; NAMD, nanoscale molecular dynamic program; RMSD, root-meansquare deviation E-mail address: [email protected] (V. Spiwok).

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Introduction

50

Biomolecular simulations, namely their fathers Martin Karplus, Michael Levitt and Arieh Warshel, were awarded Nobel prize in 2013 (Cui and Nussinov, 2014). The first of the trio, Martin Karplus, was involved in the first atomistic biomolecular simulation published in 1977 (McCammon et al., 1977). They simulated 9 ps of life of bovine pancreatic trypsin inhibitor (BPTI). This system was composed of less than 1000 atoms. Since that time we have experienced an enormous

51

http://dx.doi.org/10.1016/j.biotechadv.2014.11.011 0734-9750/© 2014 Elsevier Inc. All rights reserved.

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

52 53 54 55 56 57

115

Methodological approaches to improve sampling

116

Coarse-graining

117

An easy way to make simulations faster is to simplify the studied system. This is the basis of coarse grained models of biomolecular

86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112

118

C

84 85

E

79 80

R

77 78

R

75 76

O

73 74

C

71 72

N

69 70

U

67 68

Thermodynamic-based methods

140

F

113 114

Before we present methodological methods aimed at sampling improvement, let us introduce hardware approaches. The history of biomolecular simulations has been significantly influenced by the boom of personal computers in the last decades. Computers were expensive scientific instruments in the early times, but they have evolved into today's inexpensive personal object of everyday life. Biomolecular simulations, as well as other areas of scientific computing, benefit from this development. A typical supercomputer from the 1980s contained a single or few powerful central processing units (CPUs). In the 1990s, the trend has changed from building supercomputers with a single strong CPU to combining smaller, often commodity, computers into clusters (Sterling, 2001). The world's most powerful computers today are composed of hundreds of thousands or millions of CPU cores (see www.top500.org for the biannually updated list of 500 world's fastest computers). At the same time, biologists also became prominent customers of high-performance computing centres and terminated the dominance of military industry, oil drillers and other previous users of massive computing. Another hardware approach to increase computing power for biomolecular simulations is in application of a non-conventional hardware, such as graphical processing units (GPUs) (Pronk et al., 2013; Harvey et al., 2009). Industry of computer gaming hardware developed GPUs with enormous computing power, which can be used to speed up molecular simulations, provided that GPUs can be efficiently handled by the simulation software. Another strategy is to design special-purpose hardware, represented by Anton computer (Dror et al., 2011; Lindorff-Larsen et al., 2011; Shaw et al., 2008, 2010; Snow et al., 2002). This machine contains pieces of hardware tailored for molecular-simulation-specific calculations and is significantly faster than the general purpose computers. Numerous successful projects have also made use of computers of volunteers in distributed computing schemes, such as Folding@ home project (Shirts and Pande, 2000).

65 66

O

83

64

R O

Hardware approaches to improve sampling

62 63

119 120

P

82

60 61

systems, which fall into the category of mesoscopic simulations. In coarse-grained simulations, a group of atoms is reduced to a single particle that represents their physico-chemical properties (Tozzini, 2005). The scheme common to many coarse-grained models is to represent four non-hydrogen atoms by one particle. Simulations of such simplified systems are significantly faster due to two effects: first, the number of particles and especially particle–particle interactions is lower and, second, bonds vibrate with lower frequencies, which makes it is possible to increase simulation time step. Coarsegrained force fields (parameters of covalent and non-covalent interactions) were developed for proteins (de Jong et al., 2013; Shih et al., 2006), membrane components (Marrink et al., 2004; Shih et al., 2006), nucleic acids (Maciejczyk et al., 2010) and carbohydrates (Lopez et al., 2009). These models perform very well for systems where bulk properties dominate over atomic details, such as membranes and membrane–protein interactions (Potocký et al., 2014), formation of membrane nanobodies (Shih et al., 2007), formation of lipid rafts (Risselada and Marrink, 2008) and many others (Marrink and Tieleman, 2013). However, coarse-grained models lack atomic details and therefore they are not suitable for “detailed” phenomena such as binding of a ligand to a protein.

Experimental scientists in drug discovery and biotechnology work with thermodynamic parameters, such as dissociation constants of protein–ligand complexes or free energies stabilizing folded proteins. It is a great challenge to predict values of these parameters by biomolecular simulations. In order to do so, it is necessary to design a structural parameter s, further referred to as a collective variable (CV), that reaches different values in key configurations of the studied system, for example in different conformations of a protein or in different binding poses of a protein–ligand complex. Biomolecular simulation techniques such as molecular dynamic simulation and Monte Carlo method sample the studied system canonically. It is possible to simulate certain molecular system by one of these methods and then analyse the trajectory to calculate evolution of the collective variable s. Next, it is possible to calculate time spent in different configurations with different values of s. This can be simply converted to corresponding probabilities of configurations. The term “canonical sampling” means, that such probabilities are the same as the probabilities in the real macroscopic system, provided that two conditions are fulfilled: first, energies of covalent and non-covalent bonds are accurately modelled and, second, a simulation is sufficiently long. Equilibrium probabilities can be converted to the free energy surface:

T

81

growth of simulated time scales and system sizes. Recent simulations reach millisecond time scales (Lindorff-Larsen et al., 2011) or tens of millions of atoms (Zhao et al., 2013). This huge progress was possible mainly owing to nearly exponential growth of computer power over the decades. The question arises whether such increase of computer power is satisfactory to make biomolecular simulation routine techniques in drug discovery or protein and enzyme design. Unfortunately, the answer is no. Today, we can simulate nanoseconds from the life of a solvated average-size protein per day on a single personal computer, microseconds on large parallel computers and milliseconds on a special hardware. In principle, we can predict the native structure of a protein by simulating its folding from the fully unfolded structure (Duan and Kollman, 1998; Lindorff-Larsen et al., 2011; Shaw et al., 2010; Snow et al., 2002). Analogously, it is possible to predict the binding mode of a ligand in a protein just by simulating a box containing the protein, ligand and solvent until the complex is formed (Dror et al., 2011). There are examples of successful simulations of protein folding or ligand binding (Dror et al., 2011; Duan and Kollman, 1998; Shaw et al., 2010; Snow et al., 2002); however, these examples are far from routine in screening large libraries of compounds or protein mutants in drug discovery or protein engineering campaigns. This situation signifies that hardware development must be complemented by design of new sophisticated simulation methods.

D

58 59

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

E

2

F ðsÞ ¼ −kT ln ðP ðsÞÞ;

121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139

141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160

ð1Þ

where F is the free energy, P is probability, s is the collective variable (could be replaced by multi-dimensional vector s), k is Boltzmann constant and T is thermodynamic temperature. Determination of a model free energy surface of protein–ligand association is illustrated in Fig. 1. It is similar for protein folding simulations; by replacing the “complex” and the “dissociated state” by “folded protein” and “unfolded protein”, respectively, to get a folding free energy surface. Unfortunately, on personal computers we usually cannot simulate the whole process of binding or folding because its time-scale is too long. It is even more difficult to simulate multiple folding/unfolding or binding/ unbinding events, which are necessary to calculate the free energy surface. This is an opportunity for enhanced sampling techniques described below. Some enhanced sampling techniques use Eq. (1) to predict the free energy surface, whereas other methods require different approaches.

162

Alchemistic methods One of the goals of the early chemists – alchemists – was to convert one element to another, usually a cheap element to gold. Elements are being converted from one element to another, at least computationally,

175

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

163 164 165 166 167 168 169 170 171 172 173 174

176 177 178 Q3

3

F

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

192 193 194 195 196 197 198 199 200 201 202 203 204 205 206

R O

P

D

E

190 191

T

188 189

C

186 187

E

184 185

model protein unfolding from the native structure, but not protein folding from the unfolded structure. Indeed, successful high temperature unfolding simulations have been reported and they have helped to elucidate many issues related to protein folding (DeMarco et al., 2004; Toofanny and Daggett, 2012). Fortunately, there is a way to exploit high temperatures to enhance sampling at biological temperatures, which is represented by parallel tempering method. The method has been first introduced as a variant of Monte Carlo method (Swendsen and Wang, 1986). Wider application on biological systems was made possible by development of parallel tempering molecular dynamic simulation (Sugita and Okamoto, 1999). The parallel tempering scheme starts with choosing the temperatures. Values of temperature must cover a wide range including the biologically relevant ones as well as temperatures when sampling is significantly enhanced. Typically, 15–50 temperatures with bottom range of 270–300 K and top range of 400–500 K are used for a small protein in a small solvent box. In parallel tempering, the studied system is simulated in parallel replicas, each at one of the selected temperatures.

R

182 183

in Alchemistic simulations (Simonson et al., 2002). Imagine we simulate a biomolecular system, for example a protein–ligand complex. During the simulation one atom, for example a selected hydrogen atom of the ligand, is gradually being converted to chlorine. Gradual conversion means that its molecular weight, bond length, partial atomic charge and van der Waals diameter evolve gradually as a function of a parameter λ, which ranges from 0 (ligand with hydrogen) to 1 (ligand with chlorine). Simultaneously, it is possible to calculate how energy of the studied system responds to this conversion. Next, it is possible to calculate the profile of free energy as a function of λ. Different Alchemistic methods, such as free energy perturbation (Zwanzig, 1954), thermodynamic integration (Resat and Mezei, 1993) or Bennett acceptance ratio method (Bennett, 1976) use slightly different procedures to calculate this profile. Why do we calculate free energy difference associated with conversion of one element into another, which is a non-physical process that cannot take place in reality? The answer is that because laws of thermodynamics apply also for these non-physical processes. It is possible to draw a cycle of four processes, two physical and two non-physical (Fig. 2). The first physical process is binding of a ligand with hydrogen (XH) to the protein. The second physical process is binding of a ligand with chlorine (XCl) to the protein. Non-physical processes are represented by the conversion of the ligand with hydrogen to the ligand with chlorine in the binding site of the protein and the analogous process in bulk solution. When we know binding free energy of the ligand XH, for example it has been determined experimentally, it is possible to calculate free energy changes of both non-physical processes and then we can calculate binding free energy of the ligand XCl as:

R

180 181

N C O

179

O

Fig. 1. Schematic demonstration of the concept of free energy surface and canonical sampling on a model energy surface of a protein–ligand interaction. Trajectory of a sufficiently long molecular dynamics simulation (left) can be converted to the free energy surface (right) using Eq. (1). Unfortunately, this procedure cannot be applied to some processes due to poor sampling (bottom).

Δ F freeXCl→boundXCl ¼ ΔF freeXH→boundXH þ ΔF boundXH→boundXCl −Δ F freeXH→freeXCl :

209 210 211 212 213 214 215 216 217 218 219 220 221

As will be shown later, this is useful in lead optimization campaigns as it makes it possible to predict the outcome of a chemical substitution of a molecule without computationally expensive ligand-binding simulations.

U

208

ð2Þ

Parallel tempering Significant improvement of sampling can be achieved at elevated temperatures. Intuitively, systems at higher temperatures can overcome energy barriers easily, compared to the situation at low temperatures. However, temperature does not change only free energy barriers, but it also changes free energy values of individual minima. For example, the native structure of a protein represents its global free energy minimum at biological temperatures. However, the unfolded structure becomes the global free energy minimum at temperatures above protein's melting point. It is therefore possible to use high-temperature simulations to

Fig. 2. Schematic representation of the thermodynamic cycle used in Alchemistic calculation of protein–ligand binding free energies. Binding free energy ΔFfreeXCl → boundXCl can be calculated from known (for example experimentally measured) free energy ΔFfreeXH → boundXH and free energies ΔFboundXH → boundXCl and ΔFfreeXH → freeXCl by Eq. (2). For practical reasons the value of ΔFfreeXH → freeXCl is usually calculated in the absence of the protein.

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239

249

P¼e

268 269 270 271 272 273 274 275 276 277 278

V bias ðsðt ÞÞ ¼

X

 X  ðsi ðtÞ−Si;t0 Þ2 − we

2σ 2 i

Fig. 3. Schematic representation of parallel tempering method in 20 replicas. The simulation replica (red line, other three selected replicas are shown in grey) starting from an “ugly” conformation (left) tends to exchange for higher temperature replicas. After several exchange attempts it reaches a high temperature where sampling of conformations is enhanced (top). Once the molecule adopts a “nice” conformation, it tends to exchange for low temperature replicas until a low temperature is reached (right). Structure images and profile do not present real results of parallel tempering and are used only for illustration.

279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304

ð4Þ

t0 ≤ t

where w and σi is height and width of a hill, respectively, s(t) is a vector of CVs at time t (with dimension d and components s1(t), s2(t) to sd(t)) and Si,t′ is position of a hill deposited at time t′ (si(t′)). These hills accumulate during the simulation and they “flood” the free energy minima. This allows the system to overcome free energy barriers. Fig. 4 shows progress of this flooding after 100, 200, 400 and 1000 steps of metadynamics. System can therefore easily escape the minimum at the dissociated state and explore all relevant minima in a reasonable number of steps. In molecular dynamic simulation it is possible to calculate the free energy surface by Eq. (1). In metadynamics it is no longer possible because the bias potential spoils canonical sampling. However, we can look again at Fig. 4, namely at the profile after 1000 steps (in red). The sum of the energy surface and the bias potential fluctuates around a flat level as the result of complete flooding. It is therefore possible to

T

266 267

C

264 265

E

262 263

R

260 261

R

258 259

O

256 257

C

254 255

where Ei and Ej and Ti and Tj are energies and temperatures of the i-th and j-th replica, respectively. The computer then generates a random number in the range from 0 to 1. If this number is lower than the probability, replicas are exchanged. Typically, exchanges should occur in 30% of exchange attempts. At the end of the simulation, it is possible to collect all coordinate snapshots at a certain temperature. The exchange criterion presented in Eq. (3) is designed in the way that such collected snapshots sample the studied system canonically at a given temperature. This means that it is possible to calculate probabilities of different conformational families and the free energy surface by Eq. (1), not only for the biological temperature. A simplified explanation (Fig. 3) of parallel tempering could be as follows: we can start the simulation from an “ugly” protein conformation with few favourable non-covalent interactions, thus its potential energy is high. Due to high energy, the replica with such conformation tends to exchange its coordinates with the coordinates of higher temperature replicas and it tends to climb on the temperature range. At high temperatures, owing to enhanced sampling, it can reach some “nice” conformation with many favourable non-covalent interactions. Nice interactions mean low potential energy. Such replica with low energy tends to exchange with lower temperature replicas and it tends to descend on the temperature range. The system can therefore reach “nice” (e.g. native) structure much more efficiently compared to a simulation at a single temperature. It should be also noted that parallel tempering is only one of the methods in the family of replica exchange methods. Replicas may differ in other parameters than temperature; for example an intensively studied group of methods are Hamiltonian replica exchange techniques (Bussi, 2014).

N

252 253

ð3Þ

U

251

  ðEi −E j Þ kT1 i −kT1 j

F

247 248

O

246

R O

244 245

Metadynamics Another group of enhanced sampling methods uses a bias force or bias potential to improve sampling. In this review we will use one method, metadynamics (Barducci et al., 2011; Laio and Parrinello, 2002; Sutto et al., 2012), as a representative of this group; however, there are many other techniques based on the similar principles, such as umbrella sampling (Torrie and Valleau, 1977), local elevation method (Hansen and Hünenberger, 2010; Huber et al., 1994), conformational flooding (Grubmüller, 1995) or adaptive biasing force methods (Babin et al., 2008), to name some. The principle of metadynamics can be demonstrated on a model energy surface representing a fictive protein–ligand interaction (Fig. 4). The global minimum corresponds to the bound state; the major local minimum corresponds to the dissociated state. A classical molecular dynamic simulation starting from the dissociated state would probably stay for a very long time in this minimum due to the fact that this minimum is surrounded by high energy barriers. Metadynamics addresses this problem by application of a history dependent bias potential. Before we start metadynamics, we have to choose few (usually two or three) collective variables s. The distance of the ligand from the binding site was used as the only CV in the demonstration of metadynamics illustrated in Fig. 4. A metadynamic simulation starts as a usual molecular dynamic simulation. The only difference is that in regular intervals a bias potential hill is being added to the potential energy of the studied system. These hills are functions of the CVs and accumulate along the simulation. The resulting bias potential can be written as:

P

242 243

Occasionally, for example every picosecond, energies of neighbouring replicas are compared. Naturally, a replica simulated at lower temperature tends to have lower potential energy than the one simulated at higher temperature. However, as temperature differences between replicas are small, it may happen that the potential energy of the high-temperature replica is lower. In this case, coordinates of replicas are exchanged and simulation of low-temperature continues at higher temperature and vice versa (probability of exchange is 1). If not, replica exchange probability is calculated from the potential energy difference as:

D

240 241

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

E

4

Fig. 4. Schematic representation of metadynamics flooding on a model energy surface of a protein–ligand interaction. Molecular dynamic simulation starting from the dissociated state would stay very long time in the local minimum. The history-dependent bias potential helped to reach the global minimum (bound state) between 200–400 steps. Whole configurational space is sampled within 1000 steps.

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

306 307 308 309 310 311 312 313 314 315 316 317 318 319

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

use the negative value of the bias potential as an estimate (or “imprint”) of the free energy surface:

F ðsÞ ¼ −

X

 X  ðsi ðtÞ−Si;t0 Þ2 − 2σ 2 i

we

:

surface oscillates between these two extremes. The resulting estimate of the free energy surface may suffer hysteresis and low accuracy. It is extremely difficult to design a CV or CVs that account for all potentially slow degrees of freedom, especially in biological systems. Application of many (more than three) CVs is not likely to be useful in metadynamics. Several knowledge-based collective variables were developed to describe complex structural changes. These collective variables are based on a preliminary knowledge of the studied process. Branduardi et al. (2007) developed path collective variables. Application of these CVs requires a preliminary generation of representative structures sorted along the studied process. For example, in order to simulate closing and opening of the active site of an enzyme, it is necessary to generate the series of representative (landmark) structures of the enzyme starting from the open conformation and going via partially closed to closed conformation. The first path CV, S-path, is defined as:

ð5Þ

0

t ≤ t

323

 0X   0     . 1 kT δ s t −s exp þV bias s t 0 A  .  F ðsÞ ¼ −kT ln @ X exp þV bias ðsðt 0 ÞÞ kT

345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375

Collective variables. Choice of collective variables is an extremely important step in designing a metadynamic simulation. In principle, any structural parameter can be used as a collective variable in metadynamics, provided that: 1. it can be calculated as a function of atomic coordinates of the studied system s(r), and, 2. it is possible to calculate a derivative ds(r)/dri for any i-th coordinate (any x, y or z coordinate of any atom). These conditions are satisfied by most stereochemical parameters such as distances (Laio and Parrinello, 2002), valence and dihedral angles (Laio and Parrinello, 2002), radius of gyration, root mean square deviation (RMSD) from a reference structure, potential energy (Laio and Parrinello, 2002), coordination number of groups of atoms (Iannuzzi et al., 2003) or pharmacophore-based descriptors (Spiwok et al., 2012). Special purpose collective variables were introduced for enhancement of protein secondary structure formation, such as α-helical, parallel β-sheet or antiparallel β-sheet content (Pietrucci and Laio, 2009). For successful metadynamics, collective variables must satisfy the other two conditions. First, values of collective variable must differ for all key states of the studied system, for example for folded and unfolded protein, and dissociated and bound ligand. Second, collective variables must represent all potential sampling bottlenecks. For example, it is possible to describe the progress of binding of a ligand to a protein by the distance between the ligand and the active site (Fig. 4). This CV is low for the bound state and high for the dissociated state. The first condition is thus satisfied. However, it may happen that a side chain of some residue may occasionally close the entrance to the active site. An accurate free energy surface would be obtained if the active site could open and close rapidly during the simulation. On the other hand, if this process is slow and the active site remains closed during the whole simulation, the ligand is predicted as being a non-binder. If the active site closes and opens very slowly, the predicted free energy

ZpathðrÞ ¼ −

1 X ln expð−λDðri ; rÞÞ: λ

376 377 378 379 380 381 382 383 384 385 386 387 388 389 390

ð7Þ

where D is the distance (for example RMSD) between the current coordinates r and the i-th landmark structure ri. Parameter λ is a smoothing coefficient. If the landmark structure ri is similar to r, then the term exp(−λD(ri, r)) approaches to one and i exp(−λD(ri, r)) approaches the value of i. On the other hand, a landmark structure distant from r almost does not influence the value of S-path because exp(−λD(ri, r)) is close to zero. The value of S-path therefore approximates the value i of the nearest landmark structure. The second path collective variable is called Z-path. This CV is usually used in a tandem with S-path. It is defined as:

T

343 344

C

341 342

E

339 340

R

337 338

R

335 336

N C O

334

ð6Þ

where δ is Dirac function, to calculate the free energy surface from virtually any biased simulation. Classical molecular dynamics does not use any bias potential and the term exp(+ Vbias/kT) is thus equal to one. Therefore, the equation becomes equivalent to Eq. (1). Similarly, in a late stage of metadynamics when the free energy minima are flooded, the sum of free energy surface and bias potential is flat. The system can sample the space of CVs with equivalent probability. Eq. (6) thus becomes equivalent to Eq. (5). On-the-fly reweighing method gives some freedom to developers of new biased simulation techniques for which the free energy surface cannot be calculated neither by Eq. (1) nor Eq. (5). Moreover, on-theflight reweighing can be used to calculate free energy surface for CVs that were not used as CVs to construct the metadynamic bias potential (Bonomi et al., 2009a).

U

333

X i expð−λDðri ; rÞÞ SpathðrÞ ¼ X expð−λDðri ; rÞÞ

O

331

R O

329 330

P

327 328

D

325 326

E

324

The way how the free energy surface is calculated from an unbiased molecular dynamic simulations and metadynamics is different. It is possible to calculate the free energy surface by a classical molecular dynamic simulation using Eq. (1), if one has enough computational resources and/or is patient enough. Alternatively, it is possible to use metadynamics and predict the free energy surface as a negative imprint of the bias potential (Eq. (5)). Recently a unifying method of these two approaches was introduced (Bonomi et al., 2009a; Dickson et al., 2010; Tribello et al., 2012). On-the-fly reweighing uses this formula:

F

320 321

5

392 393 394 395 396 397 398 399 400

ð8Þ

This value is low if r is close to any landmark structure ri and high if r diverges from the set of landmark structures. Path CVs make it possible to represent a relatively complex structural transition, for example opening and closing of a binding site, by a single parameter or pairs of parameters. Another knowledge-based approach uses different linear (Spiwok et al., 2007; Sutto et al., 2010) or non-linear (Hashemian et al., 2013; Spiwok and Králová, 2011; Tribello et al., 2012) dimensionality reduction techniques to do the same job. Well tempered metadynamics. Significant improvement of metadynamic performance has been achieved by introduction of well tempered metadynamics (Barducci et al., 2008). In classical version of metadynamics it is necessary to find a compromise between speed of flooding and accuracy of the free energy surface. High hill height (high w in Eqs. (4) and (5)) causes rapid flooding of the free energy surface, but the free energy surface estimate is inaccurate. On the other hand, low w gives accurate free energy surface, but the flooding is slow. It would be therefore useful to design a flooding scheme with high hills at the beginning and small hills at the end of the simulation. This is the case of well tempered metadynamics (Barducci et al., 2008), where height of hills decreases with accumulated bias potential. This method provides smoothly converging free energy surface estimate and does not force the system to sample non-realistic (overflooded) states. In the previous sub-chapter we discussed the example of the side chain blocking the entrance to the active site. If the active site is closed, the calculated free energy surface overpopulates the unbound state because the ligand cannot enter the active site. The advantage of well tempered metadynamics is that flooding of the free energy minimum corresponding to the unbound state slows down until it almost stops. This minimum is not over-flooded and the ligand can “wait” until the binding site opens. Hysteresis of the free energy surface is reduced. A relatively accurate free energy surface can be therefore obtained even though the sampling is complicated by the

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433

493

Success stories

494 495

In this chapter we would like to present several studies employing the above described enhanced sampling techniques with a particular focus on their practical impact on drug discovery and related areas.

457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491

496

C

455 456

E

454

R

452 453

R

450 451

O

448 449

C

446 447

N

444 445

U

442 443

501

Protein structure prediction by parallel tempering

554

In principle, it is possible to predict the structure of almost any protein by simulating its folding from a fully unfolded conformation. Practically, this is extremely difficult in most cases because of long computational time required. Fast-folding miniproteins, with sizes ranging

555

T

492

Combination of methods The description of methods in previous chapters indicates that different methods are suitable for studying different systems and processes. For example, metadynamics makes it possible to precisely model certain process defined by collective variables, for example a small conformational change in the active site of a large enzyme. Unfortunately, sampling of only few CVs can be actively enhanced. On the other hand, parallel tempering enhances sampling of all degrees of freedom through elevated temperatures. Choice of collective variables before the start of the simulation is not necessary. Unfortunately, the size of the simulated system is limited because a huge number of replicas must be used for large systems. Why not to use the best of both worlds? Metadynamics and parallel tempering were successfully combined in a parallel tempering metadynamics (Bussi et al., 2006). The studied system is simulated in multiple replicas at different temperatures similar to classical parallel tempering. Replica exchanges are attempted in regular intervals with an exchange criterion similar to the one given in Eq. (3); however, this criterion contains the potential energy as well as the bias potential. Free energy surfaces can be calculated for different temperatures as a negative imprint of the bias potential by equation similar to Eq. (5). Parallel tempering metadynamics provides advantages of both methods it combines. Metadynamics makes it possible to enhance sampling with high selectivity, for example in a small region of a large protein. On the other hand, parallel tempering enhances sampling of those processes that are difficult or impossible to be described by a limited number of collective variables. The success stories described below indicate that parallel tempering metadynamics is a very efficient receipt for simulation of ligand binding or conformational change, which takes place in a small region of a large protein. Temperatures of replicas should range from a temperature slightly below the biological one to an elevated temperature, which enhances sampling but does not significantly unfold the protein in the simulated time scale. A recently introduced well tempered ensemble (Bonomi and Parrinello, 2010) can be used to further decrease the number of replicas in parallel tempering metadynamics (Deighan et al., 2012). Another combination of metadynamics and replica exchange method is the method called bias exchange metadynamics (Piana and Laio, 2007). The system is again simulated in multiple replicas. Simulation in each replica is biased by a one-dimensional bias potential with a single CV. The number of CVs is therefore equivalent to the number of replicas (an additional “neutral” replica without bias potential may be also introduced). Replicas can occasionally swap their coordinates, i.e. the system biased along one CV becomes biased along another CV. The criterion for replica exchange is calculated from the bias potentials. The method makes it possible to calculate a series of one-dimensional free energy surfaces for each CV by Eq. (5). More complicated analysis of the results (Marinelli et al., 2009), similar to the above described on-the-fly reweighing (Bonomi et al., 2009a; Dickson et al., 2010; Tribello et al., 2012), must be used to obtain a multi-dimensional free energy surface of the studied system.

We will start the presentation of enhanced sampling techniques by Alchemistic simulations and their application in lead optimization. The first phase of a pre-clinical drug development – lead finding – is characterised by serial testing of large series of compounds. Application of biomolecular simulation techniques (and even enhanced sampling techniques) is very difficult in this phase because large number of compounds must be tested at high computational costs. Compromise between the accuracy and cost is thus typical for computational (such as docking and pharmacophore-based (Sotriffer, 2011)) as well as experimental techniques used in lead finding. On the other hand, the second phase – lead optimization – deals with few compounds. Therefore more accurate and more expensive computational methods can be efficiently employed. In this phase, the lead compound is being modified in order to improve its efficacy and other properties. Alchemistic simulations are particularly powerful in predicting the effect of a small chemical modification of a compound on its binding to a target. This fact makes Alchemistic methods useful in the lead optimization phase. This has been demonstrated by the group of William L. Jorgensen (Bollini et al., 2011). The target of their interest is the allosteric site of HIV reverse transcriptase. An interesting technique used by this group is “chlorine scanning”, where each hydrogen atom is being converted to chlorine by Alchemistic method and the corresponding free energy difference of binding is calculated, namely by free energy perturbation combined with Monte Carlo simulation method. A negative change in free energy indicates that the site is suitable for further substitution for a more bulky group, not only for chlorine. This approach can be demonstrated by serial substitution of an inhibitor discovered previously by docking-based virtual screening with anti-HIV activity EC50 of 4.8 μM (Fig. 5) (Bollini et al., 2011). It was predicted and later experimentally confirmed that a double substitution in positions 2 and 5 or 2 and 6 of the terminal phenyl ring enhances binding to target. Predicted negative values of ΔΔF (difference between binding ΔF of substituted and unsubstituted compound) indicate that the substituted compound binds better than unsubstituted one. Other substitutions, in general, were predicted to weaken the interaction. This was verified by experimental testing, at least for compounds that were synthesized, leading to 0.3 μM inhibitors. Inspired by other HIV reverse transcriptase inhibitor, Bollini et al. tried to Alchemistically convert the methylene linker between both benzene rings to oxygen (Fig. 6). Predicted ΔΔF was negative, indicating that the series of compound with oxygen linker binds better than those with methylene linker. Substitution in positions 3 and 5 was predicted to further enhance affinity in the oxygen-linked series. After a series of computational predictions, chemical syntheses and experimental testing, the study resulted to an inhibitor with an ultimate picomolar value of EC50 (Fig. 6, right). The Alchemistic method, namely free energy perturbation, served in this study not only to verify and rationalize the results of experimental work, but also to accurately predict the outcome of a substitution and to drive the synthesis of compounds. It is unlikely to get such accurate predictions simply by comparing potential energies of static structures of putative protein–ligand complexes or by basic empirical scoring functions. (See Fig. 7.)

F

441

500

O

440

Lead optimization by Alchemistic methods

R O

438 439

This chapter is not intended as a full list of application of these methods. 497 Instead, it presents studies that may become future landmarks in the 498 field. 499

P

436 437

hindering side chain. This makes well tempered metadynamics especially suitable for modelling of biological systems with huge number of degrees of freedom. Since its development in 2008 it has been demonstrated by numerous practical simulations that well tempered metadynamics converges to an exact free energy surface. However, a mathematical proof was missing. This proof was provided recently by Dama et al, (2014).

D

434 435

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

E

6

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 Q4

556 557 558

7

567 568 569 570 571 572 573 574

Metadynamic design of peptide ligands

576

Metadynamics can be particularly powerful in modelling of conformational changes of small molecules and peptides. Spitaleri and coworkers used metadynamics to design peptide antagonists of αvβ3 (Ghitti et al., 2012; Spitaleri et al., 2011). This member of integrin family is a prominent target for anti-angiogenic therapy. So far it has been targeted mostly by biotechnological drugs such as monoclonal antibodies (Landen et al., 2008). Many natural ligands of this receptor contain a conserved motif Arg-Gly-Asp (RGD), which is crucial for interaction with the receptor. The receptor can be also tricked by a reverse sequence motif isoDGR with the peptide bond between D and G formed by Cγ-carboxylic group of D (Curnis et al., 2006).

583 584 585 586

R

R

N C O

581 582

R O

P

U

579 580

E

575

577 578

D

565 566

Short cyclic peptides are very interesting for therapeutic intervention of integrins because the cyclisation can restrain peptide's conformation and orient side chains of the (iso)DGR motif in the “receptor-ready” conformation, i.e. in the conformation similar to the peptide–receptor complex. However, cyclic peptides are still relatively flexible and it is important to identify those peptides for which the receptor-ready conformation is dominant. It is unlikely to predict these conformations by simple comparison of potential energies of static peptide structures, because free energy values of different conformers is influenced by peptide–water interactions, entropy effects etc. Classical molecular dynamic simulation is likely to be inefficient because of long simulations necessary to sample the whole conformational space of the peptide. Metadynamics with properly chosen CVs seems to be a viable strategy. Spitaleri et al. (2011) used well-tempered metadynamics with Ramachandran torsions of glycine in RGD or (iso)DGR motif as collective variables. Conformational free energy surface was determined for four cyclic peptides, namely a known high-affinity ligand c(RGDf(NMe)V) (f denotes D-Phe, NMe denotes peptide bond methylation), low-affinity disulphide-linked ligand CDGRC and two new peptides — disulphidelinked CisoDGRC and ac-CisoDGRC (ac denotes N-terminal acetylation). It was found that conformational equilibria of these molecules are different. The molecule c(RGDf(NMe)V) preferred the conformation found in an experimentally determined structure of its complex with a receptor. CDGRC preferred similar values of glycine Ramachandran torsions as c(RGDf(NMe)V); however, this conformation was not suitable for binding to the receptor due to reverse order of residues. Finally, it was predicted that CisoDGRC and ac-CisoDGRC exist in an equilibrium of three similarly populated conformers. One of the conformers was ready to bind to the receptor as predicted by docking study and further supported by NMR and affinity measurements. This approach combining conformational “tuning” of a cyclic peptide by metadynamics, followed by molecular

E

563 564

T

561 562

from 10 to ~100 residues, are often used as model systems to test this approach. These proteins usually fold in microsecond time-scales, which makes folding simulations feasible on current hardware. Jiang and Wu (2014) used parallel tempering to simulate folding of 14 different fast-folding mini-proteins (10–80 amino acid residues) in order to test their newly developed force field (force field refers to parameterisation of covalent and non-covalent interactions). For each mini-protein they carried out 0.4 to 3.3 μs (which is generally less than their respective folding times) parallel tempering simulation in 12–36 replicas. In total, 13 out of 14 mini-proteins folded correctly (with RMSD from the experimental structure of 0.5–3.2 Å). Moreover, they observed multiple folding and unfolding transitions in most simulation, which is necessary for prediction of folding thermodynamics. Completeness of sampling was comparable with unbiased simulation of same proteins on a specialized hardware Anton (Lindorff-Larsen et al., 2011), but with at least one order of magnitude shorter simulations.

C

559 560

O

Fig. 5. Schema of lead optimization by Alchemistic simulations (Bollini et al., 2011).

F

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

Fig. 6. Schema of lead optimization by Alchemistic simulations (Bollini et al., 2011).

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

The authors first had to predict the activation route of each receptor, in order to design the activation CV. They used the adiabatic bias force (ABMD) method to predict the pathway of conformational change. Briefly, ABMD uses a fictive particle, which is connected by harmonic bonds with a certain collective variable of the studied molecular system. This particle can be forced to navigate the system to explore different values of this collective variable. In this study they used ABMD to force transition between activated and inactivated conformations of both unliganded receptors. Activated and inactivated conformations were obtained by homology modelling on the basis of experimental structures. Once the transition route was sampled, it was possible to define path collective variables, add (dock) tested ligands and perform metadynamic calculations. For monomeric 5-HT2A they found that the complex with ligand clozapine prefers the inactivated conformation, the complex with 2,5dimethoxy-4-iodoamphetamine prefers the activated conformation and the complex with methysergide prefers the inactivated conformation of the receptor. These results are in agreement with experimental efficacies of the studied ligands. Then they studied the effect of a ligand bound in the binding site of 5-HT2A of 5-HT2A/mGluR2 heterodimer on activation of unliganded mGluR2. In detail, they constructed the structure of 5-HT2A/mGluR2 heterodimer, they docked each ligand to the 5-HT2A binding site and calculated the free energy surface with the mGluR2 activation CV. They found that clozapine bound to 5-HT2A promotes activation of mGluR2. The other two compounds, 2,5-dimethoxy-4-iodoamphetamine and methysergide, shifted equilibrium toward the inactivated conformation. These results are in agreement with experimental data and may explain different pharmacologies of 5-HT2A ligands. Moreover, structures of the receptor sampled by metadynamics in this study may serve as targets for structure-based design of “tuned” antipsychotic drugs, for example by virtual screening campaigns.

661 662

Ligand design by metadynamics

693

Metadynamics shows a great potential for ligand design. As already mentioned, without sampling enhancement it is extremely difficult to simulate binding and unbinding of a ligand to and from a protein, especially because of the long time scale off these processes. However, it can be efficiently enhanced by a metadynamic bias potential. This approach was employed in the first metadynamic docking study (Gervasio et al., 2005). Different ligands (benzamidine, 4-chlorbenzamidine, phosphocholine and staurosporine) were successfully docked into active sites of their target (trypsin, antibody and cyclin-dependent kinase 2). The resulting free energy surfaces were in a good agreement with experimentally determined dissociation constants. In order to improve efficiency of this pioneer methodology, it was necessary to solve the problem of inefficient sampling once the ligand exits the binding site. It is relatively easy to navigate the ligand out from the binding site by a simple collective variable such as distance from a selected biding site residue. On the other hand, driving the ligand into the binding site is usually much more complicated as the ligand can stick on the surface of the protein or enter various dead end routes. It is therefore necessary to somehow confine movements of the ligand outside the binding site and simultaneously it is necessary to address the effect of such confinement on the predicted value of binding free energy. Two different approaches were introduced to solve this problem. One solution can be in application of path collective variables. This approach was elegantly employed by Heptares company in their metadynamic-based scoring of compounds bound to G-protein coupled receptors (GPCRs) (Mason et al., 2013). This methodology shows that application of enhanced sampling simulation allows a compromise between the accuracy and speed. The procedure starts by generating the system containing the receptor (locked in the space by harmonic potential restraints), explicitly modelled water, ions and the tested ligand placed in a distance of approximately 2.5 nm from the binding

694 695

Modelling of target conformation by metadynamics

622

Molecular targets of drug discovery, mostly proteins, are flexible. Target flexibility may become extremely important in some classes of targets, such as G-protein coupled receptors (GPCRs), where targeting different receptor conformations by small molecule compounds may lead to different responses, such as agonistic, antagonistic, partially agonistic or others. For an efficient design of GPCR drugs it is important to know not only the time-average structure of the receptor, but also the structure of the receptor in an appropriate state. Determination of X-ray structures of selected GPCRs is one of major achievements of structural biology. Structural biology methods also shed light on the molecular basis of receptor activation and allosteric signal transduction. However, application of molecular modelling techniques can help complete the picture of conformational changes associated with GPCR signalling. Fribourg et al. (2011) studied how selected antipsychotic drugs modulate GPCR signalling. The aim of this study was to explain why compounds with similar affinities toward serotonin receptor 5-HT2A show different signalling patterns. It was found previously (Vilardaga et al., 2008), that GPCR may form dimers and even heterodimers with different GPCRs. Signalling by heterodimeric receptor may be significantly different from signalling by a mixture of a corresponding monomeric receptors. 5-HT2A is a serotonin receptor acting via G protein complex Gq. Another receptor in this story, metabotropic glutamate 2 receptor (mGluR2), acts via G protein complex G i . Heterodimeric receptor 5-HT2A/mGluR2 can activate both Gq and Gi by binding serotonergic or glutamatergic drugs. This may explain different signalling patterns of different 5-HT2A ligands. It was found that formation of 5-HT2A/mGluR2 heterodimer reduces signalling by 5-HT2A's Gq and enhances signalling by mGluR2's Gi. Balance between these signalling pathways may be altered by the studied 5-HT2A ligands. In order to see this effect at the atomic level, the authors performed adiabatic bias molecular dynamic (ABMD) and metadynamic calculations. The aim of these simulations was to calculate the conformational free energy surface of a ligand-bound receptor. The degree of receptor activation was used as a collective variable. The resulting 1D free energy surface is characterised by the global minimum at a low value of activation CV, provided that the ligand binds to the inactivated conformation, or at high values of activation CV, provided that the ligand binds to the activated conformation.

634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660

C

E

R

632 633

R

630 631

O

629

C

627 628

N

625 626

U

623 624

P

621

T

620

docking of predominant conformations and NMR and affinity measurements, is now intensively used by this group to design new integrin ligands.

D

618 619

E

Fig. 7. Free energy surface of binding of SSR128129E to D3 domain of FGFR. Adopted from Herbert et al. (2013). © Cell Press.

R O

O

F

8

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692

696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790

Application of biomolecular simulations in drug discovery and protein engineering is still limited by simulated time scales. The time scale may increase in the future due to development of more powerful hardware, by application of special purpose computers or by quantum computing (Perdomo-Ortiz et al., 2012). Nevertheless, enhanced sampling techniques can always bring the advantage. These techniques are also becoming more widespread in the field of biomolecular simulations. This is owing to the fact that many software packages include functionalities for enhanced sampling simulations. Basic biomolecular simulation software such as AMBER (Salomon-Ferrer et al., 2013), Gromacs (Pronk et al., 2013) or NAMD (Phillips et al., 2005) contain functionalities for Alchemistic methods and parallel tempering. Biased simulations, in particular metadynamics, can be efficiently ported to these packages by Plumed software (Bonomi et al., 2009b; Tribello et al., 2014). Some of the success stories introduced in the previous text indicate that a very promising strategy is to combine different enhanced sampling techniques, in particular a biased simulation with replica exchange methods (Bussi et al., 2006; Deighan et al., 2012; Herbert et al., 2013; Piana and Laio, 2007; Marinelli et al., 2009). Finally, combination of metadynamics (or other enhanced sampling technique) with machine learning (Hashemian et al., 2013; Spiwok and Králová, 2011; Tribello et al., 2012) may help simplify and automatize design of collective variables. An important aspect of any biomolecular simulation method is its accuracy. Metadynamics has a privileged position amongst enhanced sampling techniques because important methodological studies aimed at assessment (Laio et al., 2005) and improvement (Barducci et al., 2008; Dama et al., 2014) of metadynamic accuracy have been carried out since its introduction in 2002. We still must keep in mind that the accuracy of sampling enhancement is just one side of the coin. The other side of the coin is the accuracy of the molecular simulation layer, namely the accuracy of modelling of covalent and non-covalent interactions, also referred to as the force field. Proteins, nucleic acids, carbohydrates and lipids are composed of a limited number of building blocks. Owing to this fact, the current molecular force fields for these biomolecules seem to converge to a very good accuracy (Lindorff-Larsen et al., 2012; Kirschner and Woods, 2001; Banáš et al., 2010). However, the accuracy of force fields for general drug-like molecules seems to be far from being solved. Drug development benefits from the diversity of the chemical space of drug-like molecules, but simultaneously this diversity complicates development of molecular force fields. Moreover, force field inaccuracies tend to manifest in longer time scales, which can be reached either by increasingly long biomolecular simulations or by application of enhanced sampling techniques. Success of metadynamics in modelling of protein–ligand interactions may open this method for wide application in drug discovery. These methods still require relatively long simulations, despite application of enhanced sampling. Therefore they are not suitable for serial testing of millions of compounds. This problem can be solved in near future by fragment-based ligand design. It is possible to decompose millions of compounds into thousands of fragments. These fragments can be docked into the protein by relatively expensive methods, such

802

O

F

801

R O

744 745

Conclusions and future prospects

P

742 743

791 792

D

740 741

There are several issues related to medicinal chemistry and drug design, not discussed above, that could be solved by metadynamics, such as protein–protein interactions or the effect of mutations on protein structure and function. The former issue is represented by modelling of oligomerization of bacteriophage T4 fibritin trimerization domain (foldon) (Barducci et al., 2013). Metadynamic modelling of mutation outcome is represented by conformational free energy modelling of the activation loop and other structural elements in receptor protein kinase, namely epidermal growth factor receptor (EGFR), and its three oncogenic mutants (Sutto and Gervasio, 2013).

E

738 739

T

736 737

C

734 735

E

732 733

R

731

R

729 730

N C O

727 728

site. The system is optimized by energy minimization and tempered by a short (2 ps) molecular dynamic simulation. Then the ligand is docked into the active site by adiabatic bias molecular dynamics simulation (ABMD). Again, this step was modest in terms of computational costs. Next, the trajectory is used to pick two landmark structures, one for the bound and one for the unbound state, for definition of path collective variables. The path collective variable S-path was used as the only CV in well-tempered metadynamics, again very short (2 ps). This metadynamic procedure can be repeated multiple times (102 simulations per ligand) to obtain a statistically processable set of free energy profiles. This metadynamic-based scoring procedure was applied mostly in a druggability prediction study (Mason et al., 2013). Another solution to this problem is funnel metadynamics (Limongelli et al., 2013). This method uses distance as a CV together with an additional funnel-shaped potential, which confines movements of the ligand. The conical tip of the funnel points outward the binding site, whereas the cone opens up toward the binding site. The shape of the funnel does not restrain the ligand when it is in the vicinity of the binding site. On the other hand, it prevents the ligand from “getting lost” when unbound from the binding site. The absolute binding free energy can be calculated by application of a simple correction to the effect of the funnel. This approach has been successfully tested on binding of benzamidine to trypsin and binding of a selective COX-2 inhibitor to COX-2 (Limongelli et al., 2013). Possibly the biggest impact of metadynamic method on drug discovery is recent determination of the binding mode of fibroblast growth factor receptor (FGFR) allosteric ligand SSR128129E (Bono et al., 2013; Herbert et al., 2013). FGFR is a receptor protein-tyrosine kinase involved in cell development, angiogenesis and other processes. Loss of control of this receptor can lead to cancer development. The compound SSR128129E was developed by experimental high throughput screening and lead optimization. It has an antagonistic effect at nanomolar concentrations by binding onto the extracellular domain of the receptor, which means that it does not have to enter the cell and compete directly with kinase active site. The compound modulates the interaction with the natural effector protein FGF, but does not fully compete with it. It was experimentally shown that SSR128129E binds to extracellular domains D2 and D3 with a different affinity and that overall binding constant is influenced by the presence of domain D2, D3 and natural ligand FGF. Knowledge of the “mode of action”, which usually means knowing the 3D structure of the ligand–target complex, is one of prerequisites for the progress in a drug discovery pipeline. However, the FGFR– SSR128129E complex had been highly resistant to all structural biology efforts, which had blocked future clinical testing and application of this promising compound. Therefore, Gervasio and co-workers used metadynamics to predict the structure of this complex (Herbert et al., 2013). They used bias exchange metadynamics in five replicas, one neutral, two biasing selected protein–ligand distances and pseudotorsions (dihedral angle describing orientation of the ligand relative to the protein), one biasing secondary structure transitions and one biasing closing/opening of the binding site. Besides this they used also classical (non-bias-exchange) metadynamic simulations with two CVs. The authors correctly predicted that SSR128129E binds weakly to the extracellular domain D2 and strongly to the extracellular domain D3. Important conformational changes were observed in the target upon binding. They included stabilization of one α-helix and one β-sheet-to-α-helix transition. This transition forms a cavity where SSR128129E binds and which is potentially druggable by other compounds. This study demonstrates that the application of an enhanced sampling technique, in this case metadynamics, in combination with indirect evidences from molecular biological techniques, NMR spectroscopy, site-directed mutageneses and a structure–activity relationship study can predict and experimentally confirm the binding mode and the mode of action of a small biologically active molecule when structural biology methods fail.

U

725 726

9

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

793 794 795 796 797 798 799 800 Q5

803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854

864

References

865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931

Babin V, Roland C, Sagui C. Stabilization of resonance states by an asymptotic Coulomb potential. J Chem Phys 2008;128:134101. Banáš P, Hollas D, Zgarbová M, Jurečka P, Orozco M, Cheatham T, et al. Performance of molecular mechanics force fields for RNA simulations. Stability of UUCG and GNRA hairpins. J Chem Theory Comput 2010;6:3836–49. Barducci A, Bussi G, Parrinello M. Well-tempered metadynamics: a smoothly converging and tunable free-energy method. Phys Rev Lett 2008;100:020603. Barducci A, Bonomi M, Parrinello M. Metadynamics. WIREs Comput Mol Sci 2011;1: 826–43. Barducci A, Bonomi M, Prakash MK, Parrinello M. Free-energy landscape of protein oligomerization from atomistic simulations. Proc Natl Acad Sci U S A 2013;18:E4708–13. Bennett CH. Efficient estimation of free energy differences from Monte Carlo data. J Comp Physiol 1976;22:245–68. Bollini M, Domaoal RA, Thakur VV, Gallardo-Macias R, Spasov KA, Anderson KA, et al. Computationally-guided optimization of a docking hit to yield catechol diethers as potent anti HIV agents. J Med Chem 2011;54:8582–91. Bono F, De Smet F, Herbert C, De Bock K, Georgiadou M, Fons P, et al. Inhibition of tumor angiogenesis and growth by a small-molecule multi-FGF receptor blocker with allosteric properties. Cancer Cell 2013;23:477–88. Bonomi M, Parrinello M. Enhanced sampling in the well-tempered ensemble. Phys Rev Lett 2010;104:190601. Bonomi M, Barducci A, Parrinello M. Reconstructing the equilibrium Boltzmann distribution from well-tempered metadynamics. J Comput Chem 2009a;30:1615–21. Bonomi M, Branduardi D, Bussi G, Camilloni C, Provasi D, Raiteri P, et al. PLUMED: a portable plugin for free energy calculations with molecular dynamics. Comput Phys Commun 2009b;180:1961–72. Branduardi D, Gervasio FL, Parrinello M. From A to B in free energy space. J Chem Phys 2007;126:054103. Bussi G. Hamiltonian replica exchange in GROMACS: a flexible implementation. Mol Phys 2014;112:379–84. Bussi G, Gervasio FL, Laio A, Parrinello M. Free-energy landscape for beta hairpin folding from combined parallel tempering and metadynamics. J Am Chem Soc 2006;128: 13435–41. Cui Q, Nussinov R. Making biomolecular simulations accessible in the post-Nobel prize era. PLoS Comput Biol 2014;14:e1003786. Curnis F, Longhi R, Crippa L, Cattaneo A, Dondossola E, Bachi A, et al. Spontaneous formation of L-isoaspartate and gain of function in fibronectin. J Biol Chem 2006;281:36466–76. Dama JF, Parrinello M, Voth GA. Well-tempered metadynamics converges asymptotically. Phys Rev Lett 2014;112:240602. de Jong DH, Singh G, Bennett WFD, Arnarez C, Wassenaar TA, Schäfer LV, et al. Improved parameters for the Martini coarse-grained protein force field. J Chem Theory Comput 2013;9:687–97. Deighan M, Bonomi M, Pfaendtner J. Efficient simulation of explicitly solvated proteins in the well-tempered ensemble. J Chem Theory Comput 2012;8:2189–92. DeMarco ML, Alonso DOV, Daggett V. Diffusing and colliding: the atomic level folding/ unfolding pathway of a small helical protein. J Mol Biol 2004;341:1109–24. Dickson BM, Lelièvre T, Stoltz G, Legoll F, Fleurat-Lessard P. Free energy calculations: an efficient adaptive biasing potential method. J Phys Chem B 2010;114:5823–30. Dror RO, Pan AC, Arlow DH, Borhani DW, Maragakis P, Shan Y, et al. Pathway and mechanism of drug binding to G-protein-coupled receptors. Proc Natl Acad Sci U S A 2011; 108:13118–23. Duan Y, Kollman PA. Pathways to a protein folding intermediate observed in a 1microsecond simulation in aqueous solution. Science 1998;282:740–4. Fribourg M, Moreno JL, Holloway T, Provasi D, Baki L, Mahajan R, et al. Decoding the signaling of a GPCR heteromeric complex reveals a unifying mechanism of action of antipsychotic drugs. Cell 2011;147:1011–23. Gervasio FL, Laio A, Parrinello M. Flexible docking in solution using metadynamics. J Am Chem Soc 2005;127:2600–7. Ghitti M, Spitaleri A, Valentinis B, Mari S, Asperti C, Traversari C, et al. Molecular dynamics reveal that isoDGR-containing cyclopeptides are true αvβ3 antagonists unable to promote integrin allostery and activation. Angew Chem Int Ed Engl 2012;51:7702–5. Grubmüller H. Predicting slow structural transitions in macromolecular systems: conformational flooding. Phys Rev E 1995;52:2893–906. Hansen HS, Hünenberger PH. Using the local elevation method to construct optimized umbrella sampling potentials: calculation of the relative free energies and interconversion barriers of glucopyranose ring conformers in water. J Comput Chem 2010; 31:1–23.

F

863

Funding was provided by Czech Ministry of Education, Youth and Sport via COST actions GLISTEN (CM1207, LD14133) and MultiGlycoNano (CM1102, LD1324).

O

861 862

R O

Acknowledgements

P

860

C

E

R

R

O

C

N

U

857 858 Q6

Harvey M, Giupponi G, De Fabritiis G. ACEMD: accelerated molecular dynamics simulations in the microseconds timescale. J Chem Theory Comput 2009;5:1632–9. Hashemian B, Millán D, Arroyo M. Modeling and enhanced sampling of molecular systems with smooth and nonlinear data-driven collective variables. J Chem Phys 2013;139:214101. Herbert C, Schieborr U, Saxena K, Juraszek J, De Smet F, Alcouffe C, et al. Molecular mechanism of SSR128129E, an extracellularly acting, small-molecule, allosteric inhibitor of FGF receptor signaling. Cancer Cell 2013;23:489–501. Huang D, Lüthi U, Kolb P, Cecchini M, Barberis A, Caflisch A. In silico discovery of βsecretase inhibitors. J Am Chem Soc 2006;128:5436–43. Huber T, Torda AE, van Gunsteren WF. Local elevation: a method for improving the searching properties of molecular dynamics simulation. J Comput-Aided Mol Des 1994;8:695–708. Iannuzzi M, Laio A, Parrinello M. Efficient exploration of reactive potential energy surfaces using Car–Parrinello molecular dynamics. Phys Rev Lett 2003;90:238302. Jiang F, Wu Y-D. Folding of fourteen small proteins with a residue-specific force field and replica-exchange molecular dynamics. J Am Chem Soc 2014;136:9536–9. Kirschner KN, Woods RJ. Solvent interactions determine carbohydrate conformation. Proc Natl Acad Sci U S A 2001;98:10541–5. Laio A, Parrinello M. Escaping free-energy minima. Proc Natl Acad Sci U S A 2002;99: 12562–6. Laio A, Rodriguez-Fortea A, Gervasio FL, Ceccarelli M, Parrinello M. Assessing the accuracy of metadynamics. J Phys Chem B 2005;109:6714–21. Landen CN, Kim TJ, Lin YG, Merritt WM, Kamat AA, Han LY, et al. Tumor-selective response to antibody-mediated targeting of αvβ3 integrin in ovarian cancer. Neoplasia 2008;10:1259–67. Limongelli V, Bonomi M, Parrinello M. Funnel metadynamics as accurate binding freeenergy method. Proc Natl Acad Sci U S A 2013;110:6358–63. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE. How fast-folding proteins fold. Science 2011;334:517–20. Lindorff-Larsen K, Maragakis P, Piana S, Eastwood MP, Dror RO, Shaw DE. Systematic validation of protein force fields against experimental data. PLoS One 2012;7:e32131. Lopez CA, Rzepiela A, de Vries AH, Dijkhuizen L, Huenenberger PH, Marrink SJ. The Martini coarse grained force field: extension to carbohydrates. J Chem Theory Comput 2009;5:3195–210. Maciejczyk M, Spasic A, Liwo A, Scheraga HA. Coarse-grained model of nucleic acid bases. J Comput Chem 2010;31:1644–55. Marinelli F, Pietrucci F, Laio A, Piana S. A kinetic model of Trp-cage folding from multiple biased molecular dynamics simulations. PLoS Comput Biol 2009;5:e1000452. Marrink SJ, Tieleman DP. Perspective on the Martini model. Chem Soc Rev 2013;42: 6801–22. Marrink SJ, de Vries AH, Mark AE. Coarse grained model for semi-quantitative lipid simulations. J Phys Chem B 2004;108:750–60. Mason JS, Bortolato A, Weiss DR, Deflorian F, Tehan B, Marshall FH. High end GPCR design: crafted ligand design and druggability analysis using protein structure, lipophilic hotspots and explicit water networks. In Silico Pharmacol 2013;1:23. McCammon JA, Gelin BR, Karplus M. Dynamics of folded proteins. Nature 1977;267: 585–90. Murray CW, Blundell TL. Structural biology in fragment-based drug design. Curr Opin Struct Biol 2010;20:497–507. Perdomo-Ortiz A, Dickson N, Drew-Brook M, Rose G, Aspuru-Guzik A. Finding low-energy conformations of lattice protein models by quantum annealing. Sci Rep 2012;2:571. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, et al. Scalable molecular dynamics with NAMD. J Comput Chem 2005;26:1781–802. Piana S, Laio A. A bias-exchange approach to protein folding. J Phys Chem B 2007;111: 4553–9. Pietrucci F, Laio A. A collective variable for the efficient exploration of protein β-sheet structures: application to SH3 and GB1. J Chem Theory Comput 2009;5:2197–201. Potocký M, Pleskot R, Pejchar P, Vitale N, Kost B, Zárský V. Live-cell imaging of phosphatidic acid dynamics in pollen tubes visualized by Spo20p-derived biosensor. New Phytol 2014;203:483–94. Pronk S, Páll S, Schulz R, Larsson P, Bjelkmar P, Apostolov R, et al. GROMACS 4.5: a highthroughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013;29:845–54. Resat H, Mezei M. Studies on free energy calculations. I. Thermodynamic integration using a polynomial path. J Chem Phys 1993;99:6052–61. Risselada HJ, Marrink SJ. The molecular face of lipid rafts in model membranes. Proc Natl Acad Sci U S A 2008;105:17367–72. Salomon-Ferrer R, Case DA, Walker RC. An overview of the Amber biomolecular simulation package. WIREs Comput Mol Sci 2013;3:198–210. Shaw DE, Deneroff MM, Dror RO, Kuskin JS, Larson RH, Salmon JK, et al. Anton, a specialpurpose machine for molecular dynamics simulation. Commun ACM 2008;51:91–7. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, et al. Atomiclevel characterization of the structural dynamics of proteins. Science 2010;330: 341–6. Shih AY, Arkhipov A, Freddolino PL, Schulten K. Coarse grained protein–lipid model with application to lipoprotein particles. J Phys Chem B 2006;110:3674–84. Shih AY, Freddolino PL, Sligar SG, Schulten K. Disassembly of nanodiscs with cholate. Nano Lett 2007;7:1692–6. Shirts M, Pande VS. Screen savers of the world unite! Science 2000;290:1903–4. Simonson T, Archontis G, Karplus M. Free energy simulations come of age: protein–ligand recognition. Acc Chem Res 2002;35:430–7. Snow CD, Ngyen H, Pande VS, Gruebele M. Absolute comparison of simulated and experimental protein-folding dynamics. Nature 2002;420:102–6. Sotriffer C. Virtual screening: principles, challenges, and practical guidelines. Wiley-VCH; 2011.

T

859

as metadynamics, and the results can be combined in order to predict affinities of compounds in the original library. This fragment-based screening turned out to be efficient in computational docking (Huang et al., 2006) as well as experimental (Murray and Blundell, 2010) screening campaigns.

D

855 856

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

E

10

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 Q7 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017

V. Spiwok et al. / Biotechnology Advances xxx (2014) xxx–xxx

Swendsen RH, Wang JS. Replica Monte Carlo simulation of spin glasses. Phys Rev Lett 1986;57:2607–9. Toofanny RD, Daggett V. Understanding protein unfolding from molecular simulations. WIREs Comput Mol Sci 2012;2:405–23. Torrie GM, Valleau JP. Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. J Comput Phys 1977;23:187–99. Tozzini V. Coarse-grained models for proteins. Curr Opin Struct Biol 2005;15:144–50. Tribello G, Ceriotti M, Parrinello M. Using sketch-map coordinates to analyze and bias molecular dynamics simulations. Proc Natl Acad Sci U S A 2012;109:5196–201. Tribello G, Bonomi M, Branduardi D, Camilloni C, Bussi G. PLUMED 2: new feathers for an old bird. Comput Phys Commun 2014;185:604–13. Vilardaga JP, Nikolaev VO, Lorenz K, Ferrandon S, Zhuang Z, Lohse MJ. Conformational cross-talk between alpha2A-adrenergic and mu-opioid receptors controls cell signaling. Nat Chem Biol 2008;4:126–31. Zhao G, Perilla JR, Yufenyuy EL, Meng X, Chen B, Ning J, et al. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature 2013;497:643–6. Zwanzig RW. High-temperature equation of state by a perturbation method. I. Nonpolar gases. J Chem Phys 1954;22:1420–6.

F

Spitaleri A, Ghitti M, Mari S, Alberici L, Traversari C, Rizzardi GP, et al. Use of metadynamics in the design of isoDGR-based αvβ3 antagonists to fine-tune the conformational ensemble. Angew Chem Int Ed Engl 2011;50:1832–6. Spiwok V, Králová B. Metadynamics in the conformational space nonlinearly dimensionally reduced by Isomap. J Chem Phys 2011;135:224504. Spiwok V, Lipovová P, Králová B. Metadynamics in essential coordinates: free energy simulation of conformational changes. J Phys Chem B 2007;111:3073–6. Spiwok V, Hlat-Glembová K, Tvaroška I, Králová B. Conformational free energy modeling of druglike molecules by metadynamics in the WHIM space. J Chem Inf Model 2012; 52:804–13. Sterling T. Beowulf cluster computing with Linux. MIT Press; 2001. Sugita Y, Okamoto Y. Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett 1999;314:141–51. Sutto L, Gervasio FL. Effects of oncogenic mutations on the conformational free-energy landscape of EGFR kinase; 2013. Sutto L, D'Abramo M, Gervasio FL. Comparing the efficiency of biased and unbiased molecular dynamics in reconstructing the free energy landscape of Met-enkephalin. J Chem Theory Comput 2010;6:3640–6. Sutto L, Marsili S, Gervasio FL. New advances in metadynamics. WIREs Comput Mol Sci 2012;2:771–9.

O

1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037

11

U

N C O

R

R

E

C

T

E

D

P

R O

1057

Please cite this article as: Spiwok V, et al, Enhanced sampling techniques in biomolecular simulations, Biotechnol Adv (2014), http://dx.doi.org/ 10.1016/j.biotechadv.2014.11.011

1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056