Molecular simulation of peptides coming of age: Accurate prediction of folding, dynamics and structures

Molecular simulation of peptides coming of age: Accurate prediction of folding, dynamics and structures

Accepted Manuscript Molecular simulation of peptides coming of age: Accurate prediction of folding, dynamics and structures Panagiota S. Georgoulia, N...

16MB Sizes 0 Downloads 17 Views

Accepted Manuscript Molecular simulation of peptides coming of age: Accurate prediction of folding, dynamics and structures Panagiota S. Georgoulia, Nicholas M. Glykos PII:

S0003-9861(18)30734-3

DOI:

https://doi.org/10.1016/j.abb.2019.01.033

Reference:

YABBI 7939

To appear in:

Archives of Biochemistry and Biophysics

Received Date: 14 September 2018 Revised Date:

23 January 2019

Accepted Date: 28 January 2019

Please cite this article as: P.S. Georgoulia, N.M. Glykos, Molecular simulation of peptides coming of age: Accurate prediction of folding, dynamics and structures, Archives of Biochemistry and Biophysics (2019), doi: https://doi.org/10.1016/j.abb.2019.01.033. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Molecular simulation of peptides coming of age: accurate prediction of folding, dynamics and structures. Department of Chemistry and Biomedical Sciences, Faculty of Health and Life Sciences, Linnæus University, Kalmar, 39182, Sweden 2 3

Linnæus University Centre of Excellence “Biomaterials Chemistry”, Kalmar, 3912, Sweden

Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Alexandroupolis, 68100, Greece

SC

1

RI PT

Panagiota S. Georgoulia1,2 , Nicholas M. Glykos3 *

M AN U

Abstract

The application of molecular dynamics simulations to study the folding and dynamics of peptides has attracted a lot of interest in the last couple of decades. Following the successful prediction of the folding of several proteins using molecular simulation, foldable peptides emerged as a favourable system mainly due to their application in improving protein structure prediction methods and in drug design studies. However,

D

their performance is inherently linked to the accuracy of the empirical force fields used

TE

in the simulations, whose optimisation and validation is of paramount importance. Here we review the most important findings in the field of molecular peptide simulations and highlight the significant advancements made over the last twenty years. Special

EP

reference is made on the simulation of disordered peptides and the remaining challenge to find a force field able to describe accurately their conformational landscape. Keywords:

AC C

peptide simulations, peptide folding, peptide dynamics, empirical force fields, validation.

Preprint submitted to Archives in Biochemistry and Biophysics

January 30, 2019

ACCEPTED MANUSCRIPT

1

1. Overview Molecular simulations have evolved to a powerful theoretical technique to study

3

peptide structure and dynamics, as indicated by the plethora of references in the lit-

4

erature. The study of peptides thrived in the beginning of this millennium and still

5

holds as an active field of research after 20 years. This interest is mainly due to the

6

ability of foldable peptides to mimic essential characteristics of the much larger protein

7

systems while at the same time maintaining the simplicity of smaller systems. This

8

makes peptide systems not only easier to comprehend, but also cost efficient in terms

9

of both computational power and experimental resources required. Their advantages

10

as model systems have been eloquently presented in other reviews[24, 63]. Herein, we

11

try to underline their vital role in the development and advancement of force fields

12

through numerous cycles of validation and optimisation that lead to what we perceive

13

to be significant improvements in the ability of simulations to capture and reproduce

14

the experimentally accessible physical reality.

15

2. Peptide folding simulations: two decades of continuous advancement

D

M AN U

SC

RI PT

2

Historically, the study of peptides was prompted by the fact that they resemble

17

the early events in protein folding and thus offer an excellent opportunity to decipher

18

protein-structure relationships[35, 75, 97]. Early studies suggested that peptides may

19

serve as nucleation sites from where the folding is initiated[171, 200]. This paved the

20

way to a number of studies aiming to characterise the timescales of the early folding

21

events and to model them computationally.

EP

TE

16

The speed limit of folding has been estimated to be N/100µs, where N is the number

23

of residues[96], with its actual value being dependant on a number of events such as

24

the formation of hydrophobic core, hydrogen bonds, electrostatic interactions, solvation

25

energy, conformational entropy, diffusion, etc.[126]. The timescales for the formation

26

of basic structural elements have been placed to the sub-second time-scale[20, 43]. The

27

fastest early folding events were grouped by Gnanakaran et al.[63] based on the minimal

28

sequence that can form each of these structures, from loop-closure events that can take

29

place in the 10ns time-scale[103], to α-helix formation which takes place in the 100-

30

200ns time-scale[57, 199] and to the slower β-hairpin formation that may require several

31

microseconds[132]. The Eaton group contributed with extensive studies on the speed of

32

folding, setting an upper limit for a 50-residue protein to 1 microsecond[42, 66]. This

AC C

22

2

ACCEPTED MANUSCRIPT

33

nano- to micro-second regime allowed a unique opportunity for a direct comparison

34

between theoretical calculations and experimental measurements[27, 95, 103, 202]. The combination of these relatively short folding timescales with the increase of the

36

computational power available to the research groups led to numerous peptide folding

37

simulation studies which significantly advanced the field of peptide folding simulations.

38

One of the pioneering studies was by Daura et al. that showed in atomistic detail the

39

reversible folding of two β-peptides in solution[32]. Later, the Caflisch group showed the

40

impressive, for that time, folding of a 20-residue peptide to a three-stranded antiparallel

41

β-sheet through a cumulative simulation time of 4µs[49, 50]. Likewise, Pande et al.,

42

illustrated the efficiency of using different starting conformations and high-temperature

43

unfolding to study the folding pathway of a β-hairpin fragment of protein G[143].

44

Challenging short peptides with many charged residues (10 out of 21 amino acids) were

45

successfully computationally folded to the native structure in an attempt to describe

46

the folding pathway and estimate folding rates that were experimentally difficult to

47

access[194].

M AN U

SC

RI PT

35

In later years, the increase in available computational power made it possible for

49

the molecular simulations to reach even higher time scales in the range of tens to

50

hundreds of microseconds, approaching the folding times of longer peptides (20-60

51

amino acids)[125]. One of the first notable studies was the milestone 1 microsecond

52

single folding simulation of the 36 residue villin headpiece in explicit water by Duan and

53

Kollman[38]. Shortly after that, the Pande group demonstrated the efficiency of using

54

many short trajectories instead of a single long one to describe the folding pathway of

55

small peptides like BBA and villin headpiece (23 and 36 residue long, respectively), an

56

effort made possible through the development of distributed computing [45, 104, 167,

57

175, 205].

TE

EP

AC C

58

D

48

A few peptides served as preferable study systems due to their unique properties

59

and abundance of experimental data, like the 20-residue peptide, Trp-cage, the smallest

60

stably folded peptide showing two-state folding properties [28, 151, 170] and the WW

61

domain (approximately 40 residues) of Pin1 protein[46, 52]. The later one served

62

especially as a workhorse for small all beta-sheet peptides to examine force fields’ bias

63

towards helical conformations[53, 131].

64

All-atom molecular simulations in explicit solvent have nowadays reached the

65

impressive millisecond timescale thanks to the improvement of the parallelization

66

algorithms[2], the highly optimised GPU implementations[55, 159, 180] and special3

ACCEPTED MANUSCRIPT

purpose supercomputers like Anton[163, 164]. For an example of the latter, it was

68

the increased simulation length afforded by the Anton machine, together with a better

69

performing force field, that allowed the correct folding of both FiP35, the fastest fold-

70

ing variant of WW domains, and BPTI (58 residues) to experimental resolution[164],

71

overcoming deficiencies of previous trials[52, 53]. The Shaw group in 2011 took the

72

whole field of molecular simulation a huge step forward by demonstrating the correct

73

folding of 12 structurally diverse small proteins, ranging from 10 to 80 residues and

74

from all structural classes[108]. Apart from the obvious significance of a successful in

75

silico folding study, they demonstrated the transient formation of native-like structural

76

elements already in the unfolded state, providing unprecedented insights into their

77

folding mechanism.

SC

M AN U

78

RI PT

67

The small size of the peptides is counteracted by their high flexibility and increased dynamics which made the probing of their full conformational potential extremely dif-

80

ficult with conventional unconstrained molecular dynamics simulations. Thus, and

81

more often that not, enhanced methods are employed to study peptide systems. In

82

fact most of the work cited in this review concerns studies that make use of some of the

83

established accelerated dynamics techniques, such as the replica exchange molecular dy-

84

namics (REMD)[181], simulated tempering (ST)[120] and adaptive tempering[206] to

85

name but few. The common concept in all of these methods is to enhance the conforma-

86

tional sampling through different biasing methods (for example umbrella sampling[182],

87

metadynamics[100], and accelerated MD[68]) to achieve the desired long timescales and

88

retrieve thermodynamic and kinetic properties for the system. Although an in-depth

89

presentation of these techniques is beyond the scope of this review, we should empha-

90

sise the contribution of these methods to the field of peptide folding[76, 179, 207]

91

and refer the interested reader to some excellent reviews already available in the

92

literature[36, 122, 130, 141].

TE

EP

AC C

93

D

79

To summarise the progress outlined above, we believe that molecular simulation

94

of peptides has reached the level of maturity where they can realistically describe in

95

atomistic detail the structure and dynamics of peptides, justifying their naming as a

96

computational microscope[37]. In the past, a folding simulation could have been char-

97

acterised as successful based solely on its ability to locate the native (experimentally

98

determined) structure. It is an indication of the maturity of the field that simply lo-

99

cating the native peptide structure through simulation is no longer sufficient for an

100

application to be considered successful. The major question for recent applications 4

ACCEPTED MANUSCRIPT

is how realistically can the simulation describe the folding mechanism and the corre-

102

sponding folding landscape, including the transiently stable conformations. Present-

103

day molecular simulations are performing relatively well in characterising structurally

104

stable peptide conformations and in estimating rates and free energies of folding[149].

105

However the unfolded state is still poorly understood. The inherent properties of the

106

disordered state —with its multitude of conformations of similar energies which are

107

separated by relatively low energy barriers— amplifies any small force field deficiencies

108

but will most probably not prevent the native structure from being located. The ability

109

to perform extremely long simulations of peptide systems will allow a more extensive

110

sampling of the unfolded/disordered state and should, thus, allow further refinement

111

of the empirical force fields.

112

3. Molecular simulations of health-related peptides

M AN U

SC

RI PT

101

A great interest for the study of the structure and dynamics of peptides came

114

from their applications in drug design, even leading to the creation of a new subfield

115

aptly named “computational peptidology”[209]. Molecular dynamics are a powerful

116

tool towards this means, whose contribution is increasingly being recognised[22, 41].

117

Simulation-based methods are actively used both in early stages of drug discovery as

118

well as in the final steps aiming to decipher binding mechanisms[208].

TE

D

113

Peptides are physiologically involved in many biological processes with vital roles

120

in metabolism and signalling. In particular, they are long-recognised to have a well-

121

established role as anti-cancer, anti-microbial, anti-viral, anti-angiogenic and anti-

122

inflammatory agents. Their inherent capability to modulate protein function increases

123

dramatically their druggability potency[89]. On top of that, they offer enhanced ac-

124

tivity and specificity which is translated in smaller dosages and lower toxicity. How-

125

ever, their susceptibility to proteases appears to be a major bottleneck, underlying

126

the need for alternative administration routes to overcome their low bioavailability[44].

127

Significant improvements in the synthesis and purification techniques have made the

128

production of peptides up to 100 amino acids a routine, opening new horizons for their

129

large-scale health-related applications. A comprehensive presentation of the peptide-

130

based drug market is exhaustively analysed in other reviews[110, 188].

AC C

EP

119

131

Significant advancements have also been made concerning the availability of com-

132

putational tools aiming to assist peptide-based drug design studies[90, 91]. However,

133

most of these tools are limited in their general applicability by making the more often 5

ACCEPTED MANUSCRIPT

than not unjustified assumption that the peptides of interest will have stable (native-

135

like) structures in water. In this connection, special mention must be made of the

136

Rosetta suite of programs which have been catalytic in the field, with pioneering work

137

in the de novo peptide (and protein) structure prediction accuracy as assessed by

138

CASP (Critical Assessment of Structure Prediction)[21]. The Rosetta algorithm im-

139

plemented the fundamental concept of peptide fragments, which are short fragments

140

of known protein structures that are assembled by Monte Carlo methods to predict

141

protein structures[7, 172]. There are also many other in silico tools dedicated to struc-

142

ture prediction but only a handful are adapted for peptides. Their accuracy is limited

143

and highly dependable on the availability of experimental data. The most promising

144

representative of this class is the Pepfold server that is a bioinformatics-based de novo

145

approach to predict tertiary peptide structures from sequence[124]. The algorithm uses

146

Hidden Markov Models and non-overlapping peptide fragments of four amino acids. It

147

should be noted, however, that all of these methods and programs, and irrespectively

148

of their detailed algorithmic basis, utilise empirical force fields to refine their proposed

149

structures, thus highlighting the substantial interplay of peptides with the force fields

150

even in protein structure prediction methods.

D

M AN U

SC

RI PT

134

Health-related applications of peptides also involve the search for small foldable

152

peptides with predefined structural characteristics. Identification of small stably folded

153

and soluble peptides is not trivial[59]. Many small peptides (4-10 residues) of biological

154

interest have been studied computationally but were found to be marginally stable in

155

solution[171]. Well-known exceptions to this rule are the designed 10-residue peptides

156

chignolin[77, 98, 160] and its variant, CLN025[70, 127, 157] that adopt a unique and

157

outstandingly stable β-hairpin structure in water.

EP

AC C

158

TE

151

The question that arises then, is how good are the empirical force fields in describ-

159

ing the structure and dynamics of peptides, especially of those that are not stably

160

folded, or have more than one stable conformation. Are, for example, the force fields

161

sensitive enough to predict the structural plasticity of the peptides, or, the sometimes

162

pronounced effect of mutations on the peptide structure and dynamics? Force field

163

improvement and validation has gone hand-in-hand with peptide simulations. In the

164

following section we present a historical perspective of the evolution of the force fields

165

highlighting the parallel paths that accurate peptide structure predictions and force

166

field development followed through the years.

6

ACCEPTED MANUSCRIPT

167

4. Force fields: then and now A number of force fields have been developed over the years for the simulations of

169

biological macromolecules. These can categorised in the families of CHARMM[114],

170

AMBER[31], OPLS[85] and GROMOS[140]. The similarities and differences among

171

them, especially regarding their parameterisation protocols can be found in other

172

reviews[65, 153]. In what follows, we review the evolution of the force fields from

173

their first appearance to the present, highlighting the pivotal role of peptides in their

174

improvement.

175

4.1. CHARMM force field

SC

Historically, the CHARMM (Chemistry at Harvard Macromolecular Mechan-

M AN U

176

RI PT

168

ics) force field appeared in 1983[23] and was officially released in the version

178

CHARMM19[134] (Figure 1). The published parameters comprised the so called ’ex-

179

tended’ atom types and van der Waals parameters, wherein hydrogen atoms were in-

180

cluded as part of their attached heavy atoms for sulfur and carbon, whereas for nitrogen

181

and oxygen the hydrogen atoms were treated explicitly, hence the name united-atom

182

potentials. One of the major features was the application of the partial atomic charges

183

to fit ab initio calculations, using the TIP3P water model[84] to calibrate the inter-

184

actions. The potential function also included internal energy terms for bonds, angles,

185

dihedrals, improper dihedrals as well as the non-bonded terms for van der Waals and

186

electrostatics, using a truncation cutoff for the latter. This basic form of the energy

187

function has been carried-over to almost all present-day empirical force fields.

TE

EP

188

D

177

CHARMM22[113][114] was later introduced as an ”additive” protein force field that included explicit parameters for all atoms (rendering the term ’extended atom’

190

obsolete), emphasising on the balance of interactions involving the protein and the

191

solvent and in-between. This was mainly achieved through a refinement of the non-

192

bonded interactions already implemented in the CHARMM19 version, by broadening

193

the experimental and ab initio data used. Backbone parameters were based on the

194

N-methylacetamide (NMA) and the alanine dipeptide and the side chains were based

195

on a number of small model compounds. Additive force fields do not allow however

196

the change of the electrostatic parameters as a function of environment, neglecting

197

the electronic polarisation phenomenon. The CHARMM22 protein parameters were

198

maintained to the version that followed, CHARMM27, where only the nucleic acid

AC C

189

7

ACCEPTED MANUSCRIPT

199

and lipid parameters were updated together with parameters for a number of common

200

ions[112]. This force field version remained prevalent for a decade until the introduction of

RI PT

201

the CHARMM22/CMAP version which incorporated the addition of a dihedral correc-

203

tion map (CMAP)[115, 116]. CMAP is a two-dimensional grid of energy corrections

204

in the dihedral space of φ/ψ angles added to the potential energy equation, which

205

improved the secondary structure propensities and compensated for the demonstrated

206

helical bias of the CHARMM force field[178]. The parameters for the CMAP term

207

were derived from a big collection of protein x-ray crystallographic data[116]. The

208

CHARMM22/CMAP version had an improved performance[25], but deficiencies in the

209

equilibrium between helical and extended conformations were noted which affected the

210

ability of the force field to correctly fold small peptides, mostly due to an increased heli-

211

cal propensity[13, 52, 53]. Thus, further improvements were incorporated by correcting

212

the backbone and side-chain dihedral angles to sample better the corresponding regions

213

in the Ramachandran plot[19]. Corrections were based on experimental solution NMR

214

data of weakly structured peptides for all residues except glycine and proline, for which

215

QM-based corrections were made. We should note that this work was not based solely

216

on alanine peptides but also on small helical and hairpin peptides.

D

M AN U

SC

202

In later years, only non-protein parameters regarding lipids, carbohydrates,

218

drug-like and cofactor molecules were implemented, giving rise consequently to

219

CHARMM27r, CHARMM35 and the CHARMM General Force Field (CGenFF). To-

220

gether they resulted in the latest great release of CHARMM36[19, 81] and a few years

221

later of CHARMM36m for intrinsically disordered proteins[82]. In parallel, version

222

CHARMM C22*[150] was presented that aimed towards a better helix-coil balance.

223

This was achieved by replacing the CMAP correction for all residues except glycine and

224

proline, and modifying the side-chain partial charges and torsional terms for residues

225

aspartate, glutamate and arginine (following the philosophy of the ff99SB* and ff03*

226

corrections in the AMBER family that is presented in the next section). A more com-

227

prehensive presentation on each component of the CHARMM all-atom additive force

228

field together with the parameterisation philosophy and methodology and the recently

229

released CHARMM Drude polarizable force field can be found elsewhere[111, 187, 210].

230

4.2. AMBER force field

231 232

AC C

EP

TE

217

The AMBER (Assisted Model Building with Energy Refinement) force field first appeared also in the early 80s, sharing the united-atom idea[195] but was soon extended 8

ACCEPTED MANUSCRIPT

to an all-atom force field with recalibrated torsional and angle parameters based on

234

experimental conformational energies[196] (Figure 2). A decade of improvements in

235

the parameters and the algorithm gave rise to a second generation force field, named

236

ff94[31]. Since then, the tradition has it that protein (and nucleic acid) force fields

237

in the AMBER family are named with ”ff” followed by a two-digit number year. Up

238

to this point, the parameterisation was focused on the gas phase behaviour, but the

239

need to produce potentials that are suitable for condensed phase simulations became

240

apparent for a more balanced modelling of biomolecules in solution. Unlike CHARMM

241

though, the core parameterisation is different here, employing more adjustable empir-

242

ical parameters that can be optimised. For example, the fixed atomic partial charges

243

of CHARMM that are tightly connected to the TIP3P water model are not present

244

here. Instead AMBER employed initially an ESP (electrostatic potential) fit for atomic

245

centred charges and later an RESP fit (restrained ESP) where charges were fitted simul-

246

taneously to several conformations to achieve a better average behaviour. The dihedral

247

parameters were fit to relative QM energies of alternate rotamers of small molecules

248

to represent several conformations of glycine and alanine. Unlike CHARMM, the AM-

249

BER force fields underwent many optimisations and refinements to reach a satisfactory

250

level of accuracy, giving rise to a large number of variants.

SC

M AN U

D

The ff94 force field was shown to over-stabilise helical conformations and over-

TE

251

RI PT

233

estimate computed melting temperatures in peptide simulations, sometimes even bias

253

helices over the native β-hairpin conformation[57, 137, 170]. The ff94 version was soon

254

replaced by ff96 and C96[93], which featured improved calculations for electrostatics

255

and van der Waals interactions and better accounted of long-range effects. This resulted

256

in better fitting to ab initio calculations on small peptides and more balanced solvent-

257

solvent and solute-solvent interactions. One of the main drawbacks of this version

258

was a bias towards β-structures this time[74, 88, 138]. The next version ff99[190]

259

was yet another attempt to refit the backbone dihedral parameters using this time

260

more representative structures of alanine and glycine in the form of tetrapeptides and

261

dipeptides. At that time, generalised parameters for compounds beyond proteins and

262

nucleic acids were also released. The new potential surfaces were significantly different

263

of those of its predecessors.

264

AC C

EP

252

Force field optimisation was assisted by the longer simulation times that became fea-

265

sible, which only uncovered more force field deficiencies. This opened up to a decade of

266

numerous almost in-parallel efforts to improve AMBER force field, making it challeng9

ACCEPTED MANUSCRIPT

ing to keep up with them and even more to choose the appropriate one for a particular

268

study. Despite the extended optimisation of the ff99 version, AMBER force fields con-

269

tinued to suffer from an imbalance of secondary structural elements, over-stabilising

270

α-helical peptides, like their ff94 counterpart[57, 137].

RI PT

267

A few years later ff03 was released, a so called third-generation point-charge all-atom

272

force field[39]. The major contribution was the fitting of the electrostatic potential and

273

main-chain torsion parameters to QM calculations of small peptides in condensed phase

274

with continuum solvent models. Initial testing showed a satisfactory balance between

275

helical and extended conformations and a better description of the polyproline II region.

276

A united-atom counterpart was also released, ff03ua[201], where aliphatic hydrogens of

277

α-carbon and aromatic hydrogens are explicitly represented whereas aliphatic hydro-

278

gens of side-chains are represented united to their carbon atom, to improve speed for

279

computationally demanding cases like folding simulations. A close variant of ff03, ff03*

280

incorporated a correction to the backbone dihedral potential to fit experimental data

281

of helix-coil transition, making it more transferable and able to fold peptides from both

282

helical and beta structural classes[15]. The ff03w force field[16] is a slightly modified

283

version of ff03*, with a backbone dihedral potential correction that allows the usage

284

of the more accurate TIP4P/2005 water model[1]. This updated force field offered a

285

better description of the folding thermodynamics and of the unfolded state as tested

286

in small peptides, whilst preserving its ability to fold to the native structure larger

287

peptides like the Trp cage and the GB1 hairpin. It still suffered though from poor

288

solvation, with less favourable solvation free energies and too favorable protein asso-

289

ciation. In an attempt to further improve the representation of these properties, the

290

ff03ws force field[18] was released which essentially incorporated a tuning parameter

291

for the Lennard-Jones protein-water interaction, specifically the water oxygen. The

292

same scheme applied to the ff99SB force field, gave rise to version ff99SBws[18].

M AN U

D

TE

EP

AC C

293

SC

271

The ff99SB force field[79] was a parallel effort focused largely on improving the φ/ψ

294

dihedral terms of the previous ff94/ff99 energy functions, after showing that the inad-

295

equate backbone dihedral parameterisation was indeed responsible for the weaknesses

296

of that generation of force fields. The major realisation here was that the two sets

297

of dihedral backbone parameters introduced in ff94 version have to be both individu-

298

ally optimised for glycine and non-glycine residues. Glycine due to absence of a Cβ

299

atom needs only one set of dihedral parameters, φ/ψ (defined as φ = C-N-Cα -C, ψ =

300

N-Cα -C-N), whereas all other amino acids that harbor a side-chain need also a φ’/ψ’ 10

ACCEPTED MANUSCRIPT

term (defined as φ’ = C-N-Cα -Cβ , ψ’ = Cβ -Cα -C-N). Then the dihedral potential for

302

the non-glycine residues will be the sum of them. This resulted in better agreement

303

with the PDB data for the dihedral angles and consequently better description of the

304

secondary structural preferences of small peptides as well as improved fitting to NMR

305

observables of larger peptides, like trpzip2, trp cage, villin headpiece, ubiquitin and

306

lysozyme.

307

RI PT

301

In the same spirit, there were some efforts like the AMBER-GS version[57], where the φ/ψ torsion potential was completely removed to fit better to experimental helix-

309

coil parameters, but the α-helical bias was not removed as shown by long equilibrium

310

ensemble simulations[177]. As a remedy, the ff99SB-φ’ force field[177] was proposed

311

where they combined the low helicity of the ff99 potential with the high helicity of

312

the ff94, by removing the additional barriers on the φ rotational degree of freedom.

313

This version provided a better description of the helix-coil transition but its usage to

314

non-helical peptides was not verified.

M AN U

SC

308

The fast —almost chaotic— production of even more versions of AMBER force fields

316

continued unimpeded in the years to come. All versions from this period shared the

317

core potential function of ff99SB which demonstrated superior performance to previous

318

versions[13], proving the significance of the proper backbone dihedral parameterisation.

319

Next in line, came ff99SB* force field[15], which incorporated to 99SB potential the

320

same backbone correction as the ff03* version described above. This improved the

321

helical bias and showed better agreement to NMR data of peptides but with poor

322

thermodynamics description.

EP

TE

D

315

A different approach was introduced with the ff99SB-ILDN version[109], where the

324

torsion potentials of side-chains were targeted this time. Especially four residue types,

325

isoleucine (I), leucine (L), aspartate (D) and asparagine (N), where found to have

326

considerably different rotamer distributions in the simulation in respect to the exper-

327

imentally observed ones. So, the side-chain torsion potentials for these four residues

328

were fitted to high-level QM calculations, improving considerably the agreement to

329

NMR data for the rotamer distributions. Of course combinations of these optimisa-

330

tions surfaced, like ff99SB*-ILDN[15] and later ff99SB*-ILDN-Q[14], which combined

331

the global backbone correction and the modified torsion parameters for residues ILDN,

332

with refitted charges for residues D, E, K, R.

333 334

AC C

323

On another front, the Br¨ uschweiler group introduced the concept of employing experimental NMR data that are commonly used to cross-validate a force field’s per11

ACCEPTED MANUSCRIPT

formance, to optimise them instead. Thus they used a collection of NMR chemical

336

shift data and RDCs for peptides to optimise and improve the ff99SB version using

337

an energy-based reweighting to match the experimental data in an iterative manner,

338

giving rise to the versions ff99SBnmr1[105] and ff99SB-φψ[106] that showed superior

339

performance when compared to its parent force field, ff99SB.

340

RI PT

335

Around that time, it became clear that it was imperative to produce a catholic AMBER version with carefully selected corrections that can reproduce faithfully ex-

342

perimental data for peptides from all structural classes that the developers can respon-

343

sibly recommend to general users of molecular simulations. Towards this direction, a

344

preliminary version, ff12SB, and nowadays ff14SB[118] were officially released. Taking

345

lessons from all the previous optimisation attempts, this version comprised a complete

346

refit of the side-chain dihedral potential of all amino acids, including alternate protona-

347

tion states for the ionizable ones, and empirical adjustments to the backbone dihedral

348

potential, including the φ rotational profile. The superiority of this updated version

349

was tested for reproducibility of the secondary structural content and NMR parameters

350

of peptides and proteins in solution. The ff12SB version was also optimised by adding

351

a CMAP-like correction, ff12SB-cMAP, for simulations with implicit solvent models,

352

when computational speed is a crucial factor[146].

D

M AN U

SC

341

Although the ff14SB is the one officially still recommended by the developers at the

354

time of the writing, another version, ff15SB, came up for use in combination with the

355

TIP3P-FB water model[192]. This version comprised a complete refitting of the bonded

356

parameters of the parental ff99SB force field and was shown to retain the prediction

357

accuracy of equilibrium properties while improving the accuracy of the temperature

358

dependent ones[127]. Its broader application and establishment remains to be seen.

359

For completeness, there are nowadays available force fields that can also describe small

360

molecule ligands (General AMBER force field, GAFF), carbohydrates (GLYCAM),

361

lipids (lipid14) but also amber-compatible parameters for post-translational modifica-

362

tions (Forcefield PTM) and ff15ipq version for implicit polarised charges in explicit

363

solvent[34].

364

4.3. Force fields for disordered peptides

AC C

EP

TE

353

365

Much of the current focus in the community of force field development has turned

366

towards the study of disordered peptides. An accurate description of the landscape of

367

these structurally challenging peptides is of paramount importance due to the increas-

368

ing acknowledgement of their abundance in protein structures as modular elements and 12

ACCEPTED MANUSCRIPT

their participation in many cellular processes[40]. Their main characteristic, the abil-

370

ity to interconvert between different conformations, also presents the greatest challenge

371

which is the requirement to transform force fields that underwent decades of optimisa-

372

tion cycles to model well-folded peptides, to now perform comparably well even with

373

disordered peptides.

RI PT

369

Disordered peptides present the additional challenge of interpreting complex experi-

375

mental observables that are averaged ensembles over transient conformations. The most

376

common experimental techniques used for validation of the simulation data comprise

377

NMR (nuclear magnetic resonance), usually amide proton exchange measurements, J-

378

couplings, chemical shifts, NOEs (nuclear Overhauser effect) and RDC (residual dipo-

379

lar coupling) constants, FRET (F¨orster resonance energy transfer) and SAXS (small

380

angle X-ray scattering), that have been reviewed recently[18, 26, 72, 80]. Infrared spec-

381

troscopy has emerged lately as a powerful tool to probe the conformational dynamics

382

of the disordered states, providing experimental evidence on the secondary structures

383

of sites that can be isolated by isotope labelling that can be compared with computa-

384

tionally modelled amide I spectra[48].

M AN U

SC

374

Naturally, force fields had to embrace the need to represent the disordered state

386

as accurately as the well-structured state. The first attempt to produce such a force

387

field was based on the application of the CMAP correction method to the dihedral

388

potential of 8 disorder-promoting residues (A, G, P, R, Q, S, E, K), using as basis

389

the ff99SB-ildn force field, giving rise to the version ff99IDPs[193] (Figure 2). A later

390

validation study using a set of 3 representative IDPs showed a quantitative agreement

391

with the experimental NMR chemical shifts and more representative conformational

392

sampling, though still suffering from over-stabilisation of structured conformations,

393

most probably due to inadequate representation of polar interactions[203].

TE

EP

AC C

394

D

385

A more comprehensive effort was presented through the refined CHARMM36m

395

(Figure 1) force field[82] that tried to address a reported bias for overpopulation of

396

the αL region of the Ramachandran plot[154]. The exhaustive benchmarking gave

397

satisfactory fitting to experimental parameters for IDPs while retaining consistency

398

with the CHARMM36 counterpart for folded peptides, folding kinetics and free energy

399

calculations. It removed the oversampling of the αL region observed in CHARMM36,

400

but also underestimated the β region as shown through simulations of peptides like

401

chignolin and CLN025. The main issue addressed was the over-compactness, a common

402

characteristic of all current generation empirical force fields when applied to studies of 13

ACCEPTED MANUSCRIPT

403 404

the disordered state. In the same spirit, and inspired by previous optimisation efforts, a force field termed a99SB-disp was released (Figure 2), the most recent to our knowledge empiri-

406

cal force field dedicated to accurately present both folded and unfolded conformational

407

states[156]. This force field is based on the ff99SB-ILDN version with necessary modi-

408

fications to employ the TIP4P-D water model. Its superior performance is attributed

409

to (a) an iterative scheme to optimise the backbone and side-chain torsion parame-

410

ters to represent better the protein-water van der Waals interactions inspired by the

411

ff99SB*-ILDN-Q version, and, (b) the modification of the strength of the Lennard-

412

Jones term for carbonyl oxygen and amide hydrogen pairs. Its performance remains to

413

be evaluated in the years to come.

M AN U

SC

RI PT

405

Amyloid (Aβ) peptides and in particular their assembly into fibrils, have been one of

415

the most challenging peptide systems to study with significant difficulties encountered

416

with both the experimental and computational approaches. The interested reader is re-

417

ferred to some excellent reviews of these Alzheimer’s disease-related peptides[133, 183].

418

A large number of studies appeared which attempted to computationally characterise

419

the peptides’ equilibrium conformations using the most advanced force fields. The great

420

discrepancy among the predicted structures of the monomers as well as of the various

421

oligomers only highlights the difficulty to describe the conformational preferences of

422

disordered peptides and the force field bias that needs yet to be addressed[136]. The

423

differences between the various force fields, sometimes in the context of different water

424

models as well, have been attributed to differing secondary structure biases as well

425

as the intrapeptide hydrogen bonding, with helix-overestimating force fields, like AM-

426

BER99SB and CHARMM22-CMAP, having the poorest agreement to experimentally

427

observed populations[174]. Later studies with improved force field versions accompa-

428

nied by different water models, such as OPLS-AA/TIP3P, AMBER99SB-ILDN/TIP4P-

429

Ew and CHARMM22*/TIP3SP demonstrated strongly converged structural properties

430

for these type of disordered peptides[158] and improved agreement with experimental

431

J-coupling constants, chemical shifts and CD data[176]. These encouraging findings

432

indicate that force field improvements may be heading in the correct direction. How-

433

ever, the force field validation is difficult due to the limited availability of experimental

434

observables for disordered peptide systems. This is particularly concerning when force

435

field performance is found to be dependent on the peptide system per se. For instance,

436

the CHARMM36/TIP3P combination was the most accurate in the case of Aβ10-

AC C

EP

TE

D

414

14

ACCEPTED MANUSCRIPT

40 peptide but not in the case of Aβ1-40 and Aβ1-42 peptides[173]. In a different

438

study, the most recent versions AMBER14SB and CHARMM22* with TIP3P water

439

model showed a helical overestimation compared to CD data, whereas OPLS-AA and

440

AMBER99SB-ILDN with the same water model represented conformational ensembles

441

closer to experiment[119]. Clearly further studies are needed to determine the source

442

of the force field discrepancies, but more importantly to also comprehend whether the

443

limited accuracy for disordered peptide systems is due to the interplay between the

444

force field and water model parameters.

445

4.4. Other biomolecular force fields

SC

RI PT

437

The force field families of CHARMM and AMBER are the ones that underwent

447

through heavy optimisation and validation by the developers but also the rest of the

448

community in the peptide folding and dynamics field. For completeness we should also

449

mention two additional and significant contributions, those of the OPLS and GROMOS

450

force fields.

M AN U

446

The OPLS (optimised potentials for liquid simulations) made its appearance in

452

the 80s as well (OPLS-ua) and was initially intended for simulation of liquid-state

453

properties of water and other organic liquids[84] (Figure 3). A distinct feature was the

454

emphasis given to model the non-bonded interactions to fit liquid-state thermodynamic

455

properties, like density and heat of vaporisation. The first release was a combined

456

AMBER/OPLS force field[86] (AMBER ff94 was largely inspired by OPLS) and was

457

later updated to the all-atom version OPLS-AA[85]. An update of it came out a

458

few years later, where side-chain potentials were reparameterized based on QM-data

459

of small peptides[87]. The major contribution from this effort was, and still is, the

460

development of the water models TIP3P and TIP4P that are heavily used to the

461

present. It was only very recently that new OPLS versions emerged, like OPLS2.1 and

462

OPLS3, that are mostly optimised towards small molecules and protein-ligand binding

463

studies[69].

TE

EP

AC C

464

D

451

Closing this section, we should refer to GROMOS (Groningen molecular simulation)

465

that appeared in the 90s together with the homonymous computer simulation program

466

with the most cited version being GROMOS96[186] (Figure 4). The newest release

467

appeared a few years ago as GROMOS11 with the parameter sets 54A7[161] and now

468

54A8[155]. Its inferior performance compared to other force fields in comparative

469

validation studies of peptide folding and dynamics prevented its broader use in the

470

specified field of peptide simulations and will not be further presented here. 15

ACCEPTED MANUSCRIPT

4.5. Water models for explicit representation of the solute in peptide folding simulations

472

This review is focused on peptide folding simulations that comprise explicit repre-

473

sentation of the solvent. Therefore, the ability of the force fields to describe accurately

474

the conformational landscape of the peptides is inextricably dependent on the ability

475

of the associated water model to describe the physical properties of water molecules in

476

the liquid phase. There are numerous water potentials that have been developed so far,

477

but we shall only mention here the empirical potentials that are used in biomolecular

478

simulations (Figure 5). Detailed presentation of all available water models and their

479

comparative performance can be found elsewhere[30, 64, 73, 139, 142, 185, 189].

SC

480

RI PT

471

Their first appearance coincided with the early protein force field versions in the 80s and they all share the idea of a rigid water monomer with the non-bonded interac-

482

tions described by 3, 4, or 5 sites composed of Coulombic terms for the intermolecular

483

interaction pairs and a Lennard-Jones term for oxygen atoms upon dimerization. His-

484

torically, two quite similar 3-site models were introduced in parallel in 1981. These

485

were the TIPS (transferable intermolecular potential)[83] model, and the SPC (simple

486

point charge)[11] model, two of the most popular potentials for water molecules in

487

molecular simulations even to date[153]. The main difference between TIP and SPC

488

models is the tetrahedral shape geometry adapted by the latter and its ability to re-

489

produce the experimental radial distribution function (including the second peak) and

490

the self-diffusion coefficient. Later the SPC/E[10] model was released that comprised

491

an additional polarization correction leading to better representation of the density,

492

diffusion coefficient and dielectric constant for simulations of liquid water. Together

493

these features made the SPC models more appropriate for modeling the bulk properties

494

of water[121].

D

TE

EP

AC C

495

M AN U

481

Later reparameterisation of TIPS and addition of a Lennard-Jones term to the hy-

496

drogen atoms as well, lead to improved representation of the density of liquid water and

497

gave rise to what became the standard model for protein simulations, the TIP3P[84]

498

model. Concurrently with TIP3P, its 4-site version was released, the TIP4P model,

499

which also incorporated terms for the bisector of the H-O-H angle allowing a better

500

electrostatic distribution. The TIP4P model provided a better description of most of

501

the water’s properties (like the phase diagram) but not of its dielectric constant. The

502

increased computational cost (imposed by the larger number of interactions to be cal-

503

culated) prohibited its wider use in the early peptide folding studies. The most crucial

504

factor however for the wider application of TIP3P was its coupling to the vastly popular 16

ACCEPTED MANUSCRIPT

CHARMM22 force field. The 5-site interaction version, TIP5P[117] appeared in 2000,

506

which replaced the single negative charge on the bisector angle of TIP4P with two

507

single negative charges on the lone-pair electrons. The truncated tetrahedral geometry

508

showed excellent representation of the structure of water, matching the density and

509

radial distribution properties, but suffered in describing the gas phase. The increased

510

complexity also significantly increased the computational requirements. As a result,

511

SPC and TIP4P models provided a satisfying description of liquid’s water structural

512

and thermodynamic parameters at a much lower computational cost.

SC

RI PT

505

The scenery changed with the increase in the available computational power but

514

most importantly the appearance of AMBER force field versions that gave more free-

515

dom on the choice of water model (RESP-fitted atomic charges). All aforementioned

516

water models utilise truncated Coulombic interactions. An improved treatment of the

517

long-range interactions with particle-mesh Ewald summation gave rise to the TIP4P-

518

Ew[78] model which improved the representation of liquid water’s thermodynamic pa-

519

rameters over a large temperature range but without adding significantly to the compu-

520

tational cost. On the same spirit was based the parameterisation of the TIP4P/2005[1]

521

model which reproduces temperature-dependent properties but underestimates the di-

522

electric constant. The TIP4P/ǫ[56] model is fine tuned to match exactly that property

523

in a large temperature regime. An equivalent version is the TIP4Q[4] model that

524

includes an additional charge to increase the dipole moment, but at an additional

525

computational expense compared to other TIP4P-based models. The common charac-

526

teristic of these water models is the larger enthalpy of solvation that leads to stronger

527

interactions between the water and the embedded protein. It is for that reason that

528

they have been lately employed in folding simulations of disordered less hydrophobic

529

peptides, facilitating a more accurate description of the expanded unfolded state. On

530

the same front, the TIP3P-FB and TIP4P-FB[191] models were introduced by applying

531

the so called ForceBalance method that combines reference data produced both exper-

532

imentally and theoretically to derive parameters for force fields. The resultant models,

533

and in particular the TIP4P-FB, manage to reproduce most of the kinetic and thermo-

534

dynamic properties of water, like the viscosity, diffusion coefficient, dielectric constant,

535

radial distribution function and heat of vaporization, in a variety of temperatures and

536

pressures.

AC C

EP

TE

D

M AN U

513

537

The take-home message here is that there is no perfect water model that encapsu-

538

lates all of the experimental physical properties of water. Furthermore, the general 17

ACCEPTED MANUSCRIPT

users are highly restricted in their choice by the compatibility to the protein force

540

field of choice and whether it is implemented in a molecular mechanics program.

541

For example, the molecular dynamics simulation engine NAMD[147] only supports

542

TIP3P- and TIP4P-based models from the whole panel of non polarizable water

543

models. Gromacs[12, 184] on the other hand supports SPC, SPC/E, TIP3P, TIP4P

544

and TIP5P. It has thus become common knowledge in the field that the GROMOS

545

force fields should better be paired with the SPC water models, OPLS force fields

546

are recommended to be used with TIP4P, whilst the CHARMM family of force fields

547

should be paired with the TIP3P model to avoid inconsistencies.

548

fields, especially the latest versions, are more versatile. Despite the relatively poor

549

performance of some water models with respect to the description of the bulk solvent

550

properties, their success in the folding of proteins and peptides appears to indicate that

551

folding per se may not be as sensitive to the properties of bulk solvent. This finding is

552

probably the result of the protein force fields being developed given the available water

553

models, so their deficiencies could be masked in the empirical parameterization of the

554

force fields. As the observation that the main deficiency of current generation of force

555

fields is the poor description of the protein-water interactions, an effort in underway

556

to develop force fields that are parameterized in combination with the newly reported

557

water models. In that spirit the TIP4P-D[148] was recently released in an effort to

558

recapitulate the more extended conformational ensembles of the disordered state by

559

increasing the strength of the dispersion interaction (LJ-term) (see also section 6.

560

The future). As is always the case with molecular simulation, it will be the extensive

561

testing by the community that will define which of these models will survive the acid

562

test of a quantitative comparison with the experimental data, including the ability to

563

accurately describe the kinetics and thermodynamics of peptide folding.

565

SC

EP

TE

D

M AN U

AMBER force

AC C

564

RI PT

539

5. The unceasing quest for a balanced biomolecular force field

566

The previous sections clearly demonstrated that one of the main contributions

567

of peptide simulations is their continuing usage as test systems for the optimisation,

568

correction and validation of empirical force fields[5, 61, 107, 123, 175]. One of the

569

recurring challenges through all these studies has been to convincingly quantify force

570

field accuracy, especially in connection with the fact that the relatively limited simu-

571

lation timescales attainable are insufficient for guaranteeing a faithful configurational 18

ACCEPTED MANUSCRIPT

sampling of the peptides’ folding landscapes[51]. Soon it became clear in the litera-

573

ture that the initial force field versions from all families were suffering from helical

574

bias[13, 51, 53, 62, 79, 123, 137, 165, 178, 204]. A common remedy to tackle the helical

575

bias of the early force fields was to choose one with a bias towards the major secondary

576

structure of the protein under study. This was obviously highly unsatisfactory and

577

proved indeed ineffective, as demonstrated by the folding simulations of villin and Pin1

578

WW domain mentioned earlier with the CHARMM22/CMAP force field[115, 116]. In

579

these studies, villin folded to its native structure[54] but the WW domain, whose na-

580

tive structure is a 3-stranded β-sheet, did not[52]. The reason, of course, was that as

581

far the force field was concerned a misfolded conformation had lower free energy than

582

the native structure[47, 53]. Studies such as these highlight how small force field inac-

583

curacies can have an additive effect and can result in stabilising misfolded non-native

584

structures.

M AN U

SC

RI PT

572

AMBER’s ff99SB was the first successful version to convincingly compensate for

586

that and displayed superior performance over the rest of the force fields of its time:

587

literature consistently provided overwhelming evidence of significantly improved bal-

588

ance of secondary structural elements and reasonable agreement with experimental

589

data through multiple comparative validation studies comprising from short glycine

590

and alanine peptides to longer 10-20 residue peptides (disordered, helical, beta) up to

591

small proteins like lysozyme and ubiquitin[8, 58, 79, 92, 94, 99, 102, 123, 144, 145, 168,

592

168, 169, 197, 198].

TE

Follow-up studies suggested two more force fields, f99SB-ILDN-φ and ff99SB-ILDN-

EP

593

D

585

NMR, with significantly improved agreement with NMR observables, highlighting

595

though that large uncertainties accompany this agreement: statistical calculated er-

596

rors in the validation were in the margin of systematic experimental errors of these

597

parameters[9].

598

AC C

594

The newer versions that surfaced, ff03/ff03* and ff99SB*-ILDN from AMBER, and

599

CHARMM22/CMAP and CHARMM22* from CHARMM, all succeeded in predicting

600

the native states and folding rates (for example the case of the model protein villin

601

headpiece), but displayed distinct discrepancies in the folding mechanism and in the

602

description of the unfolded state in particular, with ff99SB*-ILDN and CHARMM22*

603

being closer to the experiments[150]. CHARMM22* showed superior performance over

604

its counterparts in the CHARMM family in describing the free energy landscape of the

605

β-hairpin GB1[71]. Similarly, f99SB and its variants (ff99SB*, ff99SB-ILDN, ff99SB*19

ACCEPTED MANUSCRIPT

ILDN), ff03, ff03* and the GROMOS96 (43a1p,53a6) succeeded in finding the residual

607

β-hairpin preference of the disordered 16-mer Nrf2-derived peptide but with extremely

608

spurious results that were temperature dependent[29]. In accordance, a parallel study

609

with ff99SB-ILDN, ff99SB*-ILDN, CHARMM22/CMAP and CHARMM22* provided

610

accurate description of the native state of ubiquitin, GB3, villin and WW domain, but

611

force fields with better helix-coil balance, ff99SB*-ILDN, ff03* and CHARMM22* had

612

superior performance in the peptide systems[107]. Our personal recent work also lends

613

further support that for structurally challenging peptide systems, the ff99SB*-ILDN is

614

outperforming its counterparts in the AMBER 99SB family[3, 60, 162].

SC

615

RI PT

606

To summarise, our take on the current literature is that from the set of all extensively tested and validated force fields, there are nowadays versions from both AMBER

617

and CHARMM families that are shown to be balanced and well-performing in a wide

618

variety of tested systems, including cases of intrinsically disordered peptide systems,

619

like the RS repeats[154]. All that being said, we present in the next section our views

620

concerning the remaining challenges towards a universally useful biomolecular force

621

field.

622

6. The future

D

M AN U

616

We believe that a bird’s eye view of the recent literature leaves little doubt: peptide

624

simulations followed the rest of the molecular dynamics field in making a tremendous

625

progress towards accurate and physically relevant simulations. The development of

626

new algorithms, the production and validation of new robust and transferable force

627

fields, together with the increase of computational power available to the research

628

groups, completely transformed the field. Indeed, it is no longer an exaggeration to

629

state that we have reached the point where molecular simulations can faithfully predict

630

and reproduce the peptides’ native states, folding rates and in exceptional cases even

631

provide atomistic description of the folding mechanism in unprecedented detail, often

632

unreachable by experiments. Having said that, there are still a significant number of

633

unresolved issues and deficiencies, some of which are discussed in the paragraphs that

634

follow.

635

AC C

EP

TE

623

Peptides with weak structural propensities are difficult systems to track computa-

636

tionally. The major challenge here is to describe, without overestimating, the residual

637

(if any) secondary structure present in the unfolded state, whilst retaining the fold-

638

ing performance for the folders. Some recently reported issues concern the observed 20

ACCEPTED MANUSCRIPT

temperature dependence of the conformational preferences of peptides and the weak

640

folding cooperativity, whose combination results in a relatively poor description of

641

the kinetics and thermodynamics[33]. As discussed in the previous section on water

642

potentials, the source of this ”overcompactness” compared to the experimental observ-

643

ables (SAXS, FRET, Rg) appears to be the inaccurate description of the solvation

644

of the unfolded chain, possibly due to systematically underestimating the dispersion

645

parameter in the protein-water interactions[18, 72, 148]. A number of approaches have

646

been proposed to remedy this, for example applying a scaling factor to the van der

647

Waals interaction parameter[18], or re-parameterizing the Lennard-Jones parameter

648

for water’s oxygen (TIP4P-D water model[148]) or hydrogen (mTIP3P)[82]. These ap-

649

proaches improved the agreement with the experimental observables for some peptide

650

cases but not for others, rendering this approach not universally applicable. For ex-

651

ample, recent studies showed that the more polar TIP3P model favours more compact

652

structures[176]. On the other hand, the treatment of the protein-water interactions

653

by the TIP4P-Ew and TIP4P/2005 water models, eases the hydrophobically-driven

654

collapse and improves sampling of the disordered state for peptides like Trp-cage and

655

GB1[17], with TIP4P/2005 giving less collapsed conformations[135]. The latest of

656

the released water models, TIP4P-D matched better the experimental Rg in a set of

657

disordered peptides[148].

TE

D

M AN U

SC

RI PT

639

It has, thus, been realized that a new direction for further improvement of the

659

force fields is the one aiming towards an accurate description of the solvation effect

660

and better accounting for the long-range interactions. It is probably not surprising

661

that the newest of the empirical protein force field versions are adapting to new water

662

models that provide a better description of the total physical properties of the solvent.

663

However, obtaining a balance between the water model and the protein force field is

664

not a trivial exercise, especially with respect to peptide folding and dynamics. The

665

bulk properties of the solvent are experimentally tractable and thus it is straightfor-

666

ward to validate the available water models on how well they describe the solvent’s

667

molecular properties. However, the impact of the interactions between the solvent and

668

the biomolecules on the conformation of the latter is still poorly understood and needs

669

further exploration. A much different approach has been introduced with a force field

670

based on the Kirkwood-Buff theory (KBFF)[152] to retrieve force field parameters

671

that reproduce the microscopic characteristics of the solution to better describe the

672

protein-water interactions. This force field was found to compensate for the overcom-

AC C

EP

658

21

ACCEPTED MANUSCRIPT

pacteness and match the experimentally found dimensions of disordered peptides[128],

674

still though at the expense of partially unfolding peptides like GB1[129]. However, this

675

was partially rescued by increasing the interaction cut-off to strengthen the contribu-

676

tion of the long range interactions, like in the case of AMBER99SB*-ILDN with the

677

TIP4P-D water model for Nup peptides[148].

678

RI PT

673

Polarizable force fields might be an alternative route by accounting for the environmental effects. Their main asset is the ability to describe changes of the charge

680

distribution upon conformational changes of the molecules. This can be achieved via

681

fluctuating charge models, Drude oscillator-based models[101], inducible dipole models

682

or multipole electrostatics, like in AMOEBA[166], all of which are detailed in other

683

reviews[6, 67]. Their general applicability is hindered by the large computational bur-

684

den of the additional electrostatic interatomic forces calculations, barely reaching the

685

100ns timescale even for small peptide systems.

M AN U

SC

679

The difficulty in approaching all those unresolved matters in a meaningful way is

687

that on one hand the disordered state is heavily undersampled computationally and on

688

the other hand complex experimental data of disordered peptides can have ambiguous

689

interpretations. We believe that peptides have and will continue to serve as an acid

690

test for the performance of molecular simulations and force fields to reproduce the

691

physical reality: small deficiencies in the force field parameterisation can be shielded

692

in the larger well-folded protein context. Further improvements could be of substantial

693

importance towards more accurate modelling of disease-related issues, like mutation

694

effects, misfolding, aggregation or alternative enzyme conformations that affect drug

695

binding. We hope that this review of the literature regarding molecular simulations

696

of peptides and associated force fields will aid the readers to wisely select a force field

697

suitable for a certain application in order to achieve high quality results.

AC C

EP

TE

D

686

22

ACCEPTED MANUSCRIPT

698

7. Author Information Corresponding Author

700

*Phone: +30-25510-30620. Fax: +30-25510-30620.

701

Web page: https://utopia.duth.gr/glykos/. E-mail: [email protected].

702

ORCID

703

Panagiota S. Georgoulia: 0000-0003-4573-8052

704

Nicholas M. Glykos: 0000-0003-3782-206X

705

Notes

706

The authors declare no competing financial interest.

AC C

EP

TE

D

M AN U

SC

RI PT

699

23

EP

TE

D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

AC C

Figure 1: CHARMM family of protein force fields. Circled tree map representation of the various protein-related force fields developed to present time. The size of the circle and the colour is analogous to the number of reported citations. Genealogically-related force field are encircled in the same hierarchical level, which is represented by the thickness of the lines. Force-fields are ordered chronologically from left to right by year of appearance.

24

AC C

EP

TE

D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

Figure 2: AMBER family of protein force fields. Circled tree map representation of the various protein force fields developed to present time as presented in Figure 1.

25

AC C

EP

TE

D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

Figure 3: OPLS family of protein force fields. Circled tree map representation of the various protein force fields developed to present time as presented in Figure 1.

26

AC C

EP

TE

D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

Figure 4: GROMOS family of protein force fields. Circled tree map representation of the various protein force fields developed to present time as presented in Figure 1.

27

AC C

EP

TE

D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

Figure 5: Water models for biomolecular simulations. Circled tree map representation of the water potentials used in simulations of proteins and peptides. The size and colour of the circles follows their popularity as expressed by the number of citations. Encircled are the models that can be used in conjunction with a specific family of force fields, with consecutive levels of hierarchy represented by the line colour.

28

ACCEPTED MANUSCRIPT

707

708

References [1] Abascal, J. L., and Vega, C. A general purpose model for the condensed phases of water: Tip4p/2005. The Journal of chemical physics 123, 23 (2005),

710

234505.

711

RI PT

709

´ll, S., Smith, J. C., [2] Abraham, M. J., Murtola, T., Schulz, R., Pa Hess, B., and Lindahl, E. Gromacs: High performance molecular simulations

713

through multi-level parallelism from laptops to supercomputers. SoftwareX 1

714

(2015), 19–25.

SC

712

[3] Adamidou, T., Arvaniti, K.-O., and Glykos, N. M. Folding simulations of

716

a nuclear receptor box-containing peptide demonstrate the structural persistence

717

of the lxxll motif even in the absence of its cognate receptor. The Journal of

718

Physical Chemistry B 122, 1 (2017), 106–116.

719

M AN U

715

[4] Alejandre, J., Chapela, G. A., Saint-Martin, H., and Mendoza, N. A non-polarizable model of water that yields the dielectric constant and the

721

density anomalies of the liquid: Tip4q. Physical Chemistry Chemical Physics 13,

722

44 (2011), 19728–19740.

D

720

[5] Aliev, A. E., and Courtier-Murias, D. Experimental verification of force

724

fields for molecular dynamics simulations using gly-pro-gly-gly. The Journal of

725

Physical Chemistry B 114, 38 (2010), 12358–12375.

EP

TE

723

[6] Baker, C. M. Polarizable force fields for molecular dynamics simulations of

727

biomolecules. Wiley Interdisciplinary Reviews: Computational Molecular Science

728

729 730

AC C

726

5, 2 (2015), 241–254.

[7] Baker, D., and Sali, A. Protein structure prediction and structural genomics. Science 294, 5540 (2001), 93–96.

731

[8] Baltzis, A. S., and Glykos, N. M. Characterizing a partially ordered

732

miniprotein through folding molecular dynamics simulations: Comparison with

733

the experimental data. Protein Science 25, 3 (2016), 587–596.

734

[9] Beauchamp, K. A., Lin, Y.-S., Das, R., and Pande, V. S. Are protein

735

force fields getting better? a systematic benchmark on 524 diverse nmr measure-

736

ments. Journal of chemical theory and computation 8, 4 (2012), 1409–1414. 29

ACCEPTED MANUSCRIPT

737

[10] Berendsen, H., Grigera, J., and Straatsma, T. The missing term in effective pair potentials. Journal of Physical Chemistry 91, 24 (1987), 6269–6271.

739

[11] Berendsen, H. J., Postma, J. P., van Gunsteren, W. F., and Hermans,

RI PT

738

740

J. Interaction models for water in relation to protein hydration. In Intermolecular

741

forces. Springer, 1981, pp. 331–342.

[12] Berendsen, H. J., van der Spoel, D., and van Drunen, R. Gromacs: a

743

message-passing parallel molecular dynamics implementation. Computer physics

744

communications 91, 1-3 (1995), 43–56.

746

747 748

[13] Best, R. B., Buchete, N.-V., and Hummer, G. Are current molecular

M AN U

745

SC

742

dynamics force fields too helical? Biophysical journal 95, 1 (2008), L07–L09. [14] Best, R. B., de Sancho, D., and Mittal, J. Residue-specific α-helix propensities from molecular simulation. Biophysical journal 102, 6 (2012), 1462–1467. [15] Best, R. B., and Hummer, G. Optimized molecular dynamics force fields

750

applied to the helix- coil transition of polypeptides. The journal of physical

751

chemistry B 113, 26 (2009), 9004–9015. [16] Best, R. B., and Mittal, J. Protein simulations with an optimized water

TE

752

D

749

model: cooperative helix formation and temperature-induced unfolded state col-

754

lapse. The Journal of Physical Chemistry B 114, 46 (2010), 14916–14923.

EP

753

[17] Best, R. B., and Mittal, J. Free-energy landscape of the gb1 hairpin in

756

all-atom explicit solvent simulations with different force fields: Similarities and

757

differences. Proteins: Structure, Function, and Bioinformatics 79, 4 (2011), 1318–

758

AC C

755

1328.

759

[18] Best, R. B., Zheng, W., and Mittal, J. Balanced protein–water interac-

760

tions improve properties of disordered proteins and non-specific protein associa-

761

tion. Journal of chemical theory and computation 10, 11 (2014), 5113–5124.

762

[19] Best, R. B., Zhu, X., Shim, J., Lopes, P. E., Mittal, J., Feig, M., and

763

MacKerell Jr, A. D. Optimization of the additive charmm all-atom protein

764

force field targeting improved sampling of the backbone φ, ψ and side-chain χ1

765

and χ2 dihedral angles. Journal of chemical theory and computation 8, 9 (2012),

766

3257–3273. 30

ACCEPTED MANUSCRIPT

[20] Bieri, O., Wirz, J., Hellrung, B., Schutkowski, M., Drewello, M.,

768

and Kiefhaber, T. The speed limit for protein folding measured by triplet–

769

triplet energy transfer. Proceedings of the National Academy of Sciences 96, 17

770

(1999), 9597–9601.

RI PT

767

[21] Bonneau, R., and Baker, D. Ab initio protein structure prediction: progress

772

and prospects. Annual review of biophysics and biomolecular structure 30, 1

773

(2001), 173–189.

SC

771

[22] Borhani, D. W., and Shaw, D. E. The future of molecular dynamics simula-

775

tions in drug discovery. Journal of computer-aided molecular design 26, 1 (2012),

776

15–26.

M AN U

774

777

[23] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J.,

778

Swaminathan, S. a., and Karplus, M. Charmm: a program for macromolec-

779

ular energy, minimization, and dynamics calculations. Journal of computational

780

chemistry 4, 2 (1983), 187–217.

D

782

[24] Brooks, C. L. Protein and peptide folding explored with molecular simulations. Accounts of chemical research 35, 6 (2002), 447–454.

TE

781

[25] Buck, M., Bouguet-Bonnet, S., Pastor, R. W., and MacKerell Jr,

784

A. D. Importance of the cmap correction to the charmm22 protein force field:

785

dynamics of hen lysozyme. Biophysical journal 90, 4 (2006), L36–L38.

787

788 789 790 791

[26] Burger, V., Gurry, T., and Stultz, C. Intrinsically disordered proteins: Where computation meets experiment. Polymers 6, 10 (2014), 2684–2719.

AC C

786

EP

783

[27] Cellmer, T., Buscaglia, M., Henry, E. R., Hofrichter, J., and Eaton, W. A. Making connections between ultrafast protein folding kinetics and molecular dynamics simulations. Proceedings of the National Academy of

Sciences (2011).

792

[28] Chowdhury, S., Lee, M. C., Xiong, G., and Duan, Y. Ab initio folding

793

simulation of the trp-cage mini-protein approaches nmr resolution. Journal of

794

molecular biology 327, 3 (2003), 711–717.

31

ACCEPTED MANUSCRIPT

[29] Cino, E. A., Choy, W.-Y., and Karttunen, M. Comparison of secondary

796

structure formation using 10 different force fields in microsecond molecular dy-

797

namics simulations. Journal of chemical theory and computation 8, 8 (2012),

798

2725–2740.

RI PT

795

[30] Cisneros, G. A., Wikfeldt, K. T., Ojamae, L., Lu, J., Xu, Y., Tora-

800

bifard, H., Bartok, A. P., Csanyi, G., Molinero, V., and Paesani, F.

801

Modeling molecular interactions in water: From pairwise to many-body potential

802

energy functions. Chemical reviews 116, 13 (2016), 7501–7528.

803

SC

799

[31] Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Ferguson, D. M., Spellmeyer, D. C., Fox, T., Caldwell, J. W., and

805

Kollman, P. A. A second generation force field for the simulation of proteins,

806

nucleic acids, and organic molecules. Journal of the American Chemical Society

807

117, 19 (1995), 5179–5197.

M AN U

804

[32] Daura, X., Gademann, K., Jaun, B., Seebach, D., Van Gunsteren,

809

W. F., and Mark, A. E. Peptide folding: when simulation meets experiment.

810

Angewandte Chemie International Edition 38, 1-2 (1999), 236–240.

D

808

[33] Day, R., Paschek, D., and Garcia, A. E. Microsecond simulations of

812

the folding/unfolding thermodynamics of the trp-cage miniprotein. Proteins:

813

Structure, Function, and Bioinformatics 78, 8 (2010), 1889–1899.

TE

811

[34] Debiec, K. T., Cerutti, D. S., Baker, L. R., Gronenborn, A. M.,

815

Case, D. A., and Chong, L. T. Further along the road less traveled: Amber

816

ff15ipq, an original protein force field built on a self-consistent physical model.

AC C

817

EP

814

Journal of chemical theory and computation 12, 8 (2016), 3926–3947.

818

[35] Doniach, S., and Eastman, P. Protein dynamics simulations from nanosec-

819

onds to microseconds. Current opinion in structural biology 9, 2 (1999), 157–163.

820

[36] Doshi, U., and Hamelberg, D. Towards fast, rigorous and efficient confor-

821

mational sampling of biomolecules: Advances in accelerated molecular dynamics.

822

Biochimica et Biophysica Acta (BBA)-General Subjects 1850, 5 (2015), 878–888.

823

[37] Dror, R. O., Dirks, R. M., Grossman, J., Xu, H., and Shaw, D. E.

824

Biomolecular simulation: a computational microscope for molecular biology. An-

825

nual review of biophysics 41 (2012), 429–452. 32

ACCEPTED MANUSCRIPT

[38] Duan, Y., and Kollman, P. A. Pathways to a protein folding intermediate

827

observed in a 1-microsecond simulation in aqueous solution. Science 282, 5389

828

(1998), 740–744.

829

RI PT

826

[39] Duan, Y., Wu, C., Chowdhury, S., Lee, M. C., Xiong, G., Zhang, W., Yang, R., Cieplak, P., Luo, R., Lee, T., et al. A point-charge force

831

field for molecular mechanics simulations of proteins based on condensed-phase

832

quantum mechanical calculations. Journal of computational chemistry 24, 16

833

(2003), 1999–2012.

SC

830

[40] Dunker, A. K., Lawson, J. D., Brown, C. J., Williams, R. M.,

835

Romero, P., Oh, J. S., Oldfield, C. J., Campen, A. M., Ratliff, C. M.,

836

Hipps, K. W., et al. Intrinsically disordered protein. Journal of molecular

837

graphics and modelling 19, 1 (2001), 26–59.

839

840

[41] Durrant, J. D., and McCammon, J. A. Molecular dynamics simulations and drug discovery. BMC biology 9, 1 (2011), 71.

[42] Eaton, W. A., Munoz, V., Hagen, S. J., Jas, G. S., Lapidus, L. J.,

D

838

M AN U

834

Henry, E. R., and Hofrichter, J. Fast kinetics and mechanisms in protein

842

folding. Annual review of biophysics and biomolecular structure 29, 1 (2000),

843

327–359.

TE

841

[43] Eaton, W. A., Munoz, V., Thompson, P. A., Henry, E. R., and

845

Hofrichter, J. Kinetics and dynamics of loops, α-helices, β-hairpins, and

846

fast-folding proteins. Accounts of Chemical Research 31, 11 (1998), 745–753.

AC C

EP

844

847

[44] Edwards, C., Cohen, M., and Bloom, S. Peptides as drugs, 1999.

848

[45] Ensign, D. L., Kasson, P. M., and Pande, V. S. Heterogeneity even at

849

the speed limit of folding: large-scale molecular dynamics study of a fast-folding

850

variant of the villin headpiece. Journal of molecular biology 374, 3 (2007), 806–

851

816.

852

[46] Ensign, D. L., and Pande, V. S. The fip35 ww domain folds with structural

853

and mechanistic heterogeneity in molecular dynamics simulations. Biophysical

854

journal 96, 8 (2009), L53–L55.

33

ACCEPTED MANUSCRIPT

[47] Faver, J. C., Benson, M. L., He, X., Roberts, B. P., Wang, B., Mar-

856

shall, M. S., Sherrill, C. D., and Merz Jr, K. M. The energy computa-

857

tion paradox and ab initio protein folding. PloS one 6, 4 (2011), e18868.

RI PT

855

858

[48] Feng, C.-J., Dhayalan, B., and Tokmakoff, A. Refinement of peptide

859

conformational ensembles by 2d ir spectroscopy: Application to ala–ala–ala. Bio-

860

physical Journal 114, 12 (2018), 2820–2832.

[49] Ferrara, P., Apostolakis, J., and Caflisch, A. Thermodynamics and

862

kinetics of folding of two model peptides investigated by molecular dynamics

863

simulations. The Journal of Physical Chemistry B 104, 20 (2000), 5000–5010.

M AN U

SC

861

864

[50] Ferrara, P., and Caflisch, A. Folding simulations of a three-stranded

865

antiparallel β-sheet peptide. Proceedings of the National Academy of Sciences

866

97, 20 (2000), 10780–10785.

867 868

[51] Freddolino, P. L., Harrison, C. B., Liu, Y., and Schulten, K. Challenges in protein-folding simulations. Nature physics 6, 10 (2010), 751. [52] Freddolino, P. L., Liu, F., Gruebele, M., and Schulten, K. Ten-

870

microsecond molecular dynamics simulation of a fast-folding ww domain. Bio-

871

physical journal 94, 10 (2008), L75–L77.

873

TE

[53] Freddolino, P. L., Park, S., Roux, B., and Schulten, K. Force field

EP

872

D

869

bias in protein folding simulations. Biophysical journal 96, 9. [54] Freddolino, P. L., and Schulten, K. Common structural transitions in

875

explicit-solvent simulations of villin headpiece folding. Biophysical journal 97, 8

876

AC C

874

(2009), 2338–2347.

877

[55] Friedrichs, M. S., Eastman, P., Vaidyanathan, V., Houston, M.,

878

Legrand, S., Beberg, A. L., Ensign, D. L., Bruns, C. M., and Pande,

879

V. S. Accelerating molecular dynamic simulation on graphics processing units.

880

Journal of computational chemistry 30, 6 (2009), 864–872.

881

[56] Fuentes-Azcatl, R., and Alejandre, J. Non-polarizable force field of water

882

based on the dielectric constant: Tip4p/ε. The Journal of Physical Chemistry B

883

118, 5 (2014), 1263–1272.

34

ACCEPTED MANUSCRIPT

[57] Garcia, A. E., and Sanbonmatsu, K. Y. α-helical stabilization by side chain

885

shielding of backbone hydrogen bonds. Proceedings of the National Academy of

886

Sciences 99, 5 (2002), 2782–2787.

RI PT

884

887

[58] Georgoulia, P. S., and Glykos, N. M. Using j-coupling constants for force

888

field validation: application to hepta-alanine. The Journal of Physical Chemistry

889

B 115, 51 (2011), 15221–15227.

[59] Georgoulia, P. S., and Glykos, N. M. On the foldability of tryptophan-

891

containing tetra-and pentapeptides: an exhaustive molecular dynamics study.

893

The Journal of Physical Chemistry B 117, 18 (2013), 5522–5532.

M AN U

892

SC

890

[60] Georgoulia, P. S., and Glykos, N. M. Folding molecular dynamics simula-

894

tion of a gp41-derived peptide reconcile divergent structure determinations. ACS

895

Omega 3, 11 (2018), 14746–14754.

[61] Gnanakaran, S., and Garcia, A. E. Validation of an all-atom protein force

897

field: from dipeptides to larger peptides. The Journal of Physical Chemistry B

898

107, 46 (2003), 12555–12557.

D

896

[62] Gnanakaran, S., and Garc´ıa, A. E. Helix-coil transition of alanine peptides

900

in water: Force field dependence on the folded and unfolded structures. Proteins:

901

Structure, Function, and Bioinformatics 59, 4 (2005), 773–782.

EP

902

TE

899

[63] Gnanakaran, S., Nymeyer, H., Portman, J., Sanbonmatsu, K. Y., and Garcıa, A. E. Peptide folding simulations. Current opinion in structural

904

biology 13, 2 (2003), 168–174.

AC C

903

905

[64] Guillot, B. A reappraisal of what we have learnt during three decades of

906

computer simulations on water. Journal of Molecular Liquids 101, 1-3 (2002),

907

219–260.

908

[65] Guvench, O., and MacKerell, A. D. Comparison of protein force fields

909

for molecular dynamics simulations. In Molecular modeling of proteins. Springer,

910

2008, pp. 63–88.

911 912

[66] Hagen, S. J., Hofrichter, J., Szabo, A., and Eaton, W. A. Diffusionlimited contact formation in unfolded cytochrome c: estimating the maximum

35

ACCEPTED MANUSCRIPT

913

rate of protein folding. Proceedings of the National Academy of Sciences 93, 21

914

(1996), 11615–11617.

916

[67] Halgren, T. A., and Damm, W. Polarizable force fields. Current opinion in structural biology 11, 2 (2001), 236–242.

RI PT

915

[68] Hamelberg, D., Mongan, J., and McCammon, J. A. Accelerated molecu-

918

lar dynamics: a promising and efficient simulation method for biomolecules. The

919

Journal of chemical physics 120, 24 (2004), 11919–11929.

SC

917

[69] Harder, E., Damm, W., Maple, J., Wu, C., Reboul, M., Xiang, J. Y.,

921

Wang, L., Lupyan, D., Dahlgren, M. K., Knight, J. L., et al. Opls3:

922

a force field providing broad coverage of drug-like small molecules and proteins.

923

Journal of chemical theory and computation 12, 1 (2015), 281–296.

924

M AN U

920

[70] Hatfield, M. P., Murphy, R. F., and Lovas, S. Vcd spectroscopic proper-

925

ties of the β-hairpin forming miniprotein cln025 in various solvents. Biopolymers

926

93, 5 (2010), 442–450.

[71] Hazel, A. J., Walters, E. T., Rowley, C. N., and Gumbart, J. C.

D

927

Folding free energy landscapes of β-sheets with non-polarizable and polarizable

929

charmm force fields. The Journal of Chemical Physics 149, 7 (2018), 072317.

930

TE

928

[72] Henriques, J., Cragnell, C., and Skepo, M. Molecular dynamics simulations of intrinsically disordered proteins: force field evaluation and comparison

932

with experiment. Journal of chemical theory and computation 11, 7 (2015), 3420–

933

3431.

AC C

EP

931

934

[73] Hess, B., and van der Vegt, N. F. Hydration thermodynamic properties of

935

amino acid analogues: a systematic comparison of biomolecular force fields and

936

water models. The Journal of Physical Chemistry B 110, 35 (2006), 17616–17626.

937

[74] Higo, J., Ito, N., Kuroda, M., Ono, S., Nakajima, N., and Nakamura,

938

H. Energy landscape of a peptide consisting of α-helix, 310-helix, β-turn, β-

939

hairpin, and other disordered conformations. Protein Science 10, 6 (2001), 1160–

940

1171.

941 942

[75] Ho, B. K., and Dill, K. A. Folding very short peptides using molecular dynamics. PLoS computational biology 2, 4 (2006), e27. 36

ACCEPTED MANUSCRIPT

944

945 946

947

[76] Hoang Viet, M., Derreumaux, P., and Nguyen, P. H. Communication: Multiple atomistic force fields in a single enhanced sampling simulation, 2015. [77] Honda, S., Yamasaki, K., Sawada, Y., and Morii, H. 10 residue folded

RI PT

943

peptide designed by segment statistics. Structure 12, 8 (2004), 1507–1518. [78] Horn, H. W., Swope, W. C., Pitera, J. W., Madura, J. D., Dick, T. J., Hura, G. L., and Head-Gordon, T. Development of an improved

949

four-site water model for biomolecular simulations: Tip4p-ew. The Journal of

950

chemical physics 120, 20 (2004), 9665–9678.

SC

948

[79] Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A., and

952

Simmerling, C. Comparison of multiple amber force fields and development

953

of improved protein backbone parameters. Proteins: Structure, Function, and

954

Bioinformatics 65, 3 (2006), 712–725.

955

M AN U

951

[80] Huang, J., and MacKerell, A. D. Force field development and simulations of intrinsically disordered proteins. Current opinion in structural biology

957

48 (2018), 40–48.

D

956

[81] Huang, J., and MacKerell Jr, A. D. Charmm36 all-atom additive protein

959

force field: Validation based on comparison to nmr data. Journal of computa-

960

tional chemistry 34, 25 (2013), 2135–2145.

EP

TE

958

963

proved force field for folded and intrinsically disordered proteins. nature methods

964

AC C

962

[82] Huang, J., Rauscher, S., Nawrocki, G., Ran, T., Feig, M., de Groot, ¨ller, H., and MacKerell Jr, A. D. Charmm36m: an imB. L., Grubmu

961

14, 1 (2016), 71.

965

[83] Jorgensen, W. L. Quantum and statistical mechanical studies of liquids. 10.

966

transferable intermolecular potential functions for water, alcohols, and ethers.

967

application to liquid water. Journal of the American Chemical Society 103, 2

968

(1981), 335–340.

969

[84] Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W.,

970

and Klein, M. L. Comparison of simple potential functions for simulating

971

liquid water. The Journal of chemical physics 79, 2 (1983), 926–935.

37

ACCEPTED MANUSCRIPT

[85] Jorgensen, W. L., Maxwell, D. S., and Tirado-Rives, J. Development

973

and testing of the opls all-atom force field on conformational energetics and

974

properties of organic liquids. Journal of the American Chemical Society 118, 45

975

(1996), 11225–11236.

RI PT

972

[86] Jorgensen, W. L., and Tirado-Rives, J. The opls [optimized potentials

977

for liquid simulations] potential functions for proteins, energy minimizations for

978

crystals of cyclic peptides and crambin. Journal of the American Chemical Society

979

110, 6 (1988), 1657–1666.

980

SC

976

[87] Kaminski, G. A., Friesner, R. A., Tirado-Rives, J., and Jorgensen, W. L. Evaluation and reparametrization of the opls-aa force field for proteins

982

via comparison with accurate quantum chemical calculations on peptides. The

983

Journal of Physical Chemistry B 105, 28 (2001), 6474–6487.

984

M AN U

981

[88] Kamiya, N., Higo, J., and Nakamura, H. Conformational transition states of a β-hairpin peptide between the ordered and disordered conformations in ex-

986

plicit water. Protein Science 11, 10 (2002), 2297–2307.

988

989

[89] Keller, T. H., Pichota, A., and Yin, Z. A practical view of druggability. Current opinion in chemical biology 10, 4 (2006), 357–361.

TE

987

D

985

[90] Kier, B. L., and Andersen, N. H. Probing the lower size limit for protein-like fold stability: ten-residue microproteins with specific, rigid structures in water.

991

Journal of the American Chemical Society 130, 44 (2008), 14675–14683.

993

[91] Kliger, Y. Computational approaches to therapeutic peptide discovery. Peptide

AC C

992

EP

990

Science 94, 6 (2010), 701–710.

994

[92] Koller, A. N., Schwalbe, H., and Gohlke, H. Starting structure depen-

995

dence of nmr order parameters derived from md simulations: implications for

996

997

judging force-field quality. Biophysical journal 95, 1 (2008), L04–L06.

[93] Kollman, P., Dixon, R., Cornell, W., Fox, T., Chipot, C., and Po-

998

horille, A. The development/application of a minimalistorganic/biochemical

999

molecular mechanic force field using a combination of ab initio calculations and

1000

experimental data. In Computer simulation of biomolecular systems. Springer,

1001

1997, pp. 83–96. 38

ACCEPTED MANUSCRIPT

[94] Koukos, P. I., and Glykos, N. M. Folding molecular dynamics simulations

1003

accurately predict the effect of mutations on the stability and structure of a

1004

vammin-derived peptide. The Journal of Physical Chemistry B 118, 34 (2014),

1005

10076–10084.

RI PT

1002

[95] Kubelka, J., Henry, E. R., Cellmer, T., Hofrichter, J., and Eaton,

1007

W. A. Chemical, physical, and theoretical kinetics of an ultrafast folding protein.

1008

Proceedings of the National Academy of Sciences 105, 48 (2008), 18655–18662.

1010

1011

[96] Kubelka, J., Hofrichter, J., and Eaton, W. A. The protein folding speed limit. Current opinion in structural biology 14, 1 (2004), 76–88.

M AN U

1009

SC

1006

[97] Kuhlman, B., and Baker, D. Exploring folding free energy landscapes using

1012

computational protein design. Current opinion in structural biology 14, 1 (2004),

1013

89–95.

¨hrova ´, P., De Simone, A., Otyepka, M., and Best, R. B. Force-field [98] Ku

1015

dependence of chignolin folding and misfolding: comparison with experiment and

1016

redesign. Biophysical journal 102, 8 (2012), 1897–1906.

1017

D

1014

[99] L. Fawzi, N., Phillips, A. H., Ruscio, J. Z., Doucleff, M., Wemmer, D. E., and Head-Gordon, T. Structure and dynamics of the aβ21–30 peptide

1019

from the interplay of nmr experiments and molecular simulations. Journal of the

1020

American Chemical Society 130, 19 (2008), 6145–6158.

1023 1024 1025

1026

EP

1022

[100] Laio, A., and Parrinello, M. Escaping free-energy minima. Proceedings of the National Academy of Sciences 99, 20 (2002), 12562–12566.

AC C

1021

TE

1018

[101] Lamoureux, G., MacKerell Jr, A. D., and Roux, B. A simple polarizable model of water based on classical drude oscillators. The Journal of chemical

physics 119, 10 (2003), 5185–5197.

[102] Lange, O. F., Van der Spoel, D., and De Groot, B. L. Scrutinizing

1027

molecular mechanics force fields on the submicrosecond timescale with nmr data.

1028

Biophysical journal 99, 2 (2010), 647–655.

1029

[103] Lapidus, L. J., Eaton, W. A., and Hofrichter, J. Measuring the rate

1030

of intramolecular contact formation in polypeptides. Proceedings of the National

1031

Academy of Sciences 97, 13 (2000), 7220–7225. 39

ACCEPTED MANUSCRIPT

[104] Larson, S. M., Snow, C. D., Shirts, M., and Pande, V. S. Folding@

1033

home and genome@ home: Using distributed computing to tackle previously

1034

intractable problems in computational biology. arXiv preprint arXiv:0901.0866

1035

(2009).

1037

1038

¨schweiler, R. Nmr-based protein potentials. Ange[105] Li, D.-W., and Bru wandte Chemie International Edition 49, 38 (2010), 6778–6780.

[106] Li, D.-W., and Bruschweiler, R. Iterative optimization of molecular me-

SC

1036

RI PT

1032

chanics force fields from nmr data of full-length proteins. Journal of chemical

1040

theory and computation 7, 6 (2011), 1773–1782.

1041

M AN U

1039

[107] Lindorff-Larsen, K., Maragakis, P., Piana, S., Eastwood, M. P.,

1042

Dror, R. O., and Shaw, D. E. Systematic validation of protein force fields

1043

against experimental data. PloS one 7, 2 (2012), e32131.

1045

1046

[108] Lindorff-Larsen, K., Piana, S., Dror, R. O., and Shaw, D. E. How fast-folding proteins fold. Science 334, 6055 (2011), 517–520. [109] Lindorff-Larsen, K., Piana, S., Palmo, K., Maragakis, P., Klepeis,

D

1044

J. L., Dror, R. O., and Shaw, D. E. Improved side-chain torsion potentials

1048

for the amber ff99sb protein force field. Proteins: Structure, Function, and

1049

Bioinformatics 78, 8 (2010), 1950–1958.

1051

EP

1050

TE

1047

[110] Loffet, A. Peptides as drugs: is there a market? Journal of peptide science: an official publication of the European Peptide Society 8, 1 (2002), 1–7. [111] Lopes, P. E., Guvench, O., and MacKerell, A. D. Current status of

1053

protein force fields for molecular dynamics simulations. In Molecular Modeling

1054

AC C

1052

of Proteins. Springer, 2015, pp. 47–71.

1055

[112] MacKerell Jr, A. D. Empirical force fields for biological macromolecules:

1056

overview and issues. Journal of computational chemistry 25, 13 (2004), 1584–

1057

1604.

1058

[113] MacKerell Jr, A. D., Bashford, D., Bellott, M., Dunbrack Jr,

1059

R. L., Evanseck, J. D., Field, M. J., Fischer, S., Gao, J., Guo, H.,

1060

Ha, S., et al. The journal of physical chemistry B 102, 18 (1998), 3586–3616.

40

ACCEPTED MANUSCRIPT

1061

[114] MacKerell Jr, A. D., Brooks, B., Brooks III, C. L., Nilsson, L., Roux, B., Won, Y., and Karplus, M. Charmm: the energy function and

1063

its parameterization. Encyclopedia of computational chemistry 1 (2002).

RI PT

1062

1064

[115] MacKerell Jr, A. D., Feig, M., and Brooks, C. L. Improved treatment of

1065

the protein backbone in empirical force fields. Journal of the American Chemical

1066

Society 126, 3 (2003), 698–699.

[116] Mackerell Jr, A. D., Feig, M., and Brooks III, C. L. Extending the

1068

treatment of backbone energetics in protein force fields: limitations of gas-phase

1069

quantum mechanics in reproducing protein conformational distributions in molec-

1070

ular dynamics simulations. Journal of computational chemistry 25, 11 (2004),

1071

1400–1415.

M AN U

SC

1067

1072

[117] Mahoney, M. W., and Jorgensen, W. L. A five-site model for liquid water

1073

and the reproduction of the density anomaly by rigid, nonpolarizable potential

1074

functions. The Journal of Chemical Physics 112, 20 (2000), 8910–8922. [118] Maier, J. A., Martinez, C., Kasavajhala, K., Wickstrom, L., Hauser,

D

1075

K. E., and Simmerling, C. ff14sb: improving the accuracy of protein side

1077

chain and backbone parameters from ff99sb. Journal of chemical theory and

1078

computation 11, 8 (2015), 3696–3713. [119] Man, V. H., Nguyen, P. H., and Derreumaux, P. High-resolution struc-

EP

1079

TE

1076

tures of the amyloid-β 1–42 dimers from the comparison of four atomistic force

1081

fields. The Journal of Physical Chemistry B 121, 24 (2017), 5977–5987.

1082 1083

AC C

1080

[120] Marinari, E., and Parisi, G. Simulated tempering: a new monte carlo scheme. EPL (Europhysics Letters) 19, 6 (1992), 451.

1084

[121] Mark, P., and Nilsson, L. Structure and dynamics of the tip3p, spc, and

1085

spc/e water models at 298 k. The Journal of Physical Chemistry A 105, 43

1086

(2001), 9954–9960.

1087

[122] Markwick, P. R., and McCammon, J. A. Studying functional dynamics in

1088

bio-molecules using accelerated molecular dynamics. Physical Chemistry Chem-

1089

ical Physics 13, 45 (2011), 20053–20065.

41

ACCEPTED MANUSCRIPT

[123] Matthes, D., and De Groot, B. L. Secondary structure propensities in

1091

peptide folding simulations: a systematic comparison of molecular mechanics

1092

interaction schemes.

RI PT

1090

1093

[124] Maupetit, J., Derreumaux, P., and Tuffery, P. Pep-fold: an online

1094

resource for de novo peptide structure prediction. Nucleic acids research 37,

1095

suppl 2 (2009), W498–W503.

[125] Mayor, U., Johnson, C. M., Daggett, V., and Fersht, A. R. Protein

SC

1096

folding and unfolding in microseconds to nanoseconds by experiment and simu-

1098

lation. Proceedings of the National Academy of Sciences 97, 25 (2000), 13518–

1099

13522.

M AN U

1097

[126] McCammon, J. A. A speed limit for protein folding. Proceedings of the National

1101

Academy of Sciences of the United States of America 93, 21 (1996), 11426.

1102

[127] McKiernan, K. A., Husic, B. E., and Pande, V. S. Modeling the mecha-

1103

nism of cln025 beta-hairpin formation. The Journal of Chemical Physics 147, 10

1104

(2017), 104107.

D

1100

[128] Mercadante, D., Milles, S., Fuertes, G., Svergun, D. I., Lemke,

1106

E. A., and Grater, F. Kirkwood–buff approach rescues overcollapse of a dis-

1107

ordered protein in canonical protein force fields. The journal of physical chemistry

1108

B 119, 25 (2015), 7975–7984.

EP

TE

1105

[129] Mercadante, D., Wagner, J. A., Aramburu, I. V., Lemke, E. A., and

1110

Grater, F. Sampling long-versus short-range interactions defines the ability of

1111 1112

1113

AC C

1109

force fields to reproduce the dynamics of intrinsically disordered proteins. Journal

of chemical theory and computation 13, 9 (2017), 3964–3974.

[130] Mitsutake, A., Sugita, Y., and Okamoto, Y. Generalized-ensemble al-

1114

gorithms for molecular simulations of biopolymers. Peptide Science: Original

1115

Research on Biomolecules 60, 2 (2001), 96–123.

1116

[131] Mittal, J., and Best, R. B. Tackling force-field bias in protein folding simu-

1117

lations: folding of villin hp35 and pin ww domains in explicit water. Biophysical

1118

journal 99, 3 (2010), L26–L28.

42

ACCEPTED MANUSCRIPT

[132] Munoz, V., Thompson, P. A., Hofrichter, J., and Eaton, W. A. Fold-

1120

ing dynamics and mechanism of β-hairpin formation. Nature 390, 6656 (1997),

1121

196.

RI PT

1119

[133] Nasica-Labouze, J., Nguyen, P. H., Sterpone, F., Berthoumieu, O.,

1123

Buchete, N.-V., Cote, S., De Simone, A., Doig, A. J., Faller, P.,

1124

Garcia, A., et al. Amyloid β protein and alzheimer???s disease: When

1125

computer simulations complement experimental studies. Chemical reviews 115,

1126

9 (2015), 3518–3563.

SC

1122

[134] Neria, E., Fischer, S., and Karplus, M. Simulation of activation free

1128

energies in molecular systems. The Journal of chemical physics 105, 5 (1996),

1129

1902–1921.

1130 1131

M AN U

1127

¨ller-Spa ¨th, S., Ku ¨ster, F., Hofmann, H., Haenni, [135] Nettels, D., Mu ¨egger, S., Reymond, L., Hoffmann, A., Kubelka, J., Heinz, D., Ru B., et al. Single-molecule spectroscopy of the temperature-induced collapse

1133

of unfolded proteins. Proceedings of the National Academy of Sciences 106, 49

1134

(2009), 20740–20745.

[136] Nguyen, P. H., Li, M. S., and Derreumaux, P. Effects of all-atom force

TE

1135

D

1132

fields on amyloid oligomerization: Replica exchange molecular dynamics simula-

1137

tions of the aβ 16–22 dimer and trimer. Physical Chemistry Chemical Physics

1138

13, 20 (2011), 9778–9788.

EP

1136

[137] Okur, A., Strockbine, B., Hornak, V., and Simmerling, C. Using pc

1140

clusters to evaluate the transferability of molecular mechanics force fields for

1141

1142 1143 1144

AC C

1139

proteins. Journal of computational chemistry 24, 1 (2003), 21–31.

[138] Ono, S., Nakajima, N., Higo, J., and Nakamura, H. Peptide free-energy profile is strongly dependent on the force field: Comparison of c96 and amber95.

Journal of Computational Chemistry 21, 9 (2000), 748–762.

1145

[139] Onufriev, A. V., and Izadi, S. Water models for biomolecular simulations.

1146

Wiley Interdisciplinary Reviews: Computational Molecular Science 8, 2 (2018),

1147

e1347.

43

ACCEPTED MANUSCRIPT

[140] Oostenbrink, C., Villa, A., Mark, A. E., and Van Gunsteren, W. F.

1149

A biomolecular force field based on the free enthalpy of hydration and solvation:

1150

the gromos force-field parameter sets 53a5 and 53a6. Journal of computational

1151

chemistry 25, 13 (2004), 1656–1676.

RI PT

1148

[141] Ostermeir, K., and Zacharias, M. Advanced replica-exchange sampling

1153

to study the flexibility and plasticity of peptides and proteins. Biochimica et

1154

Biophysica Acta (BBA)-Proteins and Proteomics 1834, 5 (2013), 847–853.

1156

1157

[142] Ouyang, J. F., and Bettens, R.

Modelling water: a lifetime enigma.

CHIMIA International Journal for Chemistry 69, 3 (2015), 104–111.

M AN U

1155

SC

1152

[143] Pande, V. S., and Rokhsar, D. S. Molecular dynamics simulations of un-

1158

folding and refolding of a β-hairpin fragment of protein g. Proceedings of the

1159

National Academy of Sciences 96, 16 (1999), 9062–9067.

1160 1161

[144] Patapati, K. K., and Glykos, N. M. Three force fields’ views of the 310 helix. Biophysical journal 101, 7 (2011), 1766–1771.

[145] Patmanidis, I., and Glykos, N. M. As good as it gets? folding molecular

1163

dynamics simulations of the lyta choline-binding peptide result to an exception-

1164

ally accurate model of the peptide structure. Journal of Molecular Graphics and

1165

Modelling 41 (2013), 68–71.

EP

TE

D

1162

[146] Perez, A., MacCallum, J. L., Brini, E., Simmerling, C., and Dill,

1167

K. A. Grid-based backbone correction to the ff12sb protein force field for implicit-

1168

solvent simulations. Journal of chemical theory and computation 11, 10 (2015),

1169

AC C

1166

4770–4779.

1170

[147] Phillips, J. C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid,

1171

E., Villa, E., Chipot, C., Skeel, R. D., Kale, L., and Schulten, K.

1172

Scalable molecular dynamics with namd. Journal of computational chemistry 26,

1173

16 (2005), 1781–1802.

1174

[148] Piana, S., Donchev, A. G., Robustelli, P., and Shaw, D. E. Wa-

1175

ter dispersion interactions strongly influence simulated structural properties of

1176

disordered protein states. The journal of physical chemistry B 119, 16 (2015),

1177

5113–5123. 44

ACCEPTED MANUSCRIPT

[149] Piana, S., Klepeis, J. L., and Shaw, D. E. Assessing the accuracy of

1179

physical models used in protein-folding simulations: quantitative evidence from

1180

long molecular dynamics simulations. Current opinion in structural biology 24

1181

(2014), 98–105.

1182

RI PT

1178

[150] Piana, S., Lindorff-Larsen, K., and Shaw, D. E. How robust are protein folding simulations with respect to force field parameterization?

1184

journal 100, 9 (2011), L47–L49.

1185

Biophysical

SC

1183

[151] Pitera, J. W., and Swope, W. Understanding folding and design: Replicaexchange simulations of“trp-cage”miniproteins.

1187

Academy of Sciences 100, 13 (2003), 7587–7592.

Proceedings of the National

M AN U

1186

1188

[152] Ploetz, E. A., Bentenitis, N., and Smith, P. E. Developing force fields

1189

from the microscopic structure of solutions. Fluid phase equilibria 290, 1-2 (2010),

1190

43–47.

1193

Advances in protein chemistry, vol. 66. Elsevier, 2003, pp. 27–85.

D

1192

[153] Ponder, J. W., and Case, D. A. Force fields for protein simulations. In

[154] Rauscher, S., Gapsys, V., Gajda, M. J., Zweckstetter, M.,

TE

1191

de Groot, B. L., and Grubmuller, H. Structural ensembles of intrinsically

1195

disordered proteins depend strongly on force field: a comparison to experiment.

1196

Journal of chemical theory and computation 11, 11 (2015), 5513–5524.

EP

1194

[155] Reif, M. M., Hunenberger, P. H., and Oostenbrink, C. New interaction

1198

parameters for charged amino acid side chains in the gromos force field. Journal

1199

1200 1201 1202

AC C

1197

of chemical theory and computation 8, 10 (2012), 3705–3723.

[156] Robustelli, P., Piana, S., and Shaw, D. E. Developing a molecular dynamics force field for both folded and disordered protein states. Proceedings of

the National Academy of Sciences (2018), 201800690.

1203

[157] Rodriguez, A., Mokoema, P., Corcho, F., Bisetty, K., and Perez,

1204

J. J. Computational study of the free energy landscape of the miniprotein cln025

1205

in explicit and implicit solvent. The Journal of Physical Chemistry B 115, 6

1206

(2011), 1440–1449.

45

ACCEPTED MANUSCRIPT

[158] Rosenman, D. J., Wang, C., and Garc´ıa, A. E. Characterization of aβ

1208

monomers through the convergence of ensemble properties among simulations

1209

with multiple force fields. The Journal of Physical Chemistry B 120, 2 (2015),

1210

259–277.

RI PT

1207

[159] Salomon-Ferrer, R., Gotz, A. W., Poole, D., Le Grand, S., and

1212

Walker, R. C. Routine microsecond molecular dynamics simulations with

1213

amber on gpus. 2. explicit solvent particle mesh ewald. Journal of chemical

1214

theory and computation 9, 9 (2013), 3878–3888.

SC

1211

[160] Satoh, D., Shimizu, K., Nakamura, S., and Terada, T. Folding free-

1216

energy landscape of a 10-residue mini-protein, chignolin. FEBS letters 580, 14

1217

(2006), 3422–3426.

1218

M AN U

1215

[161] Schmid, N., Eichenberger, A. P., Choutko, A., Riniker, S., Winger, M., Mark, A. E., and van Gunsteren, W. F. Definition and testing of

1220

the gromos force-field versions 54a7 and 54b7. European biophysics journal 40, 7

1221

(2011), 843.

1222

D

1219

[162] Serafeim, A.-P., Salamanos, G., Patapati, K. K., and Glykos, N. M. Sensitivity of folding molecular dynamics simulations to even minor force field

1224

changes. Journal of chemical information and modeling 56, 10 (2016), 2035–2041.

1225

[163] Shaw, D. E., Dror, R. O., Salmon, J. K., Grossman, J., Mackenzie,

1226

K. M., Bank, J. A., Young, C., Deneroff, M. M., Batson, B., Bowers,

1227

K. J., et al. Millisecond-scale molecular dynamics simulations on anton. In

1228

Proceedings of the conference on high performance computing networking, storage

EP

AC C

1229

TE

1223

and analysis (2009), ACM, p. 39.

1230

[164] Shaw, D. E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror,

1231

R. O., Eastwood, M. P., Bank, J. A., Jumper, J. M., Salmon, J. K.,

1232

Shan, Y., et al. Atomic-level characterization of the structural dynamics of

1233

proteins. Science 330, 6002 (2010), 341–346.

1234

[165] Shell, M. S., Ozkan, S. B., Voelz, V., Wu, G. A., and Dill, K. A.

1235

Blind test of physics-based prediction of protein structures. Biophysical journal

1236

96, 3 (2009), 917–924.

46

ACCEPTED MANUSCRIPT

1237

[166] Shi, Y., Xia, Z., Zhang, J., Best, R., Wu, C., Ponder, J. W., and Ren, P. Polarizable atomic multipole-based amoeba force field for proteins. Journal

1239

of chemical theory and computation 9, 9 (2013), 4046–4063.

1240 1241

RI PT

1238

[167] Shirts, M., and Pande, V. S. Screen savers of the world unite! Science 290, 5498 (2000), 1903–1904.

¨schweiler, R. Quantitative molecular ensem[168] Showalter, S. A., and Bru

1243

ble interpretation of nmr dipolar couplings without restraints. Journal of the

1244

American Chemical Society 129, 14 (2007), 4158–4159.

¨schweiler, R. [169] Showalter, S. A., Johnson, E., Rance, M., and Bru

M AN U

1245

SC

1242

1246

Toward quantitative interpretation of methyl side-chain dynamics from nmr by

1247

molecular dynamics simulations. Journal of the American Chemical Society 129,

1248

46 (2007), 14146–14147.

[170] Simmerling, C., Strockbine, B., and Roitberg, A. E. All-atom structure

1250

prediction and folding simulations of a stable protein. Journal of the American

1251

Chemical Society 124, 38 (2002), 11258–11259.

[171] Simmerling, C. L., and Elber, R. Computer determination of peptide con-

TE

1252

D

1249

formations in water: different roads to structure. Proceedings of the National

1254

Academy of Sciences 92, 8 (1995), 3190–3193.

EP

1253

[172] Simons, K. T., Kooperberg, C., Huang, E., and Baker, D. Assembly

1256

of protein tertiary structures from fragments with similar local sequences using

1257

simulated annealing and bayesian scoring functions1. Journal of molecular biology

1258

AC C

1255

268, 1 (1997), 209–225.

1259

[173] Siwy, C. M., Lockhart, C., and Klimov, D. K. Is the conformational

1260

ensemble of alzheimer???s aβ10-40 peptide force field dependent? PLoS compu-

1261

tational biology 13, 1 (2017), e1005314.

1262

[174] Smith, M. D., Rao, J. S., Segelken, E., and Cruz, L. Force-field induced

1263

bias in the structure of aβ21–30: A comparison of opls, amber, charmm, and

1264

gromos force fields. Journal of chemical information and modeling 55, 12 (2015),

1265

2587–2595.

47

ACCEPTED MANUSCRIPT

[175] Snow, C. D., Nguyen, H., Pande, V. S., and Gruebele, M. Absolute

1267

comparison of simulated and experimental protein-folding dynamics. nature 420,

1268

6911 (2002), 102.

RI PT

1266

1269

[176] Somavarapu, A. K., and Kepp, K. P. The dependence of amyloid-β dy-

1270

namics on protein force fields and water models. ChemPhysChem 16, 15 (2015),

1271

3278–3289.

1273

[177] Sorin, E. J., and Pande, V. S. Exploring the helix-coil transition via all-atom

SC

1272

equilibrium ensemble simulations. Biophysical journal 88, 4 (2005), 2472–2493. [178] Steinbach, P. J. Exploring peptide energy landscapes: a test of force fields

1275

and implicit solvent models. Proteins: Structure, Function, and Bioinformatics

1276

57, 4 (2004), 665–677.

1277

M AN U

1274

[179] Stirnemann, G., and Sterpone, F. Recovering protein thermal stability us-

1278

ing all-atom hamiltonian replica-exchange simulations in explicit solvent. Journal

1279

of chemical theory and computation 11, 12 (2015), 5573–5577. [180] Stone, J. E., Hardy, D. J., Ufimtsev, I. S., and Schulten, K. Gpu-

D

1280

accelerated molecular modeling coming of age. Journal of Molecular Graphics

1282

and Modelling 29, 2 (2010), 116–125.

1284

[181] Sugita, Y., and Okamoto, Y. Replica-exchange molecular dynamics method

EP

1283

TE

1281

for protein folding. Chemical physics letters 314, 1-2 (1999), 141–151. [182] Torrie, G. M., and Valleau, J. P. Nonphysical sampling distributions in

1286

monte carlo free-energy estimation: Umbrella sampling. Journal of Computa-

1287

AC C

1285

tional Physics 23, 2 (1977), 187–199.

1288

[183] Tran, L., and Ha-Duong, T. Exploring the alzheimer amyloid-β peptide

1289

conformational ensemble: A review of molecular dynamics approaches. Peptides

1290

69 (2015), 86–91.

1291

[184] Van Der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A. E.,

1292

and Berendsen, H. J. Gromacs: fast, flexible, and free. Journal of computa-

1293

tional chemistry 26, 16 (2005), 1701–1718.

48

ACCEPTED MANUSCRIPT

[185] van der Spoel, D., van Maaren, P. J., and Berendsen, H. J. A system-

1295

atic study of water models for molecular simulation: derivation of water models

1296

optimized for use with a reaction field. The Journal of chemical physics 108, 24

1297

(1998), 10220–10230.

1299 1300

1301

¨nenberger, [186] van Gunsteren, W. F., Billeter, S. R., Eising, A. A., Hu ¨ger, P., Mark, A. E., Scott, W. R., and Tironi, I. G. P. H., Kru Biomolecular simulation: the {GROMOS96} manual and user guide.

SC

1298

RI PT

1294

[187] Vanommeslaeghe, K., and MacKerell, A. Charmm additive and polarizable force fields for biophysics and computer-aided drug design. Biochimica et

1303

Biophysica Acta (BBA)-General Subjects 1850, 5 (2015), 861–871.

M AN U

1302

1304

[188] Vlieghe, P., Lisowski, V., Martinez, J., and Khrestchatisky, M. Syn-

1305

thetic therapeutic peptides: science and market. Drug discovery today 15, 1-2

1306

(2010), 40–56.

1309

and description. Reviews in Computational Chemistry (1999), 183–247.

D

1308

[189] Wallqvist, A., and Mountain, R. D. Molecular models of water: Derivation

[190] Wang, J., Cieplak, P., and Kollman, P. A. How well does a restrained

TE

1307

electrostatic potential (resp) model perform in calculating conformational ener-

1311

gies of organic and biological molecules? Journal of computational chemistry 21,

1312

12 (2000), 1049–1074.

EP

1310

[191] Wang, L.-P., Martinez, T. J., and Pande, V. S. Building force fields:

1314

an automatic, systematic, and reproducible approach. The journal of physical

1315

AC C

1313

chemistry letters 5, 11 (2014), 1885–1891.

1317

[192] Wang, L.-P., McKiernan, K. A., Gomes, J., Beauchamp, K. A., HeadGordon, T., Rice, J. E., Swope, W. C., Mart´ınez, T. J., and Pande,

1318

V. S. Building a more predictive protein force field: a systematic and repro-

1319

ducible route to amber-fb15. The Journal of Physical Chemistry B 121, 16 (2017),

1320

4023–4039.

1316

1321

[193] Wang, W., Ye, W., Jiang, C., Luo, R., and Chen, H.-F. New force field

1322

on modeling intrinsically disordered proteins. Chemical biology & drug design 84,

1323

3 (2014), 253–269. 49

ACCEPTED MANUSCRIPT

[194] Wei, C.-C., Ho, M.-H., Wang, W.-H., and Sun, Y.-C. Molecular dynamics

1325

simulation of folding of a short helical peptide with many charged residues. The

1326

Journal of Physical Chemistry B 109, 42 (2005), 19980–19986.

RI PT

1324

[195] Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C.,

1328

Alagona, G., Profeta, S., and Weiner, P. A new force field for molecular

1329

mechanical simulation of nucleic acids and proteins. Journal of the American

1330

Chemical Society 106, 3 (1984), 765–784.

SC

1327

[196] Weiner, S. J., Kollman, P. A., Nguyen, D. T., and Case, D. A. An

1332

all atom force field for simulations of proteins and nucleic acids. Journal of

1333

computational chemistry 7, 2 (1986), 230–252.

1334

M AN U

1331

[197] Wickstrom, L., Okur, A., and Simmerling, C. Evaluating the perfor-

1335

mance of the ff99sb force field based on nmr scalar coupling data. Biophysical

1336

journal 97, 3 (2009), 853–856.

1337

[198] Wickstrom, L., Okur, A., Song, K., Hornak, V., Raleigh, D. P., and Simmerling, C. L. The unfolded state of the villin headpiece helical subdo-

1339

main: computational studies of the role of locally stabilized structure. Journal

1340

of molecular biology 360, 5 (2006), 1094–1107.

TE

1341

D

1338

[199] Williams, S., Causgrove, T. P., Gilmanshin, R., Fang, K. S., Callender, R. H., Woodruff, W. H., and Dyer, R. B. Fast events in protein

1343

folding: helix melting and formation in a small peptide. Biochemistry 35, 3

1344

(1996), 691–697.

AC C

EP

1342

1345

[200] Wright, P. E., Dyson, H. J., and Lerner, R. A. Conformation of peptide

1346

fragments of proteins in aqueous solution: implications for initiation of protein

1347

folding. Biochemistry 27, 19 (1988), 7167–7175.

1348

[201] Yang, L., Tan, C.-h., Hsieh, M.-J., Wang, J., Duan, Y., Cieplak, P.,

1349

Caldwell, J., Kollman, P. A., and Luo, R. New-generation amber united-

1350

atom force field. The journal of physical chemistry B 110, 26 (2006), 13166–13176.

1351

[202] Yang, W. Y., and Gruebele, M. Folding at the speed limit. Nature 423,

1352

6936 (2003), 193.

50

ACCEPTED MANUSCRIPT

[203] Ye, W., Ji, D., Wang, W., Luo, R., and Chen, H.-F. Test and evaluation

1354

of ff99idps force field for intrinsically disordered proteins. Journal of chemical

1355

information and modeling 55, 5 (2015), 1021–1029.

RI PT

1353

1356

[204] Yoda, T., Sugita, Y., and Okamoto, Y. Comparisons of force fields for

1357

proteins by generalized-ensemble simulations. Chemical Physics Letters 386, 4-6

1358

(2004), 460–467.

[205] Zagrovic, B., Snow, C. D., Shirts, M. R., and Pande, V. S. Simulation

SC

1359

of folding of a small alpha-helical protein in atomistic detail using worldwide-

1361

distributed computing. Journal of molecular biology 323, 5 (2002), 927–937.

1362 1363

1364

M AN U

1360

[206] Zhang, C., and Ma, J. Enhanced sampling and applications in protein folding in explicit solvent. The Journal of chemical physics 132, 24 (2010), 244101. [207] Zhang, T., Nguyen, P. H., Nasica-Labouze, J., Mu, Y., and Der-

1365

reumaux, P. Folding atomistic proteins in explicit solvent using simulated

1366

tempering. The Journal of Physical Chemistry B 119, 23 (2015), 6941–6951.

1369

D

1368

[208] Zhao, H., and Caflisch, A. Molecular dynamics in drug design. European journal of medicinal chemistry 91 (2015), 4–14.

TE

1367

[209] Zhou, P., Wang, C., Ren, Y., Yang, C., and Tian, F. Computational peptidology: a new and promising approach to therapeutic peptide design. Current

1371

medicinal chemistry 20, 15 (2013), 1985–1996.

EP

1370

[210] Zhu, X., Lopes, P. E., and MacKerell Jr, A. D. Recent developments

1373

and applications of the charmm force fields. Wiley Interdisciplinary Reviews:

1374

AC C

1372

Computational Molecular Science 2, 1 (2012), 167–185.

51

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT