Ligand diffusion in proteins via enhanced sampling in molecular dynamics

Ligand diffusion in proteins via enhanced sampling in molecular dynamics

Accepted Manuscript Ligand diffusion in proteins via enhanced sampling in molecular dynamics J. Rydzewski, W. Nowak PII: DOI: Reference: S1571-0645...

2MB Sizes 0 Downloads 43 Views

Accepted Manuscript Ligand diffusion in proteins via enhanced sampling in molecular dynamics

J. Rydzewski, W. Nowak

PII: DOI: Reference:

S1571-0645(17)30054-4 http://dx.doi.org/10.1016/j.plrev.2017.03.003 PLREV 860

To appear in:

Physics of Life Reviews

Received date: Revised date: Accepted date:

31 March 2016 28 October 2016 28 March 2017

Please cite this article in press as: Rydzewski J, Nowak W. Ligand diffusion in proteins via enhanced sampling in molecular dynamics. Phys Life Rev (2017), http://dx.doi.org/10.1016/j.plrev.2017.03.003

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights • The process of ligand recognition by proteins is fundamental in the physics of living organisms. • The complex topology of protein channels and the transient nature of the ligand passage pose difficulties in the modeling of the ligand entry/escape pathways by canonical molecular dynamics simulations. • The need for identifying the ligand egress pathways and understanding how ligands migrate through protein tunnels has spurred the development of several methodological approaches.

Ligand diffusion in proteins via enhanced sampling in molecular dynamics J. Rydzewski1, ∗ and W. Nowak1, † 1 Institute

of Physics, Faculty of Physics,

Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toru´ n, Poland (Dated: March 31, 2017) Computational simulations in biophysics describe the dynamics and functions of biological macromolecules at the atomic level. Among motions particularly important for life are the transport processes in heterogeneous media. The process of ligand diffusion inside proteins is an example of a complex rare event that can be modeled using molecular dynamics simulations. The study of physical interactions between a ligand and its biological target is of paramount importance for the design of novel drugs and enzymes. Unfortunately, the process of ligand diffusion is difficult to study experimentally. The need for identifying the ligand egress pathways and understanding how ligands migrate through protein tunnels has spurred the development of several methodological approaches to this problem. The complex topology of protein channels and the transient nature of the ligand passage pose difficulties in the modeling of the ligand entry/escape pathways by canonical molecular dynamics simulations. In this review, we report a methodology involving a reconstruction of the ligand diffusion reaction coordinates and the free-energy profiles along these reaction coordinates using enhanced sampling of conformational space. We illustrate the above methods on several ligand-protein systems, including cytochromes and G-protein-coupled receptors. The methods are general and may be adopted to other transport processes in living matter.

2 I.

INTRODUCTION

Migration and transport processes are part of the basic characteristics of life. Active and passive transport are important at the molecular level. The complexity of these processes ranges from the straightforward passive diffusion of molecular oxygen from the pulmonary alveoli into the blood’s erythrocytes to the more complex transfer of genetic information via mRNA from DNA genes into ribosomes. Large specialized molecules, such as kinesin motors, are engaged in the transport of cargo,1 but often even small molecules, such as low-molecular-weight neurotransmitters which allow signal transduction in the brain2,3 are critical to organisms. The process of ligand recognition by proteins is undoubtedly fundamental in the physics of living organisms.4,5 In particular, it encompasses passing a biological signal via ligand binding to the active site of the protein.6,7 Nonetheless, before association between the protein and ligand occurs, the ligand usually enters the protein, and this involves diffusion on the protein surface8 and complex migration through channels, tunnels and cavities.9 This transport process is not trivial to describe since the protein atoms are in constant thermal motion and furthermore the gates of proteins are opening and closing. Numerous tunnels are transient and only certain fluctuations or rearrangements in the protein structure make the further travel of ligands possible. The ligand diffusion in enzymes is particularly interesting since the exit paths of enzymatic products are often different than the entry paths of substrates, as with protein synthesis reactions catalyzed by ribosomes. The location and topology of entrance/exit paths determine the functionality of enzymes and receptors. It is possible that numerous such paths arose over the course of evolution. In some proteins, such as HIV proteases, the binding site is relatively exposed on the surface, whereas in others, such as the cytochrome P450 family, it is buried deeply within the heterogeneous protein matrix. In the latter instance, the ingress and egress routes usually involve migration through the protein channels or cavities. Moreover, cytochrome P450 enzymes contain numerous channels speculated to exhibit multiple diffusion routes (each specific to a particular protein function and selective to the particular substrate), yet the physical principles underlying the channel preference are unknown.10 It is known that the response to an applied drug is not only associated with the topology of egress pathways, but also related to their binding kinetics and thermodynamics.11,12 Namely,

3 after the ligand reaches the docking pocket, it resides in its destination for a period of time, which is of pivotal importance in all regulatory processes in biology.13,14 The residence time is dependent on structural features of both the ligand and protein, therefore such aspects as substrate specificity, orientation and tunnel geometry cannot be neglected in quantifying the efficacy of the ligand-protein recognition.15,16 The process of ligand diffusion is challenging to study experimentally in the context of drugs. The methodologies used currently to quantify ligand diffusion, e.g., time-resolved crystallography,17 spectroscopy18,19 and xenon binding,20,21 focus primarily on gaseous species, providing indirect evidence for the migration of larger ligands and little knowledge about their binding stability, lifetime and specificity.22,23 The complexity of the ligand diffusion description increases considerably by including the dynamics of ligand migration through protein tunnels.24–28 In other words, ligand diffusion in the protein is characterized by multiple intermediate binding events connecting the docking site and the state in which the ligand and protein are dissociated. Because of that, finding the actual ligand exit routes of the protein is difficult to model computationally. We need to stress that a direct application of canonical molecular dynamics (MD) calculations to the problem of ligand diffusion is highly impractical. This computational issue is strictly related to the finite time of calculations. Particularly, equilibrium MD simulations suffer from infrequent ligand expulsions, since ligand diffusion is a rare event. In theory, the ligand diffusion paths can be reconstructed by running very long equilibrium MD simulations, in which the motion of a biological system under the influence of a specified force field is simulated by following molecular configurations in time, according to the equations of motion. The time evolution of 3N -dimensional atomic configuration x ≡ (xi |i = 1, . . . , 3N ) is given by the following first-order differential equation:

F (x) = −∇V (x)

(1)

where force F is the negative gradient of the potential energy V . Therefore, the molecular system evolves according to a dynamics leading to an equilibrium distribution P (x): P (x) = 

e−βV (x) d x e−βV (x)

(2)

4 where β = (kB T )−1 , kB is Boltzmann’s constant and T is temperature. Qualitatively, the sampled event is rare if P (x) is high in the disconnected regions of configuration space and separated by regions in which P (x) is marginal. In other words, transitions between these regions are rare (Fig. 1). Transitions through rare regions are theoretically possible if the system displays ergodicity.29 This means that over an infinite period of time, the time spent by the system in some region of configuration space is proportional to the volume of this region, thus, all accessible intermediate states are equiprobable in the infinite time limit. In practice, however, the length of the simulations may not allow one to sufficiently probe the configuration space. In order to facilitate the crossing of free-energy barriers by ligands and in turn to increase the probability of a rare event, numerous enhanced MD methods were applied, i.e., locally enhanced sampling (LES),30 steered MD (SMD),31 random acceleration MD32 (RAMD) and metadynamics.33,34 In this review, we focus on describing enhanced MD methods capable of sampling ligand diffusion pathways (i.e., reaction coordinate), and computing of thermodynamic and kinetic observables in collective variable (CV) space along a previously picked reaction coordinate (RC). These methodologies are illustrated on several ligand-protein systems, including Gprotein-coupled receptors and cytochromes.

II.

REACTION COORDINATES

Throughout this review we use a notion of RC which models the transition between the bound and unbound states. Therefore in this instance, the diffusion RC consists of the intermediate structures sampled to make a connection between these states. Molecular dynamics trajectories x(t) [defined parametrically for 0 ≤ t] are simulated on RCs. The search for the optimal ligand diffusion pathway is a very complex problem in proteins with irregular tunnels that connect ligand docking sites with solvent. There are several computational methods capable of finding such transport trajectories.

A.

Steered molecular dynamics

The simplest method to compute the ligand diffusion RCs is SMD.31 SMD in its original form is limited in modeling ligand egress paths, because the expulsion force is of the following

5

A

intermediates

B

FIG. 1. Schematic drawing of the protein-ligand dissociation from the bound state A to the dissociated states B, C or D through ligand binding intermediates (marked by •), where energetic or entropic barriers usually must be overcome. The optimum trajectory is the one which minimizes these barriers.

form:

1 f = − k∇ [vt − (r − r0 ) · n]2 2

(3)

where k is the spring constant, v is the constant pulling speed of a dummy atom connected with the ligand by the spring, r and r0 are the current and initial positions of the pulled ligand atom, and finally, n is the unit vector in the pulling direction of the dummy atom. The direction of the expulsion force acting on the ligand is kept constant for the whole simulation, so each trajectory x(t) is bound to the linear RC (Fig. 2). Such a coordinate must be assumed a priori, which severely limits the applicability of SMD in the sampling of the ligand diffusion pathways. As of now, several extensions of SMD have been proposed, expanding to some extent the applicability of acquiring linear RCs. For example, an interesting method for finding the optimal direction to pull the ligand from the binding pocket is called minimal steric hindrance (MSH).35 In MSH, a scoring function to find the optimal direction of pulling is defined as the steric hindrance which occurs during the ligand movement. It is important to notice that the optimal direction of pulling is calculated before running any SMD simulations, and once determined, is restrained to the linear RC. MSH assumes that the optimal pulling direction minimizes the steric hindrance of the receptor to the movement of the pulled ligand.35 The scoring function S for the given pulling direction is defined as the total weighted hindrance acting on each atom of the ligand:

6

FIG. 2. Steered molecular dynamics trajectory constrained to the linear RC. (a) Unbinding of the 3-quinuclidinyl benzilate ligand from the docking pocket of the M2 muscarinic receptor (b1 ) and from the subsequent part of the receptor to solvent (b2 ), and (b) its expulsion force depicting the ligand leaving the docking site at the first force maximum (b1 ) and migrating farther to solvent (b2 ).

S=

 Hδri √ ri i=1

(4)

where Hδri is the hindrance caused by the receptor due to the movement of the ith atom of the ligand in the pulling direction, and ri is the distance from the ith atom of the ligand in the direction of pulling.35 Another method that allows the inclusion of ligand-induced protein flexibility into protein tunnel prediction was proposed in an article by Kingsley and Lill.36 The authors implemented a method, called IterTunnel, which uses a geometric tunnel prediction method37 to first identify the ligand expulsion pathways leading out of the binding site of a static protein. Then, in order to access energetic information, the ligand is pulled along the predicted tunnels as RC via SMD. Once the ligand leaves the protein matrix, collected expulsion force is used to reconstruct free energy by the umbrella sampling method.38 The IterTunnel has been applied to locate the ligand expulsion pathways in cytochrome CYP2B6 and the main ligand-induced conformational changes.36 SMD has been used to probe the dissociation of the streptavidin-biotin complex,39 the

7 expulsion of retinoic acid from its receptor,31 the ingress/egress pathways of hurpezine A40 and E202041 from the binding gorge of acetylcholinesterase and the expulsion pathways of nitrile hydratase.42 In the context of ligand diffusion routes there is an interesting study by Martinez et al., in which the authors provided evidence in support of a lack of preference for any particular pathway in thyroid hormone receptors.43 This alternative ligand migration mechanism indicates that ligands, unaccompanied by the large-amplitude protein fluctuations, can simply diffuse through the protein surface to reach the binding pocket.

B.

Random acceleration molecular dynamics

In order to overcome the problem with the complex topology of channels and tunnels, and in turn to obtain nonlinear diffusion RCs, SMD was adapted to the geometry of the protein cavities by L¨ udemann et al. in RAMD.32 In this method, the ligand is pulled along a random RC; the direction of force is chosen stochastically. Despite the fact that RAMD is a random-walk technique, it has essential advantages over SMD: (1) it speeds up the timescale of dissociation kinetics by orders of magnitude and (2) it allows for finding alternative ligand diffusion routes. No initial prediction of the exit pathway is required. The RAMD algorithm follows a specific protocol: 1. The direction n (assuming n = 1) of the expulsion force acting on the center of mass of the ligand is applied randomly in such a manner that f = f n, where f is the constant magnitude of the randomly chosen force. 2. The force is maintained for a predefined number of steps, m. The ligand is expected to move with a velocity exceeding the threshold velocity given by v =

τ , mdt

where d t is

the time step and τ is the specified minimum distance before the direction is changed. Unless this condition is fulfilled (i.e., because of a steric collision), n is reassigned randomly. Selecting all these parameters for a specific ligand-protein complex is far from trivial. Vashisth and Abrams considered it a drawback of RAMD, for instance, f should not be too small, since the percentage of successfully sampled diffusion trajectories would be relatively low.44

8 Despite its popularity, RAMD has the following drawback: a trade-off between the artificial perturbation of the protein structure caused by ligand pulling and the low probability of ligand expulsions. Moreover, it is worth noting that RAMD introduces excessive artifacting in the sampling of the protein conformations,45 which leads to the rendering of non-physical tunnels.46 Because of that, numerous trajectories usually have to be computed to sample a diffusion RC.32,44,46 Nonetheless, it was recently shown that by introducing a non-Markovian variant of RAMD, sampling of diffusion RCs can be improved.45 Apart from the above computational issues, RAMD is still widely used in the probing of protein interiors.47–49 There are many studies in which RAMD has been proven useful to find a substrate’s ingress/egress routes, mainly in P450 cytochromes,21,32,50 which are a family of heme monooxygenases with enormous potential applications in the development of new drugs. For instance, it was identified that a majority of P450 cytochromes, including CYP101, CYP102 and CYP107A1, are characterized by one common ligand exit route (referred to as pw2a in the nomenclature taken from the Wade group papers).50 Sgrignani and Magistrato studied cytochrome P450 in a more realistic environment. Anchoring the human aromatase enzyme on the membrane helped to show that the membrane has a direct effect on the ligand diffusion RCs of the reactants determined by RAMD simulations.51 It was suggested that the local membrane environment stabilizes human aromatase and, hence, alters the substrate diffusion pathways when comparing to RAMD simulations without the membrane. RAMD was also applied to recover the inhibitor release pathways in histone deacetylases.49

C.

Locally enhanced sampling

Elber and Karplus proposed the LES method for simultaneous sampling of multiple conformations of carbon monoxide through myoglobin.52 The formal analysis of this method may be found in the paper by Zheng and Zheng.53 In LES, the sampling of the conformational space of a small subsystem (i.e., a ligand) is enhanced by cloning it into copies. The protein responds to the mean force from all the ligand copies. The copies of the ligand are invisible to one another, while each copy feels the force from the protein. During the simulation, the copies explore different regions of the protein’s conformational space which results in increased statistical sampling. The potential energy barriers in LES are reduced

9 according to the number of copies. Therefore, the conformational changes related to the system are more frequently observed in comparison with standard MD. Furthermore, the global potential energy minimum of the LES and MD systems occurs in the same point of the protein’s configuration space. This means that optimization of the LES system provides direct information about the original system.54 Furthermore, LES provides information about the copies’ positions in the protein matrix which can later be used to calculate the optimal diffusion pathways of the ligand. LES was applied to study the ligand diffusion paths,55–60 free-energy profiles61,62 and gaspacking effects in proteins.63 LES can also be used to initiate the sampling of ligand positions. For instance, LES was applied to probe the interior of myoglobin and the positions of copies, which approximated an egress route, were further optimized by using the self-penalty walk procedure.30 In a similar fashion, LES was used to generate the ligand distributions inside proteins which were then further optimized by memetic algorithms to optimal ligand pathways.45 One should notice that the current examples of O2 migration and escape in myoglobin64 and T-state deoxyhemoglobin65 involve using a variant of LES - temperaturecontrolled locally enhanced sampling (TLES).63

D.

Memetic algorithms

Suboptimal diffusion pathways can be reconstructed by minimizing a functional which describes the physical interactions between the protein and ligand. We recently proposed this methodology to implement a memetic algorithm (MA), which minimizes (as a functional) the interaction free energy Λ between the ligand and the protein on-the-fly during MD simulations.45 Namely, in MA, the position of the ligand is optimized with respect to Λ, starting from the initial docking position of the ligand, through subsequent ligand intermediates and ending at a state which corresponds to the dissociated ligand-protein complex (Fig. 1). The ligand is constrained to the optimal path by the SMD scheme. In MA, multiple optimal pulling directions are found to approximate a curvilinear RC. MA uses a set of ligand and protein positions x ≡ (xi |i = 0, . . . , K) consisting of the initial docking position of the ligand x0 and ligand intermediates x−0 , which altogether reconstruct the optimal diffusion pathway. The optimal direction of the pulling from xi−1 toward the next conformation xi is found by an iterative process of mimicking the evolution

10 of ith ligand decoys d ≡ (dk |k = 0, . . . , N ). The detailed algorithm for finding the optimal (in terms of the interaction free energy Λ) position of ith ligand-protein conformation xi is described below. 1. Initially, the N -size population of ligand decoys d is sampled inside a sphere of radius rs , positioned at the center of the ligand from xi−1 . 2. Λ of the protein and each decoy ligand is calculated using the Hammet linear freeenergy relation:66–68

Λ=



hij e



2 rij 2σ 2



(5)

i
where i and j indicate atom indices of the ligand and the protein, respectively. The height of the Gaussian is defined as hij = si vj + sj vi , where s and v are the partial volume and the solvation coefficient of the given atoms.69 The width of the Gaussian is given by σ and the distance between the atoms is denoted by r. Moreover, we added to Λ the van der Waals interactions and electrostatic interactions between the ligand and the protein, γ. 3. The ligand decoys d with the lowest Λ have a higher probability of being promoted to the population. Selection is performed via a roulette scheme. 4. Mating (i.e., swapping the corresponding components of the decoy position) and perturbation (i.e., adding a random number from the Cauchy distribution to a component of the decoy position) are performed on randomly chosen ligand decoys.45 5. An additional local search is applied to all ligand decoys to decrease Λ.45 6. Steps (2-5) are repeated until the ligand reaches solvent. Next, the ith optimal ligand position on the lowest interaction free-energy path is defined as xi = min dm . Then, SMD is used to pull the ligand from xi−1 in the direction of xi . The optimal diffusion RC is reconstructed after all the subsequent ligand intermediates x−0 are found. The final position of complex xK must correspond to the state in which the ligand and protein are dissociated. Thus, the ligand diffusion RC consists of all the optimal

11 ligand-protein conformations x found by MA. One should note that the functional Λ can be selected differently.

FIG. 3. Example of camphor diffusion pathways found by MA in CYP101: pw1, pw2 and pw3. Detailed analysis of occurring pathways can be found in Refs.45,70

MA was applied to reconstruct the ligand egress pathways in protein systems with increasingly complex tunnels: M2 muscarinic receptor, nitrile hydratase and CYP10145 (Fig. 3). Two 3-quinuclidinyl benzilate pathways emerged in M2 muscarinic receptor: pw1 and pw2. The shapes of trajectories clustered into pw1 resembled those studied in the paper by Kruse et al.71 wherein the authors studied 3-quinuclidinyl benzilate diffusion pathways in the M3 muscarinic G-protein-coupled receptor which has a similar GPCR structure. Long equilibrium MD simulations (nearly 25μs) were used to estimate the pathways for the 3quinuclidinyl benzilate association with the M3 receptor. MA recovered the gating mechanism through pw2 (a hydrophobic gate consisting of Leu65, Leu114, and Ile392), which was

12 consistent with the hypothesis that such a hydrophobic gate is connected with an activation of G-protein-coupled receptors.72 Also, the camphor expulsion pathways from CYP101 (Fig. 3) were in agreement with previous experimental results21,73,74 and RAMD calculations.32 Recently, another method based on genetic algorithms has been proposed to navigate the ligand pulling force.75 This variant was used to simulate dissociation of two ligand-protein families, HIV-1 protease (L63P) and tyrosine-protein phosphatase. The algorithm of Gu et al. chooses minimized expulsion force instead of interaction free energy.45 Results show that this method follows those dissociation pathways with lower expulsion forces. Further analysis indicated that expulsion forces of the complexes in the same protein family correlate with their binding free energies.

III.

COLLECTIVE VARIABLES

Reduced probability distribution is a useful concept in the analysis of the ligand diffusion process and, more generally, any rare event in statistical mechanics.76,77 Namely, instead of monitoring the full configuration space, only a set of CVs is observed. CV refers to any multidimensional function θ of 3N -dimensional atomic configuration x ≡ (xi |i = 1, . . . , 3N ). The functions θ1 (x), θ2 (x), . . . , θM (x) map configuration x onto an M -dimensional CV space z ≡ (zj |j = 1, . . . , M ), where M  3N . Unfortunately, the probability distribution and the free energy corresponding to the selected CVs give useful indications on the nature of the studied process if and only if the CVs are properly chosen. There is no general scheme which indicates how to make such a selection. In particular, we are interested in carefully selecting CVs on the previously reconstructed ligand diffusion RC. CVs are essential for the effective simulation and analysis of the ligand diffusion process. There are some CVs that are widely-used in studying the binding/unbinding of two molecules in SMD. For instance, the center-of-mass distance between those two molecules or the number of contacts is particularly useful for monitoring transitions. Cartesian coordinates of mass centers of atom groups are also used as CVs. In a recent paper, it was shown that the potential energy can also be used effectively as a CV in studying the binding of RNA molecules and charged peptides with atomistic, explicit-solvent molecular dynamics.78 According to Abrams and Bussi, accounting for the entropy of the process is necessary for the successful use of CVs.79

13 Extending our suggestions for choosing CVs in SMD, we list CVs which can be applied preferably on nonlinear diffusion RCs (from e.g., MA, LES). For instance, the interaction free energy Λ or any free-energy term that can be derived by the decomposition of Λ (Eq. 5), is suitable for the monitoring of the ligand diffusion RC. Another example of CV is work  performed during ligand diffusion, W = d x · f , where f is the expulsion force applied to the ligand on the diffusion trajectory x(t).45 Additionally, CVs may be built on a set of frames from the ligand diffusion path. So-called path CVs (PCVs) can be used to construct a continuous interpolation of the diffusion pathway. Thus, given the trajectory that connects two metastable states, PCVs represent the progression along the trajectory and the distance from it. The trajectory can be traced by introducing the following two variables:80–83 1

d tt e−λθ(x)−θ(x(t))

0

d t e−λθ(x)−θ(x(t))2

s(x) = lim  1 λ→∞

2

0

(6)

and

z(x) = − lim λ λ→∞

−1



1

d t e−λθ(x)−θ(x(t))

2

(7)

0

where  •  is a metric that defines the distance between two configurations and λ is proportional to the average mean-square distance between neighboring structures. The z variable is particularly useful when dealing with multiple trajectories along the ligand exit pathway or even different exit pathways. PCV reduces high-dimensional configuration space to one- or two-dimensional space84 and allows for enhanced sampling and free-energy reconstruction along the diffusion pathways (Fig. 4). The basic motivation behind the PCV concept is that the sampled transition is very rarely accessible in limited time scales, because it is associated with the crossing of a free-energy bottleneck. Frequently, observing a particular phenomenon of interest is found to be impossible because short-time events corrupt the quantitative observation of events occurring on longer timescales.85 Therefore, one can use several algorithms to reduce the dimensionality of CV space, thus prioritizing information indispensable in observing this process. The representation of the process is simplified by providing a map of reduced dimensionality that spans the configuration space of ligand diffusion. By preserving a metric, dimensionality reduction algorithms construct a representation of a low-dimensional configuration space from a set of

14

A

bound state

B

F

dissociated state

z

s

FIG. 4. Ligand diffusion trajectories that start in the bound state A and end in the dissociated state B are restrained by the free-energy landscape F (s, z). The trajectories (in black) are distinguished by the s value which describes the progress, and the distance z from the trajectory.

data points distributed in a high-dimensional CV configuration space. These methods rely on the assumption that physically relevant molecular configurations exist in low-dimensional space that is embedded in high-dimensional CV space.86 An example of such a method is T-distributed stochastic neighbor embedding (t-SNE).87 This machine-learning technique is particularly useful in embedding high-dimensional data into a configuration space of 2 dimensions.87 Mathematically, t-SNE reduces the dimensionality of space by representing similar objects through nearby points and dissimilar objects by distant points in a low-dimensional space. First, a probability distribution p over pairs of objects in high-dimensional CV space z is constructed, ensuring that similar and dissimilar objects have a high probability and low probability, respectively. The probability pij of the CV objects zi and zj is defined by:

pij =

pi|j + pj|i 2N

(8)

and

pi|j =

e

−zi −zj 2 2σ 2 i

 k=i

e

−zi −zk 2 2σ 2 i

(9)

where N is the number of objects in the CV space and σ is the width of Gaussian. Next, the probability distribution q is defined in the low-dimensional space. The probability of the low-dimensional space objects yi and yj is the following:

15

−1

(1 + yi − yj 2 ) qij =  2 −1 k=i (1 + yi − yk  )

(10)

Then, t-SNE minimizes the non-symmetric Kullback-Leibler divergence ∇KL (p|q) between p and q with respect to the locations of the points in CV space:

∇KL (p|q) =

 i=j

 pij log

pij qij

 (11)

The t-SNE method has been used to analyze high-dimensional biological data.88–91 In the ligand diffusion problem, t-SNE was applied to reduce the dimensionality of the dissociation of the CYP101-camphor system.70 Dimensionality reduction of the ligand diffusion pathways incorporates the configuration space of both the ligand and protein, because their flexibility often impacts diffusion routes via, for instance, gating mechanisms.70 Another recently developed method, commonly used to provide low-dimensional CV space is called the sketch-map. It embeds coordinates from the high-dimensional space of molecular dynamics simulations by preserving the middle-range root-mean-square distances between conformations.92 It was shown by data collected from MD simulations that the most interesting events occur in these intermediate distances. Because of that, CVs used to represent high-dimensional space should consist of root-mean-square distances in that range.92 The sketch-map method is a variant of multidimensional scaling, in which the distances in both high- and low-dimensional spaces are transformed by a sigmoidal function.93 The sketch-map was applied to MD simulations data, especially in locating the binding poses in ligand-protein complexes94 and probing the conformations of small molecules,85,92,93 and can easily be used in enhanced MD simulations to simplify the CV space along diffusion RCs. After CVs are carefully chosen along the diffusion RCs, it is possible to obtain thermodynamic observables (i.e., free energy) by either guiding the ligand diffusion along CVs or biasing CVs to sample the ligand transition from the docking pocket to solvent. Some methods employing these variants are described in the following section.

16 IV.

FREE-ENERGY RECONSTRUCTION

Here, we assume that the ligand diffusion RC was earlier reconstructed (cf. II. Reaction coordinates) and CVs were picked according to the given instructions (cf. III. Collective variables). Given the previous definition of CVs at equilibrium, the probability of observing the system at z in CV configuration space is: P (z) = δ[θ(x) − z]

(12)

The Dirac delta function δ[•] is used to select only those configurations of θ(x) which are infinitely close to z. Here, • denotes an average over the equilibrium probability distribution of x. P (z) leads to free energy:33,34 F (z) = −β −1 ln P (z)

(13)

where β is the inverse temperature. If the CV space has been chosen wisely then two well-separated regions of configuration space will define the two most important allosteric states of a given complex (i.e., bound and occluded states of the ligand-protein complex), one can perform free-energy calculations to estimate the change in free energy required to make the conformational transition.

A.

Jarzynski equality

The equilibrium free-energy difference ΔF can be estimated from work W performed during a non-equilibrium process using the Jarzynski equality:95 

e−βW = e−βΔF

(14)

where angular brackets denote an ensemble average over realizations of the nonequilibrium process. This result has been derived in various ways,96–99 and has been confirmed experimentally.100 For a sufficiently small variation of work, the central limit theorem states that the distribution of work is approximately Gaussian. Then, Eq. 14 takes the form:95 ΔF = W −

β 2 σ 2 W

(15)

17 2 is the variance of the work distribution. where σW

The Jarzynski equality is particularly useful in the recovering of free-energy profiles during enhanced simulations of ligand diffusion like SMD, RAMD and MA. In such instances, the pulled ligand is linked to a dummy atom by a spring with stiffness k. Then, the dummy atom is pulled with the constant velocity v along CVs. The distance that the dummy atom travels from the beginning of the simulation is given by d. Stiffness k should be large enough to ensure the stiff spring approximation.101–103 Thus, the potential of mean force Φ(λ) along the pathway is given by: 1 Φ(λ) = ΔF (λ) + 2k



dΔF dλ

2

1 d2 ΔF − +O 2βk dλ2



1 k2

 (16)

When the stiff spring approximation holds, the potential of mean force is given by the first term of the expansion Φ(λ) ≈ ΔF .101,102 Moreover, this assumption implies that the nonequilibrium work from SMD-like simulations is the following: 

t

W = −kv

d t(θ(x) − d).

(17)

0

Relations between the velocity of pulling and the accuracy of the free-energy difference estimates are discussed in a paper by Perivisic and Lu.104 The Jarzynski equality has been successfully applied in estimating ΔF when investigating the unbinding of acetylcholine from the ligand-binding domain of human alpha7 nicotinic acetylcholine receptor along predetermined pathways,105 toxins binding to potassium channels,106 ligand migration in truncated hemoglobin107 and the selectivity of estrogen receptors.108

B.

Umbrella sampling

Another sampling method applicable in ligand recognition studies is umbrella sampling, introduced by Torrie and Valleau.38 To ensure efficient equilibrium sampling along previously found diffusion pathways, an additional biasing potential is appended to the simulated system’s Hamiltonian. Typically, the biasing potential is chosen as a simple harmonic function dependent on the parameter χ and a reference point zi in CV space:

18

χ (θ(x) − zi )2 (18) 2 In umbrella sampling, a simulation is partitioned into windows of overlapping umbrella δVi (x) =

potentials that can be calculated simultaneously (Fig. 5). There have been several reviews that cover the mathematical foundations of umbrella sampling.34,109,110 Umbrella sampling was used to reconstruct the free-energy profiles on the ligand egress or ingress pathways,111 for instance, the entrance/exit pathway for a z-pro-prolinal inhibitor in prolyl oligopeptidase,112 a ligand-induced transition in adenulate kinase,113 a free-energy profile along KcsA potassium channel114 and ligand diffusion through leghemoglobin.115

FIG. 5. Schematic example of umbrella sampling simulations. The trajectory (blue) evolves in collective variable space (CV) and overcomes the free-energy barrier which results in a transition between the bound and dissociated ligand-protein system. The trajectory is later used to localize umbrella potentials δVi (x) at the reference points zi in CV space, to recover the equilibrium probability distribution.

C.

Metadynamics

Metadynamics was developed by Laio and Parrinello in 2002 to improve the sampling of rare conformational events and to reconstruct the free-energy landscape of molecular systems studied in CV space.116 This method has its roots in conformational flooding and adaptive biasing force methods.117 Metadynamics is also related to molecular dynamics simulations in the multicanonical ensemble, like the Wang-Landau sampling.118

19 The time-dependent bias potential V (θ(x), t), which is composed of a sum of Gaussians deposited in the visited CV space used in metadynamics, has an effect on the dynamics of configurational variables θ(x). It is given by the following equation:

V (θ(x), t) =



1



h(t ) e− 2w2 θ(x(t))−θ(x(t ))

2

(19)

t
where h(t ) is the height of Gaussians. Here, the previous time points t = τ, 2τ, . . . are defined by the time interval τ between successive Gaussians deposits, and w is the Gaussian width. It was proven analytically that the bias potential V (θ(x), t) provides an optimal estimation of the underlying free-energy landscape:33,34,119

lim V (θ(x), t) = −F (z) + C

t→∞

(20)

where C is an irrelevant additive constant. What has been found particularly helpful is well-tempered metadynamics,82 in which V (θ(x), t), the estimation of the free-energy profile at time t is given using F (z, t) = − T +ΔT ΔT where T is the temperature of the simulation. The difference between the temperature of the CV and the temperature of the simulation is denoted by ΔT . This variant of metadynamics ensures that the bias converges more smoothly. For a comprehensive review of estimating errors and the implementation of metadynamics, we refer readers to a work by Laio and Gervasio.120 Several variants of metadynamics have been applied to sample ligand diffusion in various systems,121 yielding free-energy estimates compatible with experimental results. For instance, it was used in the study of ligand-induced modulation in G-protein-coupled receptors,122 the prediction on the binding free energies along the dissociation path of p38 inhibitors123 and CDK2 inhibitors,124 and to discover a potential tunnel relevant for ligand escape in the L99A cavity mutant of T4 lysozyme.125 The expulsion pathways of galactose from a sodium-galactose transporter were also calculated with the use of metadynamics.126 Very recently, metadynamics has been combined with a geometric method in the probing of a protein matrix to reconstruct the ligand ingress/egress in CYP3A4.127

20 V.

PROGRAMS AND PACKAGES

Finally, we list programs and packages that could be useful in modeling ligand diffusion by means of enhanced MD simulations and further analysis. A wide variety of MD codes have already implemented the capability of reconstructing ligand diffusion RCs. Especially methods like SMD and RAMD are accessible within the NAMD,128 Gromacs129 and Amber130 distributions, either through built-in mechanisms or external plugins. On the other hand, codes for algorithms like MA and variants of SMD fitted to solve nonlinear RCs are not published yet, despite their comprehensive theoretical analysis in published articles and reviews. We expect that the tendency toward bundling such algorithms into accessible MD code distributions will become more popular in the near future, facing the expansion of the physical analysis of rare events. Once ligand diffusion RCs are computed, free energy can be retrieved within a carefully selected CV configuration space. It is worth noting that the high-dimensional configuration space of CVs should preferably be further reduced to simplify analysis. Machine-learning based nonlinear dimensionality methods designed for this purpose are the sketch-map (available at epfl-cosmo.github.io/sketchmap) and t-SNE, accessible from the scikit-learn package (scikit-learn.org). After CVs are selected, a free-energy hypersurface can be reconstructed though using well-tempered metadynamics, umbrella sampling or the Jarzynski equality. All these methods are implemented in the Plumed package,131,132 patchable almost to every MD code. For the analysis of the results, we suggest the MDAnalysis package (mdanalysis.org).

VI.

CONCLUDING REMARKS

Small molecules are as important for life processes as large biopolymers. To support the metabolism of cells, small molecules interact with proteins and become ligands. In general, the interaction between ligands and their targets triggers biologically relevant activity at the molecular level, for instance, a signaling cascade or the regulation of homeostasis.4 The role of the ligand is fulfilled when it reaches an appropriate active site on the receptor surface or within the heterogeneous protein interior. The pathways of ligand transport within proteins can be discovered computationally. The community of life scientists is eager to know where

21 each particular ligand travels and what the timescale of this transport is. These life processes, especially at the molecular level, do not occur in the space occupied by atoms, intensively studied by X-ray crystallographers or NMR specialists, but in the space unoccupied by properly organized atomic structures. In this review we presented the current methods for the effective exploration of such unoccupied spaces within the dynamical models of proteins, which are large enough to accommodate ligands. These methods lead to plausible ligand diffusion pathways within the highly heterogeneous protein matrix. The progress in this field has been extensive during the past 20 years (i.e., since MD simulations matured). Unfortunately, a simulation’s computational time is still substantial while using all-atom canonical MD, especially if one wants to collect sufficient statistics. However, sufficient and inexpensive sampling of the tremendously large conformational space of ligandprotein systems is perhaps possible only by using enhanced MD techniques, e.g., SMD,31 LES,52 TLES,63 RAMD,32 MA,45,70 metadynamics116 etc. The simulations of ligand diffusion pathways allow us to gain a better understanding of medicinal chemistry via systems like G-protein-coupled receptors,45,71 cytochromes,21,32,45,48,50,58,70,73,127 acetylcholinesterases,40,41 thyroid hormone receptors,43 CDK2 inhibitors,124 lysozymes,125 globins56,59,60 etc. One of the main obstacles in turning ligand diffusion studies into a routine engineering procedure is the prediction of reaction coordinates. Most of our current knowledge on molecular recognition implies that ligand diffusion can be significantly more complex than a two-state process, thus, raising some questions about the computing of RCs and asserting the underlying free energy. In this review, we carefully addressed these concerns by providing up-to-date protocols to model ligand diffusion. These suggestions include the calculating of ligand diffusion RCs in systems ranging from trivial instances (in which ligand pathways are linear) to more complex situations with systems without exposed binding sites and multiple nonlinear diffusion pathways. Next, we enlist preferable CVs used to quantify ligand diffusion RCs and suggest how to simplify the representation of high-dimensional CV space, if more than 2 CVs are required to recover the quantitative description of the studied process. Consequently, we explain how to bias these CVs to reconstruct the free energy of ligand diffusion by using umbrella sampling, Jarzynski equality or well-tempered metadynamics to converge efficiently to a smooth free-energy hypersurface.34,79,133 It may perhaps be observed without losing our primary focus that Markov state models (MSMs) can be successfully used to estimate kinetic quantities. MSMs allow the compu-

22 tation of kinetically important states by classifying MD data into different simulation time scales. There are many articles describing the mathematical foundations of MSMs,134,135 statistical analysis136,137 and its sampling error.138 Recent applications of MSMs include a reconstruction of complex ligand-binding kinetics, the absolute binding free-energy estimates during ligand diffusion and the mechanisms of ligand-protein association and how this process is affected by protein mutations. It is worth noting that milestoning,139 while not discussed in this review, offers an alternative option for assertion of long timescale processes. Unfortunately, there are few applications of this method in the modeling of ligand diffusion.140,141 We state that the information concerning ligand diffusion paths will help to elucidate receptor selectivity, signal transduction and enzymatic activity. The knowledge about paths presented in this review should be helpful in the rational design of inhibitors, uncovering the new functions of known proteins via biotechnology, preparing enzymes capable of the stereo-selective synthesis of organic compounds etc. We expect that progress in the physical modeling of ligand diffusion will bring benefits for biology. It is tempting to speculate that in the near future a computational ”black box” – an automatic protein ”pathfinder” might be created. The ”black box” might be used to identify all physical pathways in protein structures discovered so far. Such paths might be collected in an appropriate database and linked, for instance, with the Protein Data Bank. Obviously, a particular path would depend on a tested ligand, but a certain committee might select generic ligands to use as a testbed. Non-expert biologists and chemists would welcome an efficient web server to calculate plausible paths for any ligand-protein system. Unfortunately, this machinery is still far ahead of current capabilities. We would like to indicate that the methods described in this review are general. The problem of object motion in a heterogeneous and dynamic environment is common, especially in biology. One such instance is the trafficking of globular proteins (e.g., hemoglobin, albumin, transthyretin) within the crowded interior of a cell. The migration of a cell during tissue growth or its diffusion through a porous medium (e.g., bone) is another example. The delivery of drugs encapsulated in nano-containers or liposomes also requires similar transport through a complex medium. We hope that our search for ligand diffusion will be repaid in the accurate modeling of various aspects of life.

23 ACKNOWLEDGMENTS

JR acknowledges funding from National Science Centre, Poland (grants 2015/19/N/ST3/02171 and 2016/20/T/ST3/00488) and NCU, Poland (grants 2406-F, 2539-F). The computational results used in this review were obtained using Interdisciplinary Centre for Modern Technologies, NCU.



To whom the correspondence should be addressed: jr@fizyka.umk.pl



To whom the correspondence should be addressed: wiesiek@fizyka.umk.pl

1

Nobutaka Hirokawa, Yasuko Noda, Yosuke Tanaka, and Shinsuke Niwa, “Kinesin superfamily motor proteins and intracellular transport,” Nat. Rev. Mol. Cell Biol. 10, 682–696 (2009).

2

Ege T Kavalali, “The mechanisms and functions of spontaneous neurotransmitter release,” Nat. Rev. Neurosci. 16, 5–16 (2015).

3

Nancy E Hynes, Philip W Ingham, Wendell A Lim, Christopher J Marshall, Joan Massagu´e, and Tony Pawson, “Signalling change: signal transduction through the decades,” Nat. Rev. Mol. Cell Biol. 14, 393–398 (2013).

4

Upinder S Bhalla and Ravi Iyengar, “Emergent properties of networks of biological signaling pathways,” Science 283, 381–387 (1999).

5

Riccardo Baron and J Andrew McCammon, “Molecular recognition and ligand association,” Annu. Rev. Phys. Chem. 64, 151–175 (2013).

6

Zhiqiang Yan and Jin Wang, “Specificity quantification of biomolecular recognition and its implication for drug discovery,” Sci. Rep. 2 (2012).

7

David C Swinney, “The role of binding kinetics in therapeutically useful drug action,” Curr. Opin. Drug Discov. Devel. 12, 31–39 (2009).

8

Dmitry Nerukh, Noriaki Okimoto, Atsushi Suenaga, and Makoto Taiji, “Ligand diffusion on protein surface observed in molecular dynamics simulation,” J. Phys. Chem. Lett. 3, 3476–3479 (2012).

9

Artur Gora, Jan Brezovsky, and Jiri Damborsky, “Gates of enzymes,” Chem. Rev. 113, 5871– 5923 (2013).

10

Maximilian CCJC Ebert, Simon L D¨ urr, Armande Ang Houle, Guillaume Lamoureux, and

24 Joelle N Pelletier, “The evolution of p450 monooxygenases towards formation of transient channels and exclusion of non-productive gases,” ACS Catal. 6, 7426–7437 (2016). 11

Rumin Zhang and Frederick Monsma, “Binding kinetics and mechanism of action: toward the discovery and development of better and best in class drugs,” Expert Opin. Drug Discov. 5, 1023–1029 (2010).

12

David C Swinney, “Biochemical mechanisms of drug action: What does it take for success?” Nat. Rev. Drug Discov. 3, 801–808 (2004).

13

Fang Bai, Yechun Xu, Jing Chen, Qiufeng Liu, Junfeng Gu, Xicheng Wang, Jianpeng Ma, Honglin Li, Jos´e N Onuchic, and Hualiang Jiang, “Free energy landscape for the binding process of huperzine a to acetylcholinesterase,” Proc. Natl. Acad. Sci. USA 110, 4273–4278 (2013).

14

Robert A Copeland, David L Pompliano, and Thomas D Meek, “Drug–target residence time and its implications for lead optimization,” Nat. Rev. Drug Discov. 5, 730–739 (2006).

15

David L Mobley and Ken A Dill, “Binding of small-molecule ligands to proteins: ”what you see” is not always ”what you get”,” Structure 17, 489–498 (2009).

16

Ron Elber, “Ligand diffusion in globins: simulations versus experiment,” Curr. Opin. Struc. Biol. 20, 162–167 (2010).

17

Zhong Ren, Shin-ichi Adachi, Wilfried Schildkamp, Dominique Bourgeois, Michael Wulff, and Keith Moffat, “Photolysis of the carbon monoxide complex of myoglobin: Nanosecond timeresolved crystallography,” Science 274, 1726–1729 (1996).

18

Alisa Rupenyan, Jan Commandeur, and Marie Louise Groot, “Co photodissociation dynamics in cytochrome p450bm3 studied by subpicosecond visible and mid-infrared spectroscopy,” Biochemistry 48, 6104–6110 (2009).

19

Seongheun Kim and Manho Lim, “Protein conformation-induced modulation of ligand binding kinetics: a femtosecond mid-ir study of nitric oxide binding trajectories in myoglobin,” J. Am. Chem. Soc. 127, 8908–8909 (2005).

20

Catherine Tetreau, Liliane Mouawad, Samuel Murail, Patricia Duchambon, Yves Blouquit, and Daniel Lavalette, “Disentangling ligand migration and heme pocket relaxation in cytochrome p450cam,” Biophys. J. 88, 1250–1263 (2005).

21

Rebecca C Wade, Peter J Winn, Ilme Schlichting, et al., “A survey of active site access channels in cytochromes p450,” J. Inorg. Biochem. 98, 1175–1182 (2004).

25 22

Antonia Stank, Daria B Kokh, Jonathan C Fuller, and Rebecca C Wade, “Protein binding pocket dynamics,” Acc. Chem. Res. 49, 809–815 (2016).

23

Jinzen Ikebe, Koji Umezawa, and Junichi Higo, “Enhanced sampling simulations to construct free-energy landscape of protein–partner substrate interaction,” Biophys. Rev. 8, 45–62 (2016).

24

Xinyi Huang, Hazel M Holden, and Frank M Raushel, “Channeling of substrates and intermediates in enzyme-catalyzed reactions,” Annu. Rev. Biochem. 70, 149–180 (2001).

25

Martina Pavlova, Martin Klvana, Zbynek Prokop, Radka Chaloupkova, Pavel Banas, Michal Otyepka, Rebecca C Wade, Masataka Tsuda, Yuji Nagata, and Jiri Damborsky, “Redesigning dehalogenase access tunnels as a strategy for degrading an anthropogenic substrate,” Nat. Chem. Biol. 5, 727–733 (2009).

26

Frank M Raushel, James B Thoden, and Hazel M Holden, “Enzymes with molecular tunnels,” Acc. Chem. Res. 36, 539–548 (2003).

27

Carolina Estarellas Martin, Constant´ı Seira Castan, F Javier Luque Garriga, and Axel BidonChanal Badia, “Understanding the kinetics of ligand binding to globins with molecular dynamics simulations: the necessity of multiple state models,” Drug Discov. Today 17, 22–27 (2015).

28

Pierre-Pol Liebgott, Fanny Leroux, B´en´edicte Burlat, S´ebastien Dementin, Carole Baffert, Thomas Lautier, Vincent Fourmond, Pierre Ceccaldi, Christine Cavazza, Isabelle MeynialSalles, et al., “Relating diffusion along the substrate tunnel and oxygen sensitivity in hydrogenase,” Nat. Chem. Biol. 6, 63–70 (2010).

29

David Chandler, “Introduction to modern statistical mechanics,” Introduction to Modern Statistical Mechanics, by David Chandler, pp. 288. Foreword by David Chandler. Oxford University Press, Sep 1987. ISBN-10: 0195042778. ISBN-13: 9780195042771 1 (1987).

30

Wieslaw Nowak, Ryszard Czerminski, and Ron Elber, “Reaction path study of ligand diffusion in proteins: application of the self penalty walk (spw) method to calculate reaction coordinates for the motion of co through leghemoglobin,” J. Am. Chem. Soc. 113, 5627–5637 (1991).

31

Dorina Kosztin, Sergei Izrailev, and Klaus Schulten, “Unbinding of retinoic acid from its receptor studied by steered molecular dynamics,” Biophys. J. 76, 188–197 (1999).

32

Susanna K L¨ udemann, Val`ere Lounnas, and Rebecca C Wade, “How do substrates enter and products exit the buried active site of cytochrome p450cam? 1. random expulsion molecular dynamics investigation of ligand access channels and mechanisms,” J. Mol. Biol. 303, 797–811

26 (2000). 33

Alessandro Laio and Francesco L Gervasio, “Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science,” Rep. Prog. Phys. 71, 126601 (2008).

34

Giovanni Bussi and Davide Branduardi, “Free-energy calculations with metadynamics: theory and practice,” Rev. Comp. Ch. , 1–49 (2015).

35

Quan Van Vuong, Tin Trung Nguyen, and Mai Suan Li, “A new method for navigating optimal direction for pulling ligand from binding pocket: application to ranking binding affinity by steered molecular dynamics,” J. Chem. Inf. Model. 55, 2731–2738 (2015).

36

Laura J Kingsley and Markus A Lill, “Including ligand-induced protein flexibility into protein tunnel prediction,” J. Comput. Chem. 35, 1748–1756 (2014).

37

Eitan Yaffe, Dan Fishelovitch, Haim J Wolfson, Dan Halperin, and Ruth Nussinov, “Molaxis: efficient and accurate identification of channels in macromolecules,” Proteins 73, 72–86 (2008).

38

Glenn M Torrie and John P Valleau, “Nonphysical sampling distributions in monte carlo freeenergy estimation: Umbrella sampling,” J. Comput. Phys. 23, 187–199 (1977).

39

Helmut Grubm¨ uller, Berthold Heymann, and Paul Tavan, “Ligand binding: molecular mechanics calculation of the streptavidin-biotin rupture force,” Science 271, 997–999 (1996).

40

Yechun Xu, Jianhua Shen, Xiaomin Luo, Israel Silman, Joel L Sussman, Kaixian Chen, and Hualiang Jiang, “How does huperzine a enter and leave the binding gorge of acetylcholinesterase? steered molecular dynamics simulations,” J. Am. Chem. Soc. 125, 11340–11349 (2003).

41

Chunying Niu, Yechun Xu, Yong Xu, Xiaomin Luo, Wenhu Duan, Israel Silman, Joel L Sussman, Weiliang Zhu, Kaixian Chen, Jianhua Shen, et al., “Dynamic mechanism of e2020 binding to acetylcholinesterase: a steered molecular dynamics simulation,” J. Phys. Chem. B 109, 23730–23738 (2005).

42

L Peplowski, K Kubiak, and W Nowak, “Mechanical aspects of nitrile hydratase enzymatic activity. steered molecular dynamics simulations of pseudonocardia thermophila jcm 3095,” Chem. Phys. Lett. 467, 144–149 (2008).

43

Leandro Martinez, Igor Polikarpov, and Munir S Skaf, “Only subtle protein conformational adaptations are required for ligand binding to thyroid hormone receptors: Simulations using a novel multipoint steered molecular dynamics approach,” J. Phys. Chem. B 112, 10741–10751

27 (2008). 44

Harish Vashisth and Cameron F Abrams, “Ligand escape pathways and (un)binding free energy calculations for the hexameric insulin-phenol complex,” Biophys. J. 95, 4193–4204 (2008).

45

Jakub Rydzewski and Wieslaw Nowak, “Memetic algorithms for ligand expulsion from protein cavities,” J. Chem. Phys. 143, 124101 (2015).

46

Laura J Kingsley and Markus A Lill, “Substrate tunnels in enzymes: Structure–function relationships and computational methodology,” Proteins 83, 599–611 (2015).

47

Martin Klvana, Martina Pavlova, Tana Koudelakova, Radka Chaloupkova, Pavel Dvorak, Zbynek Prokop, Alena Stsiapanava, Michal Kuty, Ivana Kuta-Smatanova, Jan Dohnalek, et al., “Pathways and mechanisms for product release in the engineered haloalkane dehalogenases explored using classical and random acceleration molecular dynamics simulations,” J. Mol. Biol. 392, 1339–1356 (2009).

48

Weihua Li, Jie Shen, Guixia Liu, Yun Tang, and Tyuji Hoshino, “Exploring coumarin egress channels in human cytochrome p450 2a6 by random acceleration and steered molecular dynamics simulations,” Proteins 79, 271–281 (2011).

49

Subha Kalyaanamoorthy and Yi-Ping Phoebe Chen, “Exploring inhibitor release pathways in histone deacetylases using random acceleration molecular dynamics simulations,” J. Chem. Inf. Model. 52, 589–603 (2012).

50

Peter J Winn, Susanna K L¨ udemann, Ralph Gauges, Val`ere Lounnas, and Rebecca C Wade, “Comparison of the dynamics of substrate access channels in three cytochrome p450s reveals different opening mechanisms and a novel functional role for a buried arginine,” Proc. Natl. Acad. Sci. USA 99, 5361–5366 (2002).

51

Jacopo Sgrignani and Alessandra Magistrato, “Influence of the membrane lipophilic environment on the structure and on the substrate access/egress routes of the human aromatase enzyme. a computational study,” J. Chem. Inf. Model. 52, 1595–1606 (2012).

52

Ron Elber and Martin Karplus, “Enhanced sampling in molecular dynamics: use of the timedependent hartree approximation for a simulation of carbon monoxide diffusion through myoglobin,” J. Am. Chem. Soc. 112, 9161–9175 (1990).

53

Wei-Mou Zheng and Qiang Zheng, “An analytical derivation of the locally enhanced sampling approximation,” J. Chem. Phys. 106, 1191–1194 (1997).

54

Ryszard Czerminski, Adrian Roitberg, Chyung Choi, Alexander Ulitsky,

and Ron Elber,

28 “Conformational transitions,” in AIP Conf. Proc., Vol. 239 (AIP Publishing, 1991) pp. 153– 173. 55

Alex Ulitsky and Ron Elber, “Application of the locally enhanced sampling (les) and a mean field with a binary collision correction (cles) to the simulation of argon diffusion and nitric oxide recombination in myoglobin,” J. Phys. Chem. 98, 1034–1043 (1994).

56

Slawomir Orlowski and Wieslaw Nowak, “Locally enhanced sampling molecular dynamics study of the dioxygen transport in human cytoglobin,” J. Mol. Model. 13, 715–723 (2007).

57

Stephen D Golden and Kenneth W Olsen, “Identification of ligand-binding pathways in truncated hemoglobins using locally enhanced sampling molecular dynamics,” Method. Enzymol. 437, 459–475 (2008).

58

Ivo Hofacker and Klaus Schulten, “Oxygen and proton pathways in cytochrome c oxidase,” Proteins 30, 100–107 (1998).

59

Slawomir Orlowski and Wieslaw Nowak, “Oxygen diffusion in minihemoglobin from cerebratulus lacteus: a locally enhanced sampling study,” Theor. Chem. Acc. 117, 253–258 (2007).

60

Slawomir Orlowski and Wieslaw Nowak, “Topology and thermodynamics of gaseous ligands diffusion paths in human neuroglobin,” Biosystems 94, 263–266 (2008).

61

Carlos Simmerling, Thomas Fox, and Peter A Kollman, “Use of locally enhanced sampling in free energy calculations: Testing and application to the α-β anomerization of glucose,” J. Am. Chem. Soc. 120, 5771–5782 (1998).

62

Gennady Verkhivker, Ron Elber, and Wieslaw Nowak, “Locally enhanced sampling in free energy calculations: application of mean field approximation to accurate calculation of free energy differences,” J. Chem. Phys. 97, 7838–7841 (1992).

63

Jordi Cohen, Kwiseon Kim, Paul King, Michael Seibert, and Klaus Schulten, “Finding gas diffusion pathways in proteins: application to o2 and h2 transport in cpi [fefe]-hydrogenase and the role of packing defects,” Structure 13, 1321–1329 (2005).

64

Maria S Shadrina, Ann M English, and Gilles Herve Peslherbe, “Benchmarking rapid tles simulations of gas diffusion in proteins: Mapping o2 migration and escape in myoglobin as a case study,” J. Chem. Theory Comput. (2016).

65

Maria S Shadrina, Ann M English, and Gilles H Peslherbe, “Effective simulations of gas diffusion through kinetically accessible tunnels in multisubunit proteins: O2 pathways and escape routes in t-state deoxyhemoglobin,” Journal of the American Chemical Society 134,

29 11177–11184 (2012). 66

Louis P Hammett, “Linear free energy relationships in rate and equilibrium phenomena,” Trans. Faraday Soc. 34, 156–165 (1938).

67

Garrett M Morris, David S Goodsell, Robert S Halliday, Ruth Huey, William E Hart, Richard K Belew, and Arthur J Olson, “Automated docking using a lamarckian genetic algorithm and an empirical binding free energy function,” J. Comput. Chem. 19, 1639–1662 (1998).

68

David S Goodsell and Arthur J Olson, “Automated docking of substrates to proteins by simulated annealing,” Proteins 8, 195–202 (1990).

69

Laura Wesson and David Eisenberg, “Atomic solvation parameters applied to molecular dynamics of proteins in solution,” Protein Sci. 1, 227–235 (1992).

70

Jakub Rydzewski and Wieslaw Nowak, “Machine learning based dimensionality reduction facilitates ligand diffusion paths assessment: a case of cytochrome p450cam,” J. Chem. Theory Comput. (2016).

71

Andrew C Kruse, Jianxin Hu, Albert C Pan, Daniel H Arlow, Daniel M Rosenbaum, Erica Rosemond, Hillary F Green, Tong Liu, Pil Seok Chae, and Ron O Dror, “Structure and dynamics of the m3 muscarinic acetylcholine receptor,” Nature 482, 552–556 (2012).

72

Shuguang Yuan, Slawomir Filipek, Krzysztof Palczewski, and Horst Vogel, “Activation of g-protein-coupled receptors correlates with the formation of a continuous internal water pathway,” Nat. Comm. 5 (2014).

73

Tudor I Oprea, Gerhard Hummer, and Angel E Garc´ıa, “Identification of a functional water channel in cytochrome p450 enzymes,” Proc. Natl. Acad. Sci. USA 94, 2133–2138 (1997).

74

Thomas L Poulos, Barry C Finzel, and Andrew J Howard, “Crystal structure of substrate-free pseudomonas putida cytochrome p-450,” Biochemistry 25, 5314–5322 (1986).

75

Junfeng Gu, Hongxia Li, and Xicheng Wang, “A self-adaptive steered molecular dynamics method based on minimization of stretching force reveals the binding affinity of protein–ligand complexes,” Molecules 20, 19236–19251 (2015).

76

Daan Frenkel and Berend Smit, Understanding molecular simulation: from algorithms to applications, Vol. 1 (Academic Press, 2001).

77

Mark Tuckerman, Statistical mechanics: theory and molecular simulation (Oxford University Press, 2010).

78

Trang N Do, Paolo Carloni, Gabriele Varani, and Giovanni Bussi, “Rna/peptide binding driven

30 by electrostatics–insight from bidirectional pulling simulations,” J. Chem. Theory Comput. 9, 1720–1730 (2013). 79

Cameron Abrams and Giovanni Bussi, “Enhanced sampling in molecular dynamics using metadynamics, replica-exchange, and temperature-acceleration,” Entropy 16, 163–199 (2013).

80

Neva Beˇsker and Francesco L Gervasio, “Using metadynamics and path collective variables to study ligand binding and induced conformational transitions,” Computational Drug Discovery and Design , 501–513 (2012).

81

Alessandro Barducci, Massimiliano Bonomi, and Michele Parrinello, “Metadynamics,” Wiley Interdisciplinary Reviews: Computational Molecular Science 1, 826–843 (2011).

82

Davide Branduardi, Francesco Luigi Gervasio, and Michele Parrinello, “From a to b in free energy space,” J. Chem. Phys. 126, 054103 (2007).

83

Grisell D´ıaz Leines and Bernd Ensing, “Path finding on high-dimensional free energy landscapes,” Phys. Rev. Lett. 109, 020601 (2012).

84

Ludovico Sutto, Simone Marsili, and Francesco Luigi Gervasio, “New advances in metadynamics,” Wiley Interdisciplinary Reviews: Computational Molecular Science 2, 771–779 (2012).

85

Albert Ardevol, Gareth A Tribello, Michele Ceriotti, and Michele Parrinello, “Probing the unfolded configurations of a β-hairpin using sketch-map,” J. Chem. Theory Comput. 11, 1086– 1093 (2015).

86

Mary A Rohrdanz, Wenwei Zheng, and Cecilia Clementi, “Discovering mountain passes via torchlight: methods for the definition of reaction coordinates and pathways in complex macromolecular reactions,” Annu. Rev. Phys. Chem. 64, 295–316 (2013).

87

Laurens Van der Maaten and Geoffrey Hinton, “Visualizing data using t-sne,” J. Mach. Learn. Res. 9, 85 (2008).

88

Yuval Hart, Hila Sheftel, Jean Hausser, Pablo Szekely, Noa Bossel Ben-Moshe, Yael Korem, Avichai Tendler, Avraham E Mayo, and Uri Alon, “Inferring biological tasks using pareto analysis of high-dimensional data,” Nat. Methods 12, 233–235 (2015).

89

Allon M Klein, Linas Mazutis, Ilke Akartuna, Naren Tallapragada, Adrian Veres, Victor Li, Leonid Peshkin, David A Weitz, and Marc W Kirschner, “Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells,” Cell 161, 1187–1201 (2015).

90

Aleksandar D Kostic, Dirk Gevers, Heli Siljander, Tommi Vatanen, Tuulia Hy¨ otyl¨ainen, AnuMaaria H¨ am¨ al¨ainen, Aleksandr Peet, Vallo Tillmann, P¨aivi P¨ oh¨o, Ismo Mattila, et al., “The

31 dynamics of the human infant gut microbiome in development and in progression toward type 1 diabetes,” Cell Host & Microbe 17, 260–273 (2015). 91

Juan Palacios-Moreno, Lauren Foltz, Ailan Guo, Matthew P Stokes, Emily D Kuehn, Lynn George, Michael Comb, and Mark L Grimes, “Neuroblastoma tyrosine kinase signaling networks involve fyn and lyn in endosomes and lipid rafts,” PLoS Comput. Biol. 11, e1004130 (2015).

92

Michele Ceriotti, Gareth A Tribello, and Michele Parrinello, “Simplifying the representation of complex free-energy landscapes using sketch-map,” Proc. Natl. Acad. Sci. USA 108, 13023– 13028 (2011).

93

Ming Chen, Tang-Qing Yu, and Mark E Tuckerman, “Locating landmarks on high-dimensional free energy surfaces,” Proc. Natl. Acad. Sci. USA 112, 3235–3240 (2015).

94

P¨ar S¨oderhjelm, Gareth A Tribello, and Michele Parrinello, “Locating binding poses in proteinligand systems using reconnaissance metadynamics,” Proc. Natl. Acad. Sci. USA 109, 5170– 5175 (2012).

95

Christopher Jarzynski, “Equalities and inequalities: irreversibility and the second law of thermodynamics at the nanoscale,” Annu. Rev. Condens. Matter Phys. 2, 329–351 (2011).

96

Sean X Sun, “Equilibrium free energies from path sampling of nonequilibrium trajectories,” J. Chem. Phys. 118, 5769–5775 (2003).

97

Gavin E Crooks, “Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences,” Phys. Rev. E 60, 2721 (1999).

98

Christopher Jarzynski, “Equilibrium free-energy differences from nonequilibrium measurements: A master-equation approach,” Phys. Rev. E 56, 5018 (1997).

99

Christopher Jarzynski, “Nonequilibrium equality for free energy differences,” Phys. Rev. Lett. 78, 2690 (1997).

100

Jan Liphardt, Sophie Dumont, Steven B Smith, Ignacio Tinoco, and Carlos Bustamante, “Equilibrium information from nonequilibrium measurements in an experimental test of jarzynski’s equality,” Science 296, 1832–1835 (2002).

101

Sanghyun Park, Fatemeh Khalili-Araghi, Emad Tajkhorshid, and Klaus Schulten, “Free energy calculation from steered molecular dynamics simulations using jarzynski’s equality,” J. Chem. Phys. 119, 3559–3566 (2003).

102

Sanghyun Park and Klaus Schulten, “Calculating potentials of mean force from steered molec-

32 ular dynamics simulations,” J. Chem. Phys. 120, 5946–5961 (2004). 103

Berthold Heymann and Helmut Grubm¨ uller, “Dynamic force spectroscopy of molecular adhesion bonds,” Phys. Rev. Lett. 84, 6126 (2000).

104

Ognjen Periˇsi´c and Hui Lu, “On the improvement of free-energy calculation from steered molecular dynamics simulations using adaptive stochastic perturbation protocols,” PloS one 9, e101810 (2014).

105

Deqiang Zhang, Justin Gullingsrud, and J Andrew McCammon, “Potentials of mean force for acetylcholine unbinding from the alpha7 nicotinic acetylcholine receptor ligand-binding domain,” J. Am. Chem. Soc. 128, 3019–3026 (2006).

106

Turgut Ba¸stu˘g, Po-Chia Chen, Swarna M Patra, and Serdar Kuyucak, “Potential of mean force calculations of ligand binding to ion channels from jarzynski’s equality and umbrella sampling,” J. Chem. Phys. 128, 155104 (2008).

107

Leonardo Boechi, Marcelo A Mart´ı, Mario Milani, Martino Bolognesi, F Javier Luque, and Dar´ıo A Estrin, “Structural determinants of ligand migration in mycobacterium tuberculosis truncated hemoglobin o,” Proteins 73, 372–379 (2008).

108

Jie Shen, Weihua Li, Guixia Liu, Yun Tang, and Hualiang Jiang, “Computational insights into the mechanism of ligand unbinding and selectivity of estrogen receptors,” J. Phys. Chem. B 113, 10436–10444 (2009).

109

Johannes K¨astner, “Umbrella sampling,” Wiley Interdisciplinary Reviews: Computational Molecular Science 1, 932–942 (2011).

110

Christophe Chipot and Andrew Pohorille, Free energy calculations (Springer, 2007).

111

Ryoji Takahashi, V´ıctor A Gil, and Victor Guallar, “Monte carlo free ligand diffusion with markov state model analysis and absolute binding free energy calculations,” J. Chem. Theory Comput. 10, 282–288 (2013).

112

Jean-Fran¸cois St-Pierre, Mikko Karttunen, Normand Mousseau, Tomasz R´og,

and Alex

Bunker, “Use of umbrella sampling to calculate the entrance/exit pathway for z-pro-prolinal inhibitor in prolyl oligopeptidase,” J. Chem. Theory Comput. 7, 1583–1594 (2011). 113

Yasuhiro Matsunaga, Hiroshi Fujisaki, Tohru Terada, Tadaomi Furuta, Kei Moritsugu, and Akinori Kidera, “Minimum free energy path of ligand-induced transition in adenylate kinase,” PLoS Comput. Biol. 8, e1002555 (2012).

114

Serge Crouzy, Simon Bern`eche, and Benoˆıt Roux, “Extracellular blockade of k+ channels by

33 tea results from molecular dynamics simulations of the kcsa channel,” J. Gen. Physiol. 118, 207–218 (2001). 115

Gennady Verkhivker, Ron Elber, and Quentin H Gibson, “Microscopic modeling of ligand diffusion through the protein leghemoglobin: Computer simulations and experiments,” J. Am. Chem. Soc. 114, 7866–7878 (1992).

116

Alessandro Laio and Michele Parrinello, “Escaping free-energy minima,” Proc. Natl. Acad. Sci. USA 99, 12562–12566 (2002).

117

Bradley M Dickson, “Approaching a parameter-free metadynamics,” Phys. Rev. E 84, 037701 (2011).

118

Christoph Junghans, Danny Perez, and Thomas Vogel, “Molecular dynamics in the multicanonical ensemble: Equivalence of wang–landau sampling, statistical temperature molecular dynamics, and metadynamics,” J. Chem. Theory Comput. 10, 1843–1847 (2014).

119

Giovanni Bussi, Alessandro Laio, and Michele Parrinello, “Equilibrium free energies from nonequilibrium metadynamics,” Phys. Rev. Lett. 96, 090601 (2006).

120

Alessandro Laio and Francesco L Gervasio, “Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science,” Rep. Prog. Phys. 71, 126601 (2008).

121

Simone Furini and Carmen Domene, “Computational studies of transport in ion channels using metadynamics,” Biochim. Biophys. Acta 1858, 1733–1740 (2016).

122

Davide Provasi, Marta Camacho Artacho, Ana Negri, Juan Carlos Mobarec, and Marta Filizola, “Ligand-induced modulation of the free-energy landscape of g protein-coupled receptors explored by adaptive biasing techniques,” PLoS Comput. Biol. 7, e1002193 (2011).

123

G Saladino, L Gauthier, M Bianciotto,

and FL Gervasio, “Assessing the performance of

metadynamics and path variables in predicting the binding free energies of p38 inhibitors,” J. Chem. Theory Comput. 8, 1165–1170 (2012). 124

J´er´emy Fidelak, Jarek Juraszek, Davide Branduardi, Marc Bianciotto, and Francesco Luigi Gervasio, “Free-energy-based methods for binding profile determination in a congeneric series of cdk2 inhibitors,” J. Phys. Chem. B 114, 9516–9524 (2010).

125

Yong Wang, Elena Papaleo, and Kresten Lindorff-Larsen, “Mapping transiently formed and sparsely populated conformations on a complex energy landscape,” eLife 5 (2016).

126

Ina Bisha, Alex Rodriguez, Alessandro Laio, and Alessandra Magistrato, “Metadynamics sim-

34 ulations reveal a na+ independent exiting path of galactose for the inward-facing conformation of vsglt,” PLOS Comput. Biol. 10, e1004017 (2014). 127

Marketa Paloncyova, Veronika Navr´atilov´ a, Karel Berka, Alessandro Laio,

and Michal

Otyepka, “Role of enzyme flexibility in ligand access and egress to active site–bias-exchange metadynamics study of 1, 3, 7-trimethyluric acid in cytochrome p450 3a4,” J. Chem. Theory Comput. 12, 2101–2109 (2016). 128

James C Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D Skeel, Laxmikant Kale, and Klaus Schulten, “Scalable molecular dynamics with namd,” J. Comput. Chem. 26, 1781–1802 (2005).

129

Berk Hess, Carsten Kutzner, David Van Der Spoel, and Erik Lindahl, “Gromacs 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation,” J. Chem. Theory Comput. 4, 435–447 (2008).

130

David A Case, Thomas E Cheatham, Tom Darden, Holger Gohlke, Ray Luo, Kenneth M Merz, Alexey Onufriev, Carlos Simmerling, Bing Wang, and Robert J Woods, “The amber biomolecular simulation programs,” J. Comput. Chem. 26, 1668–1688 (2005).

131

Massimiliano Bonomi, Davide Branduardi, Giovanni Bussi, Carlo Camilloni, Davide Provasi, Paolo Raiteri, Davide Donadio, Fabrizio Marinelli, Fabio Pietrucci, Ricardo A Broglia, et al., “Plumed: A portable plugin for free-energy calculations with molecular dynamics,” Comput. Phys. Commun. 180, 1961–1972 (2009).

132

Gareth A Tribello, Massimiliano Bonomi, Davide Branduardi, Carlo Camilloni, and Giovanni Bussi, “Plumed 2: New feathers for an old bird,” Comput. Phys. Commun. 185, 604–613 (2014).

133

Rafael C Bernardi, Marcelo CR Melo, and Klaus Schulten, “Enhanced sampling techniques in molecular dynamics simulations of biological systems,” Biochim. Biophys. Acta. 1850, 872–877 (2015).

134

Vijay S Pande, Kyle Beauchamp, and Gregory R Bowman, “Everything you wanted to know about markov state models but were afraid to ask,” Methods 52, 99–105 (2010).

135

Feliks Nuske, Bettina G Keller, Guillermo P´erez-Hern´andez, Antonia SJS Mey, and Frank Noe, “Variational approach to molecular kinetics,” J. Chem. Theory Comput. 10, 1739–1752 (2014).

136

Robert T McGibbon, Christian R Schwantes, and Vijay S Pande, “Statistical model selection

35 for markov models of biomolecular dynamics,” J. Phys. Chem. B 118, 6475–6481 (2014). 137

Elizabeth H Kellogg, Oliver F Lange, and David Baker, “Evaluation and optimization of discrete state models of protein folding,” J. Phys. Chem. B 116, 11405–11413 (2012).

138

Philipp Metzner, Frank No´e, and Christof Sch¨ utte, “Estimating the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data,” Phys. Rev. E 80, 021106 (2009).

139

Juan M Bello-Rivas and Ron Elber, “Exact milestoning,” J. Chem. Phys. 142, 094102 (2015).

140

Tang-Qing Yu, Mauro Lapelosa, Eric Vanden-Eijnden, and Cameron F Abrams, “Full kinetics of co entry, internal diffusion, and exit in myoglobin from transition-path theory simulations,” J. Am. Chem. Soc. 137, 3041–3050 (2015).

141

Tang-Qing Yu, Anthony Bucci, Eric Vanden-Eijnden, and Cameron Abrams, “Markovian milestoning for computing entry, exit, and internal diffusion rates of ligands in proteins,” Biophys. J. 108, 216a (2015).