Computer Physics Communications (
)
–
Contents lists available at ScienceDirect
Computer Physics Communications journal homepage: www.elsevier.com/locate/cpc
GMXPBSA 2.0: A GROMACS tool to perform MM/PBSA and computational alanine scanning✩ C. Paissoni a , D. Spiliotopoulos a,1 , G. Musco a , A. Spitaleri a,b,∗ a
Biomolecular NMR Unit, S. Raffaele Scientific Institute, via Olgettina 58, Milan 20132, Italy
b
Drug Discovery and Development, Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy
article
info
Article history: Received 9 January 2014 Received in revised form 19 May 2014 Accepted 17 June 2014 Available online xxxx Keywords: Molecular dynamics simulation Binding free energy Virtual screening GROMACS Computational alanine scanning MM/PBSA
abstract GMXPBSA 2.0 is a user-friendly suite of Bash/Perl scripts for streamlining MM/PBSA calculations on structural ensembles derived from GROMACS trajectories, to automatically calculate binding free energies for protein–protein or ligand–protein complexes. GMXPBSA 2.0 is flexible and can easily be customized to specific needs. Additionally, it performs computational alanine scanning (CAS) to study the effects of ligand and/or receptor alanine mutations on the free energy of binding. Calculations require only for protein–protein or protein–ligand MD simulations. GMXPBSA 2.0 performs different comparative analysis, including a posteriori generation of alanine mutants of the wild-type complex, calculation of the binding free energy values of the mutant complexes and comparison of the results with the wild-type system. Moreover, it compares the binding free energy of different complexes trajectories, allowing the study the effects of non-alanine mutations, post-translational modifications or unnatural amino acids on the binding free energy of the system under investigation. Finally, it can calculate and rank relative affinity to the same receptor utilizing MD simulations of proteins in complex with different ligands. In order to dissect the different MM/PBSA energy contributions, including molecular mechanic (MM), electrostatic contribution to solvation (PB) and nonpolar contribution to solvation (SA), the tool combines two freely available programs: the MD simulations software GROMACS and the Poisson–Boltzmann equation solver APBS. All the calculations can be performed in single or distributed automatic fashion on a cluster facility in order to increase the calculation by dividing frames across the available processors. The program is freely available under the GPL license. Program summary Program title: GMXPBSA 2.0 Catalogue identifier: AETQ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETQ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU General Public License, version 3 No. of lines in distributed program, including test data, etc.: 185 937 No. of bytes in distributed program, including test data, etc.: 7 074 217 Distribution format: tar.gz Programming language: Bash, Perl. Computer: Any computer. Operating system: Linux, Unix OS. RAM: ∼2 GB
✩ This paper and its associated computer program are available via the Computer Physics Communication homepage on ScienceDirect (http://www.sciencedirect.com/ science/journal/00104655). ∗ Corresponding author at: Drug Discovery and Development, Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy. Tel.: +39 3485188790. E-mail address:
[email protected] (A. Spitaleri). 1 Present address: Computational Structural Biology Biochemisches Institut Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland.
http://dx.doi.org/10.1016/j.cpc.2014.06.019 0010-4655/© 2014 Elsevier B.V. All rights reserved.
2
C. Paissoni et al. / Computer Physics Communications (
)
–
Classification: 3. External routines: APBS (http://www.Poissonboltzmann.org/apbs/) and GROMACS installations (http:// www.gromacs.org). Optionally LaTeX. Nature of problem: Calculates the Molecular Mechanics (MM) data (Lennard-Jones and Coulomb terms) and the solvation energy terms (polar and nonpolar terms respectively) from an ensemble of structures derived from GROMACS molecular dynamics simulation trajectory. These calculations are performed for each single component of the simulated complex, including protein and ligand. In order to cancel out artefacts an identical grid setup for each component, including complex, protein and ligand, is required. Performs statistical analysis on the extracted data and comparison with wild-type complex in case of either computational alanine scanning or calculations on a set of simulations. Evaluates possible outliers in the frames extracted from the simulations during the binding free energy calculations. Solution method: The tool combines the freely available programs GROMACS and APBS to: 1. extract frames from a single or multiple complex molecular dynamics (MD) simulation, allowing comparison between multiple trajectories; 2. split the complex frames in the single components including complex, protein and ligand; 3. calculate the Lennard-Jones and Coulomb energy values (MM terms); 4. calculate the polar solvation energy values using the implicit solvation Poisson–Boltzmann model (PB); 5. calculate the nonpolar solvation energy value based on the solvent accessible surface area (SASA); 6. combine all the calculated terms into the final binding free energy value; 7. repeat the same procedure from point 1 to 6 for each simulation in case of computational alanine scanning (CAS) or simultaneous comparison of different MDs. Restrictions: Input format files compatible with GROMACS engine 4.5 and later versions. Availability of the force field or of the topology files. Running time: On a single core, Lennard-Jones, Coulomb and nonpolar solvation terms calculations require a few minutes. The time required for polar solvation terms calculations depends on the system size. © 2014 Elsevier B.V. All rights reserved.
1. Introduction MM/PBSA is a versatile method to calculate the binding free energies of a protein–ligand complex [1]. It incorporates the effects of thermal averaging with a force field/continuum solvent model to post-process a series of representative snapshots from MD trajectories. MM/PBSA has been successfully applied to compute the binding free energy of numerous protein–ligand interactions [2–5]. The method expresses the free energy of binding as the difference between the free energy of the complex and the free energy of the receptor plus the ligand (end-state method). This difference is averaged over a number of trajectory snapshots [6]. Of note, the MM/PBSA approach allows for a rapid estimation of the variation in the free energy of binding, with the caveat that generally it does not reproduce the absolute binding free energy values. Nevertheless, it usually exhibits good correlations with experiments, thus representing a fair compromise between efficiency and efficacy for the calculation and comparison of binding free energy variations. The theory underlying MM/PBSA approach has been described previously [6]. Briefly, the binding free energy of a protein molecule to a ligand molecule in solution is defined as:
∆Gbinding = Gcomplex − (Gprotein + Gligand ).
(1)
A MD simulation is performed to generate a thermodynamically weighted ensemble of structures. The free energy term is calculated as an average over the considered structures:
⟨G⟩ = ⟨EMM ⟩ + ⟨Gsolv ⟩ − T ⟨SMM ⟩.
(2)
The energetic term EMM is defined as: EMM = Eint + Ecoul + ELJ
(3)
where Eint indicates bond, angle, and torsional angle energies, and Ecoul and ELJ denote the intramolecular electrostatic and LennardJones energies, respectively. The solvation term Gsolv in Eq. (4) is split into polar Gpolar and nonpolar contributions, Gnonpolar : Gsolv = Gpolar + Gnonpolar
(4)
GMXPBSA 2.0 calculates Gpolar and Gnonpolar with Adaptive Poisson–Boltzmann Solver (APBS) program [7]. The polar contribution Gpolar refers to the energy required to transfer the solute from a continuum medium with a low dielectric constant (ε = 1) to a continuum medium with the dielectric constant of water (ε = 80). Gpolar is calculated using the non-linearized or linearized Poisson–Boltzmann equation. The nonpolar contribution Gnonpolar is considered proportional to the solvent accessible surface area (SASA): Gnonpolar = γ SASA + β
(5) −1 −2
where γ = 0.0227 kJ mol Å and β = 0 kJ mol−1 [8]. The dielectric boundary is defined using a probe of radius 1.4 Å. Binding free energy calculations based on the MM/PBSA approach can be performed either according to the three trajectories method (TTM) or to the single trajectory method (STM). The TTM requires three separate MD simulations on the three system components including the complex, the free ligand and the free receptor. This is a computationally demanding approach and prone to structural noise [3,5]. Conversely, the STM requires a single trajectory run for the complex, whereby both the protein and ligand structures are extracted directly from the complex structure [3],
C. Paissoni et al. / Computer Physics Communications (
)
–
3
Fig. 1. Workflow diagram for GMXPBSA 2.0. Diagram describing the general GMXPBSA 2.0 workflow scheme. GMXPBSA 2.0 combines the GROMACS and APBS programs in order to use the frames extracted from the molecular dynamics simulations and to calculate the binding free energy.
thus zeroing out the Eint term. In this case, the protein and the ligand are assumed to behave similarly in the bound and in the free forms. Recently, a useful python program (MMPBSA.py) developed to perform MM/PBSA calculations on AMBER MD simulations suite has been presented [9]. In this context, similar tools tailored to perform post-processing end-state method to calculate free energies using APBS on MD trajectories would be extremely welcome by the GROMACS users’ community. However, despite the popularity of both GROMACS [10] and APBS [7], until now there was not freely available tool to automatically combine the two programs in order to use directly the GROMACS output as input for APBS binding free energy calculations. To facilitate the interface between the two programs, we previously wrote a series of Bash/Perl scripts to directly perform MM/PBSA calculations on structures generated by MD simulations [11]. Herein, we present an updated and revised version of the tool, GMXPBSA 2.0 (Fig. 1). One of the major upgrade is the automation of computational alanine scanning (CAS) calculations, that can be performed a posteriori directly on the wild-type trajectory. CAS can be performed by adopting two different approaches, depending on the objectives. In the first approach, a single mutation is performed in order to qualitatively evaluate the role/contribution of a single residue to the binding, in the second, a series of alanine mutations is simultaneously performed in order to investigate the contribution to binding of specific regions, such as binding pockets, protein–protein or protein–peptide interaction interfaces. In both cases, a selected amino acid or a set of amino acids is mutated into alanine, thus allowing a per-residue decomposition of the interactions. Under the assumption that the mutation will have negligible effects on the protein conformation, CAS can qualitatively highlight the importance of the electrostatic and steric nature of the original side chain. Furthermore, GMXPBSA 2.0 can simultaneously calculate the binding free energy for a set of protein–ligand trajectories and then compare the relative binding free energies. This feature is particularly useful when comparing the binding free energy values of a set of ligands versus the same receptor, or when analysing the effects of receptor non-alanine mutants. In the latter case, the user needs to perform a priori different simulations for each mutated protein in complex with the ligand to generate the corresponding trajectories. We have introduced in GMXPBSA 2.0 the following improvements with respect to the previous version [11]: 1. control of the input and output options; 2. automatic setup and a posteriori CAS calculations;
3. CAS calculations on a single residues or on a set of residues simultaneously; 4. handling of multiple protein–ligands MD simulations to allow comparisons between different ligands; 5. handling of multiple protein–ligands MD simulations to allow comparisons (e.g. between wild-type complex and non-alanine mutants); 6. handling of APBS calculations on a multi core system (distributed calculations in cluster). 7. possibility to use custom van der Waals radii; 8. check and restart of the failed MM/PBSA calculations; 9. statistical analysis of the results. 2. Program usage 2.1. GMXPBSA 2.0 calculation workflow GMXPBSA 2.0 is a user-friendly suite of Bash/Perl scripts that efficiently streamlines the set up procedure and the calculation of binding free energies for an ensemble of complex structures generated by GROMACS MD engine. The program workflow, (Figs. 1 and 2) consists of three different sequential steps comprising: 1. gmxpbsa0.sh: In this step, the tool exploits the gmxpbsa0.sh script to set up the system and to perform preliminary calculations including:
• check of the required input files and directories; • extraction of the frames of the complex from the MD simulations, subsequently split in the protein and the ligand components by the GROMACS tools; • calculation of the Coulomb energy contributions using either GROMACS tools or the ‘‘coulomb’’ program available in the APBS suite, and Lennard-Jones term using GROMACS. If the computational alanine scanning (CAS) calculation is required, the script performs alanine mutations on the defined residues on every single extracted frames removing the side chains atoms of the target residues up to the beta C atom (CB atom) and then recalculating the Coulomb and the Lennard-Jones energy contributions of the structure containing the alanine mutant. It also generates the grid and the input to perform the APBS calculations for each frame of the simulation. The latter task is critical, since deletion of artefacts in the MM/PBSA calculation requires an exact matching of the grid setup between all the system components (complex, protein and ligand).
4
C. Paissoni et al. / Computer Physics Communications (
)
–
Fig. 2. Schematic diagram of the three GMXPBSA 2.0 calculation steps. Diagram showing the input files used by GMXPBSA 2.0 and the output files generated during each MM/PBSA step.
2. gmxpbsa1.sh: In this step, the gmxpbsa1.sh script computes the solvation polar and nonpolar energy contributions using APBS program. These calculations can be distributed on a cluster or on a multi core workstation. 3. gmxpbsa2.sh: In this last step, the gmxpbsa2.sh script combines for all the frames the single terms, ⟨EMM ⟩ and ⟨Gsolv ⟩ respectively, in order to calculate the final binding free energy value. It also checks and tries to fix errors and/or failures occurring in the preceding step 2 (APBS calculations). Statistical analysis is also performed computing average √ and standard error (SE). The SE is calculated as follows: SE = σ / N, where σ is the standard deviation and N is the number of structures (MD frames) used in the calculation. The average Coulomb and Lennard-Jones values, the polar and nonpolar solvation terms are calculated along each trajectory. If a value differs from the average more than two standard deviations it is considered as an outlier and the corresponding frame is excluded from the final calculation. However, it is always possible to check for outlier frames, since their reference-numbers are stored in the WARNING.dat file. 2.2. Installation and execution of the program Once the source code of the program GMXPBSAtool.tar.gz has been downloaded the user should perform the following steps: 1. extract the source code in a user defined location, e.g. /home/myprogram/, by typing tar zxvf GMXPBSAtool.tar.gz; set the GMXPBSAHOME environment variable in bash: export GMXPBSAHOME=/home/myprogram/GMXPBSAtool; change the /home/ myprogram to whatever directory is appropriate for your machine; verify write permissions in the directory tree, and execute permissions for the gmxpbsa0.sh, gmxpbsa1.sh and gmxpbsa2.sh scripts. $ GMXPBSAHOME should be also added to the PATH. 2. In order to perform MM/PBSA calculations, the user has to run the tool by typing $GMXPBSAHOME/