G Model
ARTICLE IN PRESS
PRBI-10319; No. of Pages 10
Process Biochemistry xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Process Biochemistry journal homepage: www.elsevier.com/locate/procbio
Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124) Sonali Rohamare a,∗ , Sushama Gaikwad a , Dafydd Jones b , Varsha Bhavnani c , Jayanta Pal c , Ranu Sharma a , Prathit Chatterjee d a
Division of Biochemical Sciences, National Chemical Laboratory, Pune, India Cardiff School of Biosciences, Main Building, Museum Avenue, Cardiff, UK c Cell & Molecular Biology Research Laboratory, Department of Biotechnology, Savitribai Phule Pune University, Pune, India d Physical Chemistry Division, National Chemical Laboratory, Pune, India b
a r t i c l e
i n f o
Article history: Received 21 August 2014 Received in revised form 1 December 2014 Accepted 22 December 2014 Available online xxx Keywords: Cloning and expression Protease Kinetic stability Actinomycetes Thermal simulation
a b s t r a c t A serine protease (N. protease), from Nocardiopsis sp., was cloned and expressed in Escherichia coli and investigated for its potential kinetic stability. Protein expression using two vectors, pET-22b (+) and pET39b (+) was compared based on proper folding and soluble expression of the protein. pET-39b (+) was found to be a better vector for soluble expression of this protease containing disulfide bonds. In silico studies were also carried out for N. protease. Homology modeling suggested N. protease to be a member of PA clan of proteases. The phylogenetic analysis showed relatedness of N. protease to kinetically stable proteases. Molecular docking studies performed exhibited interaction of a peptide substrate with catalytic pocket of the enzyme. High temperature MD simulations were performed on N. protease to study its unfolding behavior and comparisons were made with ␣LP. A novel approach to study ‘cooperativity’ of protein unfolding was undertaken, wherein ‘P’ value analysis based on ϕ and values of the protein was performed. Data showed sharper P value transition for ␣LP when compared to N. protease thus indicating relatively less kinetic stability of N. protease. Present study holds significance as the non-streptomycete actinomycetes group is least explored and ensures industrially important enzymes with exceptional stabilities. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction Proteases have been widely studied often, as a model protein for enzyme catalysis and structure, as targets for drug development, or as reagents for green chemistry. Proteases are involved in virtually every physiological process in a cell. Serine proteases, with the classic Ser-His-Asp triad are the largest class of proteases. Controlling and regulating the protease activity has been of interest since long. Nature has its own ways of regulation; either there are inhibitors of proteases to control their activity or the proteases are synthesized as inactive precursors and are processed to mature proteases. Many of the bacterial proteases which are needed to work in extracellular environment are highly stable to various denaturing conditions and are called as kinetically stable proteins [1]. Most of these proteases are synthesized with a pro-region, which helps them to fold
∗ Corresponding author at: Division of Biochemical Sciences, National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India. Tel.: +91 020 25902241; fax: +91 020 25902648. E-mail address:
[email protected] (S. Rohamare).
properly and is then removed autocatalytically to release the mature enzyme [2]. These pro-regions are N-terminal extensions that are large enough to fold independently [3]. The unique feature of proteases is that although they catalyze the same reaction, their sequence, structure and stability differ from each other based on their source. Proteases from different sources are being studied widely to understand structural features responsible for their exceptional stabilities. For example, alpha lytic protease (␣LP) system and NAPase (Nocardiopsis alba protease) have been previously reported to possess kinetic stability [4,5]. The kinetically stable proteases are difficult to be unfolded. If unfolded, their unfolding is irreversible. The pro-region (which is expressed along with these proteases) helps these proteases to attain their native conformation. Moreover, unfolding transitions of such proteases are reported to be “cooperative” i.e., the unfolding event is an all or none process [6]. The biochemical heterogeneity, ecological diversity and the excellent capacity of secondary metabolite production of actinomycetes, make them potential sources of enzymes with new activities and/or specificities which has largely been unexplored [7,8]. Since, some of the actinomycetes grow in extreme
http://dx.doi.org/10.1016/j.procbio.2014.12.025 1359-5113/© 2015 Elsevier Ltd. All rights reserved.
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
2
environmental conditions, the enzymes produced by them are exceptionally stable [9,10]. Though the enzymes from streptomycetes group of actinomycetes have been very well studied, the enzymes from non-streptomycetes actinomycetes have not been studied thoroughly. In the present study, an attempt has been made to clone and express a serine protease from Nocardipsis sp. NCIM 5124. There are many reports on production and purification of serine proteases from various actinomycetes [8,11]. However, these proteases have not been studied at the molecular level, with few exceptions such as the serine protease from N. alba (NAPase) which has been studied extensively for its high acid stability. Similarly, NprotI from Nocardipsis sp. NCIM 5124 [12] has been studied for its kinetic stability. Due to the inherent ability of proteases to cleave peptide bond, proteases represent one of the most challenging functional group of proteins for heterologous expression and purification. The proteases when expressed in Escherichia coli host often results in the formation of inclusion bodies, non-expression or cytotoxicity if they are produced unregulated as the host cell suffers from critical stress which makes the study of proteases at molecular level challenging. For proper expression and purification of recombinant proteins, the fusion tags used in the expression vector are crucial components. Numerous tags are available which assists in enhancement of expression and solubility, immobilization, detection, quantification, and purification of recombinant proteins. In this paper, we report the cloning and expression of a serine protease from Nocardiopsis sp., containing the N-terminal proregion and the mature protease domain, in E. coli. For expression studies, we have used two different vectors pET-22b (+) and pET39b (+) [13], both targeting proteins in the periplasmic space and both having a histidine tagging for easy purification. Additionally pET-39b (+) has a disulfide oxidoreductase A (DsbA) tag which helps in disulphide bond formation. A comparison between the two vectors has been done with respect to proper folding of the protease with disulphide bonds. The effect of growth temperature of E. coli on the maturation and solubility of the proteases was studied. Molecular docking studies with a peptide substrate were carried out to study catalytic pocket of the enzyme. Also, in silico studies have been carried out to study whether the protease in present the study is a kinetically stable protein similar to some other proteases from microbial origin. 2. Materials and methods 2.1. Genomic DNA extraction of Nocardiospsis sp. NCIM 5124 The organism was isolated from an oil contaminated marine site near Mumbai Harbor (India) [14]. The genomic DNA from this organism was isolated using CTAB extraction protocol to avoid any polysaccharide contamination [15]. 2.2. Cloning and sequencing Primers were designed based on the gene sequences of serine proteases from Nocardiopsis dassonivilie, as the whole genome sequence of the organism is known [16]. Following sets were designed: No.
Forward primer (5 –3 )
Reverse primer (5 –3 )
1 2 3 4 5
AGCGTTACCCGGTCACCA GAGCGTTACCCGGTCACCA CGGTCACCAGGGACAGC CGGAGCGTTACCCGGTC CCCGGTCACCAGGGACA
AGGGATCTATGCGACCCTCC ATCTATGCGACCCTCCCCC TAGGGATCTATGCGACCCTCC GATCTATGCGACCCTCCCCC GGATCTATGCGACCCTCCCC
PCR amplification of the protease gene was carried out using these primers and genomic DNA from Nocardiospsis sp. NCIM 5124
as the template. The amplified fragments were sequenced and were then assembled using CAP3 program [17] to yield a fulllength sequence of 1155 bp, coding for signal peptide, pro-region (pro) and mature protease (prot). The complete gene sequence was submitted to genbank under accession no. of KM268817. As the sequence was highly GC rich, it was codon optimized for expression in E. coli and then the assembled gene was synthesized from Life TechnologiesTM , UK, into cloning vector pMA between restriction sites BamHI and XhoI. Codons for the first 29 amino acids coding for signal peptide and the stop codon were excluded from the gene. The signal peptide from E. coli was used for effective expression. 2.3. Subcloning in expression vector The cloning vector pMA with the gene sequence was digested with BamHI and XhoI endonucleases (NEB, USA). These fragments were purified using the Qiagen PCR purification kit (Qiagen, Valencia, CA, USA) and then ligated into the BamHI and XhoI sites of the pET-22b (+) and pET-39b (+) vectors which had been digested with the same enzymes. This generated two fusion expression vectors pET22b-pro-prot and pET39b-pro-prot, where “pro” stands for proregion. The resultant plasmids were transformed into E. coli strain DH-5␣. The correct insertion was confirmed by DNA sequencing, restriction digestion and colony PCR. The transcript was predicted to be a fusion with six histidines and a pelB tag for pET22b-pro-prot and was named as pro-prot; for pET39b-pro-prot, a DsbA tag, six histidines and S tag were attached at the N-terminus of pro-prot and the transcript was named as DsbA-pro-prot. 2.4. Heterologous expression and purification of pro-prot in E. coli pET22b-pro-prot and DsbA-pro-prot in pET39b-pro-prot The screened plasmids pET22b-pro-prot and pET39b-pro-prot were transformed into the E. coli strain BL-21(DE3) for expression. Expression of the recombinant protein was performed according to the instructions in pET system manual (Novagen), induced by adding isopropyl -d-thiogalactopyranoside (IPTG). The expression studies were carried out with inducer concentration of (1–0.1 mM) and growth temperature of (37–16 ◦ C) to optimize the conditions for getting soluble expression. Cells were grown overnight after induction and were harvested by centrifugation at 4500 × g for 15 min at 20 ◦ C. The cell pellet obtained from 500 ml bacterial culture was resuspended in 15 ml of 20 mM Tris–HCl pH 7.5, 1 mM EDTA, and subjected to sonication at 4 ◦ C for 20 cycles (each cycle consisting of 2 s on and 3 s off times). The total cellular proteins were then partitioned into soluble and insoluble fractions by centrifugation at 12,000 × g for 45 min at 4 ◦ C and the protein expression was analyzed by SDS-PAGE. The processing of the expressed protein to form mature protease was monitored with SDS-PAGE. The protein was partially purified by Nickel nitrilotriacetic acid (Ni-NTA) column. This was performed at room temperature. The soluble fraction was batch absorbed for 90 min with Ni-NTA agarose (Qiagen, Germany) pre-equilibrated with Tris–HCl at room temperature. The resin was loaded on a glass column and washed with 5 column volumes of Tris–HCl with 20 mM imidazole to remove nonspecifically bound contaminants. Finally, the protein was eluted by passing the same buffer with 300 mM imidazole. Purification and elution were monitored by SDS-PAGE. 2.5. Western blot analysis Expression of recombinant pro-prot and DsbA-pro-prot in E. coli BL21 (DE3) was confirmed by Western blot analysis with rabbit antiserum raised against Histidine tag (Bangalore Genei, India) as primary antibody. For Western blot analysis, a standard protocol
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
was followed wherein the SDS-PAGE (12%) separated proteins were transferred to a nitrocellulose membrane (Hybond-ECL, Amersham Biosciences, Germany). The membrane was blocked with 3% BSA and incubated with primary antibody and then with secondary antibody (conjugated with Horseradish Peroxidase). Finally, the signal in the blot was detected using chemiluminescent substrate. The processing of the DsbA-pro-prot to get mature protein was also monitored by Western blot analysis. 2.6. Enzyme assay Protease activity was determined by incubating the enzyme in 300 l of 1% casein (substrate) at pH 9.0 at 50 ◦ C for 20 min as described by Dixit and Pant [14]. One unit of protease activity is defined as the amount of enzyme which releases 1 mol of tyrosine per minute in the assay conditions. The enzyme, purified by IMAC, was assayed for optimum pH (pH range 7.0–11.0) and temperature (30–60 ◦ C) using casein as substrate. Inhibition of the enzyme with PMSF (25–50 M) was checked by incubating the mixture for 10 min at room temperature and assaying the residual activity as described above. 2.7. Molecular modeling studies A similarity search using the BLAST server [18] in PDB (http://www.rcsb.org), identified the NAPase; a serine protease ˚ This had 85% idenfrom N. alba (PDB ID: 2OUA, Resolution: 1.85 A). tity with the translated gene sequence of mature N. protease. The 3D model of N. protease was built with the MODELLER 9.10 program [19] using above homologous PDB structure as a template. Separate models were built for pro-region and mature protease. The N-terminal pro-region sequence was submitted to the SWISS Model server (http://expasy.org/spdbv/) for comparative modeling ˚ as tememploying the pro-region of ␣LP (PDB ID: 2PRO, 3.00 A) plate which had 25% identity to the prodomain of N. protease. The molecular models were evaluated using Ramchandran plot obtained from PROCHECK (http://services.mbi.ucla.edu/SAVES/), Errat (version 2.0) [20] and ProSA-web Protein structure analysis [21] (supplementary Figs. 1 and 2). 2.8. Phylogenetic analysis MEGA 6.0 software (http://www.megasoftware.net) was used for the phylogenetic analysis [22] of protein sequence of N. protease. Various sequences of proteases were obtained from National Centre for Biotechnology Information (NCBI). These protein sequences were initially aligned using MUltiple Sequence Comparison by Log-Expectation (MUSCLE). This alignment was used to build the phylogenetic tree using the Neighbor Joining (NJ) algorithm and the bootstrap was calculated using 1000 replications. 2.9. Docking studies 2.9.1. Ligand and receptor preparation To study the functioning of the enzyme, docking studies with a known substrate BenzoylArginine P-NitroAnilide (BAPNA) were carried out. The molecular model of N. protease generated, as described in Section 2.7, was used as receptor for docking BAPNA in the binding pocket of the enzyme. The ligand molecule BAPNA was downloaded from PubChem compound (CID 13494) (http://pubchem.ncbi.nlm.nih.gov/). Before proceeding with the docking studies, hydrogens were added subsequently to carry out restrained minimization of the models. The root mean square deviation (RMSD) of the atomic displacement for terminating the ˚ minimization was set as 0.3 A.
3
Similarly, ligand was refined with the help of LigPrep 2.5 [23] to define its charged state and stereo isomers. The processed receptor and ligand were further used for the docking studies using Glide 5.8 [24].
2.9.2. Modeling enzyme–ligand interactions BAPNA, a substrate for serine protease, was docked in the active site cavity of the prepared receptor molecule. A grid was made by selecting the catalytic triad (H57, D102, S195) involved in the binding of the substrate. Flexible ligand docking was carried out using the standard precision option. A total of 20 poses generated were scored on the basis of their glide score and important H-bonding interactions. The hydrogen bond interactions between the protein and ligand were visualized using PyMOL. The docked complexes were subjected to molecular dynamics simulations for 5 ns using the GROningen Machine for Chemical Simulations V4.5.4 (GROMACS) [25].
2.10. Molecular dynamics simulations Thermal MD simulations were carried out for N. protease as well as for ␣LP, for comparison. The molecular model generated using MODELLER program was used for N. protease and 1SSX PDB was used for ␣LP. The structures were subjected to molecular dynamics simulations using the GROMACS. GROMOS96 43a1 force field was applied on the protein molecule placed in the center of the dodecahedron box solvated with SPCE explicit water model with distance ˚ The system was initially between the solute and the box set to 10 A. energy minimized by steepest descent minimization for 50,000 steps until a tolerance of 10 KJ/mol is reached. In order to neutralize the whole system, total charge on the system was balanced by adding 3 chloride counter-ions using genion program of GROMACS. After adding ions, the system was again energy minimized by steepest descent minimization retaining the same parameters. A cut off distance of 9 A˚ and 14 A˚ was set for Coulombic and van der Waals interactions. Simulations were carried out with periodic boundary conditions in all the three directions. A timestep of 2 fs was used and snapshots were saved every 2 ps. The solvent was equilibrated for 100 ps under NPT conditions using Berendsen coupling for both pressure (100 fs relaxation time) and temperature (2.0 ps coupling constant). For N. protease, 3 final runs of 10 ns at 298 K, 500 K and for ␣LP, 3 runs at 500 K and 1 at 298 K were performed. The trajectories were visually inspected using Visual Molecular Dynamics program (VMD) [26].
2.11. Structural persistence The structural persistence parameter was used to monitor the changes in secondary structure with respect to the reference structure [27,28]. To summarize briefly, 1 −(ϕ /ϕmax ) −( j e ·e Nres Nres
P=
j /
max )
j=1
Here, ϕj and j respectively, are the changes in the ϕ and torsional angles of residue j over the reference at a given point in time, and ϕmax and max represent the maximal changes that can occur in the torsional angles. Nres represents the total number of amino acid residues in the protein. A value of P = 1 is observed for a conformation whose secondary structure is exactly same to the reference. This parameter was calculated for both N. protease and ␣LP.
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
4
Fig. 1. Protein expression using pET-39b(+) at 37 ◦ C, with 1 mM IPTG induction (a)10% SDS PAGE. Two clones (39b1 and 39b2) were checked for expression. Samples were loaded as followed; 1: prestained protein ladder, 2: uninduced whole cell fraction, 3: 39b1 soluble fraction, 4: 39b1 insoluble fraction, 5: 39b2 soluble fraction, 6: 39b2 insoluble fraction. (b) Western blot analysis with anti His tag antibody showing unprocessed protein at 91 kDa in insoluble fraction.
Fig. 3. Western blot analysis showing different stages in protease processing. (1) Lane 39b sol: the soluble fraction obtained after expression of N. protease at 16 ◦ C with 0.1 mM IPTG using pET-39b vector showed different stages in protease maturation. (2) Lane 39b insol: the fraction obtained after expression of N. protease at 37 ◦ C with 1 mM IPTG using pET-39b showed protein to be in insoluble fraction and in unprocessed form. (3) Lane 22b sol: the soluble fraction obtained after expression of N. protease at 16 ◦ C with 0.1 mM IPTG using pET-22b showed unprocessed protein at 35 kDa.
Fig. 2. Protein purification using IMAC (Ni NTA) column after expression of the protein in pET-39b, 16 ◦ C, 0.1 mM IPTG. Soluble fraction was loaded on the column. Samples were loaded as followed; 1: prestained protein ladder, 2: flow through from the column, 3: column wash, 4–7: column elutions in 300 mM imidazole.
3. Results and discussion Vector containing recombinant N. protease gene obtained after successful cloning was used for further studies. The 384 amino acid protein, codified from the ORF, contained three parts: a signal peptide of 29 amino acids, an N-terminal pro-region of 167 amino acids, mature enzyme of 188 amino acids. The estimated molecular mass of the mature protein is around 20 kDa with theoretical pI of 8.86 (http://web.expasy.org/cgi-bin/compute pi/pi tool). The protein was found to have 3 disulfide bonds as studied by PDBsum program [29] (supplementary Fig. 1). 3.1. Heterologous expression of pro-prot using pET22b-pro-prot and DsbA-pro-prot using pET39b-pro-prot in E. coli BL 21(DE3) and purification As described earlier (Section 2.4), inducer (IPTG) concentration and growth temperature after induction were optimized to achieve soluble protein expression. Initially, with 1 mM concentration of IPTG and growing E. coli at 37 ◦ C (Fig. 1), the expressed protein was detected in the insoluble fractions using both the vectors. Also, maturation of the protein was not seen, as the over expressed protein was of higher molecular weight as seen in SDS-PAGE (Fig. 1). The mature protein was estimated to be of 20 kDa. The same findings were observed even at growth temperature of 28 ◦ C and 0.5 mM concentration of inducer. But, when the protein expression was checked at 16 ◦ C and 0.1 mM concentration, the protein was found to express in the soluble form using pET39b-pro-prot. The soluble mature protein was purified using Ni-NTA matrix (Fig. 2). The
Fig. 4. Protein expression using pET-22b at 16 ◦ C, with 0.1 mM IPTG induction. (a) 15% SDS PAGE showing Two clones (22b1 and 22b2) were checked for expression. Samples were loaded as followed 1: prestained protein ladder, 2: uninduced whole cell protein, 3: 22b1 soluble fraction, 4: 22b1 insoluble fraction, 5: 22b2 soluble fraction, 6: 22b2 insoluble fraction. (b) Western blot analysis with anti His tag antibody showing the expression using pET-22b as unprocessed form at 35 kDa in both soluble and insoluble fractions.
appearance of processed protease was monitored using SDS-PAGE where 20 kDa protein appeared within 4 h after induction (supplementary Fig. 2). Also, maturation of the protease was seen very clearly in western blot analysis (Fig. 3). Three stages of protease processing were seen. DsbA-pro-prot, first one with highest molecular weight around 91 kDa, then, pro-prot, was seen at 35 kDa, from which the DsbA tag was cleaved off but pro-region was still present in the expressed protein. Then, “prot” (N. protease) was seen roughly around 20 kDa from which both the prodomain and DsbA tag were cleaved off. The protein expressed using pET-22b was in soluble form at the growth temperature of 16 ◦ C and at 0.1 mM IPTG but post-translational processing of the enzyme was not seen as the overexpressed protein was seen at 35 kDa i.e., with the pro-region (Fig. 4). These observations suggested that protein was not properly folded/misfolded when expressed in pET22b and hence, was not processed by E. coli proteases. The DsbA tag in pET-39b was important in catalyzing disulphide bond formation in the protease, resulting in proper folding and thereby processing by E. coli proteases to form mature N. protease. Highest caseinolytic activity (110 U/mg) of the recombinant N. protease was observed at pH 9.0, 50 ◦ C (supplementary Fig. 3A and
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
5
Fig. 5. Protein sequence alignment in using program Clustal X. Protease in the present study has been named N. protease in the above diagram. The other proteases used for comparison were, from, N. dassonivillie [16], N. alba (Napase) [36] and Lysobacter enzymogenes (␣-lytic protease) [37]. The blue box indicates signal peptide, the red box indicates active protease and remaining portion is pro-domain. The conserved catalytic triad has been showed in green boxes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
B). The enzyme activity could be significantly inhibited by specific serine protease inhibitor PMSF even at 25–50 M concentration (supplementary Fig. 3C). 3.1.1. In silico studies Initially, the sequence of N. protease was analyzed with Clustal X [30] (Fig. 5). It can be observed that the sequence of active protease region showed more homology than the pro-region to other proteases. Interestingly, it has been studied that the pro-regions of different extracellular proteases share structural similarity [31]
but their active protease domains might not be similar; for example, for alpha lytic protease, subtilisins and carboxypeptidase A, the secondary structure in the C-terminal portion of pro-region is helix-hairpin-helix-strand, with an extended region following the last strand. 3.2. Homology modeling A model for N. protease was generated taking crystal structure of NAPase (pdb id: 2OUA) as a template. NAPase is a kinetically
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10 6
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
Fig. 6. (a) Model structure of N. protease generated using MODELLER program with 2OUA as a template. The molecule is colored dark blue at the N-terminus progressing to red at the C-terminus. Important structural elements are labeled (b) Model structure of pro-domain generated using 2PRO as a template. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
stable protein from N. alba studied for its high acid stability. The major structural elements of the protein have been labeled in Fig. 6a. The model evaluation parameters indicated that the model is of good quality (supplementary Figs. 4 and 5). The protein structure from the model can be visualized by the presence of two  barrels which are arranged perpendicular to each other and the residues of catalytic triad are split between the two barrels. These are the typical structural properties of PA-clan proteases [32]. The PA clan (Proteases of mixed nucleophile, superfamily A) is the largest group of proteases with common ancestry as identified by structural homology. Members of this clan have a chymotrypsinlike fold and similar proteolysis mechanisms but sequence identity of <10%. N. protease demonstrated sequence similarity with various extracellular proteases. The gene was also found to have signal sequence for extracellular secretion; therefore it was envisaged to be a member of S1A family of PA clan proteases. The peptidases
from S1A family are involved in extracellular protein turnover and are mainly present in eukaryotes with a limited distribution in plants, prokaryotes and archaea [33]. A model for prodomain was generated (Fig. 6b, supplementary Fig. 5) in which the prodomain can be seen as a C shaped molecule with two domains; C terminal and N terminal domains, attached with a hinge. 3.3. Phylogenetic analysis In the phylogenetic tree N. protease was seen clustered with NAPase from N. alba with significantly high bootstrap value (Fig. 7). ␣LP was seen in another cluster in the same clade. Both of these proteases are well studied as kinetically stable proteases. The thermodynamically stable protein, chymotrypsin was clustered in a totally different clade. Interestingly, although chymotrypsin and
Fig. 7. Phylogenetic tree for N. protease generated using MEGA 6.0 program. The kinetically stable proteases have been highlighted with squared boxes.
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
7
Fig. 8. Docking predictions of BAPNA bound to the catalytic site of N. protease. (A) Surface view of BAPNA with the catalytic binding pocket of the enzyme. (B) Cartoon view of the complex showing the hydrogen bonding between the receptor and ligand. The interacting residues have been labeled.
N. protease are the phylogenetically distinct enzymes, they share a common two -barrel architecture. 3.4. Interactions between enzyme and substrate A total of 20 docked complexes were generated out of which the one with the highest glide score and forming all possible interactions with the active site residues was selected as the best pose (Fig. 8). The catalytic mechanism of serine proteases require the carbonyl carbon of scissile bond (here, carbonyl carbon of para-nitro aniline group of BAPNA) to be near the active site serine residue [34]. Out of the 20 docked complexes, carbonyl carbon of BAPNA
was observed in close vicinity of nucelophile serine residue in 5 poses. Out of these five, the pose with highest glide score (−4.98) was selected as the best pose. The hydrogen bonding interactions were visualized that showed involvement of H57, S195 (residues from catalytic triad) and G216. Apart from these residues, other conserved interactions of ligand with R41 and S214 were also observed in 4 docked complexes. Molecular dynamics simulations were performed to check the stability of the docked complex. The substrate was found to be firmly bound in the binding pocket for the whole trajectory. The RMSD graph (Fig. 9A) (time in ps on the X-axis and RMSD values of the C␣ atoms in nm on the Y-axis) concluded the structure to be
Fig. 9. Molecular dynamics simulation studies on the docked complex between N. protease and BAPNA. (A) RMSD plot of the complex generated using MDS trajectory, (B) RMSF plot of the complex generated using MDS trajectory, (C) Number of hydrogen bond interaction between S195 and BAPNA during the whole trajectory.
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
8
stable over the complete trajectory. The fluctuations of BAPNA in binding site with respect to three key residues H57, S195 and G216 were calculated using g mindist program of GROMACS. The mindist graph (Fig. 9B) showed that BAPNA fluctuates in the binding pocket within a distance of only 0.15 nm for G216 and 0.2–0.4 nm for H57 and S195, considered in the analysis for the complete trajectory. This showed that it is bound firmly in the binding pocket. The life of hydrogen bond (Fig. 9C) between the catalytic S195 and BAPNA remained constant for the total simulation time. Negligible fluctuations were observed in the catalytic and binding site residues from the root mean square fluctuation (RMSF) plot (data not shown).
3.5. Thermal simulations High-temperature molecular dynamics (MD) unfolding simulations can be pursued for studying protein unfolding. As the unfolding rates for proteins can be very slow under physiological conditions (for instance it can be years for proteins like ␣LP), the unfolding can be accelerated into the ns range required for computational analysis using very high temperatures (450–500 K) for simulations [6]. The thermal simulations for N. protease were performed at 298 K and 500 K. The N. protease was found to be stable during 10 ns of MD simulations at 298 K. The C␣ RMSD for N. protease was plotted for each of the 3 simulations performed at 500 K (Fig. 10). The protein was found to be unfolded in each simulation based on visual inspection of trajectories (Fig. 11) and the high C␣ RMSD attained. It was speculated that the major unfolding occurs in the first 3 ns as high C␣ RMSDs were attained in this time period. The unfolding events were studied with a main focus on the N-terminal  strand, domain bridge and C-terminal  hairpin. The domain bridge, which is the only covalent linkage in the two domains, was found to be intact even after 4 ns of simulation. The area covered by the domain bridge has been found to be an important factor in deciding the cooperativity of unfolding of kinetically stable proteins [5].
Fig. 10. Unfolding of N. protease monitored with RMSD values at 500 K and 298 K. Quick unfolding of N. protease could be seen below 2 ns at 500 K.
The C-terminal  hairpin becomes more mobile from 0.7 ns but remains intact till 1.8 ns, while the N-terminal  strand seems to be more flexible and pulled away from the protein body. The Cterminal -barrel seems to unfold earlier than the N-terminal  barrel at 4 ns. At 7 ns very little residual structure remained. 3.6. P value analysis Cooperativity in thermal unfolding of ␣LP has been well studied using computational approach [6]. To examine, whether the high temperature unfolding behavior of N. protease is cooperative, we have performed P value analysis for both N. protease and ␣LP (Fig. 12). This phenomenon has been very well studied for ␣LP. In the present study, we have quantified the structural persistence during thermal simulation with respect to the reference structure i.e., the structure at 0 ns. The P value analysis was also performed at 298 K for comparison. At 500 K, there was a decrease in P value for both the proteins. But, this decrease was sharper for ␣LP compared to N. protease, indicating that N. protease followed comparatively gradual temperature unfolding. This gradual (less sharp) unfolding
Fig. 11. The unfolding pathway of N. protease at 500 K illustrated by selected structures from the high temperature MD simulations.
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
9
Fig. 12. Persistence analysis. Plots of P value for N. protease and alpha lytic protease, at 298 K and 500 K, plotted as a function of simulation time.
could also be seen in structural details (Fig. 11) where the domain bridge of ␣LP was seen to be quickly unfolded by 2.4 ns while for N. protease it gradually unfolded after 4 ns. This comparative gradual unfolding of N. protease suggested its lesser kinetic stability compared to ␣LP. In summary, highly GC rich gene sequence of a serine protease from Nocardiopsis sp. was codon optimized, cloned and expressed in E. coli. Comparative investigation of pet-22b (+) and pet-39b (+) vectors toward better expression and proper folding of N. protease suggested pet-39b (+) as a better vector for proteins containing disulfide bonds. N. protease was established as a member of S1A family of PA clan, whose members have limited distribution in prokaryotes. The study features phylogenetic relatedness of N. protease to kinetically stable proteins such as NAPase and ␣LP. The docking studies highlighted the important interactions between the catalytic site pocket and the substrate, BAPNA. The in silico studies performed to evaluate the kinetic stability of N. protease indicated its kinetically stable properties lesser when compared to ␣LP. Further investigations to reveal the experimental findings are currently being pursued. The findings herein are in line with the previous reports suggesting least explored non-streptomycetes actinomycetes sp. as potential source of enzymes with exceptional stabilities. Author contribution S.R. and S.M.G. conceived the idea; S.R. designed, analyzed and performed most of the experiments. V.B., D.J. and J.P. helped in cloning and expression. R.S. and P.C. conducted the computational studies. The manuscript was drafted by S.R. with contributions from S.M.G and J.P. Acknowledgements The authors are thankful to Dr. Neelanjana Sengupta, Physical Chemistry Division, Dr. R. Suresh Kumar, Div. Biochemical Sciences, NCL and Dr. Vaishali Dixit, A.G. College, Pune, for valuable suggestions and discussions. We are also thankful to Dr. Sushim Kumar Gupta, (Faculty of Medicine and Pharmacy, Aix Marseille University, Marseille, France) for his valuable inputs. Timely help from Dr. Laura Morris, Cardiff University is appreciated. S.R. acknowledges Commonwealth Commission, UK for the Commonwealth Split-site fellowship. S.R. is supported by Senior Research Fellowship from the CSIR, New Delhi, India. Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.procbio.2014.12.025.
References [1] Sanchez-Ruiz JM. Protein kinetic stability. Biophys Chem 2010;148:1–15. [2] Cunningham EL, Agard DA. Interdependent folding of the N- and C-terminal domains defines the cooperative folding of ␣-lytic protease. Biochemistry 2003;42(45):13212–9. [3] Bryan NP. Prodomains and protein folding catalysis. Chem Rev 2002;102(12): 4805–16. [4] Jaswal SS, Sohl JL, Davis JH, Agard DA. Energetic landscape of ␣-lytic protease optimizes longevity through kinetic stability. Nature 2002;415:343–6. [5] Kelch BA, Eagen KP, Erciyas FP, Humphris EL, Thomason AR, Mitsuiki S, et al. Structural and mechanistic exploration of acid resistance: kinetic stability facilitates evolution of extremophilic behavior. J Mol Biol 2007;368:870–83. [6] Salimi NL, Ho B, Agard DA. Unfolding simulations reveal the mechanism of extreme unfolding cooperativity in the kinetically stable ␣-lytic protease. PLoS Comput Biol 2010;6:e1000689. [7] Peczynska-Czoch W, Mordarski M. Actinomycete enzymes. In: Actinomycetes in biotechnology. London: Academic Press; 1988. p. 220–83. [8] Mehta VJ, Thumar JT, Singh SP. Production of alkaline protease from an alkaliphilic actinomycete. Bioresour Technol 2006;97:1650–4. [9] Singh SP, Thumar JT, Gohel SD, Purohit MK. Molecular diversity and enzymatic potential of salt-tolerant alkaliphilic actinomycetes. In: Mendez A, editor. Current research, technology education topics in applied microbiology microbial biotechnology. Badajoz, Spain: Formatex Research Centre; 2010. p. 280–6. [10] Gohel SD, Singh SP. Purification strategies, characteristics and thermodynamic analysis of a highly thermostable alkaline protease from a salt-tolerant alkaliphilic actinomycete, Nocardiopsis alba OK-5. J Chromatogr B 2012;889–890: 61–8. [11] Wael HN, Wen-Jun L, Mohammed AIA, Hammouda O, Ahmed MS, Li-Hua X, et al. Nocardiopsis alkaliphila sp. nov., a novel alkaliphilic actinomycetes isolated from desert soil in Egypt. Int J Syst Evol Microbiol 2004;54:247–52. [12] Rohamare SB, Dixit V, Nareddy PK, Sivaramakrishna D, Swamy MJ, Gaikwad SM. Polyproline fold – in imparting kinetic stability to an alkaline serine endopeptidase. Biochim Biophys Acta 2013;1834:708–16. [13] Sone M, Akiyama Y, Ito K. Differential in vivo roles played by DsbA and DsbC in the formation of protein disulfide bonds. J Biol Chem 1997;272:10349–52. [14] Dixit VS, Pant A. Comparative characterization of two serine endopeptidases from Nocardiopsis sp. NCIM 5124. Biochim Biophys Acta 2000;1523:261–8. [15] Shivji MS, Rogers SO, Stanhope MJ. Rapid isolation of high molecular weight DNA from marine macroalgae. Mar Ecol Prog Ser 1992;84:197–203. [16] Sun H, Lapidus A, Nolan M, Lucas S, Del Rio TG, Tice H, et al. Complete genome sequence of Nocardiopsis dassonvillei type strain (IMRU 509). Stand Genomic Sci 2010;3(3):325–36. [17] Huang X, Madan A. CAP3: a DNA sequence assembly program. Genome Res 1999;9:868–77. [18] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215:403–10. [19] Eswar N, Marti-Renom MA, Webb B, Madhusudhan MS, Eramian D, Shen M, et al., Sali A. Comparative protein structure modeling with MODELLER. Curr Protoc Bioinform 2006;15:5.6.1–30. [20] Colovos C, Yeates TO. Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci 1993;2:1511–9. [21] Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 2007;35:W407–10. [22] Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 2013;30:2725–9. ¨ [23] Suite 2012: LigPrep, version 2.5. New York, NY: Schrodinger, LLC; 2012. [24] Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 2004;47:1739–49. [25] van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJ. GROMACS: fast, flexible and free. J Comput Chem 2005;26:1701–19.
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025
G Model PRBI-10319; No. of Pages 10 10
ARTICLE IN PRESS S. Rohamare et al. / Process Biochemistry xxx (2015) xxx–xxx
[26] Humphrey W, Dalke A, Schulten K. VMD – visual molecular dynamics. J Mol Graph 1996;14:33–8. [27] Chatterjee P, Sengupta N. Effect of the A30P mutation on the structural dynamics of micelle-bound ␣ synuclein released in water: a molecular dynamics study. Eur Biophys J 2002;41(5):483–9. [28] Jose JC, Sengupta N. Molecular dynamics simulation studies of the structural response of an isolated Ab1–42 monomer localized in the vicinity of the hydrophilic TiO2 surface. Eur Biophys J 2013;42:487–94. [29] de Beer TA, Berka K, Thornton JM, Laskowski RA. PDBsum additions. Nucleic Acids Res 2014;42:D292–6. [30] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W Clustal X version 2.0. Bioinformatics 2007;23:2947–8. [31] Baker D. Metastable states and folding free energy barriers. Nat Struct Biol 1998;5(12):1021–4.
[32] Rawlings ND, Barrett AJ, Bateman A. MEROPS: the peptidase database. Nucleic Acids Res 2010;38(Database issue):D227–33. [33] Page MJ, Di Cera E. Serine peptidases: classification, structure and function. Cell Mol Life Sci 2008;65:1220–36. [34] Kraut J. Serine proteases: structure and mechanism of catalysis. Annu Rev Biochem 1977;46:331–58. [36] Qiao J, Chen L, Li Y, Wang J, Zhang W, Chence S. Whole-genome sequence of Nocardiopsis alba strain ATCC BAA-2165, associated with honeybees. J Bacteriol 2012;194:6358–9. [37] Epstein DM, Wensink PC. The alpha-lytic protease gene of Lysobacter enzymogenes. The nucleotide sequence predicts a large prepro-peptide with homology to pro-peptides of other chymotrypsin-like enzymes. J Biol Chem 1988;263(32):16586–90.
Please cite this article in press as: Rohamare S, et al. Cloning, expression and in silico studies of a serine protease from a marine actinomycete (Nocardiopsis sp. NCIM 5124). Process Biochem (2015), http://dx.doi.org/10.1016/j.procbio.2014.12.025