Journal of Structural Biology xxx (xxxx) xxx–xxx
Contents lists available at ScienceDirect
Journal of Structural Biology journal homepage: www.elsevier.com/locate/yjsbi
Modeling membrane proteins: The importance of cysteine amino-acids
Evgeni Grazhdankin1, Michal Stepniewski1, Henri Xhaard⁎
Drug Research Program, Faculty of Pharmacy, Division of Pharmaceutical Chemistry and Technology, University of Helsinki, P.O. Box 56, FIN-00014 Helsinki, Finland
ARTICLE INFO
Keywords: Membrane proteins; Molecular modeling; Loop modeling; Cysteine; Disulphide bridges; CASP; GPCR Dock; GPCR
ABSTRACT
Computational modeling of membrane proteins is critical to understand biochemical systems and to support chemical biology. In this work, we use a dataset of 448 non-redundant membrane protein chains to expose a “rule” that governs membrane protein structure: free cysteine thiols are not found accessible to oxidative compartments such as the extracellular space, but are rather involved in disulphide bridges. Taking as examples the 1018 three-dimensional models produced during the GPCR Dock 2008, 2010 and 2013 competitions and 390 models for a GPCR target in CASP13, we show that this rule was not accounted for by the modeling community. We thus highlight a new direction for model development that should lead to more accurate membrane protein models, especially in the loop domains.
1. Introduction
Membrane proteins are key gateways to the cell, acting in signal transduction, transport across membranes, and energetic processes such as photosynthesis or cellular respiration. Taken here as an example, G protein-coupled receptors (GPCRs) represent a prototypical family of integral membrane proteins of key pharmaceutical interest, with nearly 300 non-olfactory rhodopsin-like members in human (Fredriksson and Schiöth, 2005; Rinne et al., 2019). The experimental structure determination rate of membrane proteins is much slower than that of soluble proteins, even in the light of recent progress, owing to difficulties at all levels of expression, purification and crystallization (Grisshammer, 2017; Maeda and Schertler, 2013). Consequently, although membrane proteins are encoded by as much as 30% of the human genome (Wallin and von Heijne, 1998), to date, the 2839 coordinate files for three-dimensional structures solved (White, 2018) represent less than 2.0% of the 144,277 protein coordinate files in the Protein Data Bank (Berman et al., 2000); a similar order of magnitude is observed when comparing the 943 unique integral membrane proteins solved (White, 2018) with the 33,819 clusters of protein chains in PDB30 (accessed September 2019). To gain insights at the atomic level into membrane protein function, structural modeling has thus been widely used during the last 30 years (see e.g. Xhaard et al., 2008, 2005; for a review see Tusnády and Simon, 2010). Membrane protein modeling essentially follows either ab initio or template-based (homology) strategies. To improve methodologies, new protocols are often tested retrospectively from 3D
structures already available, which may bias their assessment. A prospective way to assess model construction and ligand docking has thus been implemented in the Critical Assessment of Structure Prediction (CASP) challenges since 1994 (Moult et al., 2018) and in the Critical Assessment of GPCR Structure Modeling and Docking (GPCR Dock) 2008, 2010, and 2013 (Kufareva et al., 2014, 2011; Michino et al., 2009). While common quality assessment methods are able to assess general physical correctness in terms of hydrogen bonding, secondary structure, solvent exposure and pairwise residue interactions (Kryshtafovych and Fidelis, 2009), scoring models that are close to native is still beyond our reach (Ray et al., 2010). Yet membrane proteins have assets over their soluble counterparts that should help their structural modeling. Integral membrane proteins are embedded in an approximately 40 Å thick cell membrane, which provides many constraints for topology prediction and 3D structure modeling. Firstly, membrane proteins usually fold into bundles of α-helices in order to maximize the number of satisfied intramolecular hydrogen bonds (Cowan and Rosenbusch, 1994). Porins, exceptions that fold as transmembrane β-barrels, are unusual in that they are located in the outer membrane of Gram-negative bacteria. Secondly, transmembrane segments (TMs) are most often composed of hydrophobic amino acids that mirror the hydrophobic environment of the membrane (Kyte and Doolittle, 1982). Thirdly, the intracellular region of membrane proteins has an excess of positively charged amino acids compared to the extracellular region (the “positive inside rule”) (von Heijne and Gavel, 1988). In addition, some amino acids have a strong preference for certain locations within the
⁎ Corresponding author at: Biocenter 2, room 3084, Viikinkaari 5E, 00014 Helsinki, Finland. E-mail address: [email protected] (H. Xhaard).
1 Equal contribution.
https://doi.org/10.1016/j.jsb.2019.10.002 Received 9 April 2019; Received in revised form 11 September 2019; Accepted 3 October 2019 1047-8477/ © 2019 Published by Elsevier Inc.
Please cite this article as: Evgeni Grazhdankin, Michal Stepniewski and Henri Xhaard, Journal of Structural Biology, https://doi.org/10.1016/j.jsb.2019.10.002
E. Grazhdankin, et al.
membrane; for example, prolines are often located near the middle of the membrane plane and at the extremities of TMs, reflecting their involvement in helical kinks and in capping α-helices (Huang and Chen, 2012). Membrane proteins are furthermore often packed into domains organized pseudo-symmetrically (Choi et al., 2008; Goodsell and Olson, 2000). Stemming from early work on the co-evolution of spatially interacting amino acids, recent years have seen progress in introducing evolutionary information, in particular, into modeling protocols (Altschuh et al., 1987; DeBartolo et al., 2009; Ovchinnikov et al., 2017, 2015; Wu et al., 2011). These constraints have long been thought to allow the prediction of membrane protein 3D structures from sequence alone. This, however, still seems a distant goal. Accurate prediction of 3D structure from sequence is hindered by, for example, hydrophobic segments that enter the membrane without crossing it, folding into reentrant loops (Yan and Luo, 2010), or polar TMs that are found inside α-helical bundles (Hedin et al., 2010). Thus, there is a need to discover new strategies that produce more native-like integral membrane protein structures. The hypothesis that we decided to test in this study is that extracellular cysteines with a free (solvent-exposed) thiol group are not found in membrane protein structures. If the hypothesis holds true, a new “rule” can be highlighted that could help identify errors in membrane protein models. Furthermore, new methods could be developed that force disulphide bonds to form, providing additional constraints during the modeling of the extracellular domains of membrane proteins.
2. Material and methods
2.1. Data collection and analysis
Data collection and analysis were performed using in-house Python 3.7 scripts (Python Software Foundation), mainly utilizing the pandas library (v0.25, McKinney 2010). Visualizations were done with matplotlib (v3.1, Hunter 2007) or seaborn (v0.9, Waskom), and PyMOL (v2.0, PyMOL). Protein structures were handled using Biopython (v1.74, Cock et al., 2009).
2.2. Membrane protein structural data
The workflow of data collection and processing is presented in Fig. 1. We collected a dataset of 448 non-redundant membrane protein chains with at least one cysteine amino acid, originating from 1142 PDB files. We explicitly accounted for the membrane in the solvent exposure calculations. To do so, we used as a starting point the MemProtMD database (Newport et al., 2019), which provides snapshots of coarse-grained molecular dynamics simulations. Solvent accessibility (see below) was calculated on the whole structures with and without the membrane. The Orientation of Proteins in Membranes (OPM) database (Lomize et al., 2012) was queried in parallel for α-helical polytopic and β-barrel transmembrane structures. We combined the structures from the two sources by PDB code and number of cysteines, thus avoiding dissimilar oligomer reconstruction in the databases. The structures were then split by chains and reconstituted across the databases for membrane correspondence (e.g. a tight-junction protein can span two membranes). Every chain had to have at least one residue whose Cα z-coordinate lies in the OPM-defined membrane region. Furthermore, we applied a PDB30 filter, thus enforcing a maximum of 30% sequence identity between any pair of chains, and a resolution cutoff of 3.5 Å. Some numerical information on the dataset is provided as Supporting Information Figs. 1–3. The cleaned dataset will be provided upon reasonable request.
In addition, we collected structural models and their associated data submitted to the GPCR Dock editions of 2008, 2010, and 2013 (GPCR Dock 2008 models, GPCR Dock 2010 models, GPCR Dock 2013 models), and to CASP13 (Moult et al., 2018). In GPCR Dock 2008 and 2010, prior to the release of the X-ray structures of the human adenosine 2A (hAA2AR; PDB code 3EML), dopamine D3 (hD3DR; 3PBL) and chemokine CXCR4 (hCXCR4; 3OE8 and 3ODU) receptors, 29 and 32 groups submitted in total 206 and 284 structural models, respectively. In GPCR Dock 2013, 40, 39, 20 and 20 groups submitted, respectively, 181, 171, 88 and 88 models of serotonin 1B (5HT1B; 4IAR), serotonin 2B (5HT2B; 4IB4) and two SMO ligand-receptor complexes (4JKV, 4N4W). The maximum number of submissions was 10 models per target per group in GPCR Dock 2008 and five models per target per group in GPCR Dock 2010 and 2013. After the competitions closed, the submissions were scored and ranked according to measures such as the number of reproduced protein–ligand contacts and the RMSD to the respective experimental structures.
The CASP competitions (Moult et al., 2018), rounds 9–13, were also searched for relevant integral membrane protein targets using a sequence-based TM prediction tool (Krogh et al., 2001). As a result we identified two GPCR targets in CASP13, T1011 and T1013. While no structure was available for T1013 at the time of writing this manuscript, T1011 corresponds to the prostaglandin EP3 receptor bound to misoprostol-FA (PDB code 6M9T) and was taken for further analysis.
2.3. Assigning classes to cysteines and orientation of 3D structures along z-coordinate
We assigned interfaces and domains to compartments using the OPM database, whose assignments are themselves taken from the literature or from UniProt (Lomize et al., 2012). In OPM, the protein z-coordinate is normal to the membrane plane and the coordinate z = 0 coincides with the center of the membrane. This provides a straightforward indication of the depth of burial of amino acids and their location with respect to the lipids (tail, headgroup).
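The use of the z-coordinate can be illustrated with a minimal Python sketch. It assumes the ±20 Å boundaries and the sign convention (negative z on the cytoplasmic side) that appear in Table 1; the function name `assign_compartment` is hypothetical and is not taken from the in-house scripts.

```python
def assign_compartment(z, half_thickness=20.0):
    """Assign a residue to a compartment from its z-coordinate (in Å).

    Assumes an OPM-style frame: z is normal to the membrane plane and
    z = 0 is the membrane center. The +/-20 Å boundaries follow Table 1;
    treating negative z as cytoplasmic is an assumption of this sketch.
    """
    if z < -half_thickness:
        return "cytoplasmic"
    if z > half_thickness:
        return "oxidative milieu"
    return "within membrane"


# Illustrative z-coordinates of cysteine CA atoms (hypothetical values).
for z in (-31.5, 0.0, 20.0, 27.3):
    print(f"z = {z:6.1f} Å -> {assign_compartment(z)}")
```

A residue lying exactly on a boundary (z = ±20 Å) is counted as within the membrane, matching the inclusive interval used in Table 1.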
Fig. 1. Workflow of data acquisition and curation.
Fig. 2. Comparison of the solvent exposure of cysteine (A) and serine (B) amino acids. For ease of comparison between (A) and (B), both density graphs are normalized so that the area under the curve equals 1.0. Two sets of structures have been used, membrane-embedded (orange) and membrane-free (blue).
Fig. 3. Number of cysteines as a function of the z-coordinate (the normal to the membrane plane; 0 indicates the center of the membrane). A: Dataset divided by membrane classes reported in OPM. B: Cysteines assigned to classes based on their contacts, or lack thereof, to other molecules.
The solvent accessible surface of cysteine side-chains was estimated using default parameters in the software NACCESS, which utilizes a rolling-ball algorithm with a 1.4 Å probe (Hubbard and Thornton, 1993), or by FreeSASA (Mitternacht, 2016) with NACCESS parameters. Values reported are side-chain surface areas relative to the Ala-X-Ala tripeptide. Cysteines were defined to be solvent exposed if their relative SASA value was at least 0.30 when calculated with the membrane in place.
Cysteine residues within 4.0 Å of any ligand atom (HET atoms in a PDB file) were classified as ligand-facing. Cysteine pairs were defined as disulfide-bonded where the distance between the thiol sulphur atoms was less than 2.50 Å. A cysteine was considered lipid-facing if the difference in relative SASA with and without the membrane was in excess of 0.10. The remaining cysteines were considered to be free and either exposed or buried.
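The classification criteria can be summarized in a short Python sketch. The thresholds (2.50 Å, 4.0 Å, 0.10 and 0.30) are those stated in the text; the order in which the tests are applied, the sign of the SASA difference (membrane-free minus membrane-embedded), and all function and constant names are assumptions made for illustration only.

```python
import math

DISULFIDE_MAX_DIST = 2.50  # Å, between thiol sulphur (SG) atoms
LIGAND_MAX_DIST = 4.0      # Å, to the nearest ligand (HET) atom
LIPID_SASA_SHIFT = 0.10    # loss of relative SASA once the membrane is added
EXPOSED_MIN_SASA = 0.30    # relative SASA with the membrane in place


def distance(a, b):
    """Euclidean distance between two (x, y, z) coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_cysteine(sg_dist, ligand_dist, sasa_membrane, sasa_no_membrane):
    """Return one class label for a cysteine.

    sg_dist / ligand_dist: nearest SG-SG and SG-ligand distances in Å,
    or None when no partner exists. SASA values are relative (fractions).
    The precedence of the tests below is an assumption of this sketch.
    """
    if sg_dist is not None and sg_dist < DISULFIDE_MAX_DIST:
        return "disulfide-bonded"
    if ligand_dist is not None and ligand_dist <= LIGAND_MAX_DIST:
        return "ligand-facing"
    if sasa_no_membrane - sasa_membrane > LIPID_SASA_SHIFT:
        return "lipid-facing"
    return "exposed" if sasa_membrane >= EXPOSED_MIN_SASA else "buried"


# Two sulphur atoms 2.04 Å apart (hypothetical coordinates).
d = distance((0.0, 0.0, 0.0), (0.0, 0.0, 2.04))
print(classify_cysteine(d, None, 0.05, 0.05))
```

Because each cysteine receives exactly one label, the per-class counts add up to the total number of cysteines in a domain, as they do in Table 1.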
Table 1. Number of cysteine amino acids in different states across locations within the protein for the α-helical polytopic and β-barrel transmembrane proteins studied.

Domain                                  Total   Disulfide bonded   Buried   Exposed   Ligand-facing   Lipid-facing
Cytoplasmic (z < −20 Å)                   415                 15      281        61              48             10
Within Membrane (−20 Å ≤ z ≤ 20 Å)       1185                 70      633        36              82            364
Oxidative Milieu (z > 20 Å)               457                309       49        22              76              1
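As a quick consistency check on Table 1, the following Python sketch re-enters the counts by hand and verifies that the five classes add up to the total in each domain. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Counts transcribed from Table 1 (domain -> class -> number of cysteines).
TABLE1 = {
    "cytoplasmic": {"total": 415, "disulfide": 15, "buried": 281,
                    "exposed": 61, "ligand": 48, "lipid": 10},
    "membrane": {"total": 1185, "disulfide": 70, "buried": 633,
                 "exposed": 36, "ligand": 82, "lipid": 364},
    "oxidative": {"total": 457, "disulfide": 309, "buried": 49,
                  "exposed": 22, "ligand": 76, "lipid": 1},
}


def row_is_consistent(row):
    """True when the five class counts sum to the domain total."""
    return sum(v for k, v in row.items() if k != "total") == row["total"]


def class_fraction(domain, cls):
    """Fraction of the cysteines in a domain that fall into one class."""
    return TABLE1[domain][cls] / TABLE1[domain]["total"]


for name, row in TABLE1.items():
    print(name, "consistent:", row_is_consistent(row))

# Free, exposed cysteines are rare in the oxidative milieu (22 of 457),
# whereas disulphide-bonded cysteines dominate there (309 of 457).
print(f"{class_fraction('oxidative', 'exposed'):.1%}")
print(f"{class_fraction('oxidative', 'disulfide'):.1%}")
```

The same counts give the overall total of 2057 cysteines (415 + 1185 + 457) quoted in the text.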
Fig. 4. Histogram showing the number of solvent accessible cysteines in models submitted to GPCR Dock (2008, 2010) competitions as a function of the z-coordinate. Left panels, exposed cysteines; right panels, buried cysteines. (A) hAA2AR with zm241385; (B) hD3DR with eticlopride; (C) hCXCR4 with It1t; (D) hCXCR4 with CVX15. Color coding: cysteines with free thiol group not facing lipid membrane (orange); less than 5 Å from ligand (green); or facing lipid membrane (red). Cysteines with bonded thiol group involved in disulphide bridge (dark blue). The red boxes indicate regions where extracellular free cysteines should not be found.
3. Results and discussion
3.1. Alpha helical integral membrane proteins do not contain free cysteine thiols accessible to the extracellular milieu
One “rule” governing the structure of membrane proteins that is often not taken into account by protein modelers is that free cysteine thiols are only very exceptionally accessible to the extracellular milieu. As a strong nucleophile, the thiol functional group of the cysteine side chain can exist in a reduced and an oxidized state. This allows it to perform diverse biochemical functions such as catalysis or ligand and ion binding (Miseta and Csutora, 2000). Oxidized cysteines are however reactive, and their inadvertent reaction can handicap protein function (Guttmann and Powell, 2012; Marino and Gladyshev, 2010). In the cytoplasm, pathways such as the glutathione GSH/GSSG system (in Eukaryota) or the thioredoxin system (Prokaryota) reduce these oxidized thiol species (Carmel-Harel and Storz, 2000), protecting the cell. In contrast, in oxidative compartments such as the extracellular space or the rough endoplasmic reticulum, cysteines are not accessible, being buried inside the protein (Daniels et al., 2010; Marino and Gladyshev, 2010). As an alternative to burial, surface cysteine amino acids can be involved in post-translational modifications such as disulphide bridges to avoid exposing their thiol groups (Raina and Missiakas, 1997). Disulphide bridges furthermore increase the stability of secretory proteins (Medraño-Fernandez et al., 2014). More complex mechanisms can exist; for example, reversible disulphide bonds can serve as redox switches sensing oxidative stress (Vázquez-Torres, 2012). Burial or covalent binding of cysteine thiols in the exoplasmic domain of membrane proteins can be easily verified (Figs. 2, 3, Table 1). To illustrate this concept, we acquired a dataset of 448 non-redundant membrane protein chains with at least one cysteine, found in 1142 PDB files and classified according to their cellular compartments. We first compared, in proteins from our dataset, the relative solvent exposure (SASA) of serine and cysteine amino acids (Fig. 2). Serine was chosen for comparison because it has the same number of non-hydrogen atoms
Fig. 5. GPCR Dock 2008 and 2010, targets and best models. (A, D, G) Protein targets’ X-ray structures; (B, E, H) best models in terms of ECL2 RMSD; and (C, F, I) best models in terms of ECL2 RMSD with most correct bridge predictions. (A, B, C) Human adenosine 2A receptor; (D, E, F) human dopamine D3 receptor; (G, H, I) human chemokine CXCR4 receptor. Viewed from the extracellular side. ECL1 is shown in magenta, ECL2 in green, ECL3 in purple and the N-terminus in blue. Extracellular cysteines are colored orange and the remaining are gray.
as cysteine. Generally, serines are more prevalent than cysteines, as is well known. The side chains of both amino acids are usually more buried (relative SASA less than 1.0) than in a canonical Ala-Ser-Ala or Ala-Cys-Ala tripeptide. Comparing the distributions of SASA for these two amino acids in our integral membrane protein dataset, the distribution for cysteines appears skewed towards lower SASA, thus qualitatively
Table 2. Correct and incorrect pairings of cysteines in molecular models submitted to GPCR Dock 2008 and 2010. Columns: number of correct pairings per target; rows: number of incorrect pairings.

                      hAA2AR (n = 206 models)     hCXCR4 (n = 166)      hD3DR (n = 118)
Correct Pairings        0      1      2      3      0      1      2      0      1      2
Incorrect Pairings
0                      89     74      5      4     57     64     45     26     69     22
1                      17     12      0      0      0      0      0      1      0      0
≥2                      0      5      0      0      0      0      0      0      0      0
Total                 106     91      5      4     57     64     45     27     69     22
Table 3. Correct and incorrect pairings of cysteines in molecular models submitted to GPCR Dock 2013. Columns: number of correct pairings per target; rows: number of incorrect pairings.

                      5HT1B (n = 181)       5HT2B (n = 171)       SMO/LY-2940680 (n = 88)   SMO/SANT-1 (n = 88)
Correct Pairings        0      1      2       0      1      2       0      1      2           0      1      2
Incorrect Pairings
0                      56     51      0      63     47     61      51      9      8          53     11      7
1                      12     62      0       0      0      0       5      5      0           1      6      0
≥2                      0      0      0       0      0      0       0     10      0           0     10      0
Total                  68    113      0      63     47     61      56     24      8          54     27      7
cysteines are more buried than serines (Fig. 2). To study the dataset in more detail, we took advantage of the orientation and centering of the proteins along the normal to the membrane plane (along the z-coordinate, z = 0 being the center of the membrane) (Fig. 3). Out of the 2057 cysteines present in the dataset, 415 were located in the cytosol (z-coordinate less than −20 Å), 1185 within the membrane (−20 Å ≤ z ≤ 20 Å), and 457 in the exoplasmic domain or an equivalent oxidizing cellular compartment (e.g. extracellular milieu, periplasmic space, mitochondrial intermembrane space as in Fig. 3; z > 20 Å). Out of these latter 457 cysteines, 408 are accessible at the protein surface and of these only 22 are free (not disulphide bonded, not facing lipid or ligand; listed in Supporting Information Table 1). As previously described, the vast majority of cysteine amino acids, 1938 out of 2057, were found in the protein core, not accessible to solvent. This illustrates well both the hydrophobic character of cysteine amino acids and the need to avoid exposure to oxidizing compartments.
3.2. Molecular models in GPCR Dock and CASP13 contain many extracellularly-exposed free cysteine thiols
In contrast, structural models submitted to the GPCR Dock competitions in 2008 and 2010 generally displayed a considerable fraction of extracellularly-exposed free cysteine thiols (Fig. 4). Pooling all models built by the modelling community for the hAA2AR, a cysteine was placed in the extracellular region 555 times, and in 389 of these cases (70%) it was solvent accessible and not involved in a disulphide bridge. The equivalent numbers for the hD3DR and for the hCXCR4/It1t are 139/167 (83%) and 106/114 (93%) respectively, while for the hCXCR4/CVX15 models the peptidic ligand protects the cysteines from being solvent accessible.
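The counting behind these fractions can be sketched in Python as follows. The per-cysteine records, the `z > 20 Å` cutoff reused from Table 1, and the function name are assumptions for illustration; only the three reported ratios come from the text.

```python
def free_extracellular_counts(cysteines, z_cut=20.0):
    """Count extracellular cysteines and those left free and exposed.

    `cysteines` is an iterable of (z, state) pairs, where z is the
    coordinate along the membrane normal (Å) and state is a class label
    such as "exposed" or "disulfide-bonded" (labels are hypothetical).
    Returns (number free and exposed, number extracellular).
    """
    extracellular = [state for z, state in cysteines if z > z_cut]
    return extracellular.count("exposed"), len(extracellular)


# A toy model with three extracellular cysteines, two of them free.
toy_model = [(25.1, "exposed"), (31.4, "disulfide-bonded"),
             (28.0, "exposed"), (-3.2, "lipid-facing")]
free, total = free_extracellular_counts(toy_model)
print(f"{free}/{total} extracellular cysteines are free and exposed")

# Ratios reported in the text, pooled over all models of each target.
reported = {"hAA2AR": (389, 555), "hD3DR": (139, 167),
            "hCXCR4/It1t": (106, 114)}
percentages = {t: round(100 * f / n) for t, (f, n) in reported.items()}
print(percentages)
```

Summing such per-model counts over every submission for a target would give pooled ratios of the kind quoted above.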
Incomplete modeling is not an explanation for these observations, since only a few models (9/206 hAA2AR models, 4/118 dopamine D3 receptor models) lack a significant fragment of ECL2 (between 9 and 29 amino acids) whose absence could expose an extracellular free cysteine thiol to the solvent. The native GPCR X-ray structures, on the other hand, contain no free solvent-accessible cysteines: all extracellular cysteines are involved in disulphide bridges within or across protein loops (Fig. 5, panels A, D and G). The targets share a bridge between the second extracellular loop (ECL2) and TM3 (C77-C166 in hAA2AR, C103-C181 in hD3DR, C109-C186 in hCXCR4), which was widely suggested to be conserved across GPCRs at the time of the competitions (Kufareva et al., 2011). This bridge was
Fig. 6. Calculated all-atom RMSD values for the whole structures and TM-regions of all models (n = 390) of T1011 in CASP13. Models with RMSD in excess of 100 were dismissed. Inset zooms in on the lower RMSD values. In blue the whole structure, and in orange only the transmembrane helices.

Table 4. Correct and incorrect pairings of cysteines in CASP13 for the prostaglandin EP3 receptor (target T1011). Columns: number of correct pairings; rows: number of incorrect pairings.

                      Prostaglandin EP3 (n = 390 models)
Correct Pairings        0      1
Incorrect Pairings
0                     269    113
1                       1      7
Fig. 7. CASP13, target T1011, EP3 receptor. (A) X-ray structure (PDB code 6M9T), (B) best model T1011TS043_1. Viewed from the extracellular side. ECL1 is shown in magenta, ECL2 in green, ECL3 in purple and N-terminus in blue. Extracellular cysteines are colored orange and the remaining are gray. The amino acid sequence is also shown, stressing that a portion of the N-terminus including two cysteines is missing (bottom panel).
placed in only 84/206 models of hAA2AR (~40% prediction rate), in 90/118 models of hD3DR (76%) and in 107/166 models of hCXCR4 (64%). Both hAA2AR and hD3DR have a bridge within ECL3 (C259-C262 in hAA2AR, C355-C358 in hD3DR) linking two cysteines close in sequence. There was little evidence for this ECL3 bridge in hAA2AR at the time of the competition, and it was successfully placed in 17/206 models; two years later, when the hAA2AR structure could be used to infer the bridge, 23 models of hD3DR (19%) successfully predicted an equivalent bridge. The most difficult bridges have been those spanning long distances. The hCXCR4 contains a bridge between the N-terminal segment and loop ECL3 (C28-C274), and that bridge was successfully placed in 47/166 models (28%). In the hAA2AR, there are two bridges between ECL1 and ECL2 (C71-C159 and C74-C146), in addition to the conserved bridge between the top of TM3 and ECL2. These proved most challenging and were successfully placed in 7 and 5 models respectively (3% and 2% prediction rate). The ECL1-ECL2 bridges of hAA2AR are not widely conserved across sequences, which made their prediction even more difficult: the C71-C159 bridge is conserved within vertebrates and C74-C146 only within the mammalian ADORA2A subtype, where the group shares 90% TM sequence identity. Notably, four models
contained three correctly predicted bridges for the hAA2AR (participant Pogozheva and Lomize: ECL2-TM3 and the two ECL1-ECL2 bridges). A matrix quantifying the success of pairing the actual and predicted bridges can be seen in Tables 2 and 3. For example, concerning the hAA2AR models from GPCR Dock 2008 in Table 2 (n = 206 models), 89 models were submitted without any disulphide bridges (0 correct and 0 incorrect pairings), 74 models with one bridge that was correct (most likely the TM3-ECL2 bridge), and five models with one correct pairing and two or more disulphide bridges not found in the X-ray structure. In order to get a recent point of view on modeling practices, we also turned to the CASP13 experiment (Moult et al., 2018). While there were about ten integral membrane protein targets in the recent CASP rounds (9 to 13), only a single GPCR with a released structure was found, the prostaglandin EP3 receptor bound to misoprostol-FA (Target T1011, PDB code 6M9T). We decided to investigate this structure as representative of the CASP membrane protein models. The data provided at the CASP13 website on the local model-target RMSD along the sequence (http://predictioncenter.org/casp13/local_acc_plot.cgi?target=T1011 (accessed 28.08.19)), together with global data on RMSD (Fig. 6), clearly indicate that a template (homology) modeling procedure was
Fig. 8. Histogram of the distribution of TM backbone and ECL2 RMSD for GPCR Dock 2008/2010 models of hAA2AR (A), hD3DR (B), and hCXCR4 (C). Models with 3 (blue), 2 (cyan),1 (green), and no (orange) bridges correctly constructed.
used for only about two-thirds of the models. The transmembrane helices were then, not surprisingly, the best modelled regions (Fig. 6). The prostaglandin EP3 receptor is a Class-A GPCR that harbors the hallmark disulphide bridge between ECL2 and TM3 (C130-C208). The CASP community seemed less aware of the very likely conservation of this bridge than the GPCR Dock community, and it was placed in only 30% (120/390) of the models (Table 4), about half the rate seen in the latest GPCR Dock competition. This highlights the limited progress made by the community as a whole in accounting for disulphide bridges in modeling protocols, as well as the importance of family-specific knowledge for modeling. The conserved bridge was actually missing from the best model (measured using the GDT_TS metric (Zemla, 2003); Fig. 6), so accounting for it could have pushed the success of the modeling exercise a little further. In addition, the target T1011 is interesting because it contains two solvent-exposed cysteines, C47 and C325. A careful consideration of its amino acid sequence (Fig. 7; for the sequence in the PDB file, see Uniprot code P43115 for the native receptor) shows that there are two other cysteines that should reside in the extracellular domain and thus could form disulphide bridges: C14 (N-terminus) and C315 (ECL3). This observation gives grounds to speculate that these cysteines could be paired.
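The pairing matrices of Tables 2 to 4 can be tallied with a few lines of Python. The sketch below is an assumption about how such a tally could be done, not the authors' script: bridges are represented as pairs of residue numbers, a model bridge counts as correct when the same pair exists in the X-ray structure, and any other model bridge counts as incorrect.

```python
from collections import Counter


def count_pairings(model_bridges, native_bridges):
    """Return (correct, incorrect) disulphide pairings for one model.

    Bridges are given as pairs of residue numbers; the order within a
    pair is irrelevant, so each pair is stored as a frozenset.
    """
    model = {frozenset(pair) for pair in model_bridges}
    native = {frozenset(pair) for pair in native_bridges}
    return len(model & native), len(model - native)


def pairing_matrix(models, native_bridges):
    """Count models per (correct, incorrect) combination."""
    return Counter(count_pairings(m, native_bridges) for m in models)


# Native hAA2AR bridges quoted in the text.
native_haa2ar = [(77, 166), (71, 159), (74, 146), (259, 262)]

# Three hypothetical submissions: no bridge, the conserved TM3-ECL2
# bridge only, and the conserved bridge plus one mispaired bridge.
models = [[], [(77, 166)], [(166, 77), (71, 146)]]
print(pairing_matrix(models, native_haa2ar))
```

Each key of the resulting counter corresponds to one cell of a pairing table, for instance `(1, 0)` for models with one correct and no incorrect pairing.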
3.3. How well does properly pairing extracellular cysteines coincide with model accuracy?
We last compared, for the GPCR Dock 2008 and 2010 competitions, the best models in terms of ranking (RMSD in the second extracellular loop, ECL2) with those that predicted all or most bridges (Fig. 8). For the human adenosine 2A receptor, model mod9ijk, without correct bridge predictions, attains 5.7 Å RMSD in ECL2 with respect to the X-ray structure, while mod2upu, with three correct predictions, has an ECL2 RMSD to X-ray of 8.9 Å. For the human dopamine D3 receptor, D3_4374_0003 without correct predictions has about the same RMSD performance as D3_3646_0004 with two correct predictions (3.3 Å and 3.7 Å ECL2 RMSD). For the human chemokine CXCR4 receptor, CXCR4_1_0400_0002, with one correct prediction, has an ECL2 RMSD to X-ray of 7.5 Å, while CXCR4_1_7533_0004, with no correct predictions, performs better with 5.3 Å ECL2 RMSD to X-ray. As a result, there seems to be no link between RMSD as an indicator of model correctness and the correct prediction of loop bridges. Considering the quality indicators (TMs and ECL2 RMSD) given in GPCR Dock on a broader scale, there is no apparent trend for models with correct bridges to be comparatively better than models where bridges have not been correctly paired (Fig. 8). Constructing disulphide bridges correctly is therefore not sufficient for a model to outperform models that lack them; nor is it a guarantee of model correctness in terms of backbone RMSD in the TMs or in ECL2. However, and by definition, properly pairing cysteine amino acids does improve the accuracy of molecular models. Furthermore, pairing disulphide bridges could be developed as a valuable strategy to guide the folding of extracellular domains of membrane proteins.
4. Conclusions
In this manuscript, we introduce a rule that has in the past not generally been accounted for by the modeling community: the thiol groups of cysteine amino acids should not be left free and exposed to the extracellular milieu. Instead, cysteine amino acids should be buried in the protein interior, paired into disulphide bridges, or covalently attached to other interacting molecules. The rule we present in this paper should serve as a guideline for building molecular models; it is relatively easy to implement and should help to drive efforts towards better prediction of the extracellular domain of membrane proteins.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We thank the DDCB consortium and CSC - IT Center for Science Ltd supported by HiLIFE and Faculty of Pharmacy, University of Helsinki. Professor Mark S. Johnson (Åbo Akademi University, Finland) is thanked for helpful discussions.
Funding
The National Doctoral Programme in Informational and Structural Biology is thanked for financial support to M.S.
Author contributions
The study was conducted through contributions of all authors. All authors read and approved the final version of the manuscript.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jsb.2019.10.002.
References
Altschuh, D., Lesk, A.M., Bloomer, A.C., Klug, A., 1987. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J. Mol. Biol. 193, 693–707. https://doi.org/10.1016/0022-2836(87)90352-4.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E., 2000. The Protein Data Bank. Nucleic Acids Res. 28, 235–242. https://doi.org/10.1093/nar/28.1.235.
Carmel-Harel, O., Storz, G., 2000. Roles of the Glutathione- and Thioredoxin-Dependent Reduction Systems in the Escherichia Coli and Saccharomyces Cerevisiae Responses to Oxidative Stress. Annu. Rev. Microbiol. 54, 439–461. https://doi.org/10.1146/annurev.micro.54.1.439.
Choi, S., Jeon, J., Yang, J.-S., Kim, S., 2008. Common occurrence of internal repeat symmetry in membrane proteins. Proteins Struct. Funct. Bioinforma. 71, 68–80. https://doi.org/10.1002/prot.21656.
Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J.L., 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423.
Cowan, S.W., Rosenbusch, J.P., 1994. Folding pattern diversity of integral membrane proteins. Science 264, 914–916.
Daniels, R., Mellroth, P., Bernsel, A., Neiers, F., Normark, S., von Heijne, G., Henriques-Normark, B., 2010. Disulfide Bond Formation and Cysteine Exclusion in Gram-positive Bacteria. J. Biol. Chem. 285, 3300–3309. https://doi.org/10.1074/jbc.M109.081398.
Evaluated by the GPCR Dock 2013 Assessment: Meeting New Challenges. Structure 22, 1120–1139. https://doi.org/10.1016/J.STR.2014.06.012.
Kufareva, I., Rueda, M., Katritch, V., Stevens, R.C., Abagyan, R., GPCR Dock 2010 participants, 2011. Status of GPCR Modeling and Docking as Reflected by Community-wide GPCR Dock 2010 Assessment. Structure 19, 1108–1126. https://doi.org/10.1016/j.str.2011.05.012.
Kyte, J., Doolittle, R.F., 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
Lomize, M.A., Pogozheva, I.D., Joo, H., Mosberg, H.I., Lomize, A.L., 2012. OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res. 40, D370–D376. https://doi.org/10.1093/nar/gkr703.
McKinney, W., 2010. Data Structures for Statistical Computing in Python. In: Proceedings of the 9th Python in Science Conference, pp. 51–56.
Maeda, S., Schertler, G.F., 2013. Production of GPCR and GPCR complexes for structure determination. Curr. Opin. Struct. Biol. 23, 381–392. https://doi.org/10.1016/J.SBI.2013.04.006.
Marino, S.M., Gladyshev, V.N., 2010. Cysteine Function Governs Its Conservation and Degeneration and Restricts Its Utilization on Protein Surfaces. J. Mol. Biol. 404, 902–916. https://doi.org/10.1016/j.jmb.2010.09.027.
Medraño-Fernandez, I., Fagioli, C., Mezghrani, A., Otsu, M., Sitia, R., 2014. Different redox sensitivity of endoplasmic reticulum associated degradation clients suggests a novel role for disulphide bonds in secretory proteins. Biochem. Cell Biol. 92, 113–118. https://doi.org/10.1139/bcb-2013-0090.
Michino, M., Abola, E., Brooks, C.L., Dixon, J.S., Moult, J., Stevens, R.C., 2009. Community-wide assessment of GPCR structure modelling and ligand docking: GPCR Dock 2008. Nat. Rev. Drug Discov. 8, 455–463. https://doi.org/10.1038/nrd2877.
Miseta, A., Csutora, P., 2000. Relationship Between the Occurrence of Cysteine in Proteins and the Complexity of Organisms. Mol. Biol. Evol. 17, 1232–1239. https://doi.org/10.1093/oxfordjournals.molbev.a026406.
Mitternacht, S., 2016. FreeSASA: an open source C library for solvent accessible surface area calculations. F1000Research 5, 189. doi:10.12688/f1000research.7931.1.
Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T., Tramontano, A., 2018. Critical assessment of methods of protein structure prediction (CASP)-Round XII. Proteins Struct. Funct. Bioinforma. 86, 7–15. https://doi.org/10.1002/prot.25415.
Newport, T.D., Sansom, M.S.P., Stansfeld, P.J., 2019. The MemProtMD database: a resource for membrane-embedded protein structures and their lipid interactions. Nucleic Acids Res. 47, D390–D397. https://doi.org/10.1093/nar/gky1047.
Ovchinnikov, S., Kinch, L., Park, H., Liao, Y., Pei, J., Kim, D.E., Kamisetty, H., Grishin, N.V., Baker, D., 2015. Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4. https://doi.org/10.7554/eLife.09248.
Ovchinnikov, S., Park, H., Varghese, N., Huang, P.-S., Pavlopoulos, G.A., Kim, D.E., Kamisetty, H., Kyrpides, N.C., Baker, D., 2017. Protein structure determination using metagenome sequence data. Science 355, 294–298. https://doi.org/10.1126/science.aah4043.
PyMOL. The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.
Python Software Foundation. Python Language Reference, version 3.7. Available at http://www.python.org (accessed 11.09.19).
Raina, S., Missiakas, D., 1997. Making and breaking disulfide bonds. Annu. Rev. Microbiol. 51, 179–202. https://doi.org/10.1146/annurev.micro.51.1.179.
Ray, A., Lindahl, E., Wallner, B., 2010. Model quality assessment for membrane proteins. Bioinformatics 26, 3067–3074. https://doi.org/10.1093/bioinformatics/btq581.
Rinne, M., Tanoli, Z.-U.-R., Khan, A., Xhaard, H., 2019. Cartography of rhodopsin-like G protein-coupled receptors across vertebrate genomes. Sci. Rep. 9, 7058. https://doi.org/10.1038/s41598-018-33120-8.
Tusnády, G.E., Simon, I., 2010. Topology prediction of helical transmembrane proteins: how far have we reached? Curr. Protein Pept. Sci. 11, 550–561.
Vázquez-Torres, A., 2012. Redox Active Thiol Sensors of Oxidative and Nitrosative Stress. Antioxid. Redox Signal. 17, 1201–1214. https://doi.org/10.1089/ars.2012.4522.
von Heijne, G., Gavel, Y., 1988. Topogenic signals in integral membrane proteins. Eur. J. Biochem. 174, 671–678.
Wallin, E., von Heijne, G., 1998. Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 7, 1029–1038. https://doi.org/10.1002/pro.5560070420.
Waskom, M., 2019. Seaborn. Available at http://seaborn.pydata.org (accessed 11.09.19).
White, S., 2018. mpstruc. http://blanco.biomol.uci.edu/mpstruc (accessed 7.9.18).
Wu, S., Szilagyi, A., Zhang, Y., 2011. Improving Protein Structure Prediction Using Multiple Sequence-Based Contact Predictions. Structure 19, 1182–1191. https://doi.org/10.1016/j.str.2011.05.004.
Xhaard, H., Backström, V., Denessiouk, K., Johnson, M.S., 2008. Coordination of Na+ by Monoamine Ligands in Dopamine, Norepinephrine, and Serotonin Transporters. J. Chem. Inf. Model. 48, 1423–1437. https://doi.org/10.1021/ci700255d.
Xhaard, H., Nyrönen, T., Rantanen, V.-V., Ruuskanen, J.O., Laurila, J., Salminen, T., Scheinin, M., Johnson, M.S., 2005. Model structures of α-2 adrenoceptors in complex with automatically docked antagonist ligands raise the possibility of interactions dissimilar from agonist ligands. J. Struct. Biol. 150, 126–143. https://doi.org/10.1016/j.jsb.2004.12.008.
Yan, C., Luo, J., 2010. An Analysis of Reentrant Loops. Protein J. 29, 350–354. https://doi.org/10.1007/s10930-010-9259-z.
Zemla, A., 2003. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 31, 3370–3374. https://doi.org/10.1093/nar/gkg571.
DeBartolo, J., Colubri, A., Jha, A.K., Fitzgerald, J.E., Freed, K.F., Sosnick, T.R., 2009. Mimicking the folding pathway to improve homology-free protein structure prediction. Proc. Natl. Acad. Sci. USA 106, 3734–3739. https://doi.org/10.1073/pnas. 0811363106. Fredriksson, R., Schiöth, H.B., 2005. The Repertoire of G-Protein–Coupled Receptors in Fully Sequenced Genomes. Mol. Pharmacol. 67, 1414–1425. https://doi.org/10. 1124/mol.104.009001. Goodsell, D.S., Olson, A.J., 2000. Structural Symmetry and Protein Function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153. https://doi.org/10.1146/annurev.biophys.29. 1.105. GPCR Dock 2008 models. http://jcimpt.scripss.edu/gpcr_dock.html (accessed 01/2019). GPCR Dock 2010 models. http://ablab.ucsd.edu/GPCRDock2010/ (accessed 01/2019). GPCR Dock 2013 models. http://ablab.ucsd.edu/GPCRDock2013/ (accessed 01/2019). Grisshammer, R., 2017. New approaches towards the understanding of integral membrane proteins: a structural perspective on G protein-coupled receptors. Protein Sci. 26, 1493–1504. https://doi.org/10.1002/pro.3200. Guttmann, R.P., Powell, T.J., 2012. Redox Regulation of Cysteine-Dependent Enzymes in Neurodegeneration. Int. J. Cell Biol. 2012, 1–8. https://doi.org/10.1155/2012/ 703164. Hedin, L.E., Öjemalm, K., Bernsel, A., Hennerdal, A., Illergård, K., Enquist, K., Kauko, A., Cristobal, S., von Heijne, G., Lerch-Bader, M., Nilsson, I., Elofsson, A., 2010. Membrane Insertion of Marginally Hydrophobic Transmembrane Helices Depends on Sequence Context. J. Mol. Biol. 396, 221–229. https://doi.org/10.1016/j.jmb.2009. 11.036. Huang, Y.-H., Chen, C.-M., 2012. Statistical analyses and computational prediction of helical kinks in membrane proteins. J. Comput. Aided. Mol. Des. 26, 1171–1185. https://doi.org/10.1007/s10822-012-9607-5. Hubbard, S.J., Thronton, J.M., 1993. NACCESS. Computer program. Department of Biochemistry and Molecular Biology, University College, London. Hunter, J.D., 2007. 
Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 9, 90–95. Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L., 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580. https://doi.org/10.1006/JMBI.2000.4315. Kryshtafovych, A., Fidelis, K., 2009. Protein structure prediction and model quality assessment. Drug Discov. Today 14, 386–393. https://doi.org/10.1016/j.drudis.2008. 11.010. Kufareva, I., Katritch, V., Stevens, R.C., Abagyan, R., 2014. Advances in GPCR Modeling