Available online at www.sciencedirect.com
Carbohydrate 3D structure validation
Robbie P Joosten1 and Thomas Lütteke2

Glycoproteins and protein–carbohydrate complexes in the worldwide Protein Data Bank (wwPDB) can be an excellent source of information for glycoscientists. Unfortunately, a rather large number of errors and inconsistencies are found in the glycan moieties of these 3D structures. This review illustrates frequent problems of carbohydrate moieties in wwPDB entries, such as nomenclature issues, incorrect N-glycan core structures, missing or erroneous linkages, and poor glycan geometry, and describes the carbohydrate-specific validation tools that are designed to identify such problems. Recommendations on how to avoid these issues and how to rectify incorrect structures are also given.

Addresses
1 Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
2 Institute of Veterinary Physiology and Biochemistry, Justus-Liebig-University Giessen, Frankfurter Str. 100, 35392 Giessen, Germany

Corresponding author: Lütteke, Thomas ([email protected])
Current Opinion in Structural Biology 2017, 44:9–17
This review comes from a themed issue on Carbohydrates
Edited by Ute Krengel and Thilo Stehle
http://dx.doi.org/10.1016/j.sbi.2016.10.010
0959-440X/© 2016 Elsevier Ltd. All rights reserved.

Introduction
Carbohydrates in biological contexts
Carbohydrates, often referred to as glycans, are one of the four major classes of biomolecules, alongside nucleic acids, proteins, and lipids. In addition to their well-known role in energy metabolism, glycans fulfill a multitude of other important functions: glycosylation is the most frequent and at the same time the most complex modification of proteins. More than half of all proteins are glycoproteins [1]. Potential glycosylation sites are encoded in the amino acid sequence as sequence motifs, so-called sequons. However, unlike the protein sequence itself, the composition of glycans is encoded only indirectly in the genome, via the expression of the glycosyltransferases and glycoside hydrolases that build the glycan chains [2]. As a result, the glycans attached to a glycoprotein depend not only on the species or the individual, but also on the tissue or the state of the cell [1]. Even within one individual cell, different copies of one glycoprotein can carry different glycan chains. This effect is called microheterogeneity [3,4]. Nevertheless, some parts of the glycans, such as the N-glycan core structure, are highly conserved. The complexity of the glycosylation machinery indicates the importance of glycosylation. For example, glycosylation is involved in protein folding via the calnexin/calreticulin system [5], the glycans stabilize the protein conformation and shield the protein from proteases (e.g. the trypsin resistance of the 1918 pandemic flu neuraminidase [6]), and they can serve as labels in intracellular trafficking [7]. Glycoproteins and glycolipids on the cell surface form the glycocalyx, a dense layer of glycans that covers the cells [1]. These glycans play an important role in a series of cell–cell and cell–matrix interaction events, ranging from fertilization and cell differentiation to host–pathogen interactions, inflammation, and immune defense [1,2,8–11].

Carbohydrates in the wwPDB
Details of glycosylation and protein–carbohydrate interactions at the molecular level can be learned from 3D structures of glycoproteins and protein–carbohydrate complexes. Such structures can be found in the worldwide Protein Data Bank (wwPDB) [12]. However, the PDB and mmCIF file formats, which were designed for protein structures, cause some problems when storing glycans. Residue names are limited to three characters in PDB files, which is sufficient for amino acids, but a systematic carbohydrate nomenclature needs longer names to encode hundreds of different residues. Therefore, the abbreviations used for carbohydrate residues (as for many small-molecule ligands) are often unrelated to the common names of these residues. The mmCIF format in principle allows longer residue names, but this possibility is currently not used, as it would break backward compatibility with the PDB format. Furthermore, some residue names used to encode carbohydrates in wwPDB entries define not monosaccharides but oligosaccharides or glycoconjugates. In these residues, the constituent monosaccharides and their linkages are encoded implicitly, whereas for oligosaccharides built from monosaccharide residues the linkages are stated explicitly in LINK records. This causes ambiguity, because the same compound can be described in more than one way in the PDB. For instance, maltose can be described as a single entity named MAL or as two α-D-glucose (GLC) residues (see Table 1 for definitions of the three-letter codes). Because of these limitations of the current file formats, it is difficult not only to encode carbohydrates in wwPDB entries, but also to search for specific glycans in the wwPDB.
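The MAL-versus-two-GLC ambiguity can be made concrete with a small normalization sketch: both encodings are reduced to one order-independent description so that a search would treat them as the same compound. Everything here is hypothetical and illustrative: the expansion table holds only MAL (Table 1: α-D-Glcp-(1-4)-α-D-Glcp), and the canonical form (sorted residue codes plus sorted donor/acceptor linkage terms) is an assumed representation, not how the wwPDB or any search service stores glycans.

```python
# Hypothetical sketch: compare two encodings of the same small carbohydrate ligand,
# (1) one oligosaccharide residue code, or (2) monosaccharide residues plus LINK-style linkages.

# Assumed expansion table; only MAL is included (Table 1: a-D-Glcp-(1-4)-a-D-Glcp).
OLIGOSACCHARIDE_CODES = {
    "MAL": {"residues": ("GLC", "GLC"), "links": ((0, 1, 1, 4),)},
}


def canonical_form(residues, links=()):
    """Return an order-independent description of a small glycan ligand.

    residues: sequence of wwPDB residue codes.
    links: (donor_index, acceptor_index, donor_position, acceptor_position) tuples,
           e.g. (0, 1, 1, 4) for a (1-4) linkage from residue 0 to residue 1.
    """
    # A single oligosaccharide code is expanded into its implicit components.
    if len(residues) == 1 and residues[0] in OLIGOSACCHARIDE_CODES:
        spec = OLIGOSACCHARIDE_CODES[residues[0]]
        residues, links = spec["residues"], spec["links"]
    link_terms = sorted(
        (residues[donor], donor_pos, residues[acceptor], acceptor_pos)
        for donor, acceptor, donor_pos, acceptor_pos in links
    )
    # Branching topology is discarded: adequate for a disaccharide, not in general.
    return tuple(sorted(residues)), tuple(link_terms)


# Maltose as one MAL residue and as two GLC residues with a (1-4) LINK.
as_single_code = canonical_form(["MAL"])
as_two_residues = canonical_form(["GLC", "GLC"], [(0, 1, 1, 4)])
print(as_single_code == as_two_residues)  # True
```

A real implementation would need the full set of oligosaccharide codes and a graph comparison rather than sorted tuples, since larger branched glycans can share the same residue and linkage multisets.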
Table 1
Definition of wwPDB carbohydrate residue names used in this manuscript

wwPDB 3-letter code   Definition
0WK                   1,5-anhydro-6-O-phosphono-D-glucitol
5AX                   2-(acetylamino)-1,5-anhydro-2-deoxy-D-glucitol
BGC                   β-D-Glcp
BMA                   β-D-Manp
FUC                   α-L-Fucp
FUL                   β-L-Fucp
GLC                   α-D-Glcp
MAL                   α-D-Glcp-(1-4)-α-D-Glcp
MAN                   α-D-Manp
NAG                   β-D-GlcpNAc
NDG                   α-D-GlcpNAc
NGS                   6-sulfono-β-D-GlcpNAc
SIA                   α-D-Neup5Ac
XYP                   β-D-Xylp
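Several codes in Table 1 come in α/β pairs for the same monosaccharide (GLC/BGC, MAN/BMA, NDG/NAG, FUC/FUL), which is where the naming problems described next arise. A minimal lookup over that subset of Table 1 might look as follows. The function name and return convention are illustrative assumptions, and the observed anomer would have to come from the model geometry; a mismatch only says that name and coordinates disagree, not which of the two is wrong.

```python
# Subset of Table 1: residue code -> (monosaccharide, anomer).
# Only codes that form alpha/beta pairs are included (illustrative, not exhaustive).
TABLE_1_ANOMERS = {
    "GLC": ("D-Glcp", "alpha"), "BGC": ("D-Glcp", "beta"),
    "MAN": ("D-Manp", "alpha"), "BMA": ("D-Manp", "beta"),
    "NDG": ("D-GlcpNAc", "alpha"), "NAG": ("D-GlcpNAc", "beta"),
    "FUC": ("L-Fucp", "alpha"), "FUL": ("L-Fucp", "beta"),
}


def matching_code(code, observed_anomer):
    """Return None if `code` agrees with the anomer observed in the model,
    otherwise the code of the same monosaccharide with the observed anomer.

    Hypothetical helper: it cannot decide whether the name or the
    coordinates are the actual error.
    """
    monosaccharide, named_anomer = TABLE_1_ANOMERS[code]
    if named_anomer == observed_anomer:
        return None
    for other, (mono, anomer) in TABLE_1_ANOMERS.items():
        if mono == monosaccharide and anomer == observed_anomer:
            return other
    return None


print(matching_code("MAN", "beta"))  # BMA
print(matching_code("NAG", "beta"))  # None
```

For an N-glycan core the second step would be to decide, from the expected core composition, whether to rename (as for BMA named MAN) or to rebuild (as for NAG modeled as NDG).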
Errors in wwPDB carbohydrate structures
In a correct description of a carbohydrate residue in a wwPDB entry, three things must match: the true chemical nature of the compound (e.g. α-D-glucose), the residue name (GLC), and the modeled atomic structure. Unfortunately, a rather large portion of the carbohydrates in the wwPDB features some kind of error [13,14,15,16,17]. Mismatches between the residues defined by the PDB residue names and the residues actually present in the structure are a frequent problem. They can be caused by selecting a wrong residue name, but also by issues in the structure model, such as wrong stereochemistry at individual atoms. Ideally, the experimental data in the form of electron density maps indicate the nature of the mismatch, but in practice many electron density maps are not well defined enough to make this distinction. For carbohydrate ligands in protein–carbohydrate complexes, it is hard to decide without further information whether the residue name or the structure is erroneous. For glycans that are covalently linked to Asn side-chains (N-glycans), which share a highly conserved core structure (Figure 1a), it is easier to tell which residues should be present. However, both wrongly selected residue names and erroneous stereochemistry of monosaccharides within the N-glycan core can be found [16]. O-Glycans are more difficult to validate, because several different types of O-glycosylation exist, such as O-mannosylation, mucin-type O-glycans (O-GalNAc), proteoglycans, or O-GlcNAc. Furthermore, different core structures can be present within one type of O-glycosylation [18]. Ligands are even more difficult to validate, because virtually any glycan ligand could have been added to the protein in the experiment.

Residue naming and N-glycan core structure issues
A common problem with carbohydrates in wwPDB entries is incorrect residue naming. Most frequently,
the anomeric center (see Side panel A), that is, the carbon that links via a glycosidic linkage to the previous monosaccharide, is affected, such as β-D-Manp (BMA) named MAN (α-D-Manp) or vice versa. Often, however, the mismatch is not just a notation problem but is caused by incorrect geometry of the structure model, for example α-D-GlcpNAc (NDG) residues instead of β-D-GlcpNAc (NAG), or β-L-Fucp (FUL) instead of α-L-Fucp (FUC) in (core-fucosylated) N-glycan structures (Figure 1b). NDG or FUL should not be present at all in non-engineered N-glycans, because no known biosynthetic pathway generates N-glycans with these residues. The background of this type of error illustrates the complexity of dealing with carbohydrates that both wwPDB depositors (mostly X-ray crystallographers) and annotators face. As hydrogen atoms are typically not observed in a crystallographic experiment, the anomer is defined solely by the position of the anomeric hydroxy group, which is lost when a glycosidic linkage is formed. Therefore, the correct anomer cannot be identified from the atomic coordinates of the monosaccharide alone; the bonded monosaccharide (or amino acid side-chain) must be taken into account. With poor electron density, it may prove quite difficult to get the relative positions of the two residues right during model building. During model refinement, geometric restraints will regularize the geometry of the glycosidic linkage into the alpha or beta form, provided that the linkage is detected (otherwise, restraints against too-close atomic contacts will push the residues apart), but incorrect input geometry may lead to the selection of incorrect restraints. What should be an alpha linkage may end up looking like a beta linkage, or vice versa, particularly if the restraints outweigh the experimental data.
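Because the anomer of a linked residue can only be read off relative to its bonded partner, a geometric check has to use the aglycone atom rather than a (removed) O1. The sketch below assumes an aldohexopyranose with a C6 atom (for example the GLC/BGC, MAN/BMA, NAG/NDG, and FUC/FUL pairs), where the anomeric reference atom is C5. It applies the Haworth-style rule that follows from the definition in Side panel A: the residue is β when the exocyclic atom at C1 and C6 lie on the same face of the ring, and α when they lie on opposite faces; this holds for both D- and L-sugars. The face is judged locally at each atom from the ring traversal order. The function names are hypothetical, and the code ignores alternate conformations, occupancies, and missing atoms.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ring_face(atom: Vec, prev_atom: Vec, next_atom: Vec, substituent: Vec) -> int:
    """+1 or -1 for the side of the ring on which an exocyclic substituent lies,
    using the local normal defined by the traversal prev_atom -> atom -> next_atom."""
    normal = _cross(_sub(next_atom, atom), _sub(prev_atom, atom))
    return 1 if _dot(normal, _sub(substituent, atom)) > 0 else -1


def pyranose_anomer(c1: Vec, c2: Vec, c4: Vec, c5: Vec, o5: Vec,
                    c6: Vec, x1: Vec) -> str:
    """Anomer implied by the coordinates of an aldohexopyranose (hypothetical helper).

    x1 is the exocyclic atom bonded to C1: O1 for a free reducing end, or the
    aglycone atom (e.g. O4 of the next residue, ND2 of Asn) for a linked residue.
    Ring traversal order used here: C1 -> C2 -> C3 -> C4 -> C5 -> O5 -> C1.
    """
    side_x1 = ring_face(c1, o5, c2, x1)  # neighbours of C1: O5 (previous), C2 (next)
    side_c6 = ring_face(c5, c4, o5, c6)  # neighbours of C5: C4 (previous), O5 (next)
    return "beta" if side_x1 == side_c6 else "alpha"


# Toy check on an idealized flat hexagon (not a real sugar geometry).
hexagon = {"C1": (1.0, 0.0, 0.0), "C2": (0.5, 0.866, 0.0), "C4": (-1.0, 0.0, 0.0),
           "C5": (-0.5, -0.866, 0.0), "O5": (0.5, -0.866, 0.0)}
c6_up = (-0.5, -0.866, 1.0)
x1_down = (1.0, 0.0, -1.0)
print(pyranose_anomer(hexagon["C1"], hexagon["C2"], hexagon["C4"], hexagon["C5"],
                      hexagon["O5"], c6_up, x1_down))  # alpha
```

Applied to a NAG attached to Asn, passing ND2 as x1 and getting "alpha" would be the NAG-modeled-as-NDG situation described above; the electron density still has to decide whether the name or the model needs fixing.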
When a structure model with such an incorrect linkage is deposited at the PDB, the monosaccharide might be renamed during annotation to fit the (incorrect) model geometry, for example NAG renamed to NDG. Depositors are advised of such changes, but they may choose to ignore them, especially when the glycan is not their main interest in the protein. Renaming of monosaccharides has also occurred during remediation of PDB entries [19]. This has resulted in too many cases where correctly named monosaccharides with incorrect geometry were reannotated as monosaccharides that are incorrectly named (with respect to their chemical nature) but have seemingly correct geometry. This is a particular problem for carbohydrate ligands, where, in contrast to N-glycans, no biological pathway-based information on the correct residues is implicitly available. Nevertheless, the same errors as in N-glycans can be present in ligands as well. For example, blood group antigens such as (sialyl) Lewis X feature fucose residues, which are naturally present in the alpha-anomeric form in these antigens. However, β-fucose is also found in some wwPDB entries (Figure 1e). 'Flat' carbon atoms, which look like sp2 carbons, are also sometimes found. This problem is not limited to anomeric carbons; even completely flat glycan rings are present in some wwPDB entries.
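Flat centres, like the C1 of the fucose in Figure 1e, can be flagged numerically with a chiral volume: the signed volume spanned by the bond vectors from a carbon to its three non-hydrogen substituents. An ideal tetrahedral carbon with three heavy-atom neighbours at about 1.5 Å gives a magnitude of roughly 2.5 Å³, while a planar (sp2-like) centre gives about zero. The sketch is a generic geometry check, not the implementation of any particular validation tool, and the 1.0 Å³ flatness threshold is an arbitrary illustrative assumption.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def chiral_volume(center: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Signed volume (Å^3) of the bond vectors center->a, center->b, center->c.

    The sign encodes handedness (it flips if two substituents are swapped);
    the magnitude drops towards zero as the centre becomes planar.
    """
    va, vb, vc = _sub(a, center), _sub(b, center), _sub(c, center)
    # Scalar triple product va . (vb x vc)
    return (va[0] * (vb[1] * vc[2] - vb[2] * vc[1])
            + va[1] * (vb[2] * vc[0] - vb[0] * vc[2])
            + va[2] * (vb[0] * vc[1] - vb[1] * vc[0]))


def looks_flat(center: Vec, a: Vec, b: Vec, c: Vec, threshold: float = 1.0) -> bool:
    """Hypothetical flatness test; the threshold is an illustrative assumption."""
    return abs(chiral_volume(center, a, b, c)) < threshold


# A trigonal planar centre: three neighbours at 1.5 Å in one plane.
print(looks_flat((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (-0.75, 1.3, 0.0), (-0.75, -1.3, 0.0)))  # True
```

Because the sign also tracks handedness, the same quantity can reveal inverted stereocentres when compared with the value expected for the named residue.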
Figure 1 [image not reproduced: schematic glycan structures, panels (a)–(e)]
Examples of N-glycan core structures and a Lewis X ligand. Incorrect residues or linkages are highlighted in a red font. Figure made with CCP4mg [42]. (a) N-glycan core with correct stereochemistry and linkages. The glycan shown here (from wwPDB: 4B7I [43]) features a core fucosylation at position 6 of the first NAG. (b) Incorrect anomers are frequent errors in N-glycans in the wwPDB. The four highlighted residues all have the incorrect α/β anomer (from wwPDB: 3RY6 [44]). (c) Errors in linkage positions are found less frequently. Here the 1-3 linkage (from wwPDB: 4ZNE [45]) is clearly not supported by the electron density (gray mesh, drawn at 1σ, taken from EDS [46]). (d) As (c), but with a manually corrected glycan with a 1-4 linkage. (e) (6-sulfono) sialyl Lewis X ligand. Left: distorted ligand in which the C1 atom of the fucose (FUC) residue is flat (wwPDB: 4UNZ [47]). Middle: corrected version of the same ligand in the PDB_REDO databank. Right: sialyl Lewis X with the fucose incorrectly modeled as the β anomer (wwPDB: 4UO6 [47]).
Linkage errors
Errors involving the linkage positions of the residues, such as a (1-3)-linkage between the two core NAG residues, which should be (1-4)-linked, are rare but nevertheless present in some wwPDB entries (Figure 1c). Such errors in linkage positions indicate underlying, sometimes severe, errors in the structure model, such as completely flipped residues or a wrong interpretation of the electron density (Figure 1d). Missing or surplus linkages are more frequent. Missing linkages between monosaccharide residues of a glycan chain usually result in monosaccharides that are pushed apart during model refinement (Figure 2a,b). In the final model, there may seem to be unbonded monosaccharides that lack an anomeric
oxygen or equivalent atom. During wwPDB remediation, some of these were renamed to appear as deoxy derivatives of the true monosaccharides, for example 2-(acetylamino)-1,5-anhydro-2-deoxy-D-glucitol (5AX) instead of β-D-GlcpNAc (NAG). In some cases 5AX is used even though the C1 atom of a GlcpNAc residue is linked to another residue, so that the residue should be NAG. Linkages of NAG residues to the OD1 atom of ASN instead of ND2 also exist. Surplus linkages can appear in the LINK records of PDB files, which the wwPDB uses to describe covalent bonds (or metal coordination) between compounds, when non-bonded atoms are in close contact due to poor local geometry (Figure 2e), whereas erroneous long-distance linkages are usually caused by errors in the CONECT records (which describe the
Side panel A
Anomeric state and ring type of monosaccharide residues
Most monosaccharides exist as cyclic hemiacetals or hemiketals. During ring formation, an additional stereocenter is formed at the carbon atom that carries the aldehyde or ketone group in the open-chain form. This carbon atom is called the anomeric center; it is classified as α or β depending on the relative stereochemistry of the anomeric center and the so-called anomeric reference carbon. In the Fischer projection, the exocyclic oxygen atom at the anomeric center is formally cis to the oxygen atom at the anomeric reference carbon in the α anomer, whereas these two oxygen atoms are formally trans in the β anomer. In most cases, the anomeric reference carbon is identical to the configurational atom that is used to assign a monosaccharide to the D or L series (see rule 2-Carb-6 of [41]). Therefore, the local stereochemistry of the anomeric center looks different for D- and L-monosaccharides of the same anomer (Figure I). Reducing monosaccharides, that is, those whose anomeric center is not glycosidically linked to any other molecule, can switch between the open-chain form and different ring forms, resulting in a mixture of α and β anomers. This process is called mutarotation (Figure I). It is only possible if the anomeric center carries a free hydroxy group. As soon as the anomeric center is linked to another monosaccharide or an aglycone (such as a protein, a lipid, or even just a methyl group), the anomer is fixed and no longer changes. Glycosyltransferases, as well as glycoside hydrolases, are usually stereospecific. Therefore, in many cases only specific anomers are expected for certain saccharides.
Figure I
[image not reproduced: (a) open-chain D-Glc with α/β-D-Glcf and α/β-D-Glcp; (b) α-L-Fucp and β-L-Fucp; Fischer projections]
(a) Reducing monosaccharides switch in solution between different ring types (furanose or pyranose) and anomers via the open chain form. The anomeric state is defined by the relative orientation of hydroxyl groups at the anomeric carbon (black arrows) and the anomeric reference carbon (gray arrows). This is best seen in the Fischer projection (right). (b) As a result of this definition, anomers in L-Fuc look opposite to those in D-Glc in terms of local stereochemistry.
connectivity of any atom in non-protein/non-nucleic acid compounds) of PDB files. It should be noted that many surplus linkages were added during wwPDB annotation or remediation.
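Two of the LINK-level problems described here, multiple LINKs between one residue pair and glycans attached to ASN OD1 instead of ND2, are easy to screen for; pdb-care flags both (see the pdb-care section). The sketch below also flags LINK distances outside a covalent-bond window. The Link record type, the residue-label format such as 'ASN A 297', and the 1.2–1.8 Å window are assumptions made for this illustration; this is not a PDB-format parser and not the actual criteria of any tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """Simplified LINK record; residue labels like 'ASN A 297' are hypothetical."""
    res1: str
    atom1: str
    res2: str
    atom2: str
    distance: float  # Å


def link_problems(links: List[Link], min_dist: float = 1.2,
                  max_dist: float = 1.8) -> List[str]:
    """Return human-readable warnings; the distance window is an assumption."""
    problems = []
    # More than one LINK between the same pair of residues is suspicious.
    pair_counts = Counter(frozenset((link.res1, link.res2)) for link in links)
    for pair, n in pair_counts.items():
        if n > 1:
            problems.append(f"{n} LINK records between {' and '.join(sorted(pair))}")
    for link in links:
        ends = ((link.res1, link.atom1), (link.res2, link.atom2))
        # N-glycans should be attached to ND2, not OD1, of asparagine.
        if any(res.startswith("ASN") and atom == "OD1" for res, atom in ends):
            problems.append(f"glycan linked to ASN OD1: {link.res1} {link.atom1} - "
                            f"{link.res2} {link.atom2}")
        # Too close or too far apart for a covalent bond.
        if not min_dist <= link.distance <= max_dist:
            problems.append(f"LINK distance {link.distance:.2f} Å: {link.res1} "
                            f"{link.atom1} - {link.res2} {link.atom2}")
    return problems


example = [
    Link("ASN A 297", "OD1", "NAG A 1", "C1", 1.45),
    Link("NAG A 1", "O4", "NAG A 2", "C1", 1.44),
    Link("NAG A 1", "C2", "NAG A 2", "O5", 2.90),  # surplus contact-based LINK
]
for message in link_problems(example):
    print(message)
```

Flagged LINKs are prompts for inspection rather than automatic deletions, because a short LINK distance can also signal a real bond in a badly refined region.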
Missing or surplus atoms
Monosaccharides with missing atoms can, for instance, be found at the reducing end of a non-covalently bound glycan, where mutarotation causes a mixture of alpha and
Figure 2 [image not reproduced: glycan models, panels (a)–(g)]
Missing glycosidic linkages cause some residues to appear as if their anomeric hydroxy group is not present. (a) Linkages (indicated by dotted lines) between C1 of NAG1 and ND2 of ASN 297, as well as between C1 of FUC and O6 of NAG1, need to be added in this glycan from wwPDB: 5D4Q [48] to form a correct N-glycan chain (only partly shown for clarity). (b) In this example from wwPDB: 5AFJ [49], adding linkages without rebuilding is insufficient. The linkage between NAG1 and ASN can be added directly, but the linkage from C1 of NAG2 to O4 of NAG1 would, incorrectly, be of the α1-4 type. Together with the fact that O5 of NAG2 is much closer (2.29 Å) to O4 of NAG1 than its C1 atom is (3.03 Å), this suggests that NAG2 should be flipped around the axis shown as a blue line. The positions of BMA and MAN also need to be adjusted before the required linkages can be added. Furthermore, these residues, as well as NAG2, are not in the preferred 4C1 ring conformation. (c) This glycan chain from wwPDB: 3AGV [50] is an example of a missing linkage that is hidden because the anomeric hydroxy group is retained. Its orientation yields an NDG residue. If it is removed and C1 of the NDG is linked to ND2 of ASN, this residue becomes an N-glycosidically linked NAG. FUL, however, is already linked in the wrong anomeric form. (d) The cellobiose-6-phosphate ligand in wwPDB: 4ZFM is composed of two separate residues, BGC (β-D-Glcp) and 0WK (1,5-anhydro-6-O-phosphono-D-glucitol). In chain C the linkage from C1 of 0WK to O4 of BGC is missing but, surprisingly, the residues were not pushed apart in refinement; the atoms are in fact closer (1.07 Å) than in a standard linkage. This may be because the ligand residues have only 80% occupancy in this model, which may have switched off the restraints against excessive overlap of non-bonded atoms.
(e) In wwPDB: 5FO9 [51] a surplus linkage is formed from C2 of NAG to ND2 of ASN, probably as a result of poor local geometry that brings atoms that are not actually bonded close together. (f) A rather unusual linkage is found in this glycan from wwPDB: 5A7E. Instead of the expected (1-4)-linkage, a (4-4)-linkage is found between BMA and NAG2. No glycosyltransferase is known to yield such a linkage, so the BMA residue probably needs to be rotated by 180° to form a correct linkage. (g) In this glycan from wwPDB: 4D5Q [52] it seems at first glance that the formation of linkages between XYP2 and XYP1, as well as between XYP4 and XYP3, is also prevented by rotated residues, because C5 is at the position of C1 and vice versa. In this case, however, the atom coordinates are correct and the atom names need to be adjusted; that is, the atoms in XYP2 and XYP4 need to be renumbered so that correct (1-4)-linkages between the β-D-Xylp (XYP) residues can be formed.
beta anomers (see Side panel A). This type of static disorder can make the observed electron density very weak at the position of the anomeric hydroxy group. In such cases, some crystallographers may choose not to model these atoms at all or (if sufficient electron density is available) to refine both anomers as alternates, potentially with coupled occupancies. At other sites in glycans, too, a lack of electron density can result in unmodeled (missing) atoms. Unmodeled atoms due to a lack of electron density should not be considered errors per se, but they can lead to annotation errors down the line, because the monosaccharide identity that is interpreted, and annotated,
based on the atomic coordinates may be incorrect. Surplus atoms can always be regarded as errors. The most common source of surplus atoms is failure to remove the leaving hydroxy group when a glycosidic linkage is modeled (see Figure 4 of [20]). This error can, to some extent, be hidden when a glycosidic linkage is not modeled at all and the monosaccharides involved are pushed apart in refinement (Figure 2c) [16].

Poor glycan geometry
Missing or weak electron density is not only an issue for anomeric mixtures at a reducing end, but is a
frequent problem when dealing with glycans. This is caused by the microheterogeneity of glycosylation sites mentioned above and by the intrinsic flexibility of glycans. This flexibility, however, arises mainly from the rotatable bonds of the glycosidic linkages; the individual monosaccharide rings are rather rigid [21,22]. Most residues are known to prefer a so-called chair conformation of the ring. In the wwPDB, however, many residues are found in non-preferred, high-energy conformations [17]. These are typically caused by errors in model building, such as fitting the wrong stereochemistry at individual ring carbons (most frequently the anomeric carbon) or orienting residues with their rings flipped (Figure 2b). Not all unusual conformations or ring distortions are errors, however: carbohydrates can, for example, undergo conformational changes upon binding to glycoside hydrolases and during the reactions catalyzed by these enzymes [23].

Reducing the number of new errors
Many of the problems found in glycan structures in the wwPDB can be explained by a combination of weak electron density, the complexity of glycans, which requires a high level of knowledge or experience from the researchers dealing with them, and a lack of glycan-specific knowledge among scientists who determine the structures of glycoproteins or protein–carbohydrate complexes and who are mainly interested in the protein part of the structure. Furthermore, for a long time no validation tools for carbohydrate 3D structures were available, and, in contrast to protein validation, the existing tools for carbohydrates are not yet part of the wwPDB validation pipeline and are thus not used systematically. Nevertheless, several tools for validating various aspects of carbohydrate 3D structures are available to help experimental structural biologists find potential issues in their carbohydrate 3D models and fix these problems before submitting a structure model to the wwPDB. Inclusion of carbohydrate validation in the wwPDB validation pipeline has also been recommended by the wwPDB X-ray validation task force [24]. These tools will also be used for future remediation of existing wwPDB entries.
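Ring conformation is one of the aspects such tools check (Privateer, described below, focuses on it), and it connects to the chair preference and the flat rings discussed earlier. A standard way to quantify six-membered ring puckering is the Cremer–Pople analysis, which yields a total puckering amplitude Q and a polar angle θ: θ near 0° or 180° indicates a chair, θ near 90° a boat or skew-boat form, and Q near zero a flat ring. The sketch below is a plain implementation of that analysis under the assumption of exactly six ring atoms given in ring order; it is not taken from Privateer or any other tool, and mapping θ to a specific chair (4C1 versus 1C4) depends on the atom order and sign conventions, so it is left out.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def cremer_pople(ring: Sequence[Vec]) -> Tuple[float, float]:
    """Total puckering amplitude Q (Å) and polar angle theta (degrees)
    for a six-membered ring given as six coordinates in ring order."""
    n = len(ring)
    if n != 6:
        raise ValueError("this sketch handles six-membered rings only")
    centroid = [sum(p[i] for p in ring) / n for i in range(3)]
    r = [tuple(p[i] - centroid[i] for i in range(3)) for p in ring]
    # Mean-plane normal from the Cremer-Pople weighted sums.
    r1 = [sum(r[j][i] * math.sin(2 * math.pi * j / n) for j in range(n)) for i in range(3)]
    r2 = [sum(r[j][i] * math.cos(2 * math.pi * j / n) for j in range(n)) for i in range(3)]
    normal = _cross(r1, r2)
    norm = math.sqrt(_dot(normal, normal))
    normal = tuple(c / norm for c in normal)
    z = [_dot(p, normal) for p in r]  # out-of-plane displacements
    # m = 2 (boat/twist) component and the m = 3 (chair) component.
    q2_cos = math.sqrt(2 / n) * sum(z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    q2_sin = -math.sqrt(2 / n) * sum(z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q2 = math.hypot(q2_cos, q2_sin)
    q3 = math.sqrt(1 / n) * sum(z[j] * (-1) ** j for j in range(n))
    total_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))
    return total_q, theta
```

For an idealized chair whose atoms alternate 0.25 Å above and below the mean plane, this gives Q = 0.25·√6 ≈ 0.61 Å and θ at one of the two chair poles; a perfectly planar hexagon gives Q = 0.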
Carbohydrate validation tools

PDB carbohydrate residue check (pdb-care)
Pdb-care, the first carbohydrate-specific 3D structure validation tool [25], is freely available via a web interface at the glycosciences.de server [26]. A step-by-step description of how to use this tool has recently been published [27]. Initially, pdb-care focused on finding mismatches between residue names and the residues that are actually detected in the 3D structure. The current version also validates N-glycan core structures. Suggestions on how to fix the detected errors are given in many cases. Validation of LINK records by pdb-care has also been improved recently. If no anomeric oxygen atom is found in a potential carbohydrate ring, pdb-care tries to find atoms to which the anomeric carbon might be linked, and suggests adding a LINK record if such an atom is found. Multiple LINK records between a pair of residues, which in most cases are caused by superfluous linkages (Figure 2e), are also flagged, as are LINKs of an N-glycan to the OD1 atom of an ASN side-chain instead of the ND2 atom. Pdb-care also offers a web service with XML output for use in external programs.

Carbohydrate Ramachandran plot (CARP)
The Ramachandran plot of the φ and ψ torsions of peptide links is a frequently used method in protein 3D structure validation [28]. These peptide torsions determine the conformation, or fold, of a protein backbone. Similarly, the conformation of a carbohydrate is governed by the torsions of its glycosidic linkages. These depend on the residues involved and on the linkage position [22]. Therefore, a separate plot is required for each disaccharide fragment. Such plots are generated by CARP [29], which is also part of the glycosciences.de portal. To evaluate the observed torsions, data on the preferred and allowed conformations of each glycosidic linkage found in a PDB file are required. CARP offers two different sources of such data: torsion values mined from wwPDB entries, or conformational maps that are computed from molecular dynamics simulations and provided by GlycoMapsDB [30].

Privateer
Privateer [31] is the most recently published tool for carbohydrate structure validation. Its main focus is on checking the fit of carbohydrates to the electron density and on validating ring conformations. Stereochemistry problems can be detected as well. The results can be analyzed independently, but can also be imported into the crystallographic modeling program COOT [32] to correct the reported problems. Privateer also provides Crystallographic Information File (CIF) library files of monosaccharides, which include torsion restraint targets to enforce low-energy ring conformations already during structure refinement.

Hetero-atom/ligand validation tools
The so-called hetero-atoms in PDB files comprise the small molecules that are found as ligands or protein modifications in wwPDB entries, including carbohydrates. Tools designed to validate these residues [33] naturally cover carbohydrate residues as well, but usually do not apply any glycan-specific validation criteria. Among these tools, MotiveValidator [34] features a 'sugar validation' mode; however, this only means that validation is restricted to those hetero-atom residues that are part of carbohydrates. Nevertheless,
these tools can be helpful for finding issues that are caused by chirality errors or missing atoms.

PDB_REDO
PDB_REDO is an automated procedure to optimize macromolecular structure models derived from X-ray or electron crystallography [35]. This procedure is applied to all crystallographic entries in the wwPDB with deposited experimental data, and the resulting structure models are available through the PDB_REDO databank [36]. The procedure is also available as a web server for work-in-progress structure models that have not (yet) been deposited in the PDB [37]. Although PDB_REDO is not a validation tool like the ones described above, it does attempt to resolve some of the problems that carbohydrate validation tools detect by refining and rebuilding the input structure model. Its carbohydrate-specific features revolve mostly around ensuring that the correct restraints are used in model refinement. To this end, the output from pdb-care is used to correct residue names, add or remove linkages, and delete surplus atoms. In addition, the anomeric type of each glycosidic linkage is set (this is not explicitly stored in wwPDB entries), atoms are renamed and renumbered to ensure canonical descriptions of glycosidic linkages (i.e. the anomeric O1 atom is the leaving atom), and ASN side-chains are flipped if needed to ensure that glycans are attached to the ND2 atom.
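Adding missing linkages, as described here, starts from the pdb-care behaviour summarized earlier: when an anomeric carbon has no anomeric oxygen, look for atoms it might be bonded to. A minimal distance-based version of that search is sketched below. The (label, element, xyz) atom format, the restriction to O and N partners, and the fairly generous 3.5 Å default cutoff (chosen because unlinked residues tend to be pushed apart in refinement) are assumptions for illustration only; this is not pdb-care's or PDB_REDO's actual code, and any hit still needs to be checked against the density and the expected linkage type.

```python
import math
from typing import Iterable, List, Tuple

Atom = Tuple[str, str, Tuple[float, float, float]]  # hypothetical (label, element, xyz)


def candidate_acceptors(c1: Tuple[float, float, float],
                        other_atoms: Iterable[Atom],
                        max_dist: float = 3.5) -> List[str]:
    """Labels of O/N atoms from *other* residues within max_dist of an
    anomeric carbon that lacks its anomeric oxygen, nearest first.

    The caller must exclude the residue's own atoms; the cutoff is an assumption.
    """
    hits = []
    for label, element, xyz in other_atoms:
        if element not in ("O", "N"):
            continue  # glycosidic partners are O, or N for N-glycans
        d = math.dist(c1, xyz)
        if d <= max_dist:
            hits.append((d, label))
    return [label for _, label in sorted(hits)]


nearby = [
    ("ASN A 297 ND2", "N", (1.5, 0.0, 0.0)),
    ("ASN A 297 OD1", "O", (2.5, 1.0, 0.0)),
    ("ASN A 297 CG", "C", (2.2, 0.5, 0.0)),
    ("HOH W 5 O", "O", (10.0, 0.0, 0.0)),
]
print(candidate_acceptors((0.0, 0.0, 0.0), nearby))  # ['ASN A 297 ND2', 'ASN A 297 OD1']
```

Returning candidates sorted by distance, rather than a single answer, leaves the final choice (for example ND2 rather than OD1 for an N-glycan) to the chemical knowledge discussed in the next section.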
Summary and conclusion
The rather high frequency of problems within the carbohydrate part of wwPDB entries emphasizes the need for carbohydrate structure validation. Broad use of validation tools to evaluate the glycan moieties of 3D structures will further improve the quality of the wwPDB and also its usability for all scientists who work with glycoproteins or protein–carbohydrate complexes. Validation is also required for statistical analyses of carbohydrate data from the wwPDB, to filter out erroneous data. The individual tools, such as Privateer, pdb-care, or CARP, each evaluate specific criteria, but none of them covers all kinds of errors. Therefore, a combination of different tools is required for an extensive validation. Some of the problems, such as residues that do not match the assigned PDB residue name, for example because of stereochemistry problems or missing atoms, can be detected by general ligand validation tools. However, these methods are of no help for carbohydrate residues for which a matching residue name has been selected in spite of an error in the 3D structure. For example, if an α-D-GlcpNAc residue within the N-glycan core is named NDG in a PDB file, a simple comparison of residue name and residue coordinates will not detect a problem. In this case, carbohydrate-specific knowledge of the kinds of residues that are known to occur in N-glycans needs to be implemented in the validation.
This kind of knowledge is also helpful when a notation/structure mismatch is detected and suggestions on how to fix it should be given. For example, if two β-D-Manp residues are found in an N-glycan core (one linked to β-D-GlcpNAc, and the second one linked to the first β-D-Manp), and both of them are named MAN, then the first β-D-Manp simply needs to be renamed to BMA, whereas for the second one, where the residue name is correct but the stereochemistry of the anomeric carbon is wrong, the coordinates need to be fixed to match the residue name. For non-covalently bound carbohydrate ligands, this decision can only be made by the scientist who determined the structure and who knows what kind of ligand was used when preparing the samples for structure analysis. Ideally this is done before a model is deposited in the wwPDB; otherwise, post-deposition re-evaluation of the model, as done, for instance, by PDB_REDO, is needed. Such approaches can resolve many issues in a structure model, but there is always a risk of introducing new ones. Finally, not all problems that are identified by the validation tools are necessarily errors. For example, wwPDB entries 2nc3–2nc6 contain peptides with different monosaccharide residues chemically linked to an Asn side-chain [38]. Validation with pdb-care flags these residues, because they differ from the β-D-GlcpNAc that is normally linked to Asn. Nevertheless, these structures are correct, because here chemically engineered peptides were analyzed rather than glycoproteins synthesized via biosynthetic pathways. Similarly, 'unusual' torsions do not necessarily mean that a glycan structure is erroneous, as the conformational maps used for comparison are usually calculated for free glycans, whereas a glycan bound to a protein might adopt a different conformation [39].
Some glycan-binding proteins, such as the RSL lectin from the bacterium Ralstonia solanacearum, are also known to bind glycans in conformations that differ from those preferred by the free glycan [40]. Nevertheless, validation tools can help to identify potential problems, to fix them if they are real problems, and to find 'interesting' structures when flagged glycans turn out to be correct. On the other hand, structures that pass all validation tools might still be erroneous if they contain a kind of error that is not yet handled by any of the tools or that requires further knowledge, such as the chemical nature of a carbohydrate ligand. Therefore, the final decision on whether a structure is correct has to be made by humans, but validation tools can assist them with this decision.
Acknowledgements
This work was financially supported by the Netherlands Organization for Scientific Research (NWO) with Vidi Grant 723.013.003 to RPJ.

Current Opinion in Structural Biology 2017, 44:9–17
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
of special interest
1. Lauc G, Pezer M, Rudan I, Campbell H: Mechanisms of disease: the human N-glycome. Biochim Biophys Acta 2016, 1860:1574-1582.
2. Schnaar RL: Glycobiology simplified: diverse roles of glycan recognition in inflammation. J Leukoc Biol 2016, 99:825-838.
3. Everest-Dass AV, Abrahams JL, Kolarich D, Packer NH, Campbell MP: Structural feature ions for distinguishing N- and O-linked glycan isomers by LC-ESI-IT MS/MS. J Am Soc Mass Spectrom 2013, 24:895-906.
4. An HJ, Froehlich JW, Lebrilla CB: Determination of glycosylation sites and site-specific heterogeneity in glycoproteins. Curr Opin Chem Biol 2009, 13:421-426.
5. Lamriben L, Graham JB, Adams BM, Hebert DN: N-glycan-based ER molecular chaperone and protein quality control system: the calnexin binding cycle. Traffic 2016, 17:308-326.
6. Wu ZL, Ethen C, Hickey GE, Jiang W: Active 1918 pandemic flu viral neuraminidase has distinct N-glycan profile and is resistant to trypsin digestion. Biochem Biophys Res Commun 2009, 379:749-753.
7. Vishwanatha KS, Back N, Lam TT, Mains RE, Eipper BA: O-glycosylation of a secretory granule membrane enzyme is essential for its endocytic trafficking. J Biol Chem 2016, 291:9835-9850.
8. Lepenies B, Seeberger PH: The promise of glycomics, glycan arrays and carbohydrate-based vaccines. Immunopharmacol Immunotoxicol 2010, 32:196-207.
9. del Carmen Fernandez-Alonso M, Diaz D, Berbis MA, Marcelo F, Canada J, Jimenez-Barbero J: Protein–carbohydrate interactions studied by NMR: from molecular recognition to drug design. Curr Protein Pept Sci 2012, 13:816-830.
10. De Schutter K, Van Damme EJ: Protein–carbohydrate interactions as part of plant defense and animal immunity. Molecules 2015, 20:9029-9053.
11. Arnaud J, Audfray A, Imberty A: Binding sugars: from natural lectins to synthetic receptors and engineered neolectins. Chem Soc Rev 2013, 42:4798-4813.
12. Berman H, Henrick K, Nakamura H: Announcing the worldwide Protein Data Bank. Nat Struct Biol 2003, 10:980.
13. Lütteke T, Frank M, von der Lieth CW: Data mining the protein data bank: automatic detection and assignment of carbohydrate structures. Carbohydr Res 2004, 339:1015-1020.
14. Crispin M, Stuart DI, Jones EY: Building meaningful models of glycoproteins. Nat Struct Mol Biol 2007, 14:354. This article is the first description of non-conformance of N-glycans in PDB entries with the conserved N-glycan core structure.
15. Nakahara T, Hashimoto R, Nakagawa H, Monde K, Miura N, Nishimura S: Glycoconjugate Data Bank: structures—an annotated glycan structure database and N-glycan primary structure verification service. Nucleic Acids Res 2008, 36:D368-D371.
16. Lütteke T: Analysis and validation of carbohydrate three-dimensional structures. Acta Crystallogr D: Biol Crystallogr 2009, 65:156-168.
17. Agirre J, Davies G, Wilson K, Cowtan K: Carbohydrate anomalies in the PDB. Nat Chem Biol 2015, 11:303. Carbohydrate ring conformation is systematically analyzed in this manuscript; many high-energy conformations are reported.
18. Zauner G, Kozak RP, Gardner RA, Fernandes DL, Deelder AM, Wuhrer M: Protein O-glycosylation analysis. Biol Chem 2012, 393:687-708.
19. Henrick K, Feng Z, Bluhm WF, Dimitropoulos D, Doreleijers JF, Dutta S, Flippen-Anderson JL, Ionides J, Kamada C, Krissinel E et al.: Remediation of the protein data bank archive. Nucleic Acids Res 2008, 36:D426-D433.
20. Joosten RP, Womack T, Vriend G, Bricogne G: Re-refinement from deposited X-ray data can deliver improved models for most PDB entries. Acta Crystallogr D: Biol Crystallogr 2009, 65:176-185.
21. Frank M, Collins PM, Peak IR, Grice ID, Wilson JC: An unusual carbohydrate conformation is evident in Moraxella catarrhalis oligosaccharides. Molecules 2015, 20:14234-14253.
22. Wehle M, Vilotijevic I, Lipowsky R, Seeberger PH, Silva DV, Santer M: Mechanical compressibility of the glycosylphosphatidylinositol (GPI) anchor backbone governed by independent glycosidic linkages. J Am Chem Soc 2012, 134:18964-18972.
23. Speciale G, Thompson AJ, Davies GJ, Williams SJ: Dissecting conformational contributions to glycosidase catalysis and inhibition. Curr Opin Struct Biol 2014, 28:1-13.
24. Read RJ, Adams PD, Arendall WB 3rd, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lütteke T, Otwinowski Z et al.: A new generation of crystallographic validation tools for the protein data bank. Structure 2011, 19:1395-1412. An extensive overview of 3D-structure validation methods is given in this manuscript. Some of them are protein-specific, but general methods such as bond length and angle analyses are applicable to carbohydrates as well.
25. Lütteke T, von der Lieth CW: pdb-care (PDB CArbohydrate REsidue check): a programm to support annotation of complex carbohydrate structures in PDB files. BMC Bioinform 2004, 5:69. The first description of a carbohydrate-specific 3D-structure validation tool.
26. Lütteke T, Bohne-Lang A, Loss A, Goetz T, Frank M, von der Lieth CW: GLYCOSCIENCES.de: an Internet portal to support glycomics and glycobiology research. Glycobiology 2006, 16:71R-81R.
27. Emsley P, Brunger AT, Lütteke T: Tools to assist determination and validation of carbohydrate 3D structure data. Methods Mol Biol 2015, 1273:229-240.
28. Ramachandran GN, Ramakrishnan C, Sasisekharan V: Stereochemistry of polypeptide chain configurations. J Mol Biol 1963, 7:95-99.
29. Lütteke T, Frank M, von der Lieth CW: Carbohydrate Structure Suite (CSS): analysis of carbohydrate 3D structures derived from the Protein Data Bank. Nucleic Acids Res 2005, 33:D242-D246.
30. Frank M, Lütteke T, von der Lieth CW: GlycoMapsDB: a database of the accessible conformational space of glycosidic linkages. Nucleic Acids Res 2007, 35:287-290.
31. Agirre J, Iglesias-Fernandez J, Rovira C, Davies GJ, Wilson KS, Cowtan KD: Privateer: software for the conformational validation of carbohydrate structures. Nat Struct Mol Biol 2015, 22:833-834. The most recent publication of a new carbohydrate-specific validation tool.
32. Debreczeni JE, Emsley P: Handling ligands with Coot. Acta Crystallogr D: Biol Crystallogr 2012, 68:425-430.
33. Adams PD, Aertgeerts K, Bauer C, Bell JA, Berman HM, Bhat TN, Blaney JM, Bolton E, Bricogne G, Brown D et al.: Outcome of the first wwPDB/CCDC/D3R ligand validation workshop. Structure 2016, 24:502-508.
34. Varekova RS, Jaiswal D, Sehnal D, Ionescu CM, Geidl S, Pravda L, Horsky V, Wimmerova M, Koca J: MotiveValidator: interactive web-based validation of ligand and residue structure in biomolecular complexes. Nucleic Acids Res 2014, 42:W227-W233.
35. Joosten RP, Joosten K, Murshudov GN, Perrakis A: PDB_REDO: constructive validation, more than just looking for errors. Acta Crystallogr D: Biol Crystallogr 2012, 68:484-496.
36. Joosten RP, Vriend G: PDB improvement starts with data deposition. Science 2007, 317:195-196.
37. Joosten RP, Long F, Murshudov GN, Perrakis A: The PDB_REDO server for macromolecular structure model optimization. IUCrJ 2014, 1:213-220. The PDB_REDO server not only flags problems but also provides methods to fix some of the detected issues within carbohydrate structures.
38. Hsu CH, Park S, Mortenson DE, Foley BL, Wang X, Woods RJ, Case DA, Powers ET, Wong CH, Dyson HJ et al.: The dependence of carbohydrate–aromatic interaction strengths on the structure of the carbohydrate. J Am Chem Soc 2016, 138:7636-7648.
39. Jo S, Qi Y, Im W: Preferred conformations of N-glycan core pentasaccharide in solution and in glycoproteins. Glycobiology 2016, 26:19-29.
40. Topin J, Lelimousin M, Arnaud J, Audfray A, Perez S, Varrot A, Imberty A: The hidden conformation of Lewis x, a human histo-blood group antigen, is a determinant for recognition by pathogen lectins. ACS Chem Biol 2016, 11:2011-2020.
41. McNaught AD: Nomenclature of carbohydrates (recommendations 1996). Adv Carbohydr Chem Biochem 1997, 52:43-177.
42. McNicholas S, Potterton E, Wilson KS, Noble ME: Presenting your structures: the CCP4mg molecular-graphics software. Acta Crystallogr D: Biol Crystallogr 2011, 67:386-394.
43. Bowden TA, Baruah K, Coles CH, Harvey DJ, Yu X, Song BD, Stuart DI, Aricescu AR, Scanlan CN, Jones EY et al.: Chemical and structural analysis of an antibody folding intermediate trapped during glycan biosynthesis. J Am Chem Soc 2012, 134:17554-17563.
44. Fibriansah G, Veetil VP, Poelarends GJ, Thunnissen AM: Structural basis for the catalytic mechanism of aspartate ammonia lyase. Biochemistry 2011, 50:6053-6062.
45. Oganesyan V, Mazor Y, Yang C, Cook KE, Woods RM, Ferguson A, Bowen MA, Martin T, Zhu J, Wu H et al.: Structural insights into the interaction of human IgG1 with FcgammaRI: no direct role of glycans in binding. Acta Crystallogr D: Biol Crystallogr 2015, 71:2354-2361.
46. Kleywegt GJ, Harris MR, Zou JY, Taylor TC, Wahlby A, Jones TA: The Uppsala electron-density server. Acta Crystallogr D: Biol Crystallogr 2004, 60:2240-2249.
47. Collins PJ, Vachieri SG, Haire LF, Ogrodowicz RW, Martin SR, Walker PA, Xiong X, Gamblin SJ, Skehel JJ: Recent evolution of equine influenza and the origin of canine influenza. Proc Natl Acad Sci U S A 2014, 111:11175-11180.
48. Ahmed AA, Keremane SR, Vielmetter J, Bjorkman PJ: Structural characterization of GASDALIE Fc bound to the activating Fc receptor FcgammaRIIIa. J Struct Biol 2016, 194:78-89.
49. Spurny R, Debaveye S, Farinha A, Veys K, Vos AM, Gossas T, Atack J, Bertrand S, Bertrand D, Danielson UH et al.: Molecular blueprint of allosteric binding sites in a homologue of the agonist-binding domain of the alpha7 nicotinic acetylcholine receptor. Proc Natl Acad Sci U S A 2015, 112:E2543-E2552.
50. Nomura Y, Sugiyama S, Sakamoto T, Miyakawa S, Adachi H, Takano K, Murakami S, Inoue T, Mori Y, Nakamura Y et al.: Conformational plasticity of RNA for target recognition as revealed by the 2.15 Å crystal structure of a human IgG-aptamer complex. Nucleic Acids Res 2010, 38:7822-7829.
51. Forneris F, Wu J, Xue X, Ricklin D, Lin Z, Sfyroera G, Tzekou A, Volokhina E, Granneman JC, Hauhart R et al.: Regulators of complement activity mediate inhibitory mechanisms through a common C3b-binding mode. EMBO J 2016, 35:1133-1149.
52. Momeni MH, Ubhayasekera W, Sandgren M, Stahlberg J, Hansson H: Structural insights into the inhibition of cellobiohydrolase Cel7A by xylo-oligosaccharides. FEBS J 2015, 282:2167-2177.