Journal Pre-proof Identification of glycan branching patterns using multistage mass spectrometry with spectra tree analysis
Hui Wang, Jingwei Zhang, Junchuan Dong, Meijie Hou, Weiyi Pan, Dongbo Bu, Jinyu Zhou, Qi Zhang, Yaojun Wang, Keli Zhao, Yan Li, Chuncui Huang, Shiwei Sun
PII: S1874-3919(20)30017-8
DOI: https://doi.org/10.1016/j.jprot.2020.103649
Reference: JPROT 103649
To appear in: Journal of Proteomics
Received date: 25 September 2019
Revised date: 2 January 2020
Accepted date: 16 January 2020
Please cite this article as: H. Wang, J. Zhang, J. Dong, et al., Identification of glycan branching patterns using multistage mass spectrometry with spectra tree analysis, Journal of Proteomics (2020), https://doi.org/10.1016/j.jprot.2020.103649
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 Published by Elsevier.
Identification of Glycan Branching Patterns Using Multistage Mass Spectrometry with Spectra Tree Analysis
Hui Wang (a,d,1), Jingwei Zhang (a,c,1), Junchuan Dong (a,d), Meijie Hou (a,d), Weiyi Pan (a,d), Dongbo Bu (a,d), Jinyu Zhou (b,d), Qi Zhang (a,d), Yaojun Wang (a,e), Keli Zhao (b,d), Yan Li (b,d), Chuncui Huang (b,d,*), Shiwei Sun (a,d,*)
(a) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
(b) Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
(c) Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
(d) University of Chinese Academy of Sciences, Beijing 100049, China
(e) College of Information and Electrical Engineering, China Agricultural University, 100083, China
Abstract
Glycans are crucial to a wide range of biological processes, and their biological activities are closely related to the branching patterns of their structures. Unlike the simple linear chains of proteins, glycans have complicated branching patterns, which makes their identification extremely challenging. Tandem mass spectrometry (MS2) cannot provide sufficient structural information to deduce glycan branching patterns, even with the assistance of various bioinformatic tools and algorithms. A promising technology for identifying glycan branching patterns is multistage mass spectrometry (MSn). The product-ion relationships among the MSn spectra of a glycan form a tree, which makes deducing glycan structures from MSn spectra a great challenge. In the present study, we report an approach called glyBranch (glycan Branching pattern identification based on spectra tree) that fully exploits the information contained in the MSn spectra tree for glycan identification. Using 14 glycan standards, including two isomeric pairs, and 16 complex N-glycans isolated from RNase B and IgG, we demonstrate the successful application of glyBranch to branching pattern analysis.
* Corresponding authors. Email addresses: [email protected] (Chuncui Huang), [email protected] (Shiwei Sun)
1 H.W. and J.Z. contributed equally.
The source code of glyBranch is available at https://github.com/bigict/glyBranch/. We have also developed a web server, which is freely accessible at http://glycan.ict.ac.cn/glyBranch/.
Keywords: Multistage mass spectrometry, glycan branching pattern, glycan identification, spectra tree, isomeric glycan
Significance
Glycans are crucial in various biological processes, and their functions are closely related to the details of their structures; thus, identifying glycan branching patterns is of great significance to biological studies. Multistage mass spectrometry (MSn) can provide detailed structural information by generating multiple levels of fragments through consecutive fragmentation; however, interpreting the numerous MSn spectra is extremely challenging. In this study, we present an approach called glyBranch (glycan Branching pattern identification based on spectra tree) to exploit the information contained in the MSn spectra tree for glycan identification. This approach will greatly facilitate the automated identification of glycan structures and related biological studies.

1. Introduction
Glycosylation is one of the most important post-translational modifications (PTMs) of proteins, and more than half of all proteins are glycosylated. Glycans play important roles in a wide variety of biological processes such as protein conformation, cell proliferation and differentiation, cell-cell communication, immune response and microbial adhesion [1, 2, 3, 4]. Moreover, glycans are critically involved in the pathogenesis and progression of various diseases and have been used as biomarkers for clinical diagnosis [5, 6, 7]. For instance, N-glycans with bisecting N-acetylglucosamine (GlcNAc) frequently increase in cancer cells, and large tetra-antennary structures are commonly formed in cancers [7, 8]. In addition, a dramatic increase in N-glycans lacking outer-arm galactose residues has been detected in patients with rheumatoid arthritis (RA) [9, 10]. The roles and functions of glycans are closely related to the details of their structures. For example, human milk oligosaccharides (HMOs) are a family of structurally diverse unconjugated oligosaccharides that are highly abundant in human milk. Most HMOs are built on a lactose core that is elongated with N-acetylglucosamine and galactose and decorated with fucose and sialic acid, giving rise to various branching patterns. The structures of HMOs shape the bacterial composition of the infant intestinal tract, which can protect infants from pathogens and infection.

A number of methodologies have been proposed for glycan identification, among which mass spectrometry (MS) is one of the most sensitive techniques and does not require glycan standards [11, 12, 13]. Most existing MS-based methods use MS1 or MS2 information for glycan identification [14, 15, 16, 17, 18]. Although MS1 and MS2 have proved successful in peptide identification, they cannot provide sufficient information to elucidate the complex branching patterns of glycans [19, 20, 21, 22]. In addition, glycans have far more isomers than peptides do. Compared with MS2, multistage mass spectrometry (MSn, n > 2) provides more detailed structural information by generating multiple levels of fragments through consecutive fragmentation of a glycan. Based on these multiple levels of fragments, it is possible to determine the exact branching patterns of glycans. However, the product-ion relationships among the MSn spectra of a glycan form a tree, which makes deducing glycan structures from MSn spectra a great challenge. Given that MSn provides far more detailed information on glycan structures, computational tools to interpret it are urgently needed [23, 24, 25, 26, 27, 28].
In this study, we present glyBranch, an approach that assigns glycan branching patterns by interpreting MSn spectra against a glycan structure database. glyBranch automatically integrates information from all acquired MSn spectra and assigns specific branching patterns from the database GDB, which consists of 7258 glycan structures excerpted from CarbBank [29]. In this approach, we use the pattern of isotopic peaks to assess the quality of MSn spectra and filter out low-quality spectra. We further propose a new scoring function to compare the experimental MSn spectra with the known glycan structures in GDB, and glyBranch reports the structure with the highest probability as the identification result. We applied glyBranch to 14 glycan standards, including N-glycans and HMOs as representatives, and to the discrimination of two pairs of glycan isomers, before using it to identify N-glycans isolated from two glycoproteins.
2. Method
2.1. MALDI-MS
Permethylated oligosaccharide standards and N-glycans released from glycoproteins were analyzed on an Axima MALDI Resonance mass spectrometer with a QIT-TOF configuration (Shimadzu). A nitrogen laser was used to irradiate samples at 337 nm, with an average of 200 shots accumulated. Permethylated glycan standards and N-glycans from glycoproteins were dissolved in methanol and applied to a focus MALDI plate target (900 μm, 384 circles, HST). A matrix solution (0.5 μL) of 2,5-dihydroxybenzoic acid (20 mg/mL) in methanol/water (1:1) containing 0.1% trifluoroacetic acid and 1 mM NaCl was added to the plate and mixed with the samples. The mixture was air-dried at room temperature before analysis. The acquired product-ion spectra were converted into the platform-independent mzXML format with Shimadzu Biotech Launchpad for subsequent identification. For MS2, MNa+ was used as the precursor and the collision energy was optimized between 100 and 200 V; for MS3 the energy was set at 200-300 V, for MS4 at 300-400 V, and for MS5 at 400-600 V. For precursor selection, among the four resolution settings of the instrument (FWHM 70, 250, 500 and 1,000) with a ‘two-step selection’ design, the window at FWHM 500 with a width of 3-5 mass units was considered appropriate and used for the present study. During MSn scanning, the five most intense peaks in each acquired spectrum were selected as precursors for next-stage dissociation until MS5 spectra were generated.
2.2. glyBranch approach for glycan identification
glyBranch aims to identify glycan branching structures from a collection of MSn (n > 1) spectra of the glycan of interest. The spectra were generated as follows: at each stage of the MSn experiment, the five most intense peaks were selected as precursor ions for the next-stage MS scan. This procedure was repeated up to the MS5 stage, generating a total of $5^0 + 5^1 + 5^2 + 5^3 = 156$ MSn spectra for a glycan of interest (one MS2, 5 MS3, 25 MS4 and 125 MS5 spectra). We organize these MSn spectra into a spectra tree according to the product-ion relationships among them; that is, if an MS(i+1) spectrum is a product-ion spectrum of an MSi spectrum, we connect the two spectra with an edge. The basic operations of the glyBranch approach are shown in Figure 1 and described as follows:
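The acquisition scheme above maps naturally onto a recursive data structure. The following is a minimal sketch, not the glyBranch implementation: `acquire` is a hypothetical stand-in for the instrument that returns the peaks of an MSn spectrum for a given precursor, and `SpectrumNode`, `build_spectra_tree` and `count_spectra` are illustrative names. Selecting the top five peaks at every stage up to MS5 reproduces the 156-spectrum count.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class SpectrumNode:
    stage: int            # n in MS^n (2..5)
    precursor_mz: float   # m/z of the ion fragmented to produce this spectrum
    peaks: List[Peak]
    children: List["SpectrumNode"] = field(default_factory=list)  # MS^(n+1) spectra


def build_spectra_tree(acquire: Callable[[float, int], List[Peak]],
                       precursor_mz: float, stage: int = 2,
                       max_stage: int = 5, top_k: int = 5) -> SpectrumNode:
    """Acquire a spectrum, then recursively fragment its top_k most intense peaks."""
    node = SpectrumNode(stage, precursor_mz, acquire(precursor_mz, stage))
    if stage < max_stage:
        top = sorted(node.peaks, key=lambda p: p[1], reverse=True)[:top_k]
        node.children = [build_spectra_tree(acquire, mz, stage + 1, max_stage, top_k)
                         for mz, _ in top]
    return node


def count_spectra(node: SpectrumNode) -> int:
    """Number of spectra (nodes) in the tree rooted at node."""
    return 1 + sum(count_spectra(child) for child in node.children)


def fake_acquire(precursor_mz: float, stage: int) -> List[Peak]:
    # Hypothetical instrument stand-in: eight fragment peaks of decreasing intensity.
    return [(precursor_mz - 50.0 * (k + 1), 1.0 / (k + 1)) for k in range(8)]


tree = build_spectra_tree(fake_acquire, precursor_mz=1579.0)
print(count_spectra(tree))  # 156
```

Each edge in this structure is exactly the product-ion relationship used later when theoretical trees are built with the same shape.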
Figure 1: The workflow of glyBranch for glycan identification. glyBranch consists of three steps: (i) removing noisy peaks and filtering out low-quality spectra; (ii) generating theoretical spectra trees of candidate glycan structures; (iii) scoring candidate glycan structures by comparing theoretical and experimental spectra trees.
(1) Removing noisy peaks and filtering out low-quality spectra
MSn spectra usually contain many noisy peaks. In this study, a peak is treated as noise if its relative intensity is below a threshold $Th$, which is set by estimating the noise level as follows. In a spectrum, peaks with m/z less than 146 (the m/z of a free fucose residue) or greater than that of MNa+ (the precursor ion) are necessarily noise. We collect these peaks and calculate the mean $\mu$ and standard deviation $\sigma$ of their intensities. Following the three-sigma rule, we set the noise level to $\mu + 3\sigma$; when no such noise peaks exist, we simply use 0.1 as the noise level. In summary, we set the threshold $Th = \max\{0.1, \mu + 3\sigma\}$ and remove all peaks with relative intensity less than $Th$.
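The two filters of step (1) can be sketched together: the noise threshold described above and the $I_3 / I_{med}$ quality test defined in the next paragraph, including the removal of every descendant of a rejected spectrum. This is a minimal sketch under stated assumptions, not the authors' code. Intensities are assumed to be relative (base peak = 1.0). The population standard deviation is assumed because the text does not say which estimator is used. `Node`, `denoise`, `is_low_quality` and `filter_tree` are hypothetical names.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, relative intensity; base peak assumed to be 1.0)


def noise_threshold(peaks: List[Peak], precursor_mz: float,
                    low_mz: float = 146.0, floor: float = 0.1) -> float:
    """Th = max{0.1, mu + 3*sigma}, estimated from peaks below m/z 146 or above MNa+."""
    noise = [inten for mz, inten in peaks if mz < low_mz or mz > precursor_mz]
    if not noise:
        return floor  # no out-of-range peaks: use 0.1 as the noise level
    mu = statistics.fmean(noise)
    sigma = statistics.pstdev(noise)  # population SD (assumption)
    return max(floor, mu + 3 * sigma)


def denoise(peaks: List[Peak], precursor_mz: float) -> List[Peak]:
    """Keep only peaks whose relative intensity reaches the threshold Th."""
    th = noise_threshold(peaks, precursor_mz)
    return [(mz, inten) for mz, inten in peaks if inten >= th]


def is_low_quality(peaks: List[Peak], min_ratio: float = 2.0) -> bool:
    """Low quality if I3 / I_med < 2 (mean of top-3 intensities over the median).

    Intended for denoised spectra, whose intensities are all >= Th > 0."""
    if not peaks:
        return True
    intensities = sorted((inten for _, inten in peaks), reverse=True)
    return statistics.fmean(intensities[:3]) / statistics.median(intensities) < min_ratio


@dataclass
class Node:
    precursor_mz: float
    peaks: List[Peak]
    children: List["Node"] = field(default_factory=list)


def filter_tree(node: Node) -> Optional[Node]:
    """Denoise every spectrum; drop a low-quality spectrum together with its whole subtree."""
    peaks = denoise(node.peaks, node.precursor_mz)
    if is_low_quality(peaks):
        return None  # descendants are never visited, so they are discarded too
    kept = [c for c in (filter_tree(ch) for ch in node.children) if c is not None]
    return Node(node.precursor_mz, peaks, kept)
```

Applying the filter top-down means a rejected spectrum's product-ion spectra are never inspected, which matches the rule that they are discarded along with it.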
After denoising, we further filter out low-quality spectra, i.e., spectra that lack isotopic peaks or significant peaks. Specifically, we define the quality of a spectrum as $I_3 / I_{med}$, where $I_3$ is the average intensity of its top 3 peaks and $I_{med}$ is the median intensity of all its peaks. A spectrum with $I_3 / I_{med} < 2$ is treated as low-quality. Note that once a spectrum is filtered out, all of its product-ion spectra, i.e., the spectra generated using a fragment ion in this spectrum as the precursor ion, are filtered out as well.
(2) Generating theoretical spectra trees of candidate glycan structures
From the primary mass spectrum of the glycan of interest, we first determine its molecular mass and then extract the glycans with identical mass from GDB as candidate glycan structures, denoted as $G_1, G_2, \ldots, G_n$. Next, for each candidate glycan structure $G_i$, we generate a theoretical spectra tree with the same product-ion relationships as the experimental spectra tree, as follows:
(i) We first generate the theoretical MS2 spectrum $\tilde{S}_2$ of $G_i$ by simulating the fragmentation of $G_i$, adopting the fragmentation simulation used in GIPS [19].
(ii) For each experimental MS3 spectrum $S_3$, we first identify the fragments $F_i^2$ of $G_i$ whose mass is identical to that of the precursor ion of $S_3$. Next, we generate the theoretical MS3 spectrum $\tilde{S}_3$ of $G_i$ by simulating the fragmentation of the fragments in $F_i^2$.
(iii) Now consider each experimental MS4 spectrum $S_4$ that is a product-ion spectrum of $S_3$. We first identify the fragments $F_i^3$ of $G_i$ whose mass is identical to that of the precursor ion of $S_4$. To guarantee that the theoretical spectrum $\tilde{S}_4$ is also a product-ion spectrum of $\tilde{S}_3$, we require each fragment in $F_i^3$ to be a substructure of some fragment in $F_i^2$. Next, we generate the theoretical MS4 spectrum $\tilde{S}_4$ of $G_i$ by simulating the fragmentation of the fragments in $F_i^3$.
(iv) The theoretical MS5 spectra are generated in a similar manner.
In this way, we construct a theoretical counterpart for each experimental MSn spectrum, and the theoretical spectra share the same product-ion relationships as the experimental ones.
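Step (2) relies on two lookups: retrieving candidates whose MNa+ matches the measured precursor, and choosing, at each stage, the fragments of $G_i$ that match the next precursor m/z while being substructures of a fragment chosen for the parent spectrum. The sketch below is illustrative only and does not simulate fragmentation (GIPS [19] is not reproduced). It rests on several assumptions. A fragment is modeled as the set of monosaccharide indices it retains, so "substructure" becomes set inclusion. Matching uses a hypothetical ±0.5 m/z tolerance. `Candidate`, `Fragment` and both functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    mz: float  # predicted m/z of MNa+ for this database structure


@dataclass(frozen=True)
class Fragment:
    residues: FrozenSet[int]  # indices of the monosaccharides of G_i kept in this fragment
    mz: float                 # m/z of the fragment ion


def candidates_by_mass(db: Sequence[Candidate], precursor_mz: float,
                       tol: float = 0.5) -> List[Candidate]:
    """G_1, ..., G_n: database structures whose MNa+ matches the measured precursor."""
    return [g for g in db if abs(g.mz - precursor_mz) <= tol]


def precursor_fragments(fragments: Sequence[Fragment], parents: Sequence[Fragment],
                        precursor_mz: float, tol: float = 0.5) -> List[Fragment]:
    """Fragments of G_i that match the precursor m/z of the next-stage spectrum and are
    substructures (here: residue subsets) of at least one fragment chosen for its parent."""
    return [f for f in fragments
            if abs(f.mz - precursor_mz) <= tol
            and any(f.residues <= p.residues for p in parents)]


# Hypothetical toy data, not values from the paper.
db = [Candidate("G1", 1549.0), Candidate("G2", 1549.2), Candidate("G3", 1274.0)]
whole = Fragment(frozenset(range(6)), 1549.0)  # the intact candidate G_i
f_a = Fragment(frozenset({0, 1, 2}), 700.0)
f_b = Fragment(frozenset({3, 4, 5}), 800.0)
f_c = Fragment(frozenset({0, 1}), 400.0)
f_d = Fragment(frozenset({3, 4}), 400.2)
pool = [f_a, f_b, f_c, f_d]

F2 = precursor_fragments(pool, [whole], precursor_mz=700.0)  # for an MS3 spectrum
F3 = precursor_fragments(pool, F2, precursor_mz=400.0)       # for an MS4 spectrum
```

In the toy data, both f_c and f_d match m/z 400 within the tolerance, but only f_c lies inside the fragment chosen at the previous stage. This is the consistency that keeps the theoretical tree aligned with the experimental one.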
(3) Scoring candidate glycan structures by comparing theoretical and experimental spectra trees
For each experimental spectrum $S$ and its theoretical counterpart $\tilde{S}$, we measure their similarity $\mathrm{Sim}(S, \tilde{S})$ based on the peaks they share. Existing scoring functions either simply use the shared peak count (SPC) [30, 31], which is sensitive to noisy peaks, or use the logarithm of peak intensities, i.e.,

$$\mathrm{Sim}(S, \tilde{S}) = \sum_{p \in C(S, \tilde{S})} \ln I_p,$$

where $C(S, \tilde{S})$ denotes the set of peaks shared by $S$ and $\tilde{S}$, and $I_p$ denotes the intensity of peak $p$. The logarithmic scoring function has proved successful in practice [32]. However, it is dominated by a few of the most intense peaks. To overcome this shortcoming, we propose a weighted tanh scoring function:

$$\mathrm{Sim}(S, \tilde{S}) = \sum_{p \in C(S, \tilde{S})} \cos(D_p, \tilde{D}_p)\, \tanh I_p,$$

where $D_p$ and $\tilde{D}_p$ denote the intensities of the isotopic derivatives of $p$ in $S$ and $\tilde{S}$, respectively. This scoring scheme has two advantages. (i) With the tanh function, intense shared peaks lead to a high score; more importantly, no single peak can dominate the score. (ii) With the weight $\cos(D_p, \tilde{D}_p)$, peaks with isotopic derivatives are emphasized, since such peaks are more reliable. For a candidate glycan $G_i$, we generate a theoretical spectra tree $T_{G_i}$ according to the experimental spectra tree $T$ and then calculate the similarity between the two trees as

$$\mathrm{Sim}(T, T_{G_i}) = \sum_{S \in T} \mathrm{Sim}(S, \tilde{S}_{G_i}),$$

where $\tilde{S}_{G_i}$ is the counterpart of $S$ in $T_{G_i}$.
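The similarity $\mathrm{Sim}(T, T_{G_i})$ measures how likely it is that the experimental spectra were generated from $G_i$. Finally, we normalize these scores over all candidates into the range [0, 1] using SoftMax, i.e., $P(G_i) = \exp(\mathrm{Sim}(T, T_{G_i})) / \sum_{j} \exp(\mathrm{Sim}(T, T_{G_j}))$.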
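The scoring pipeline of step (3) can be sketched end to end: the weighted tanh score per spectrum, the sum over paired nodes of the two trees, and the SoftMax over candidates. This is a minimal sketch, not the published implementation. It makes the following assumptions. Experimental intensities are relative. Ions are singly charged, so isotopes are spaced by about 1.00336 m/z. Peaks are matched within a hypothetical ±0.5 m/z tolerance. Each theoretical peak comes with its isotope pattern $\tilde{D}_p$ supplied by the caller. The two trees are given as lists of spectra in matching node order. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Spectrum = Dict[float, float]            # experimental peaks: m/z -> relative intensity
TheoSpectrum = Dict[float, List[float]]  # theoretical m/z -> isotope pattern [M, M+1, ...]

ISOTOPE_SPACING = 1.00336  # approximate m/z step between isotopes of a singly charged ion


def lookup(spec: Spectrum, mz: float, tol: float) -> float:
    """Intensity of the experimental peak closest to mz within tol, or 0.0 if none."""
    hits = [m for m in spec if abs(m - mz) <= tol]
    return spec[min(hits, key=lambda m: abs(m - mz))] if hits else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def tanh_score(spec: Spectrum, theo: TheoSpectrum, tol: float = 0.5) -> float:
    """Sim(S, S~) = sum over shared peaks p of cos(D_p, D~_p) * tanh(I_p)."""
    score = 0.0
    for mz, d_theo in theo.items():
        i_p = lookup(spec, mz, tol)
        if i_p == 0.0:
            continue  # p is not a shared peak
        d_exp = [lookup(spec, mz + k * ISOTOPE_SPACING, tol) for k in range(len(d_theo))]
        score += cosine(d_exp, d_theo) * math.tanh(i_p)
    return score


def tree_similarity(exp_tree: List[Spectrum], theo_tree: List[TheoSpectrum]) -> float:
    """Sim(T, T_Gi): per-spectrum scores summed over paired nodes of the two trees."""
    return sum(tanh_score(s, t) for s, t in zip(exp_tree, theo_tree))


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize Sim(T, T_Gi) over all candidates G_i into probabilities."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}


# Hypothetical one-spectrum example: G1 explains a peak with a matching isotope peak,
# G2 explains a weaker peak whose M+1 isotope is missing from the spectrum.
spec = {1000.0: 1.0, 1001.0: 0.6, 1500.0: 0.5}
g1 = [{1000.0: [1.0, 0.6]}]
g2 = [{1500.0: [1.0, 0.6]}]
probs = softmax({"G1": tree_similarity([spec], g1), "G2": tree_similarity([spec], g2)})
best = max(probs, key=probs.get)
```

Replacing `math.tanh(i_p)` with `1.0` or `math.log(i_p)` (and dropping the cosine weight) gives the SPC and ln variants compared in Section 3.1.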
The candidate glycan with the highest probability is finally reported as the identified glycan.

3. Results and Discussion
3.1. Comparison of various scoring functions
To illustrate the advantages of tanh(I), we compared it with two popular scoring functions. As shown in Table 1, when no peak denoising is applied, glyBranch correctly identifies 29 of the 30 glycans using tanh-score, more than with SPC-score (22) or ln-score (26). tanh-score also performs best at the 1% denoising level. At 2%, SPC-score rises to 29 while tanh-score drops to 26, so SPC-score needs the strongest denoising to reach the accuracy that tanh-score attains without any. A reasonable explanation for the advantage of tanh-score is that SPC-score is very sensitive to noise, while ln-score is too sensitive to mismatched peaks. We examine this issue in more detail as follows.
Table 1: The number of correctly identified glycans using different scoring functions. Here, tanh-score, SPC-score and ln-score denote the scoring functions tanh(I), shared peak count, and ln(I), respectively.
Peak-denoising threshold | tanh-score | SPC-score | ln-score
0% | 29 | 22 | 26
1% | 29 | 26 | 26
2% | 26 | 29 | 26
Figure 2 (a) shows the actual structure Man-5 and one of its isomeric structures (denoted as the wrong structure). The experimental MS3 spectrum produced from the m/z 1084 precursor, together with fragment annotations using these two glycan structures, is shown in Figure 2 (c). The wrong structure explains 6 MS3 peaks, while the actual structure explains 5; thus, the wrong structure is assigned a higher SPC-score. However, the 6 peaks explained by the wrong structure have lower intensities than the 5 peaks explained by the actual glycan. Because tanh-score considers both the number of shared peaks and their intensities, it assigns a higher score to the actual glycan.
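The Man-5 argument can be checked with made-up numbers. In the sketch below, the intensities are hypothetical and chosen only to mirror the situation described: the wrong structure explains six weak peaks and the actual structure explains five strong ones. They are not the measured values from Figure 2 (c). Only the unweighted tanh part of the score is shown.

```python
import math

# Hypothetical relative intensities (illustrative only, not the measured Man-5 data).
wrong_explained = [0.05] * 6   # six weak peaks explained by the wrong structure
actual_explained = [0.60] * 5  # five strong peaks explained by the actual structure


def spc(intensities):
    """Shared peak count: every matched peak counts 1, regardless of intensity."""
    return len(intensities)


def tanh_sum(intensities):
    """Unweighted tanh part of the score: intense peaks count more, each at most 1."""
    return sum(math.tanh(i) for i in intensities)


print(spc(wrong_explained), spc(actual_explained))  # 6 5
print(tanh_sum(wrong_explained), tanh_sum(actual_explained))
```

SPC prefers the wrong structure because it only counts matches. The tanh sum prefers the actual one because the five intense peaks outweigh six near-noise peaks.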
We further investigated the advantage of tanh-score over ln-score using Man-6 as an example. Specifically, Figure 2 (b) shows the actual glycan Man-6 and one of its isomeric structures (denoted as the wrong structure), and Figure 2 (d) shows the experimental MS4 spectrum produced from the m/z 927 precursor, with fragment annotations using these two glycans. The peak at m/z 677 (the isotopic peak of m/z 676) is the most intense one. This peak can be explained only by the wrong structure; thus, the wrong structure is assigned a higher ln-score. In contrast, tanh-score gives each peak a score in the range [0, 1] and thus keeps the influence of any single intense peak in check.
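The boundedness argument can also be illustrated with hypothetical numbers. In this sketch, the wrong structure explains one dominant peak (playing the role of m/z 677) and the actual structure explains two moderate peaks. The values are invented, and they are taken on an absolute intensity scale because ln of a relative intensity at or below 1 is not positive. The text does not state which intensity scale each score uses, so treat this purely as an illustration of why an unbounded per-peak term lets one peak win.

```python
import math

# Hypothetical absolute intensities (illustrative only, not the measured Man-6 data).
wrong = [2000.0]       # one dominant peak explained only by the wrong structure
actual = [30.0, 30.0]  # two moderate peaks explained by the actual structure

ln_wrong, ln_actual = sum(map(math.log, wrong)), sum(map(math.log, actual))
tanh_wrong, tanh_actual = sum(map(math.tanh, wrong)), sum(map(math.tanh, actual))

print(ln_wrong > ln_actual)      # True: ln(2000) exceeds 2 * ln(30)
print(tanh_actual > tanh_wrong)  # True: each tanh term is capped at 1
```

Because each ln term grows without bound, one very intense peak can outweigh several genuine matches, while a tanh term can never exceed 1.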
Figure 2: Investigation of the advantage of tanh-score over SPC-score and ln-score using Man-5 and Man-6 as examples. (a) The actual glycan Man-5 and one of its isomeric glycan structures. (b) The actual glycan Man-6 and one of its isomeric glycan structures. (c) The experimental MS3 spectrum produced from the m/z 1084 precursor for Man-5, with fragment annotations using both Man-5 and the isomeric glycan. (d) The experimental MS4 spectrum produced from the m/z 927 precursor for Man-6, with fragment annotations using both Man-6 and the isomeric glycan.
3.2. Identification of glycan standards including isomeric glycan pairs
We further evaluated glyBranch on 14 glycan standards, including N-glycans and HMOs. As shown in Table 2, glyBranch correctly identified all 14 glycans, ranking the actual glycan first in every case. The probabilities were 0.95 or higher for all standards except Man-6 (0.66). This result demonstrates the accuracy of glyBranch on glycan standards.
We also investigated the ability of glyBranch to distinguish isomeric glycan pairs. The 14 glycans used here contain two pairs of isomers, i.e., LNDFH-I and LNDFH-II, and MFLNH-I and MFLNH-III. The isomers in each pair differ only in the monosaccharide to which fucose is linked. In MFLNH-I and LNDFH-I, fucose is linked to Gal, whereas in MFLNH-III and LNDFH-II, fucose is linked to GlcNAc and Glc, respectively. Here we use MFLNH-I and MFLNH-III as an example to describe the identification process of glyBranch. Based on the primary mass spectrum with MNa+ at m/z 1549, 16 isomeric candidate structures, including MFLNH-I and MFLNH-III, were extracted from GDB. For the MFLNH-I sample, the experimental MS2 spectrum provided sufficient structurally informative peaks to distinguish it from MFLNH-III: as shown in Figure 3 (b), 4 more MS2 peaks were annotated using MFLNH-I than using MFLNH-III. Furthermore, the MS3, MS4, and MS5 spectra revealed a total of 14 distinctive peaks of MFLNH-I. Based on this information, glyBranch assigned a higher probability to MFLNH-I (1.00) than to MFLNH-III (0.00) and thus correctly reported MFLNH-I as the actual glycan.
For the MFLNH-III sample, the MS2 spectrum provided no information for distinguishing the two isomers, since both annotate exactly the same peaks (Figure 3 (c)). However, the subsequent MS3, MS4 and MS5 spectra revealed 6 distinctive peaks of MFLNH-III but only 2 distinctive peaks of MFLNH-I. Thus, glyBranch correctly identified MFLNH-III as the actual glycan.
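Counting distinctive peaks, as done above for MFLNH-I and MFLNH-III, amounts to a set difference between the observed peaks that each candidate can annotate. The sketch below is illustrative. It assumes a hypothetical ±0.5 m/z matching tolerance and uses invented m/z values. `explained` and `distinctive` are hypothetical helper names, not part of glyBranch.

```python
from typing import Iterable, Set


def explained(observed: Iterable[float], theoretical: Iterable[float],
              tol: float = 0.5) -> Set[float]:
    """Observed m/z values matched by at least one theoretical fragment m/z."""
    theo = list(theoretical)
    return {mz for mz in observed if any(abs(mz - t) <= tol for t in theo)}


def distinctive(observed: Iterable[float], theo_a: Iterable[float],
                theo_b: Iterable[float], tol: float = 0.5) -> Set[float]:
    """Peaks explained by candidate A but not by candidate B."""
    observed = list(observed)  # iterated twice below
    return explained(observed, theo_a, tol) - explained(observed, theo_b, tol)


# Hypothetical m/z values (illustrative only).
observed = [100.0, 200.0, 300.0, 400.0]
theo_a = [100.0, 200.0, 300.2]
theo_b = [100.1, 400.0]
print(sorted(distinctive(observed, theo_a, theo_b)))  # [200.0, 300.0]
print(sorted(distinctive(observed, theo_b, theo_a)))  # [400.0]
```

Peaks annotated by both candidates (here m/z 100) carry no discriminating information. This is why the shared MS2 peaks of the MFLNH-III sample could not separate the two isomers.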
Figure 3: Distinguishing isomeric glycan pairs using glyBranch. (a) The isomeric pair MFLNH-I and MFLNH-III. (b) The MS2 spectrum produced using precursor ion m/z 1549 for the sample MFLNH-I. (c), (d), and (e) The MS2, MS3, and MS4 spectra acquired from the sample MFLNH-III, respectively.
Table 2: The identification results of glyBranch on 14 glycan standards. Here “Noc” is the abbreviation for “Number of candidates”.
Glycan | Structure | MNa+ | Noc | Probability | Rank
A2 | [image] | 2792 | 3 | 1.00 | 1
LNDFH-I | [image] | 1274 | 8 | 1.00 | 1
LNDFH-II | [image] | 1274 | 8 | 0.99 | 1
MFLNH-I | [image] | 1549 | 16 | 1.00 | 1
MFLNH-III | [image] | 1549 | 16 | 0.99 | 1
Man-5D1 | [image] | 1579 | 16 | 0.95 | 1
Man-6 | [image] | 1783 | 15 | 0.66 | 1
Man-7D1 | [image] | 1987 | 11 | 0.99 | 1
NA2 | [image] | 2070 | 10 | 1.00 | 1
NA3 | [image] | 2519 | 8 | 1.00 | 1
NA4 | [image] | 2968 | 14 | 1.00 | 1
NGA2 | [image] | 1661 | 16 | 1.00 | 1
NGA3 | [image] | 1906 | 19 | 1.00 | 1
NGA4 | [image] | 2152 | 14 | 1.00 | 1
Table 3: The identification results of glyBranch for the samples isolated from RNase B and IgG. Here “Noc” is the abbreviation for “Number of candidates”.
Glycoprotein | Structure | MNa+ | Noc | Probability | Rank
RNase B | [image] | 1579 | 15 | 0.73 | 1
RNase B | [image] | 1783 | 14 | 0.87 | 1
RNase B | [image] | 1987 | 11 | 0.05 | 2
RNase B | [image] | 2192 | 11 | 0.99 | 1
RNase B | [image] | 2396 | 15 | 0.84 | 1
IgG | [image] | 1835 | 10 | 0.88 | 1
IgG | [image] | 2040 | 9 | 0.98 | 1
IgG | [image] | 2070 | 10 | 0.55 | 1
IgG | [image] | 2081 | 7 | 0.98 | 1
IgG | [image] | 2244 | 14 | 0.99 | 1
IgG | [image] | 2285 | 10 | 0.84 | 1
IgG | [image] | 2401 | 6 | 0.88 | 1
IgG | [image] | 2605 | 5 | 0.95 | 1
IgG | [image] | 2850 | 3 | 1.00 | 1
IgG | [image] | 2966 | 13 | 0.99 | 1
IgG | [image] | 3211 | 8 | 0.88 | 1
3.3. Identification of glycans released from RNase B and human serum IgG
Next, we applied glyBranch to profile the N-glycans isolated from RNase B and human serum IgG. In the primary MS spectrum of RNase B, five major peaks were observed at m/z 1579, 1783, 1987, 2192 and 2396, and MSn scanning was performed for each of them. Using the present approach, four of the five peaks (all except m/z 1987) were successfully identified as Man-5D1, Man-6, Man-8D1D3 and Man-9, with probabilities of 0.73, 0.87, 0.99, and 0.84, respectively (Table 3). We further investigated why glyBranch failed at m/z 1987 (Man-7). As revealed by CE analysis, Man-5, Man-6 and Man-9 each contain a single component, whereas Man-7 contains multiple components (Man-7D1, Man-7D2, and Man-7D3 at approximate concentrations of 26%, 26%, and 48%, respectively). Man-8 also contains multiple components, among which Man-8D1D3 is the dominant one, with a purity of 85%. This result suggests that glyBranch has the potential to identify mixed samples with a dominant component.
Compared with the glycans isolated from RNase B, the glycans isolated from IgG are more complex. In this study, IgG was extracted and purified from 1 L human serum, and SDS-PAGE was then performed to evaluate its purity; high-purity IgG was prepared for subsequent analysis. In the primary spectrum of this complex sample, we selected the 11 most abundant N-glycans, at m/z 1835, 2040, 2070, 2081, 2244, 2285, 2401, 2605, 2851, 2966 and 3211, for MSn fragmentation. As shown in Table 3, all 11 complex-type N-glycans were successfully identified, with probabilities of 0.88, 0.98, 0.54, 0.98, 0.99, 0.84, 0.88, 0.95, 1.00, 0.99 and 0.88, respectively. Among these identified glycans, seven were corroborated by available N-glycan standards and the other four by literature data [33, 34, 35, 36].

4. Conclusions
In this study, we present an approach that can effectively identify glycan structures using MSn spectra. We demonstrated its successful application to 14 glycan standards, including two isomeric pairs, and to complex samples isolated from RNase B and human IgG. In the current study, we used MALDI-MSn to produce mass spectra. For other ionization modes and instruments, glyBranch could be extended with moderate modifications to handle multiply charged ions in ESI-MSn spectra. It could then automatically identify both multiply charged molecular ions and fragment ions, which appear at fractional m/z values in high-resolution mass spectra. Adequate resolution is important for precursor selection in order to obtain unambiguous assignments of N-glycans on an ion trap instrument. The ion-trap isolation window at FWHM 500 with a width of 3-5 mass units is wide enough to capture the entire isotope envelope, thereby improving sensitivity. In the present proof-of-concept work, only glycosidic cleavages are considered and used for assigning branching patterns, and the fragment ions they produce contain little linkage information. Cross-ring cleavages have been used for partial linkage determination in previous reports [37, 38]. Incorporating cross-ring fragmentation into future versions of glyBranch should help assign particular monosaccharides to specific N-glycan antennae and improve the specificity of the method.
Declaration of Interests
The authors declare no competing financial interests.

Author contributions
SS and CH conceived the study. SS, JZ and HW designed the approach concept and computational model, and CH, JZ and KZ designed the mass spectrometry methodology. CH established the MALDI-MSn and glycan structural analysis procedure and performed and analyzed the mass spectral data. JZ and HW implemented the approach. MH developed the web server of this approach. JD, WP, YW and DB provided constructive comments and advice on the approach concept. CH, JZ and KZ carried out glycan preparation and participated in MALDI-MSn data acquisition. HW, JZ, CH and SS wrote the manuscript. All authors discussed the results and commented on the manuscript.
Acknowledgements
The authors thank the National Key Research and Development Program of China (2018YFC0910405), the National Natural Science Foundation of China (31671369, 31600650, 31770775) and the International Partnership Program of Chinese Academy of Sciences (No. 153311KYSB20150012) for supporting their work.
Data availability
All data used in this work can be found at https://github.com/bigict/glyBranch/
References
[1] R. Raman, S. Raguram, G. Venkataraman, J. C. Paulson, R. Sasisekharan, Glycomics: an integrated systems approach to structure-function relationships of glycans, Nat Methods 2 (11) (2005) 817–24.
[2] J. C. Paulson, O. Blixt, B. E. Collins, Sweet spots in functional glycomics, Nat Chem Biol 2 (5) (2006) 238–48.
[3] J. D. Marth, P. K. Grewal, Mammalian glycosylation in immunity, Nat Rev Immunol 8 (11) (2008) 874–87.
[4] F. Ju, J. Zhang, D. Bu, Y. Li, J. Zhou, H. Wang, Y. Wang, C. Huang, S. Sun, De novo glycan structural identification from mass spectra using tree merging strategy, Computational biology and chemistry 80 (2019) 217–224.
[5] D. H. Dube, C. R. Bertozzi, Glycans in cancer and inflammation–potential for therapeutics and diagnostics, Nat Rev Drug Discov 4 (6) (2005) 477–88.
[6] L. R. Ruhaak, S. Miyamoto, C. B. Lebrilla, Developments in the identification of glycan biomarkers for the detection of cancer, Mol Cell Proteomics 12 (4) (2013) 846–55.
[7] S. S. Pinho, C. A. Reis, Glycosylation in cancer: mechanisms and clinical implications, Nat Rev Cancer 15 (9) (2015) 540–55.
[8] G. W. Hart, R. J. Copeland, Glycomics hits the big time, Cell 143 (5) (2010) 672–6.
[9] D. F. Smith, R. D. Cummings, Application of microarrays for deciphering the structure and function of the human glycome, Mol Cell Proteomics 12 (4) (2013) 902–12.
[10] S. A. Springer, P. Gagneux, Glycomics: revealing the dynamic ecology and evolution of sugar molecules, J Proteomics 135 (2016) 90–100.
[11] A. Varki, Biological roles of glycans, Glycobiology 27 (1) (2017) 3–49.
[12] A. V. Everest-Dass, M. T. Briggs, G. Kaur, M. K. Oehler, P. Hoffmann, N. H. Packer, N-glycan MALDI imaging mass spectrometry on Formalin-Fixed Paraffin-Embedded tissue enables the delineation of ovarian cancer tissues, Mol Cell Proteomics 15 (9) (2016) 3003–16.
[13] Y. Kizuka, N. Taniguchi, Enzymes for N-glycan branching and their genetic and nongenetic regulation in cancer, Biomolecules 6 (2).
[14] M. Anugraham, F. Jacob, S. Nixdorf, A. V. Everest-Dass, V. Heinzelmann-Schwarz, N. H. Packer, Specific glycosylation of membrane proteins in epithelial ovarian cancer cell lines: glycan structures reflect gene expression and DNA methylation status, Mol Cell Proteomics 13 (9) (2014) 2213–32.
[15] P. Pompach, Z. Brnakova, M. Sanda, J. Wu, N. Edwards, R. Goldman, Site-specific glycoforms of haptoglobin in liver cirrhosis and hepatocellular carcinoma, Mol Cell Proteomics 12 (5) (2013) 1281–93.
[16] J. M. Pekelharing, E. Hepp, J. P. Kamerling, G. J. Gerwig, B. Leijnse, Alterations in carbohydrate composition of serum IgG from patients with rheumatoid arthritis and from pregnant women, Ann Rheum Dis 47 (2) (1988) 91–5.
[17] C. Huhn, M. H. Selman, L. R. Ruhaak, A. M. Deelder, M. Wuhrer, IgG glycosylation analysis, Proteomics 9 (4) (2009) 882–913.
[18] J. N. Arnold, M. R. Wormald, R. B. Sim, P. M. Rudd, R. A. Dwek, The impact of glycosylation on the biological function and structure of human immunoglobulins, Annu Rev Immunol 25 (2007) 21–50.
[19] S. Sun, C. Huang, Y. Wang, Y. Liu, J. Zhang, J. Zhou, F. Gao, F. Yang, R. Chen, B. Mulloy, W. Chai, Y. Li, D. Bu, Toward automated identification of glycan branching patterns using multistage mass spectrometry with intelligent precursor selection, Analytical chemistry 90 (24) (2018) 14412–14422.
[20] Y. Wang, D. Bu, C. Huang, H. Wang, J. Zhou, J. Dong, W. Pan, J. Zhang, Q. Zhang, Y. Li, S. Sun, Best-first search guided multistage mass spectrometry-based glycan identification, Bioinformatics.
[21] L. Zhang, S. Luo, B. Zhang, Glycan analysis of therapeutic glycoproteins, MAbs 8 (2) (2016) 205–15.
[22] C. H. Smit, A. van Diepen, D. L. Nguyen, M. Wuhrer, K. F. Hoffmann, A. M. Deelder, C. H. Hokke, Glycomic analysis of life stages of the human parasite Schistosoma mansoni reveals developmental expression profiles of functional and antigenic glycan motifs, Mol Cell Proteomics 14 (7) (2015) 1750–69.
[23] V. Reinhold, H. Zhang, A. Hanneman, D. Ashline, Toward a platform for comprehensive glycan sequencing, Mol Cell Proteomics 12 (4) (2013) 866–73.
[24] P. M. Rudd, G. R. Guile, B. Kuster, D. J. Harvey, G. Opdenakker, R. A. Dwek, Oligosaccharide sequencing technology, Nature 388 (6638) (1997) 205–7.
[25] T. W. Rademacher, P. Williams, R. A. Dwek, Agalactosyl glycoforms of IgG autoantibodies are pathogenic, Proc Natl Acad Sci U S A 91 (13) (1994) 6123–7.
[26] R. Malhotra, M. R. Wormald, P. M. Rudd, P. B. Fischer, R. A. Dwek, R. B. Sim, Glycosylation changes of IgG associated with rheumatoid arthritis can activate complement via the mannose-binding protein, Nat Med 1 (3) (1995) 237–43.
[27] A. Youings, S. C. Chang, R. A. Dwek, I. G. Scragg, Site-specific glycosylation of human immunoglobulin G is altered in four rheumatoid arthritis patients, Biochem J 314 (Pt 2) (1996) 621–30.
[28] R. A. Dwek, A. C. Lellouch, M. R. Wormald, Glycobiology: ‘the function of sugar in the IgG molecule’, J Anat 187 (Pt 2) (1995) 279–92.
[29] S. Doubet, P. Albersheim, Letter to the glyco-forum: CarbBank, Glycobiology 2 (6) (1992) 505–505.
[30] S. P. Gaucher, J. Morrow, J. A. Leary, STAT: a saccharide topology analysis tool used in combination with tandem mass spectrometry, Anal Chem 72 (11) (2000) 2331–6.
[31] H. Tang, Y. Mechref, M. V. Novotny, Automated interpretation of MS/MS spectra of oligosaccharides, Bioinformatics 21 Suppl 1 (2005) i431–9.
[32] L. He, L. Xin, B. Shan, G. A. Lajoie, B. Ma, GlycoMaster DB: software to assist the automated identification of N-linked glycopeptides by tandem mass spectrometry, Journal of proteome research 13 (9) (2014) 3881–3895.
[33] A. Bondt, Y. Rombouts, M. H. Selman, P. J. Hensbergen, K. R. Reiding, J. M. Hazes, R. J. Dolhain, M. Wuhrer, Immunoglobulin G (IgG) Fab glycosylation analysis using a new mass spectrometric high-throughput profiling method reveals pregnancy-associated changes, Molecular & Cellular Proteomics 13 (11) (2014) 3029–3039.
[34] A. E. Mahan, J. Tedesco, K. Dionne, K. Baruah, H. D. Cheng, P. L. De Jager, D. H. Barouch, T. Suscovich, M. Ackerman, M. Crispin, G. Alter, A method for high-throughput, sensitive analysis of IgG Fc and Fab glycosylation by capillary electrophoresis, Journal of immunological methods 417 (2015) 34–44.
[35] M. Šimurina, N. de Haan, F. Vučković, N. A. Kennedy, J. Štambuk, D. Falck, I. Trbojević-Akmačić, F. Clerc, G. Razdorov, A. Khon, et al., Glycosylation of immunoglobulin G associates with clinical features of inflammatory bowel diseases, Gastroenterology 154 (5) (2018) 1320–1333.
[36] F. Vučković, E. Theodoratou, K. Thaçi, M. Timofeeva, A. Vojta, J. Štambuk, M. Pučić-Baković, P. M. Rudd, L. Derek, D. Servis, et al., IgG glycome in colorectal cancer, Clinical Cancer Research 22 (12) (2016) 3078–3086.
[37] A. S. Palma, Y. Liu, H. Zhang, Y. Zhang, B. V. McCleary, G. Yu, Q. Huang, L. S. Guidolin, A. E. Ciocchini, A. Torosantucci, et al., Unravelling glucan recognition systems by glycome microarrays using the designer approach and mass spectrometry, Molecular & Cellular Proteomics 14 (4) (2015) 974–988.
[38] D. J. Harvey, L. Royle, C. M. Radcliffe, P. M. Rudd, R. A. Dwek, Structural and quantitative analysis of N-linked glycans by matrix-assisted laser desorption ionization and negative ion nanospray mass spectrometry, Analytical biochemistry 376 (1) (2008) 44–60.
Highlights
A spectra tree of MSn spectra was assembled for the identification of glycan branching patterns.
A strategy named glyBranch was proposed to interpret the MSn spectra in the spectra tree.
The tanh function was introduced and shown to be more suitable for peak scoring.
Isomeric structures with different branching patterns were distinguished using glyBranch.
Complicated branching patterns of glycans were identified using glyBranch.