Identification of glycan branching patterns using multistage mass spectrometry with spectra tree analysis


Journal Pre-proof

Hui Wang, Jingwei Zhang, Junchuan Dong, Meijie Hou, Weiyi Pan, Dongbo Bu, Jinyu Zhou, Qi Zhang, Yaojun Wang, Keli Zhao, Yan Li, Chuncui Huang, Shiwei Sun

PII: S1874-3919(20)30017-8
DOI: https://doi.org/10.1016/j.jprot.2020.103649
Reference: JPROT 103649
To appear in: Journal of Proteomics
Received date: 25 September 2019
Revised date: 2 January 2020
Accepted date: 16 January 2020

Please cite this article as: H. Wang, J. Zhang, J. Dong, et al., Identification of glycan branching patterns using multistage mass spectrometry with spectra tree analysis, Journal of Proteomics (2020), https://doi.org/10.1016/j.jprot.2020.103649

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier.


Identification of Glycan Branching Patterns Using Multistage Mass Spectrometry with Spectra Tree Analysis

Hui Wang (a,d,1), Jingwei Zhang (a,c,1), Junchuan Dong (a,d), Meijie Hou (a,d), Weiyi Pan (a,d), Dongbo Bu (a,d), Jinyu Zhou (b,d), Qi Zhang (a,d), Yaojun Wang (a,e), Keli Zhao (b,d), Yan Li (b,d), Chuncui Huang (b,d,*), Shiwei Sun (a,d,*)

(a) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
(b) Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
(c) Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
(d) University of Chinese Academy of Sciences, Beijing 100049, China
(e) College of Information and Electrical Engineering, China Agricultural University, 100083, China

Abstract

Glycans are crucial to a wide range of biological processes, and their biological activities are closely related to their structural branching patterns. Unlike the simple linear chains of proteins, glycans have complicated branching patterns, which makes their identification extremely challenging. Tandem mass spectrometry (MS2) cannot provide sufficient structural information to deduce glycan branching patterns, even with the assistance of various bioinformatic tools and algorithms. Multistage mass spectrometry (MSn) is a promising technology for identifying glycan branching patterns. However, the product-ion relationship among the MSn spectra of a glycan is essentially a tree, which makes deducing glycan structures from MSn spectra a great challenge. In the present study, we report an approach called glyBranch (glycan Branching pattern identification based on spectra tree) that fully exploits the information contained in the MSn spectra tree for glycan identification. Using 14 glycan standards, including 2 isomeric pairs, and 16 complex N-glycans isolated from RNase B and IgG, we demonstrated the successful application of glyBranch to branching pattern analysis. The source code of glyBranch is available at https://github.com/bigict/glyBranch/. We have also developed a web server, which is freely accessible at http://glycan.ict.ac.cn/glyBranch/.

Keywords: Multi-stage mass spectrometry, glycan branching pattern, glycan identification, spectra tree, isomeric glycan

* Corresponding authors. Email addresses: [email protected] (Chuncui Huang), [email protected] (Shiwei Sun)
1 H.W. and J.Z. contributed equally.

Significance

Glycans are crucial in various biological processes, and their functions are closely related to the details of their structures; thus, the identification of glycan branching patterns is of great significance to biological studies. Multistage mass spectrometry (MSn) can provide detailed structural information by generating multiple levels of fragments through consecutive fragmentation; however, the interpretation of numerous MSn spectra is extremely challenging. In this study, we present an approach called glyBranch (glycan Branching pattern identification based on spectra tree) to exploit the information contained in the MSn spectra tree for glycan identification. This approach will greatly facilitate the automated identification of glycan structures and related biological studies.

1. Introduction

Glycosylation is one of the most important post-translational modifications (PTMs) of proteins, and more than half of all proteins are glycosylated. Glycans play important roles in a wide variety of biological processes, such as protein conformation, cell proliferation and differentiation, cell-cell communication, immune response and microbial adhesion [1, 2, 3, 4]. Moreover, glycans are critically involved in the pathogenesis and progression of various diseases, and have been used as biomarkers for clinical diagnosis [5, 6, 7]. For instance, an increase in N-glycans with bisecting N-acetylglucosamine (GlcNAc) occurs frequently in cancer cells, and large tetra-antennary branching structures are commonly formed in cancers [7, 8]. In addition, a dramatic increase in N-glycans lacking outer-arm galactose residues has been detected in patients with rheumatoid arthritis (RA) [9, 10]. The roles and functions of glycans are closely related to the details of their structures. For example, human milk oligosaccharides (HMOs) are a family of structurally diverse unconjugated oligosaccharides that are highly abundant in human milk. Most HMOs are built on a lactose core, which is elongated with various branching patterns by the addition of N-acetylglucosamine, galactose, fucose and sialic acid. The structures of HMOs shape the bacterial composition of the infant's intestinal tract, which can protect infants from pathogens and infection.

A number of methodologies have been proposed for glycan identification. Among them, mass spectrometry (MS) is one of the most sensitive techniques, and it does not require glycan standards [11, 12, 13]. Most existing MS-based methods exploit MS1 or MS2 information for glycan identification [14, 15, 16, 17, 18]. MS1 and MS2 have proved successful in peptide identification; however, they cannot provide sufficient information to elucidate the complex branching patterns of glycans [19, 20, 21, 22]. In addition, the number of isomeric glycans is much greater than the number of isomeric peptides. Compared with MS2, multistage mass spectrometry (MSn, n > 2) can provide more detailed structural information by generating multiple levels of fragments through consecutive fragmentation of a glycan. Based on these multiple-level fragments, it is possible to determine the exact branching patterns of glycans. The product-ion relationship among the MSn spectra of a glycan is essentially a tree, which makes deducing glycan structures from MSn spectra a great challenge. Given that MSn provides far more detailed information on glycan structures, computational tools to interpret it are urgently needed [23, 24, 25, 26, 27, 28].

In this study, we present an approach, glyBranch, that assigns glycan branching patterns by interpreting MSn spectra against a glycan structure database. glyBranch automatically integrates information from all acquired MSn spectra and assigns specific branching patterns from the database GDB, which consists of 7258 glycan structures excerpted from CarbBank [29]. In this approach, we exploit the pattern of isotopic peaks to measure the quality of MSn spectra, and we filter out low-quality spectra. We further propose a new scoring function to compare the experimental MSn spectra with the known glycan structures listed in GDB. glyBranch reports the structure with the highest probability as the identification result. We first applied glyBranch to 14 standards, with N-glycans and HMOs as representatives, including 2 pairs of glycan isomers. We then used it to identify N-glycans isolated from two glycoproteins.

2. Method

2.1. MALDI-MS

Permethylated oligosaccharide standards and N-glycans released from glycoproteins were analyzed on an Axima MALDI Resonance mass spectrometer with a QIT-TOF configuration (Shimadzu). A nitrogen laser was used to irradiate samples at 337 nm, with an average of 200 shots accumulated. Permethylated glycan standards and N-glycans from glycoproteins, dissolved in methanol, were applied to a μFocus MALDI plate target (900 μm, 384 circles, HST). A matrix solution (0.5 μL) of 2,5-dihydroxybenzoic acid (20 mg/mL) in methanol/water (1:1) containing 0.1% trifluoroacetic acid and 1 mM NaCl was added to the plate and mixed with the samples. The mixture was air-dried at room temperature before analysis. The acquired product-ion spectra were converted into the platform-independent mzXML format with Shimadzu Biotech Launchpad for subsequent identification. For MS2, MNa+ was used as the precursor, and the collision energy was optimized between 100 and 200 V. For MS3, MS4 and MS5, the collision energy was set to 200-300 V, 300-400 V and 400-600 V, respectively. The instrument offers four resolution settings for precursor selection (FWHM 70, 250, 500 and 1,000) with a 'two-step selection' design. Of these, the window at FWHM 500 with a width of 3-5 mass units was considered appropriate and used in the present study. During MSn scanning, the five most intense peaks in each acquired spectrum were selected as precursors for next-stage dissociation until MS5 spectra were generated.

2.2. glyBranch approach for glycan identification

glyBranch aims to identify glycan branching structures based on a collection of MSn (n > 1) spectra of the glycan of interest. The spectra were generated as follows: at each stage of the MSn experiment, the 5 most intense peaks were selected as precursor ions for next-stage MS scanning. This procedure was repeated up to the MS5 stage. It thus generated a total of 156 ($= 5^3 + 5^2 + 5^1 + 5^0$) MSn spectra for a glycan of interest, namely 1 MS2, 5 MS3, 25 MS4 and 125 MS5 spectra. We organize these MSn spectra into a spectra tree according to the product-ion relationship among them. That is, if an MS(i+1) spectrum is a product-ion spectrum of an MSi spectrum, we connect the two spectra with an edge. The basic operations of the glyBranch approach are shown in Figure 1 and described below.
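The following minimal Python sketch makes the shape of the spectra tree concrete. The names SpectrumNode, top_k_peaks, full_tree and count_spectra are hypothetical and do not come from the glyBranch source code. The sketch assumes that every spectrum below MS5 yields exactly five precursors. Real trees can be smaller, for example after low-quality spectra and their descendants are filtered out in step (1).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpectrumNode:
    """One MSn spectrum in the spectra tree (hypothetical data structure)."""
    stage: int            # n in MSn; the MS2 spectrum is the root
    precursor_mz: float   # m/z of the ion isolated to acquire this spectrum
    children: List["SpectrumNode"] = field(default_factory=list)


def top_k_peaks(mzs: List[float], intensities: List[float], k: int = 5) -> List[float]:
    """m/z values of the k most intense peaks, used as next-stage precursors."""
    ranked = sorted(zip(intensities, mzs), reverse=True)
    return [mz for _, mz in ranked[:k]]


def full_tree(stage: int = 2, max_stage: int = 5, fan_out: int = 5) -> SpectrumNode:
    """Idealized tree in which every spectrum below max_stage has fan_out children.

    Precursor m/z values are placeholders (0.0); only the shape matters here.
    """
    node = SpectrumNode(stage=stage, precursor_mz=0.0)
    if stage < max_stage:
        node.children = [full_tree(stage + 1, max_stage, fan_out) for _ in range(fan_out)]
    return node


def count_spectra(node: SpectrumNode) -> int:
    """Number of spectra in the subtree rooted at node (node included)."""
    return 1 + sum(count_spectra(child) for child in node.children)


print(count_spectra(full_tree()))  # 156
print(top_k_peaks([100.0, 200.0, 300.0], [0.2, 0.9, 0.5], k=2))  # [200.0, 300.0]
```

The count reproduces the arithmetic above: one root, then 5, 25 and 125 spectra at the three deeper stages.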

Figure 1: The workflow of glyBranch for glycan identification. glyBranch contains three steps: (i) removing noisy peaks and filtering out low-quality spectra; (ii) generating the theoretical spectra tree of each candidate glycan structure; (iii) scoring candidate glycan structures by comparing the theoretical and experimental spectra trees.

(1) Removing noisy peaks and filtering out low-quality spectra

MSn spectra usually contain a large number of noisy peaks. In this study, a peak is treated as noise if its relative intensity is below a threshold Th, which is set by estimating the noise level. In a spectrum, the peaks with m/z less than 146 (the m/z of a free fucose residue) or greater than MNa+ (the m/z of the precursor ion) are definitely noise. We collect these peaks and calculate the mean $\mu$ and standard deviation $\sigma$ of their intensities. According to the three-sigma rule, we set the noise level to $\mu + 3\sigma$. If no such noise peaks exist, we simply use 0.1 as the noise level. In summary, we set the threshold $Th = \max\{0.1, \mu + 3\sigma\}$, so 0.1 also acts as a lower bound. We then remove all peaks with relative intensity less than Th.
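The following Python sketch covers the two preprocessing rules of step (1): the noise threshold Th above, and the I3/Imed quality ratio with its cut-off of 2 that the next paragraph defines. It rests on assumptions the text does not state. Peaks are taken as (m/z, relative intensity) pairs normalized to the base peak, σ is taken as the population standard deviation, and all function names are hypothetical.

```python
import statistics
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, relative intensity); base peak = 1.0 (assumed)


def noise_threshold(peaks: List[Peak], precursor_mz: float,
                    low_mz: float = 146.0, floor: float = 0.1) -> float:
    """Th = max{floor, mu + 3*sigma} over the peaks that can only be noise."""
    noise = [i for mz, i in peaks if mz < low_mz or mz > precursor_mz]
    if not noise:
        return floor  # no peaks outside the valid m/z range: use 0.1
    mu = statistics.mean(noise)
    sigma = statistics.pstdev(noise)  # population SD (an assumption)
    return max(floor, mu + 3 * sigma)


def denoise(peaks: List[Peak], precursor_mz: float) -> List[Peak]:
    """Keep only the peaks whose relative intensity reaches the threshold."""
    th = noise_threshold(peaks, precursor_mz)
    return [(mz, i) for mz, i in peaks if i >= th]


def quality(peaks: List[Peak]) -> float:
    """I3 / Imed; assumes a non-empty spectrum with a positive median intensity."""
    intens = sorted((i for _, i in peaks), reverse=True)
    return statistics.mean(intens[:3]) / statistics.median(intens)


def is_low_quality(peaks: List[Peak], cutoff: float = 2.0) -> bool:
    return quality(peaks) < cutoff


spectrum = [(120.0, 0.02), (130.0, 0.04), (500.0, 1.0),
            (600.0, 0.3), (700.0, 0.05), (1700.0, 0.03)]
print(noise_threshold(spectrum, precursor_mz=1579.0))  # 0.1
print(denoise(spectrum, precursor_mz=1579.0))          # [(500.0, 1.0), (600.0, 0.3)]
```

In this example, μ + 3σ over the three out-of-range peaks is about 0.055, so the 0.1 floor decides the threshold.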

After denoising, we further filter out low-quality spectra. Here, a spectrum is treated as low-quality if it does not contain isotopic peaks or significant peaks. Specifically, for each spectrum we calculate its quality as $I_3 / I_{med}$. Here $I_3$ represents the average intensity of the top 3 peaks and $I_{med}$ represents the median intensity of all peaks in the spectrum. A spectrum with $I_3 / I_{med} < 2$ is treated as low-quality. Note that once a spectrum is filtered out, we also filter out all of its product-ion spectra, i.e., the spectra generated using a fragment ion in this spectrum as the precursor ion.

(2) Generating the theoretical spectra tree of a candidate glycan structure

From the primary mass spectrum of the glycan of interest, we first determine its molecular mass. We then extract the glycans with identical mass from GDB as candidate glycan structures, denoted as $G_1, G_2, \ldots, G_n$. Next, for each candidate glycan structure $G_i$, we generate a theoretical spectra tree with the same product-ion relationship as the experimental spectra tree, as follows:

(i) Initially, we generate the theoretical MS2 spectrum $\hat{S}_2$ of $G_i$ by simulating the fragmentation of $G_i$. Here we adopt the fragmentation simulation used in GIPS [19].

(ii) For each experimental MS3 spectrum $S_3$, we first identify the fragments $F_i^2$ of $G_i$ that have the same mass as the parent ion of $S_3$. Next, we generate the theoretical MS3 spectrum $\hat{S}_3$ of $G_i$ by simulating the fragmentation of the fragments in $F_i^2$.

(iii) Now consider each experimental MS4 spectrum $S_4$ that is a product-ion spectrum of $S_3$. We first identify the fragments $F_i^3$ of $G_i$ that have the same mass as the parent ion of $S_4$. The theoretical spectrum $\hat{S}_4$ must also be a product-ion spectrum of $\hat{S}_3$. To guarantee this, we require each fragment in $F_i^3$ to be a substructure of some fragment in $F_i^2$. Next, we generate the theoretical MS4 spectrum $\hat{S}_4$ of $G_i$ by simulating the fragmentation of the fragments in $F_i^3$.

(iv) The theoretical MS5 spectra are generated in a similar manner.

In this way, we construct a theoretical counterpart for each experimental MSn spectrum. The theoretical spectra share the same product-ion relationship as the experimental ones.
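The level-by-level construction in step (2) can be illustrated with a toy Python sketch. It is not the GIPS fragmentation simulation that glyBranch adopts, and it rests on several assumptions:

- a glycan is a tree of residue names;
- only single glycosidic cleavages are considered;
- masses are the monoisotopic masses of underivatized Hex, HexNAc and Fuc residues, ignoring permethylation and Na+ adducts;
- a fragment matches a precursor within an assumed 0.5 Da tolerance.

All names are hypothetical. Each level's candidates are drawn only from fragments of the previous level, which enforces the substructure requirement of step (iii).

```python
from typing import Iterator, List, Set, Tuple

# A glycan or fragment is a tree: (residue_name, tuple_of_child_subtrees).
Node = Tuple[str, tuple]

# Assumed monoisotopic masses of underivatized residues; the real data are
# permethylated, sodiated ions, which this toy model ignores.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579}


def canon(node: Node) -> Node:
    """Sort children recursively so identical topologies compare equal."""
    name, children = node
    return (name, tuple(sorted(canon(c) for c in children)))


def mass(node: Node) -> float:
    name, children = node
    return RESIDUE_MASS[name] + sum(mass(c) for c in children)


def cleave_once(node: Node) -> Iterator[Tuple[Node, Node]]:
    """Yield (reducing-end part, non-reducing-end part) for every single
    glycosidic bond; cross-ring cleavages are not modelled."""
    name, children = node
    for i, child in enumerate(children):
        # Break the bond between this residue and child i.
        yield (name, children[:i] + children[i + 1:]), child
        # Break a bond inside child i and reattach the remainder.
        for sub_rest, piece in cleave_once(child):
            yield (name, children[:i] + (sub_rest,) + children[i + 1:]), piece


def fragments(node: Node) -> Set[Node]:
    """All distinct products of one glycosidic cleavage of node."""
    out: Set[Node] = set()
    for rest, piece in cleave_once(node):
        out.add(canon(rest))
        out.add(canon(piece))
    return out


def matching_fragments(parents: Set[Node], precursor_mass: float,
                       tol: float = 0.5) -> Set[Node]:
    """F^k: fragments of the level-(k-1) fragments whose mass matches the
    precursor, so each one is a substructure of some parent fragment."""
    return {f for p in parents for f in fragments(p)
            if abs(mass(f) - precursor_mass) <= tol}


def theoretical_spectrum(frags: Set[Node]) -> List[float]:
    """Sorted masses of all single-cleavage products of the given fragments."""
    return sorted({round(mass(f), 2) for p in frags for f in fragments(p)})


# Toy glycan HexNAc(Hex(Hex), Fuc); its theoretical MS2 spectrum would be
# theoretical_spectrum({glycan}).
glycan: Node = ("HexNAc", (("Hex", (("Hex", ()),)), ("Fuc", ())))
f2 = matching_fragments({glycan}, precursor_mass=527.19)  # MS3 precursor
print(theoretical_spectrum(f2))  # [162.05, 203.08, 324.11, 365.13]
f3 = matching_fragments(f2, precursor_mass=324.11)         # MS4 precursor
print(theoretical_spectrum(f3))  # [162.05]
```

Because f3 is drawn only from fragments of f2, the fucose lost at the MS3 stage can never reappear in the theoretical MS4 spectrum.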

(3) Scoring candidate glycan structures by comparing theoretical and experimental spectra trees

For each experimental spectrum $S$ and its theoretical counterpart $\hat{S}$, we measure their similarity $\mathrm{Sim}(S, \hat{S})$ based on the peaks they share. Existing scoring functions either simply use the shared peak count (SPC) [30, 31], which is sensitive to noisy peaks, or use the logarithm of peak intensities, i.e.,

$$\mathrm{Sim}(S, \hat{S}) = \sum_{p \in C(S, \hat{S})} \ln I_p$$

where $C(S, \hat{S})$ represents the set of peaks shared by $S$ and $\hat{S}$, and $I_p$ represents the intensity of peak $p$. The logarithmic scoring function has proved successful in practice [32]. However, it is dominated by a small number of the most intense peaks. To overcome this shortcoming, we propose a weighted tanh scoring function:

$$\mathrm{Sim}(S, \hat{S}) = \sum_{p \in C(S, \hat{S})} \cos(D_p, \hat{D}_p)\, \tanh I_p$$

where $D_p$ and $\hat{D}_p$ represent the intensities of the isotopic derivatives of $p$ in $S$ and $\hat{S}$, respectively. This scoring scheme has two advantages. (i) With the tanh function, intense shared peaks lead to a high score, yet no single peak can dominate the score. (ii) The weight $\cos(D_p, \hat{D}_p)$ emphasizes peaks with isotopic derivatives, since such peaks are more reliable.

For a candidate glycan $G_i$, we generate the theoretical spectra tree $T_{G_i}$ according to the experimental spectra tree $T$. We then calculate the similarity between the two trees as follows:

$$\mathrm{Sim}(T, T_{G_i}) = \sum_{S \in T} \mathrm{Sim}(S, \hat{S}_{G_i})$$

where $\hat{S}_{G_i}$ denotes the counterpart of $S$ in $T_{G_i}$. The similarity $\mathrm{Sim}(T, T_{G_i})$ measures how likely it is that the experimental spectra were generated from $G_i$. We finally normalize these scores over all candidates into the range [0, 1] using SoftMax:

$$P(G_i) = \frac{\exp\left(\mathrm{Sim}(T, T_{G_i})\right)}{\sum_{j=1}^{n} \exp\left(\mathrm{Sim}(T, T_{G_j})\right)}$$
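The scoring of step (3) can be sketched in Python as follows. The sketch assumes that the shared peaks of each spectrum pair have already been matched, and that each comes with its intensity I_p and its experimental and theoretical isotope vectors. The function names are hypothetical. The text does not say how to weight a peak whose isotope vector is all zeros; setting the cosine weight to 0 in that case is a choice made here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One shared peak p: (I_p, D_p, D_hat_p), i.e. its intensity and the
# experimental and theoretical isotopic-derivative intensity vectors.
SharedPeak = Tuple[float, Sequence[float], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spectrum_score(shared: List[SharedPeak]) -> float:
    """Sim(S, S_hat) = sum over shared peaks of cos(D_p, D_hat_p) * tanh(I_p)."""
    return sum(cosine(d, d_hat) * math.tanh(i) for i, d, d_hat in shared)


def tree_score(tree: List[List[SharedPeak]]) -> float:
    """Sim(T, T_G): spectrum-level scores summed over the spectra tree."""
    return sum(spectrum_score(shared) for shared in tree)


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize candidate scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}


# Hypothetical candidates: one spectrum each, sharing one peak (I_p = 1.0);
# the isotope pattern agrees for G1 but not for G2.
candidates = {
    "G1": [[(1.0, [1.0, 0.5], [1.0, 0.5])]],
    "G2": [[(1.0, [1.0, 0.5], [0.5, 1.0])]],
}
probs = softmax({g: tree_score(t) for g, t in candidates.items()})
print(max(probs, key=probs.get))  # G1
```

Subtracting the maximum score before exponentiating leaves the SoftMax output unchanged. It also avoids overflow when tree scores grow large, which can happen because the tree score sums over up to 156 spectra.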

The candidate glycan with the highest probability is finally reported as the identified glycan.

3. Results and Discussion

3.1. Comparison of various scoring functions

To illustrate the advantages of $\tanh(I)$, we compared it with two popular scoring functions. As shown in Table 1, when no peak denoising is applied, glyBranch correctly identifies 29 of the 30 glycans using tanh-score, compared with 22 using SPC-score and 26 using ln-score. With a 1% denoising threshold, tanh-score still identifies 29 glycans, versus 26 for both SPC-score and ln-score. A reasonable explanation for the advantage of tanh-score is that SPC-score is very sensitive to noise, while ln-score is too sensitive to mismatched peaks. We examined this issue in more detail as follows.

Table 1: The number of correctly identified glycans using different scoring functions. Here, tanh-score, SPC-score and ln-score represent the scoring functions $\tanh(I)$, shared peak count, and $\ln(I)$, respectively.

Peak-denoising threshold | tanh-score | SPC-score | ln-score
0% | 29 | 22 | 26
1% | 29 | 26 | 26
2% | 26 | 29 | 26

Figure 2 (a) shows the actual structure Man-5 and one of its isomeric structures (denoted as the wrong structure). Figure 2 (c) shows the experimental MS3 spectrum produced from the m/z 1084 precursor, together with fragment annotations using these two glycan structures. The wrong structure explains 6 MS3 peaks, whereas the actual structure explains only 5. Thus, the wrong structure is assigned a higher SPC-score. However, the 6 peaks explained by the wrong structure have lower intensities than the 5 peaks explained by the actual glycan. tanh-score considers both the number of shared peaks and their intensities, and thus assigns a higher score to the actual glycan.
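The Man-5 argument can be reproduced with a toy calculation. The intensities below are hypothetical and only mimic the situation described: six weak peaks explained by the wrong structure and five strong ones explained by the actual structure. Intensities are assumed to be relative to the base peak, and the isotope weight is fixed at 1.

```python
import math

# Hypothetical relative intensities (base peak = 1.0) of the MS3 peaks each
# candidate explains; illustrative only, not the values in Figure 2 (c).
wrong_structure = [0.05] * 6    # six weak peaks
actual_structure = [0.5] * 5    # five strong peaks


def spc(intensities):
    """Shared peak count: ignores intensity."""
    return len(intensities)


def tanh_sum(intensities):
    """tanh-score with the isotope weight fixed at 1 for simplicity."""
    return sum(math.tanh(i) for i in intensities)


print(spc(wrong_structure), spc(actual_structure))  # 6 5
print(round(tanh_sum(wrong_structure), 2),
      round(tanh_sum(actual_structure), 2))         # 0.3 2.31
```

SPC prefers the wrong structure, while the intensity-aware tanh sum prefers the actual one. This is the qualitative behaviour described for Figure 2 (c).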

We further investigated the advantage of tanh-score over ln-score using Man-6 as an example. Figure 2 (b) shows the actual glycan Man-6 and one of its isomeric structures (denoted as the wrong structure). Figure 2 (d) shows the experimental MS4 spectrum produced from the m/z 927 precursor, with fragment annotations using these two glycans. The peak at m/z 677 (the isotopic peak of m/z 676) is the most intense one. Only the wrong structure can explain this peak, so the wrong structure is assigned a higher ln-score. In contrast, tanh-score gives each peak a score in the range [0, 1] and thus limits the effect of peak intensity in a reasonable manner.

Figure 2: Investigation of the advantage of tanh-score over SPC-score and ln-score using Man-5 and Man-6 as examples. (a) The actual glycan Man-5 and one of its isomeric glycan structures. (b) The actual glycan Man-6 and one of its isomeric glycan structures. (c) The experimental MS3 spectrum produced from the m/z 1084 precursor for Man-5, with fragment annotations using both Man-5 and the isomeric glycan. (d) The experimental MS4 spectrum produced from the m/z 927 precursor for Man-6, with fragment annotations using both Man-6 and the isomeric glycan.

3.2. Identification of glycan standards including isomeric glycan pairs

We further evaluated glyBranch on 14 standards comprising N-glycans and HMOs. As shown in Table 2, glyBranch correctly identified all 14 glycans and ranked the actual glycan first in every case. The assigned probabilities were 0.95 or higher for all glycans except Man-6 (0.66). This result demonstrates the accuracy of glyBranch on glycan standards.

We also investigated the ability of glyBranch to distinguish isomeric glycan pairs. The 14 glycans used here contain two pairs of isomers: LNDFH-I and LNDFH-II, and MFLNH-I and MFLNH-III. The isomers in each pair differ only in the monosaccharide to which fucose is linked. In MFLNH-I and LNDFH-I, fucose is linked to Gal, whereas in MFLNH-III and LNDFH-II it is linked to GlcNAc and Glc, respectively. Here we use MFLNH-I and MFLNH-III as an example to describe the identification process of glyBranch. Based on the primary mass spectrum with MNa+ at m/z 1549, 16 isomeric candidate structures, including MFLNH-I and MFLNH-III, were extracted from GDB. For the MFLNH-I sample, the experimental MS2 spectrum provided enough structurally informative peaks to distinguish it from MFLNH-III. As shown in Figure 3 (b), MFLNH-I annotated 4 more MS2 peaks than MFLNH-III did. Furthermore, the MS3, MS4 and MS5 spectra contained a total of 14 peaks distinctive of MFLNH-I. Based on this information, glyBranch assigned a higher probability to MFLNH-I (1.00) than to MFLNH-III (0.00) and thus correctly reported MFLNH-I as the actual glycan.

For the MFLNH-III sample, the MS2 spectrum provided no information for distinguishing the two isomers, since both annotate exactly the same peaks (Figure 3 (c)). However, the subsequent MS3, MS4 and MS5 spectra revealed 6 peaks distinctive of MFLNH-III but only 2 distinctive of MFLNH-I. Thus, glyBranch correctly identified MFLNH-III as the actual glycan.

Figure 3: Distinguishing isomeric glycan pairs using glyBranch. (a) The isomeric pair MFLNH-I and MFLNH-III. (b) The MS2 spectrum produced from the precursor ion at m/z 1549 for the MFLNH-I sample. (c), (d) and (e) The MS2, MS3 and MS4 spectra acquired from the MFLNH-III sample, respectively.

Table 2: The identification results of glyBranch on 14 glycan standards. Here “Noc” is the abbreviation for “Number of candidates”.

Glycan | Structure | MNa+ | Noc | Probability | Rank
A2 | [image] | 2792 | 3 | 1.00 | 1
LNDFH-I | [image] | 1274 | 8 | 1.00 | 1
LNDFH-II | [image] | 1274 | 8 | 0.99 | 1
MFLNH-I | [image] | 1549 | 16 | 1.00 | 1
MFLNH-III | [image] | 1549 | 16 | 0.99 | 1
Man-5D1 | [image] | 1579 | 16 | 0.95 | 1
Man-6 | [image] | 1783 | 15 | 0.66 | 1
Man-7D1 | [image] | 1987 | 11 | 0.99 | 1
NA2 | [image] | 2070 | 10 | 1.00 | 1
NA3 | [image] | 2519 | 8 | 1.00 | 1
NA4 | [image] | 2968 | 14 | 1.00 | 1
NGA2 | [image] | 1661 | 16 | 1.00 | 1
NGA3 | [image] | 1906 | 19 | 1.00 | 1
NGA4 | [image] | 2152 | 14 | 1.00 | 1

Table 3: The identification results of glyBranch for the samples isolated from RNase B and IgG. Here “Noc” is the abbreviation for “Number of candidates”.

Glycoprotein | Structure | MNa+ | Noc | Probability | Rank
RNase B | [image] | 1579 | 15 | 0.73 | 1
RNase B | [image] | 1783 | 14 | 0.87 | 1
RNase B | [image] | 1987 | 11 | 0.05 | 2
RNase B | [image] | 2192 | 11 | 0.99 | 1
RNase B | [image] | 2396 | 15 | 0.84 | 1
IgG | [image] | 1835 | 10 | 0.88 | 1
IgG | [image] | 2040 | 9 | 0.98 | 1
IgG | [image] | 2070 | 10 | 0.55 | 1
IgG | [image] | 2081 | 7 | 0.98 | 1
IgG | [image] | 2244 | 14 | 0.99 | 1
IgG | [image] | 2285 | 10 | 0.84 | 1
IgG | [image] | 2401 | 6 | 0.88 | 1
IgG | [image] | 2605 | 5 | 0.95 | 1
IgG | [image] | 2850 | 3 | 1.00 | 1
IgG | [image] | 2966 | 13 | 0.99 | 1
IgG | [image] | 3211 | 8 | 0.88 | 1

3.3. Identification of glycans released from RNase B and human serum IgG

Next, we applied glyBranch to profile the N-glycans isolated from RNase B and human serum IgG. The primary MS spectrum of RNase B showed five major peaks, at m/z 1579, 1783, 1987, 2192 and 2396, and MSn scanning was performed on all five. Using the present approach, four of the five peaks (all except m/z 1987) were successfully identified as Man-5D1, Man-6, Man-8D1D3 and Man-9, with probabilities of 0.73, 0.87, 0.99 and 0.84, respectively (Table 3). We further investigated why glyBranch failed at m/z 1987 (Man-7). CE analysis revealed that Man-5, Man-6 and Man-9 each contain a single component. Man-7, however, contains multiple components: Man-7D1, Man-7D2 and Man-7D3, with approximate concentrations of 26%, 26% and 48%, respectively. Man-8 also contains multiple components, among which Man-8D1D3 is the dominant one, with a purity of 85%. This result suggests that glyBranch has the potential to identify mixed samples that have a dominant component.

The glycans isolated from IgG are more complex than those isolated from RNase B. In this study, IgG was extracted and purified from 1 L of human serum, and SDS-PAGE was then performed to evaluate its purity. High-purity IgG was prepared for subsequent research. In the primary spectrum of this complex sample, we selected the 11 most abundant N-glycans for MSn fragmentation, at m/z 1835, 2040, 2070, 2081, 2244, 2285, 2401, 2605, 2851, 2966 and 3211. As shown in Table 3, all 11 complex-type N-glycans were successfully identified (ranked first), with probabilities of 0.88, 0.98, 0.54, 0.98, 0.99, 0.84, 0.88, 0.95, 1.00, 0.99 and 0.88, respectively. Among these identified glycans, seven were corroborated by available N-glycan standards and the other four by literature data [33, 34, 35, 36].

4. Conclusions

In this study, we present an approach that can effectively identify glycan structures using MSn spectra. We demonstrated its successful application to 14 glycan standards, including two isomeric pairs, and to complex samples isolated from RNase B and human IgG. In the current study, we used MALDI-MSn to produce the mass spectra. For other ionization modes and instruments, glyBranch can be extended, with appropriate modifications, to handle multiply charged ions in ESI-MSn spectra. It could then automatically identify both multiply charged molecular ions and fragment ions, which appear at fractional m/z values in high-resolution mass spectra. Adequate resolution is important for precursor selection if N-glycans are to be assigned unambiguously on an ion trap instrument. The ion-trap isolation window at FWHM 500, with a width of 3-5 mass units, is wide enough to capture the entire isotope envelope and therefore improves sensitivity. In the present proof-of-concept work, only glycosidic cleavages are considered and used to assign branching patterns, and the fragment ions they produce contain little linkage information. Cross-ring cleavages have been used for partial linkage determination in previous reports [37, 38]. Incorporating cross-ring fragmentation into future versions of glyBranch will help assign particular monosaccharides to specific N-glycan antennae and will improve the specificity of the method.

Declaration of Interests

The authors declare no competing financial interests.

Author contributions

SS and CH conceived the study. SS, JZ and HW designed the approach concept and computational model, and CH, JZ and KZ designed the mass spectrometry methodology. CH established the MALDI-MSn and glycan structural analysis procedure, and performed and analyzed the mass spectral data. JZ and HW implemented the approach. MH developed the web server for this approach. JD, WP, YW and DB provided constructive comments and advice on the approach concept. CH, JZ and KZ carried out glycan preparation and participated in MALDI-MSn data acquisition. HW, JZ, CH and SS wrote the manuscript. All authors discussed the results and commented on the manuscript.

Acknowledgements

The authors thank the National Key Research and Development Program of China (2018YFC0910405), the National Natural Science Foundation of China (31671369, 31600650, 31770775) and the International Partnership Program of Chinese Academy of Sciences (No. 153311KYSB20150012) for supporting their work.

Data availability

All data used in this work can be found at https://github.com/bigict/glyBranch/

References [1]

R. Raman, S. Raguram, G. Venkataraman, J. C. Paulson, R. Sasisekharan, Glycomics: an

of

integrated systems approach to structure-function relationships of glycans, Nat Methods 2

[2]

ro

(11) (2005) 817–24.

J. C. Paulson, O. Blixt, B. E. Collins, Sweet spots in functional glycomics, Nat Chem Biol

J. D. Marth, P. K. Grewal, Mammalian glycosylation in immunity, Nat Rev Immunol 8

re

[3]

-p

2 (5) (2006) 238–48.

(11) (2008) 874–87.

F. Ju, J. Zhang, D. Bu, Y. Li, J. Zhou, H. Wang, Y. Wang, C. Huang, S. Sun, De novo

lP

[4]

glycan structural identification from mass spectra using tree merging strategy,

[5]

na

Computational biology and chemistry 80 (2019) 217–224. D. H. Dube, C. R. Bertozzi, Glycans in cancer and inflammation–potential for therapeutics

[6]

ur

and diagnostics, Nat Rev Drug Discov 4 (6) (2005) 477–88. L. R. Ruhaak, S. Miyamoto, C. B. Lebrilla, Developments in the identification of glycan

[7]

Jo

biomarkers for the detection of cancer, Mol Cell Proteomics 12 (4) (2013) 846–55. S. S. Pinho, C. A. Reis, Glycosylation in cancer: mechanisms and clinical implications, Nat Rev Cancer 15 (9) (2015) 540–55. [8]

G. W. Hart, R. J. Copeland, Glycomics hits the big time, Cell 143 (5) (2010) 672–6.

[9]

D. F. Smith, R. D. Cummings, Application of microarrays for deciphering the structure and function of the human glycome, Mol Cell Proteomics 12 (4) (2013) 902–12.

[10]

S. A. Springer, P. Gagneux, Glycomics: revealing the dynamic ecology and evolution of sugar molecules, J Proteomics 135 (2016) 90–100.

[11]

A. Varki, Biological roles of glycans, Glycobiology 27 (1) (2017) 3–49.

[12]

A. V. Everest-Dass, M. T. Briggs, G. Kaur, M. K. Oehler, P. Hoffmann, N. H. Packer, N-glycan MALDI imaging mass spectrometry on Formalin-Fixed Paraffin-Embedded

Journal Pre-proof tissue enables the delineation of ovarian cancer tissues, Mol Cell Proteomics 15 (9) (2016) 3003–16. [13]

Y. Kizuka, N. Taniguchi, Enzymes for N-glycan branching and their genetic and nongenetic regulation in cancer, Biomolecules 6 (2).

[14]

M. Anugraham, F. Jacob, S. Nixdorf, A. V. Everest-Dass, V. Heinzelmann-Schwarz, N. H. Packer, Specific glycosylation of membrane proteins in epithelial ovarian cancer cell lines: glycan structures reflect gene expression and DNA methylation status, Mol Cell Proteomics 13 (9) (2014) 2213–32. P. Pompach, Z. Brnakova, M. Sanda, J. Wu, N. Edwards, R. Goldman, Site-specific

of

[15]

ro

glycoforms of haptoglobin in liver cirrhosis and hepatocellular carcinoma, Mol Cell Proteomics 12 (5) (2013) 1281–93.

J. M. Pekelharing, E. Hepp, J. P. Kamerling, G. J. Gerwig, B. Leijnse, Alterations in

-p

[16]

carbohydrate composition of serum igg from patients with rheumatoid arthritis and from

C. Huhn, M. H. Selman, L. R. Ruhaak, A. M. Deelder, M. Wuhrer, IgG glycosylation

lP

[17]

re

pregnant women, Ann Rheum Dis 47 (2) (1988) 91–5.

analysis, Proteomics 9 (4) (2009) 882–913. [18]

J. N. Arnold, M. R. Wormald, R. B. Sim, P. M. Rudd, R. A. Dwek, The impact of

na

glycosylation on the biological function and structure of human immunoglobulins, Annu Rev Immunol 25 (2007) 21–50.

S. Sun, C. Huang, Y. Wang, Y. Liu, J. Zhang, J. Zhou, F. Gao, F. Yang, R. Chen, B.

ur

[19]

Jo

Mulloy, W. Chai, Y. Li, D. Bu, Toward automated identification of glycan branching patterns using multistage mass spectrometry with intelligent precursor selection, Analytical chemistry 90 (24) (2018) 14412–14422. [20]

Y. Wang, D. Bu, C. Huang, H. Wang, J. Zhou, J. Dong, W. Pan, J. Zhang, Q. Zhang, Y. Li, S. Sun, Best- first search guided multistage mass spectrometry-based glycan identification, Bioinformatics.

[21]

L. Zhang, S. Luo, B. Zhang, Glycan analysis of therapeutic glycoproteins, MAbs 8 (2) (2016) 205–15.

[22]

C. H. Smit, A. van Diepen, D. L. Nguyen, M. Wuhrer, K. F. Hoffmann, A. M. Deelder, C. H. Hokke, Glycomic analysis of life stages of the human parasite Schistosoma mansoni reveals developmental expression profiles of functional and antigenic glycan motifs, Mol Cell Proteomics 14 (7) (2015) 1750–69.

[23]

V. Reinhold, H. Zhang, A. Hanneman, D. Ashline, Toward a platform for comprehensive glycan sequencing, Mol Cell Proteomics 12 (4) (2013) 866–73.

[24]

P. M. Rudd, G. R. Guile, B. Kuster, D. J. Harvey, G. Opdenakker, R. A. Dwek, Oligosaccharide sequencing technology, Nature 388 (6638) (1997) 205–7.

[25]

T. W. Rademacher, P. Williams, R. A. Dwek, Agalactosyl glycoforms of IgG autoantibodies are pathogenic, Proc Natl Acad Sci U S A 91 (13) (1994) 6123–7.

[26]

R. Malhotra, M. R. Wormald, P. M. Rudd, P. B. Fischer, R. A. Dwek, R. B. Sim, Glycosylation changes of IgG associated with rheumatoid arthritis can activate complement via the mannose-binding protein, Nat Med 1 (3) (1995) 237–43.

[27]

A. Youings, S. C. Chang, R. A. Dwek, I. G. Scragg, Site-specific glycosylation of human immunoglobulin G is altered in four rheumatoid arthritis patients, Biochem J 314 (Pt 2) (1996) 621–30.

[28]

R. A. Dwek, A. C. Lellouch, M. R. Wormald, Glycobiology: ‘the function of sugar in the IgG molecule’, J Anat 187 (Pt 2) (1995) 279–92.

[29]

S. Doubet, P. Albersheim, Letter to the glyco-forum: CarbBank, Glycobiology 2 (6) (1992) 505.

[30]

S. P. Gaucher, J. Morrow, J. A. Leary, STAT: a saccharide topology analysis tool used in combination with tandem mass spectrometry, Anal Chem 72 (11) (2000) 2331–6.

[31]

H. Tang, Y. Mechref, M. V. Novotny, Automated interpretation of MS/MS spectra of oligosaccharides, Bioinformatics 21 Suppl 1 (2005) i431–9.

[32]

L. He, L. Xin, B. Shan, G. A. Lajoie, B. Ma, GlycoMaster DB: software to assist the automated identification of N-linked glycopeptides by tandem mass spectrometry, Journal of Proteome Research 13 (9) (2014) 3881–3895.

[33]

A. Bondt, Y. Rombouts, M. H. Selman, P. J. Hensbergen, K. R. Reiding, J. M. Hazes, R. J. Dolhain, M. Wuhrer, Immunoglobulin G (IgG) Fab glycosylation analysis using a new mass spectrometric high-throughput profiling method reveals pregnancy-associated changes, Molecular & Cellular Proteomics 13 (11) (2014) 3029–3039.

[34]

A. E. Mahan, J. Tedesco, K. Dionne, K. Baruah, H. D. Cheng, P. L. De Jager, D. H. Barouch, T. Suscovich, M. Ackerman, M. Crispin, G. Alter, A method for high-throughput, sensitive analysis of IgG Fc and Fab glycosylation by capillary electrophoresis, Journal of Immunological Methods 417 (2015) 34–44.

[35]

M. Šimurina, N. de Haan, F. Vučković, N. A. Kennedy, J. Štambuk, D. Falck, I. Trbojević-Akmačić, F. Clerc, G. Razdorov, A. Khon, et al., Glycosylation of immunoglobulin G associates with clinical features of inflammatory bowel diseases, Gastroenterology 154 (5) (2018) 1320–1333.

[36]

F. Vučković, E. Theodoratou, K. Thaçi, M. Timofeeva, A. Vojta, J. Štambuk, M. Pučić-Baković, P. M. Rudd, L. Derek, D. Servis, et al., IgG glycome in colorectal cancer, Clinical Cancer Research 22 (12) (2016) 3078–3086.

[37]

A. S. Palma, Y. Liu, H. Zhang, Y. Zhang, B. V. McCleary, G. Yu, Q. Huang, L. S. Guidolin, A. E. Ciocchini, A. Torosantucci, et al., Unravelling glucan recognition systems by glycome microarrays using the designer approach and mass spectrometry, Molecular & Cellular Proteomics 14 (4) (2015) 974–988.

[38]

D. J. Harvey, L. Royle, C. M. Radcliffe, P. M. Rudd, R. A. Dwek, Structural and quantitative analysis of N-linked glycans by matrix-assisted laser desorption ionization and negative ion nanospray mass spectrometry, Analytical Biochemistry 376 (1) (2008) 44–60.


Highlights  A spectra tree of MS n was assembled for identification of glycan branching patterns.  A strategy named glyBranch was proposed to interpret MS n spectra in the spectra tree.

of

 The tanh function was introduced and proved to be more suitable for peak scoring.

-p

ro

 Isomeric structures with different branching patterns were distinguished using glyBranch.

re

 Complicated branching patterns of glycans were identified using glyBranch.

Jo

ur

na

lP

Graphical abstract

Figure 1

Figure 2

Figure 3