Data in Brief 6 (2016) 68–76
Contents lists available at ScienceDirect
Data in Brief journal homepage: www.elsevier.com/locate/dib
Data article
Glycosylation characterization of therapeutic mAbs by top- and middle-down mass spectrometry Bao Quoc Tran a, Christopher Barton b, Jinhua Feng b, Aimee Sandjong b, Sung Hwan Yoon a, Shivangi Awasthi a, Tao Liang a, Mohd M. Khan a, David P.A. Kilgour a, David R. Goodlett a,n, Young Ah Goo a,n a b
Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, MD, USA MedImmune, LLC, Gaithersburg, MD, USA
a r t i c l e i n f o
abstract
Article history: Received 20 October 2015 Received in revised form 4 November 2015 Accepted 10 November 2015 Available online 24 November 2015
A reference monoclonal antibody IgG1 and a fusion IgG protein were analyzed by top- and middle-down mass spectrometry with multiple fragmentation techniques including electron transfer dissociation (ETD) and matrix-assisted laser desorption ionization in-source decay (MALDI-ISD) to investigate heterogeneity of glycosylated protein species. Specifically, glycan structure, sites, relative abundance levels, and termini structural conformation were investigated by use of Fourier transform ion cyclotron resonance (FT-ICR) or high performance liquid chromatography electrospray ionization (HPLC-ESI) linked to an Orbitrap. Incorporating a limited enzymatic digestion by immunoglobulin G-degrading enzyme Streptococcus pyogenes (IdeS) with MALDI-ISD analysis extended sequence coverage of the internal region of the proteins without pre-fractionation. The data in this article is associated with the research article published in Journal of Proteomics (Tran et al., 2015) [1]. & 2015 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
n
DOI of original article: http://dx.doi.org/10.1016/j.jprot.2015.10.021 Corresponding authors. E-mail addresses:
[email protected] (D.R. Goodlett),
[email protected] (Y.A. Goo).
http://dx.doi.org/10.1016/j.dib.2015.11.031 2352-3409/& 2015 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
69
Specifications Table Subject area
Biology, Chemistry
More specific subject area Type of data How data was acquired Data format Experimental factors
Top- and middle-down mass spectrometric analysis of protein species and proteoforms Table, Figure Mass spectrometry data were collected on a 12TFT-ICR XR (Bruker) and a LTQ-Orbitrap Elite (Thermo Scientific) MS spectra Top- and middle-down analyses with multiple fragmentation techniques including electron transfer dissociation (ETD) and matrix-assisted laser desorption ionization in-source decay (MALDI-ISD) for characterization of a reference monoclonal antibody IgG1 and a fusion IgG protein Glycan profiling including structure, sites, relative abundance levels, and termini structural conformation were investigated Baltimore and Gaithersburg, Maryland, USA
Experimental features Data source location Data accessibility
Data is provided within this article
Value of the data
Application of top-down and middle-down mass spectrometry for characterization of comprehensive glycosylated protein species from recombinant IgG and IgG-fusion proteins.
Detection of glycan structure, sites, and relative abundance levels. Investigation of termini structural conformation; N-terminal glutamine (Q) to pyroglutamate (pyroGlu, aka. pE) conversion and C-terminal lysine truncation.
Improvement on protein sequencing by MALDI ISD, ETD in combination with middle-down analysis by use of an enzyme, IdeS.
1. Data Glycosylation profiling of recombinant IgG and IgG-fusion proteins and sequence coverage information.
2. Experimental design, materials and methods The recombinant human mAb (IgG1) and IgG1-fusion protein were manufactured using standard cell culture, purification and formulation processes [2]. The IgG1 has two identical light chains and two identical heavy chains with molecular weight of 150 kDa and N-glycosylation site at N297. The IgG fusion protein, approximately 90 kDa, was comprised of two chains each with a human protein domain fused onto the N-terminus of a human IgG1 constant domain (Fc). The protein has three Nglycosylation sites at N76, N108 and N207.
3. Analysis of the IgG protein The IgG protein was analyzed 1) in reduced form by MALDI-ISD in a 12 T Fourier Transform Ion Cyclotron Resonance (FT-ICR XR) mass spectrometer (Bruker, Bremen, Germany), and 2) after IdeS digestion and dithiothreitol (DTT) reduction to generate three smaller fragments; light chain, Fab, and Fc/2 after which analyzed by LC–MS/MS on an Orbitrap Elite (Thermo Scientific, Saint Jose, CA, USA).
70
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
Fig. 1. MALDI-ISD mass spectrum of the DTT-reduced IgG1 mAb. Amino acid residues underlined were identified after phase correction by c or z þ 2 ions with a mass error threshold of 50 ppm, covering (A) 37.6% of the light chain and (B) 14.3% of the heavy chain.
LC LC
LC
LC
Fd
Fd
Fd
Fd
Fig. 2. LC-MS/MS mass spectrum of IdeS/DDT treated IgG mAb. The charge state distribution of light chain (LC) and Fd domain are shown. Inset shows the full mass spectrum collected after accumulation of 20 scans over an m/z range 600–3500 at a resolving power of 120,000.
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
71
11062014_011-qb.raw
20000
100 80
Intensity
15000
60 10000 40 5000
20
0 1442.0570
1942.0570
2442.0570
2942.0570
0 3442.0570
m/z
1
D I Q M T Q S P S T L S A S V G D R V T I T C S A S S R V G
184
31
Y M H W Y Q Q K P G K A P K L L I Y D T S K L A S G V P S R
154
61
F S G S G S G T E F T L T I S S L Q P D D F A T Y Y C F Q G
124
91
S G Y P F T F G G G T K V E I K R T V A A P S V F I F P P S
94
121 D
E Q L K S G T A S V V C L L N N F Y P R E A K V Q W K V D
64
151 N
A L Q S G N S Q E S V T E Q D S K D S T Y S L S S T L T L
34
181
S K A D Y E K H K V Y A C E V T H Q G L S S P V T K S F N R
211 G
4 1
E C [1-213] Mono : 23113.3039346 Average : 23127.83128 Total# c :19 z :15
11062014_015-qb.raw
100
35000 30000
80
Intensity
25000 20000
60
15000
40
10000 20
5000 0 1006.6417
0 1506.6417
2006.6417
2506.6417
3006.6417
m/z
1
Q V T L R E S G P A L V K P T Q T L T L T C T F S G F S L S
210
31
T A G M S V G W I R Q P P G K A L E W L A D I WWD D K K H
180
61
Y N P S L K D R L T I S K D T S K N Q V V L K V T N M D P A
150
91
D T A T Y Y C A R D M I F N F Y F D V WG Q G T T V T V S S
120
121
A S T K G P S V F P L A P S S K S T S G G T A A L G C L V K
90
151 D
Y F P E P V T V S WN S G A L T S G V H T F P A V L Q S S
60
181 G
L Y S L S S V V T V P S S S L G T Q T Y I C N V N H K P S
30
211 N
T K V D K R V E P K S C D K T H T C P P C P A P E L L G
1
[1-239] Mono : 25672.8062256 Average : 25689.230331 Total# c :33 z :17 Q1: -17.026549da
Fig. 3. (A) ETD mass spectrum of precursor m/z 1446 (charge 16þ ) was searched against IgG1 mAb light chain sequence with mass tolerance of 100 ppm. ETD mass spectrum was produced from accumulation of 49 scans. Identified c- and z- fragment ion series are indicated in diamonds in the spectrum and relevant sequence identification is shown in the sequence. (B) ETD mass spectrum of precursor m/z 1512 (charge 17 þ) was searched against Fd sequence of IgG1 mAb with mass tolerance of 100 ppm. ETD mass spectrum was produced from accumulation of 50 scans. Identified c- and z- fragment ion series are indicated in diamonds in the spectrum and relevant sequence identification is shown in the sequence. N-terminal glutamine (Q) to pyroglutamate (pyroGlu) conversion was confirmed with precursor mass error 3.9 ppm (0.10 Da).
72
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
Intact protein mass measurement for detection of protein species of the IgG was performed using ESI FT-ICR MS as demonstrated in the related study [1]. 3.1. MALDI-ISD analysis of reduced protein The IgG protein in 1 mg/mL in 50 mM ammonium bicarbonate buffer was reduced to light and heavy chain by 20 mM dithiothreitol (DTT, product 43815, Sigma-Aldrich, St. Louis, MO, USA) at 80 °C for 15 min. The reduced protein sample was buffer exchanged to 5% acetonitrile (ACN) /0.1% formic acid (FA) using a 10 kDa molecular weight cut off (MWCO) centrifugal filter (product UFC201024, Merck Millipore, Tullagreen, Carrigtwohill, Ireland) and then diluted to 17 mg/mL in 50% ACN/0.1% FA. 1 mL sample and 1 mL of 1,5-diaminonaphthalene (1,5-DAN, product 56451, Fluka, St. Louis, MO, USA) saturated in ACN were spotted on a stainless steel MALDI plate. MALDI-ISD experiment was performed with a transient domain of 1 mega-word data points (estimated resolving power of 270,000 at m/z 400) covering m/z range 400–10,000. MALDI-ISD mass spectra after phase correction [3] were processed to identify c- and zþ 2 ions from light chain and heavy chain with mass tolerance 50 ppm using DataAnalysis 4.2 and Biotools software 3.2 (Bruker) (Fig. 1). 3.2. LC-MS/MS analysis of IdeS-digested protein The IgG protein was digested with IdeS (product A0-FR1-020, Genovis AB, Lund, Sweden) at a ratio of 1 unit of enzyme per 1 mg protein for 2 h at 37 °C, followed by DTT treatment. Approximately 14 mg of the IdeS/DTT treated sample was separated on a 100 mm 20 mm column packed with Jupiter 5 mm C5 particles (product 04A-4052, Phenomenex, Torrance, CA, USA) by a fast mobile gradient of 5–50% ACN/0.1% FA in 15 min at a flow rate of 0.3 mL/min. Light chain and Fd domain were detected in a range of 350–3500 m/z at resolving power of 120,000 in positive ion mode on the Orbitrap Elite (Fig. 2). Precursor masses were selected for ETD fragmentation. ETD spectra were acquired with 5 microscans at resolving power of 120,000 and processed by MASH Suite version 1.0 software [4]. The fragment ions and corresponding sequence coverage are shown in Fig. 3A for the light chain and Fig. 3B for the Fd domain. Glutamine (Q) to pyroglutamate (pyroGlu) conversion was confirmed on the N-terminal of heavy chain (Fig. 3B).
Fig. 4. Fourier transform ion cyclotron resonance (FT-ICR) mass spectrum of the intact fusion protein peaks (charge 38 þ ). The mass spectrum was produced from accumulation of 1650 scans collected in a time domain of 128 K words data points. Major peaks demonstrate a heterogeneous combination of glycosylation, predominantly by sialic acids (NANA). Additional information regarding the glycan nomenclature and structure are provided in Fig. 5.
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
73
4. Analysis of the IgG fusion protein The fusion protein was analyzed 1) by accurate mass measured in FT-ICR for protein species detection 2) by MALDI-ISD following DTT reduction in FT-ICR, and 3) by accurate mass measurement following IdeS cleavage in FT-ICR or ETD MS/MS in the Orbitrap Elite. 4.1. Accurate mass measurement of intact protein The fusion protein sample was buffer exchanged to 5%ACN and 0.1%TFA using Amicon 50 kDa MWCO centrifugal filters (product UFC505024, Merck Millipore). Since the fusion protein sample was stored in buffer with detergent, detergent removal was performed by Pierce detergent removal kit (product 87778, Thermo Scientific, Rockford, CA, USA). The detergent-free protein sample was diluted to 10 mg/mL final concentration in 50% ACN/0.1% FA and introduced to electrospray ionization (ESI)-FT-ICR for intact protein mass measurement for detection of protein species. The experiment was carried out in positive ion mode on the 12 T FT-ICR at flow rate 2 mL/min. Mass spectra were collected in magnitude mode with a transient domain of 128,000 word data points to cover m/z range 400–4000, with an accumulation of 1650 scans. Fig. 4 shows the protein species detected in m/z region 2365–2450 with high heterogeneity of glycosylation. Corresponding glycan nomenclature and structures are shown in Fig. 5. 4.2. MALDI-ISD analysis of reduced protein The protein sample was diluted to 1 mg/mL in in 50 mM ammonium bicarbonate buffer and reduced with 20 mM DTT at 80 °C for 15 min. The reduced sample was then detergent removed. The
Fig. 5. The glycan nomenclature and structure.
74
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
Fig. 6. MALDI-ISD mass spectrum of the DTT reduced fusion protein. Underlined sequence indicates the amino acid residues identified by phase-corrected c and/or zþ 2 ions, which showed an overall sequence coverage of 25.3% of the full sequence.
Table 1 Glycosylated protein species detected by Fourier transform ion cyclotron resonance (FT-ICR) accurate mass measurement of DTT/IdeS treated fusion protein from the 17 þ charge state. Highlighted in bold are the most abundance glycoforms detected on the C-terminal fragment of the protein. Additional information regarding the glycan nomenclature and structure are provided in the Fig. 6. Potential glycosylated Observed m/z Charge (z) Observed monoisotopic Theoretical monoisotopic Mass error protein species mass [Mþ H] þ mass [Mþ H] þ between observed and theoretical
G0 G0f G1f G2f G1f þ NANA G2f þ NANA G2f þ NANA2
1479.64124 1488.28997 1497.76750 1507.29477 1514.88175 1524.41738 1541.54188
17þ 17þ 17þ 17þ 17þ 17þ 17þ
25137.77353 25284.83767 25445.88614 25607.90766 25736.94787 25899.02107 26190.11349
25139.397044 25282.454953 25444.507776 25606.560599 25735.063193 25897.656016 26188.751433
(Da)
(ppm)
1.62 2.38 1.38 1.35 1.88 1.37 1.36
64.58 94.24 54.17 52.61 73.23 52.71 52.01
detergent-free protein was diluted in 50% ACN/0.1% FA at a final concentration of 7 mg/mL. 1 mL of sample and 1 mL of DAN matrix were spotted on stainless steel MALDI plate. MALDI-ISD mass spectra were recorded for m/z range 400–10,000 with a transient domain of 1 mega-word data points and estimated resolving power of 270,000 (at m/z 400). MALDI-ISD spectra were phased corrected and processed using DataAnalysis and Biotools software. Fragment ions were identified with mass tolerance 50 ppm and sequence coverage of the fusion protein is shown in Fig. 6.
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
75
Fig. 7. LC-MS mass spectrum of IdeS/DDT treated fusion protein. Charge state distribution of glycosylated protein species from of C-terminal fragment is shown. The mass spectrum was produced from accumulation of 20 scans over an m/z range 600–3500 at a resolving power of 120,000.
4.3. ETD MS/MS analysis of IdeS-digested protein The fusion protein sample was diluted to 1 mg/mL in 50 mM ammonium bicarbonate solution and digested with enzyme IdeS at ratio of 1 enzyme unit to 1 mg protein for 2 h at 37 °C. 20 mM DDT was added to cleave disulfide bonds at 80 °C for 15 min. The sample was further concentrated using 10 kDa MWCO filter and subject to detergent removal. The sample buffer was exchanged to 5% ACN/ 0.1% FA and diluted to final concentration of approximately 7 mg/mL in 50%ACN/0.1% FA. IdeS digestion and DTT reduction cut the fusion protein to half the size producing N-terminal and C-terminal fragments. Accurate mass measurement of the IdeS/DTT treated sample was performed by ESI FT-ICR. Mass spectra were collected in magnitude mode with a transient domain of one mega-word data points to cover m/z range 400–4000. Seven glycoforms were identified from the C-terminus at multiple charge states. Table 1 represents glycosylated protein species detected from 17þ charge peaks and their glycan nomenclature and structures are shown in Fig. 5. The sample was also directly infused into the Orbitrap Elite. Data were collected in positive ion mode for 350–3500 m/z range at resolving power of 120,000 (Fig. 7). Precursor masses of the glycoforms were isolated for ETD fragmentation. ETD mass spectra were acquired with 5 microscans at resolving power of 120,000 and processed by MASH Suite for sequence coverage analysis. Manual interpretation of the ETD spectra confirmed that glycosylation by G0f or G1f occurred on a site N207 and the protein has a C-terminal lysine truncation (Fig. 8).
76
B.Q. Tran et al. / Data in Brief 6 (2016) 68–76
G0f
G0f
G1f
Fig. 8. LC-MS/MS ETD mass spectra of (A) precursor m/z 1266, and (B and C) m/z 1055 were searched against C-terminal fragment of DTT/IdeS treated fusion protein. Identified c- and z- fragment ion series are indicated in diamonds in the mass spectra and relevant sequence identifications are shown in the associated sequences to the right of each mass spectrum. Detection of glycosylated protein species, G0f and G1f was confirmed on N207 (in green) with C-terminal lysine truncation.
Acknowledgments The data collection was supported in part by collaborative grant from MedImmune, LLC (20142312 RSA), and University of Maryland, Baltimore, School of Pharmacy Mass Spectrometry Center (SOP1841-IQB2014).
Appendix A. Supplementary material Supplementary data associated with this article can be found in the online version at http://dx.doi. org/10.1016/j.dib.2015.11.031.
References [1] B.Q. Tran, C. Barton, J. Feng, A. Sandjong, S.H. Yoon, S. Awasthi, T. Liang, M.M. Khan, D.P.A. Kilgour, D.R. Goodlett, Y.A. Goo, Comprehensive glycosylation profiling of IgG and IgG-fusion proteins by top-down MS with multiple fragmentation techniques, J. Proteomics. (2015), pii: S1874-3919(15)30164-0 http://dx.doi.org/10.1016/j.jprot.2015.10.021 [Epub ahead of print]. [2] P.A. Marichal-Gallardo, M.M. Álvarez, State-of-the-art in downstream processing of monoclonal antibodies: process trends in design and validation, Biotechnol. Prog. 28 (2012) 899–916. http://dx.doi.org/10.1002/btpr.1567. [3] D.P.A. Kilgour, R. Wills, Y. Qi, P.B. O’Connor, Autophaser: an algorithm for automated generation of absorption mode spectra for FT-ICR MS, Anal. Chem. 85 (2013) 3903–3911. http://dx.doi.org/10.1021/ac303289c. [4] H. Guner, P.L. Close, W. Cai, H. Zhang, Y. Peng, Z.R. Gregorich, Y. Ge, MASH Suite: a user-friendly and versatile software interface for high-resolution mass spectrometry data interpretation and visualization, J. Am. Soc. Mass Spectrom. 25 (2014) 464–470. http://dx.doi.org/10.1007/s13361-013-0789-4.