Mass spectrometry of natural and recombinant proteins and glycoproteins


TIBTECH - JULY 1988 [Vol. 6]


Howard R. Morris and Fiona M. Greer

Most modern protein sequence analysis is carried out using classical, wet-chemical Edman degradation technology. However, an increasing number of studies on both natural and recombinant (genetically engineered) proteins demand new technologies capable of assigning structural features, such as glycosylation, that cannot be assigned by Edman sequence analysis. The most important alternative and complementary procedure at present is high-mass mass spectrometry. This brief article introduces some of the principles and applications of the technique.

Protein research laboratories, both academic and industrial, will make increasing use of these techniques to complement classical gas phase sequencing and to identify post-translational modifications, including glycosylation, phosphorylation, S-S bridge assignment and processing events such as the formation of 'ragged ends'.

Mass spectrometry has played an important role in biopolymer analysis for more than a decade. This is particularly so in protein sequence

Howard R. Morris is at the Department of Biochemistry, Imperial College, London SW7 2AZ, UK. Fiona M. Greer is at M-Scan, Silwood Park, Ascot, Berks SL5 7PZ, UK and 137 Brandywine Parkway, West Chester, Pennsylvania, PA 19380, USA.

analysis, where it is complementary to classical gas phase Edman sequencing, and in the analysis of post-translational modifications, where classical methods cannot give definitive data. Mass spectrometric methods were crucial, for example, in the discovery of the γ-carboxylation of glutamic acid in the blood coagulation zymogens 1,2, in the discovery of 1→3 linked galactosyl N-acetyl galactosamine in plasma glycopeptides 3, and in the characterization of conjugate structures of great biological significance, such as the peptidolipid leukotrienes 4 associated with a variety of allergen-induced hypersensitivity reactions.

© 1988, Elsevier Publications, Cambridge 0167-9430/88/$02.00

Apart from the ability to study non-protein modifications, e.g. sulphation, phosphorylation 5 or glycosylation, the second strength of a mass spectrometric approach is in the analysis of mixtures - a proposition quite alien to classical methods. Since purification is often the rate-determining step in structural analysis (and causes decreasing yields), a technique capable of analysing mixtures has obvious applications. Mixture analysis by mass spectrometry was developed in the mid 1960s for the study of protein digests. There is only a minimal need for purification: several peptides can be analysed for mass, and sometimes sequenced, in the same mixture without purifying individual components. Indeed, mixture analysis strategies have been developed and used for over two decades to solve structures ranging from silk fibroin 6 to ribitol dehydrogenase (RDH) 7, chloramphenicol transacetylase (CAT) 8 and the enkephalins 9. These studies were, however, technically very difficult.

More recently, two advances have combined to allow more widespread application of mass spectrometric methods to biopolymer problems. These are the development of High Field 10,11 and other high-mass instruments 12, and the introduction of Fast Atom Bombardment 13-15 as a new ionization method. Using these techniques, it is now possible to exploit the old mixture analysis strategies in protein, glycoprotein and other post-translational analysis at the picomole-nanomole level 16-18.

Principles of mass spectrometry

In mass spectrometry, the sample to be analysed is first ionized (usually to give positively charged ions) by a beam of particles, traditionally electrons (electron impact) or more recently ions or atoms (fast atom bombardment, secondary ion mass spectrometry). The positively charged sample ions, known as quasimolecular ions, gain internal energy upon ionization and some break down to give smaller ions which can be used in determining the detailed structure of the molecule. For example, peptides fragment across the peptide bond between amino acid residues to give a series of fragment ions from which the sequence can be deduced. Thus a peptide ABCD may produce A+, AB+, ABC+ and ABCD+ as fragment ions, and the mass difference between successive fragment ions determines the next amino acid in the sequence. Once produced, the sample ions are separated according to mass to produce a mass spectrum (e.g. Fig. 1), usually by using the ability of a magnetic field to deflect a charged particle according to its mass-to-charge ratio.

Although the mass spectrometer is an expensive instrument (some two to four times the cost of a gas-phase protein sequencer), it can solve problems that classical methods cannot, and it can be applied to a vast range of organic and bio-organic analyses (both quantitative and qualitative) quite apart from work on proteins (to which Edman sequencers are restricted). With modern instruments, samples are usually loaded from a few microlitres of solvent (TFA, acetic acid) and data for simple studies, e.g. molecular weight analysis, can be obtained in 10-30 min, although interpretation may take longer. Importantly, the use of high mass

Box 1

High mass fast atom bombardment mass spectrometry

In HM-FABMS, sample ions are created by bombardment with a high energy beam of atoms or ions, and mass analysis is carried out using a high field magnet or other high-mass method to enable the measurement of sample ions of mass up to (sometimes beyond) 10 kDa. A two-sector double focusing mass spectrometer is an ideal and cost-effective instrument for carrying out the FAB mapping and related post-translational analysis work described here. Such instrumentation is easily capable of accurate molecular weight analyses, such as those required in recombinant protein or glycoprotein screening studies, and can provide sequence data where necessary either by fragment ion analysis or by combination with protein chemical methods.
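The fragment-ion principle described under "Principles of mass spectrometry" (a peptide ABCD giving A+, AB+, ABC+, ABCD+) can be illustrated with a minimal Python sketch. It is not the authors' software. The residue masses are standard monoisotopic values rather than figures from the article, the ladder is idealised (every fragment present, one proton added), and the function names and the test peptide GFLK are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the article).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def n_terminal_ladder(sequence):
    """Simulate an idealised ladder A+, AB+, ABC+, ... for a peptide."""
    total = PROTON
    ladder = []
    for aa in sequence:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def residues_for_gap(delta, tol=0.02):
    """Return the residue letters whose mass matches a gap in the ladder."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)


def read_ladder(fragment_masses, tol=0.02):
    """Assign one list of candidate residues to each gap between fragments."""
    ordered = sorted(fragment_masses)
    return [residues_for_gap(b - a, tol) for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Hypothetical peptide; the first residue is not read from a gap.
    for gap in read_ladder(n_terminal_ladder("GFLK")):
        print("/".join(gap))
```

Leu and Ile are always returned together because they have the same mass, and at a loose tolerance (for example 0.5 Da) Lys and Gln also become indistinguishable.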

Fig. 1. Fast atom bombardment mapping strategy. The protein of interest is cleaved using specific enzymic or chemical methods. The peptide fragments are analysed by direct FAB-MS. The peaks in the mass/charge (m/z) spectrum can be assigned to particular peptides. In this example, a carboxymethylated urogastrone fusion product was digested with trypsin, which cleaves after lysine and arginine residues. Subsequent digestion of the tryptic fragments with carboxypeptidase B can often be used to identify the C-terminal peptide: unless the C-terminal amino acid is lysine or arginine, the C-terminal peptide will be unaltered in mass by carboxypeptidase B digestion. All the other fragments (unless possessing penultimate proline) will have their C-terminal lysine or arginine residues (highlighted) removed by this treatment. [Spectrum not reproduced. Signals labelled in the figure: m/z 375, 388, 627, 633, 1222, 1562 and 3288; peptides are numbered 1-11, 12-14, 15-42, 43-55, 56-59, 60-62 and 63-66.]
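The mapping strategy in Fig. 1 can be sketched in a few lines of Python, under stated assumptions. The test sequence is hypothetical (the urogastrone fusion sequence is not reproduced here), the residue masses are standard monoisotopic values, and the rule that trypsin does not cleave before proline is an assumption added for illustration. The empty-match `re.split` needs Python 3.7 or later.

```python
import re

# Monoisotopic residue masses in Da (standard values, not taken from the article).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def mh_plus(peptide):
    """Monoisotopic (M+H)+ of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def carboxypeptidase_b(peptide):
    """Remove a C-terminal K or R unless the penultimate residue is P."""
    if len(peptide) > 1 and peptide[-1] in "KR" and peptide[-2] != "P":
        return peptide[:-1]
    return peptide


def fab_map(sequence):
    """Return (peptide, (M+H)+, mass shift after carboxypeptidase B) rows."""
    rows = []
    for pep in tryptic_peptides(sequence):
        before = mh_plus(pep)
        after = mh_plus(carboxypeptidase_b(pep))
        rows.append((pep, round(before, 1), round(before - after, 1)))
    return rows


if __name__ == "__main__":
    # Hypothetical sequence; the peptide with zero shift is the C-terminal one.
    for pep, mz, shift in fab_map("AGFKLSDRGEWKTLVE"):
        print(f"{pep:6s} {mz:8.1f}  shift {shift:5.1f}")
```

In this toy digest only the last peptide keeps its mass after the subdigest, which is the criterion the caption uses to pick out the C-terminal peptide.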

Fig. 2. (a) C-terminal analysis of a recombinant interferon-γ. IFN-γ was digested by CNBr and the digest was then passed through a gel column to remove high molecular weight fragments. The "low mass" (<3 kDa) fractions were then screened by HM-FABMS, and fraction 20 (together with adjacent fractions) showed evidence for the "ragged end" phenomenon. The signals of interest are m/z 648 and m/z 1090. The signal at m/z 1090 is the intact C-terminal signal anticipated for the sequence IFN-γ 138-146. The signal at m/z 648 shows that a proportion of the IFN-γ molecules in the original sample were truncated at residue Arg 142, since m/z 648 corresponds to the sequence IFN-γ 138-142, Leu-Phe-Arg-Gly-Arg. [Spectrum not reproduced. Labelled signals: glycerol (m/z 645), m/z 648 and m/z 1090.]

(b) Information obtained from tryptic, chymotryptic and CNBr FAB maps allowed rapid confirmation of over 90% (underlined) of the structure of recombinant IFN-γ (mol. wt approx. 18 000). In contrast to classical Edman sequencing technology, where data are usually limited to the first 20-30 N-terminal residues, in FAB mapping there is an equal probability of observing data from the C terminus, thus complementing classical technology.
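The assignment in Fig. 2(a) can be reproduced with a short Python sketch. It is not the M-Scan program. It assumes standard monoisotopic residue masses, treats the reported m/z values as accurate to about 1 Da, and uses the IFN-γ sequence as transcribed from Fig. 2(b). All function names are illustrative.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the article).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

# IFN-gamma sequence transcribed from Fig. 2(b), ten residues per line.
IFN_GAMMA = (
    "CYCQDPYVKE" "AENLKKYFNA" "GHSDVADNGT" "LFLGILKNWK" "EESDRKIMQS"
    "QIVSFYFKLF" "KNFKDDQSIQ" "KSVETIKEDM" "NVKFFNSNKK" "KRDDFEKLTN"
    "YSVTDLNVQR" "KAIHELIQVM" "AELSPAAKTG" "KRKRSQMLFR" "GRRASQ"
)


def mh_plus(peptide):
    """Monoisotopic (M+H)+ of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def ragged_end_candidates(sequence, min_len=2):
    """C-terminal CNBr peptide (after the last Met) and its truncated forms."""
    last_met = sequence.rfind("M")          # -1 if there is no Met at all
    start = last_met + 2                    # 1-based position of first residue
    pep = sequence[last_met + 1:]
    return [
        (start, start + n - 1, pep[:n], mh_plus(pep[:n]))
        for n in range(len(pep), min_len - 1, -1)
    ]


def assign(observed_mz, candidates, tol=1.0):
    """Return (start, end, peptide) for candidates within tol of a signal."""
    return [(s, e, p) for s, e, p, m in candidates if abs(m - observed_mz) <= tol]


if __name__ == "__main__":
    candidates = ragged_end_candidates(IFN_GAMMA)
    for signal in (1090, 648):
        print(signal, assign(signal, candidates))
```

With these assumptions, m/z 1090 matches only residues 138-146 and m/z 648 matches only residues 138-142, in line with the caption.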

fast atom bombardment mass spectrometry (HM-FABMS) [Box 1] in conjunction with classical protein and carbohydrate chemistry has proved to be a powerful and cost-effective method for structural analysis of genetically engineered proteins and glycoproteins of interest to the biotechnology industry 19,20. In relation to this, we have been active in developing specific methods for determining the identity of blocked N-terminal residues and for detecting the presence or absence of C-terminal fragments or 'ragged ends' - two areas that pose extreme problems for conventional classical techniques. In addition, characterization of recombinant proteins for both research and regulatory body approval will often require determination of post-translational modifications such as

IFN-Gamma

CYS TYR CYS GLN ASP PRO TYR VAL LYS GLU
ALA GLU ASN LEU LYS LYS TYR PHE ASN ALA
GLY HIS SER ASP VAL ALA ASP ASN GLY THR
LEU PHE LEU GLY ILE LEU LYS ASN TRP LYS
GLU GLU SER ASP ARG LYS ILE MET GLN SER
GLN ILE VAL SER PHE TYR PHE LYS LEU PHE
LYS ASN PHE LYS ASP ASP GLN SER ILE GLN
LYS SER VAL GLU THR ILE LYS GLU ASP MET
ASN VAL LYS PHE PHE ASN SER ASN LYS LYS
LYS ARG ASP ASP PHE GLU LYS LEU THR ASN
TYR SER VAL THR ASP LEU ASN VAL GLN ARG
LYS ALA ILE HIS GLU LEU ILE GLN VAL MET
ALA GLU LEU SER PRO ALA ALA LYS THR GLY
LYS ARG LYS ARG SER GLN MET LEU PHE ARG
GLY ARG ARG ALA SER GLN

glycosylation or S-S bridge assignment: HM-FABMS mixture analysis procedures, christened FAB mapping 14,20,21, have been developed to solve these problems. The novel strategies, described briefly in this article, constitute a powerful new tool for protein engineers and protein/glycoprotein chemists.

FAB mapping

Figure 1 shows the first biotechnology application of FAB mapping, to the analysis of a fusion product of urogastrone (epidermal cell growth factor) in 1982 22. The spectrum shows the mass signals (M+H)+ of all the tryptic peptides derived from the sample, and these can be mapped onto the anticipated structure, demonstrating that the primary sequence is correct. Errors of

translation, deletion, insertion, point mutation (excepting inversions in sequence), post-translational modification or processing can be detected and assigned by this powerful new technology. For example, had the N-terminal methionine been formylated, a signal 28 mass units higher, at m/z 1250 rather than m/z 1222, would have been observed. Similarly, if processing had taken place this could be recognized by a shift in the observed mass, either to lower mass than calculated (for loss of signal peptides) or to higher mass due to a post-translational modification such as addition of carbohydrate, phosphate, etc. Simple biochemical, enzymatic or chemical procedures such as tryptic or cyanogen bromide digestion can be used to verify assignments in the spectrum. Mass shifts after Edman degradation (observing the peptides rather than the phenylthiohydantoin (PTH) derivatives of the amino acids) can lead to sequence assignment 23,24. Similarly, carboxypeptidase B digestion of the tryptic digest (carboxypeptidase B removes C-terminal arginine or lysine residues) produces, in the resulting FAB map, shifts of 128 or 156 mass units - lysine or arginine, respectively 20,21. This procedure of FAB mapping the tryptic digest of a protein and comparing that with the map produced after a further step of carboxypeptidase B digestion is very useful in identifying the true carboxy terminus of a protein, normally a very difficult task: the peptide corresponding to the C terminus will not shift in mass after carboxypeptidase B digestion 20 unless the terminus is Lys or Arg. For instance, the signal at m/z 633 in Fig. 1 is the only signal to remain at the same mass after subdigestion, and corresponds to the carboxy terminal peptide (residues 63-66).

This is a powerful procedure and has been applied to daunting protein chemical problems, such as the search for partial amino acid sequences in human low density lipoprotein APO B (approximately 500 kDa), used to find the gene coding for the full protein sequence 25,26. In a 1983 study to determine sequences in APO B, a peptide isolated by HPLC from a tryptic digest of the molecule retained the same mass after FAB mapping following the carboxypeptidase B subdigest procedure. This made the peptide a prime candidate for the true carboxy terminus of the whole molecule. Mass spectrometric sequence ion analysis of the peptide produced a sequence Leu-Ala-Pro-(Glu,Gly,Leu)-Thr-Leu-Leu-Leu. The bracketed region came from a combination of amino acid analysis and mass data, the quasi-molecular ion [M+H]+ at m/z 1039 calculating for Glu rather than Gln in the sequence. The amino acid analysis also showed a Leu:Ile ratio of 3:2. (These amino acids have the same mass and are not distinguished by MS.) When the nucleotide sequences of cDNA clones were determined some two years later, the apparent C-terminal

Fig. 3. Analysis of blocked N-termini of peptides. (a) The signal at m/z 1231, (M+H)+, in a fast atom bombardment mass spectrum of α1-antitrypsin could not be mapped onto the anticipated structure of the protein. The anticipated N-terminal peptide signal (m/z 1058) was absent. (b) Following CNBr treatment, the anticipated signal, (M+H)+ 1058, appeared. The difference in the two masses corresponds to the mass of an N-acetyl methionine residue, thus assigned as the N-terminal amino acid.
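The arithmetic behind Fig. 3 can be checked with a small Python sketch. It assumes standard monoisotopic masses for the residues and for acetyl and formyl groups, and it treats the reported m/z values as accurate to about 0.5 Da. The short list of blocking groups and the function names are illustrative choices, not the authors' procedure.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the article).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

# Mass added by an N-terminal blocking group (illustrative, deliberately short).
BLOCKING_GROUPS = {"acetyl": 42.01056, "formyl": 27.99491, "none": 0.0}


def mh_plus(peptide):
    """Monoisotopic (M+H)+ of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def explain_shift(observed, expected, tol=0.5):
    """List (blocking group, extra residue) pairs explaining observed - expected.

    An empty string for the residue means the blocking group alone.
    """
    delta = observed - expected
    hits = []
    for group, g_mass in BLOCKING_GROUPS.items():
        for aa, r_mass in [("", 0.0)] + sorted(RESIDUE_MASS.items()):
            if group == "none" and aa == "":
                continue  # no modification at all explains nothing
            if abs(g_mass + r_mass - delta) <= tol:
                hits.append((group, aa))
    return hits


if __name__ == "__main__":
    print(round(mh_plus("EDPQGDAAQK"), 1))   # anticipated N-terminal peptide
    print(explain_shift(1231, 1058))         # Fig. 3: unassigned vs anticipated
    print(explain_shift(1250, 1222))         # formylation example, 28 mass units
```

Within this limited candidate list, the 173 mass unit difference in Fig. 3 is explained only by acetyl plus Met, and a 28 mass unit difference only by a formyl group.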

residues numbered 4527 to 4536 were translated as Leu-Ala-Pro-Gly-Glu-Leu-Thr-Ile-Ile-Leu, and the MS data that predicted this sequence thus proved invaluable in verifying the C-terminal sequence and the important fact that the protein had not been further processed 25,26.

Returning to the general FAB mapping strategy (Fig. 1), a most important application is in the process or quality control analysis of recombinant genetically engineered proteins. The majority of the structure can be screened by combining data from two or more simple FAB maps after, for example, tryptic, cyanogen bromide (CNBr), chymotryptic or V8 protease digestion to generate peptide mixtures 20. Over 90% of a protein sequence can be rapidly confirmed, for example in the case of recombinant interferon-γ (Fig. 2). The procedure thus complements simple HPLC-tryptic profiles by giving an added dimension of definitive mass assignment in quality control procedures. The same procedure is also a research tool at a stage prior to production control, and further examples of the power of the FAB mapping method are described below.

C-terminal 'ragged ends'

When additional and unexpected signals appear in the FAB map, they

can often be interpreted simply and quickly by a software search of the protein sequence for the peptide mass observed in the spectrum. Sometimes such signals will be due to minor enzymatic cleavages (e.g. chymotryptic activity in the trypsin used) or to impurities, but occasionally they arise from pre-processing of the sample (naturally or otherwise) to produce truncated species. This can lead to 'ragged ends': mixtures of whole and truncated molecules. Ascertaining the existence or absence of truncated species is important in biopolymer characterization since, for example, they may not have the correct biological activity or may be antigenic. The analogy in carbohydrate analysis is the presence or absence of sialylation or fucosylation of non-reducing end sugars in glycoproteins. In protein work, if the truncation is N-terminal, and if the N terminus of the protein is not blocked (with, for instance, an acetyl group), the ragged ends may be assignable by careful examination of gas phase Edman data. This is not usually possible, however, for C-terminal assignment, since it is extremely difficult to know whether a decrease in PTH-amino acid yield at a particular step is anything more than a consequence of the Edman chemistry or the gas phase instrumentation. We have developed a new procedure based on FAB mapping

that allows solution of this difficult problem 27. Firstly, because of the suppression phenomenon observed in complex mixtures 14,20, in which not every component in the mixture is necessarily observed, gel filtration or HPLC purification (or partial purification) of a suitable digest is sometimes carried out prior to FABMS analysis to create simpler mixtures. The digest, either chemical (e.g. CNBr) or enzymatic, is specifically chosen to cut the protein some distance away from the anticipated carboxy terminus. FAB mass spectra of individual fractions of a semipurified digest will then reveal the integrity of the C terminus, since the signals observed will fit the intact C-terminal sequence, truncated versions of it, or both. Once actual C-terminal species have been identified, it is relatively simple to interpret a suitable carboxypeptidase/amino acid analysis experiment to quantitate the molecular species present accurately. This work is illustrated for the C-terminal analysis of a recombinant interferon-γ (Fig. 2) 27.

Fig. 4. Strategy for the location of O- and N-linked glycosylation sites in a large glycoprotein. Step (a), proteolytic digestion and FAB mapping, gives a FAB map baseline. Step (b), NaOH/NaBH4 treatment of the intact protein (or isolated glycopeptides), removes O-linked sugars, converting Ser to Ala (or ΔAla if base alone is used) and Thr to α-aminobutyric acid, ABA (or ΔABA if base alone is used). The FAB map at step (c) reveals those peptides that were originally O-glycosylated. Removal of the N-linked sugars with N-glycanase on the intact protein (or isolated glycopeptides) [step (d)] converts the glycosylated Asn to Asp, and the FAB map at step (e) reveals the N-glycosylated peptide.

Blocked N termini

A further application of FABMS is the analysis of proteins whose N termini are blocked, which prevents sequencing by Edman chemistry. Here the definitive nature of mass spectrometry comes into its own: it can not only detect the presence of blocking groups, but actually identify them 28. A good example of this type of study is the amino-terminal analysis of recombinant α1-antitrypsin shown in Fig. 3 (Ref. 29). The expected N-terminal peptide of this 394 residue recombinant protein, Glu-Asp-Pro-Gln-Gly-Asp-Ala-Ala-Gln-Lys (m/z 1058), was not mapped in the original FABMS spectra, although over 90% of the molecular structure was confirmed. Because of the suppression phenomenon discovered in the first FAB mapping experiments, a search for the amino-terminal peptide by FABMS analysis of an HPLC-purified, thermally denatured tryptic digest was initiated. This resulted in the discovery of a signal at m/z 1231 in the spectrum of HPLC fraction 21 which could not be mapped onto the anticipated structure using the M-Scan computer program (Fig. 3a). Furthermore, the anticipated N-terminal tryptic peptide (m/z 1058) again could not be found, raising the possibility that the m/z 1231 signal was a modified version of the N-terminal peptide: the mass difference (m/z 1231 - m/z 1058) corresponds to the mass of an additional N-acetyl methionine residue. In an experiment designed to test this hypothesis, a small sample (100 pmol) of the peptide was treated with CNBr to give the spectrum shown in Fig. 3b. The observed signal at m/z 1058 corresponds to the originally anticipated N-terminal sequence, and confirms the loss of N-acetyl methionine after CNBr digestion. Thus the N-terminal sequence is unequivocally assigned in this and further confirmatory experiments as N-acetyl-Met-Glu-Asp-Pro-Gln-Gly-Asp-Ala-Ala-Gln-Lys.

Fig. 5. Glycopeptide analysis by FAB-MS. Gas phase Edman sequencing was used to determine the peptide sequence of IL-2 but could not identify the amino acid at position 3 in the sequence. The N-terminal tryptic peptide was isolated and around 1 μg used in direct FABMS analysis to produce this spectrum. The quasimolecular ion at m/z 1434 showed that the peptide was post-translationally modified. The fragment ions at m/z 1143, 981 and 778, corresponding to glycosidic bond cleavages between sugar residues, showed that the modification was a glycosylation with N-acetylhexosamido-hexosyl-N-acetylneuraminic acid. [Spectrum not reproduced. Labels in the figure: N-acetylgalactosamine, galactose and N-acetylneuraminic acid; signals at m/z 778, 981, 1143 and 1434.]

Fig. 6. Structure of a Factor VIIa carbohydrate moiety. The results in the table indicate that the carbohydrate moiety studied has the structure shown in (a). They also suggest that the carbohydrate moiety is heterogeneous: they indicate the presence of the structure shown in (b). (Data published with the kind permission of Dr L. Thim, Novo Research Institute, who supplied the sample.) [Structure diagrams (a) and (b) not reproduced: branched oligosaccharides built from NANA, Hex and HexNAc residues attached to the peptide.]

Post-translational modification

Glycosylation

Procedures for the mass spectrometric analysis of glycoproteins were developed some years ago on plasma glycoproteins from Antarctic fish 3. These involved analysis of permethylated derivatives of glycopeptides by electron impact and chemical ionization mass spectrometry, both prior to and following release of the carbohydrate from the peptide backbone. Similar powerful strategies (Fig. 4) can now be used, incorporating HM-FABMS analysis of free (underivatized) samples or of acetyl or permethyl derivatives, in studies to determine sites of glycosylation of both O- and N-linked structures, the identity of terminal non-reducing ends (potentially the most antigenic structures), and the type and identity of oligosaccharides.

In the first example of the analysis of an unknown glycopeptide structure by FABMS, the glycosylation on natural interleukin 2 (IL-2) was discovered (Fig. 5) 30,31. FABMS indicated that the carbohydrate moiety was N-acetylgalactosaminyl-galactosyl-N-acetylneuraminic acid, and this was later confirmed by classical carbohydrate analysis of the hydrolysed glycopeptide.

With larger saccharide structures, it is often useful to remove the carbohydrate from the glycopeptide by, for example, N-glycanase (for Asn-linked sugars) 32 or base-catalysed β-elimination (for sugars O-linked to Ser or Thr) 3,33. A particularly sensitive way of analysing the resulting polysaccharide by FABMS is to prepare a permethyl derivative 3,33. The polysaccharide fragments mainly across the glycosidic bonds (producing 'A-type fragments') with the charge held on the non-reducing end fragment. Table I shows data from an N-glycanase digested, permethylated sample of one of the glycopeptides isolated from human coagulation Factor VIIa.
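The IL-2 assignment described above (Fig. 5) rests on the spacing between the quasimolecular ion and its fragment ions. The following Python sketch is one way to read those spacings. It assumes standard monoisotopic residue masses for underivatized sugars and a tolerance suited to nominal m/z values. The names `SUGAR_RESIDUE` and `assign_losses` are hypothetical.

```python
# Residue (anhydro) masses of underivatized sugars in Da; standard monoisotopic
# values, not taken from the article.
SUGAR_RESIDUE = {
    "Hex": 162.05282,      # hexose, e.g. galactose
    "HexNAc": 203.07937,   # N-acetylhexosamine
    "dHex": 146.05791,     # deoxyhexose, e.g. fucose
    "NeuAc": 291.09542,    # N-acetylneuraminic acid
}


def assign_losses(signals, tol=0.6):
    """Name the sugar residue matching each gap between successive signals.

    signals: m/z values of the quasimolecular ion and its fragment ions.
    Returns (higher m/z, lower m/z, matching residue names), highest first.
    """
    ordered = sorted(signals, reverse=True)
    losses = []
    for high, low in zip(ordered, ordered[1:]):
        delta = high - low
        names = [n for n, m in SUGAR_RESIDUE.items() if abs(m - delta) <= tol]
        losses.append((high, low, names))
    return losses


if __name__ == "__main__":
    # Signals quoted in the Fig. 5 caption.
    for high, low, names in assign_losses([1434, 1143, 981, 778]):
        print(f"{high} -> {low}: {', '.join(names) or 'unassigned'}")
```

Read from the highest mass downwards, the gaps correspond to NeuAc, Hex and HexNAc, which is consistent with a HexNAc-Hex-NeuAc chain attached to a peptide at m/z 778. The sketch cannot distinguish galactose from other hexoses, which is why the text relies on classical carbohydrate analysis for confirmation.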

Mild acid hydrolysis of the glycanase glycopeptide digest prior to permethylation can bring larger branched triantennary and tetraantennary structures into view on the mass spectrum by removing bulky N-acetylneuraminic acid residues in addition to fucose. For Factor VIIa, an overall structure shown in Fig. 6 could be deduced. FAB mapping in the glycoprotein area is thus a powerful method for locating sites of glycosylation and complementing classical carbohydrate chemistry methods for polysaccharide analysis, in particular in confirming branching and terminal non-reducing end substitution 32,33.

Table I. Signals observed in the positive ion FAB-MS spectrum of the glycanase digested, permethylated Factor VIIa.

Signal (m/z)    Possible assignment
260 (weak)      (HexNAc)+
376             (NANA)+
464             (Hexose1 HexNAc1)+
825             (NANA1 Hexose1 HexNAc1)+
999             (NANA1 Hexose1 HexNAc1 Deoxyhexose1)+
2737            (NANA2 Hexose5 HexNAc4)
2791            Pseudomolecular ion: NANA2 Hexose5 HexNAc4
2965            Pseudomolecular ion: NANA2 Hexose5 Deoxyhexose1 HexNAc4
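The fragment assignments in Table I can be checked with a short Python sketch. It assumes nominal in-chain masses for permethylated sugar residues and a 15 mass unit methyl group capping the non-reducing end of an A-type fragment. These are standard values rather than figures given in the article, and the names are hypothetical.

```python
# Nominal in-chain masses of permethylated sugar residues (assumed values).
PERMETHYL_RESIDUE = {"Hex": 204, "HexNAc": 245, "dHex": 174, "NANA": 361}
METHYL = 15  # CH3 on the non-reducing end of an A-type fragment ion


def a_type_fragment(composition):
    """Nominal m/z of an A-type fragment with the given residue counts."""
    return sum(PERMETHYL_RESIDUE[s] * n for s, n in composition.items()) + METHYL


# Fragment signals from Table I with the compositions assigned to them.
TABLE_I_FRAGMENTS = {
    260: {"HexNAc": 1},
    376: {"NANA": 1},
    464: {"Hex": 1, "HexNAc": 1},
    825: {"NANA": 1, "Hex": 1, "HexNAc": 1},
    999: {"NANA": 1, "Hex": 1, "HexNAc": 1, "dHex": 1},
    2737: {"NANA": 2, "Hex": 5, "HexNAc": 4},
}

if __name__ == "__main__":
    for observed, composition in TABLE_I_FRAGMENTS.items():
        print(observed, a_type_fragment(composition))
    # The two pseudomolecular ions differ by one permethylated deoxyhexose.
    print(2965 - 2791, PERMETHYL_RESIDUE["dHex"])
```

Under these assumptions each calculated value equals the observed signal, and the spacing between the two pseudomolecular ions corresponds to one deoxyhexose (fucose) residue.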

S-S bridges

Based on the FAB mapping procedure, an entirely new strategy for the important task of S-S bridge assignment in proteins has been devised (Fig. 7) 34. This final stage of protein primary structure determination has traditionally been very difficult, and long, exhaustive procedures raise the possibility of disulphide 'scrambling' or reshuffling by reduction and re-oxidation of S-S bridges. The new strategy for S-S bridge assignment was first demonstrated with work on insulin 34, a particularly difficult S-S bridged protein because of the presence of two adjacent cysteine residues in the primary sequence which are difficult to cleave. This problem was solved by repetitive steps of Edman degradation on the peptide digest mixture, monitoring the removal of the Cys residues by examining the FAB spectra of the remaining peptides 34.

In disulphide analysis, the relevant signals are normally at the high-mass end of the spectrum (emphasizing the need for HM-MS). For example, two of the signals from an unfractionated peptic digest of insulin occur at m/z 2560 and m/z 2788. Their masses, and the observation that the signals at these m/z values collapse following a single S-S bond reduction step, allow the signals to be assigned to fragments B1-11 + A1-13 and B1-13 + A1-13, respectively. Since the method does not normally require isolation steps, it is rapid and the possibility of scrambling is minimized. In any event, with proper treatment, the S-S bridges are fairly robust, those in the insulin study withstanding tryptic, chymotryptic or V8 digestion, or the alkaline pH of Edman degradation for as many as six cycles. The use of this method in analysis of lysozyme is illustrated in Fig. 8.

Fig. 7. S-S bridge assignment using the FAB mapping strategy. The disulphide-bridged protein is subjected to enzymic or chemical digestion to give a mixture of peptides, which are identified by FAB-MS followed by reduction and further FAB-MS. The polypeptides are cleaved at points initially between the potentially bridged cysteine residues using a suitable method - usually pepsin digestion. The FAB mapping approach is then used to search for any disulphide-bridged peptides. These are characterized by their unique masses, and the interpretation is confirmed by, for example, reduction in thioglycerol or dithiothreitol followed by rerunning the HM-FABMS spectrum. Inter S-S bridged peptides collapse to give two single peaks at lower masses corresponding to the individual peptide components, and an intra S-S bridge (where there is no peptide bond cleavage between cysteines) can simply be diagnosed by a shift of 2 mass units as the disulphide is converted to a dithiol.

Fig. 8. A region of the FABMS spectrum of a peptic digest of lysozyme covering the Cys6-Cys127 bridge, before and after manual Edman degradation. The signals at m/z 1922, 1993 and 2064 in spectrum (a) correspond to the disulphide-bridged residues 1-9 + 123-129, 1-10 + 123-129 and 1-11 + 123-129, respectively. Following one step of manual Edman degradation the peptide mixture was re-examined directly by FABMS. The signals in spectrum (b) (m/z 1608, 1679 and 1750) correspond to the cleavage of Trp123 and Lys1 (186 and 128 mass units, respectively; 314 in total) from the two N termini of each disulphide-bridged peptide. The S-S bridge itself was not affected by the alkaline pH of Edman degradation. (H. R. Morris and P. Pucci, unpublished.)
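The mass relationships used in Figs 7 and 8 can be written down in a few lines of Python. This is a sketch under stated assumptions: standard monoisotopic masses, two hypothetical cysteine-containing peptides ("ACK" and "GCR") in place of real insulin or lysozyme fragments, and function names invented for illustration. Only the Edman arithmetic uses numbers from the Fig. 8 caption.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the article).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
HYDROGEN = 1.00783


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a reduced (free thiol) peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def reduced_mh(peptide):
    """(M+H)+ of a single reduced peptide."""
    return peptide_mass(peptide) + PROTON


def bridged_mh(pep_a, pep_b):
    """(M+H)+ of two peptides joined by one inter-chain S-S bridge."""
    return peptide_mass(pep_a) + peptide_mass(pep_b) - 2 * HYDROGEN + PROTON


def intra_bridge_shift():
    """Mass gained when an intra-chain disulphide is reduced to a dithiol."""
    return 2 * HYDROGEN


def edman_shift(removed_residues):
    """Mass lost when the given N-terminal residues are removed."""
    return sum(RESIDUE_MASS[aa] for aa in removed_residues)


if __name__ == "__main__":
    a, b = "ACK", "GCR"  # hypothetical peptides, one Cys each
    print(round(bridged_mh(a, b), 2), "->",
          round(reduced_mh(a), 2), "+", round(reduced_mh(b), 2))
    print(round(intra_bridge_shift()))
    # Fig. 8: one Edman cycle removes Trp123 and Lys1 from each bridged pair.
    shift = round(edman_shift("WK"))
    print([mz - shift for mz in (1922, 1993, 2064)])
```

On reduction the single bridged signal is replaced by the two reduced-peptide signals, while an intra-chain bridge moves by only 2 mass units. The Edman check reproduces the 314 mass unit shift quoted in the Fig. 8 caption.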

Conclusion

High mass fast atom bombardment mass spectrometric techniques have been developed to provide powerful and versatile methods for solving modern protein and glycoprotein structural problems. The procedure is often complementary to classical Edman chemistry, but in addition gives the protein chemist the ability to observe and assign, for example, blocking groups at the N terminus or heterogeneity at the C terminus of the molecule. In the analysis of both recombinant and native proteins and glycoproteins, FAB mapping procedures can be used to provide sequence data, to detect errors of translation, mutation, insertion and deletion and, importantly, to identify sites of glycosylation or other post-translational modification, to determine the basic carbohydrate structure and to assign S-S bridges definitively. The MS technique based on two-sector double focusing mass spectrometry is already a sensitive one, applicable to sample sizes of the order of 10 pmol to 1 nmol for molecular weight determination. We are currently extending these limits in a new generation of instruments with post-acceleration array detection of ions, which will give a mass range above 10 kDa and will allow detection and measurement of the order of 10 femtomoles of biopolymer. These methods promise to revolutionize protein and carbohydrate chemistry and biochemistry over the next decade.

References

1 Magnusson, S., Sottrup-Jensen, L., Petersen, T. E., Morris, H. R. and Dell, A. (1974) FEBS Lett. 44, 189-193
2 Morris, H. R., Dell, A., Petersen, T. E., Sottrup-Jensen, L. and Magnusson, S. (1976) Biochem. J. 153, 663-679
3 Morris, H. R., Thompson, M. R., Osuga, D. T. et al. (1978) J. Biol. Chem. 253, 5155-5161
4 Morris, H. R., Taylor, G. W., Piper, P. J. and Tippins, J. R. (1980) Nature 285, 104-106
5 Petrilli, P., Pucci, P., Morris, H. R. and Addeo, F. (1986) Biochem. Biophys. Res. Commun. 140, 28-37
6 Geddes, A. J., Graham, G. N., Morris, H. R. et al. (1969) Biochem. J. 114, 685-702
7 Morris, H. R., Williams, D. H., Midwinter, G. C. and Hartley, B. S. (1974) Biochem. J. 141, 701-713
8 Shaw, W. V., Packman, L. C., Burghleigh, B. D. et al. (1979) Nature 282, 870-872
9 Hughes, J., Smith, T. W., Kosterlitz, H. W. et al. (1975) Nature 258, 577-579
10 Morris, H. R., Dell, A., Banner, A. E. et al. (1977) Proceedings of the 25th Conference on Mass Spectrometry and Allied Topics, pp. 73-75, ASMS
11 Morris, H. R., Dell, A. and McDowell, R. A. (1981) Biomed. Mass Spectrom. 8, 463-473
12 Macfarlane, R. D. and Torgerson, D. F. (1976) Science 191, 920-925
13 Barber, M., Bordoli, R. S., Sedgewick, R. D. and Tyler, A. N. (1981) J. Chem. Soc., Chem. Commun., 325-327
14 Morris, H. R., Panico, M., Barber, M. et al. (1981) Biochem. Biophys. Res. Commun. 101, 623-631
15 Williams, D. H., Bradley, C. V., Santikarn, S. and Bojesen, G. (1982) Biochem. J. 201, 105-117
16 Morris, H. R., Dell, A., McDowell, R. A., Panico, M. and Taylor, G. W. (1985) in Mass Spectrometry of Large Molecules (Facchetti, S., ed.), pp. 111-125, 127-149, Elsevier
17 Morris, H. R., Dell, A., Judkins, M. et al. (1981) Pure Appl. Chem. 54, 267-279
18 Biemann, K., Gibson, B. W., Mathews, W. F. and Pang, H. (1986) in Mass Spectrometry in the Health and Life Sciences (Burlingame, A. L. and Castagnoli, N., Jr, eds), pp. 239-265, Elsevier
19 Morris, H. R., Dell, A., Panico, M. and McDowell, R. A. (1986) in Mass Spectrometry in the Health and Life Sciences (Burlingame, A. L. and Castagnoli, N., Jr, eds), pp. 363-377, Elsevier
20 Morris, H. R., Panico, M. and Taylor, G. W. (1983) Biochem. Biophys. Res. Commun. 117, 299-305
21 Lemaire, S., Chouinard, L., Denis, A., Panico, M. and Morris, H. R. (1982) Biochem. Biophys. Res. Commun. 105, 51-58
22 Morris, H. R., Panico, M. and Etienne, A. T. (1983) Proceedings of the 31st Conference on Mass Spectrometry and Allied Topics, pp. 683-685, ASMS
23 Morris, H. R., Taylor, G. W., Panico, M. et al. (1982) in Methods in Protein Sequence Analysis (Elzinga, M., ed.), pp. 243-261, Humana Press
24 Gibson, B. W., Poulter, L., Williams, D. H. and Maggio, J. E. (1986) J. Biol. Chem. 261, 5341-5349
25 Knott, T. J., Rall, S. C., Jr, Innerarity, T. L. et al. (1985) Science 230, 37-43
26 Knott, T. J., Pease, R. J., Powell, L. M. et al. (1986) Nature 323, 734-738
27 Greer, F. M., Morris, H. R., Fallon, T. and Brewer, S. J. (1987) Proceedings of the 35th ASMS Conference on Mass Spectrometry and Allied Topics, pp. 942-943, ASMS
28 Morris, H. R. and Dell, A. (1975) Biochem. J. 149, 754-755
29 Greer, F. M., Morris, H. R., Forstrom, J. and Lyons, D. Biomed. Environ. Mass Spectrom. (in press)
30 Robb, R. J., Kutny, R. M., Panico, M. et al. (1983) Biochem. Biophys. Res. Commun. 116, 1049-1055
31 Robb, R. J., Kutny, R. M., Panico, M., Morris, H. R. and Chowdry, V. (1984) Proc. Natl Acad. Sci. USA 81, 6486-6490
32 Dell, A. and Morris, H. R. (1983) Carbohydrate Res. 115, 41-52
33 Sasaki, H., Bothner, B., Dell, A. and Fukuda, M. (1987) J. Biol. Chem. 262, 12059-12076
34 Morris, H. R. and Pucci, P. (1985) Biochem. Biophys. Res. Commun. 126, 1122-1127