Analytica Chimica Acta, 231 (1990) 459-472
Elsevier Science Publishers B.V., Amsterdam

MASSPEC: a graphics-based data system for correlating a mass spectrum with a proposed structure

Marshall M. Siegel * and Gregory Gill

American Cyanamid Company, Medical Research Division, Lederle Laboratories, Pearl River, NY 10965 (U.S.A.)

(Received 21 February 1990)
Abstract
A graphics-based user-friendly data system called MASSPEC was developed to aid in the analysis of a mass spectrum when a proposed structure is provided. The proposed chemical structure is drawn and combinatorial algorithms correlate the masses of the substructures with the masses of the fragment ions observed in the mass spectrum. These substructures are subsequently drawn on the terminal screen. The commands and algorithms for operating MASSPEC are described. The MASSPEC data system can be used to interpret either nominal or exact mass data generated from relatively large molecules in any ionization mode. Illustrations utilizing MASSPEC include the interpretation of mass spectra generated on tandem mass spectrometers in the thermospray and fast atom bombardment (FAB) modes and the analysis of a FAB mass spectrum of a digested polypeptide to reveal post-translational modifications.

Keywords: Structure determination; Computer program; Proteins; Peptides; Antibiotics

0003-2670/90/$03.50 © 1990 - Elsevier Science Publishers B.V.

The computer analysis of mass spectral data for structural elucidation of materials of unknown structure has centered on two methodologies: mass spectral library searching and computerized interpretative techniques involving pattern recognition and artificial intelligence. These subjects have been reviewed by Martinsen and Song [1]. More recent work has utilized one or both of the above techniques to refine and expand upon this subject of automated mass spectral interpretation [2-12]. Often, however, a mass spectrometrist does not predict the structure of a material of unknown structure but rather has a proposed or partial structure and wishes to correlate the structural details with the fragment ions appearing in the mass spectrum. On successful completion of that task, related unknown structures could be interpreted. With this in mind, the development of a computerized bookkeeping method for correlating and interpreting fragment ion masses with the masses of molecular substructures was undertaken.

In previous work, algorithms were described for correlating the masses of the ions observed in a mass spectrum with the masses of molecular substructures generated from a proposed parent structure [13,14]. The algorithms were used to interpret mass spectral data produced in any ionization mode and were applicable to either nominal or exact mass data. A computer can most efficiently process a proposed molecular structure for correlation with a mass spectrum when the molecule is constructed from bonded "superatoms". A superatom is a group of atoms which is not expected to fragment. Likewise, to account for possible adducts, neutral/radical losses and rearrangements recorded in the mass spectrum, floating superatoms were constructed which in effect can be "bonded to" or "lost from" any superatom in the molecular structure. In defining
TABLE 1
MASSPEC commands a

Structure generation commands
ADD          Add a bonded superatom
BOND         Create a bond of desired bond order
FREE         Add a floating superatom
GET          Retrieve and display an XXX.MOL file
HIGHLIGHT    Highlight a superatom on the screen of the terminal
CONNECT      Generate bonds automatically
LABEL        Label a superatom
REDRAW
ERASE        Erase a bond or superatom
MOVE         Move a superatom's coordinates
SAVE         Save an XXX.MOL file
?            Help file
QUIT         Exit MASSPEC program

Execution commands and questions
EXECUTE      Execute the MASSPEC program
             Nominal or Exact Mass Processing
             (Exact Mass Error Window)
             Maximum Bond Order Sum
             Ion Mass List
OUTPUT       Display solutions of the mass spectral analysis

Mass spectral analysis (output) commands
U#           Scroll up # lines
D#           Scroll down # lines
M#           Scroll to ion mass #
(CR)         Display ion structure
SAVE         Predicted fragment ion connectivity table in XXX.SDF file for laser printer
CREATE       New XXX.SDF file for laser printer
QUIT         Return to structure generation commands
EXIT         Exit from MASSPEC program

a Bold letters are the executable abbreviations.
bonded and floating superatoms, the mass spectrometrist has embedded into the superatom structure some of the mass spectral fragmentation rules believed to be necessary for interpreting the mass spectrum. The most efficient algorithm for performing these correlation studies was referred to as the “bond removal method” [14]. In this method, bonds between superatoms are systematically removed, thereby fragmenting the molecule into connected superatom fragments. The masses of these molecular fragments were added to the masses of each of the combinations of the floating superatoms and each sum was then correlated with the masses observed in the mass spectrum. This correlation algorithm has now been revised and optimized for high speed and it has been incorporated into a graphics-based user-friendly data system called MASSPEC. The MASSPEC data system was designed to be used as a graphics-based tool for analyzing and interpreting mass spectra from proposed chemical structures. The proposed parent structure can now be easily drawn and the computed substructure correlations easily viewed on the screen of a terminal. The ultimate aim in developing MASSPEC is to automate mass spectral correlation and interpretation for routine use by non-specialists. The purpose of this paper is to describe in detail the design and operation of MASSPEC and the improved algorithm for the “bond removal method” with illustrations selected from its use during the past 2 years.
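The "bond removal method" outlined above can be illustrated with a minimal Python sketch. It is not the published Fortran code: the function names, the brute-force enumeration with `itertools.combinations` and the four-superatom example chain are assumptions made here purely for illustration, and integer bond orders of at least 1 are assumed.

```python
from itertools import combinations


def connected_fragments(nodes, bonds):
    """Return the connected groups of superatoms as frozensets of labels."""
    neighbours = {node: set() for node in nodes}
    for a, b, _order in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    unvisited, fragments = set(nodes), []
    while unvisited:
        stack, fragment = [unvisited.pop()], set()
        while stack:
            node = stack.pop()
            if node not in fragment:
                fragment.add(node)
                stack.extend(neighbours[node] - fragment)
        unvisited -= fragment
        fragments.append(frozenset(fragment))
    return fragments


def bond_removal(masses, bonds, k_max):
    """Map each connected fragment to (mass, lowest bond order sum producing it)."""
    found = {}
    # Bond orders are assumed to be integers >= 1, so at most k_max bonds can go.
    for n_removed in range(min(len(bonds), k_max) + 1):
        for removed in combinations(range(len(bonds)), n_removed):
            order_sum = sum(bonds[i][2] for i in removed)
            if order_sum > k_max:
                continue
            kept = [bond for i, bond in enumerate(bonds) if i not in removed]
            for fragment in connected_fragments(masses, kept):
                if fragment not in found or order_sum < found[fragment][1]:
                    mass = sum(masses[label] for label in fragment)
                    found[fragment] = (mass, order_sum)
    return found


# Hypothetical chain CH3-CO-NH-C2H5 described with nominal superatom masses.
masses = {"CH3": 15, "CO": 28, "NH": 15, "C2H5": 29}
bonds = [("CH3", "CO", 1), ("CO", "NH", 1), ("NH", "C2H5", 1)]

for fragment, (mass, order_sum) in bond_removal(masses, bonds, 2).items():
    print(sorted(fragment), mass, order_sum)
```

Each fragment mass produced this way would then be combined with the floating superatom masses before comparison with the observed ions.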
EXPERIMENTAL
The mass spectra recorded in the thermospray (TSP) and fast atom bombardment (FAB) tandem mass spectrometry (MS-MS) modes were acquired with a Finnigan Model TSQ 46 triple quadrupole mass spectrometer. The FAB mass spectra recorded in the mass-analyzed ion kinetic energy scan (MIKES) mode were acquired with a VG ZAB-SE high-performance mass spectrometer. The collision-induced decomposition experiments, in the MS-MS and MIKES modes, utilized argon and helium, respectively, as the collision gases at pressures necessary to reduce the initial parent ion intensities by 50-75%.

Computations
The MASSPEC routines were written in VAX Fortran and ReGIS graphics. A Digital Equipment Corporation (DEC) VAX computer, Model 8650, and a color graphics terminal, either a DEC Model 340, equipped with a mouse, or a DEC Model 240, were used. Any personal computer or workstation equipped with an emulator capable of handling ReGIS color graphics will satisfactorily draw the MASSPEC structures.
INPUT COMMAND STRUCTURE FOR MASSPEC

The three modes for operating MASSPEC are the structure generation, the program execution and the output analysis modes. For each mode of operation, a series of MASSPEC commands is available for executing the program. The commands are listed in Table 1. The screen layouts for each of the modes of operation are similar. The screen is divided into four sections. The largest section is for drawing the proposed molecular structure and displaying the computed molecular substructures. The section to the left of this space is used for file, structure and floating superatom descriptors. The bottom line lists the available commands in the respective operating mode. The six-line region above the bottom line is reserved for executing the commands and for scrolling the output files.

Structure generation mode
The best way to enter the chemical structure into MASSPEC for mass spectral analysis is to draw the structure on the terminal screen as a set of bonded superatoms. The locations of the superatoms on the screen are determined by the position of the crosshairs activated by use of a mouse or the cursor keys. Bonded superatoms are entered by use of the ADD command, which also requests a label and mass for the superatom. The BOND command draws bonds of any desired bond order between the bonded superatoms. To account for possible adducts, neutral/radical losses and rearrangements, floating superatoms are created. Floating superatoms are entered by use of the FREE command, which also requests a label, mass and minimum and maximum numbers allowed for
the floating superatoms. Additional commands, LABEL, REDRAW, ERASE, HIGHLIGHT, MOVE and CONNECT, manipulate the drawn superatoms. The HIGHLIGHT command activates a selected superatom as the reference point for ADDing new superatoms and BONDs and for MOVEing the location of a superatom. The CONNECT command automatically bonds superatoms to the currently HIGHLIGHTed superatom as they are ADDed. During the drawing of the chemical structure, a connectivity table for that structure is automatically produced. The drawn superatom structures can be SAVEd and retrieved (GET) from the computer memory. The Molecular Access System (MACCS) file structure format [15] is used for representing the connectivity table ("molfile", XXX.MOL) for the chemical structures which were generated from bonded and floating superatoms. In general, molfiles generated by MACCS, where every individual atom and its respective mass are represented, can be analyzed by MASSPEC. However, since the structure is then represented by atoms and not superatoms, large numbers of duplicate solutions are obtained. MACCS does not have the capability of easily representing bonded and floating superatoms.

Program execution mode
When MASSPEC is EXECUTEd, the following data or information must be supplied: the bonded superatom parent structure and floating superatoms; the masses of the ions observed in the mass spectrum to be analyzed (NOLMAS); an instruction to perform the analysis either in the nominal or exact mass mode (if exact mass mode is chosen, a mass error window for the analysis is selected); and the maximum bond order sum (K_max) for the analysis. The bond order sum is the sum of the bond orders for all the bonds broken in the proposed parent structure in order to generate a desired substructure. The K_max value is generally less than 5 because most fragment ions observed in mass spectra are formed by fewer than five bond cleavages.
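As a rough illustration of how the ADD, BOND, FREE, HIGHLIGHT and CONNECT commands could populate a connectivity table, the following Python sketch keeps the bonded superatoms, bonds and floating superatoms in plain containers. The class and method names are hypothetical and do not reproduce the MASSPEC data structures or the MACCS molfile layout; whether ADD itself moves the highlight is not stated in the text, so the sketch leaves the highlight unchanged.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FloatingSuperatom:
    label: str
    mass: float
    min_count: int  # negative counts stand for losses
    max_count: int


@dataclass
class Structure:
    masses: dict = field(default_factory=dict)   # bonded superatom label -> mass
    bonds: list = field(default_factory=list)    # connectivity rows (a, b, order)
    floating: list = field(default_factory=list)
    highlighted: Optional[str] = None
    auto_connect: bool = False                   # mimics the CONNECT setting

    def add(self, label, mass):
        """ADD: enter a bonded superatom; bond it to the highlighted one if CONNECT is on."""
        if label in self.masses:
            raise ValueError(f"duplicate superatom label: {label}")
        self.masses[label] = mass
        if self.auto_connect and self.highlighted is not None:
            self.bond(self.highlighted, label, 1)

    def bond(self, a, b, order=1):
        """BOND: record a bond of the given order between two existing superatoms."""
        if a not in self.masses or b not in self.masses:
            raise KeyError("both superatoms must be added before bonding")
        self.bonds.append((a, b, order))

    def free(self, label, mass, min_count, max_count):
        """FREE: enter a floating superatom with its allowed count range."""
        self.floating.append(FloatingSuperatom(label, mass, min_count, max_count))

    def highlight(self, label):
        """HIGHLIGHT: choose the reference superatom for later ADD operations."""
        if label not in self.masses:
            raise KeyError(label)
        self.highlighted = label

    def parent_mass(self):
        """Mass of the intact bonded superatom structure."""
        return sum(self.masses.values())

    def bond_order_sum(self, broken):
        """Sum of the bond orders of the bonds (given by index) that are cleaved."""
        return sum(self.bonds[i][2] for i in broken)


# Hypothetical example: acetamide drawn as CH3-CO-NH2 with one floating H.
structure = Structure()
structure.add("CH3", 15.0)
structure.highlight("CH3")
structure.auto_connect = True
structure.add("CO", 28.0)
structure.highlight("CO")
structure.add("NH2", 16.0)
structure.free("H", 1.0, -1, 1)
print(structure.bonds, structure.parent_mass())
```

Checking `parent_mass()` against the expected molecular weight is a quick way to catch an input error before an analysis is run.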
MASSPEC algorithm
The flow diagram describing the MASSPEC algorithm for correlating the observed and predicted ions is illustrated in Fig. 1. The MASSPEC algorithm computes the masses of all possible substructures of bonded superatoms (MM1), subject to the maximum bond order sum constraint (K_max), by using the combinatorial algorithm NEXKSP [16] and the graph theory algorithm SPAN [16]. The method used to produce all the substructures of bonded superatoms was referred to as the "bond removal method" [14]. All possible combinations of floating superatoms (MM2) are also calculated and tabulated. The mass differences between the observed ion masses (NOLMAS) and masses MM1 are searched in the table of masses MM2 by using the binary search subroutine RANGE10 (this use of subroutine RANGE10 for floating superatoms considerably enhances the speed of this version of the MASSPEC algorithm and is one of the major revisions in this release of MASSPEC). All successful searches are possible solutions because they have bond order sums for the cleaved bonds (NBROKE) equal to or less than the desired upper limit (K_max). When fragmentation and/or selection rules are incorporated into the program and are satisfied, the final substructure solutions are obtained. Post-analysis of these final substructure solutions can also be performed to reduce further the number of possible solutions by applying additional fragmentation and/or selection rules.

Fig. 1. Flow diagram for MASSPEC combinatorial algorithm for correlating bonded and floating superatom masses with the masses of ions observed in a mass spectrum.
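The table look-up step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard-library `bisect` module stands in for the RANGE10 subroutine, whose actual interface is not given in the text, and the function names and example masses are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import product


def floating_mass_table(floating):
    """Tabulate every allowed combination of floating superatoms (the MM2 table).

    floating: list of (label, mass, min_count, max_count) tuples.
    Returns a list of (total_mass, counts) sorted by total_mass.
    """
    ranges = [range(lo, hi + 1) for _label, _mass, lo, hi in floating]
    table = []
    for counts in product(*ranges):
        total = sum(n * item[1] for n, item in zip(counts, floating))
        table.append((total, counts))
    table.sort()
    return table


def correlate(observed, fragments, table, window=0):
    """Match observed ion masses against fragment mass + floating combination.

    fragments: list of (mass, label) pairs for the bonded substructures (MM1).
    window: allowed mass error; 0 reproduces a nominal mass comparison.
    """
    keys = [mass for mass, _counts in table]
    solutions = []
    for ion in observed:
        for frag_mass, label in fragments:
            difference = ion - frag_mass
            # Binary search for the slice of MM2 entries inside the error window.
            lo = bisect_left(keys, difference - window)
            hi = bisect_right(keys, difference + window)
            for float_mass, counts in table[lo:hi]:
                solutions.append((ion, label, counts, frag_mass + float_mass))
    return solutions


# Hypothetical data: one floating H allowed from -1 to +1, two bonded fragments.
table = floating_mass_table([("H", 1, -1, 1)])
fragments = [(43, "CH3-CO"), (44, "NH-C2H5")]
print(correlate([44], fragments, table))
```

Because the floating combinations are sorted once, each look-up costs a logarithmic search rather than a scan of the whole table, which is the kind of saving the text attributes to the binary search.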
Output analysis mode
The final substructure solutions are accessed using the OUTPUT command. Figure 2 illustrates the typical graphics of the terminal screen for the final substructure solutions. The bottom part of Fig. 2 illustrates a scrolled portion of the complete table of solutions for data processed in the nominal mass mode. The table includes the computed masses of the proposed fragment ion structures, the corresponding bond order sum of the cleaved bonds and the list of bonded and floating superatoms making up the proposed substructure. If the exact mass mode of analysis was chosen, the differences between the masses of the proposed substructures and the measured masses would also be tabulated. This table can be scrolled UP, DOWN and by MASS. A desired entry in the table, indicated with a star, can be displayed in the upper right part of the screen. Superatoms present in the proposed fragment ion structure are enclosed in rectangles and those absent are not enclosed. The upper left part of the screen contains the fragment ion mass and the sum of the bond orders of the cleaved bonds (#) and information about the floating superatoms for the proposed structure. Also included are options for CREATEing graphics files (XXX.SDF) for laser printout of the illustrated output structures using the SAVE command.

Fig. 2. Terminal screen graphics for the MASSPEC analysis of the m/z 211 fragment ion of cefixime. Top right: bonded and floating superatoms. Top left: floating superatom details for the analyzed structure. Lowest line: analysis mode commands. Scrolled region (six lines above lowest line): MASSPEC substructure analysis.
APPLICATIONS OF MASSPEC

Strategies for using MASSPEC
As MASSPEC works by combinatorial algorithms to generate substructures, the minimum number of bonded and floating superatoms should be created which fully describe the parent structure and all the expected adduct, neutral/radical loss and rearrangement ions. The lower the number of superatoms used to describe the chemical structure, the higher is the speed of the algorithm. However, when in doubt as to whether the number of superatoms chosen is sufficient to describe fully the structure, extra superatoms should be created because the speed of the algorithm is very high. The CPU time needed to correlate 15 fragment ions with a chemical structure described by 30 superatoms, a relatively large molecule (molecular weight ca. 700 daltons), is 0.5-2.0 min. For experimental data acquired under very low resolution conditions with uncertainties in the nominal masses, the data are analyzed in the exact mass mode with a large nominal mass error window. The criteria generally used for predicting the best substructure among a number of possibilities are the substructure with lowest bond order sum and the substructure most consistent with mass spectral fragmentation rules. From a practical point of view, after entering a parent structure into MASSPEC, one should check that the parent structure has the predicted molecular weight by executing the program in either the nominal or exact mass mode.

MASSPEC analysis of MS-MS daughter ion spectra
The structure illustrated in Fig. 2 is the proposed fragment ion structure for the ion of m/z 211 appearing in the TSP mass spectrum of cefixime, a third-generation cephalosporin antibiotic [17]. Each node in the structure corresponds to a bonded superatom. For convenience, the labels for these bonded superatoms correspond to the elemental compositions of the nodes, e.g., CO, CH, NH. A floating superatom was also included in the calculation and it was labeled H (hydrogen) with a nominal mass of 1 dalton. The number of H floating superatoms used in the calculation ranged from a minimum of -1 to a maximum of +1. Hence, there were three possibilities for H floating superatoms in the calculation, namely, a neutral H loss (-1), no H present (0) or an H (proton) adduct (+1). The spectrum illustrated in Fig. 3 is the TSP tandem mass spectrum of the m/z 211 fragment ion of cefixime acquired on a triple quadrupole mass spectrometer.

Fig. 3. TSP tandem mass spectrum of the m/z 211 fragment ion of cefixime and the proposed fragment ion substructures.

After entering the above structural and mass spectral data into MASSPEC and executing the program in the nominal mass mode, the output graphics, illustrated in Fig. 2, were generated on the screen. The user scrolls the output data and correlates the observed fragment ions with the predicted substructures and selects the most likely candidate based on the known fragmentation rules. In this instance, all the predicted substructures computed by MASSPEC were subjected to a post-analysis selection scheme to check whether the substructures were even- or odd-electron. Even or odd numbers were simply computed from the following equation
B + M + 1 = even or odd number

where B = number of bonds broken in the neutral superatom molecule to create a fragment ion and M = number of monovalent floating superatoms added to (lost by) the superatom fragment ion. Even numbers correspond to even-electron fragment ions and odd numbers correspond to odd-electron fragment ions. In TSP, CI and FAB MS and MS-MS experiments, even-electron parent ions generally tend to produce even-electron fragment ions. By applying this post-analysis scheme to the cefixime m/z 211 even-electron parent ion, 50% of the computed fragment ions are the more likely even-electron fragment ions. Most of the proposed fragment ion structures, illustrated with the bonded superatom structure in Fig. 3, are even-electron ions. These results confirm the proposed ketene structure for the m/z 211 fragment ion.

Figure 4 illustrates the proposed structure for minocycline - NH3 and the FAB tandem mass spectrum of the [M - NH3 + H]+ ion of the minocycline antibiotic acquired on a triple quadrupole mass spectrometer. The spectrum was analyzed by MASSPEC in the nominal mass mode by using the bonded and floating superatoms illustrated in Fig. 5, copied from the structure input screen. Again, note that many of the superatom names are their elemental compositions and that the aromatic ring D is expressed as a superatom with the same label. The proposed fragment ion structures were analyzed using the methods described above for cefixime. The most consistent fragment ion structures are illustrated in Fig. 6. Most of the predicted fragment ions are formed by losses of the neutral molecules H2O, CO and HN(CH3)2 and the CH3 radical. Most of the ring cleavages occur at ring A.

Fig. 4. Proposed structure for minocycline - NH3 and the FAB tandem mass spectrum of the [M - NH3 + H]+ ion of minocycline.

Fig. 5. Illustration of the MASSPEC screen graphics for the minocycline - NH3 superatom structure.

Fig. 6. Correlation of the minocycline - NH3 structure with the fragment ion masses observed in the FAB tandem mass spectrum.
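The even-/odd-electron post-analysis used above for cefixime, based on the parity of B + M + 1, is simple enough to sketch in a few lines of Python. The tuple layout of a "solution" and the function names are assumptions for illustration only. A lost floating superatom is entered with a negative M, which does not change the parity.

```python
def electron_parity(bonds_broken, monovalent_floating):
    """Classify a fragment ion as 'even' or 'odd' from the parity of B + M + 1."""
    total = bonds_broken + monovalent_floating + 1
    return "even" if total % 2 == 0 else "odd"


def keep_matching_parity(solutions, parent_parity="even"):
    """Keep the (label, B, M) solutions whose parity matches the parent ion."""
    return [s for s in solutions if electron_parity(s[1], s[2]) == parent_parity]


# Hypothetical candidate substructures: (label, bonds broken B, floating H count M).
candidates = [("a", 1, 0), ("b", 1, 1), ("c", 2, -1), ("d", 2, 0)]
print(keep_matching_parity(candidates))
```

For an even-electron parent ion, only the candidates with an even value of B + M + 1 would be retained as the more likely fragment ions.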
MASSPEC analysis of post-translational modifications to known proteins
The MASSPEC program is ideally suited for correlating the structures of peptide ions appearing in FAB mass spectra which originate from a digested polypeptide believed to have undergone post-translational modifications. The procedure for solving this problem by using MASSPEC is as follows. The known primary polypeptide structure is entered into MASSPEC as a set of bonded superatoms and the chemical units suspected of generating the post-translational modifications are entered as floating superatoms. An extensive list of post-translational modifications from neuropeptides has been tabulated by Andrews and Dixon [18] and each could serve as a floating superatom in the MASSPEC analysis. The MASSPEC algorithm computes all combinations of bonded and floating superatoms with masses equal to the masses of the ions believed to originate from post-translationally modified peptides. The resulting answers can then be analyzed manually or by a post-analysis program to select the ion structures most consistent with the peptide chemistry.

Pucci and Sepe [19] have written a dedicated program to analyze post-translational modifications of known polypeptides which were studied by FAB MS. One example from their data will now be analyzed by MASSPEC. An ion at m/z 2095 was believed to be a post-translationally modified phosphorylated proton-adduct molecular ion from the polypeptide TP1, which was obtained from a tryptic digest of buffalo β-casein. The TP1 polypeptide structure consisted of 27 amino acid residues and was entered into MASSPEC as illustrated in the top right part of Fig. 7. The bonded superatoms, each representing an amino acid residue, have three-letter labels; the first two characters represent the position number of the amino acid residue in the peptide chain and the third is the single letter code for the amino acid residue. The floating superatoms are one H2O molecule, which is equivalent to H and OH groups for terminating the amino acid sequence at the N- and C-termini, respectively; up to five PO3 groups for generating the possible phosphate groups with amino acid residues containing hydroxyl groups; and one H ion for producing the proton adduct molecular ion.

Figure 7 illustrates the output screen of MASSPEC for this problem. The bottom part is a scrolled region of the MASSPEC solutions. Two isobaric solutions for the m/z 2095 proton adduct molecular ion are indicated. The structure for the starred solution is indicated in the top right part of Fig. 7 by the bonded and floating superatoms enclosed in rectangles. The top left part lists the numbers of floating superatoms present in the illustrated structure. Note that all the bonded superatoms are interconnected and that there are one H2O, one H and four PO3 floating superatoms. Therefore, the [M + H]+ ion at m/z 2095 is consistent with being phosphorylated four times, one phosphate group for each of the four serine residues (S) in the peptide substructure. Two isobaric substructures were computed because the only difference between the two substructures is that one has leucine (L) on the N-terminus and the other has isoleucine (I) on the C-terminus, and L and I are isobaric amino acid residues.

Fig. 7. MASSPEC output screen for the analysis of post-translational modifications of the peptide TP1.
MASSPEC analysis of MIKES, linked scan and plasma desorption spectra
All mass spectral data acquired in the MIKES and linked scan modes with double focusing magnetic sector instruments and plasma desorption data acquired with time-of-flight instruments have relatively large uncertainties in the measured mass values owing to the low resolution of the spectrometers in these modes. In many instances, the experimental mass uncertainty for ions of up to 3000 daltons is about ± 1 dalton. A convenient way to use MASSPEC for studying these data is to analyze the nominal mass data in the exact mass mode with a mass error window of ± 1000 millidaltons. In this way, for every measured ion mass, three masses would be correlated with a proposed structure, namely the measured nominal mass and masses 1 dalton above and below the measured mass. By applying the appropriate fragmentation rules, the most likely fragment ion structures could be determined. This method of analysis will now be illustrated for the interpretation of the FAB MIKES spectrum generated from the [M + Na]+ parent ion of the novel antibiotic growth promoting agent LL-E19020α [20]. This relatively large molecule is illustrated in Fig. 8A. It has a molecular weight of 1225 daltons and the elemental composition C65H95NO21. To correlate the fragment ion masses with all possible substructures for such a large molecule by manual calculations would be nearly impossible because one would not be certain if all possible substructural combinations were calculated. This is achieved with MASSPEC, which calculates all the possible combinations of the bonded and floating superatoms.

Fig. 8. (A) Structure of antibiotic LL-E19020α and its reduction into a set of bonded superatoms. (B) Bonded superatom structure for LL-E19020α. Floating superatoms (MIN, MAX): Na (1, 1); H2O (-1, 0); CH3OH (-1, 0); H (-2, +2).
Figure 8A illustrates the structural units of LL-E19020α that were reduced into bonded superatoms. The bonded superatoms were created by keeping the sugar ring structures intact (Su) and cleaving nearly all the other single bonds in the structure excluding the single bonds between conjugated double bond moieties. The complete bonded superatom structure is illustrated in Fig. 8B where each bonded superatom is numerically labeled. A total of 28 bonded superatoms were created. The floating superatoms are indicated in the top right corner, viz., one Na atom, the loss of up to one H2O and/or CH3OH molecule and a range from two H losses to two H gains. Figure 9 illustrates the FAB MIKES spectrum of the [M + Na]+ parent ion of LL-E19020α and the proposed interpretation of the spectrum based on the proposed structures generated by MASSPEC. The most likely structures were selected from the generated structures by applying the following fragmentation rules, culled from FAB MS-MS studies of ionophores [21], a similar class of antibiotics, and the known chemistry of the molecules. The charge for each fragment ion is carried by a metal ion. This was based on Na+ and K+ adulteration studies in which all the observed fragment ions shifted by the mass difference between the two metal ions. All fragment ions are even-electron ions and no radical ions are formed. Ions containing the elements C, H and/or O and Na or K without (with) N are of odd (even) mass. As there is only one N atom in LL-E19020α, the ion mass indicates the presence or absence of nitrogen in the fragment ion. The masses of the predicted ion structures for the FAB MIKES fragment ions were correlated and checked for consistency with the masses of similar ions observed in the FAB mass spectra of known nominal mass. Most of the fragment ions identified (see Fig. 9) appear as losses of the simple molecules H2O, CH3OH, phenyl-CH=C=O or SuO, and their combinations from the parent ion [M + Na]+. Some fragment ions also originate from cleavages across the A ring, B ring and the amide bond. These results were consistent with fragment ions observed in a series of derivatives and analogues of LL-E19020α studied by FAB MIKES.

Fig. 9. FAB MIKES spectrum for [M + Na]+ parent ion of antibiotic LL-E19020α and the MASSPEC correlation of the structure with the fragment ions.

Combinatorial calculations using MASSPEC
The MASSPEC data system can be used for a variety of combinatorial calculations often needed for the analysis of mass spectral data. The most popular combinatorial calculation in mass spectrometry is the calculation of the elemental composition from exact mass measurements.
This is accomplished by running MASSPEC in the exact mass mode with a given small error window where all the possible elements (or groups of elements) and their minimum and maximum numbers are entered into the data system as floating superatoms. The observed exact masses are entered into
MASSPEC and the elemental compositions computed. In a similar fashion, exact mass calculations can be performed when a chemical structure is proposed using bonded superatoms as illustrated previously [13] for avermectin A,,. An additional combinatorial application is when a partial structure for a compound is available from an ancillary spectroscopic technique, e.g., NMR, IR, x-ray, and exact mass measurements are available for a number of fragment ions. In this instance, the partial structures can be described as bonded superatoms and the possible elements described as floating superatoms. MASSPEC can then be used to compute possible combinations of the elements together with the partial structures to aid in predicting the chemical structure. Similar calculations can be performed when all chemical structures over a large mass range are desired. In the peptide sequencing algorithm for FAB MS and FAB MS-MS data [22], look-up tables are needed which contain all combinations of amino acid residues up to a desired upper mass. This is achieved by entering into MASSPEC all the desired amino acid residues either as nominal or exact mass values. The MASSPEC data system is then run in the exact mass mode with a mass error window equal to the desired upper mass value for the table. Also, only one observed ion mass is entered into the data system and is set equal to the lowest desired mass in the table, assumed here to be zero daltons. (In general, though, to create tables displaced from zero daltons, the observed experimental ion mass entered into the data system should be the central mass of the table and the mass error window should be half the mass range of the table.) In a similar fashion a table of all the possible fragment ions expected to be observed in the FAB mass spectrum of a cyclic peptide can be computed. The structure is drawn with bonded superatoms representing the amino acid residues and one floating H superatom needed for generating acylium ions. 
The MASSPEC data system is operated in the exact mass mode as described above for generating the amino acid residue table with the additional entry to the query for the maximum number of bond cleavages (K_max) set to 2.
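The elemental composition calculation mentioned at the start of this section, in which every element is a floating superatom with minimum and maximum counts, can be sketched in Python as follows. The function name, the element limits and the example mass are illustrative assumptions. The monoisotopic masses are standard values, and the electron mass is ignored in this sketch.

```python
from itertools import product

# Standard monoisotopic masses (daltons) for a few common elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def elemental_compositions(observed_mass, limits, window):
    """List compositions whose mass lies within +/- window of observed_mass.

    limits: dict mapping element symbol -> (min_count, max_count), i.e. each
    element is treated like a floating superatom with an allowed count range.
    Returns a list of (composition_dict, computed_mass) pairs.
    """
    elements = list(limits)
    ranges = [range(limits[e][0], limits[e][1] + 1) for e in elements]
    hits = []
    for counts in product(*ranges):
        mass = sum(n * MONOISOTOPIC[e] for n, e in zip(counts, elements))
        if abs(mass - observed_mass) <= window:
            hits.append((dict(zip(elements, counts)), mass))
    return hits


# Hypothetical measurement: 180.0634 Da with a 1 mDa error window.
limits = {"C": (0, 10), "H": (0, 20), "N": (0, 4), "O": (0, 8)}
for composition, mass in elemental_compositions(180.0634, limits, 0.001):
    print(composition, round(mass, 4))
```

Narrowing the count ranges or the error window reduces the number of candidate compositions, in the same way that fewer floating superatoms reduce the number of MASSPEC solutions.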
Conclusions
The MASSPEC data system has been found to be a powerful tool for correlating known or proposed structures of moderately large size with a mass spectrum or tandem mass spectrum. The data system should also be useful when only partial structural information is available from mass spectral data or any other spectroscopic data, e.g., NMR, IR, x-ray. The MASSPEC data system can be used with either nominal or exact mass data obtained in any ionization mode. The MASSPEC data system is user-friendly because it operates in the graphics mode in which proposed structures are entered in standard chemical notation and structural correlations are rapidly computed and illustrated.

For a known or proposed structure, MASSPEC can compute all the possible substructures. A variety of possible substructures for a given ion mass are generally found. If selection and/or fragmentation rules can be compiled, the program could then select the most likely structure candidate. This was demonstrated above with cefixime for even- and odd-electron fragment ion selection rules and was recently demonstrated for simple fragmentation rules (retro-aldol and lactone ester reactions) [14]. Generalization of rule-based analysis of MASSPEC substructures awaits further development.

Another possible way to aid in selecting the most likely substructure candidate is to use semi-empirical molecular orbital (MO) calculations as an ancillary tool. Molecular orbital calculations can more precisely define the bond orders between adjacent superatoms and thereby aid in selecting the more likely substructure since, in many instances, the most likely substructure has the lowest bond order sum for bonds cleaved from the parent structure (NBROKE). Likewise, when trying to select the best parent structure from a number of possible candidates by interpreting the mass spectral fragment ions, a statistical criterion could be developed for the best fit. If the bond orders for the possible structures were computed by molecular orbital calculations, the structure having the minimum sum for the products of ion intensity and the MO-derived bond order sum NBROKE, for all the fragment ion substructures, should be consistent with the most likely parent structure. In a similar fashion, if the electric charge is calculated by semi-empirical molecular orbital methods for each side of the bond cleavages for all possible substructures of a proposed molecule, a mass spectrum may be predicted.

The MASSPEC manual and listings are available from the corresponding author to interested readers who send a formatted 3.5-inch double-sided disc (Macintosh compatible) and a written statement that the software is for their personal use and will not be sold commercially.
REFERENCES
1 D.P. Martinsen and B.-H. Song, Mass Spectrom. Rev., 1 (1985) 461.
2 E. Sorkau, B. Adler, G. Fit and Z. Hippe, Chem. Anal. (Warsaw), 31 (1986) 377.
3 C.G. Enke, A.P. Wade, P.T. Palmer and K.J. Hart, Anal. Chem., 59 (1987) 1363A.
4 A.P. Wade, P.T. Palmer, K.J. Hart and C.G. Enke, Anal. Chim. Acta, 215 (1988) 169.
5 H.L. Lohninger and K. Varmuza, Anal. Chem., 59 (1987) 236.
6 K. Varmuza and W. Werther, paper presented at the 9th International Conference on Computers in Chemical Research and Education (ICCCRE), Riva del Garda, Italy, May 28-June 2, 1989.
7 W. Werther and K. Varmuza, in J. Gasteiger (Ed.), Software-Development in Chemistry 4, Springer, Berlin, 1990, in press.
8 D. Zhu, J. She, Q. Hong, R. Liu, P. Lu and L. Wang, Analyst, 113 (1988) 1261.
9 W. Hanebeck, H. Saller, J. Gasteiger, in J. Gasteiger (Ed.), Software-Entwicklung in der Chemie 2, Springer, Berlin, 1988, pp. 197-209.
10 H.J. Luinge, Trends Anal. Chem., 9 (1990) 66.
11 D. Weininger, J. Chem. Inf. Comput. Sci., 28 (1988) 31.
12 M.E. Munk and B.D. Christie, Anal. Chim. Acta, 216 (1989) 57.
13 M.M. Siegel, Anal. Chim. Acta, 174 (1985) 61.
14 M.M. Siegel, N. Bauman and G.T. Carter, Anal. Chim. Acta, 186 (1986) 163.
15 S. Anderson, N. Zimmerman and G. Busell, MACCS (Molecular Access System) Reference Manual, Molecular Design, Hayward, CA, 1984, Appendix B.
16 A. Nijenhuis and H.S. Wilf, Combinatorial Algorithms, Academic, New York, 1975, Chap. 3 and 14.
17 M.M. Siegel, R.D. Isensee and D.J. Beck, Anal. Chem., 59 (1987) 989.
18 P.C. Andrews and J.E. Dixon, Methods Enzymol., 168 (1989) 72.
19 P. Pucci and C. Sepe, Biomed. Environ. Mass Spectrom., 17 (1988) 287.
20 G.T. Carter, D.W. Phillipson, J.J. Goodman, T.S. Dunne and D.B. Borders, J. Antibiot., 41 (1988) 1511.
21 M.M. Siegel, W.J. McGahren, K.B. Tomer and T.T. Chang, Biomed. Environ. Mass Spectrom., 14 (1987) 29.
22 M.M. Siegel and N. Bauman, Biomed. Environ. Mass Spectrom., 15 (1988) 333.