Computer-assisted interpretation of mass spectrometry—mass spectrometry data of potentially hazardous environmental compounds

Original Research Paper

Chemometrics and Intelligent Laboratory Systems, 19 (1993) 175-179
Elsevier Science Publishers B.V., Amsterdam

W.J. Dunn III and Dorothy Swain
Department of Medicinal Chemistry & Pharmacognosy, University of Illinois at Chicago, 833 S. Wood, Chicago, IL 60680 (USA)

(Received 3 September 1992; accepted 4 January 1993)

Abstract

Dunn III, W.J. and Swain, D., 1993. Computer-assisted interpretation of mass spectrometry-mass spectrometry data of potentially hazardous environmental compounds. Chemometrics and Intelligent Laboratory Systems, 19: 175-179.

The recent development of tandem mass spectrometry (MS), with much higher information content in the mass spectrum, makes it possible to identify compounds more accurately than with one-dimensional mass spectra. We have begun a project to explore the use of computational pattern recognition methods to identify compounds from their tandem mass, or MS-MS, spectra. We have obtained MS-MS spectra of several potentially hazardous polycyclic aromatic hydrocarbons and other compounds of environmental interest and applied pattern recognition to these spectra. Some of our recent findings are discussed below.

INTRODUCTION

The monitoring of ambient air quality is a complex process. Samples can contain as many as 200-300 compounds appearing in very low concentrations. The analytical method of choice is gas chromatography-mass spectrometry (GCMS), and the US Environmental Protection Agency has adopted gas chromatography-quadrupole mass spectrometry as its primary monitoring tool. Data interpretation is done by a version of the software available on the instrument, which is a library searching method. An Environmental Protection Agency study has shown that the software is only 70% accurate when used to interpret mass spectral data similar to that from the monitored target compounds.

Tandem mass spectrometry is one of several hyphenated analytical methods currently being explored which hold promise for improved accuracy in environmental monitoring. Schwartz et al. [1] have delineated the various scan modes which can result from multidimensional mass spectrometry; we limit our discussion to the treatment of what they term MS-MS spectra. There are four possible MS-MS, or, using the terminology of Schwartz et al. [1], MS², experiments. If m1 and m2 represent parent and product ions, respectively, the MS-MS data we are interested in are those in which both parent and product ions are variable in the mass domain. Such spectra can have an order of magnitude more information than the traditional electron impact mass spectra. A typical spectrum is shown in Fig. 1 for 2,4-dichlorophenoxyacetic acid.

There have been few reports of methods for MS-MS interpretation, and these have been limited to library searching methods [2] in which grammatical rules are used. These methods are similar to those used to interpret traditional mass spectral data, which have been shown to be of limited accuracy when used in compound identification schemes [1,3].

Fig. 1 can be considered a matrix in which the first row is the traditional electron impact mass spectrum, with its elements being the intensities of the parent ions. The remaining rows represent the intensities of product ions from the column-specific parent ion. Here we apply computational pattern recognition to a set of MS-MS data of potentially hazardous compounds in an effort to explore the utility of the approach in compound identification and monitoring.

Correspondence to: W.J. Dunn III, Department of Medicinal Chemistry & Pharmacognosy, University of Illinois at Chicago, 833 S. Wood, Chicago, IL 60680 (USA).
0169-7439/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

Fig. 1. Tandem mass spectrum of 2,4-dichlorophenoxyacetic acid.
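The matrix view of an MS-MS spectrum described in the Introduction can be illustrated with a short Python sketch using NumPy. It is only a sketch under stated assumptions: the function name `msms_matrix`, the dictionary input format and the toy intensities are hypothetical, and indexing the remaining rows by product m/z is one possible reading of the layout rather than the authors' stated convention.

```python
import numpy as np

def msms_matrix(parent_intensity, product_spectra, mz_min, mz_max):
    """Return a matrix with one column per parent m/z.

    Row 0 holds the parent-ion intensities (the conventional mass spectrum).
    Row 1 + k holds the intensity of the product ion at m/z = mz_min + k
    formed from the parent ion of that column (assumed row indexing).
    """
    n = mz_max - mz_min + 1
    x = np.zeros((1 + n, n))
    for mz, inten in parent_intensity.items():
        x[0, mz - mz_min] = inten
    for parent_mz, products in product_spectra.items():
        for product_mz, inten in products.items():
            x[1 + product_mz - mz_min, parent_mz - mz_min] = inten
    return x

# Hypothetical toy data (not measured): two parent ions, one of which fragments.
parents = {50: 30.0, 60: 100.0}
products = {60: {45: 80.0, 50: 20.0}}
x = msms_matrix(parents, products, mz_min=40, mz_max=60)
print(x.shape)                   # one parent row plus 21 product rows, 21 columns
print(np.count_nonzero(x))       # the matrix is sparse
```

Even in this small case almost all elements are zero, which is the sparsity that motivates a more compact representation.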

DATA COLLECTION AND COMPUTATIONAL METHODS

The compounds whose spectra are used in this study are given in Fig. 2. All are potentially hazardous and are of concern as environmental contaminants. There are twelve pure compounds in the study. The details of the data collection have been reported [4]. In the case of the polycyclic aromatic hydrocarbons and xylenes, the spectrum of one parent ion was obtained, whereas in the case of the phenoxyacetic acid analogs and the benzidine, three parent spectra were obtained. The xylenes were separated by gas chromatography into two fractions, IV and V. One appeared to be a single component and the other appeared to be a mixture of the other two isomers. Since no standards were available for these compounds, their identities cannot be established.

Each spectrum was represented as a matrix. This treats each spectrum as a 'class' in the traditional pattern recognition sense. The first row of each matrix is the parent ion intensities, which for this study span the common m/z range of 40-384. Product ion spectra were scaled relative to that of the parent, which represents 100%. Because the matrices are sparse, the spectra were truncated. Row two was constructed by summing the intensities of the ions which represent loss of masses 1-10; row three is the sum of intensities of the ions which correspond to loss of 11-20, etc. The result is a final matrix of order 36 x 345. In this truncation process some information is lost.

All computations were carried out on an IBM or compatible desktop computer using the UNIPALS software containing principal components and partial least squares analysis methods as developed in this laboratory [5]. Each spectrum was treated as a class and a principal components model was derived for each spectrum based on Eqn. 1.

$$x_{ij} = \bar{x}_j + \sum_{a=1}^{A} t_{ia} b_{aj} + e_{ij} \tag{1}$$

The index $i$ is row-specific and the index $j$ is column-specific; $\bar{x}_j$ is the column mean and $e_{ij}$ is the residual. The principal component scores are $t_{ia}$ and the loadings are $b_{aj}$. $A$ is the number of components, or product terms, in the model.
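The truncation step and the model of Eqn. 1 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the UNIPALS implementation: the function names and the toy transitions are hypothetical, the 36 rows are read as one parent row plus 35 loss bins of width 10 (losses 1-350), the 345 columns as the parent m/z values 40-384, and a singular value decomposition is used in place of whatever algorithm UNIPALS applies.

```python
import numpy as np

MZ_MIN, MZ_MAX = 40, 384          # common parent m/z range given in the text
N_COLS = MZ_MAX - MZ_MIN + 1      # 345 columns
BIN_WIDTH = 10                    # losses 1-10, 11-20, ...
N_ROWS = 36                       # assumed: parent row + 35 neutral-loss rows

def binned_matrix(parent_intensity, product_spectra):
    """Build the truncated 36 x 345 matrix for one MS-MS spectrum.

    parent_intensity: {parent m/z: intensity}
    product_spectra:  {parent m/z: {product m/z: intensity relative to parent}}
    """
    x = np.zeros((N_ROWS, N_COLS))
    for mz, inten in parent_intensity.items():
        x[0, mz - MZ_MIN] = inten
    for parent_mz, products in product_spectra.items():
        for product_mz, inten in products.items():
            loss = parent_mz - product_mz
            if 1 <= loss <= BIN_WIDTH * (N_ROWS - 1):
                row = 1 + (loss - 1) // BIN_WIDTH   # losses 1-10 go to the second row
                x[row, parent_mz - MZ_MIN] += inten
    return x

def fit_pc_model(x, n_components=1):
    """Fit Eqn. 1: x_ij = xbar_j + sum_a t_ia * b_aj + e_ij (via SVD)."""
    col_mean = x.mean(axis=0)                          # xbar_j
    u, s, vt = np.linalg.svd(x - col_mean, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]    # t_ia
    loadings = vt[:n_components]                       # b_aj
    residual = x - col_mean - scores @ loadings        # e_ij
    return col_mean, scores, loadings, residual

# Hypothetical transitions, for illustration only (not measured data).
parents = {220: 100.0, 162: 40.0}
products = {220: {162: 100.0, 175: 35.0}, 162: {98: 60.0}}
x = binned_matrix(parents, products)
col_mean, t, b, e = fit_pc_model(x)
explained = 1.0 - np.sum(e ** 2) / np.sum((x - col_mean) ** 2)
print(x.shape, explained)
```

The quantity `explained` is the fraction of the centered sum of squares captured by the model, which is the sense in which a figure such as "variance explained per spectrum" can be computed for a one-component fit.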

RESULTS AND DISCUSSION

In all cases, one-component models, $A = 1$, were computed. The average variance explained per spectrum was 94%, which represents a significant reduction in variance and has some implications regarding the use of principal component models to store MS-MS spectra. Each spectrum was then fitted to the model for each spectrum and the residual standard deviation of fit (RSD) was computed by

$$\mathrm{RSD} = \sum_{j=1}^{345} \sum_{i=1}^{36} e_{ij}^{2} / (36 \times 345 - 2) \tag{2}$$
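The fitting step can be sketched in the same way. The following Python fragment is a hedged illustration, not the authors' software: the function names are hypothetical, the spectra are random sparse matrices rather than measured data, the models are obtained by singular value decomposition, and Eqn. 2 is evaluated as printed (a square root could be taken afterwards if a quantity in intensity units is preferred).

```python
import numpy as np

def fit_model(x):
    """One-component principal components model of a single spectrum matrix."""
    col_mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - col_mean, full_matrices=False)
    return col_mean, vt[:1]                  # column means and loadings b_1j

def rsd(x, model):
    """Fit spectrum x to a stored model and evaluate Eqn. 2 as printed."""
    col_mean, loadings = model
    centered = x - col_mean
    scores = centered @ loadings.T           # least-squares scores (unit-norm loading)
    residual = centered - scores @ loadings  # e_ij
    n_rows, n_cols = x.shape
    return np.sum(residual ** 2) / (n_rows * n_cols - 2)

def prediction_matrix(spectra):
    """Rows are models, columns are the spectra fitted to them."""
    models = [fit_model(x) for x in spectra]
    return np.array([[rsd(x, m) for x in spectra] for m in models])

# Hypothetical sparse 36 x 345 matrices standing in for three spectra.
rng = np.random.default_rng(0)
spectra = [rng.random((36, 345)) * (rng.random((36, 345)) < 0.02)
           for _ in range(3)]
p = prediction_matrix(spectra)
best = p.argmin(axis=0)                      # best-fitting model for each spectrum
print(np.round(p, 4))
print(best)
```

Reading a column of `p` shows how one spectrum fits every stored model; reading a row shows which spectra a given model accepts, which is the comparison needed to judge whether a model can separate its own compound from the others.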

The prediction matrix is given in Table 1. The diagonal elements of the matrix (marked with an asterisk) are the residual standard deviations of fit of each compound to its own model, while the off-diagonal elements are the corresponding fits of each compound to the other spectral models. In almost all cases the spectra are better approximated by their own model. Compounds IV and V are the xylene fractions, which cannot be distinguished. Compounds X and XI have the same molecular weights. X fits its own model better than it fits that of XI, but the model for XI cannot distinguish XI from X.

The results of this preliminary study of applying pattern recognition methods to MS-MS data are encouraging. A number of questions remain. In most of the cases discussed here, the spectra contained only one daughter ion, which was modeled by a one-component model. It must be determined how many daughter ions are optimal for general compound identification. In general, one-dimensional spectra do not contain sufficient information to distinguish between isomers, and it is of interest to see if isomer discrimination is possible using MS-MS data. Another problem which must be addressed is that the computational time required to fit spectra to stored models increases exponentially with the number of spectra or models in the data base. These problems are currently being dealt with.

TABLE 1

Prediction matrix for application of pattern recognition to the MS-MS data. Rows are models and columns are spectra; diagonal elements are marked with an asterisk.

Model  I      II     III    IV     V      VI     VII    VIII   IX     X      XI     XII    XIII   XIV
I      0.24*  1.62   1.68   1.28   1.28   1.27   1.32   1.20   1.56   1.24   1.41   1.16   1.58   1.07
II     1.20   0.32*  1.68   1.29   1.28   1.27   1.32   1.21   1.56   1.25   1.42   1.16   1.58   1.08
III    1.21   1.62   0.53*  1.29   1.28   1.29   1.33   1.22   1.57   1.26   1.43   1.18   1.59   1.09
IV     1.19   1.62   1.66   0.24*  0.27   1.27   1.31   1.21   1.56   1.25   1.41   1.16   1.58   1.07
V      1.19   1.62   1.66   0.25   0.25*  1.27   1.31   1.21   1.56   1.25   1.41   1.16   1.58   1.07
VI     1.20   1.62   1.68   1.29   1.28   0.32*  1.32   1.21   1.56   1.25   1.42   1.16   1.58   1.08
VII    1.19   1.62   1.68   1.28   1.27   1.27   0.24*  1.21   1.56   1.25   1.41   1.16   1.58   1.07
VIII   1.19   1.62   1.68   1.28   1.28   1.27   1.31   0.11*  1.56   1.24   1.41   1.16   1.58   1.07
IX     1.20   1.62   1.68   1.28   1.28   1.27   1.32   1.21   0.43*  1.25   1.41   1.16   1.58   1.07
X      1.19   1.62   1.68   1.28   1.28   1.27   1.31   1.20   1.56   0.10*  0.58   1.16   1.51   1.07
XI     1.19   1.62   1.68   1.28   1.28   1.27   1.32   1.21   1.56   0.38   0.44*  1.16   1.51   1.07
XII    1.19   1.62   1.68   1.28   1.28   1.27   1.31   1.21   1.56   1.24   1.41   0.12*  1.58   1.07
XIII   1.19   1.62   1.68   1.28   1.28   1.27   1.32   1.21   1.56   1.21   1.38   1.16   0.55*  1.07
XIV    1.19   1.62   1.68   1.28   1.28   1.27   1.32   1.21   1.56   1.25   1.41   1.16   1.58   0.32*

Fig. 2. Compounds used in this study: 3,3'-dimethoxybenzidine (MW = 242); (2-methyl-4-chlorophenoxy)acetic acid (MW = 200, for 35Cl); (2,4-dichlorophenoxy)acetic acid (MW = 220, for 35Cl); two xylene fractions (MW = 106); p,p'-bitolyl (MW = 182); 9,10-dimethylanthracene (MW = 206); perylene (MW = 252); triphenylene (MW = 228); 1,2,3,4-dibenzanthracene (MW = 278); retene (MW = 234); 9,10-diphenylanthracene (MW = 330); 1,2-benzpyrene (MW = 252); m-quinquephenyl (MW = 382).

ACKNOWLEDGEMENTS

The authors would like to acknowledge the assistance of Tammy Jones of the US Environmental Protection Agency, Las Vegas, and R. Talaat of ABC Labs, Columbia, MO, for providing some of the spectra. One of us (DS) would like to thank the University of Illinois at Chicago for a Graduate College Fellowship.

REFERENCES

1 J.C. Schwartz, A.P. Wade, C.G. Enke and R.G. Cooks, Systematic delineation of scan modes in multidimensional mass spectrometry, Analytical Chemistry, 62 (1990) 1809-1818.
2 K.J. Hart, P.T. Palmer, D.L. Diedrich and C.G. Enke, Generation of substructure identification rules using feature-combinations from tandem mass spectra, Journal of the American Society of Mass Spectrometry, 3 (1992) 159-168.
3 W.J. Dunn III, S.L. Emery, W.G. Glen and D.R. Scott, Preprocessing, variable selection, and classification rules in the application of SIMCA pattern recognition to mass spectral data, Environmental Science and Technology, 23 (1989) 1499-1505.
4 D. Swain, W.J. Dunn III and R.E. Talaat, Pattern recognition studies of MS/MS spectra, Analytica Chimica Acta, 277 (1993) 305-311.
5 W.G. Glen, W.J. Dunn III and D.R. Scott, UNIPALS, Software for principal components and partial least squares analysis, Tetrahedron Computer Technology, 2 (1989) 377-390.