Spectral data bases for chemical compound identification


Computer Physics Communications 33 (1984) 85—92 North-Holland, Amsterdam


SPECTRAL DATA BASES FOR CHEMICAL COMPOUND IDENTIFICATION

Jaroslav FIALA

Central Research Institute ŠKODA, 316 00 Plzeň, ČSSR

The status of the large data bases for identifying unknown compounds by the use of X-ray diffraction, infrared spectrometry, Raman spectrometry and mass spectrometry is discussed. The paper reviews various spectra-encoding methods and library search techniques. The efficiency of identification of components in mixtures is evaluated from the point of view of information theory and miscellaneous methods to separate the spectra of pure compounds are mentioned. Finally, the possibilities of interpretation of spectra not included in the spectra data base are summarized.

1. Introduction

There are approximately five million substances currently known [1] and this number grows by about 200 000 new substances every year. To identify the composition of an unknown material (to discern the substances it consists of and to determine their concentrations) is therefore a difficult problem. The substances are represented by spectra of characteristic physical properties (molecular spectra, diffraction spectra, mass spectra) that behave as linear combinations of the spectra of the pure components: let $\mathbf{x}_A = (x_{A1}, x_{A2}, \ldots, x_{An})$, $\mathbf{x}_B = (x_{B1}, x_{B2}, \ldots, x_{Bn})$ and $\mathbf{x}_C = (x_{C1}, x_{C2}, \ldots, x_{Cn})$ be spectra of substances A, B and C, respectively [2-4]; then the spectrum of their mixture containing $c_A$% of component A, $c_B$% of component B and $c_C$% of component C will be

$$c_A \mathbf{x}_A + c_B \mathbf{x}_B + c_C \mathbf{x}_C = (c_A x_{A1} + c_B x_{B1} + c_C x_{C1},\; c_A x_{A2} + c_B x_{B2} + c_C x_{C2},\; \ldots,\; c_A x_{An} + c_B x_{Bn} + c_C x_{Cn}).$$

Spectral identification is based on a collection of spectra:

spectrum of the 1st substance: $\mathbf{x}_1 = (x_{11}, x_{12}, \ldots, x_{1n})$,
spectrum of the 2nd substance: $\mathbf{x}_2 = (x_{21}, x_{22}, \ldots, x_{2n})$,
...
spectrum of the mth substance: $\mathbf{x}_m = (x_{m1}, x_{m2}, \ldots, x_{mn})$,

and consists in matching the reference spectra with the spectrum $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ of an unknown compound. To match the spectra we have to find the values $c_1, c_2, \ldots, c_m$ that minimize the residual

$$\mathbf{x} - \sum_{i=1}^{m} c_i \mathbf{x}_i,$$

i.e. the quantities

$$x_1 - \sum_{i=1}^{m} c_i x_{i1},\quad x_2 - \sum_{i=1}^{m} c_i x_{i2},\quad \ldots,\quad x_n - \sum_{i=1}^{m} c_i x_{in}$$

(see ref. [5]). The arduousness of the spectral identification stems from the large volume of spectral data bases and from the fact that the spectra of analyzed substances are determined not only by their composition but also, albeit to a smaller extent, by the details of the experimental arrangement used and the sample preparation procedure applied.
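The linear mixing model and the residual described in the Introduction can be illustrated with a minimal Python sketch. The three-channel spectra, the concentrations and the helper names `mix` and `residual` are invented for the illustration and do not come from the paper.

```python
# Illustrative three-channel spectra; all numbers are invented for the example.
x_A = [1.0, 0.0, 2.0]
x_B = [0.0, 3.0, 1.0]
x_C = [2.0, 1.0, 0.0]


def mix(coeffs, spectra):
    """Return the channel-wise linear combination sum_i c_i * x_i."""
    n = len(spectra[0])
    return [sum(c * s[k] for c, s in zip(coeffs, spectra)) for k in range(n)]


def residual(x, coeffs, spectra):
    """Return the residual vector x - sum_i c_i * x_i, channel by channel."""
    model = mix(coeffs, spectra)
    return [xk - mk for xk, mk in zip(x, model)]


# Mixture of 50 % A, 30 % B and 20 % C (written as fractions).
c_true = [0.5, 0.3, 0.2]
x_mix = mix(c_true, [x_A, x_B, x_C])

# The residual vanishes for the true concentrations ...
r_good = residual(x_mix, c_true, [x_A, x_B, x_C])
# ... and is clearly non-zero for a wrong guess (pure A).
r_bad = residual(x_mix, [1.0, 0.0, 0.0], [x_A, x_B, x_C])
```

Identification then amounts to searching for the coefficients that make this residual as small as possible.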



0010-4655/84/$03.00 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)

2. Methods of chemical compound identification

Crystalline materials can be identified using diffraction: a narrow monochromatic beam of X-rays, electrons or neutrons passing through such a material is scattered (diffracted) into specific directions which, for the given radiation wavelength (energy), are characteristic of the analysed substance. The diffraction pattern (directional distribution of the diffracted intensity) is thus a fingerprint of the compound [6]. Over the past forty years the ASTM Joint Committee on Powder Diffraction Standards and the International Centre for Diffraction Data have compiled a file of over 40000 diffraction spectra; the data base is updated and expanded continuously, both by adding new spectra and by replacing older spectra with new ones of higher quality, presently at a rate of 2000 spectra per year [7]. A number of computer search and match programs operating on this data base as well as on smaller files have been worked out [6,8-15].

If we let a collimated beam of infrared radiation pass through a sample of the substance under investigation, it will interact with the molecules of the substance, causing transitions between their different vibrational-rotational states. These transitions give rise to the absorption bands appearing in the spectrum of the transmitted beam. The infrared absorption spectra are used to identify the compound by comparison of its spectrum with those in a reference file. There are a number of collections of infrared spectra [16] as well as computer-based search systems [17-23]. The largest data bank contains about 150000 infrared spectra. Another, similar technique for chemical compound identification is based on Raman spectroscopy: e.g., the identification system described in ref. [24] uses about 3500 Raman spectra.

The composition of an unknown substance may also be determined by mass spectral analysis of molecular fragments which are produced from the molecules of the substance under investigation by electron impact, by a high-intensity laser pulse or by ion-beam excitation, field ionization, etc. The largest mass spectral library contains about 40000 spectra [1,25] and there are several computerized mass-spectral search systems available [1,25-32].

Other techniques for identification of unknown compounds by comparison of their spectra with a reference library are: nuclear magnetic resonance spectroscopy (¹H data bases containing 8000 spectra [33] and 15000 spectra [34], ¹³C data base containing 4000 spectra [35]), absorption spectrophotometry (data bank of 5000 spectra [34,36]), etc.

3. Representation of spectra

In general terms, each spectrum can be regarded as a group of peaks plotted on some kind of “energy” or “time” scale $t$ vs. absolute or relative intensity $x = x(t)$. By dividing the range $(t_a, t_b)$ of the measured quantity $t$ into $n$ intervals (windows)

$$(t_a = t_0, t_1),\ (t_1, t_2),\ (t_2, t_3),\ \ldots,\ (t_{n-1}, t_n = t_b), \tag{1}$$

we can express the spectrum in the form of an n-dimensional vector [3,4,10,37,38]

$$\mathbf{x} = (x_1, x_2, \ldots, x_n) = (x(t_1), x(t_2), \ldots, x(t_n)), \tag{2}$$

where each resolution element or sampling point represents a different dimension. Various spectra-encoding methods differ in the number of resolution units $n$ (e.g. $n$ = 130 [37], 200 [39], 640 [40]) and the precision to which the peak heights $x_i$ are recorded. Retaining complete spectra in detail presents an inordinate amount of data for storage and searching. To reduce both the storage and the search-time requirements the spectra are abbreviated (compressed) by setting $n$ low and classifying all peaks into a small number of groups according to their intensity values $x_i$. In the extreme case only two intensities, “0” and “1”, are distinguished so that the spectrum will be a string of zeros and ones:

$$x_i = \begin{cases} 1 & \text{if the spectrum } x = x(t) \text{ has a peak in } (t_{i-1}, t_i), \\ 0 & \text{if the spectrum } x = x(t) \text{ has no peaks in } (t_{i-1}, t_i). \end{cases}$$
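The extreme two-level coding can be sketched as follows. This is only an illustration under simple assumptions: the windows are taken to be of equal width, the function name `encode_binary` and the peak positions are hypothetical, and the code uses 0-based window indices whereas the text counts windows from 1.

```python
def encode_binary(peak_positions, t_a, t_b, n):
    """Return the 0/1 vector with x_i = 1 if a peak lies in window i.

    The range (t_a, t_b) is split into n equal windows; indices are 0-based.
    """
    step = (t_b - t_a) / n              # width of one window
    bits = [0] * n
    for t in peak_positions:
        if t_a <= t < t_b:              # ignore peaks outside the measured range
            index = min(int((t - t_a) / step), n - 1)
            bits[index] = 1
    return bits


# Hypothetical peak positions on a 0-100 scale, coded with n = 10 windows.
peaks = [12.5, 13.1, 47.0, 88.9]
code = encode_binary(peaks, 0.0, 100.0, 10)
code_string = "".join(str(b) for b in code)
```

Two peaks falling into the same window (12.5 and 13.1 here) collapse into a single “1”, which is the price paid for the compact representation.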

Such a spectrum may also be coded as a sequence

$$(t_1, t_2, \ldots, t_m)$$

of the values $t$ corresponding to the peak positions (the average number of peaks per spectrum is $m$ = 35 for diffraction and infrared spectra, $m$ = 65 for mass spectra). Sometimes only the $k$ (= 5, 10, 15, ...) strongest peaks

$$(t_1, t_2, \ldots, t_k)$$

are retained, occasionally with the corresponding intensity values $x$, i.e.

$$\begin{pmatrix} t_1 & t_2 & \cdots & t_k \\ x_1 & x_2 & \cdots & x_k \end{pmatrix}.$$

Peak intensity data are coded relative to the highest peak or on an absolute scale. In another abbreviation scheme the spectrum is divided into broad windows (1) containing 5-10 peaks each and only the one or two most intense peaks per window are encoded [30].

The locations of peaks of the spectra in a given collection are distributed unevenly. The lower the probability of finding a peak of a spectrum $x = x(t)$ in a given window $(t_c, t_d)$, the more significant is the window in spectral identification. Therefore, in some systems only a part of the spectra,

$$x = x(t);\quad t \in (t_c, t_d) \cup (t_e, t_f) \cup (t_g, t_h) \cup \ldots \subset (t_a, t_b),$$

consisting of windows with the least probability of occurrence of a peak, is encoded [41,42]. For a given library one can calculate the distribution of the probability $p(x, t)$ that a spectrum has a peak located at $t$ with the height $x$ [43]. The lower this probability, the more important the role played by such a peak in the identification of spectra. And so, sometimes only several peaks with the least values of $p(x, t)$ are used [44,45].

Several studies have shown the usefulness of expressing spectrometric data $x = x(t)$, i.e. $\mathbf{x} = (x(t_1), x(t_2), \ldots, x(t_n))$, in a Fourier transform manner as

$$\mathbf{s} = (s(f_1), s(f_2), \ldots, s(f_m)), \tag{3}$$

where

$$s(f_j) = \int_{-\infty}^{+\infty} x(t)\, e^{-2\pi i f_j t}\, dt$$

and $x(-t) = x(t)$ so that $s(f)$ is real. The number $m$ of components of the Fourier transform spectra $\mathbf{s}$ may be set substantially lower than the dimension $n$ of the original spectra $\mathbf{x}$ (e.g. $m = 0.5n$) without degrading the identification efficacy [46]. Each point in the spectrum

$$x(t) = \int_{-\infty}^{+\infty} s(f)\, e^{2\pi i f t}\, df$$

is a linear combination of all the points in the Fourier transform spectrum $s = s(f)$ spread over the entire range $(-\infty, +\infty)$. Therefore the loss of information about the original spectrum $x = x(t)$ due to discarding some of the values $s(f_j)$ is equivalent to losing some information about all of the values $x(t_1), x(t_2), \ldots, x(t_n)$ rather than all information about some part of the spectrum $x = x(t)$. An even greater reduction of computer storage space requirements (ten times or more) is achieved by clipping (hard-limiting or one-bit quantization of) the Fourier domain representation of spectra, i.e. by encoding

$$c(f_j) = \begin{cases} 1 & \text{if } s(f_j) \ge 0, \\ 0 & \text{if } s(f_j) < 0, \end{cases} \qquad j = 1, 2, \ldots, m; \tag{4}$$

instead of (3) [47]. Coding the spectra in a Fourier transform manner is also applied in some techniques by which the spectra are directly collected in the Fourier domain [48]: Fourier transform infrared spectroscopy (interferometry) [49], Fourier transform mass spectrometry (ion cyclotron resonance spectrometry) [50], Fourier transform nuclear magnetic resonance spectroscopy [51]; the primary advantage gained in Fourier spectroscopies in general is the speed of measurement, which allows an on-line coupling of separation by chromatography and identification by spectroscopy [52].

4. Arrangement of spectra in a file

The majority of spectral data bases are unordered, i.e. the individual spectra are arranged historically according to when they have been added to the file. Such an organization of the file allows only sequential access, the search time being proportional to the number of spectra in the library. One way to make the library searching procedure more efficient is to place the spectra into some form of unique order, e.g. lexicographical order [53,54], hashed structure [53,55,56] or hierarchical tree [57-60]. For instance, in a hash-stored file the location at which a spectrum is to be stored or retrieved is determined from its numeric representation by a hashing algorithm, so that the time needed to perform a search is very short and independent of the size of the library searched.

There are two major problems with ordered files. First, new spectra cannot be added to an ordered file as conveniently as to an unordered one. Second, the high efficiency of retrieval with ordered files applies only to identifying spectra of pure compounds; with mixtures a search over the whole file is necessary, so that the advantage of the order in the file disappears.

Appreciable improvement of the identification efficiency can be achieved by dividing the whole data base into several subfiles based on chemistry (spectra of minerals, metals and alloys, etc.) [13,15] or on separate characteristics which may apply to a particular substance [61], selecting one of them according to the condition of the input data, and searching only that subfile in the practical search. An alternative approach is to create the subfiles according to the distribution of peaks in spectra: for instance, one subfile consists of spectra $x = x(t)$ showing no peak in the range $(t_m, t_n)$, another subfile contains spectra showing at least one peak in the range $(t_m, t_n)$, etc. [22].

One of the most successful of the library search techniques is the method employing an inverted file structure. In contrast to the direct file, which is arranged according to compounds (spectrum of the first compound, spectrum of the second compound, ...), the data in the inverted file are arranged according to the location of the peaks [6,9,58,62,63]. Sometimes only several of the most intense peaks of each spectrum are retained in the inverted file [44]. When using direct files, the similarity between a pair of spectra is measured by comparing the corresponding peak lists so as to identify the number of peaks in common. However, this implies that the list of peaks of the unknown spectrum must be scanned during the comparison with each and every spectrum of the file; this repeated scanning is obviated by the use of an inverted file, which is its main advantage.
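The difference between a direct file and an inverted file can be sketched in a few lines of Python. The compound names, the peak positions (given as window indices) and the helper functions are hypothetical; the sketch only illustrates the idea of arranging the data by peak location and is not the implementation of any of the cited systems.

```python
from collections import defaultdict

# Hypothetical direct file: compound name -> peak positions (window indices).
direct_file = {
    "compound_1": [3, 17, 42, 58],
    "compound_2": [3, 21, 42],
    "compound_3": [8, 17, 60],
}


def build_inverted_file(direct):
    """Map each peak position to the set of compounds showing a peak there."""
    inverted = defaultdict(set)
    for name, peaks in direct.items():
        for position in peaks:
            inverted[position].add(name)
    return inverted


def count_peak_matches(unknown_peaks, inverted):
    """Count, per compound, the peaks shared with the unknown spectrum.

    The peak list of the unknown is scanned only once; each position is
    looked up directly instead of being compared with every library entry.
    """
    hits = defaultdict(int)
    for position in set(unknown_peaks):
        for name in inverted.get(position, ()):
            hits[name] += 1
    return dict(hits)


inverted_file = build_inverted_file(direct_file)
matches = count_peak_matches([3, 42, 58, 99], inverted_file)
# Rank by decreasing number of matching peaks, ties broken by name.
ranked = sorted(matches.items(), key=lambda item: (-item[1], item[0]))
```

Compounds that share no peak with the unknown never appear in `matches`, so they cost nothing during the search.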

5. Spectral library search methods

To identify the composition of an unknown substance, we try to express its spectrum in terms of spectra of known compounds. Mathematically stated, we want to find the values $c_j$ that minimize

$$\left\| \mathbf{x} - \sum_{j=1}^{m} c_j \mathbf{y}_j \right\|^2 \quad \text{subject to } c_j \ge 0,\quad j = 1, 2, \ldots, m, \tag{5}$$

where $\mathbf{x}$ is the spectrum of the analyzed substance and $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_m$ are reference spectra stored in some bank [3,5]. The values of $c_j$, proportional to the probabilities of the occurrence of the corresponding compounds in the analyzed mixture, can be determined by quadratic programming methods [64], in the simplest case by solving the system of linear equations

$$\sum_{j=1}^{m} c_j (\mathbf{y}_j, \mathbf{y}_i) = (\mathbf{x}, \mathbf{y}_i),\quad i = 1, 2, \ldots, m. \tag{6}$$
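As a minimal sketch of the simplest case, the system of linear equations (6) can be set up and solved with plain Gaussian elimination. The reference spectra and all function names below are invented for the illustration, and the non-negativity constraint of (5) is not enforced here; a quadratic programming or non-negative least squares routine would be needed for that.

```python
def dot(u, v):
    """Scalar product (u, v) of two spectra of equal length."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("reference spectra are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * m
    for r in range(m - 1, -1, -1):                   # back substitution
        tail = sum(M[r][k] * c[k] for k in range(r + 1, m))
        c[r] = (M[r][m] - tail) / M[r][r]
    return c


def fit_concentrations(x, references):
    """Solve the normal equations sum_j c_j (y_j, y_i) = (x, y_i)."""
    gram = [[dot(yj, yi) for yj in references] for yi in references]
    rhs = [dot(x, yi) for yi in references]
    return solve_linear(gram, rhs)


# Invented reference spectra and an unknown built as 0.7*y1 + 0.3*y3.
y1 = [1.0, 0.0, 2.0, 0.0]
y2 = [0.0, 3.0, 1.0, 0.0]
y3 = [2.0, 1.0, 0.0, 1.0]
x = [0.7 * a + 0.3 * b for a, b in zip(y1, y3)]

coeffs = fit_concentrations(x, [y1, y2, y3])
```

For this noise-free example the solution reproduces the mixing coefficients, with a (numerically) zero coefficient for the absent component `y2`.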

This cannot be done directly for a large file (e.g. $m$ = 139800 in ref. [22]) and therefore the identification process is divided into two stages: first, a limited number (e.g. 100) of spectra which most closely resemble the unknown spectrum $\mathbf{x}$ are retrieved from the file, and then (5) is solved only for these best candidates.

The initial stage of identification (filtering) is performed using a criterion function (match coefficient, similarity index, score) $R(\mathbf{x}, \mathbf{y})$ which measures the extent of coincidence between an analysed spectrum $\mathbf{x}$ and a reference spectrum $\mathbf{y}$. A large number of scoring methods have been developed [9,13,15,21-23,26,27,30,31,35,39,45,47,65-69], e.g.

$$R(\mathbf{x}, \mathbf{y}) = \frac{(\mathbf{x}, \mathbf{y})}{\sqrt{(\mathbf{x}_0, \mathbf{x}_0)\,(\mathbf{y}, \mathbf{y})}} \tag{7}$$

where $\mathbf{x}_0$ is the unknown spectrum from which all the peaks that have no counterpart in the reference spectrum $\mathbf{y}$ have been discarded. There are no problems with searching spectra of pure compounds. But in the case of a mixture the similarity between the spectrum of a particular component and the spectrum of the unknown is masked by contributions from the other components. Therefore, not more than 5-10 major components can be identified or quantified in a mixture by spectral library search techniques [68].

The relatively elaborate computation of the similarity index (such as (7)) for all spectra in a large data base is time-consuming. In order to improve the search time, simple pre-search filters are sometimes used [32]. Such filters attempt to exclude from the complete similarity calculation those spectra which have no potential for obtaining a high similarity index. For instance, one thousand spectra with the maximum number of peak matches (or maximum percentage of peak matches) are retrieved from the file; this can be performed very quickly even with large data bases, especially when an inverted file is used. These 1000 spectra are then taken as a minifile from which a microfile of, say, one hundred spectra with the highest similarity index is selected. Finally, for the compounds corresponding to the spectra of the microfile the probabilities of their occurrence in the analyzed sample are calculated from (5).

Another pre-screening method is based on the fact that the low-frequency part of the Fourier domain representation of a spectrum contains sufficient information on its rough outline, which is what matters for identification. Thus, only the first several bits of the clipped Fourier transform (4) are retained for each spectrum, so that only a small fraction of memory is required and the pre-search performed with spectra abbreviated in this manner is very fast [47].

The second, simultaneous stage of identification (5) is sometimes replaced by a less correct [5,9] procedure in which the spectra $\{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots\}$ retrieved from a file in the first, sequential stage are subtracted one by one from the unknown spectrum $\mathbf{x}$ in order of decreasing values $R(\mathbf{x}, \mathbf{y}_1), R(\mathbf{x}, \mathbf{y}_2), R(\mathbf{x}, \mathbf{y}_3), \ldots$ until nothing remains (“spectrum-stripping” technique) [70,71], or, conversely, the spectra $\{\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \ldots\}$ are added together to form a composite which is built up until all the peaks of the unknown spectrum $\mathbf{x}$ are matched (additive identification technique) [15]. The time taken for retrieval can be reduced by treating several unknown spectra simultaneously [26].
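The staged filtering described above (a cheap peak-match pre-filter producing a minifile, followed by a similarity ranking producing a microfile) can be sketched as follows. The library, the sizes of the two files and the function names are hypothetical. The similarity used here is a normalised scalar product in which the peaks of the unknown that have no counterpart in the reference are dropped before normalisation; it is an assumed stand-in in the spirit of criterion (7), not a verified transcription of it.

```python
import math


def dot(u, v):
    """Scalar product of two spectra of equal length."""
    return sum(a * b for a, b in zip(u, v))


def peak_matches(x, y):
    """Number of channels in which both spectra show a peak."""
    return sum(1 for xi, yi in zip(x, y) if xi > 0 and yi > 0)


def similarity(x, y):
    """Normalised scalar product; peaks of x absent from y are dropped first."""
    x0 = [xi if yi > 0 else 0.0 for xi, yi in zip(x, y)]
    denom = math.sqrt(dot(x0, x0) * dot(y, y))
    return dot(x, y) / denom if denom > 0 else 0.0


def two_stage_search(x, library, minifile_size, microfile_size):
    """Pre-filter by peak-match count, then rank the survivors by similarity."""
    by_matches = sorted(library, key=lambda name: -peak_matches(x, library[name]))
    minifile = by_matches[:minifile_size]
    by_score = sorted(minifile, key=lambda name: -similarity(x, library[name]))
    return by_score[:microfile_size]


# Invented six-channel reference spectra.
library = {
    "A": [5, 0, 3, 0, 0, 1],
    "B": [0, 4, 0, 0, 2, 0],
    "C": [5, 0, 0, 2, 0, 0],
    "D": [0, 0, 0, 9, 0, 0],
}
# Unknown: a mixture of 60 % A and 40 % B.
x = [0.6 * a + 0.4 * b for a, b in zip(library["A"], library["B"])]

candidates = two_stage_search(x, library, minifile_size=3, microfile_size=2)
```

In a real system the surviving candidates would then be passed to the simultaneous fit (5); here the two true components are the ones that survive both filters.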


6. Information theory applied to chemical compound identification by retrieval with spectral files

The information content of an identification has been defined as the decrease of uncertainty about the identity of the unknown substance as follows:

$$I = \log_2 (U_0 / U) \quad \text{(bit)},$$

where $U_0$ and $U$ are the uncertainties before and after the identification, respectively [41,72]. The uncertainty about the identity of an analysed mixture is given by the number of possible alternatives for its composition; for $k$ possible components at $p$ concentration levels there are $p$ alternatives with respect to the first component, $p$ alternatives with respect to the second component, ..., $p$ alternatives with respect to the $k$th component, altogether $p^k$ alternatives. For an unequivocal identification of a mixture of any of the known five million compounds at $p$ = 10 concentration levels an information capacity of

$$I_m = 5\,000\,000 \log_2 p \approx 16\,500\,000 \text{ bits}$$

would be necessary. This is much more than the information $I_a$ that can be obtained from the comparison of the spectrum of the analyzed sample with the spectra from a data base: if the spectra are encoded as n-dimensional vectors, the maximum number of components resolvable by use of this data base is $n$, which corresponds to $p^n$ distinguishable compositions, so that for $p$ = 10

$$I_a = n \log_2 p \approx 3.3 n.$$

The value of $n$ is limited by the resolution of the corresponding experimental technique: e.g. with diffraction spectra, this is the angular distance 0.1° between two adjacent reasonably resolvable peaks of a spectrum recorded in the range (0°, 180°), so that $n$ = 1800 and $I_a \approx 6000$ bits. One way in which the value of $I_a$ can be increased is to combine several spectroscopic methods, as in the complex retrieval system described in refs. [73,74], where three kinds of spectra are used for identification, viz. infrared spectra, mass spectra and nuclear magnetic resonance spectra.

The identification of pure compounds is much easier than the resolution of mixtures. With pure compounds, the uncertainty $U_0$ about the identity before analysis is given by the number $k$ = 5 000 000 of all known substances, so that the information capacity $I_p$ necessary for an unequivocal identification of any of the five million presently reported pure compounds is

$$I_p = \log_2 5\,000\,000 \approx 22 \text{ bits} \ll I_a.$$

To take advantage of this fact, the identification of mixtures is coupled with some separation technique (gas chromatography, liquid chromatography, mass separation) by which the individual components are isolated [52,75-77]. In the case of incomplete separation (enrichment of particular fractions by the corresponding components) it is possible to reconstruct the spectra of the pure components by correlation analysis of the spectra of partially fractionated samples, using a procedure described in refs. [31,32]. Another possibility is provided by local analysis of fine particles that consist of a single compound: selected-area microdiffraction in transmission electron microscopy [78], Kossel diffraction in electron probe microanalysis [79], backscattered electron diffraction patterns [80] and channeling electron patterns [81] in scanning electron microscopy, laser microprobe Raman spectroscopy [82], ion microprobe mass spectrometry [83], laser microprobe mass spectrometry [84], etc.
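The three information capacities quoted in this section can be checked with a short calculation. The variable names are chosen for the illustration; the exact values differ slightly from the rounded figures in the text because the text uses log2 10 ≈ 3.3.

```python
import math


def information_bits(u_before, u_after=1):
    """Information content I = log2(U0 / U) in bits."""
    return math.log2(u_before / u_after)


p = 10                 # concentration levels
known = 5_000_000      # number of known compounds
n = 1800               # resolution elements of a diffraction pattern (0.1 deg steps)

# Mixture of any of the known compounds: p**known alternatives,
# evaluated as known * log2(p) to avoid forming the huge number itself.
I_m = known * math.log2(p)

# Information available from an n-dimensional spectrum at p levels.
I_a = n * math.log2(p)

# Pure compound: one out of `known` alternatives.
I_p = information_bits(known)
```

The gap of more than three orders of magnitude between `I_m` and `I_a`, and the comfortable margin of `I_a` over `I_p`, is the quantitative reason why mixtures are separated before identification.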

7. Structure elucidation techniques

Sometimes it may happen that spectra of some components of the analysed substance are not present in the reference file, so that the conventional library search methods of identification fail. In such a case it is necessary that the unknown compound be isolated and a structure assigned to its spectrum. So, on the basis of infrared, Raman or mass spectra it is possible to identify various structural fragments (functional groups) present in the molecule of the unknown compound [85,57]. To this end special data bases which link up spectral data with structural elements can be used [34]. Then, according to the chemical connectivity rules, all possible structures are built up from these fragments. For each hypothetical structure a theoretical spectrum is calculated and compared with the analysed spectrum, and the structure of the unknown compound is claimed to be identical with the structure that yields the best match [1,25,34,86,87].

Crystalline compounds can be characterized by cell parameters as determined from diffraction spectra [88-90]; then, using a data base of crystal structures (e.g., the National Bureau of Standards Crystal Identification File with 60000 structures, or ref. [91] with 47000 structures), the compounds whose cell parameters closely resemble those of the unknown compound are retrieved.

8. Conclusions

The great importance of the computerized spectral data bases is due to the key role they play in techniques for chemical compound identification. Efficient methods of spectra encoding and library search for identification of pure compounds are available to date. The situation with mixtures is much worse: no more than a dozen or so components can be resolved. For the analysis of complex mixtures spectral identification must be coupled with various separation techniques. The spectral data bases contain only a small percentage of the approximately five million currently registered compounds.

References

[1] D.M. Martinsen, Appl. Spectr. 35 (1981) 255.
[2] P.C. Jurs and T.L. Isenhour, Chemical Applications of Pattern Recognition (John Wiley, New York, 1975).
[3] J. Fiala, J. Phys. D 5 (1972) 1874.
[4] J. Fiala, Anal. Chem. 52 (1980) 1300.
[5] J. Fiala, J. Appl. Cryst. 9 (1976) 429.
[6] J. Fiala, Čs. Čas. Fyz. A 24 (1974) 237.
[7] Powder Diffraction File 1982-1983 (JCPDS-International Center for Diffraction Data, Swarthmore, 1982).
[8] L.K. Frevel, Anal. Chem. 37 (1965) 471.
[9] G.G. Johnson and V. Vand, Ind. Eng. Chem. 59 (1967) 19.
[10] T.L. Isenhour, Anal. Chem. 45 (1973) 2153.
[11] B.H. O'Connor and F. Bagliani, J. Appl. Cryst. 9 (1976) 419.
[12] E.M. Burova, N.P. Zidkov, A.G. Zilberman, V.V. Zubenko, L.S. Nabutovskij, M.M. Umanskij and B.M. Ščedrin, Kristallografija 22 (1977) 1182.
[13] A.F. Glazner and D.B. McIntyre, Am. Mineralogist 64 (1979) 902.
[14] R.G. Marquart, I. Katsnelson, G.W.A. Milne, S.R. Heller, G.G. Johnson and R. Jenkins, J. Appl. Cryst. 12 (1979) 629.
[15] W.N. Schreiner, C. Surdukowski and R. Jenkins, J. Appl. Cryst. 15 (1982) 513.
[16] Ch.L. Fisk, G.W.A. Milne and S. Heller, J. Chrom. Sci. 17 (1979) 441.
[17] D.H. Anderson and G.L. Covert, Anal. Chem. 39 (1967) 1288.
[18] D.S. Erley, Anal. Chem. 40 (1968) 894.
[19] R.W. Sebesta and G.G. Johnson, Anal. Chem. 44 (1972) 260.
[20] R.C. Fox, Anal. Chem. 48 (1976) 717.
[21] S.R. Lowry and D.A. Huppler, Anal. Chem. 53 (1981) 889.
[22] K. Tanabe, T. Tamura, J. Hiraishi and S. Saeki, Bunseki Kagaku 31 (1982) E27.
[23] K. Tanabe, T. Tamura, J. Hiraishi, S. Saeki, I. Suzuki and M. Tasumi, Bunseki Kagaku 31 (1982) E177.
[24] I.A. Degen, L. Birmingham and G.A. Newman, Analyst 101 (1976) 212.
[25] J.R. Chapman, J. Phys. E 13 (1980) 365.
[26] L.R. Crawford and J.D. Morrison, Anal. Chem. 40 (1968) 1464.
[27] B.A. Knock, I.C. Smith, D.E. Weight and R.G. Ridley, Anal. Chem. 42 (1970) 1516.
[28] H.S. Hertz, R.A. Hites and K. Biemann, Anal. Chem. 43 (1971) 681.
[29] L.E. Wangen, W.S. Woodward and T.L. Isenhour, Anal. Chem. 43 (1971) 1605.
[30] S.L. Grotch, Anal. Chem. 45 (1973) 2.
[31] H. Damen, D. Henneberg and B. Weimann, Anal. Chim. Acta 103 (1978) 289.
[32] J. Shindo, A. Yasuhara, H. Ito and T. Mizoguchi, Chem. Lett. (1982) 521.
[33] Y. Katagiri, K. Kanohta, K. Nagasawa, T. Okusa, T. Sakai, O. Tsumura and Y. Yotsui, Anal. Chim. Acta 133 (1981) 535.
[34] J. Zupan, Anal. Chim. Acta 103 (1978) 273.
[35] M. Zippel, J. Morvitz, I. Kohler and H.J. Opferkuch, Anal. Chim. Acta 140 (1982) 123.
[36] A.G. Schöning, Anal. Chim. Acta 71 (1974) 17.
[37] R.W. Liddell and P.C. Jurs, Anal. Chem. 46 (1974) 2126.
[38] M. Razinger, M. Penca and J. Zupan, Anal. Chem. 53 (1981) 1107.
[39] G.T. Rasmussen and T.L. Isenhour, Appl. Spectr. 33 (1979) 371.
[40] J.T. Clerc, R. Knutti, H. Koenitzer and J. Zupan, Z. Anal. Chem. 283 (1977) 177.
[41] P.F. Dupuis and A. Dijkstra, Z. Anal. Chem. 290 (1978) 357.
[42] G. van Marlen and A. Dijkstra, Anal. Chem. 48 (1976) 595.
[43] G.M. Pesyna, F.W. McLafferty, R. Venkataraghavan and H.E. Dayringer, Anal. Chem. 47 (1975) 1161.
[44] I.K. Mun, D.R. Bartholomew, D.B. Stauffer and F.W. McLafferty, Anal. Chem. 53 (1981) 1938.
[45] G.M. Pesyna, R. Venkataraghavan, H.E. Dayringer and F.W. McLafferty, Anal. Chem. 48 (1976) 1362.
[46] P.C. Jurs, Anal. Chem. 43 (1971) 1812.
[47] R.B. Lam, S.J. Foulk and T.L. Isenhour, Anal. Chem. 53 (1981) 1679.
[48] A.G. Marshall and M.B. Comisarow, Anal. Chem. 47 (1975) 491A.
[49] R.J. Bell, Introductory Fourier Transform Spectroscopy (Academic Press, New York, 1972).
[50] Ch.L. Wilkins and M.L. Gross, Anal. Chem. 53 (1981) 1661A.
[51] S.A. Borman, Anal. Chem. 54 (1982) 1129A.
[52] S.A. Borman, Anal. Chem. 54 (1982) 901A.
[53] R.G. Dromey, Anal. Chem. 51 (1979) 229.
[54] L.K. Frevel, Anal. Chem. 54 (1982) 691.
[55] R.G. Dromey, Anal. Chem. 49 (1977) 1982.
[56] P.C. Jurs, Anal. Chem. 43 (1971) 364.
[57] M. Penca, J. Zupan and D. Hadzi, Anal. Chim. Acta 95 (1977) 3.
[58] J. Zupan, Anal. Chim. Acta 139 (1982) 143.
[59] L. Domokos, E. Pretsch, H. Mandli, H. Konitzer and J.T. Clerc, Z. Anal. Chem. 304 (1980) 241.
[60] M.F. Delaney, Anal. Chem. 53 (1981) 2354.
[61] R.G. Dromey, Anal. Chem. 48 (1976) 1464.
[62] F.E. Lytle, Anal. Chem. 42 (1970) 355.
[63] P. Willett, Anal. Chim. Acta 138 (1982) 339.
[64] J. Fiala, Kristall u. Technik 12 (1977) 505.
[65] K. Tanabe and S. Saeki, Anal. Chem. 47 (1975) 118.
[66] H.T. Clifford and W. Stephenson, An Introduction to Numerical Classification (Academic Press, New York, 1975).
[67] S. Saeki, K. Tanabe, T. Tamura, M. Tasumi and I. Suzuki, Appl. Spectr. 36 (1982) 148.
[68] S.C. Gates, M.J. Smisko, C.L. Ashendel, N.D. Young, J.F. Holland and C.C. Sweeley, Anal. Chem. 50 (1978) 433.
[69] W.N. Schreiner and C. Surdukowski, J. Appl. Cryst. 15 (1982) 524.
[70] B.L. Atwater (Fell), R. Venkataraghavan and F.W. McLafferty, Anal. Chem. 51 (1979) 1945.
[71] M. Bachiri and G. Mouvier, Org. Mass Spectrom. 11 (1976) 634.
[72] L. Brillouin, Science and Information Theory (Academic Press, New York, 1956).
[73] A.P. Uthman, J.P. Koontz, J. Hinderliter-Smith, W.S. Woodward and Ch.N. Reilley, Anal. Chem. 54 (1982) 1772.
[74] J. Zupan, M. Penca, D. Hadzi and J. Marsel, Anal. Chem. 49 (1977) 2141.
[75] D. Rosenthal, Anal. Chem. 54 (1982) 63.
[76] E. Bayer, K. Albert, M. Nieder, E. Grom, G. Wolff and M. Rindlisbacher, Anal. Chem. 54 (1982) 1747.
[77] G.L. Glish, V.M. Shaddock, K. Harmon and R.G. Cooks, Anal. Chem. 52 (1980) 165.
[78] K.W. Andrews, D.J. Dyson and S.R. Keown, Interpretation of Electron Diffraction Patterns (Adam Hilger, London, 1971).
[79] R. Tixier and C. Waché, J. Appl. Cryst. 3 (1970) 466.
[80] C.J. Harland, P. Akhter and J.A. Venables, J. Phys. E 14 (1981) 175.
[81] E.M. Schulson, J. Mater. Sci. 12 (1977) 1071.
[82] P. Dhamelincourt, F. Wallart, M. Leclercq, A.T. N'Guyen and D.O. Landon, Anal. Chem. 51 (1979) 414A.
[83] H. Liebl, J. Phys. E 8 (1975) 797.
[84] E. Denoyer, R. Van Grieken, F. Adams and D.F.S. Natusch, Anal. Chem. 54 (1982) 26A.
[85] C.G.A. van Eijk and J.H. van der Maas, Z. Anal. Chem. 291 (1978) 308.
[86] J.T. Clerc, Pure Appl. Chem. 50 (1978) 103.
[87] T. Visser and J.H. van der Maas, Anal. Chim. Acta 122 (1980) 357.
[88] J.W. Visser, J. Appl. Cryst. 2 (1969) 89.
[89] F. Kohlbeck and E.M. Hörl, J. Appl. Cryst. 11 (1978) 60.
[90] A.D. Mighell, J. Appl. Cryst. 9 (1976) 491.
[91] J.D.H. Donnay, ed., Crystal Data Determinative Tables, vol. 1-4 (National Bureau of Standards of the US Department of Commerce & Joint Committee on Powder Diffraction Standards - International Centre for Diffraction Data, Washington, 1979).