Estuarine water classification using EEM spectroscopy and PARAFAC–SIMCA


Analytica Chimica Acta 581 (2007) 118–124

Gregory J. Hall 1, Jonathan E. Kenny ∗
Department of Chemistry, Tufts University, 62 Talbot Avenue, Medford, MA 02155, United States
Received 12 June 2006; received in revised form 28 July 2006; accepted 9 August 2006
Available online 26 August 2006

Abstract The primary method for preventing the introduction of nonindigenous aquatic nuisance species in the U.S. is ballast water exchange (BWE). Our recent work focused on the use of excitation emission matrix (EEM) spectroscopy of colored dissolved organic matter (CDOM) to “fingerprint” water as a function of its port of origin, and thereby provide a forensic tool for the enforcement of BWE regulations. In that work, we utilized N-way partial least squares with discriminant analysis (NPLS-DA), which models the data with an emphasis on differences among classes (ports of origin). In this work, EEMs of samples from three different U.S. ports were analyzed by parallel factor analysis (PARAFAC) coupled with soft independent modeling of class analogy (SIMCA) to provide an effective classification method with a low false positive rate. This coupling, which is shown for the first time in this work, can be a useful alternative to NPLS-DA in that PARAFAC–SIMCA decomposes the EEM signal into chemical components and utilizes the scores for these components in the classification scheme. This gives the user the option of removing the contributions of interfering or unidentifiable fluorescent components prior to classification.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Colored dissolved organic matter (CDOM); Parallel factor analysis (PARAFAC); Soft independent model of class analogy (SIMCA); Ballast water; Ballast water exchange (BWE); Estuarine; Fluorescence

∗ Corresponding author. Tel.: +1 617 627 3397; fax: +1 617 627 3443. E-mail addresses: [email protected] (G.J. Hall), [email protected] (J.E. Kenny).
1 Present address: Department of Science, U.S. Coast Guard Academy (ds-1), 27 Mohegan Avenue, New London, CT 06320, United States.
0003-2670/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2006.08.034

1. Introduction Fluorescence of colored dissolved organic matter (CDOM) has been used in many ways to study estuarine water. These uses range from calculations of mixing ratios of tributaries, to identification of the geographical source of unknown samples, to estimations of estuary health [1–6]. Excitation emission matrix (EEM) spectroscopy, often called “total luminescence,” is a powerful, multidimensional technique for characterization of a water sample’s CDOM. Recently, advanced chemometric techniques such as parallel factor analysis (PARAFAC) and N-way partial least squares regression (N-PLS) have been shown to be effective tools for interpreting EEMs of CDOM with respect to both characterization and classification [1,7]. PARAFAC has been used primarily as a characterization tool, while regression algorithms such as NPLS-DA have been used for classification. This paper will show the advantages of using a PARAFAC-based method for classification of estuarine samples based on the total luminescence of CDOM. Classification of samples according to their geographical origin can be of practical use in many ways, for example, for possible fingerprinting of ships’ ballast water. To mitigate the introduction of non-indigenous aquatic nuisance species (ANS), the U.S. has implemented ballast water exchange (BWE) regulations requiring ships to exchange the water in their ballast tanks mid-ocean [8,9]. The U.S. Coast Guard is charged with the enforcement of those regulations. A forensic tool that could quantifiably confirm the location of origin of water found in the ballast tanks of a ship suspected of non-compliance with BWE requirements would be useful for enforcement activities. Our group, as well as Murphy et al., has been developing chemometrics-based fluorescence techniques to “fingerprint” ballast water according to its CDOM fluorescence [1,7]. Their approach has been to try to reduce the data set to the minimum number of fluorescence channels necessary for discrimination [7]. Ours has been to develop methods that treat the EEM data holistically,


utilizing relationships between fluorescent factors as an important consideration [1]. Our earlier work used N-way partial least squares with discriminant analysis (NPLS-DA) to describe the observed fluorescence using factors chosen expressly for their value in providing quantitative discrimination among classes (locations of origin). These factors do not necessarily contain direct information relevant to chemical identities of the fluorescent species present in the samples. In contrast, this paper demonstrates an alternative approach to classification which describes the observed fluorescence using factors with chemical significance. This PARAFAC-based treatment of the data is resilient in the face of interferents, and can still provide classification with a quantifiable degree of certainty. PARAFAC is a deconvolution algorithm that solves for a number of factors that are contributing to the data, in this case, the EEMs of each sample. Because of the multi-way nature of the data, and the particular constraints of the PARAFAC model, the solution is unique [10–12]. What this means in a practical application is that, ideally, the loadings of each factor in each mode represent a pure component contribution to the fluorescence of the mixture. (The fluorescent components recovered by PARAFAC may actually represent discrete species, covarying species, interacting pairs or sets of species, or instrumental artifacts. The number of components found is therefore only approximately equal to the actual number of fluorescing chemical species.) This technique is especially useful in environmental applications where pure component standards are not available or practicable (Fig. 1). EEMs are data that are described as “multi-way”, because there is more than one independent variable. In the case of an EEM, the fluorescence is collected as a function of both excitation and emission wavelengths. 
This creates a data set with three dimensions: excitation wavelength, emission wavelength, and intensity. An EEM is considered to be “two-way” since there are two independent variables, excitation and emission wavelengths, and one dependent variable, intensity [13]. A group of EEMs, each from a different sample, can be “stacked” into a single data set which would then be “three-way” data. Having a data set that is three-way or more is especially advantageous, because advanced chemometric techniques such as PARAFAC can be used. There are several good reviews and tutorials in the literature that outline the proper way to apply PARAFAC to EEM fluorescence [10,14]. PARAFAC of CDOM EEMs has been studied extensively by our group and others [1,7,15]. PARAFAC models generated in these studies find anywhere between three and nine factors present in the fluorescence of estuarine CDOM. Our group has also shown that the multi-way EEM data contains information that can be used to “fingerprint” it according

Fig. 1. A graphical representation of a three-factor PARAFAC model. Each factor r has loading vectors a_r, b_r, and c_r, one in each dimension of the data array X. E contains the residuals.


to location of origin. This previous work has, however, been based on NPLS algorithms, which do not have loadings that are interpretable as chemical species or classes of species as PARAFAC factor loadings are. Instead, NPLS-DA “chooses” vectors, or “latent variables” in spectroscopic space that discriminate between classes, but may be comprised of intensity of fluorescence from a combination of sources. While the method is extremely useful in performing discriminations between classes that are very similar, even different locations within a single estuary, the models are difficult to interpret. It is especially difficult to answer the question of what chemical difference there is between classes. Since PARAFAC factors are interpretable as the contribution of a compound or class of compounds to the fluorescent signal, a classification method based on PARAFAC has advantages. For example, a PARAFAC model would be easier to explain to a layperson, legal expert, judge or jury. Since PARAFAC analysis will produce more than one factor, one must still generate a way to determine if a sample is a member of a particular class or not after the fitting of a PARAFAC model. We have decided to use another well known multivariate tool, soft independent model of class analogy (SIMCA) which is based on principal components analysis (PCA), to classify water samples geographically based on the PARAFAC results. Specifically, a PARAFAC model was generated using EEMs of all samples from all three sites studied; the optimal model (see below) consisted of five factors. Each sample could then be described by a point or vector in five-dimensional space, each component of which was the loading for one of the factors found by PARAFAC. For reasons given below, factor 5 was not utilized in the classification scheme, reducing the PARAFAC results to four-dimensional points or vectors. These four-vectors were analyzed by PCA. 
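For reference, the three-way model pictured in Fig. 1 is commonly written element-wise as shown below. This is the standard trilinear PARAFAC form rather than an equation quoted from this paper; the index names, the symbol R for the number of factors (R = 5 for the model described above), and the assignment of the b and c loadings to the emission and excitation modes are notation assumed here for illustration.

$$x_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} + e_{ijk}$$

Here x_{ijk} is the fluorescence intensity of sample i at emission wavelength j and excitation wavelength k, a_{ir} is the score (sample mode loading) of sample i on factor r, b_{jr} and c_{kr} are the emission and excitation loadings of factor r, and e_{ijk} is the residual. Dropping factor 5 before classification corresponds to keeping only a_{i1}, ..., a_{i4} for each sample.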
PCA is a method wherein multivariate data are described by orthogonal vectors, or principal components (PCs), that are selected to capture the maximum variance in the data. The first PC, PC1, always describes the “direction” in the data that explains the most variance, while the second PC, PC2, describes the direction orthogonal to PC1 that explains the second most variance, and so on until the dimensionality of the data is reached. That is, depending on the distribution of sample points in n-space (n = 4 here), PCA will require k principal components (where k ≤ n) to describe them. The set of all sample points from a given geographical location should “cluster” near each other, providing the basis for a classification scheme. The sample points from each location were fit with a separate PCA model, which should be characteristic of that location. Then each of the samples was tested to see whether it matched each of the PCA models. How a sample point compares to a particular PCA model is described by two indices, the Q residual and Hotelling’s T². The Q residual is the squared distance between a sample point in n-space and its projection onto the k-dimensional subspace of the model, and can be seen as the variation of the sample outside of the model. T² can be viewed as the squared distance from a sample’s projection onto the k-dimensional subspace to the centroid of the subspace, or, more simply, the variation of the sample point within the model. See Fig. 2a. The 95% confidence


Fig. 2. (a) Two-component PCA model with graphical representation of the Q fit statistic and Hotelling’s T². Point i has a high Q value with a T² within the confidence interval. Sample j has a high T² value with a Q within the confidence interval. (b) SIMCA model of three classes. Each class is defined by a separate PCA model. Each class may have a different number of PCs and calibration samples. Point k has a distance to each model: x, y, and z.

intervals can be calculated for both indices, and the ratios of a sample’s Q and T² to the 95% values are the reduced Q and T² (Q_r and T²_r) (Eq. (1.1)) [16]. The overall distance of a sample to the grouping as a whole is based on these reduced values and is measured by d_ij (Eq. (1.2) and Fig. 2b), where, in this work, i = sample number (from 1 to 18) and j = geographical location (Boston = 1, Baltimore = 2, Sturgeon Bay = 3). It is this distance that is used in this study to determine whether a sample is a member of a group or not. This method of utilizing PCA class modeling is essentially the soft independent modeling of class analogy method, or SIMCA.

$$Q_{r,i} = \frac{Q_i}{Q_{0.95}}, \qquad T_{r,i}^2 = \frac{T_i^2}{T_{0.95}^2} \tag{1.1}$$

$$d_{ij} = \sqrt{(Q_r)^2 + (T_r^2)^2} \tag{1.2}$$
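The quantities in Eqs. (1.1) and (1.2) can be illustrated with a minimal Python sketch. It is not the PLS Toolbox implementation used in this work. The function names, the toy one-component model, and the limit values are hypothetical, and the 95% limits are taken as given inputs because their calculation is not described here. Scaling T² by the variance captured by each PC is the usual Hotelling definition and is an assumption about the details.

```python
import math


def q_and_t2(x, mean, loadings, eigenvalues):
    """Return (Q, T2) for one sample against one PCA class model.

    loadings: k orthonormal loading vectors, each of length n (assumed given).
    eigenvalues: variance captured by each PC, used to scale T2 (assumption).
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # Scores: projection of the centered sample onto each loading vector.
    scores = [sum(c * p for c, p in zip(centered, pc)) for pc in loadings]
    # Reconstruction of the sample inside the k-dimensional model subspace.
    recon = [sum(t * pc[j] for t, pc in zip(scores, loadings))
             for j in range(len(x))]
    # Q: squared distance from the sample to its projection (outside the model).
    q = sum((c - r) ** 2 for c, r in zip(centered, recon))
    # T2: variance-scaled squared distance from the projection to the centroid.
    t2 = sum(t * t / lam for t, lam in zip(scores, eigenvalues))
    return q, t2


def reduced_distance(q, t2, q_lim, t2_lim):
    """Eqs. (1.1) and (1.2): reduce Q and T2 by their limits, then combine."""
    q_r = q / q_lim
    t2_r = t2 / t2_lim
    return math.hypot(q_r, t2_r)


# Hypothetical one-PC model in three dimensions, for illustration only.
q, t2 = q_and_t2(x=[2.0, 1.0, 0.0], mean=[0.0, 0.0, 0.0],
                 loadings=[[1.0, 0.0, 0.0]], eigenvalues=[4.0])
d = reduced_distance(q, t2, q_lim=2.0, t2_lim=2.0)
print(q, t2, d)
```

In this toy case the sample lies one unit off the model axis (Q = 1) and one standard deviation along it (T² = 1). With both limits set to 2, the reduced values are 0.5 each and the distance is below 1, so the sample would count as a member of the class.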

2. Experimental 2.1. Estuarine sampling The water samples were collected by volunteers at three locations across the country during 2003 and 2004. These locations were chosen to span different climates and environments, subject to the availability of volunteers. The three ports where appreciable numbers of samples were collected were Boston, Baltimore, and Sturgeon Bay. These locations are shown in Fig. 3. Six samples from each location were used in this study. The samples were collected by surface grab or Van Dorn-style sampling bottle from docks or the shoreline, transferred to 125 mL light-protected Nalgene screw-cap bottles, and immediately frozen for long-term preservation. After receipt of the samples at Tufts, samples were vacuum filtered with 0.22 µm nylon membrane filters (Millipore). 2.2. Spectroscopy All fluorescence measurements were made using a dual beam Varian Cary Eclipse Fluorescence Spectrophotometer with a

Xenon flashlamp. Samples were held in standard 1 cm quartz cuvettes (Hellma). EEMs were gathered by collating emission spectra at a range of excitation wavelengths. Emission spectra were scanned from 220 to 600 nm in 1 nm steps. The excitation wavelengths were stepped by 10 nm from 220 to 600 nm to produce EEMs of dimension 381 × 39. Slit widths (5 nm resolution) and lamp voltage (600 V) were held constant in all spectra. Spectra were then corrected for emission spectrometer bias by the method described by Melhuish [17], employing a quartz diffuse scatterer and concentrated (8 g L⁻¹ in ethylene glycol) Rhodamine B. Spectra were additionally corrected across the excitation axis for lamp source intensity by normalizing to the Rhodamine B excitation spectrum measured at 660 nm emission under identical instrumental conditions. Inner filter effect correction was accomplished as described by Patterson and coworkers [18] based on beam profile information provided by Varian. In practice, all corrections were applied by a MATLAB program written in-house [19]. 2.3. Data pre-treatment Rayleigh and Raman scattering signals in the EEMs cause problems for the PARAFAC algorithm because they are not bilinear [14]. There are several options for mitigating these effects, ranging from subtraction of a blank to weighting schemes that make the algorithm ignore those areas. We chose to set the areas affected by scattering to NaN, or ‘not a number’, indicating missing data in the data set. These regions were then filled with expectation values by the PARAFAC algorithm [14,20]. 2.4. PARAFAC PARAFAC modeling was accomplished through the use of PLS Toolbox 3.5® (Eigenvector Research Inc., Manson, WA) [16]. Jackknifing results were obtained using Bro et al.’s jkparafac program [21]. Both programs are subroutines for the MATLAB 7.2® technical computing program (Mathworks Inc.


Fig. 3. Sampling locations. (a) Locations Boston, Baltimore, and Sturgeon Bay with respect to each other. (b) Sampling location for Boston Harbor. (c) Sampling location for Baltimore. (d) Sampling location for Sturgeon Bay.

Cambridge, MA). The three-way PARAFAC model was fit to the entire dataset as a whole. Singular value decomposition (SVD) was used for the initialization of loadings for the initial model. A CORCONDIA [22] analysis was performed for a PARAFAC model of the data to determine the correct number of factors. The loadings from the initial model were used for the initialization of the model after removal of an outlier. A change of 1 × 10⁻⁶ percent in the residuals was used as the stop criterion for convergence, and a non-negativity constraint was imposed on all modes. In the final model the outlier was projected onto the PARAFAC model to obtain scores based on the factors found in the other 17 samples. The calculations were performed on a Dell Dimension® desktop computer with a 3 GHz Intel Pentium 4® processor and 4 GB of RAM. 2.5. SIMCA The scores (sample mode loadings) for every sample for each of the first four factors found in PARAFAC were imported as three different 6 × 4 matrices (one for each geographical location). To avoid any one factor having more influence on the class definition, the scores from each factor were normalized as a fraction of the overall sum of scores from that factor. Each of these matrices was modeled with a PCA model of 2–3 PCs (as required to capture ≥99% of the variance) and the model distance described in Eq. (1.2) was found for each sample to the model of each of the three locations. These distances were compared with the 95% confidence interval (a distance of 1). If the

sample distance was less than the 95% confidence interval it was considered a “member” of that location; if not then it was considered “not a member”. 3. Results and discussion 3.1. PARAFAC CORCONDIA analysis showed a large drop in core consistency between five core elements and six, from near 100% to near zero, indicating that a five-factor model was appropriate. Jackknifing results showed one outlier from the Baltimore location, and it was removed from the sample set used to form the PARAFAC model. Fig. 4 shows the spectral loadings in all modes for all factors in the resulting PARAFAC model. The spectral loadings of these factors show similarity to fluorescent factors found in previous studies [1,15,23]. Factors 1 and 2 are similar in appearance to humic acids noted by others to be of terrestrial or fresh water origin [7,23,24]. Factor 3 has the excitation and emission maxima that correspond with humic acids of marine origin [7,23,24]. Factor 4 appears to be contributions from amino acids dissolved in the water samples. It has been shown in other work that tyrosine and tryptophan are fluorescent amino acids that are introduced into estuarine water by microbial activity [24,25]. There is no available assignment for factor 5. Some of the sharp peaks that appear in this factor are similar to effects seen in the literature and attributed to unmitigated scatter in the data [14]. Others have attributed


Fig. 4. Spectral mode loadings for all five components found by PARAFAC analysis. Dotted lines are excitation mode loadings, solid lines are emission mode loadings. (a) Factor 1 a humic acid, (b) factor 2 a humic acid, (c) factor 3 a marine humic acid, (d) factor 4 contributions from amino acids, and (e) factor 5 unassigned.

unknown factors to possible contamination [7]. Since there was no convincing assignment for this factor, it was decided not to use the scores for this factor in the classification step. Fig. 5 shows the sample mode loadings (scores) for all factors in all samples. The sample scores shown in Fig. 5 illustrate why a PARAFAC-based classification scheme is possible. Fig. 5a shows the scores from each factor for each sample. Boston is an example of a port located fairly close to the Atlantic Ocean, and can be expected to have influences from both the fresh water and marine environments. Baltimore is located on the Chesapeake Bay, far removed from the Atlantic, but still brackish. Sturgeon Bay is a fresh water port. Inclusion of a Great Lakes port was important since fresh water ports have historically suffered great impact from the introduction of non-indigenous species. The zebra mussel problem in the Great Lakes is a prime example [8]. The differences between the water EEMs from the different locations are not immediately obvious. Differences in the data do become more apparent when the scores of the different samples are plotted against each other, as shown in Fig. 5b. In these plots filled shapes represent the data point that was eliminated after being identified as an outlier by the jackknifing method. For this sample, the EEM was projected onto the PARAFAC model to obtain scores for each of the identified factors. It is difficult or impossible to visualize the data grouping in five dimensions; additionally, the range of values for each factor is not the same.


Fig. 6. Distances of samples to the Boston model. A distance of 1 represents the 95% confidence interval. Inset: Boston samples (1–6) and those closest in distance to the Boston model (7 and 8 from Baltimore). No false positives were observed.

Fig. 5. (a) Sample mode scores for all samples in the PARAFAC model for all four factors used for classification. Factor 1 in blue, factor 2 in green, factor 3 in red, factor 4 in cyan. Filled shapes represent the projection of the outlier sample (sample 7 from Baltimore) onto the PARAFAC model. (b) Sample mode scores for all samples in the PARAFAC model for factors 2, 3 and 4. Boston in green stars, Baltimore in blue up triangles, and Sturgeon Bay in red left triangles. The shaded Baltimore sample is the outlier, later projected onto the PARAFAC model. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Therefore a method for describing a multidimensional region where we expect a sample point to lie is also needed. PCA is perfect for this task.
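The score normalization and membership rule described in Section 2.5 can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in this work. The function names and all numbers are hypothetical, and normalizing over the scores of all samples (rather than within each class) is one reading of “the overall sum of scores from that factor”.

```python
def normalize_scores(scores):
    """Scale each factor (column) so its scores sum to 1 over all samples.

    scores: list of rows, one row of PARAFAC scores per sample.
    """
    n_factors = len(scores[0])
    totals = [sum(row[f] for row in scores) for f in range(n_factors)]
    return [[row[f] / totals[f] for f in range(n_factors)] for row in scores]


def classify(distances, limit=1.0):
    """Return every class whose model distance is below the limit.

    distances: mapping of class name to the distance d_ij of one sample.
    A sample may belong to several classes or to none.
    """
    return [name for name, d in distances.items() if d < limit]


# Hypothetical scores for two samples on two factors with different ranges.
normalized = normalize_scores([[2.0, 10.0], [6.0, 30.0]])
print(normalized)

# Hypothetical distances of one sample to the three class models.
members = classify({"Boston": 0.4, "Baltimore": 0.9, "Sturgeon Bay": 3.2})
print(members)
```

After normalization both factors span the same range, so neither dominates the class model. The second call shows a sample that falls within the limit of two class models at once, the situation in which a false positive arises.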

3.2. Classification SIMCA is a classification algorithm based on producing a PCA model for each class of samples and then comparing each sample’s distance, as described in Eq. (1.2), to the class confidence limits. If a point lies within that confidence limit it is called a member of the class. This method has the additional advantage that a sample can be a member of one or more classes as well as no class at all. The closest class can also be calculated for samples that are not a member of any class. The motivation for development of a classification scheme based upon PARAFAC was to create a method of classification that was not based on abstract regression vectors, as in NPLS-DA, but rather on chemically interpretable factors, and where factors deemed to be interferents could be removed from consideration. SIMCA was performed for the samples from each port based on their normalized PARAFAC scores for each factor. The PCA model for each class ranged between 2 and 3 PCs and 99–100% variance captured. The distances for each sample are shown in Figs. 6–8. Since each class has its own distribution, the distance of each sample to each class must be calculated. The three figures represent those distances. These figures show, first, that all samples classify correctly into their own classes (locations). This should not be surprising since these samples were used as the training set. The more significant result is that, with one exception (sample 6 from Boston), no sample from any other location in the set is classified incorrectly into another class. It should be noted that this classification scheme is based on uninformed data, i.e., the PARAFAC algorithm is uninformed with respect to which samples are from which classes. The PCA models are based on the samples from a known class, but are uninformed about the similarity or dissimilarity of the samples from the other locations. This means that samples that had the same PARAFAC scores, and therefore the same underlying fluorescence, would still classify within those same locations. In addition, using SIMCA, a sample can be a member of two or more classes, or a member of none of the classes modeled. If there were no fundamental difference in the PARAFAC loadings between two classes, their PCA models would fall right on top of each other, and all samples in both classes would classify well into both locations. With all these considerations in mind, the near-zero error classification is impressive, and lends credibility to the postulate that there are real fluorescence differences between these samples that can be attributed to chemically interpretable factors. Furthermore, the success of the classification of sample 10, which was not included in the model fit but did classify into the correct location, is evidence that a focused training set can accommodate samples that are closely related, but with some anomalies, e.g., a sample from a ballast tank but from the same port.

Fig. 7. Distances of samples to the Baltimore model. A distance of 1 represents the 95% confidence interval. Inset: Baltimore samples (7–12) and the closest in distance to the Baltimore model (6 from Boston). Sample 6 was the only false positive.

Fig. 8. Distances of samples to the Sturgeon Bay model. A distance of 1 represents the 95% confidence interval. Inset: Sturgeon Bay samples (13–18) and the closest in distance to the Sturgeon Bay model (5 from Boston). No false positives were observed.

4. Conclusions This work has shown that a classification method that outputs a single measure of fit based on chemically interpretable factors determined by PARAFAC is possible. Furthermore, it has shown the usefulness of PARAFAC-based classification in that factors that are not assignable, or are assigned as interferences, can be excluded from the analysis, and therefore have no effect on the outcome. It has shown that there are significant differences in the fluorescent factors from different ports within the U.S., where previous work has focused on international differences. The success of a method that couples PARAFAC with SIMCA is not specific to this application, but applicable to any EEM or multiway data based classification, such as petroleum fingerprinting. This work is the first such coupling known to us. We acknowledge the fairly limited “training” set of data included in this study; however, an actual operational situation would not allow for a large data set, but rather a limited number of samples gathered from the suspect ports compared with a sample or samples from a ballast tank. Therefore the number of samples represented in this study is a reasonable approximation of a forensic situation. Future work utilizing this method needs to be undertaken to measure the fluorescent changes that occur to water in an actual ballast tank over time, and to compare the differences between samples from the same port to samples from different ports.

Acknowledgements

Special thanks go to Andrew Hall, Christopher Lagan, Alison Dreisch, and Melissa Williams for their help gathering samples and to the scientists at Eigenvector Research Inc. for their help with chemometric algorithms. We would also like to thank our funding agencies: NOAA under SeaGrant #NA16RG225, the U.S. Coast Guard Academy Alexander Trust, and internal Tufts University funding. The research described herein does not necessarily reflect the position of the U.S. Coast Guard and no official endorsement should be inferred.

References
[1] G.J. Hall, K.E. Clow, J.E. Kenny, Environ. Sci. Technol. 39 (2005) 7560–7567.
[2] A. Baker, Environ. Sci. Technol. 35 (2001) 948–953.
[3] A. Baker, Environ. Sci. Technol. 36 (2002) 1377–1382.
[4] A. Baker, Hydrol. Process 16 (2002) 3203–3213.
[5] A. Baker, J. Lamont-Black, Groundwater 39 (2001) 745–750.
[6] Y. Yan, H. Li, M.L. Myrick, Appl. Spectrosc. 54 (2000) 1539–1542.
[7] K.R. Murphy, G.M. Ruiz, W.T.M. Dunsmuir, T.D. Waite, Environ. Sci. Technol. 40 (2006) 2357–2362.
[8] A.M. Beeton, Environ. Conservat. 29 (2002) 21–38.
[9] K. Murphy, G. Ruiz, Identification and Testing of Ballast Water, U.S. Coast Guard, 2000.
[10] R. Bro, Chemometr. Intell. Lab. Syst. 38 (1997) 149–171.
[11] N.D. Sidiropoulos, R. Bro, J. Chemometr. 14 (2000) 229–239.
[12] J.M.F. Ten Berge, N.D. Sidiropoulos, Psychometrika 67 (2002) 399–409.
[13] A. Smilde, R. Bro, P. Geladi, Multi-way Analysis, John Wiley & Sons Ltd., West Sussex, 2005, p. 381.
[14] C.M. Andersen, R. Bro, J. Chemometr. 17 (2003) 200–215.
[15] C.A. Stedmon, S. Markager, R. Bro, Mar. Chem. 82 (2003) 239–254.
[16] B.M. Wise, N.B. Gallagher, R. Bro, J.M. Shaver, W. Windig, R.S. Koch, PLS Toolbox Version 3.5, Eigenvector Research Inc., Manson, WA, 2004.
[17] W.H. Melhuish, J. Opt. Soc. Am. 52 (1962) 1256.
[18] B.C. MacDonald, S.J. Lvin, H. Patterson, Anal. Chim. Acta 338 (1997) 155–162.
[19] G.J. Hall, Chemometric Characterization and Classification of Estuarine Samples by Multidimensional Fluorescence, Ph.D. Dissertation, Tufts University, 2006.
[20] R.D. JiJi, K.S. Booksh, Anal. Chem. 72 (2000) 718–725.
[21] C.A. Andersson, R. Bro, Chemometr. Intell. Lab. Syst. 52 (2000) 1–4.
[22] R. Bro, H.A.L. Kiers, J. Chemometr. 17 (2003) 274–286.
[23] P.G. Coble, Mar. Chem. 51 (1996) 325–346.
[24] P.G. Coble, C.E. Del Castillo, B. Avril, Deep-Sea Res. II 45 (1998) 2195–2223.
[25] K. Sommerville, T. Preston, Rapid Commun. Mass Spectrom. 15 (2001) 1287–1290.