Vibrational Spectroscopy 60 (2012) 43–49
Histological imaging of a human colon polyp sample using Raman spectroscopy and self organising maps
Gavin Rhys Lloyd a, James Wood a,c, Catherine Kendall a,b, Tim Cook c, Neil Shepherd b,d, Nick Stone a,b,∗
a Biophotonics Research Unit, Leadon House, Gloucestershire Royal Hospital, Great Western Road, Gloucester, GL1 3NN, UK
b Cranfield Health, Vincent Building (52a), Cranfield University, Cranfield, Bedfordshire, MK43 0AL, UK
c Department of Colorectal Surgery, Gloucestershire Royal Hospital, Great Western Road, Gloucester, GL1 3NN, UK
d Gloucestershire Cellular Pathology Laboratory, Cheltenham General Hospital, Sandford Road, Cheltenham, GL53 7AN, UK
Article info
Article history: Received 18 August 2011; received in revised form 24 February 2012; accepted 27 February 2012; available online 6 March 2012
Keywords: Self organising maps; Raman spectroscopy; Histological imaging; Multivariate image analysis; Chemometrics
Abstract
Raman spectroscopy has previously been identified as a suitable technique for the analysis of biological samples, and recent technological advancements have allowed more rapid forms of spectral mapping to be undertaken. This promising approach potentially allows more biochemical information to be obtained than would normally be possible using a standard chemical stain; however, specialised techniques are required to analyse the hyperspectral datasets acquired. In this work Self Organising Maps (SOM) are applied in combination with Principal Component Analysis (PCA) to analyse a single human colon polyp sample containing varying histological features. It is demonstrated that using SOM to compress the data is robust to outlying spectral features, allowing greater contrast in the image for the identification of subtle features. As a secondary outcome, the SOM method also provides an alternative visualisation approach that can be used to identify regions of histological interest within a sample. © 2012 Published by Elsevier B.V.
1. Introduction
Colorectal cancer is the third most common cancer in men and second most common in women in the UK, with approximately 40,000 new cases in 2008 [1]. It is the third most common cause of cancer related death. Most colorectal cancers develop from pre-existing adenomatous polyps through the adenoma-adenocarcinoma sequence [2]. Polyps are abnormal growths rising from the lining of the large intestine (colon) and protruding into the intestinal canal (lumen). Early detection and treatment of dysplastic colorectal lesions improves survival [3] and provides justification for screening programmes [4]. Several emerging optical techniques potentially have valuable roles in the detection, diagnosis and staging of colorectal neoplasia [5]. The ability of Raman spectroscopy to identify and classify malignant changes has been demonstrated [6]. Raman spectroscopy utilises the inelastic scattering of light to interrogate tissues and provide a characteristic biomolecular fingerprint of the sample [7]. Polyps are classified into two types: adenomatous polyps (adenomas) and hyperplastic polyps. Adenomas are the precursor lesions for colon cancer. The more common hyperplastic polyps
∗ Corresponding author at: Biophotonics Research Unit, Leadon House, Gloucestershire Royal Hospital, Great Western Road, Gloucester, GL1 3NN, UK. Tel.: +44 8454 225486. E-mail address:
[email protected] (N. Stone). 0924-2031/$ – see front matter © 2012 Published by Elsevier B.V. doi:10.1016/j.vibspec.2012.02.015
are benign and, in most circumstances, are not considered to be pre-malignant. A definitive distinction between the two types requires polyp removal and microscopic examination by a surgical pathologist [8]. Raman spectroscopy could be a powerful tool for the endoscopist and pathologist aiding the differentiation of benign and malignant polyps, which relies on the visualisation of histological features such as cellular components and tissue architecture of the sample. However, collecting a small number of Raman spectra from a biological sample can result in spectra that do not directly correspond to all histological features present. Because of this there has been increasing interest in the development of Raman mapping techniques so that multiple spectra of each histological feature in the sample can be obtained as well as spectra representing the interface between features [9,10]. Mapping using other types of spectroscopy, such as infra-red (IR), have also been identified as potentially useful tools for diagnosis [11] and the methods described in this work are also applicable to maps containing spectra from a different source. It is common when analysing a standard image (e.g. a digital photograph) to store the data in a compressed form. Since hyperspectral images potentially contain significantly more data than a standard image with an identical number of pixels, it is a logical step to attempt some form of compression in order to store the images. This is particularly important in remote sensing applications, where a large number of images might be collected and storage space may be limited. Several techniques for compressing hyperspectral images have previously been described [12–14], but
G.R. Lloyd et al. / Vibrational Spectroscopy 60 (2012) 43–49
are almost exclusively applied to remote sensing applications. In other areas, e.g. chemical analysis [15–17] and biomedical applications [9,11,18–21], which is the focus of the work presented here, the data is either used uncompressed, or compressed using a mixture of two approaches: spatial compression techniques, such as co-addition or binning [22,23], and standard chemometrics approaches such as Principal Component Analysis (PCA) [9,24] which compress the spectral dimension. Spatial techniques can be used to enhance the signal-to-noise ratio of the spectral data, which can improve the results of further spectral analysis. However, spatial techniques often reduce the overall resolution of an image, making interpretation of image features such as the boundaries between pathologies more difficult. PCA, which can be used to compress the spectral dimension, overcomes this by reducing the spectral dimension to a smaller number of latent factors, or principal components, which can then be used to create pseudo-colour images containing only significant spectral information. The difficulty arises here when attempting to interpret scores plots that contain many thousands of data points. Specialist techniques such as brushing [25,26] can be employed, but even this can become cumbersome for very large images with complex structure. Other spectral techniques focus on the reduction of spectral information by clustering similar spectra into a single group, e.g. by fuzzy c-means clustering [27,28] or hierarchical cluster analysis [24]. However, such techniques are only really useful if the number of clusters is known in advance, which is rarely the case in, for example, pathology. 
To try and overcome some of the issues associated with the analysis of hyperspectral images for biomedical applications, the self organising map (SOM) [29] algorithm was applied as a method for compressing hyperspectral images of colon polyp samples without losing important spectral information, forcing a predefined grouping of the spectra or reducing the overall resolution of the image. SOM are a well-established neural network approach that has been employed successfully in many fields [30–32]. SOM provides several alternative methods for visualising high-dimensional data, such as Raman spectra, that are advantageous in comparison to other techniques. SOM has previously been applied to hyperspectral images in the remote-sensing field, where the algorithm was used as a clustering and classification tool for identifying similar spectral regions within the images [33]. In this article we present the SOM in combination with PCA for the analysis of hyperspectral Raman images of a human colon polyp sample. We use the SOM algorithm to reduce the total number of spectra and simultaneously enhance common spectral features, followed by PCA to extract useful spectral information pertaining to the pathology and biochemistry of the sample. In addition, the unique visualisation properties of SOM in combination with PCA were used to enhance the interpretation of the reconstructed pseudo-colour representations of the sample being investigated.
2. Experimental
2.1. Sample collection
Tissue samples were collected during routine colonoscopic surveillance, following informed written consent from the patient. Approval for this study was provided by Gloucestershire local research ethics committee. Following excision, samples were orientated on acetate paper and snap frozen in liquid nitrogen. Specimens were then sectioned with a freezing-microtome, without the addition of any contaminating fixing agents. 20 micron thick sections of colon tissue were placed on UV-grade calcium fluoride (CaF2) slides (Crystran, UK) for Raman mapping. Contiguous tissue sections were stained with Haematoxylin & Eosin (H&E) for analysis
Fig. 1. H&E stained image of a human colon polyp section.
by a consultant histopathologist to identify the regions and degree of disease within the samples. The H&E image of a contiguous section of the colon polyp analysed by Raman spectroscopy for this work is presented in Fig. 1.
2.2. Rapid Raman imaging
Raman spectral imaging has improved significantly over recent years, with the introduction of synchronous readout techniques, such as the Renishaw StreamLine™ approach [9]. These techniques have reduced bottlenecks and time delays to a minimum; however, the number of photons generated is a major limiting factor. For example, reducing the time taken to image an area by shortening the spectral acquisition time per pixel also reduces the signal-to-noise ratio and therefore the quality of the spectral information available. Alternatively, increasing the pixel size may improve spectral quality but would reduce the spatial information in the image. For this work the Raman spectra from a human hyperplastic colon polyp sample were collected using a Renishaw Raman System 1000® spectrometer optimised to provide rapid acquisition of high quality tissue spectra, with a single 300 lines per mm grating and 830 nm excitation. The spectrometer was connected to a microscope fitted with a Leica ×50 (NA 0.5) long working distance objective lens and a motorised stage to allow micro positioning of small tissue samples. A step size of 2.1 μm and a whole laser line acquisition time of 60 s (equivalent to a 2.4 s per point spectrum acquisition time) were used to collect a 226 × 267 pixel Raman image of a selected region of interest. A small number of pixels (233) contained saturated spectra due to fluorescence and so were excluded, leaving 60,109 spectra to be analysed in total. The wavenumber range measured was from 560 to 1890 cm−1 in 393 steps. The spectrometer was initially calibrated using a neon–argon (Ne–Ar) lamp emission spectrum and daily checks were undertaken using silicon to calibrate the wavenumber axis.
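The acquisition figures quoted above can be cross-checked with a short piece of arithmetic. This is only a consistency check on the numbers stated in the text; the number of points per laser line is inferred from the two quoted acquisition times rather than stated by the authors.

```latex
\begin{align*}
226 \times 267 &= 60\,342 \ \text{pixels in the mapped region},\\
60\,342 - 233 &= 60\,109 \ \text{spectra retained after excluding saturated pixels},\\
\frac{60\ \text{s per line}}{2.4\ \text{s per point}} &= 25 \ \text{points per laser line (inferred)}.
\end{align*}
```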
The power of the laser at the sample was approximately 100 mW with a laser spot size of 3 μm × 50 μm. The spectral resolution of the system used here is approximately 10 cm−1 based on the full-width half-maximum of peaks in the emission spectrum of a Ne–Ar lamp.
2.3. Spectral preprocessing
Raman spectroscopy is susceptible to interference from cosmic rays, which, in the vast majority of cases, manifest as sharp spikes in the data and are uncharacteristic of the sample being measured. Due to the large number of Raman spectra being obtained during the mapping of a sample, and the length of time required to acquire them, thousands of cosmic ray spikes may be present in the
spectra of a single Raman image. It is therefore crucial either to remove or to replace them in order to prevent them unduly influencing the data analysis. In this study cosmic rays were filtered from the images by applying a two-dimensional median filter with a 3 × 3 window to each wavenumber. For the 3 × 3 window used, the centre pixel is replaced with the median value of all pixels in the window for the chosen wavenumber. Since the filter is applied to every window, including those that do not contain a cosmic ray, this results in a decrease in resolution; however, due to the large number of pixels collected for the images in this study, the loss of resolution was not deemed to be significant. For images with a smaller total number of pixels an alternative approach may be required. All spectral processing and subsequent image analysis was carried out in Matlab 2007a (The MathWorks Inc., Natick, Massachusetts, USA) using in-house routines.
3. Multivariate image analysis (MIA)
additive scattering as well as the sensitivity of the instrument to different wavelengths. The EMSC model used here is described by Eq. (2), where a signal vector $\mathbf{x}$ is modelled as a function of a reference spectrum $\mathbf{r}$, wavenumber $\mathbf{v}$, and a baseline spectrum $\mathbf{g}$:
$$\mathbf{x} = [\mathbf{r}\ \mathbf{v}\ \mathbf{g}]\,\mathbf{c} \qquad (2)$$
where $\mathbf{c} = [c_r\ c_v\ c_g]$ contains coefficients estimated for each basis spectrum by least squares. These coefficients and the basis spectra are used to correct the signal vector according to Eq. (3), which filters out the undesired effects:
$$\mathbf{x}_{\mathrm{corr}} = (\mathbf{x} - \mathbf{v}c_v - \mathbf{g}c_g)/c_r \qquad (3)$$
In this work the mean of all SOM weight spectra was used as the reference spectrum. The baseline spectrum used was that of green glass [35,36], which for the wavelength of light used (830 nm) results in a known fluorescence spectrum. This broadband spectrum is representative of the baseline signal from the instrument.
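The correction in Eqs. (2) and (3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' Matlab routine: the function name `emsc_correct` and its arguments are hypothetical, and the wavenumber axis is divided by its maximum purely for numerical conditioning, which rescales $c_v$ but leaves the fitted model unchanged.

```python
import numpy as np

def emsc_correct(spectra, reference, baseline, wavenumbers):
    """Apply x_corr = (x - v*c_v - g*c_g) / c_r to every row of `spectra`.

    spectra:     (n_spectra, n_vars) array, e.g. the SOM weight spectra
    reference:   (n_vars,) reference spectrum r (e.g. the mean SOM spectrum)
    baseline:    (n_vars,) baseline spectrum g (e.g. a green glass spectrum)
    wavenumbers: (n_vars,) wavenumber axis v
    """
    spectra = np.asarray(spectra, dtype=float)
    # Scaling v is an assumption made here for conditioning only.
    v = np.asarray(wavenumbers, dtype=float) / np.max(wavenumbers)
    basis = np.column_stack([reference, v, baseline])  # [r v g], shape (n_vars, 3)
    # Least-squares estimate of c = [c_r, c_v, c_g] for every spectrum at once.
    coefs, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)  # shape (3, n_spectra)
    c_r, c_v, c_g = coefs
    corrected = (spectra - np.outer(c_v, v) - np.outer(c_g, baseline)) / c_r[:, None]
    return corrected, coefs.T
```

Because all spectra share the same three basis vectors, a single least-squares call fits every spectrum simultaneously; no constant offset term is included, matching the three-term model as printed.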
3.1. Self organising maps (SOM)
After the application of median filtering to each wavenumber, the collected Raman spectra were compressed using the SOM algorithm, which is a popular neural network based approach to unsupervised pattern recognition. The SOM algorithm aims to represent the samples on a low dimensional grid, or map, whilst preserving the relative distance between samples as far as is possible. To achieve this, the map is trained by iteratively updating a set of weights assigned to each grid unit. Throughout the training process sample vectors are selected and the grid unit with the most similar weight vector is determined. The winning grid unit and its neighbours within a specific ‘neighbourhood width’ are then updated according to Eq. (1), where $\mathbf{w}_k$ is the weight vector for unit $k$, $\mathbf{x}_t$ is the current sample, $\alpha_t$ is the learning rate and $n_t$ is the neighbourhood weight for iteration $t$.
$$\mathbf{w}_k = \mathbf{w}_k + \alpha_t n_t (\mathbf{x}_t - \mathbf{w}_k) \qquad (1)$$
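The update rule in Eq. (1) can be illustrated with a small training loop. This is a sketch under stated assumptions, not the authors' in-house implementation: the Gaussian neighbourhood function, the linear decay schedules, the initial learning rate and the random initialisation are all choices made here for illustration, and the name `train_som` is hypothetical.

```python
import numpy as np

def train_som(spectra, grid_shape=(20, 30), n_passes=5, alpha0=0.5, seed=0):
    """Train a rectangular SOM using w_k = w_k + alpha_t * n_t * (x_t - w_k)."""
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    n_samples = spectra.shape[0]
    rows, cols = grid_shape
    # Grid coordinates of every map unit, used to measure distances on the map.
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    # Assumed initialisation: weights start as randomly chosen spectra.
    weights = spectra[rng.integers(0, n_samples, size=rows * cols)].copy()
    n_iter = n_passes * n_samples
    sigma0, sigma_end = max(rows, cols) / 2.0, 0.5
    for t in range(n_iter):
        x = spectra[t % n_samples]                  # cyclical presentation of samples
        remaining = 1.0 - t / n_iter
        alpha = alpha0 * remaining                  # learning rate decays with t
        sigma = sigma_end + (sigma0 - sigma_end) * remaining  # neighbourhood width decays
        # Best matching unit: the map unit with the most similar weight vector.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood weight n_t, centred on the winning unit.
        grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        n = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += alpha * n[:, None] * (x - weights)
    return weights, coords
```

Each row of the returned `weights` array plays the role of one SOM spectrum, and `coords` gives the position of the corresponding unit on the map.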
The learning rate, neighbourhood width and neighbourhood weight are gradually decreased with $t$, ensuring that the learning for each sample is localised to a region of the grid based on its similarity to other samples. At the end of training, samples are mapped according to their similarity to the grid unit weight vectors, which can then be represented graphically. This approach can also be used to project new data onto the map of a training dataset for predictive purposes. An important feature of the SOM for this work is that when the size of the grid is smaller than the number of spectra, each grid point, or map unit, becomes representative of numerous spectra. By using the reduced number of representative spectra, rather than the original spectra, the data is compressed into small, localised clusters based on spectral similarity. In addition, the relationships between spectra in different clusters are also retained. The SOM spectrum representative of each cluster can be considered an approximate average of the spectra in that cluster, with reduced levels of noise and enhancement of common features. The analysis of these SOM spectra can therefore improve any subsequent multivariate analysis because non-systematic variation in the spectra has been reduced or filtered out and the influence of outlying spectra on the analysis reduced. Hence in this work all further analysis was applied to the reduced number of SOM spectra instead of the raw spectra.
3.2. Extended multiplicative scatter correction (EMSC)
EMSC [34] is a powerful spectral preprocessing step that is used to remove complex multiplicative and additive light scattering effects. In this case undesired effects include multiplicative and
3.3. Principal component analysis (PCA)
PCA is a commonly employed technique in both chemometrics [37] and multivariate image analysis (MIA) [38] for reducing a high-dimensional dataset to a smaller number of uncorrelated Principal Components (PCs) that still describe the original data as completely as possible. This is achieved by assuming that some variables are correlated, and can therefore be combined into a single PC representing both variables without significant loss of information. A further assumption of PCA is that variables with relatively small amounts of variation (i.e. noise) are uninformative and can therefore be excluded without any detrimental effects on the overall analysis. The main equation for PCA is shown in Eq. (4), where the data matrix $\mathbf{X}$ is decomposed into a scores matrix $\mathbf{T}$ and a loadings matrix $\mathbf{P}$. The matrix $\mathbf{E}$ contains residuals after $A$ components have been selected as informative.
$$\mathbf{X} = \mathbf{T}\mathbf{P} + \mathbf{E} \qquad (4)$$
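The decomposition in Eq. (4) can be sketched with a singular value decomposition of the mean-centred data. This is an illustrative NumPy version, not the routine used in the paper; the function name `pca` is hypothetical, and the loadings are stored one component per row so that the product of scores and loadings matches Eq. (4) as printed.

```python
import numpy as np

def pca(X, n_components):
    """Return scores T, loadings P, residuals E and the column means of X.

    The mean-centred data satisfy X - mean = T @ P + E.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                           # mean-centring prior to PCA
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores, one column per component
    P = Vt[:n_components]                        # loadings, one row per component
    E = Xc - T @ P                               # residuals after A components
    return T, P, E, mean
```

Applied to the 600 EMSC-corrected SOM spectra rather than the full set of pixel spectra, the decomposition operates on a far smaller matrix, which is the computational saving the compression step is intended to provide.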
The scores matrix of PCA can be used to explore the relationships between samples, while the loadings can provide insight into the variation being explained by the currently selected components. Like most data analysis techniques, PCA is sensitive to outliers and other data artefacts and so in this work EMSC was applied to the SOM spectra to reduce the amount of spectral variation related to the instrument. This was followed by mean-centring, which is commonly applied prior to the application of PCA. Finally, PCA was used to extract relevant spectral variations and the Raman image was reconstructed by combining the PC scores with the SOM map unit locations of the spectra for each pixel in the original Raman image.
4. Results and discussion
A 20 × 30 SOM was trained using the measured Raman image by introducing the spectra to the SOM algorithm in a cyclical fashion; each pixel was used to update the map five times, resulting in over 300,000 training iterations for the image analysed here. This may seem to be a small number of updates per pixel. However, for a histological sample there are usually far fewer histologies present than spectra, so the number of training iterations per pathology is much larger than the number per pixel. A high number of map units allows for a better representation of the original data but significantly increases computation time; a smaller number of units allows a higher level of compression but is only able to retain a small number of meaningful components from the original data. A map size of 600 units was therefore selected as this allowed for a high level of compression whilst keeping the total computation time low
(approximately 1 h for the sample presented, using a desktop computer with an Intel® Xeon® E5420 (2.5 GHz) processor and 4 GB of RAM). After training the SOM for an image, the Raman spectrum for each pixel was replaced by the weights of the most similar map unit, thus compressing the images to a computationally more manageable size and reducing the amount of noise in the spectra. PCA was applied to the EMSC corrected SOM spectra. Component 1 (included in Supplementary Information S1) corresponds to the difference between the sample and the calcium fluoride (CaF2) slide on which the sample is mounted; this is not unexpected as variation due to sample pathology is likely to be much more subtle than the difference between the sample and the slide. Up to nine of the remaining principal components were thought to contain interesting biochemical information relating to the pathology of the sample; for brevity, only components two, four and seven are presented graphically here as they were thought to be the most histologically interesting. The full complement of principal component images is included in Supplementary Information S1. It is useful to compare the reconstructed images using SOM compression with those generated without compression. The corresponding principal component images for the uncompressed data are presented in Supplementary Information S2. It can be seen that some of the uncompressed images reveal similar patterns to those in the compressed images. For example, PC4 (compressed) and PC9 (compressed) both highlight similar features, as do PC2 (compressed) and PC7 (uncompressed). However, there are also some components from the uncompressed images that represent similar information, e.g. PC1 and PC2 (both uncompressed) appear to distinguish between the sample and the CaF2 slide. Furthermore, some components in the uncompressed images are dominated by noise e.g.
in the reconstructed image for PC5 (uncompressed) it is difficult to identify any histological features in the sample due to high levels of noise in the image. Conversely, the reconstructed SOM images all appear to represent different features and are not dominated by noise for the first 9 components. This is likely due to the ‘local averaging’ effect of the SOM algorithm, which reduces noise in the dataset prior to PCA and limits the impact of outliers e.g. cosmic rays. Fig. 2 shows false-colour SOMs for the selected components (SOMs for all components (1–9) are included as supplementary information S3). For each map the SOM units are shaded according to their scores value of the principal component being visualised. To assist interpretation, the colour-space has been designed to highlight positive scores values in red and negative scores values in blue; values close to zero appear white. For component 2 (Fig. 2a) it can be seen that the spectra form two distinct regions on the map. These groupings were learned during the training process of the SOM based on features in the Raman spectra. The uniform shading obtained via the PC scores in Fig. 2a demonstrates that similar information is being utilised by both techniques for this component. The SOM for component 4 (Fig. 2b) forms distinct regions of red and blue but they are disconnected i.e. there is more than one region that can be described as red, and similarly for blue. This is because the SOM algorithm has organised the spectra by similarity, but PCA, which was used for the shading, is attempting to isolate uncorrelated components. Hence if principal component 2 represents the presence/absence of some biochemical A and component 4 of some biochemical B then the SOM has organised the spectra into four groups based on overall spectral similarity: A and B, A no B, B no A, no A and no B. 
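The shading scheme described for Fig. 2 (red for positive scores, blue for negative scores, white near zero) could be implemented along the following lines. This is a hedged sketch: the exact colour-space used by the authors is not given in the text, so the linear blend towards white and the symmetric scaling by the largest absolute score are assumptions, and `diverging_rgb` is a hypothetical name.

```python
import numpy as np

def diverging_rgb(scores):
    """Map scores to RGB colours: positive -> red, negative -> blue, zero -> white."""
    s = np.asarray(scores, dtype=float)
    limit = np.abs(s).max()
    if limit == 0:
        limit = 1.0                      # avoid division by zero for all-zero input
    z = s / limit                        # scaled symmetrically into [-1, 1]
    pos = np.clip(z, 0.0, 1.0)
    neg = np.clip(-z, 0.0, 1.0)
    rgb = np.ones(s.shape + (3,))        # start from white
    rgb[..., 0] -= neg                   # red channel fades for negative scores
    rgb[..., 1] -= pos + neg             # green channel fades in both directions
    rgb[..., 2] -= pos                   # blue channel fades for positive scores
    return rgb
```

For a 20 × 30 map, passing the scores of one component reshaped to `(20, 30)` would give a `(20, 30, 3)` array that can be displayed directly as an image.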
For two components this does not necessarily involve disconnected regions but when more components are considered disconnected regions will occur due to the low dimensionality of the SOM representation. Interpretation of individual components on the SOM should therefore be taken in context with the
Fig. 2. SOM of the compressed spectra shaded by the scores of principal components a) two, b) four and c) seven. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
representation of other components in order to fully understand the groupings being observed. Component seven shows similarly disconnected regions of red and blue on the map and the regions are less uniformly shaded. This is due to the nonlinearity of the SOM mapping in comparison to the linear PCA representation used for shading, and the higher levels of noise present when examining later principal components. To reconstruct the original image from the SOMs in Fig. 2, pixels in the final image were represented by the principal component scores value of the SOM unit most similar to each pixel. Note that this similarity is between the preprocessed spectra and the SOM unit spectra, and not between the PC scores of the raw spectra and the PC scores of the SOM spectra. In this way the costly PCA calculation for a large number of pixels is avoided. Using this approach also means that wherever several pixels have the same best matching SOM unit they are represented in the image by the same PC scores value, and thus a form of compression is obtained. For component 2 (Fig. 3a), the red and blue regions clearly relate to different histological features of the sample. In particular, the red pixels correspond to areas containing tissue densely populated with inflammatory cells and associated nuclei, which is comparable to the haematoxylin stain of the H&E image (Fig. 1). Fig. 3b shows
Fig. 4. Composite a) SOM and b) reconstructed Raman image shaded using components two, four and seven as red, green and blue channels respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Fig. 3. Reconstructed Raman images shaded by the value of principal component a) two, b) four and c) seven for the best matching SOM unit of each pixel. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
the reconstructed image using PC 4 scores, for which the red pixels are spatially associated with regions of mucin, present as both goblet cells and mucin filled crypts. In Fig. 3c the distribution of PC 7 scores corresponds well with the location of the lamina propria (the basement membrane surrounding the epithelial glands) in red, and with cytoplasmic regions in blue. In addition to the visualisation of individual components it is useful to combine information from several components into a single image. This allows multiple biochemically relevant components to be presented at one time, in this case significantly enhancing the differentiation between different pathology types. Fig. 4 illustrates
this using PCs 2, 4 and 7 as the red, green and blue channels for the image. For visualisation purposes the negatives of PCs 2 and 4 were used to obtain a more pleasing colour scheme; this has no effect on the interpretation of the final image. The scores values for each component were rescaled to between 0 and 1. This ensures that both the positive and negative components of each loading are included in the visualisation. Scores values close to zero for all three components combine to produce grey in the final image. The most positive and negative values for each component combine to produce colours representing different combinations of positive and negative values of each component for individual pixels. Different regions of the pseudo-coloured SOM (Fig. 4a) can now be identified more easily than by considering the three components individually. For example, distinct regions of magenta, purple, yellow, green and blue can all be identified using this representation. The pseudo-coloured SOM was then used to reconstruct the original image (Fig. 4b). As for single components, the reconstructed pixel colour was taken to be the same as that of the SOM unit with the most similar weight vector. A large number of features within the reconstructed image can now be observed that were significantly more difficult to locate within the original H&E stained image. For example, magenta regions appear to correspond to mucin pools within the sample, while green regions correspond to connective tissue with a high density of nuclei-rich inflammatory cells; in the H&E image these regions are both stained a similar colour. This is partly due to the poor quality of the H&E stain in this case, and other types of chemical staining could be used in order to highlight more of the specific features in the tissue if required. In the SOM image, however, it is possible to identify additional features without further staining or data analysis.
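The composite colouring and image reconstruction described above can be sketched as follows. This is an illustration under assumptions rather than the authors' code: min-max rescaling is one plausible reading of "rescaled to between 0 and 1", the function names are hypothetical, and every pixel is assumed to have a spectrum (the handling of excluded saturated pixels is not shown).

```python
import numpy as np

def rescale01(values):
    """Min-max rescale a 1-D array to [0, 1] (assumed rescaling scheme)."""
    lo, hi = values.min(), values.max()
    if hi > lo:
        return (values - lo) / (hi - lo)
    return np.full(values.shape, 0.5)

def composite_colours(unit_scores, flip=(True, True, False)):
    """Turn (n_units, 3) scores into RGB colours; `flip` negates chosen channels."""
    unit_scores = np.asarray(unit_scores, dtype=float)
    signed = np.where(flip, -unit_scores, unit_scores)
    return np.column_stack([rescale01(signed[:, k]) for k in range(3)])

def best_matching_units(spectra, weights):
    """Index of the SOM unit whose weight vector is closest to each spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Squared Euclidean distances, expanded to avoid a large 3-D temporary array.
    d2 = ((spectra ** 2).sum(axis=1)[:, None]
          - 2.0 * spectra @ weights.T
          + (weights ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)

def reconstruct_image(spectra, weights, unit_colours, image_shape):
    """Colour every pixel with the colour of its best matching SOM unit."""
    bmu = best_matching_units(spectra, weights)
    return np.asarray(unit_colours)[bmu].reshape(tuple(image_shape) + (3,))
```

Here `unit_scores` would hold the scores of PCs 2, 4 and 7 for each map unit, with the default `flip` negating the first two channels as described in the text. Because each pixel only needs a nearest-unit lookup, no PCA projection of the individual pixel spectra is required.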
For example, yellow regions indicate regions of cytoplasmic protein and blue regions overlap with the basement membrane surrounding the epithelial glands. Furthermore, the CaF2 slide can be easily identified as the grey
reconstructed image of this component (Fig. 3b), where mucin (a type of glycoprotein) is known to be present in goblet cells and crypts. However, further work is required to confirm this finding. There are several strong negative peaks in PC4 that can be associated with lipids and/or triglycerides. Finally, for component 7 (Fig. 5c), several positive peaks were assigned as collagen IV, which is known to be present in the basal membrane, and collagen I which is found in the stroma. This is consistent with the location of red regions in the scores image for this component (Fig. 3c). Numerous negative peaks in component 7 can be assigned to nucleic acids, and the tentative assignment of uracil at 784 cm−1 and 1230 cm−1 suggests this may be representing Ribonucleic acid (RNA). However, due to the complexity of the loadings and the high levels of noise, especially in the later components, all peak assignments are tentative and further work is required to fully characterise the biochemical components being represented. For example, comparison with point spectra from pure biochemical components may assist in the assignment of significant peaks in the loadings. Furthermore, noise levels in the data could be reduced by increasing the acquisition time, which for the sample presented here was 2.4 s per-pixel, or approximately 45 h for the complete image. 5. Conclusions
Fig. 5. Loadings of principal components a) two, b) four and c) seven with tentative peak assignments indicated.
regions in the composite image. This indicates that the slide has a PC value close to zero for components 2, 4 and 7, which is to be expected as the slide does not contain any tissue features in its spectrum. Finally, an advantage of PCA is that the loadings provide information about the features being represented by a component. In this case the loadings provide information relating to the Raman peaks, which in turn are related to the biochemistry of the sample. The loadings of components 2 and 4 and 7 are presented in Fig. 5. Loadings for all components (PC1 to PC9) are included as supplementary information S4. It can be seen that all three of the chosen components in Fig. 5 contain a significant number of peaks in both the positive and negative directions, as well as significant noise contributions (particularly in PC7), making them difficult to interpret. However, based on peak assignments from the literature [6,10,39,40] the following tentative assignments were made pending further investigation. Component 2 (Fig. 5a) is tentatively assigned to nucleic acids in the positive direction. This corresponds well with the reconstructed image for this component (Fig. 3a), where the red regions are associated with tissue densely populated with inflammatory cells and nuclei. In the negative direction, PC2 has some contributions from lipids, but this is not definitive. A number of the positive peaks for component 4 (Fig. 5b) can be assigned to proteins. Again, this corresponds well with the distribution of red areas in the
The self organising map (SOM) has been shown to be a useful tool for the exploratory analysis of hyperspectral Raman images. The SOM algorithm is robust to small numbers of artefacts, such as cosmic rays, that survive an initial filtering process. It has been shown that, in combination with principal component analysis, SOM provides an alternative method for displaying and interpreting the data that can significantly enhance the features of a histology section based on the biochemical information present in its Raman spectra. It has also been shown that the SOM algorithm can reduce the spectra in a Raman image to a much smaller number of spectra that are still representative of the dataset, without losing valuable spectral information relating to the pathology of the sample. SOM in combination with other MIA techniques such as PCA is therefore a useful tool for the exploratory analysis of hyperspectral Raman images.

Further work is required to fully characterise the biochemical components being represented and to confirm our tentative peak assignments. This could be achieved, for example, by comparison with point spectra from pure components. Furthermore, the signal-to-noise ratio could be improved by increasing the acquisition time of the Raman spectrum for each pixel, making it easier to identify significant peaks.

Acknowledgements

Gavin Lloyd would like to acknowledge the Gloucestershire Hospitals NHS Foundation Trust for funding this work. Nick Stone holds a Senior Research Fellowship (Career Scientist) from the National Institute for Health Research. Catherine Kendall is a Royal Society Dorothy Hodgkin research fellow. James Wood is funded by Cancer Research UK (CRUK) and the Bowel Disease Research Foundation (BDRF).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.vibspec.2012.02.015.

References

[1] Cancer Research UK, www.cancerresearchuk.org (accessed 04/08/11).
[2] C.D. Chen, M.F. Yen, W.M. Wang, J.M. Wong, T.H. Chen, Br. J. Cancer 88 (2003) 1866–1873.
[3] S.J. Winawer, R.H. Fletcher, L. Miller, F. Godlee, M.H. Stolar, C.D. Mulrow, S.H. Woolf, S.N. Glick, T.G. Ganiats, J.H. Bond, L. Rosen, J.G. Zapka, S.J. Olsen, F.M. Giardiello, J.E. Sisk, R. Van Antwerp, C. Brown-Davis, D.A. Marciniak, R.J. Mayer, Gastroenterology 112 (1997) 594–642.
[4] P. Hewitson, C. Woodrow, J. Austoker, Evidence Summary: Patient Information for the NHS Bowel Cancer Screening Programme, http://www.cancerscreening.nhs.uk/bowel/publications/nhsbcsp04.pdf (accessed 04/08/11).
[5] J.C. Taylor, C. Kendall, N. Stone, T.A. Cook, Br. J. Surg. 94 (2007) 6–16.
[6] N. Stone, C. Kendall, N. Shepherd, P. Crow, H. Barr, J. Raman Spectrosc. 33 (2002) 564–573.
[7] C. Kendall, M. Isabelle, F. Bazant-Hegemark, J. Hutchings, L. Orr, J. Babrah, R. Baker, N. Stone, Analyst 134 (2009) 1029–1045.
[8] D. Day, J. Jass, A. Price, N. Shepherd, J. Sloan, N. Talbot, G. Williams, B. Warren, Morson and Dawson’s Gastrointestinal Pathology, Blackwell Publishing, Oxford, 2003.
[9] J. Hutchings, C. Kendall, B. Smith, N. Shepherd, H. Barr, N. Stone, J. Biophotonics 2 (2009) 91–103.
[10] C. Krafft, D. Codrich, G. Pelizzo, V. Sergo, J. Biophotonics 1 (2008) 154–169.
[11] P. Lasch, W. Haensch, D. Naumann, M. Diem, Biochim. Biophys. Acta 1688 (2004) 176–186.
[12] Y. Zhang, M. Desai, Pattern Recognit. 33 (2000) 1851–1861.
[13] A. Ifarraguerri, C.I. Chang, IEEE T. Geosci. Remote 38 (2000) 2529–2538.
[14] B. Aiazzi, P. Alba, L. Alparone, S. Baronti, IEEE T. Geosci. Remote 37 (1999) 2287–2294.
[15] J. Burger, P. Geladi, Analyst 131 (2006) 1152–1160.
[16] O. Rodionova, L. Houmøller, A. Pomerantsev, P. Geladi, J. Burger, V. Dorofeyev, A. Arzamastsev, Anal. Chim. Acta 549 (2005) 151–158.
[17] C. Gendrin, Y. Roggo, C. Collet, Talanta 73 (2007) 733–741.
[18] A. Beljebbar, S. Dukic, N. Amharref, M. Manfait, Analyst 135 (2010) 1090–1097.
[19] B. Bird, M. Miljkovic, M.J. Romeo, J. Smith, N. Stone, M. George, M. Diem, BMC Clin. Pathol. 8 (2008) 8.
[20] D.C. Fernandez, R. Bhargava, S.M. Hewitt, I.W. Levin, Nat. Biotechnol. 23 (2005) 469–474.
[21] C. Krafft, S.B. Sobottka, G. Schackert, R. Salzer, Analyst 130 (2005) 1070–1077.
[22] R. Bhargava, T. Ribar, J.L. Koenig, Appl. Spectrosc. 53 (1999) 1313–1322.
[23] M.H. Van Benthem, M.R. Keenan, R. Davis, P. Liu, H.D.T. Jones, D.M. Haaland, M.B. Sinclair, A.R. Brasier, J. Chemom. 22 (2008) 491–499.
[24] R.G. Brereton, Chemometrics: Data Analysis for the Laboratory and Chemical Plant, John Wiley and Sons, Chichester, 2003.
[25] P. Williams, P. Geladi, G. Fox, M. Manley, Anal. Chim. Acta 653 (2009) 121–130.
[26] H. Grahn, P. Geladi, Techniques and Applications of Hyperspectral Image Analysis, John Wiley and Sons, Chichester, 2007.
[27] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[28] J.C. Dunn, Cybern. Syst. 3 (1973) 32–57.
[29] T. Kohonen, Self-organizing Maps, Springer, 2001.
[30] G.R. Lloyd, R.G. Brereton, J.C. Duncan, Analyst 133 (2008) 1046–1059.
[31] D.R. Chen, R.F. Chang, Y.L. Huang, Ultrasound Med. Biol. 26 (2000) 405–411.
[32] B.K. Lavine, C.E. Davidson, D.J. Westover, J. Chem. Inf. Comput. Sci. 44 (2004) 1056–1064.
[33] T. Villmann, E. Merényi, B. Hammer, Neural Networks 16 (2003) 389–403.
[34] H. Martens, E. Stark, J. Pharm. Biomed. 9 (1991) 625–635.
[35] N. Stone, C. Kendall, J. Smith, P. Crow, H. Barr, Faraday Discuss. 126 (2004) 141–157.
[36] M. Grimbergen, C. van Swol, C. Kendall, R. Verdaasdonk, N. Stone, J. Bosch, Appl. Spectrosc. 64 (2010) 8–14.
[37] R.G. Brereton, Chemometrics for Pattern Recognition, John Wiley and Sons, Chichester, 2009.
[38] J.M. Prats-Montalbán, A. de Juan, A. Ferrer, Chemom. Intell. Lab. Syst. 107 (2011) 1–23.
[39] V. Kopecký, Biochem. Biophys. Res. Commun. 300 (2003) 41–46.
[40] V. Oleinikov, E. Kryukov, M. Kovner, M. Ermishov, A. Tuzikov, S. Shiyan, N. Bovin, I. Nabiev, J. Mol. Struct. 480–481 (1999) 475–480.