Image and Vision Computing 20 (2002) 701–707 www.elsevier.com/locate/imavis
Transforming pixel signatures into an improved metric space A.S. Holmes*, C.J. Rose, C.J. Taylor Imaging Science and Biomedical Engineering, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, UK Received 10 June 2001; accepted 14 March 2002
Abstract We address the problem of using scale-orientation pixel signatures to characterise local structure in X-ray mammograms, though the method we report is of general application. When signatures are treated as vectors for statistical analysis, the Euclidean metric is not well behaved. We have previously described a Best Partial Match (BPM) metric that measures signature similarity more naturally, but at high computational cost. We present a method for transforming signatures into a new space in which Euclidean distance approximates BPM distance, allowing BPM distance to be estimated at low computational cost. The new space is constructed using multi-dimensional scaling. The nonlinear transformation between the old and new spaces is learned using support vector regression. We present experimental results for mammographic data. q 2002 Elsevier Science B.V. All rights reserved. Keywords: Scale-orientation pixel signature; Metric space; Multidimensional scaling; Support vector regression; Computer-aided mammography
1. Introduction It has been shown that prompting in mammography can improve a radiologist’s performance, even if the prompting system makes errors [1]. Prompting involves using computer-based image analysis to locate potential abnormalities so that the radiologist’s attention can be drawn to them. One approach to detecting abnormalities is to compute some form of local feature signature for each pixel and then apply a statistical classifier [2]. Recently, nonlinear scale-orientation signatures have been used for the detection of malignant tumours in mammographic images [3]. The approach can in principle be extended to detect other normal and abnormal structures. However, when the signatures are treated as vectors for statistical classification, the vector space they define has unsatisfactory metric properties—a small change in underlying structure may produce a large change in the vector. This provides an unsatisfactory basis for statistical analysis. We have described previously a novel measure of signature similarity which we called Best Partial Match (BPM) distance that is robust to small changes in the scale * Corresponding author. Present address: Image-Metrics plc, Regent House, Stockport, SK4 1BS, UK. Tel.: þ 44-161-476-8220; fax: þ 44-161480-4583. E-mail addresses:
[email protected] (A.S. Holmes),
[email protected]. (C.J. Rose), chris.taylor@ man.ac.uk (C.J. Taylor).
and orientation of image structure and the presence of other potentially confounding structures [4]. This distance measure, which is based on the transportation (‘earthmover’) algorithm [5], gives good results but is extremely computationally expensive. In this paper, we present a scheme that uses the BPM distance to define a nonlinear transformation of pixel signatures into a vector space with improved metric properties. The transformation can be applied at low computational cost and creates a space in which Euclidean distance approximates BPM distance.
2. Scale-orientation pixel signatures Morphological M and N filters belong to a class of filters, known as sieves, that remove image intensity peaks or troughs smaller than a specified size [6]. By applying sieves of increasing size at a number of orientations, a scaleorientation signature can be constructed for each pixel in an image. The signature is a two-dimensional array in which columns are values for the same orientation, rows are values for the same scale and the values themselves are the greylevel change at the pixel, resulting from the application of the filter at a particular scale and orientation. Sieves have been shown to have desirable properties when compared to other methods of constructing scale-orientation signatures [7]. In particular, the filter responses at different positions on the same structure are similar (local stationarity) and the
0262-8856/02/$ - see front matter q 2002 Elsevier Science B.V. All rights reserved. PII: S 0 2 6 2 - 8 8 5 6 ( 0 2 ) 0 0 0 6 0 - 4
702
A.S. Holmes et al. / Image and Vision Computing 20 (2002) 701–707
Fig. 1. Two simple images, a Gaussian blob (a) and a Gaussian line (b), and scale-orientation signatures computed for the centre pixel of each.
interaction between adjacent structures is minimised. Fig. 1 shows two examples of scale-orientation signatures computed for the centre pixels of two simple structures.
Fig. 3. Partial matching using the transportation problem.
If we assume that the total number of goods supplied by the warehouses are equal to the total goods required by the markets, we have a standard problem and may formulate the linear programming problem as follows. Minimise the total cost of transportation
3. Measuring signature similarity using BPM distance In real applications, signatures may be obtained from possibly overlapping structures embedded in a variable mammographic background, resulting in partial similarity between signatures. This suggests that a fraction of a given signature should match to another obtained from the same structure. Thus, a useful measure of similarity should be able to recognise when similar structures are represented in two signatures despite intrinsic variability and regardless of the presence of other structures. We have addressed these requirements by treating signature matching as a transportation problem [4]. Transportation problems are a class of linear programming problems in which one attempts to minimise the cost of delivering integral quantities of goods produced at m warehouses to n markets whilst balancing supply and demand [5]. This generates a transhipment problem with no intermediate nodes, with each warehouse and market connected, and a set of unit costs, cij, of shipping from warehouse i to market j. We define fij to be the flow of goods from warehouse i to market j. Each warehouse can supply ai goods and each market can receive bj goods. Fig. 2 shows an example of the transportation problem with three warehouses connected to two markets.
Z¼
cij fij ;
ð1Þ
i¼1 j¼1
subject to supply restriction at warehouse i, n X
fij ¼ ai
for i ¼ 1; 2; …; m;
ð2Þ
j¼1
demand requirement at market j, m X
fij ¼ bj
for j ¼ 1; 2; …; n
ð3Þ
i¼1
and nonnegativity constraints fij $ 0
for all ði; jÞ:
ð4Þ
The standard transportation problem is well studied and there are efficient iterative solutions [8]. The elements in a signature can be thought of as warehouses, each containing goods proportional to the intensity of the element. In a signature to be compared, the elements can be thought of as markets, each requiring goods proportional to the intensity. Choosing a suitable set of costs to capture the two-dimensional nature of the signatures leads to a meaningful solution to the problem. Our work uses costs based on Euclidean distance within the twodimensional signature, taking into account the periodicity of the orientation axis of the signatures. Thus, localised movement in both scale and orientation is favoured above larger scale movements. The optimal solution to the transportation problem provides a total cost of transportation of material between signatures, given by Eq. (1). The solution also provides a value for the total flow, i.e. the total quantity of transported material F¼
Fig. 2. A transportation problem with three warehouses and two markets.
m X n X
m X n X i¼1 j¼1
fij :
ð5Þ
A.S. Holmes et al. / Image and Vision Computing 20 (2002) 701–707
703
4. Signature transformation scheme We wish to map signatures into a space in which Euclidean distance approximates BPM distance. We achieve this by learning an appropriate nonlinear transformation that applies to a particular class of images. We start with a large training set of signatures sampled from images of the class of interest (in our case mammograms). Our scheme consists of three separate components: signature prototyping, constructing a new space and learning the transformation into the new space. 4.1. Selecting signature prototypes
Fig. 4. A mammogram containing a variety of structure. The markers indicate the closest image signatures to the set of signature prototypes.
We may then define a similarity measure, c¯ to be the mean cost of moving a single unit of intensity in the optimal solution; the total cost of transportation divided by the quantity of material transported m X n X
Z ¼ c ¼ F
cij fij
i¼1 j¼1 m X n X
:
ð6Þ
fij
i¼1 j¼1
We have previously described a method based on the transportation problem to handle partial similarity between signatures [4] and reproduce it here briefly for completeness. We can implement partial signature similarity by creating a dummy warehouse and market, as shown in Fig. 3. The larger boxes are the signatures and the smaller boxes are a dummy warehouse and market. The first signature has a total supply of S1 goods and the second signature requires S2 goods. We choose a specified fraction of the first signature, fS1, to match to the second signature. This dictates what the values in the dummy warehouse and market should be, as shown in Fig. 3. We can solve the problem of finding an appropriate value for the partial match fraction f by considering c as a function of f. We observe a minimum in c at the value of f that results in the best description of one signature in terms of the other. This is termed the BPM fraction. For example, when a structure is common to both signatures, the BPM fraction corresponds to a solution that finds the common structure but imports and/or exports the remaining material in the two signatures. We use a polynomial fit, iterative search to find the BPM fraction at which the minimum value of c occurs. This minimum is termed the BPM distance.
The first step in our scheme involves obtaining a compact representation of the distribution of signatures in signature space, by extracting a set of representative prototypes. We use k-means clustering to select a set of n prototype signatures from an underlying distribution of N samples. In our case, we wish to find a set of prototypes that are representative of pixel signatures obtained from the full range of mammographic structures. A number, n, of initial prototypes are selected randomly, then all N signatures are examined in turn. Each signature is assigned to its nearest prototype and the mean of that prototype is adjusted accordingly. This is repeated until a stable set of prototypes is obtained that is representative of the whole population of signatures. We chose n to be as large as possible whilst making the next step in constructing the space computationally tractable. Fig. 4 shows a mammogram containing a variety of structures. To demonstrate the performance of the clustering algorithm, we found a set of 256 prototypes representative of a population of 216 signatures. For each of the prototype signatures, the image pixel with the most similar signature is highlighted, giving a visual representation of the prototype distribution. We note that relatively few prototypes are required to represent background tissue. In contrast, the prototype density is high in regions containing more structure. 4.2. Constructing a new space The next step in our scheme is to construct a space in which Euclidean distance approximates BPM distance. This can be achieved using multidimensional scaling. Multidimensional scaling (MDS) was introduced by Gower [9]. There are several variants but they all attempt to solve the following problem: given a set of similarities, S, find a configuration of points that will reproduce a Euclidean distance approximation to these similarities. The similarities need not be Euclidean or even satisfy the triangle inequality, Sac # Sab þ Sbc ;
ð7Þ
704
A.S. Holmes et al. / Image and Vision Computing 20 (2002) 701–707
Fig. 5. (a) A random configuration of points, (b) the distance matrix generated by the configuration and (c) a configuration reconstructed from the distance matrix.
where Sij is the distance between points i and j. If the similarity matrix is metric, that is it contains distances that satisfy Eq. (7), then MDS finds a configuration that will exactly reproduce the generating similarity matrix when the inter-point Euclidean distances are measured. If the similarity matrix is not metric, the new configuration can only reproduce an approximation to the generating similarity matrix. Our work uses an analytical technique that is described briefly below. The reader is referred to Ripley [10] for a more detailed explanation. A configuration X ¼ {xij } is an n £ p matrix of data, considered as n points in p dimensions. Given X, we can compute the Euclidean distance matrix S ¼ {Sij } where S2ij ¼
p X
ðxik 2 xjk Þ2 :
ð8Þ
k¼1
These distances can also be computed using the n £ n matrix: M ¼ XXT ; and calculating X X X S2ij ¼ x2ik þ x2jk 2 2 xik xjk ¼ Mii þ Mjj 2 2Mij
ð9Þ
ð10Þ
where M ¼ {Mij }: This observation leads to the following implementation. If S is the given similarity matrix, we firstly square the similarities Tij ¼ S2ij ;
ð11Þ
and then define T0 by removing row and column means and
add back the overall mean " # 1 ðT1Þ1T 1ðT1ÞT 1T T1 T2 2 þ : T ¼2 2 n n n2 0
ð12Þ
Next we find the eigendecomposition of T0 using Schur decomposition T0 ¼ UV2 UT
ð13Þ
and find V by taking the matrix square root. To finish, we find the order of the eigenvalues the diagonal values of V with the largest first and take the corresponding columns of UV to be our configuration, X. Taking only the first q largest eigenvalues is equivalent to taking the first q principal components of X, providing a lower dimensional view, at the expense of accuracy. An example of MDS is shown in Fig. 5. Fig. 5(a) is a randomly generated set of two-dimensional points. The distance matrix in Fig. 5(b) is generated from the data points by measuring the inter-point Euclidean distances. The distance matrix is symmetric and has a zero diagonal. MDS then generates a new configuration from the distance matrix, as shown in Fig. 5(c). Note that the distance matrix is invariant to translation, rotation and reflection of the configuration of data points. In our application, a distance matrix is constructed by measuring the inter-prototype BPM distances. MDS then places each of the prototypes into a new space in which Euclidean distance approximates BPM distance.
A.S. Holmes et al. / Image and Vision Computing 20 (2002) 701–707
705
The constraints on the optimisation impose the deviationintolerant criterion, whilst the objective function measures the flatness of the function. Note the slack variables, ji ; jpi which cope with infeasible constraints on the problem (i.e. a function f may not exist that approximates all ðxl ; yl Þ with e precision), and the constant C . 0 which determines the trade-off between the flatness of f and the extent to which deviations larger than e are tolerated. The problem can be solved more easily in its dual formulation, which yields: f ðxÞ ¼
l X
ai 2 api ðxi ·xÞ þ b where ai ; api [ ½0; C
ð15Þ
i¼1
Fig. 6. Pixel signature similarity images are generated from (a) a pixel from a linear structure in the original image using (b) Euclidean distance between raw signatures, (c) BPM distance between raw signatures and (d) Euclidean distance between transformed signatures.
4.3. Learning the transformation The final step in our scheme involves finding the nonlinear transformation that maps raw signatures into the new space. We achieve this using support vector regression. The use of support vector machines (SVM) in empirical data modelling is becoming widespread as they generalise well and address the ‘curse of dimensionality’. SVMs provide promising empirical performance and can be used for both classification (SVC) and regression (SVR). It is in the latter capacity that we use support vector methods here. A very brief overview is given below, but the reader is referred to [11] for more detail. SV methods are firmly grounded in statistical learning theory, which characterises properties of learning machines that allow them to generalise well. Assume a training data set {ðx1 ; y1 Þ; …; ðxl ; yl Þ} , Rd £ R: In e -SVR, the goal is to find a function f ðxÞ that deviates from yi by at most e for all the training data, while being as flat as possible. Applying these criteria to the linear function f ðxÞ ¼ w·x þ b with w [ Rd ; b [ R; poses the following optimisation problem: minimise
1 2
kwk2 þ C
l X
ji þ jpi
i¼1
8 yi 2 w·xi 2 b # e þ ji ; > > < subject to w·xi þ b 2 yi # e þ jpi ; > > : ji ; jpi $ 0:
ð14Þ
Notice that the algorithm can be written in terms of dot products between the data; it is this that allows support vector methods to learn nonlinear relationships. The method described so far could be generalised to the nonlinear case by using a simple map F : Rd ! F; where F is some feature space. The standard SVR algorithm could then be applied to the data in the feature space, so the dot product ðxi ·xÞ in Eq. 15 would be replaced by ðzi ·zÞ; where z ¼ FðxÞ: However, this can easily become computationally intractable. This problem can be solved by noting that the dot products in the feature space can be written in terms of a kernel function, Fðxi Þ·FðxÞ ¼ kðxi ; xÞ; for a kernel function k that satisfies certain conditions. By choosing a suitable kernel, k, an implicit nonlinear mapping of the data, prior to regression, can be achieved. Suitable kernel functions include: polynomial kernels, sigmoid kernels and radial basis function kernels. Thus, the nonlinear regression parameters (the vectors a; ap and the offset b ), may be learned. The regression method described above learns a transformation from xi [ Rd to yi [ R: In our application we wish to learn the transformation from the raw signature space to the new space where Euclidean distance approximates BPM distance, i.e. the transformation from xi [ Rd to yi [ Rg (where g , d in our application). For each dimension of Rg a transformation from Rd can be learned using the method presented above. The points in the regression input space are the original prototype signatures, formed using k-means clustering; the output space is defined as the reconstructed prototypes in the new space. After performing SVR, the nonlinear transformation learned could be applied to any signature to transform it into the new space where Euclidean distance approximates BPM distance. Once the nonlinear transformation has been learned off-line, the BPM metric can be approximated at run-time at low computational cost. Using a Gaussian kernel, the transformation from the raw signature space to the new space is: fj ðxÞ ¼
n
X
aij 2
apij
e
2
kxi 2xk2 2s2
þ bj
for j ¼ 1; …; g ð16Þ
i¼1
which defines the transformation from Rd to Rg :
706
A.S. Holmes et al. / Image and Vision Computing 20 (2002) 701–707
Fig. 7. Regression plot for similarity image.
5. Experimental evaluation 5.1. Method In the following experiment, pixel signatures were constructed by applying a morphological M filter, centred on every image pixel, at different scales and orientations. Ten scales were used that increased logarithmically and were chosen to encompass the range of structure sizes that occur in mammograms. For each scale the filter was applied at twelve orientations, separated by fifteen degrees. A region of interest (ROI) was selected from a mammogram (mdb245ls) in the MIAS database [12]. The ROI exhibits a variable background and contains a variety of structures that are commonly found in mammograms. The scheme described above was applied to signatures from the ROI using 256 prototype signatures and 37 dimensions in the output space, using a Gaussian kernel for the SVR. Next, a pixel belonging to a salient (linear) structure was chosen. Fig. 6(a) shows the ROI and the selected pixel. BPM distance was measured between the chosen pixel’s signature and every pixel signature in the ROI. Euclidean distances were also measured in both the raw signature and transformed spaces. Plotting the similarity distance as an intensity at each pixel forms a similarity image for each method. These images were inverted so that bright regions correspond to similar pixels and dark regions correspond to pixels that are less similar. The three similarity images are shown in Fig. 6(b) –(d). A regression (or Shepherd) plot was
also generated by plotting the value of each pixel in the BPM distance image for raw signatures against the same pixel value in the Euclidean distance image for transformed signatures. This allows the accuracy of the transformation scheme to be assessed. 5.2. Results and discussion Fig. 6(b) shows the similarity image generated by measuring Euclidean distance between raw signatures. This image clearly demonstrates the poor metric properties of the vector space defined by the raw signatures. The method does not, as we would hope, highlight other linear structures in the image, although it has clearly rejected bloblike structures. This image illustrates the need for an alternative metric. Fig. 6(c) shows the similarity image generated by explicitly measuring the BPM distance between raw signatures. This image took approximately 24 h to generate. We have previously discussed the suitability of the BPM measure [4], but for completeness we note that the method has successfully highlighted other linear structures and suppressed other structures and background, thus measuring similarity in a practically useful way. This result encourages us to examine the performance of our transformation scheme as shown in Fig. 6(d). This figure shows the similarity image generated by firstly transforming all image signatures, using the scheme described in Section 4 and then measuring the Euclidean distance between the chosen pixel signature and every other
A.S. Holmes et al. / Image and Vision Computing 20 (2002) 701–707
pixel signature. Having already constructed the new space and learned the mapping from the original to the new space ‘off-line’, this image took 0.25 seconds to generate (compared to 24 hours for Fig. 6(c), illustrating the main advantage of our scheme. It is qualitatively clear that as well as offering significant improvement in computational efficiency, the scheme also achieves reasonable accuracy in approximating BPM distance. We can further investigate the performance of the transformation scheme by examining the regression plot shown in Fig. 7. This shows the BPM distance between raw signatures taken from Fig. 6(c), plotted against the corresponding Euclidean distances between transformed signatures taken from Fig. 6(d). A perfect transformation scheme would produce an identical image and all of the points in Fig. 7 would lie on the dotted y ¼ x line. It is clear that the majority of the points lie on, or close to, the dotted line demonstrating the accuracy of the scheme. A closer examination reveals that larger distances (from the chosen linear structure pixel) are more efficiently modelled by the scheme than smaller distances. As the chosen pixel lies on a salient structure in the image, large distances from this pixel correspond to background mammographic structure. This agrees with our observation in Section 4.1 that few prototypes are required to adequately model background structure. Smaller distances correspond to pixels that are more similar to the chosen linear structure pixel. The regression plot indicates that these are modelled less efficiently. A larger number of prototypes would improve the performance of the scheme but would increase computational requirements. At present, it takes approximately 24 h to construct the improved metric space from 256 prototypes.
6. Conclusions We have shown previously that BPM distance is a useful similarity measure for scale-orientation pixel signatures. In this paper we have shown that it is possible to construct a new space, for a given class of images, in which Euclidean distance approximates BPM distance. We have also shown that it is possible to learn the nonlinear transformation required to map raw signatures into this new space, providing a computationally efficient approach to working in the new space. The method is totally unsupervised and we have shown that it is possible to obtain a good approximation to BPM behaviour, for a class of complex medical images. Standard statistical methods can be applied in the transformed space and would be expected to lead to more satisfactory results than those obtained by working in the raw signature space. A detailed description of possible applications of our
707
approach is beyond the scope of this paper, but a brief example may assist the reader. In [3], a principal component model of scale-orientation pixel signatures was built using signatures sampled from a training set of malignant masses from mammograms. A discriminant classifier was constructed in the principal component space and used to detect potential abnormalities in digitised mammograms. Since the vector space defined by scale-orientation pixel signatures has unsatisfactory metric properties, transforming the signatures into a space in which Euclidean distance approximates BPM distance, as described in this paper, prior to model/classifier training should result in improved classification performance. We have demonstrated the superiority of transformed signatures using a simple test of pixel similarity—more elaborate experiments will be the subject of future work.
Acknowledgments Anthony Holmes and Chris Rose were funded by the EPSRC.
References [1] I.W. Hutt, The computer-aided detection of abnormalities in digital mammograms, PhD Thesis, University of Manchester, 1996. [2] D.J. Marchette, R.A. Lorey, C.E. Priebe, An analysis of local feature extraction in digital mammography, Pattern Recognition 30 (9) (1997) 1547– 1554. [3] R. Zwiggelaar, T.C. Parr, J.E. Schuum, I.W. Hutt, S.M. Astley, C.J. Taylor, C.R.M. Boggis, Model-based detection of spiculated lesions in mammograms, Medical Image Analysis 3 (1) (1999) 39–62. [4] A.S. Holmes, C.J. Rose, C.J. Taylor, Measuring similarity between pixel signatures, Image and Vision Computing (2002) in press. [5] F.L. Hitchcock, The distribution of a product from several sources to numerous localities, Journal Mathematical Physics 20 (1941) 224–230. [6] J.A. Bangham, T.G. Campbell, R.V. Alridge, Multiscale median and morphological filters for 2D pattern recognition, Signal Processing 38 (1994) 387–415. [7] R. Harvey, A. Bosson, J.A. Bangham, The robustness of some scalespaces, British Machine Vision Conference (1997) 11– 20. [8] A. Ravindran, D.T. Phillips, J.J. Solberg, Operations Research: Principles and Practice, Wiley, New York, 1987. [9] J.C. Gower, Some distance properties of latent root and vector methods used in multivariate analysis, Biometrika 53 (3/4) (1966) 325–338. [10] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1996, pp. 305 –311. [11] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995. [12] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, P. Taylor, D. Betal, J. Savage, The mammographic images analysis society digital mammogram database, Excerpta Medical International Congress Series 1069 (1994)
[email protected].