Medical Image Analysis 12 (2008) 55–68 www.elsevier.com/locate/media
Hierarchical statistical shape analysis and prediction of sub-cortical brain structures

Anil Rao a,b,*, Paul Aljabar a, Daniel Rueckert a

a Visual Information Processing, Department of Computing, Imperial College London, 180 Queen's Gate, London SW7 2BZ, UK
b Clinical Imaging Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK

Received 7 November 2006; received in revised form 24 May 2007; accepted 6 June 2007. Available online 28 June 2007.
Abstract

In this paper, we describe how two multivariate statistical techniques can be used to investigate how different structures within the brain vary statistically relative to each other. The first of these techniques is canonical correlation analysis which extracts and quantifies correlated behaviour between two sets of vector variables. The second technique is partial least squares regression which determines the best factors within a first set of vector variables for predicting a vector variable from a second set. We applied these techniques to 178 sets of 3D MR images of the brain to quantify and predict correlated behaviour between 18 sub-cortical structures. Pairwise canonical correlation analysis of the structures gave correlation coefficients between 0.51 and 0.67, with adjacent structures possessing the strongest correlations. Pairwise predictions of the structures using partial least squares regression produced an overall sum squared error of 4.26 mm2, compared with an error of 6.75 mm2 produced when using the mean shape as the prediction. We also indicate how the correlation strengths between structures can be used to inform a hierarchical scheme in which partial least squares regression is combined with a model fitting algorithm to further improve prediction accuracy.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Registration; Shape modelling; Prediction; Morphometry
1. Introduction

The area of computational anatomy is a rapidly developing discipline (Grenander and Miller, 1998). With the increasing resolution of anatomical scans of the human brain, a number of computational approaches for characterising differences in the shape and neuro-anatomical configuration of different brains have emerged. Such techniques have wide-ranging clinical applications since they may, for example, be used to characterise the brain morphology associated with a particular pathology such as Alzheimer's
* Corresponding author. Address: Clinical Imaging Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK. Tel.: +44 (0)2080086043; fax: +44 (0)2080086491.
E-mail addresses: [email protected] (A. Rao), [email protected] (P. Aljabar), [email protected] (D. Rueckert).
1361-8415/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.media.2007.06.006
disease, or be used to discriminate between groups of healthy and diseased subjects. Most morphometric techniques use image registration to warp brain data sets to a reference space before performing a statistical analysis of either the grey-level intensities (e.g. voxel-based morphometry (Ashburner and Friston, 2000)) or the warp fields themselves (e.g. deformation-based morphometry (Ashburner et al., 1998; Chung et al., 2001)). There exist a number of non-rigid image registration techniques, such as locally affine (Collins et al., 1996), diffusion-based (Thirion, 1998), spline-based (Rueckert et al., 1999), elastic (Bajcsy and Kovačič, 1989; Gee et al., 1993), fluid (Christensen et al., 1995, 1996; Bro-Nielsen and Gramkow, 1996), and optical-flow based (Hellier et al., 2001) registration, but the correspondences implied by the recovered warp fields tend to be less reliable at the imaged boundaries of those brain structures with poor image contrast. This ambiguity in correspondence compromises any subsequent
analysis of the warp fields/image intensities at the boundaries of those structures. A possible solution to this problem is to use segmentation techniques to label corresponding structures in the data sets before performing image registration of the labelled images. This would improve image registration accuracy, but if there are a large number of datasets, a robust automated segmentation technique would be required, and once again the problem of poor image contrast arises. Model-based segmentation is one automated approach to extracting important structures from the brain. Statistical modelling techniques such as active shape models (Cootes et al., 1995) or active appearance models (Cootes et al., 1998) build models that represent the variability of the anatomy across subjects, which can then be used to segment structures of interest. However, neither of these techniques explicitly models the variation of one structure with respect to another, information that could greatly improve the segmentation of hard-to-segment structures. For example, if we know that there is a strong dependence of the shape/appearance of the hippocampus on the shape/appearance of another structure that is easier to segment, we could segment that structure first and then use the dependencies to predict the hippocampus. In this paper, we address the problem of segmentation of structures within the brain by exploiting the correlations between the shapes of different brain structures as determined by statistical shape modelling. Statistical shape modelling refers to the analysis of the shapes of sub-structures (such as the lateral ventricles in the centre of the brain) and aims to describe their variation across subjects and between groups of subjects (e.g., comparing ventricle size and shape between Alzheimer's sufferers and age-matched normals).
We use statistical shape modelling and analysis across a number of subjects in order to explicitly quantify the degree and nature of the variation in the shape of one structure with respect to that of another. The model building and analysis is facilitated by the registration of manually labelled training images and gives both a hierarchy of shape dependencies of the labelled structures and a set of factors that can be used to predict the shape of one structure given the shape of another structure in an unlabelled image. Finally, we formulate a hierarchical prediction strategy that combines prediction with an independent model-based fitting algorithm in order to further improve the shape predictions.

2. Related work

One approach to modelling the statistical dependencies of structures is described by Bossa and Olmos (2006), in which principal geodesic analysis is used to build statistical models of the relative poses of brain structures. However, we aim to model overall statistical dependencies between structures in the brain rather than explicitly model inter-structural pose dependencies. An alternative approach is described by de Bruijne et al. (2006), which they use to discriminate between healthy and unhealthy vertebrae within the spine. In their technique, Gaussian conditional shape models are built that describe the pairwise shape relationships of each healthy vertebra, and these models are then combined to predict the shape of a single vertebra from the shapes of all the others. Since, in their application, the shapes of all the vertebrae are known, the predicted healthy shape can then be compared with the actual shape, and if there are significant differences the vertebra is classified as unhealthy. Our approach also calculates pairwise statistical shape dependencies, but we use two different statistical techniques in order to build a hierarchy for the prediction of shapes of brain structures. The first of these techniques is canonical correlation analysis (CCA), which is a multivariate statistical tool for describing and quantifying correlated variation between sets of vector variables. It is an extension of multilinear regression and can be used to extract highly correlated factors (or modes) of variation in shape between different structures and an associated correlation coefficient that quantifies the degree of correlation of these modes (Mardia et al., 1982). Within the field of imaging, canonical correlation analysis has previously been used in image segmentation of magnetic resonance spectroscopic images (Laudadio et al., 2005) and the identification of noise in functional magnetic resonance images (Zollei et al., 2003). Canonical correlation analysis has also been used to estimate the shapes of obscured anatomical sections of the brain from visible structures in magnetic resonance images (Liu et al., 2004), where it was used as a predictive tool for a limited number of structures within the brain, whereas we use it purely to quantify the correlations between a number of different brain structures.
The second technique described is partial least squares regression (PLSR) which is a relatively recent technique that extracts the variation within a first set of vectors that is most useful in predicting a second set of vectors. It combines components from multilinear regression and principal components analysis, but unlike standard regression techniques it is applicable even when there is a high degree of colinearity in either set of vectors. By performing a partial least squares regression of the shapes of different structures, the unknown shape of a structure from an unseen subject can be predicted given the known shapes of other structures from that subject. Partial least squares regression has previously been used in the field of imaging to analyze spatial functions of functional brain images (McIntosh et al., 1996), and to perform predictive cardiac motion modelling and correction of cardiac image sequences (Ablitt et al., 2004). Both CCA and PLSR facilitate the embedding of statistical shape modelling of the brain within a hierarchical framework since they can be used to extract and quantify correlated behaviour between any number of brain substructures. Hierarchical statistical models have also been formulated using wavelet transforms (Davatzikos et al., 2003) and partitioned active shape models (Zhao et al.,
2005) but neither of those approaches uses CCA or PLSR to establish and describe an explicit hierarchy of brain substructures. In the next two sections, we describe the mathematical formulation of canonical correlation analysis and partial least squares regression, before presenting results of the application of these techniques in the analysis and prediction of brain structures in Section 5.

3. Canonical correlation analysis

The object of canonical correlation analysis is to extract and quantify correlations between two sets of vector variables, X = {x_i}, Y = {y_i}. The technique determines linear combinations of the components of the vector variables in X that are maximally correlated with linear combinations of the components of Y, and the strength of each of the correlations is described by corresponding correlation coefficients that lie between zero and one. The linear combinations, known as the canonical modes, give insight into the relationships between the two sets of variables (Mardia et al., 1982). For example, the first canonical mode is the pair of vectors $\hat{a}_0, \hat{b}_0$ such that the correlation $q_0(\hat{a}_0, \hat{b}_0)$ between $x = x_i^T \hat{a}_0$ and $y = y_i^T \hat{b}_0$ is maximal. This means that the function to be maximized is

$$q_0(\hat{a}_0, \hat{b}_0) = \frac{E[xy]}{\sqrt{E[x^2]\,E[y^2]}} = \frac{E\big[\hat{a}_0^T x_i y_i^T \hat{b}_0\big]}{\sqrt{E\big[\hat{a}_0^T x_i x_i^T \hat{a}_0\big]\, E\big[\hat{b}_0^T y_i y_i^T \hat{b}_0\big]}} = \frac{\hat{a}_0^T C_{XY} \hat{b}_0}{\sqrt{\hat{a}_0^T C_{XX} \hat{a}_0 \,\, \hat{b}_0^T C_{YY} \hat{b}_0}} \quad (1)$$
where C_XX, C_YY and C_XY are the covariance matrices describing variation within X, within Y, and between X and Y, respectively. The correlation coefficient associated with the first canonical mode is then equal to the maximum value of $q_0(\hat{a}_0, \hat{b}_0)$. Although negating either $\hat{a}_0$ or $\hat{b}_0$ changes the sign of the correlation, canonical correlation coefficients are by convention always greater than zero as they represent the strength of the relationship between two sets of variables rather than its direction. In practice, the set of canonical modes $\hat{a}_k, \hat{b}_k$ and correlation coefficients $q_k$ for X and Y are calculated by solving the eigenvalue equations (Laudadio et al., 2005)

$$C_{XX}^{-1} C_{XY} C_{YY}^{-1} C_{XY}^{T}\, \hat{a}_k = q_k^2\, \hat{a}_k, \qquad C_{YY}^{-1} C_{XY}^{T} C_{XX}^{-1} C_{XY}\, \hat{b}_k = q_k^2\, \hat{b}_k \quad (2)$$
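For concreteness, the eigenvalue formulation of Eq. (2) can be implemented directly in a few lines. The following is a minimal sketch rather than the authors' implementation; the function name is ours, and the small regularisation term `reg` is our addition to keep near-singular covariance matrices invertible (the paper instead removes colinearity with a preliminary PCA, as described in Section 5.1):

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlation coefficients between two sets of variables.

    X: (n, p) and Y: (n, q) arrays whose rows are paired observations.
    Solves Eq. (2): the eigenvalues of Cxx^-1 Cxy Cyy^-1 Cxy^T are the
    squared correlations q_k^2. Returns the min(p, q) coefficients
    sorted in descending order.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # M = Cxx^-1 Cxy Cyy^-1 Cxy^T, whose eigenvalues are q_k^2
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    rho2 = np.sort(np.real(np.linalg.eigvals(M)))[::-1]
    rho2 = np.clip(rho2, 0.0, 1.0)  # guard against round-off outside [0, 1]
    return np.sqrt(rho2[:min(X.shape[1], Y.shape[1])])
```

Averaging the returned coefficients then gives a single summary correlation of the kind used to compare pairs of structures later in the paper.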
The calculated modes $\hat{a}_k$ and $\hat{b}_k$ are then the linear combinations of variable components in X and Y, respectively, that have a corresponding correlation coefficient $0 < q_k < 1$. The kth mode is defined as the mode with the kth highest correlation coefficient. The corresponding
factor $\hat{b}_k$ is then the kth best predicted factor of the variables $y_i$, and $\hat{a}_k$ is the factor of $x_i$ that best predicts $\hat{b}_k$ (Mardia et al., 1982). The set of factors $\hat{a}_k$ are not necessarily orthogonal to each other, but are uncorrelated, as are the set of factors $\hat{b}_k$. The number of modes and correlation coefficients determined by a canonical correlation analysis of X and Y will be equal to the minimum of the numbers of dimensions of the vectors $x_i$ and $y_i$. Since each of the correlation coefficients $q_k$ lies between 0 and 1, a single correlation coefficient $0 < q < 1$ representing the overall correlation between X and Y can be determined by averaging the correlation coefficients over all calculated canonical modes. The coefficient q describes the average predictive power of the extracted canonical modes, and so summarizes the strength of the correlation between $x_i$ and $y_i$. Canonical correlation analysis has certain maximal properties similar to those of principal components analysis (PCA). However, the two techniques differ fundamentally in that while CCA focuses on relationships between two groups of variables, PCA considers interrelationships within a single group of variables. If, for example, we were to pool the two sets of variables X = {x_i} and Y = {y_i} into a single set and then perform a PCA, we would lose the distinction between the two sets of data, as the PCA does not 'know' from which data set each variable originated. The resulting modes would then just model the variation of the composite data set without explicitly describing the dependencies of the individual data sets on each other.

4. Partial least squares regression

Partial least squares regression is an extension of the multilinear regression model and aims to determine the principal modes within a set of vector variables X = {x_i} for predicting a second set of response vector variables Y = {y_i}.
When X is of full rank, ordinary multilinear regression can be used, but in practice X is likely to be singular due to colinearity in the predictor variables. Partial least squares regression can be used even when there is colinearity in the predictors, and it determines factors within X that are also important for Y by searching for modes that explain as much as possible of the covariance between X and Y. Once these modes have been determined, a set of weights can be calculated which can be used to predict an instance of the response variables y from a predictor variable x. Partial least squares regression works by finding a decomposition of the predictor X and response Y variables into latent variables T and U, respectively:

$$X_c = T P^T + E, \qquad Y_c = U Q^T + F \quad (3)$$
where $X_c$ and $Y_c$ represent the mean-centred matrices of $X^T$ and $Y^T$, respectively. In the above equation, P is the factor loading matrix, Q is the coefficient loading matrix, and E
and F are error terms that are not described by the PLSR model. In PLSR, a relationship is derived between the latent variables T and U rather than $X_c$ and $Y_c$, i.e.

$$U = T B + U_E \quad (4)$$
where B is a diagonal matrix of regression weights, and $U_E$ is the regression error term. The latent vectors T and U can be chosen in a number of different ways, but PLSR aims to find two sets of weights W and Q (where Q is the coefficient loading matrix in Eq. (3)) in order to create linear combinations of the columns of $X_c$ and $Y_c$ such that their covariance is maximal. Specifically, the goal is to first obtain a pair of vectors t and u,

$$t = X_c w, \quad |w| = 1; \qquad u = Y_c q, \quad |q| = 1 \quad (5)$$
where w and q are weight vectors, such that $t^T u$ is maximal. When the first latent vectors t and u are found, they are subtracted from $X_c$ and $Y_c$, respectively, and the procedure is iterated until $X_c$ becomes a null matrix. Predicted values for the response variables $Y_c$ are then given by
$$Y_c = T B Q^T \quad (6)$$
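The decomposition and prediction steps of Eqs. (3)-(6) can be sketched with a NIPALS-style iteration. This is a simplified illustration under our own conventions, not the authors' implementation: for compactness we fold the diagonal inner weight $b_k$ and the loading $q_k$ into a single regression vector $c_k$ per component, so prediction accumulates $t_k c_k$ directly:

```python
import numpy as np

def plsr_fit(X, Y, n_components, max_iter=500, tol=1e-10):
    """Fit a PLS regression with a NIPALS-style loop (simplified sketch).

    X: (n, p) rows of predictor vectors, Y: (n, q) rows of responses.
    Each component finds a weight vector w whose score t = Xc w has
    maximal covariance with a Y-score, then deflates both blocks.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    comps = []
    for _ in range(n_components):
        u = Yc[:, 0]                      # initialise Y-score from first response column
        t_old = None
        for _ in range(max_iter):
            w = Xc.T @ u
            w = w / np.linalg.norm(w)
            t = Xc @ w                    # X-block score
            q = Yc.T @ t
            q = q / np.linalg.norm(q)
            u = Yc @ q                    # Y-block score
            if t_old is not None and np.linalg.norm(t - t_old) < tol:
                break
            t_old = t
        tt = t @ t
        p = Xc.T @ t / tt                 # X loadings
        c = Yc.T @ t / tt                 # regression of responses on the score t
        Xc = Xc - np.outer(t, p)          # deflate both blocks
        Yc = Yc - np.outer(t, c)
        comps.append((w, p, c))
    return {"mx": mx, "my": my, "comps": comps}

def plsr_predict(model, x_new):
    """Predict one response vector from one predictor vector (shape (p,))."""
    x = x_new - model["mx"]
    y = model["my"].copy()
    for w, p, c in model["comps"]:
        t = x @ w                         # score for the new sample
        y = y + t * c                     # accumulate the component's contribution
        x = x - t * p                     # deflate the sample as the training data was
    return y
```

Because each score is regressed against the current Y-residual, the sketch recovers an exact linear predictor-response relationship when enough components are extracted, while remaining usable under colinearity.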
Partial least squares regression can be implemented using the nonlinear iterative partial least squares (NIPALS) algorithm, details of which can be found in Ablitt et al. (2004).

5. Statistical analysis of brain MR data

5.1. Method

A set of magnetic resonance brain images of 178 subjects from the Centre for Morphometric Analysis (CMA), Boston, was used to create a training set of surfaces over which the canonical correlation analysis and the partial least squares regression techniques were applied. The images have a resolution of 1 mm × 1.5 mm × 1 mm and had been manually labelled in order to delineate structures within the brain by experts at the CMA. An example magnetic resonance image and its labelled segmentation are shown in Fig. 1. In order to build the training surfaces, a reference
Fig. 1. (a) Axial (left) and coronal (right) slices through one of the MR images in our dataset. (b) The corresponding manually labelled images.
subject was first chosen to define a coordinate system into which the surfaces of all the other subjects would be aligned, and the surfaces of 18 different sub-cortical brain structures of this reference subject were calculated from its labelled image. These structures were the left lateral ventricle, right lateral ventricle, left caudate, right caudate, left putamen, right putamen, left accumbens, right accumbens, left pallidum, right pallidum, left thalamus, right thalamus, left amygdala, right amygdala, left hippocampus, right hippocampus, fourth ventricle and the brain stem. In order to model the variation in the surfaces of these structures across all subjects, correspondences between each reference surface point and the corresponding surface in each of the other subjects must be determined. These were calculated by registering the labelled magnetic resonance images of each subject to the reference image using a B-Spline FFD registration algorithm, which represents each transformation as the sum of a global affine component and a non-rigid component based on B-Splines (Rueckert et al., 1999). The optimal transformation is found by maximising the label consistency of the labelled images, which measures the degree of their alignment. The determined reference-to-subject transformations were then used to transform each of the 18 reference surfaces to give the corresponding surface points of the other 177 subjects. For the surfaces of a given subject, each transformation was performed by multiplying the non-rigid component of the reference-to-subject registration with the inverse of the Jacobian of the global affine transformation between reference and subject in order to remove any dependencies of the local non-rigid transformation on the global transformation (Rueckert et al., 2003).
This ensures that any subsequent statistical modelling of the generated surface points does not describe differences across the training data that are simply a result of the different position, orientation, and overall size of each subject's brain. However, since this procedure represents a global alignment of the entire brain across the different subjects rather than a series of structure-by-structure alignments, both the residual shape and pose of individual structures will be included in the subsequent modelling. In a previous publication (Rao et al., 2006) we removed the poses of individual structures by performing a structure-by-structure Procrustes alignment in order to investigate the shape correlations of different structures. In this paper, we do not perform such an alignment because ultimately we want to combine the CCA and PLSR into a predictive scheme that enables both the shape and pose of one structure to be predicted from another. In what follows, the word 'shape' is therefore used as shorthand for 'shape/pose'. For each individual structure X, the vectors $x_i$ representing the surface point coordinates for structure X of the ith subject, $0 \le i < 178$, were pooled to form a matrix set of vectors X = {x_i} in which the ith column of X is equal to $x_i$. The number of rows in X is therefore equal to the number of dimensions of the $x_i$ (equal to the number of surface points in X multiplied by three spatial coordinates) while
the number of columns equals the number of datasets (178). Prior to performing the canonical correlation analysis of surface point coordinates, a principal components analysis was performed on the surface point coordinates for each individual structure across the training data to reduce the dimensionality of the data. The dimension reduction minimizes the computational memory burden of the canonical correlation analysis and also eliminates colinearity in the data which can cause instability in the calculation of CCA. Fifty-eight modes of variation were retained from the principal components analysis of each structure, ensuring that at least 95% of the variation in each structure across the training data could be represented. For each structure X, the set X = {x_i} was then transformed into its corresponding principal components basis to give a new matrix set of vectors $\tilde{X} = \{\tilde{x}_i\}$, where each $\tilde{x}_i$ is of dimension 58 and $\tilde{X}$ is of dimension 58 rows by 178 columns. Canonical correlation analysis was then performed for all pairs of structures X and Y using the corresponding sets of vectors $\tilde{X} = \{\tilde{x}_i\}$ and $\tilde{Y} = \{\tilde{y}_i\}$. In each case, 58 canonical modes and correlations are determined describing the correlated behaviour between structure X and structure Y. For a given pair of structures X and Y, these 58 correlations were then averaged to give a final correlation coefficient q between zero and one describing the strength of the correlations between the two structures. We chose not to retain the canonical modes as we were interested in quantifying correlations rather than analyzing the correlated behaviour itself. Leave-one-out tests were then performed to assess the accuracy of using partial least squares regression to predict surfaces of each subject. For a particular pair of structures X and Y, this was performed by calculating a partial least squares regression on data sets $\tilde{X}^k$, $\tilde{Y}^k$ for each subject k, $0 \le k < 178$. Here, $\tilde{X}^k$ and $\tilde{Y}^k$ are derived using the same
procedure that determines $\tilde{X}$ and $\tilde{Y}$, but without including subject k in the principal components analysis or the subsequent transformation of the data. Both $\tilde{X}^k$ and $\tilde{Y}^k$ are therefore of dimension 58 rows by 177 columns. A prediction of $y_k$ from $x_k$ is then determined by firstly transforming $x_k$ into the principal components basis associated with $\tilde{X}^k$, before using the weights derived from the PLSR to give a prediction for $y_k$ expressed in the basis of $\tilde{Y}^k$. Finally, this prediction is mapped back into the original basis to give the prediction $y_k^P$ for $y_k$. The prediction error is then calculated as

$$\epsilon_k = \| y_k - y_k^P \|^2 / N \quad (7)$$
where N is the number of surface points in the structure Y. The value of $\epsilon_k$ gives the average squared distance between the predicted surface points and the corresponding points in the actual surface. The errors $\epsilon_k$ were then averaged to give a number describing the accuracy of the PLSR predictions. The corresponding errors when using the 'naïve' prediction $\bar{y}$, the mean shape of structure Y, were also calculated
and averaged over k to give an error measure d. The mean shape of structure Y is dependent only on variation of the shape of structure Y over the data, whereas PLSR explicitly uses the shape of structure X as prior information when forming a prediction for the shape of Y. We were therefore interested in seeing how the predictions using these two approaches would compare across the training data.

5.2. Canonical correlation analysis results

Fig. 2 shows the canonical correlation coefficients for each pair of structures as a grey-level matrix image in which brighter areas represent higher correlations. We can see that the correlation coefficients are maximal along the top-left to bottom-right diagonal as this line represents the correlations between a structure and itself, which are always equal to one since any structure is perfectly correlated with itself. The matrix image is also symmetrical about this diagonal as the calculation of the correlation coefficients between any two structures does not depend on which of the structures is modelled as the predictor or dependent structure. If we now consider those correlations lying off the leading diagonal we can see that each structure is correlated to varying degrees with the other structures we have considered. For example, although the right caudate is well correlated with the right lateral ventricle (q = 0.6723), it is poorly correlated with the left hippocampus (q = 0.5248). Similarly, the left lateral ventricle is strongly correlated to the right lateral ventricle (q = 0.6603) but poorly correlated to the fourth ventricle (q = 0.5254). Table 1 now shows the
Table 1
List of each structure and its most correlated structure over all 18 sub-cortical structures

Structure               Most correlated structure   Correlation
L. Lateral ventricle    R. Lateral ventricle        0.6603
R. Lateral ventricle    R. Caudate                  0.6723
L. Thalamus             L. Lateral ventricle        0.6319
R. Thalamus             R. Lateral ventricle        0.6259
L. Caudate              L. Lateral ventricle        0.6592
R. Caudate              R. Lateral ventricle        0.6723
L. Putamen              L. Pallidum                 0.6584
R. Putamen              R. Pallidum                 0.6581
L. Pallidum             L. Putamen                  0.6584
R. Pallidum             R. Putamen                  0.6581
L. Hippocampus          L. Amygdala                 0.6096
R. Hippocampus          R. Amygdala                 0.6382
L. Amygdala             L. Hippocampus              0.6096
R. Amygdala             R. Hippocampus              0.6382
L. Accumbens            L. Caudate                  0.5948
R. Accumbens            R. Caudate                  0.5786
Fourth ventricle        Brain stem                  0.6342
Brain stem              Fourth ventricle            0.6342
best correlated structure for each of the 18 structures. Note that there is no intrinsic symmetry to this table, i.e. if structure Y is the strongest correlated structure to structure X, then structure X is not necessarily the strongest correlated structure to structure Y. Any such symmetries found in Table 1 are a consequence of the ranking of correlation strengths deduced by CCA. An inspection of Table 1 reveals two patterns in our correlation data. Firstly, structures are best correlated with structures that are proximal to them. For example, the left accumbens is most strongly
1 = L. Lateral Ventricle 2 = L. Thalamus 3 = L. Caudate 4 = L. Putamen 5 = L. Pallidum 6 = Fourth Ventricle 7 = Brainstem 8 = L. Hippocampus 9 = L. Amygdala 10 = L. Accumbens 11 = R. Lateral Ventricle 12 = R. Thalamus 13 = R. Caudate 14 = R. Putamen 15 = R. Pallidum 16 = R. Hippocampus 17 = R. Amygdala 18 = R. Accumbens
Fig. 2. The matrix image of the correlations q between each pair of structures. In the image, bright areas represent strong correlations close to one, while dark areas represent weak correlations.
correlated with the left caudate while the right accumbens is most strongly correlated with the right caudate. Similarly, each of the amygdala, hippocampi, putamen, caudate, thalamus, brain stem and fourth ventricle are best correlated with neighbouring or proximal structures. This is interesting because it means that as the shape, size and position of a particular structure changes across the different brains in our data, the shape, size and position of neighbouring structures in the data changes in a correlated manner. The surfaces of each brain structure therefore influence, and are influenced by, structures that are close to them. Secondly, there is a great deal of symmetry in the most correlated structures for paired structures that appear in the left and right sides of the brain. For example, the left accumbens is most strongly correlated with the left caudate, while conversely the right accumbens is most strongly correlated with the right caudate. This symmetry is repeated for all the structures apart from the lateral ventricles, in which the left lateral ventricle is most strongly correlated to the right lateral ventricle, while the right lateral ventricle is most strongly correlated to the right caudate. In fact, the second most strongly correlated structure to the right lateral ventricle is the left lateral ventricle, while the second most strongly correlated structure to the left lateral ventricle is the left caudate, so there is still a strong symmetrical relationship between the most correlated structures of the ventricles.

5.3. Partial least squares regression results

Table 2 summarizes the results we obtained when testing the accuracy of PLSR in predicting the shapes of some different structures by using leave-one-out tests. The first two columns give the predictor and predicted structures while the third and fourth columns give the average prediction errors when using PLSR and the mean of the predicted structure $\bar{y}$ (mean shape), respectively.
We have included in the final column the correlation coefficient determined by the CCA between these two structures in order to illustrate relationships between the degree of correlation and the accuracy in the predictions of the PLSR technique.
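The error measure of Eq. (7) and the naïve mean-shape baseline it is compared against can be sketched as follows. This is an illustration only: the array layout (N surface points by three coordinates) and the function names are our assumptions, not the authors' code:

```python
import numpy as np

def prediction_error(y_true, y_pred):
    """Eq. (7): average squared distance between corresponding surface
    points. Both arguments have shape (N, 3): N surface points in 3D."""
    return np.sum((y_true - y_pred) ** 2) / y_true.shape[0]

def mean_shape_baseline(surfaces):
    """Leave-one-out 'naive' errors: each subject's surface is predicted
    by the mean shape of the remaining subjects.

    surfaces: (n_subjects, N, 3) array of corresponding surface points.
    Returns one error per subject; their average corresponds to the
    mean shape error reported in Table 2.
    """
    n = surfaces.shape[0]
    errors = np.empty(n)
    for k in range(n):
        mean_shape = np.delete(surfaces, k, axis=0).mean(axis=0)
        errors[k] = prediction_error(surfaces[k], mean_shape)
    return errors
```

The PLSR errors in Table 2 are computed with the same per-point measure, with the PLSR prediction substituted for the mean shape.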
Firstly, if we look at the errors produced when the right lateral ventricle is used to predict the left lateral ventricle, we can see that the PLSR gives a significantly better prediction than using the mean shape, with a reduction in error from 8.7 mm2 to 3.7 mm2. This pair of structures is also highly correlated, with a correlation coefficient q equal to 0.6603. Table 2 shows a similar improvement in prediction accuracy when using PLSR to predict the right lateral ventricle from the right caudate, and these structures are also highly correlated (q = 0.6723). Figs. 3 and 4 show the results of the individual leave-one-out tests for these predictions and we can see that the PLSR error is almost always smaller than the error associated with the mean shape. However, if we instead try to predict the right lateral ventricle using a poorly correlated structure such as the fourth ventricle (q = 0.5246), we can see that the PLSR errors in Table 2 and Fig. 5 are worse than those associated with the mean shape. Similarly, Table 2 shows that the PLSR prediction of the fourth ventricle is better than the mean shape if we use a well correlated structure such as the brain stem (q = 0.6342), while it is worse if we use a poorly correlated predictor such as the left pallidum (q = 0.5149). The individual leave-one-out tests for these predictions are shown in Figs. 6 and 7. Although a poor correlation between structures does not necessarily imply that the PLSR prediction will be worse than the mean shape, the inverse relationship between correlation strength and prediction error is still apparent. For example, Table 2 shows that while the mean shape errors are comparable when predicting the left or right accumbens, using the left caudate to predict the left accumbens (q = 0.595) gives a PLSR prediction that is 3.3 mm2 smaller than that produced if the right hippocampus is used to predict the right accumbens (q = 0.525). 
Similarly, Table 2 shows that while the mean shape errors are comparable when predicting the left or right hippocampus, using the left amygdala to predict the left hippocampus (ρ = 0.610) gives a PLSR prediction error that is 0.8 mm² smaller than that produced if the left hippocampus is used to predict the right hippocampus (ρ = 0.576). Finally, Fig. 8 shows the individual leave-one-out test results for predicting the left thalamus from the left putamen. As these structures are fairly well correlated (ρ = 0.5763), the PLSR predictions are generally more accurate than using the mean shape as the prediction.

Table 2
The errors using the PLSR and mean shape predictions for a number of different input structures X and predicted structures Y. The correlation coefficient determined by the CCA of each pair of structures is also given.

Structure X (input)    Structure Y (predicted)    PLSR error (mm²)    Mean shape error (mm²)    Correlation coefficient ρ
R. Lateral ventricle   L. Lateral ventricle       3.682               8.725                     0.6603
R. Caudate             R. Lateral ventricle       3.493               8.835                     0.6723
Fourth ventricle       R. Lateral ventricle       9.4383              8.835                     0.5246
Brain stem             Fourth ventricle           1.3690              5.7211                    0.6342
L. Pallidum            Fourth ventricle           6.581               5.721                     0.5149
L. Caudate             L. Accumbens               1.109               5.367                     0.5948
R. Hippocampus         R. Accumbens               4.419               5.580                     0.5252
L. Amygdala            L. Hippocampus             4.247               6.750                     0.6096
L. Hippocampus         R. Hippocampus             5.019               6.535                     0.5760
L. Putamen             L. Thalamus                3.267               5.459                     0.5763

Fig. 3. The errors when predicting the left lateral ventricle from the right lateral ventricle.

Fig. 4. The errors when predicting the right lateral ventricle from the right caudate.
Fig. 9 shows an example of the surfaces predicted by PLSR with our data. In Fig. 9a, we show the left and right lateral ventricles of one of the subjects, while in Fig. 9b we show the right lateral ventricle and the predicted left lateral ventricle for this subject, using the right lateral ventricle as the predictor structure. We can see that PLSR gives quite a good prediction of the left lateral ventricle, and it is significantly better than using the mean left lateral ventricle as the prediction, which is shown in Fig. 9c. The shape of the mean left lateral ventricle is too narrow, while the PLSR approach incorporates shape information about the right lateral ventricle to predict a wider shape for the left lateral ventricle, reflecting the symmetry we would expect between the two structures.

Fig. 10 shows another example of the PLSR predictions from our data. In Fig. 10a, we show the left putamen and left thalamus of one of the subjects, while in Fig. 10b we show the left putamen and the predicted left thalamus for this subject, using the left putamen as the predictor structure. Again, we can see that the PLSR prediction is significantly better than the mean left thalamus shown in Fig. 10c. Not only is the shape of the PLSR-predicted left thalamus more similar to the real left thalamus, but its distance from the left putamen is also more similar to the corresponding distance between the real structures. It should be noted that there are strong CCA correlations between each of the pairs of structures shown in Figs. 9 and 10, and so the PLSR prediction errors for these structures are small relative to the errors given by the mean shape.

Fig. 5. The errors when predicting the right lateral ventricle from the fourth ventricle.

Fig. 6. The errors when predicting the fourth ventricle from the brain stem.

Fig. 7. The errors when predicting the fourth ventricle from the left pallidum.

Fig. 8. The errors when predicting the left thalamus from the left putamen.

Fig. 9. The performance of the PLSR technique when predicting the left lateral ventricle (shown on the right of each figure) from the right lateral ventricle (shown on the left of each figure) for one of the subjects. In (a), we show the actual left and right lateral ventricles for this subject, while in (b) we show the predicted left lateral ventricle using PLSR, and in (c) the mean shape of the left lateral ventricle is shown. The colours at each point on the surfaces indicate the Euclidean distance of the surface from the actual surface for this structure, as determined by the manual label. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. The performance of the PLSR technique when predicting the left thalamus (shown on the left of each figure) from the left putamen (shown on the right of each figure) for one of the subjects. In (a), we show the actual left putamen and left thalamus for this subject, while in (b) we show the predicted left thalamus using PLSR, and in (c) the mean shape of the left thalamus is shown. The colours at each point on the surfaces indicate the Euclidean distance of the surface from the real surface for this structure. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
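The colour maps in Figs. 9 and 10 encode the per-vertex Euclidean distance between the predicted and actual surfaces. Assuming corresponded surfaces stored as N×3 vertex arrays (a representation assumed here for illustration), such a distance map is simply:

```python
import numpy as np

def vertex_distances(pred, actual):
    """Per-vertex Euclidean distance between two corresponded surfaces.

    pred, actual: (n_vertices, 3) arrays of corresponding surface points.
    Returns an (n_vertices,) array suitable for colour-mapping onto the surface.
    """
    return np.linalg.norm(pred - actual, axis=1)

# Tiny worked example with two vertices
pred = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
actual = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 4.0]])
d = vertex_distances(pred, actual)   # → [0.0, 3.0]
```

This relies on the point correspondences established during registration; without correspondences, a symmetric surface distance (e.g. nearest-neighbour) would be needed instead.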
5.4. Hierarchical prediction of structures

In Section 5.3, we saw how the accuracy of the partial least squares regression predictions improves as the correlation coefficient between the predictor and the predicted structure increases. We can exploit this relationship by building a prediction hierarchy into which a model fitting algorithm can be incorporated to further improve predictions. Consider the case in which we are trying to estimate the shape of a structure Yn in a subject for whom we know that the shape of a different structure Y0 is y0. Such a situation may arise if we have managed to segment, to good accuracy, the structure Y0 from the magnetic resonance image of this subject, while the structure Yn could not be segmented due to poor contrast. One possibility would be to predict the shape of Yn from y0 using PLSR, as described in the previous section, but if these shapes are poorly correlated then this prediction may be worse than just using the mean shape of this structure. Instead, we can first predict the shapes of the other unknown structures in the image, Y1, …, Yn−1, which may have stronger correlations with Yn, and incorporate these shapes into the prediction. This can be achieved using the hierarchical scheme outlined in Algorithm 1.

Algorithm 1. This algorithm describes the use of a hierarchy to improve structure predictions

0  Initialize: predictor structure P = Y0 with shape p = y0; set of unpredicted structures Q = {Y1, …, Yn−1}
1  While Q ≠ ∅:
2    Find the Yi ∈ Q that is best correlated with P using CCA
3    Use PLSR on P and Yi to form a prediction ỹi for the shape of Yi from p
4    Use a model fitting algorithm to refine ỹi, giving ŷi
5a   Group P and Yi into a single structure, which replaces P
5b   Replace p with [p; ŷi], the shape estimate of the new P
6    Remove Yi from Q
7  Repeat from step 1
The strength of this hierarchical scheme lies in the incorporation of a model fitting algorithm at line 4 which uses the PLSR prediction made in line 3 as an initial estimate for the shape of the current unknown structure, before further refining and improving this prediction. Without such a
refinement, the scheme would be no more accurate than predicting all of the structures Y1, …, Yn in a pairwise fashion from the first structure Y0 using partial least squares regression. However, by combining PLSR and a model fitting algorithm at each level of the hierarchy, we can produce better approximations of structures than either the partial least squares regression or the model fitting alone would produce.

To some extent, the improvement in prediction accuracy given by this scheme will depend on how well the model fitting in line 4 of Algorithm 1 preserves the correspondences between surface points when ỹi is deformed into ŷi. If the model fitting uses an approach similar to the B-Spline FFD parametrisation used to register the training surfaces, then we would expect the correspondences to be preserved, and so the subsequent predictions will be accurate. In addition, since the surface ŷi is transformed into the principal components basis for this structure (as outlined in Section 5.1), any perturbations of these correspondences will be effectively 'smoothed' into a set of correspondences that lies within the point distribution for that structure before it is used in the next round of predictions. The principal components analysis therefore makes the PLSR prediction more robust to any changes in the correspondences produced by the model fitting.

To demonstrate the potential of such a hierarchical scheme, we used partial least squares regression to predict the hippocampus using two different sets of predictor structures. In practice, the hippocampus is one of the most difficult structures to segment from structural magnetic
resonance images, and we wanted to show how the use of a hierarchical model fitting scheme can improve the segmentation of this structure. Firstly, we predicted the hippocampi using the amygdala, which, as Table 1 shows, are highly correlated structures. Secondly, we predicted the hippocampi using a single composite structure consisting of the brain stem, the lateral ventricles, and the sub-cortical nuclei (accumbens, amygdala, putamen, pallidum, and caudate). Here, we are using many known structures to predict the unknown shape of the hippocampi rather than predicting each structure in turn. This prediction is analogous to estimating the shape of the hippocampi from the shape of a single known structure using the hierarchical scheme outlined above, but where the model fitting algorithm at line 4 always gives a perfect fit of the other unknown shapes until the final PLSR prediction of the hippocampi itself. Although no model fitting technique can be this accurate, the model fitting should improve on the PLSR estimate from line 3, and conversely the initialisation of the model fitting with the PLSR estimate should improve its own performance. This prediction therefore represents the 'best-case' scenario of using the hierarchical scheme up until the final prediction, which is performed using only partial least squares regression and does not include any model fitting.

Fig. 11 shows the PLSR errors for both sets of predictions, and we can see that the errors associated with using the composite structure as the predictor are almost always smaller than those associated with using the amygdala. The average prediction error when using the amygdala as the predictor is 4.59 mm², while the corresponding error when using the composite structure is 2.70 mm². These results are expected, because the prediction using the composite structure uses information about all of the other structures rather than just the amygdala. Performing the prediction within the hierarchical model-fitting framework, in which the amygdala is the first predictor and the hippocampi are the final predicted structure, should similarly give a better prediction than that produced using just the amygdala as a predictor.

Fig. 11. The errors when predicting the hippocampi from the amygdala and from the brain stem/sub-cortical nuclei/lateral ventricles.

6. Discussion

The results of the canonical correlation analysis and the partial least squares regression imply that there are differing degrees of correlated and dependent variation between the shapes of different structures within the brain. The CCA gives larger correlations for structures that are close to each other, which suggests that the shapes of structures both influence, and are influenced by, those structures that are proximal to them. Intuitively, this makes sense, as one would not expect to see a subject in which a structure or group of structures possesses shapes located at the centre of their shape distributions, while a neighbouring structure has a shape that lies at the extremum of its distribution. Indeed, the observation of such a distribution of shapes may indicate a degree of pathology in the subject. However, it should be noted that the correlations and associated modes determined by canonical correlation analysis do not necessarily describe a large amount of the variation between structures, but instead describe the most correlated behaviour in that variation. Conversely, partial least squares regression explicitly captures the major variation between structures and was shown to be an effective technique for predicting the shape of one structure from another.
When the correlations (as determined by the CCA) were relatively large, the PLSR produced better predictions than the mean shape, while for smaller correlations the mean shape predictions were sometimes better than the PLSR predictions. As both CCA and PLSR are essentially linear regression techniques and therefore measure linear dependence between two sets of variables, one would expect the PLSR predictions to improve as the CCA correlations increase. The relationship between the CCA correlations and the PLSR predictions can be used to build a prediction hierarchy into which any model fitting algorithm can be incorporated to estimate structures that are difficult to segment from a given image. At each level of the hierarchy, structures that are well correlated to predictors are predicted using PLSR, refined by the model fitting algorithm, and then incorporated into the set of predictors. The model fitting algorithm will improve the PLSR prediction at each level of the hierarchy, resulting in a more accurate prediction than if PLSR alone was used. Although we presented results in the ideal case of the model fitting producing perfect fits until the prediction of the final structure, we would
also expect a practical implementation of this scheme to give better predictions than a pairwise PLSR prediction. The current focus of our work is the incorporation of a novel FFD-based model fitting algorithm into this hierarchical scheme for the segmentation of structures within the brain.

Acknowledgements

We are grateful for the support of the EPSRC, who funded this work under the IBIM project. We also thank David Kennedy from the Centre for Morphometric Analysis for providing the MR data.