Impact of inconsistent resolution on VBM studies


NeuroImage 40 (2008) 1711–1717

João M.S. Pereira, Peter J. Nestor, and Guy B. Williams

Department of Clinical Neurosciences, University of Cambridge School of Clinical Medicine, Cambridge, UK

Received 13 September 2007; revised 16 January 2008; accepted 22 January 2008. Available online 1 February 2008.

This paper considers the effects of using magnetic resonance scans with different voxel dimensions in voxel-based morphometry studies. This is of potential relevance to many longitudinal studies or any ad hoc study that relies on pre-existing databases of subjects. To study this effect, a group of controls was contrasted with a group of semantic dementia patients as well as with a group of Alzheimer's disease patients, using a mixture of scans with different voxel dimensions on each side of the statistical test. Scans were interpolated using a sinc function in order to obtain a different voxel depth. The effects were measured by comparing the output of each analysis to the benchmark in which all scans had the original depth (and highest resolution), both visually and through the root–mean–square difference between the resulting t-maps. The impact was shown to be highly dependent on the scan itself, with some images being more robust to the interpolation process and hence yielding fewer differences. A measure of robustness is proposed, which may be used to anticipate the impact of mixing different voxel dimensions or of adjusting them for each scan. Indiscriminate use of voxel dimensions in both groups was found to produce more errors (false positives/false negatives) than an approach using balanced groups and a voxel-dimension nuisance covariate.

Keywords: VBM; Voxel dimensions; Interpolation; Degradation

Introduction

Voxel-based morphometry (VBM) is a popular tool for comparing structural measures, such as regional grey matter density, between cohorts. Ideally, raw data for VBM processing should be as consistent as possible, i.e., scans should be acquired on the same scanner with the same imaging parameters (Good et al., 2001). In general, studies use only consistently acquired data (Mechelli et al., 2005; Gitelman et al., 2001), but this may be an issue for longitudinal or multi-centre studies. For instance, it is conceivable that in rare diseases the acquisition of a cohort may take several years, spanning scanner upgrades or changes to acquisition protocols. In such circumstances, it would be potentially advantageous to include such mixed scans in a single analysis. Since the usual VBM method involves reslicing the data to a lower resolution and applying a smoothing kernel, it is feasible that such an analysis might be unbiased. The following experiments explore the impact of one source of scan heterogeneity: variation in slice thickness. This paper addresses the impact of interpolation on results; the modelling of voxel differences through a nuisance covariate was also explored.

Methods

Subjects

Subjects were chosen from two diseased cohorts with different patterns of atrophy: Alzheimer's disease (AD: mild diffuse atrophy) and semantic dementia (SD: severe focal atrophy). Each cohort had 14 subjects and was contrasted with a sample of 14 normal controls. Demographic details are provided in the Supplementary material. These dementia cohorts present very distinct atrophic patterns: SD is characterised by severe, typically asymmetric, atrophy of the anterior temporal lobes (Williams et al., 2005), whereas AD shows a more diffuse pattern of cortical atrophy – more so posteriorly – as well as an atrophic hippocampal region (Nestor et al., 2006; Lerch et al., 2005).

Imaging

Subjects were scanned using a 1.5-T GE MRI scanner. The images were acquired using a coronal T1-weighted 3D spoiled gradient echo sequence with an echo time of 4.2 ms, an inversion time of 650 ms and a flip angle of 20°. All scans had original voxel dimensions of 0.86 × 0.86 × 1.5 mm3, with an in-plane matrix of 256 × 256 and a number of coronal slices that depended on the field of view in the coronal direction, in this case ranging from 110 to 124. Two additional T1 3D MPRAGE scans were acquired from a single healthy volunteer using a Siemens Trio 3-T MRI scanner, with voxel dimensions of 1 × 1 × 1 mm3 (matrix size 240 × 256 × 176) and 1 × 1 × 1.5 mm3 (matrix size 240 × 256 × 256), an echo time of 5.9 ms, an inversion time of 900 ms and a flip angle of 9°.
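When cohorts are assembled from pre-existing databases, a quick consistency check of the voxel dimensions can flag the kind of heterogeneity studied here before any analysis is run. The following is a minimal, illustrative sketch (it assumes NIfTI files readable with nibabel; the file paths are placeholders, not data from this study):

```python
# Illustrative check of voxel dimensions across a cohort (paths are placeholders).
import nibabel as nib

def voxel_sizes(paths):
    """Map each scan to its voxel dimensions in mm."""
    return {p: nib.load(p).header.get_zooms()[:3] for p in paths}

sizes = voxel_sizes(["control_01.nii", "control_02.nii", "patient_01.nii"])
if len(set(sizes.values())) > 1:
    print("Mixed voxel dimensions detected:", sizes)
```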


Interpolation

The resampling of the scans to new voxel dimensions (0.86 × 0.86 × 1.8 mm3) was performed with sinc interpolation (Thacker et al., 1998) using the mri_convert application available in the FreeSurfer software package (http://surfer.nmr.mgh.harvard.edu/fswiki) (Dale et al., 1999; Fischl et al., 1999). The validity of using interpolation to simulate natively acquired scans of equivalent dimensions is tested in the Supplementary material.

VBM preprocessing

All VBM analyses were performed in SPM5 (Ashburner and Friston, 2005) using default parameters. All segmented scans were modulated and smoothed using isotropic Gaussian kernels.

Statistical analysis

Data were analysed in SPM5 with two-group t-tests for differences in grey matter volume. A relative mask threshold of 0.001 was used to exclude some of the background values. The analysis was repeated with a different number of scans resampled each time. We define n as the number of available subjects in each group (14 per group), nc as the number of controls whose voxel dimensions were changed, and nd as the number of dementia subjects whose voxel dimensions were changed to 1.8 mm. The set of tests consisted of varying nc and nd such that nc,d ∈ {1, 3, 5, 7, 9, 11, 13}; all permutations of these values were tested. Two approaches were tested within this framework: with and without a nuisance covariate intended to account for the difference in voxel dimensions. A smoothing kernel of 12 mm was used; further tests were performed with 6-mm and 20-mm kernels, and the impact of this choice is evaluated in the Supplementary material. To exclude effects of chance selection, the subjects to be interpolated were chosen at random. In order to explore the possibility of simulating a resampling to a higher resolution as an equalisation step after the 1.8-mm interpolation had been performed, tests were also run in which all SD and control scans were first down-interpolated (to simulate acquisition at 1.8 mm) and then up-interpolated back to their original dimension (1.5 mm). All up-interpolated scans were contrasted, and the result was compared to the nc = nd = 14 case. These studies used a 12-mm smoothing kernel.

Impact assessment

In order to measure the impact of including scans with different voxel dimensions in a statistical test in SPM5, a simple root–mean–square difference (or error, rmse) was used to compare the t-map produced by each heterogeneous study (tdeg) with the homogeneous reference (torig):

$$\mathrm{rmse} = \sqrt{\frac{\sum_{i=1}^{N}\bigl(t_{\mathrm{deg}}(i) - t_{\mathrm{orig}}(i)\bigr)^{2}}{N}} \qquad (1)$$

in which i is a voxel index and N is the total number of voxels (zero errors were excluded).
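As a concrete illustration of Eq. (1), a minimal NumPy sketch is shown below. It assumes t_deg and t_orig are the two t-maps already loaded as same-shape arrays (e.g. with nibabel); the exclusion of zero errors follows the description above, and the resel scaling discussed in the next paragraph is then a single multiplication by the resel count reported by SPM. This is illustrative only, not the authors' implementation.

```python
# Minimal sketch of the rmse of Eq. (1) between two t-maps.
import numpy as np

def tmap_rmse(t_deg: np.ndarray, t_orig: np.ndarray) -> float:
    """Root-mean-square difference over voxels where the maps differ."""
    diff = t_deg - t_orig
    diff = diff[diff != 0]            # zero errors were excluded from N
    return float(np.sqrt(np.mean(diff ** 2)))

# Scaling by SPM's resel count, as described in the following paragraph:
# scaled_error = tmap_rmse(t_deg, t_orig) * resels
```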

The derived metric (for the t-maps) was scaled by the number of resels output by SPM's statistical analysis. Resels are a measure of the intrinsic smoothness of the map and accommodate this factor when comparing errors obtained with different FWHM kernels. Without such scaling, the rmse 3D graphs are flatter for lower FWHM kernels. This can be explained by looking more closely at the t-maps: smaller kernels produce maps with finer structure, so when two such maps are compared through Eq. (1), finer differences are picked up. Conversely, the grosser structure of a large-kernel t-map is associated with grosser errors, which are more visible in the rmse. The result is counterintuitive: smoother t-maps yield higher rmse values and steeper error graphs, yet the visual assessment of differences in the glass brains is much clearer for the less smoothed cases than for the broad significance areas produced by large kernels. To translate the metric into what an observer sees, the rmse was therefore multiplied by the number of resels, which varies inversely with smoothing (higher when smoothing is lower). Using rmse × resels, the expected flattening of the parameter with increasing FWHM was observed. An average 3D graph gradient measure was derived to demonstrate this effect (shown in the Supplementary material). A systematic analysis of the location of the highest significance peaks was also performed and can be found in the Supplementary material.

Robustness measure

Considering the topological characteristics of the cohorts used, controls have a very tight pial surface convolution pattern, with little sulcal/gyral contrast, whereas AD subjects have generalised atrophy and hence more cerebrospinal fluid inside the sulci, giving more definition. The SD cases, with their focal pathology, lie between these extremes. The grey/white matter contour follows a similar, though less obvious, pattern. From an interpolation point of view, the higher the definition, the more information is available and hence the less error-prone the process. It is important to highlight that the notion of structural definition put forward in this paper is inversely proportional to the density of high spatial frequencies in the image: in this context, higher definition stands for more prominent features, such as clearly defined sulcal banks, which are associated with lower spatial frequencies and are hence less affected by under-sampling according to the Nyquist theorem (Oppenheim et al., 1996). Images of control subjects, which have tight sulcal banks with little CSF to provide clear contrast, will contain higher spatial frequencies in those areas because the signal change occurs over a very limited space; this is detrimental for a lower-frequency sampling (as used in the interpolation).

In order to quantify the definition of the pial surface and the grey matter/white matter interface, a binary brain segmentation was used. Modulated grey matter segments were normalised to enforce a [0, 1] range and then hard thresholded at the mean value of the segment (after excluding zero values). The result was a 3D "shell" including the ventricles and the cortical layer. A border-detection function then counted the number of '1' voxels without similar neighbours in all six main directions. This value Nb was divided by the foreground Fg, the total number of '1' voxels, which accounts for variations in thickness as well as ventricle size: Fg is thus a "shell size" indicator used to normalise Nb. The value obtained by this method was then divided by an energy indicator.
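A minimal sketch of this border-fraction (Nb/Fg) term is given below, reading a "border" voxel as a shell voxel with at least one of its six face neighbours outside the shell. It assumes the modulated grey matter segment has already been loaded as a 3D NumPy array; the function name and the exact thresholding details are illustrative, not the authors' implementation. The energy denominator is described next.

```python
# Illustrative sketch of the Nb/Fg border fraction described above.
import numpy as np
from scipy.ndimage import binary_erosion

def border_fraction(gm: np.ndarray) -> float:
    """Return Nb / Fg for one modulated grey matter segment."""
    gm = gm.astype(float)
    # Normalise the segment into [0, 1].
    gm = (gm - gm.min()) / (gm.max() - gm.min())
    # Hard-threshold at the mean of the non-zero voxels -> binary "shell".
    shell = gm > gm[gm > 0].mean()
    # Binary erosion with the default 6-connected structuring element removes
    # exactly the voxels that have at least one face neighbour outside the shell.
    interior = binary_erosion(shell)
    nb = np.count_nonzero(shell & ~interior)   # border voxels, Nb
    fg = np.count_nonzero(shell)               # foreground voxels, Fg
    return nb / fg
```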
The grey-level co-occurrence matrix (GLCM) is used to study contrast and texture properties of 2D images: entry (i, j) of the matrix records how often a pixel of value i occurs horizontally adjacent to a pixel of value j. For the purposes of this analysis, the scan was rotated so that the direction of GLCM analysis in each slice (as it is a 2D metric) was parallel to the interpolated dimension (the coronal axis).

Table 1
Root–mean–square errors between original and interpolated scans' grey matter segments, using different smoothing kernels (validation of interpolation)

Comparison                                  No smoothing   6-mm FWHM   12-mm FWHM   20-mm FWHM
1-mm isotropic versus 1.5-mm original       0.0321         0.0085      0.0048       0.0031
1-mm isotropic versus 1.5-mm sinc           0.0294         0.0067      0.0036       0.0025
1-mm isotropic versus 1.5-mm trilinear      0.0342         0.0105      0.0064       0.0046
1.5-mm original versus 1.5-mm sinc          0.0312         0.0081      0.0046       0.0031
1.5-mm original versus 1.5-mm trilinear     0.0337         0.0108      0.0069       0.0052
The GLCM energy was extracted for each slice and then averaged to obtain the final value. Therefore, the final energy value for each scan S with K slices is given by:

$$E(S) = \frac{\sum_{k}\sum_{i,j} p(i,j)^{2}}{K} \qquad (2)$$

in which p(i, j) is the normalised (i, j) entry of the GLCM. These calculations were performed using the graycomatrix and graycoprops functions of Matlab 7 (Mathworks Inc., Natick, MA, USA), with 100 grey levels.
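For readers working in Python rather than Matlab, an approximately equivalent sketch of the slice-wise energy of Eq. (2) can be written with scikit-image's graycomatrix (spelling used from scikit-image 0.19 onwards). The quantisation to 100 grey levels, the slice axis and the in-plane offset direction are assumptions mirroring the description above, not the authors' code.

```python
# Illustrative Python analogue of the slice-wise GLCM energy of Eq. (2).
import numpy as np
from skimage.feature import graycomatrix

def glcm_energy(volume: np.ndarray, levels: int = 100) -> float:
    """Average of sum_{i,j} p(i, j)^2 over all slices of one scan.

    Assumes `volume` is oriented so that axis 0 indexes the slices and the
    horizontal in-plane axis is parallel to the interpolated (coronal)
    direction, as described in the text.
    """
    vol = volume.astype(float)
    # Quantise intensities to `levels` grey levels for the co-occurrence matrix.
    q = np.clip(vol / vol.max() * (levels - 1), 0, levels - 1).astype(np.uint8)
    energies = []
    for k in range(q.shape[0]):
        # Offset of one pixel along the horizontal image axis (angle 0).
        p = graycomatrix(q[k], distances=[1], angles=[0],
                         levels=levels, normed=True)[:, :, 0, 0]
        energies.append(np.sum(p ** 2))        # sum_{i,j} p(i, j)^2
    return float(np.mean(energies))            # average over the K slices
```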

The resulting quantity is the measure of interpolation robustness (IR) for each scan S:

$$\mathrm{IR}(S) = \frac{N_{b}/F_{g}}{E(S)} \qquad (3)$$

The rationale is that the numerator is directly proportional to the definition of the interfaces: as it increases, more information is available for the interpolation process. The energy value, on the other hand, can be regarded as a voxel-by-voxel noise measure and thus impacts negatively on the interpolation process. Therefore, the higher the IR, the more structured the image and hence the more information available for the interpolation to work with, rendering the process more robust. As already discussed, the AD cases presented the most clearly defined structures, followed by the SD subjects and finally by the controls, so the IR values for each group were expected to follow the same pattern, being highest for the AD subjects. To validate these differences statistically, one-tailed t-tests were performed between the mean IR values of each group. The correlation between the IR metric and the rmse (original grey matter segments versus grey matter segments obtained after interpolation) was also computed, both without smoothing and with 6-mm, 12-mm and 20-mm smoothing, pooling the rmse and IR values of the controls, AD and SD subjects.
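Putting the pieces together, the IR of Eq. (3) is simply the border fraction divided by the energy, and the group comparisons and correlations described above map onto standard SciPy calls. The sketch below is illustrative only; it reuses the hypothetical border_fraction and glcm_energy helpers sketched earlier and assumes SciPy 1.6 or later for the one-tailed test.

```python
# Illustrative IR computation and the group statistics described above.
from scipy.stats import ttest_ind, pearsonr

def interpolation_robustness(border_frac: float, energy: float) -> float:
    """IR(S) = (Nb / Fg) / E(S), per Eq. (3)."""
    return border_frac / energy

def compare_groups(ir_group_a, ir_group_b):
    """One-tailed t-test that group A has a higher mean IR than group B
    (the `alternative` keyword requires SciPy >= 1.6)."""
    return ttest_ind(ir_group_a, ir_group_b, alternative="greater")

def ir_rmse_correlation(ir_values, rmse_values):
    """Pearson correlation between IR and grey-matter-segment rmse (R^2, p)."""
    r, p = pearsonr(ir_values, rmse_values)
    return r ** 2, p
```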

Interpolation validation

In order to assess the effect of the interpolation method, both 3-T scans were used. The 1-mm isotropic scan was degraded to 1 × 1 × 1.5 mm3 to match the voxel dimensions of the other scan, using both sinc and trilinear interpolation. Both scans were segmented for grey matter in SPM5 using all default settings, and root–mean–square errors were obtained for the comparisons shown in Table 1. The errors shown include the comparison of the grey matter segments with no smoothing and with 6-mm, 12-mm and 20-mm FWHM kernels.

Results

Interpolation validation

Fig. 1. 3D error graphs for both the SD and AD studies, with and without the inclusion of the nuisance covariate. The vertical axis represents the quantity obtained by multiplying the rmse (between the benchmark t-map and the t-map of the mixed-dimensions study) by the number of resels output by SPM. The horizontal axes indicate the number of degraded (1.8-mm) scans used in each cohort.


Fig. 2. Glass brains for a study of n = 14 SD cases versus controls with different numbers of interpolated scans. The benchmark studies are shown in the leftmost column. The statistical threshold is FWE = 0.05 with extent threshold k = 200 for the benchmarks and all cases, except FDR = 0.003, k = 200 for (⁎) and FDR = 0.03, k = 200 for (⁎⁎). Owing to the loss of sensitivity, different statistical thresholds had to be chosen for these two cases; the new thresholds were carefully chosen so as to present patterns of atrophy similar to the benchmark.

The comparisons shown in Table 1 were performed between modulated grey matter segment images, which have an approximate range of [0, 1], so the rmse values can be read roughly as percentages. As expected, sinc interpolation produces an image more similar to the original (1.5-mm voxel depth) than the trilinear method. Looking at the first three rows of Table 1, it can also be observed that both interpolation methods output an image which differs from the 1-mm isotropic original approximately as much as the natively acquired 1.5-mm scan does (first row). Increasing the smoothing kernel size greatly reduces any effects arising from differences in voxel dimensions. This is a crude test, but it shows that the overall error introduced by the interpolation is comparable to an actual difference in voxel dimensions.
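The smoothing levels in Table 1 correspond to isotropic Gaussian kernels specified by their FWHM in mm. A minimal sketch of one such comparison is shown below; the FWHM-to-sigma conversion is standard, while the helper name and the assumption that both segments are already resliced to a common grid and voxel size are illustrative.

```python
# Illustrative comparison of two grey matter segments after Gaussian smoothing
# with a kernel specified by its FWHM in mm (as in Table 1). Assumes the two
# segments are same-shape arrays on a common grid with voxel size `voxel_mm`.
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_rmse(seg_a, seg_b, fwhm_mm, voxel_mm):
    """rmse between two segments after isotropic smoothing at fwhm_mm."""
    # FWHM = 2 * sqrt(2 * ln 2) * sigma; convert to sigma in voxels per axis.
    sigma = [fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / v for v in voxel_mm]
    a = gaussian_filter(seg_a.astype(float), sigma)
    b = gaussian_filter(seg_b.astype(float), sigma)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# e.g. smoothed_rmse(seg_isotropic, seg_sinc, fwhm_mm=12, voxel_mm=(1.0, 1.0, 1.5))
```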

Fig. 3. Glass brains for a study of n = 14 AD cases versus controls with different kernel dimensions and group mixes. The statistical threshold is FWE = 0.05 with extent threshold k = 200 for both benchmarks, except FDR = 0.015, k = 200 for (⁎) and FDR = 0.2, k = 200 for (⁎⁎). The other conclusions hold, but the impact of degrading the controls is now clearly much greater than that of degrading the dementia subjects.

Table 2
Mean interpolation robustness (IR) for the groups under analysis

Group          IR (mean ± SD)
Controls       0.6483 ± 0.0330
SD subjects    0.6602 ± 0.0361
AD subjects    0.6927 ± 0.0463

Error graphs and glass brains

Fig. 1 illustrates the two 3D rmse graphs for the SD analysis. The two horizontal axes represent the number of subjects degraded to 1.8 mm in each group. A trough was present in the rmse plot, indicating that if the two groups were balanced (i.e., the same proportion of subjects in each group had the same voxel dimensions) the output was nearer to the benchmark, whereas the results worsened as the two groups became more unbalanced. Examples of the impact of this phenomenon on the glass-brain output are illustrated in Fig. 2. The results with the AD cohort were strikingly different (both for the glass brains, Fig. 3, and for the rmse graphs, Fig. 1), suggesting that simply balancing the proportion of subjects with specific voxel dimensions in each group is not universally applicable. One common feature of the two cohorts was that including a covariate to account for the differences in voxel dimension led to a decrease in statistical sensitivity, especially in severely unbalanced cases. However, it also had the effect of stabilising the behaviour of the AD analysis for the balanced cases, in that it produced a trough similar to the one seen for the SD analysis.

Tests were also performed to verify whether the observed results were a consequence of the specific interpolation method and voxel dimension change used. These results are not presented, but they indicated that the observed impact is not an artefact of the chosen method: both the SD and AD rmse graphs presented the same shape, similar axis values and similar glass-brain outputs to the ones discussed above. The impact of the smoothing kernel was also evaluated (see Supplementary material), with results showing a flattening of the rmse graph for larger kernels.

Finally, resampling the poorer-resolution scans (back) to 1.5 mm prior to the analysis was found to have little effect. The scaled rmse for the t-maps generated by the study in which all scans had been down-sampled to 1.8 mm was 1064.4, very similar to that obtained by analysing the down- then up-interpolated 1.5-mm SD and control scans (1048.8). The similarity of the errors was corroborated by the observed glass-brain results (not shown).

Robustness measure

In order to understand the individual effects of the resampling process for each group of scans, the interpolation robustness (IR) was computed for each group (Table 2). One-tailed t-tests showed that the IR values of the controls were not significantly different from those of the SD subjects (p = 0.1846) but were significantly different from those of the AD subjects (p = 0.0036). AD subjects were significantly more robust (i.e., had higher IR values) than SD subjects (p = 0.0244).

The correlation between rmse and IR, without smoothing, showed a trend towards significance (R2 = 0.0683, p = 0.0910). This trend became significant when smoothing was considered: at 6-mm FWHM, R2 = 0.2137 (p = 0.0021); at 12 mm, R2 = 0.2288 (p = 0.0014); and at 20 mm, R2 = 0.2166 (p = 0.0019). All correlations were negative. Examples of the linear regressions are shown in Fig. 4, without smoothing and with 12-mm smoothing.

Discussion

The impact on SPM results of varying voxel dimensions was complex, in that the outputs behaved differently according to the disease group. The measure of robustness (IR) offers a potential method to predict the consequences of mixing groups with heterogeneous voxel dimensions in situations in which the ground truth is unknown. Only down-interpolation to 1.8 mm was studied in detail, as the reverse up-interpolation back to 1.5 mm showed no advantage. Because the acquisition at 1.8 mm was simulated by down-sampling the original scans, the up-interpolated 1.5-mm scans suffered from a layering of interpolation errors. The test in which both cohorts were up-interpolated back to 1.5 mm presented a t-map (and a glass brain, not shown) very similar to the one obtained by contrasting both cohorts down-sampled at 1.8 mm, demonstrating the irreversibility of the process.

Fig. 4. IR versus grey matter segments rmse graphs for all subjects. The rmse was calculated between the segments of the original scan and the segments of the interpolated scan, before smoothing (left) and after smoothing with a 12-mm kernel (right). Linear regressions are shown, together with R2 values.


The contrast between SD and controls (Fig. 1) may at first glance suggest that balanced groups yield better performance, given the visible trough around the middle of the graph: balanced-group tests deviated less from the benchmark study, in which all scans had the original 0.86 × 0.86 × 1.5 mm3 dimensions. Closer inspection, however, reveals a bias towards a higher proportion of low-resolution scans among the diseased subjects. For example, in the analysis with 13 low-resolution SD scans, the analysis most similar to the benchmark used 7 low-resolution control scans (and thus 7 control scans at 1.5-mm slice thickness). The resulting glass brain, using the same statistical thresholds, is strikingly more similar to the benchmark than the balanced case (shown in Supplementary material, Fig. S1). This more detailed analysis confirmed that a balanced design does not always minimise the error.

The VBM analysis contrasting the AD cohort with controls showed clearly different behaviour, visible as a ramp-shaped graph (Fig. 1), illustrating that the lowest rmse values occur when very few control scans are at a low resolution (i.e., interpolated). Down-sampling the AD subjects had little effect on the form of the SPM, in marked contrast to the behaviour of the SD versus control analyses. An analysis of peak location consistency (results shown in the Supplementary material) found a similarly divergent behaviour between the AD and SD analyses. For the SD analyses, the peak displacement relative to the benchmark study was always less than the FWHM of the smoothing kernel (see Table S2); the error is thus smaller than the spatial accuracy of the analysis. The same was not true of the AD analyses (Table S3): the displacement was below the smoothing kernel except when the majority of the controls were at a lower resolution. Interestingly, adding a nuisance covariate made the location of the peak less stable for the SD analyses but more stable for the AD analyses. In fact, given that the difference in IR between AD subjects and controls makes the AD study less robust than the SD study, the nuisance covariate was more helpful for the unstable AD case. It is important to add that the IR values have to be considered in pairs for the two groups being contrasted: it is the difference in IR that defines the stability of the study, not the absolute values. Moreover, although there was a slight peak shift for the SD study, it was not detrimental to any conclusions.

The comparison between the SD and AD studies in Fig. 1 indicates that the impact of down-sampling control scans is much worse than for AD scans, and slightly worse than for SD scans: the rmse was more adversely sensitive to down-sampled control scans than to down-sampled patient scans. The IR metric addresses this phenomenon by attempting to predict how different from the original a lower-resolution scan will be: the lower the IR, the greater the difference between the original and the interpolated scan. Table 2 summarises the robustness results: AD subjects were the most robust compared with both SD subjects and controls, as they contained better image definition. The image definition of the SD subjects was slightly higher than that of the controls, but the difference was not statistically significant.
The IR points towards a pattern that is consistent with the theoretical hypothesis: the more similar the populations in terms of image definition, the more equally they react to the voxel change and therefore the more stable the final output. Fig. 4 reinforces this conclusion, especially when smoothing is applied: smoothing plays an important role in the observed correlation, as it eliminates background noise and thus emphasises the effects due to interpolation alone. Again, this is illustrated in Fig. 1 (as well as in Figs. 2 and 3): as the difference in definition between SD subjects and controls, measured by the IR, is negligible, the effects largely cancel out in the balanced (or close to balanced) scenarios and the 3D error graph has a clear trough. On the other hand, as the IR is much higher for AD subjects than for controls, the result is ramp shaped: moving along the edges of the graph on both axes, an increase in down-sampled AD subjects leads to far less error than an increase in down-sampled controls. It must be noted, though, that the IR does not account for all possible factors that can influence the SPM; it is just a predictor of stability, indicating that better scenarios occur when interpolation is performed on populations of similar IR.

Finally, when the nuisance covariate was added to the model, the results were improved for the balanced cases – as the scaled rmse graphs then show these are the best-case scenarios – although sensitivity was diminished (increased false negatives). On visual inspection, the balanced case was comparable to that obtained without inclusion of the nuisance covariate. On the other hand, the use of such a covariate entails a more conservative interpretation of the results for the unbalanced cases, which not only showed a rise in scaled error but also presented obvious visual consequences in the form of extreme false negatives. This effect is not surprising given the general linear model (GLM): the nuisance covariate becomes completely collinear with the main group effect and, as such, when the former is accounted for, the significance of the latter is affected and the output has limited use as a consequence. It therefore appears that balanced cases together with a nuisance covariate yield the best results, although a nuisance covariate will not be beneficial in some cases (as exemplified by the AD study). The IR measure offers a possible assessment of whether an analysis can proceed despite this condition not being met: if IR values are significantly different between the contrasted groups, then fewer low-resolution scans should be used for the group with the highest IR. The significance-peak analysis offered supportive evidence, in that there were greater shifts for extremely unbalanced cases in which one cohort was too dissimilar (in terms of IR) from the other.

This study showed that no universal statement can be made about mixing different voxel dimensions; rather, the outputs indicated that results depend on the image definition of each given cohort. It also showed that it may be possible to mix scans of differing resolution if scan definition is similar between the test groups. If the experiment is not properly balanced, the relative definition of the images should be considered; if the definitions are not significantly different, an analysis may still be robust. A nuisance covariate can be used if there is a reasonably balanced mix of scans with differing dimensions across the groups. If this is not the case, then a covariate, if used, must be accompanied by the understanding that sensitivity is lowered in order to maintain specificity, and conclusions must take this into consideration; conversely, not including a covariate increases sensitivity at the cost of specificity. As such, when performing a VBM analysis on populations that are mixed with respect to voxel dimensions, a set of possible VBM studies can be performed as an initial gross test: with and without balance being forced, and with and without a nuisance covariate.
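The collinearity argument above can be made concrete with a toy design matrix: when every scan in one group, and none in the other, is at the degraded resolution, the voxel-dimension covariate duplicates the group regressor up to scaling and an intercept, so the group effect is no longer estimable. The sketch below is illustrative only; the group sizes match the study but the coding of the covariate is an assumption.

```python
# Toy illustration of collinearity between the group regressor and the
# voxel-dimension nuisance covariate in a fully unbalanced design.
import numpy as np

n = 14                                        # subjects per group, as in the study
group = np.r_[np.ones(n), np.zeros(n)]        # 1 = patients, 0 = controls
intercept = np.ones(2 * n)

# Fully unbalanced: all patients degraded to 1.8 mm, all controls at 1.5 mm.
covariate_unbalanced = np.r_[np.ones(n), np.zeros(n)]
# Balanced: half of each group degraded.
covariate_balanced = np.r_[np.tile([1, 0], n // 2), np.tile([1, 0], n // 2)]

for name, cov in [("unbalanced", covariate_unbalanced),
                  ("balanced", covariate_balanced)]:
    X = np.column_stack([group, cov, intercept])
    # A rank lower than the number of columns means the group effect cannot be
    # separated from the covariate; the unbalanced design is rank deficient.
    print(name, "design rank:", np.linalg.matrix_rank(X), "of", X.shape[1])
```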
The difference between the results will point towards the stability of the study (the more similar the results, the more stable the study). Nonetheless, this is not sufficient to determine which study yields the most reliable results. As such, the following systematic workflow is proposed as a reference and can be used alone or following the initial gross test:

1. Compute the IR metric for both cohorts and check for a significant difference.


2. If the difference is not significant, interpolate the necessary scans to force balance and perform the analysis with the resulting scans, including a nuisance covariate.

3. If step 1 yields a significant difference, identify the group with the lower IR (more sensitive to interpolation) and avoid interpolating those scans.

4. Only interpolate the more robust scans to force equilibrium, if possible according to the criteria above. As an example, in the present AD group, if we start from 7 AD subjects with a slice thickness of 1.8 mm and one control subject with a slice thickness of 1.8 mm, then, since the controls have a much lower IR, they should not be interpolated to match the number of AD subjects. In such a situation – when there is a strong imbalance and the data may be adversely affected by the interpolation used to force a balance – the study should be performed both without interpolation and without a nuisance covariate.

5. If there are more non-robust scans than robust scans at a different voxel dimension, force equilibrium and perform VBM with a nuisance covariate.

The choice of voxel dimensions was motivated by real situations in the associated research centre. Further analyses were made using other interpolation targets, from 2.1 mm to 1 mm isotropic, and all conclusions were upheld. The proposed workflow is therefore valid for any dimensions; the reference dimension should nevertheless be the lowest, as the interpolation process is mathematically more reliable when down-sampling rather than up-sampling an already sampled dataset. Further work will address the relationship between the IR metric and the minimum acceptable acquisition scan dimension.

Acknowledgments

We gratefully acknowledge Professor John R. Hodges for identifying patients, as well as the participants themselves and their relatives for their continued support of our research. This research was funded by the Medical Research Council (MRC), UK. The main author would also like to acknowledge the support of his funding body, Fundação para a Ciência e a Tecnologia, Portugal.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2008.01.031.

References

Ashburner, J., Friston, K.J., 2005. Unified segmentation. NeuroImage 26, 839–851.
Dale, A.M., Fischl, B., Sereno, M.I., 1999. Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage 9, 179–194.
Fischl, B., Sereno, M.I., Dale, A.M., 1999. Cortical surface-based analysis: II. Inflation, flattening, and a surface-based coordinate system. NeuroImage 9, 195–207.
Gitelman, D.R., Ashburner, J., Friston, K.J., Tyler, L.K., Price, C.J., 2001. Voxel-based morphometry of herpes simplex encephalitis. NeuroImage 13, 623–631.
Good, C.D., Johnsrude, I.S., Ashburner, J., Henson, R.N.A., Friston, K.J., Frackowiak, R.S.J., 2001. A voxel-based morphometric study of ageing in 465 normal adult human brains. NeuroImage 14, 21–36.
Lerch, J.P., Pruessner, J.C., Zijdenbos, A., Hampel, H., Teipel, S.J., Evans, A.C., 2005. Focal decline of cortical thickness in Alzheimer's disease identified by computational neuroanatomy. Cereb. Cortex 15, 995–1001.
Mechelli, A., Price, C.J., Friston, K.J., Ashburner, J., 2005. Voxel-based morphometry of the human brain: methods and applications. Curr. Med. Imag. Rev. 1, 105–113.
Nestor, P.J., Fryer, T.D., Hodges, J.R., 2006. Declarative memory impairments in Alzheimer's disease and semantic dementia. NeuroImage 30, 1010–1020.
Oppenheim, A.V., Willsky, A.S., Nawab, S.H., 1996. Signals and Systems, 2nd ed. Prentice Hall.
Thacker, N.A., Jackson, A., Moriarty, D., Vokurka, B., 1998. Renormalised sinc interpolation. Proc. Medical Image Understanding and Analysis (MIUA), Leeds, pp. 33–36.
Williams, G.B., Nestor, P.J., Hodges, J.R., 2005. Neural correlates of semantic and behavioural deficits in frontotemporal dementia. NeuroImage 24, 1042–1051.