Multivariate analysis of MRI data for Alzheimer's disease, mild cognitive impairment and healthy controls


NeuroImage 54 (2011) 1178–1187


NeuroImage journal homepage: www.elsevier.com/locate/ynimg

Eric Westman (a,*), Andrew Simmons (b,c), Yi Zhang (a), J-Sebastian Muehlboeck (d), Catherine Tunnard (b), Yawu Liu (f), Louis Collins (e), Alan Evans (e), Patrizia Mecocci (g), Bruno Vellas (h), Magda Tsolaki (i), Iwona Kłoszewska (j), Hilkka Soininen (f), Simon Lovestone (b,c), Christian Spenger (d), Lars-Olof Wahlund (a), for the AddNeuroMed consortium

(a) Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
(b) King's College London, Institute of Psychiatry, London, UK
(c) NIHR Biomedical Research Centre for Mental Health, London, UK
(d) Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
(e) McGill University, Montreal, Canada
(f) Department of Neurology, University and University Hospital of Kuopio, Finland
(g) Institute of Gerontology and Geriatrics, University of Perugia, Perugia, Italy
(h) INSERM U 558, University of Toulouse, Toulouse, France
(i) Aristotle University of Thessaloniki, Thessaloniki, Greece
(j) Medical University of Lodz, Lodz, Poland

Article info

Article history: Received 3 June 2010; revised 6 August 2010; accepted 19 August 2010; available online 25 August 2010.

Keywords: MRI; Orthogonal partial least squares; Alzheimer's disease; Mild cognitive impairment; MCI

Abstract

We used multivariate data analysis, more specifically orthogonal partial least squares to latent structures (OPLS) analysis, to discriminate between Alzheimer's disease (AD), mild cognitive impairment (MCI) and elderly control subjects by combining both regional and global magnetic resonance imaging (MRI) volumetric measures. In this study, 117 AD patients, 122 MCI patients and 112 control subjects (from the AddNeuroMed study) were included. High-resolution sagittal 3D MP-RAGE datasets were acquired from each subject. Automated regional segmentation and manual outlining of the hippocampus were performed for each image. Altogether this yielded volumes of 24 different anatomically defined structures, which were used for OPLS analysis. Seventeen randomly selected AD patients, 12 randomly selected control subjects and the 22 MCI subjects who converted to AD at 1-year follow-up were excluded from the initial OPLS analysis to provide a small external test set for model validation. Comparing AD with controls, we found a sensitivity of 87% and a specificity of 90% using hippocampal measures alone. Combining both global and regional measures resulted in a sensitivity of 90% and a specificity of 94%. This increase in sensitivity and specificity raised the positive likelihood ratio from 9 to 15. In the external test set, the model correctly predicted 82% of the AD patients and 83% of the control subjects. Finally, 73% of the MCI subjects who converted to AD at 1-year follow-up were shown to resemble AD patients more closely than controls. This method shows potential for distinguishing between different patient groups. Combining the different MRI measures resulted in a significantly better classification than using them separately. OPLS also shows potential for predicting conversion from MCI to AD.

© 2010 Elsevier Inc. All rights reserved.

* Corresponding author. Karolinska Universitetssjukhuset, Novum, Plan 4, 141 86 Stockholm, Sweden. Fax: +46 8 517 761 11. E-mail address: [email protected] (E. Westman).

doi:10.1016/j.neuroimage.2010.08.044

Introduction

Multivariate data analysis makes it possible to analyze many variables simultaneously and to observe inherent patterns in the data. In this way, groups can be separated, the factors that cause the separation can be identified, and predictive models of disease can be built. Methods such as principal component analysis (PCA) (Pearson, 1901), partial least squares to latent structures (PLS) (Walsh et al., 2000; Wold et al., 1984, 2001) and orthogonal PLS (OPLS) (Johan Trygg, 2002) are efficient, robust and validated tools for modelling complex biological data (Wiklund et al., 2008).

Alzheimer's disease (AD) is one of the most common neurodegenerative disorders and is associated with gradual loss of cognitive functions such as episodic memory. The disease is related to pathological amyloid deposition and hyperphosphorylation of structural proteins in the brain, which lead to progressive loss of function, metabolic alterations and structural changes in the brain. The majority of prevalent AD cases are sporadic, with a small percentage of familial cases linked to specific gene mutations (Selkoe, 2001). Detecting AD in the predementia stage, often referred to as mild cognitive impairment (MCI), is of great importance to ensure that therapies are targeted at the correct patient population. The NINCDS–ADRDA criteria (McKhann et al., 1984) are still the standard for diagnosing AD, although Dubois and coworkers have proposed a revision of these criteria (Dubois et al., 2007). The revised criteria remain centred on a clinical core of early and significant episodic memory impairment, but also include at least one, and preferably more than one, abnormal biomarker from MRI, PET or CSF.

Magnetic resonance imaging (MRI) has been widely used for early detection and diagnosis of AD (O'Brien, 2007; Ries et al., 2008). The staging of AD atrophy has been described by Braak and Braak (1991): atrophy typically starts in the medial temporal and limbic areas, subsequently spreading to parietal association areas and finally to frontal and primary cortices. Early changes in the hippocampus and entorhinal cortex have been demonstrated with MRI, and these changes are consistent with the underlying pathology of AD, but it is not yet clear which structures are most useful for early diagnosis (O'Brien, 2007). Many studies have used manual delineation (segmentation) of the hippocampus in MR images (Fox et al., 1996; Jack et al., 1992, 1997; Juottonen et al., 1999; Killiany et al., 1993; Laakso et al., 1998; Lehericy et al., 1994), and these studies have demonstrated high accuracy in distinguishing between AD patients and healthy controls. Automatic methods of measuring hippocampal volumes have also been applied, with similar results (Colliot et al., 2008; Morra et al., 2008). Entorhinal cortex measures have additionally been used to discriminate between subjects with AD and controls (Jack et al., 1997; Juottonen et al., 1999; Xu et al., 2000).
Hippocampal volumes and entorhinal cortex measures have been found to be equally accurate in distinguishing between AD and cognitively normal elderly subjects (Kantarci, 2005). It is likely, however, that a single structure such as the hippocampus or entorhinal cortex is not sufficient to distinguish between subjects with AD, those with MCI and controls, and that a combination of different structures may prove more useful for early detection of MCI and AD. Efficient multivariate tools are therefore needed to analyze the complex information obtained from different regions of the brain. Previous studies have used techniques such as principal component analysis (PCA), linear discriminant analysis (LDA) and support vector machines (SVM) to analyze both MRI data (Fan et al., 2008, 2005; Kloppel et al., 2008a,b; McEvoy et al., 2009; Plant et al., 2009; Vemuri et al., 2008; Westman et al., 2009, 2007) and single photon emission computed tomography (SPECT) data (Chaves et al., 2009; López et al., 2009). In this study, we used orthogonal partial least squares to latent structures (OPLS) for multivariate data analysis. Our aim was to combine manual hippocampal volume measurements with automated regional and global volume measures to distinguish between subjects with early AD, MCI and controls. We also aimed to investigate whether the combination of global and regional measures would classify the three groups more accurately than hippocampal measures alone. Our hypothesis was that the combination of all MRI measures would more accurately distinguish between patients with early Alzheimer's disease, subjects with mild cognitive impairment and healthy controls. Finally, we wanted to test how well the model would predict conversion from MCI to AD, by assessing whether MCI subjects would be classified as more similar to AD patients or to controls.
Methods

Study data and inclusion and diagnostic criteria

All subjects originated from the AddNeuroMed project, part of InnoMed (Innovative Medicines in Europe), a European Union program designed to make drug discovery more efficient. The project is designed to develop and validate novel surrogate markers in Alzheimer's disease (AD) and includes a human neuroimaging strand (Simmons et al., 2009, 2010) that combines magnetic resonance imaging (MRI) data with other biomarkers and clinical data. Data were collected from six different sites across Europe: University of Kuopio, Finland; University of Perugia, Italy; Aristotle University of Thessaloniki, Greece; King's College London, United Kingdom; University of Lodz, Poland; and University of Toulouse, France. MR images from a total of 351 subjects were included in this study: 117 AD patients, 122 MCI patients and 112 healthy age-matched controls. Twenty-two of the MCI subjects had converted to AD at 1-year follow-up. For OPLS modelling we wanted equally sized groups, and we also needed a test set to validate the models created. Therefore, the 22 MCI subjects who converted by 1-year follow-up, along with 17 AD subjects and 12 controls, were removed from the original data set, leaving 100 subjects in each training group. We hypothesised that the MCI subjects who converted to AD would have MRI scans that were more "AD-like" than "control-like" at baseline. Table 1 gives the demographics of the study cohort for both the training set and the test set. All AD and MCI subjects were recruited from the local memory clinics of one of the six participating sites, while the control subjects were recruited from non-related members of patients' families, caregivers' relatives or social centres for the elderly. Informed consent was obtained where the research participant had capacity; where dementia compromised capacity, assent from the patient and consent from a relative were obtained according to local law and process. This study was approved by ethical review boards in each participating country. The inclusion and exclusion criteria were as follows.

Alzheimer's disease
Inclusion criteria: (1) NINCDS–ADRDA and DSM-IV criteria for probable Alzheimer's disease. (2) Mini Mental State Examination score between 12 and 28. (3) Age 65 years or above.
Exclusion criteria: (1) Significant neurological or psychiatric illness other than Alzheimer's disease. (2) Significant unstable systemic illness or organ failure.

Mild cognitive impairment and controls
Inclusion criteria: (1) Mini Mental State Examination score between 24 and 30. (2) Geriatric Depression Scale score less than or equal to 5. (3) Age 65 years or above. (4) Stable medication. (5) Good general health.
Exclusion criteria: (1) Meeting the DSM-IV criteria for dementia. (2) Significant neurological or psychiatric illness other than Alzheimer's disease. (3) Significant unstable systemic illness or organ failure.

The distinction between MCI and controls was based on the Clinical Dementia Rating (CDR) scale: subjects scoring 0 were classified as controls and subjects scoring 0.5 as MCI. For the MCI subjects, it was preferable that both the subject and an informant reported the occurrence of memory problems. All AD subjects had a CDR score of 0.5 or above.

The CDR, the Mini Mental State Examination and the CERAD Cognitive Battery were assessed for each subject, except that the CERAD Cognitive Battery was replaced with the Alzheimer's Disease Assessment Scale (ADAS-Cog) for the AD subjects. ADAS-Cog is a cognitive test battery specially designed for AD trials (Rosen et al., 1984). Both the ADAS-Cog and the CERAD battery use the same 10-word recall task; the only difference is that the scoring is inverted. To obtain comparable measures between groups, the mean number of words not recalled in the CERAD word list immediate recall task was calculated. The resulting variable was named ADAS1, corresponding to the first subtest of ADAS-Cog.

MRI
Data acquisition took place using six different 1.5 T MR systems (4 General Electric, 1 Siemens and 1 Picker) and was designed according to the Alzheimer's Disease Neuroimaging Initiative (ADNI) protocol (Jack et al., 2008). At each site a quadrature birdcage coil was used for RF transmission and reception. Following a three-plane localizer, a high-resolution T1-weighted sagittal 3D MP-RAGE volume and a dual-echo fast spin echo dataset providing proton density and T2-weighted images were acquired. Full brain and skull coverage was required for both of the latter datasets, and detailed quality control was carried out on all MR images (Simmons et al., 2010).

Table 1. Subject characteristics.

Characteristic | AD (training) | MCI (training) | CTL (training) | AD (test) | MCI* (test) | CTL (test)
Number | 100 | 100 | 100 | 17 | 22 | 12
Gender (female/male) | 65/35 | 53/47 | 55/45 | 9/8 | 8/14 | 7/5
Age (years) | 76 ± 6 | 75 ± 5 | 73 ± 6 | 74 ± 7 | 73 ± 6 | 74 ± 9
MMSE | 21 ± 5 | 27 ± 3 | 29 ± 1 | 20 ± 4 | 26 ± 2 | 28 ± 2
ADAS1 | 7 ± 1 | 5 ± 1 | 4 ± 2 | 6 ± 1 | 5 ± 1 | 4 ± 1
CDR (a) | 1.2 ± 0.5 | 0.5 | 0 | 1.1 ± 0.4 | 0.5 | 0
Years of education | 8 ± 4 | 9 ± 4 | 11 ± 5 | 7 ± 3 | 9 ± 4 | 12 ± 3

Data are mean ± standard deviation. AD = Alzheimer's disease, MCI = mild cognitive impairment, CTL = healthy control, MMSE = Mini Mental State Examination, ADAS1 = word-list non-learning (mean number of words not recalled), CDR = Clinical Dementia Rating. (a) Number of AD subjects with CDR 0.5 = 12, CDR 1 = 79, CDR 2 = 26. * MCI subjects who converted to AD at 1-year follow-up.
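The ADAS1 variable described above inverts the CERAD word-list scoring so that it follows the ADAS-Cog convention of counting words not recalled. A minimal sketch, assuming the number of words recalled is available per immediate-recall trial (the CERAD task is assumed here to have three trials); the function name `adas1_from_cerad` is hypothetical and not part of any study software:

```python
from statistics import mean

WORD_LIST_LENGTH = 10  # both batteries use the same 10-word list


def adas1_from_cerad(recalled_per_trial):
    """Convert CERAD immediate-recall scores (words recalled per trial)
    into an ADAS1-style score: the mean number of words NOT recalled.

    Hypothetical helper; assumes one integer count per trial.
    """
    if not recalled_per_trial:
        raise ValueError("need at least one trial")
    for recalled in recalled_per_trial:
        if not 0 <= recalled <= WORD_LIST_LENGTH:
            raise ValueError(f"invalid recall count: {recalled}")
    # Invert the scoring: words not recalled = list length - words recalled.
    return mean(WORD_LIST_LENGTH - r for r in recalled_per_trial)


# Made-up example: 5, 7 and 8 words recalled, i.e. 5, 3 and 2 words missed.
print(adas1_from_cerad([5, 7, 8]))  # about 3.33
```

Averaging over trials keeps ADAS1 on the same 0 to 10 scale as a single trial, so a higher value means poorer recall, as in Table 1.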

Tissue classification and regional volume segmentation

We used a pipeline developed at the Montreal Neurological Institute consisting of image intensity non-uniformity correction, segmentation of brain tissue and regional brain parcellation (Fig. 1). Initially, data were corrected for intensity non-uniformity using the N3 algorithm (Sled et al., 1998). The images were subsequently segmented into gray matter, white matter, CSF and lesions using an artificial neural network classifier termed INSECT (Intensity-Normalized Stereotaxic Environment for Classification of Tissues) (Cocosco et al., 2003; Zijdenbos et al., 2002). Regional parcellation of the brain was then achieved using the multi-scale ANIMAL technique (Automated Non-linear Image Matching and Anatomical Labeling), which deforms the T1-weighted MP-RAGE volume to match a previously labelled MRI volume. Anatomical labels are defined in the new volume by interpolation from the original labels via a 3D deformation field (Collins et al., 1994). Altogether, the pipeline produced 39 different anatomical regional volumes. Where left and right hemispheric volumes were produced (such as left and right putamen), these were summed to produce a single volume. This resulted in 23 different automatically generated anatomical regions, which are detailed in Table 2. All volumetric measures from each subject were normalized by the subject's intracranial volume.

Fig. 1. Tissue classification and segmentation. (A) Original T1-weighted MR volume image. (B) Image classification into gray matter, white matter and CSF. (C) Automated regional segmentation.

Manual segmentation of hippocampus

Manual measurements of hippocampal volume were performed on a HERMES workstation (Nuclear Diagnostics, Stockholm, Sweden). Each measurement was performed with constant parameters by a neuroradiologist who was blinded to clinical information. An ROI tool within the HERMES Multimodality software package was used to manually delineate the hippocampal formation using previously defined anatomical landmarks. Intra-rater reliability was tested in 15 randomly selected subjects by repeating the measurements after an interval of 1 month. The intraclass correlation coefficients (ICC) of the measurements were > 0.93. The total hippocampal volume of each subject was normalized by the subject's intracranial volume.

Table 2. Regional volumes (cm³, mean ± standard deviation).

Region | AD (training) | MCI (training) | CTL (training) | AD (test) | MCI converters (test) | CTL (test)
Hippocampus | 3.2 ± 0.6*+ | 3.9 ± 0.6* | 4.6 ± 0.5 | 3.2 ± 0.6 | 3.4 ± 0.7 | 4.4 ± 0.5
Total CSF | 179.6 ± 37.4*+ | 150.4 ± 32.1 | 142.1 ± 31.3 | 188.4 ± 34.7 | 164.4 ± 38.0 | 136.7 ± 35.3
Total GM | 459.9 ± 24.5*+ | 477.6 ± 23.2 | 486.5 ± 20.8 | 450.9 ± 30.1 | 469.0 ± 25.6 | 482.6 ± 22.9
Total WM | 360.5 ± 32.8 | 372.0 ± 26.5 | 371.3 ± 27.5 | 360.8 ± 27.3 | 366.6 ± 28.2 | 380.8 ± 27.6
3rd ventricle | 3.2 ± 0.7*+ | 2.6 ± 0.6 | 2.5 ± 0.7 | 3.0 ± 0.5 | 3.0 ± 1.2 | 2.5 ± 0.9
4th ventricle | 1.9 ± 0.6 | 1.8 ± 0.7 | 1.9 ± 0.7 | 1.7 ± 0.6 | 1.6 ± 0.5 | 1.6 ± 0.4
Brainstem | 24.4 ± 2.3 | 24.5 ± 2.0 | 24.7 ± 1.9 | 25.0 ± 2.0 | 24.6 ± 1.7 | 24.7 ± 1.9
Caudate | 9.3 ± 1.2 | 9.2 ± 1.3 | 8.9 ± 1.1 | 8.9 ± 1.2 | 8.3 ± 0.9 | 8.8 ± 0.9
Cerebellum | 94.1 ± 7.8 | 93.2 ± 8.8 | 93.4 ± 7.3 | 97.2 ± 10.9 | 93.2 ± 6.5 | 92.8 ± 7.7
Extracerebral CSF | 136.0 ± 28.7*+ | 116.0 ± 24.6 | 111.0 ± 22.3 | 140.7 ± 23.6 | 128.4 ± 29.3 | 107.7 ± 27.2
Fornix | 0.6 ± 0.1*+ | 0.6 ± 0.1* | 0.7 ± 0.1 | 0.6 ± 0.1 | 0.6 ± 0.2 | 0.7 ± 0.2
Frontal GM | 146.6 ± 8.9* | 150.1 ± 8.9 | 152.5 ± 8.1 | 150.6 ± 35.2 | 152.2 ± 10.8 | 153.5 ± 7.3
Frontal WM | 131.4 ± 14.7 | 134.0 ± 11.5 | 134.6 ± 12.0 | 129.9 ± 11.1 | 134.4 ± 12.0 | 138.3 ± 11.8
Globus pallidus | 1.5 ± 0.2 | 1.5 ± 0.2 | 1.5 ± 0.2 | 1.5 ± 0.3 | 1.4 ± 0.2 | 1.5 ± 0.2
Lateral ventricle | 38.5 ± 15.3*+ | 30.0 ± 13.2 | 26.8 ± 13.4 | 42.9 ± 17.7 | 31.4 ± 10.7 | 24.9 ± 10.5
Occipital GM | 40.2 ± 4.9* | 42.0 ± 3.5 | 42.7 ± 4.2 | 39.0 ± 5.6 | 40.8 ± 4.8 | 41.4 ± 3.9
Occipital WM | 29.7 ± 4.0 | 31.1 ± 3.6 | 30.9 ± 3.5 | 29.1 ± 4.6 | 29.2 ± 3.8 | 31.0 ± 2.4
Parietal GM | 71.6 ± 5.3*+ | 75.3 ± 6.1 | 76.0 ± 5.2 | 69.0 ± 8.1 | 74.4 ± 5.3 | 74.8 ± 4.4
Parietal WM | 73.3 ± 8.4 | 76.4 ± 7.4 | 75.8 ± 7.2 | 72.2 ± 8.0 | 75.2 ± 6.3 | 77.4 ± 8.7
Putamen | 7.3 ± 1.0 | 7.4 ± 1.0 | 7.3 ± 0.7 | 7.1 ± 0.9 | 6.8 ± 0.7 | 7.2 ± 0.9
Subthalamic nucleus | 0.1 ± 0.01 | 0.1 ± 0.01 | 0.1 ± 0.01 | 0.1 ± 0.01 | 0.1 ± 0.01 | 0.1 ± 0.01
Temporal GM | 99.3 ± 9.2*+ | 107.9 ± 7.3* | 112.6 ± 6.4 | 97.7 ± 10.0 | 102.4 ± 9.2 | 111.6 ± 6.9
Temporal WM | 61.6 ± 6.2*+ | 65.9 ± 6.0 | 65.5 ± 5.3 | 62.1 ± 5.9 | 62.5 ± 6.2 | 68.3 ± 5.4
Thalamus | 10.0 ± 1.0* | 10.4 ± 0.9 | 10.5 ± 0.7 | 9.8 ± 1.0 | 10.1 ± 1.0 | 10.7 ± 0.6

AD = Alzheimer's disease, MCI = mild cognitive impairment, CTL = healthy control, CSF = cerebrospinal fluid, GM = gray matter, WM = white matter. Volumes are given in cm³. Group comparisons for individual regions were performed using the two-sample t-test with Bonferroni correction (training set only); significance level p < 0.05. * = significant compared with CTL; + = significant compared with MCI.

Multivariate data analysis

MRI measures were analyzed using orthogonal partial least squares to latent structures (OPLS) (Johan Trygg, 2002; Wiklund et al., 2008), a supervised multivariate data analysis method included in the software package SIMCA (Umetrics AB, Umeå, Sweden). The advantage of OPLS over PLS is that the model created to compare groups is rotated, so that the information related to class separation is found in the first component of the model, the predictive component. The other, orthogonal components in the model, if any, relate to variation in the data not connected to class separation. Focusing the information related to class separation on the first component makes data interpretation easier (Wiklund et al., 2008).

Pre-processing was performed using mean centring and unit variance scaling. Mean centring improves the interpretability of the data by subtracting the variable average from the data, which repositions the data set around the origin. Because variables with large variance are more likely to dominate the model than variables with low variance, unit variance scaling was applied: the standard deviation of each variable is calculated, and its inverse is used as the scaling weight for that MR measure, so that every measure has unit variance.

The results from the OPLS analysis are visualized in a scatter plot of the predictive component, which contains the information related to class separation. Components are vectors formed as linear combinations of the input variables (x). The first and second components are by definition orthogonal to each other and span the projection plane of the points. Each point in the scatter plot represents one individual subject. The predictive component receives a Q²(Y) value that describes its statistical significance for separating groups. Q²(Y) values > 0.05 are regarded as statistically significant, and a model with Q²(Y) > 0.5 is regarded as good (Eriksson et al., 2006).
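The mean-centring and unit-variance scaling step can be sketched as follows. This is a minimal pure-Python illustration, not SIMCA's implementation; it assumes the data are a list of subjects (rows) by MRI measures (columns) and uses the sample standard deviation (N − 1 denominator), which is an assumption since the text only says "standard deviation". The function name is hypothetical.

```python
from statistics import mean, stdev


def mean_centre_uv_scale(X):
    """Mean-centre each column of X and scale it to unit variance.

    X is a list of rows (subjects) of equal length (MRI measures).
    Column j becomes (x - mean_j) * w_j with w_j = 1 / sd_j, i.e. the
    inverse standard deviation is the scaling weight.
    Assumption: sample standard deviation (N - 1 denominator).
    """
    n_cols = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_cols)]
    means = [mean(col) for col in columns]
    weights = []
    for col in columns:
        sd = stdev(col)
        if sd == 0:
            raise ValueError("a constant column cannot be scaled to unit variance")
        weights.append(1.0 / sd)
    return [[(row[j] - means[j]) * weights[j] for j in range(n_cols)] for row in X]


# Made-up values (not study data): a small hippocampus-like volume and a
# large CSF-like volume end up on the same scale after scaling.
X = [[3.2, 180.0], [3.9, 150.0], [4.6, 140.0]]
for row in mean_centre_uv_scale(X):
    print([round(v, 2) for v in row])
```

Without the scaling step, a measure such as total CSF (tens of cm³ of spread) would dominate a measure such as the hippocampus (under 1 cm³ of spread) purely because of its units.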

The predictive ability of the model is defined as

$$Q^2(Y) = 1 - \frac{\mathrm{PRESS}}{\mathrm{SSY}}, \qquad \mathrm{PRESS} = \sum \left(y_{\text{actual}} - y_{\text{predicted}}\right)^2,$$

where PRESS is the predictive residual sum of squares and SSY is the total variation of the Y matrix after scaling and mean centring (Eriksson et al., 2006). Q²(Y) is the fraction of the total variation of the Ys (expected class values) that can be predicted by a component according to cross-validation (CV). Cross-validation validates a predictive model by building a number of parallel models, each of which leaves out a different part of the data set; the omitted data are then predicted by the respective model. In this study we used seven-fold cross-validation, meaning that one-seventh of the data is omitted in each cross-validation round, and each observation is omitted once and only once.

Variables were plotted according to their importance for the separation of groups. The plot shows the MRI measures and their corresponding jack-knifed confidence intervals. Jack-knifing is used to estimate the bias and standard error. Measures with confidence intervals that include zero have low reliability (Wiklund et al., 2008). Covariance is plotted on the y-axis:

$$\mathrm{Cov}(t, X_i) = \frac{t^{\mathsf{T}} X_i}{N - 1},$$

where tᵀ is the transpose of the score vector t in the OPLS model, X_i is the centred variable i in the data matrix X, and N is the number of observations (subjects) (Wiklund et al., 2008). A measure with high covariance is more likely to have an impact on group separation than a measure with low covariance.

Altogether, 24 variables were used for OPLS analysis (23 automated measures and the manual hippocampal measure). The groups AD vs. controls, AD vs. MCI and MCI vs. controls were compared. Three different models were created for each group comparison: (1) hippocampal measures, (2) the automated volumetric measures and (3) a hierarchical model combining models (1) and (2). Models were also created with age and education as input variables alongside all the volumetric measurements, to test whether these measures had any significant influence on, or would increase, the predictive power of the models. Finally, a model for each group comparison was created combining all the original variables, to visualize their importance for the separation of groups. A model containing all three classes was not created, since this would complicate visualization and biological interpretation: when a model is calculated using all classes simultaneously, a reference point is created that is a mixture of all class data, and the graphs and plots are anchored around this artificial point (Wiklund et al., 2008).

Sensitivity, specificity, positive likelihood ratios (LR+ = sensitivity/(100 − specificity), with both expressed in percent) and their corresponding confidence intervals (Frost and Kallis, 2009) were calculated for each of the models using the cross-validated prediction values. The prediction value for a subject belonging to a group is equal to 1 for maximum likelihood and 0 for minimum likelihood. The cut-off value was set halfway between the maximum and minimum likelihoods, at 0.5, for all models. A likelihood ratio between 5 and 10 increases the diagnostic value moderately, while a value above 10 substantially increases the diagnostic value of the test (Qizilbash et al., 2002). All the individual variables for the OPLS training set were also tested using the two-sample t-test to allow direct comparison with the multivariate approach. The p-values were adjusted for multiple comparisons with Bonferroni correction, with a significance threshold of p < 0.05. To validate the model, a test set including 17 AD subjects, 12 controls and 22 MCI subjects was tested on the AD vs. controls model. The AD subjects and the control subjects were randomly selected, with the only requirement being an even distribution across participating sites. All MCI subjects in the test set converted to AD at 1-year follow-up. We hypothesised that those MCI subjects who converted to AD at follow-up would have MRI scans that were more "AD-like" than "control-like" at baseline.

Results

Subject cohort

A total of 351 subjects were included in this study: 117 AD patients, 122 MCI patients and 112 controls. Using age and education as x-variables in the OPLS models did not have any effect on the predictive power of the models separating the groups when all image variables were included. Therefore, age and education were excluded from further analysis. All MRI volumetric measures were normalised by dividing by each subject's intracranial volume. As expected, performance on the MMSE, CDR and ADAS1 was poorest among AD patients and best among controls. The MCI group had scores between those of the AD and control groups.

OPLS modelling and quality

Three models were created using (1) manual measures of hippocampal volume, (2) automated global and regional volume measures and (3) the combination of (1) and (2) for AD vs. controls, AD vs. MCI and MCI vs. controls. The AD vs. controls model using the combination of both manual and automated volumes accounted for 86% of the variance of the original data (R²(X)), and its cross-validated predictability was Q²(Y) = 64%. The second model (AD vs. MCI) resulted in one predictive component with R²(X) = 80% and cross-validated predictability Q²(Y) = 35%. Finally, the comparison between MCI and controls yielded one predictive component with R²(X) = 73% and cross-validated predictability Q²(Y) = 22%.

Cross-validated scatter plots

The separation between groups and the predictive power Q²(Y) of the models can be seen in Figs. 2A–4A. Fig. 2A shows a distinct separation between AD and controls. This model, which combines manual outlining of the hippocampus with regional and global automated volume measures, resulted in a sensitivity of 90% and a specificity of 94%. As can be observed in Fig. 2A, 10 of the 100 AD subjects and 6 of the 100 control subjects were misclassified. Individually, manual hippocampal measures yielded a sensitivity of 87% and a specificity of 90%, and the automated volume measures gave a sensitivity of 81% and a specificity of 82% (Table 3 shows the sensitivity, specificity and likelihood ratio for the different MRI measures and group comparisons). The best predictive result was obtained when combining all of the measures.
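The Q²(Y) values reported for the three models are cross-validated: every observation is predicted by a model fitted without it, and Q²(Y) = 1 − PRESS/SSY. The sketch below computes Q² with seven-fold cross-validation. As an assumption for illustration, it uses a one-variable least-squares line as a stand-in model instead of OPLS (SIMCA's implementation is not reproduced) and assigns observation i to round i mod 7; all function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_validated_predictions(xs, ys, k=7):
    """Predict every observation from a model fitted without it.

    Observation i is left out in round i % k, so each observation is
    omitted once and only once across the k rounds.
    """
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        if not held_out:
            continue
        train = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = intercept + slope * xs[i]
    return preds


def q2(ys, preds):
    """Q2 = 1 - PRESS / SSY, with SSY the variation of y about its mean."""
    y_mean = mean(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ssy = sum((y - y_mean) ** 2 for y in ys)
    return 1 - press / ssy


# Made-up toy data: a noiseless linear relation gives Q2 close to 1.
xs = list(range(14))
ys = [2 * x + 1 for x in xs]
print(q2(ys, cross_validated_predictions(xs, ys)))
```

In a two-class discriminant setting, y would be the class value (1 for one group, 0 for the other). Predicting every subject by its group mean gives Q² = 0, so the reported 0.64, 0.35 and 0.22 measure how far each model improves on that baseline.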

[Fig. 2 panels: (A) cross-validated scatter plot of the predictive component (tcv) for AD vs. CTL, Q²(Y) = 0.64, with the classification cut-off marked; (B) cov(tcv, X) for each MRI measure, sorted by covariance from Total CSF to Hippocampus.]

Fig. 2. OPLS cross-validated scatter plot and MRI measures of importance for separation between the AD and control groups. (A) The scatter plot visualizes group separation and the predictability of the AD vs. control model. Each red square represents an AD subject and each black circle a control subject. Control subjects below zero and AD subjects above zero are falsely predicted. (B) MRI measures of importance for the separation of AD and controls. MRI measures above zero have a larger value in controls than in AD, and MRI measures below zero have a lower value in controls than in AD. An MRI measure with high covariance is more likely to have an impact on group separation than one with low covariance. MRI measures with jack-knifed confidence intervals that include zero have low reliability.
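The covariance values in panel (B) follow cov(t, X_i) = tᵀX_i/(N − 1), computed over the N subjects for each mean-centred MRI measure. A minimal sketch, assuming the predictive score vector t has already been obtained from a fitted OPLS model (it is simply passed in here); the function name is hypothetical, and the sign of each value depends on the orientation of the score vector.

```python
def covariance_loadings(t, X):
    """Return cov(t, X_i) = t^T X_i / (N - 1) for every column i of X.

    t: predictive score vector, one value per subject (N subjects).
    X: list of N rows; each column is one MRI measure. Columns are
       mean-centred here, as the formula assumes centred variables.
    """
    n = len(t)
    if n < 2 or len(X) != n:
        raise ValueError("need at least two subjects and one score per row")
    loadings = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]
        col_mean = sum(col) / n
        centred = [v - col_mean for v in col]
        loadings.append(sum(tk * xk for tk, xk in zip(t, centred)) / (n - 1))
    return loadings


# Made-up scores and measures (not study data); rank measures by |covariance|.
names = ["measure_a", "measure_b"]  # hypothetical labels
cov = covariance_loadings([1, -1, 2, -2], [[1, 0], [-1, 0], [2, 1], [-2, -1]])
ranking = sorted(zip(names, cov), key=lambda pair: abs(pair[1]), reverse=True)
print(ranking)
```

Sorting by absolute covariance reproduces the idea behind panel (B): measures far from zero in either direction contribute most to the group separation.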


[Fig. 3 panels: (A) cross-validated scatter plot of the predictive component (tcv) for AD vs. MCI, Q²(Y) = 0.35, with the classification cut-off marked; (B) cov(tcv, X) for each MRI measure, sorted by covariance from Total CSF to Temporal GM.]

Fig. 3. OPLS cross-validated scatter plot and MRI measures of importance for separation between the AD and MCI groups. (A) The scatter plot visualizes group separation and the predictability of the AD vs. MCI model. Each red square represents an AD subject and each grey circle an MCI subject. MCI subjects below zero and AD subjects above zero are falsely predicted. (B) MRI measures of importance for the separation of AD and MCI. MRI measures above zero have a larger value in MCI than in AD, and MRI measures below zero have a lower value in MCI than in AD. An MRI measure with high covariance is more likely to have an impact on group separation than one with low covariance. MRI measures with jack-knifed confidence intervals that include zero have low reliability.

[Fig. 4 panels: (A) cross-validated scatter plot of the predictive component (tcv) for CTL vs. MCI, Q²(Y) = 0.22, with the classification cut-off marked; (B) cov(tcv, X) for each MRI measure, sorted by covariance from 3rd ventricle to Temporal GM.]

Fig. 4. OPLS cross-validated scatter plot and MRI measures of importance for separation between the MCI and control groups. (A) The scatter plot visualizes group separation and the predictability of the MCI vs. control model. Each grey circle represents an MCI subject and each black square a control subject. Control subjects above zero and MCI subjects below zero are falsely predicted. (B) MRI measures of importance for the separation of MCI and controls. MRI measures above zero have a larger value in controls than in MCI, and MRI measures below zero have a lower value in controls than in MCI. An MRI measure with high covariance is more likely to have an impact on group separation than one with low covariance. MRI measures with jack-knifed confidence intervals that include zero have low reliability.


Table 3
Sensitivity/specificity and likelihood ratio for different MRI measures (training set).

Comparison | MRI measures | Sen. (CI) | Spec. (CI) | LR+ (CI)
AD vs. CTL | Combined | 90% (83–94) | 94% (88–97) | 15 (7–33)
AD vs. CTL | Hippocampus | 87% (79–92) | 90% (83–94) | 9 (5–16)
AD vs. CTL | MNI volumes | 81% (72–88) | 82% (73–88) | 5 (3–7)
AD vs. MCI | Combined | 75% (66–82) | 79% (70–97) | 4 (2–5)
AD vs. MCI | Hippocampus | 71% (61–79) | 73% (64–81) | 3 (2–4)
AD vs. MCI | MNI volumes | 73% (64–81) | 70% (60–78) | 2 (2–3)
MCI vs. CTL | Combined | 66% (56–75) | 73% (64–81) | 2 (2–3)
MCI vs. CTL | Hippocampus | 69% (59–77) | 73% (64–81) | 3 (2–4)
MCI vs. CTL | MNI volumes | 58% (48–67) | 59% (49–68) | 1 (1–2)

AD = Alzheimer's disease, MCI = mild cognitive impairment, CTL = healthy control, Sen. = sensitivity, Spec. = specificity, LR+ = positive likelihood ratio and CI = 95% confidence interval.
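The positive likelihood ratios in Table 3 follow from LR+ = sensitivity / (1 - specificity). The sketch below recomputes the first two AD vs. CTL rows with a 95% interval. Two assumptions are ours, not the paper's: each training group is taken to hold 100 subjects (117 - 17 AD patients and 112 - 12 controls, from the numbers in the abstract), and the interval uses the common log method; the paper does not say which method it used, although this one gives the tabulated intervals for these two rows. Because the tabulated sensitivities and specificities are rounded, LR+ recomputed from them can differ slightly from the reported value (for the MNI volumes, 0.81 / 0.18 = 4.5, reported as 5).

```python
import math


def lr_plus_with_ci(true_pos, n_diseased, false_pos, n_healthy, z=1.96):
    """Positive likelihood ratio with a log-method confidence interval.

    true_pos:  diseased subjects classified as diseased
    false_pos: healthy subjects classified as diseased
    """
    sensitivity = true_pos / n_diseased
    false_positive_rate = false_pos / n_healthy  # equals 1 - specificity
    lr = sensitivity / false_positive_rate
    se_log = math.sqrt(1 / true_pos - 1 / n_diseased + 1 / false_pos - 1 / n_healthy)
    low = math.exp(math.log(lr) - z * se_log)
    high = math.exp(math.log(lr) + z * se_log)
    return lr, (low, high)


# AD vs. CTL rows; 100 subjects per training group is an assumption (see above).
for label, tp, tn in [("Combined", 90, 94), ("Hippocampus", 87, 90)]:
    lr, (low, high) = lr_plus_with_ci(tp, 100, 100 - tn, 100)
    print(f"{label}: LR+ = {lr:.1f} ({low:.0f}-{high:.0f})")
```

The wide interval for the combined model (roughly 7 to 33) comes mostly from the small number of false positives: with only 6 misclassified controls, the 1/false_pos term dominates the standard error.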

The separation between AD and MCI can be seen in Fig. 3A; it is somewhat less distinct than that between AD and controls. The best separation between AD and MCI was again obtained by combining all MRI measures, which gave a sensitivity of 75% and a specificity of 79%. Of the two groups of MRI measures, the manual hippocampal volumes gave almost the same predictive result as the automated volumetric measures. Fig. 4A illustrates the separation between MCI and controls. Combining all measures gave a sensitivity of 66% and a specificity of 73%. In this case hippocampal volume alone gave a slightly higher sensitivity (69%) with the same specificity (73%).

Variables of importance for separation between groups
Figs. 2B–4B illustrate the variables in the models comparing AD vs. controls, AD vs. MCI and MCI vs. controls, respectively. Table 2 gives the mean volumes and the Bonferroni-corrected p-values for the individual variables in the training set. As expected, the largest number of brain regions showing significant differences between groups is observed in the model discriminating between AD and controls. Hippocampus, fornix, temporal lobe grey matter and CSF measures showed the highest discriminating power, along with parietal, frontal and occipital grey matter and temporal white matter. The brain structures that best differentiated between AD and MCI were the same as those that best differentiated between AD and controls. However, in this model regional temporal grey matter volumes had higher discriminating power than hippocampal measures. Several other structures were also important for the separation: the fornix, regional volumes of parietal, frontal and occipital grey matter, temporal white matter and CSF. Medial temporal structures, including manual hippocampal measures and temporal lobe grey matter, were most important for the separation between MCI and controls.

Model validation with external test set
Seventeen AD subjects, 12 healthy controls and 22 MCI converters were tested against the AD vs. controls OPLS model. Fig. 5 illustrates how the test-set subjects relate to this model. Fourteen of the 17 AD subjects were classified as AD and 10 of the 12 controls were classified as controls, giving a sensitivity of 82% and a specificity of 83%. Of the 22 MCI subjects who converted to AD 1 year after the MR scan, 16 (73%) were identified as most closely resembling AD.

Fig. 5 panel: predictive scores (t) of the external test set (AD, CTL and MCI subjects) on the AD vs. controls model, with the cut-off at zero.

Fig. 5. Model validation with external test set. The illustration shows how the test set is predicted onto the AD vs. controls model: AD subjects are shown as red triangles, controls as black triangles and MCI subjects as grey triangles. AD subjects above zero and control subjects below zero are falsely predicted. As can be observed in the plot, 16 of the 22 MCI subjects are classified as AD 1 year before they converted to AD.

Discussion

Model predictability
This study was designed to investigate the feasibility of discriminating between AD, MCI and controls using multivariate data analysis (OPLS). We wanted to investigate the power of analysing a wide range of MRI volumetric measures simultaneously. Several studies have used hippocampal or entorhinal cortex measures to classify AD and controls with a high degree of accuracy (80%–90%) (Fox et al., 1996; Jack et al., 1992, 1997; Juottonen et al., 1999; Killiany et al., 1993; Laakso et al., 1996, 1998; Lehericy et al., 1994; Seab et al., 1988). Other studies have reported up to 100% accuracy when discriminating between AD and controls (Callen et al., 2001; Fan et al., 2008; Juottonen et al., 1999; Killiany et al., 2002; Lerch et al., 2008). There are a number of reasons why classification accuracy may vary between studies. These include the discriminative features used, the degree of automation of the image segmentation or registration methods, the stability of the analysis tools and the robustness of the validation approach. Results from approaches developed using small samples (Callen et al., 2001; Fan et al., 2008; Juottonen et al., 1999; Killiany et al., 2002; Lerch et al., 2008) may not accurately reflect those that would be obtained with a larger sample. More severely impaired AD groups generally show larger structural differences from controls, which are then reflected in higher discrimination accuracy (Callen et al., 2001; Juottonen et al., 1999; Lerch et al., 2008). For example, Kloppel et al. (2008) applied a linear SVM to whole-brain grey matter in three sets of AD patients and controls. Patients in two of the groups had lower mean MMSE scores, representing more advanced AD (16.7 and 16.1, respectively), while the third group had milder AD (mean MMSE 23.5). The accuracy, sensitivity and specificity of classification were (95.0%, 95.0%, 95.0%) and (92.9%, 100%, 85.7%) for the first two groups, but only (81.1%, 60.6%, 93.0%) for the third group. Finally, it is important to use fully cross-validated results (Callen et al., 2001; Fan et al., 2008; Juottonen et al., 1999; Killiany et al., 2002), since without cross-validation the classification accuracy can be optimistically biased.

We investigated whether manual hippocampal measures and automated volume measures, together or separately, would most accurately classify the three groups (AD, MCI and controls). Our results in Table 3 show that, when comparing AD vs. controls, manual hippocampal measures had the best predictive power of the two groups of measures used alone, with a sensitivity of 87% and a specificity of 90% (LR+ = 9), in line with previous studies. Combining all the measures gave the most accurate classification, with a sensitivity of 90% and a specificity of 94% (LR+ = 15). This further highlights the value of manual hippocampal measurement as a gold standard and diagnostic tool, since it gave the most accurate classification of the two sets of measures. However, medial temporal lobe atrophy is not specific to AD: hippocampal atrophy can also be found in hippocampal sclerosis and frontotemporal dementia. Combining all of the measures raised the LR+ from 9 (hippocampal volumes alone) to 15, which shows that the combination of regional and global atrophy measures has additional diagnostic value.

We also found significant separations between AD and MCI and between MCI and controls, although the discriminatory power was lower for these comparisons. One reason for this is the heterogeneity of the MCI group. The MCI group is defined as elderly people with cognitive problems who are not yet demented, and these subjects can be subclassified as amnestic, single non-memory domain or multiple domain MCI (Kantarci, 2005). Amnestic MCI subjects are at higher risk of developing AD than healthy elderly people, and most amnestic MCI subjects progress to AD at some point during their lifespan (Kantarci, 2005). In this study the MCI subjects were classified according to a CDR score of 0.5, so the MCI group contains all three subclasses, some more similar to AD and some more similar to controls. Interestingly, the cross-validated scatter plots show clusters of individuals within the MCI group that are more similar to controls and other clusters that are more similar to AD subjects. This is also supported by the results of Fan et al. (2008), who reported that approximately two thirds of the MCI patients they studied had patterns of atrophy similar to those of AD patients, while the remaining third had patterns of atrophy similar to those of healthy controls. We believe that one of the strengths of this study is that it captures the heterogeneity of individuals with MCI: we did not select only, for example, individuals with single domain amnestic MCI, who would not be representative of the general population. At the same time, this heterogeneity is a drawback when creating robust models, as can be seen in the lower Q2(Y), sensitivity and specificity for AD vs. MCI and MCI vs. controls compared with AD vs. controls. This decrease is expected and warrants further investigation of the MCI group. A previous study (Desikan et al., 2009) showed that it is possible to discriminate between patients with MCI and controls in two different cohorts with high accuracy using automated MRI measures (sensitivity 74% and 90%, specificity 94% and 91% for cohorts 1 and 2, respectively). The subjects in the first cohort were amnestic MCI subjects comparable to those used in clinical trials, and the subjects in the second cohort all converted to AD within 2 years and were all classified as amnestic at baseline according to the revised criteria of Petersen et al. Our MCI cohort is much more heterogeneous than this.

Since the AD vs. controls model was the most robust, we validated it with an external test set, which we had randomly selected from the original cohort to obtain evenly sized groups for the OPLS analysis. We found a slightly lower sensitivity (82%) and specificity (83%) for the test set than for the original cross-validated model. We had, however, expected the test set to be slightly less accurate. We also included in the test set 22 MCI subjects who converted to AD at the 1-year follow-up.
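The external test-set percentages quoted in the Results follow directly from the raw counts. A minimal sketch; the helper name is hypothetical, and rounding to whole percentages is assumed to match the text.

```python
def percent_correct(correct, total):
    # Share of a test-set group falling on the expected side of the cut-off,
    # rounded to a whole percentage as in the text.
    return round(100 * correct / total)


# Counts reported for the external test set on the AD vs. controls model.
test_set = {
    "AD classified as AD (sensitivity)": (14, 17),
    "Controls classified as controls (specificity)": (10, 12),
    "MCI converters closest to AD": (16, 22),
}

for label, (correct, total) in test_set.items():
    print(f"{label}: {correct}/{total} = {percent_correct(correct, total)}%")
```

With only 17 AD patients and 12 controls, one subject more or less shifts these percentages by about 6 and 8 points, respectively, which is worth keeping in mind when comparing them with the cross-validated training-set figures.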
Neurodegeneration in AD is estimated to start 20–30 years before the clinical diagnosis is given (Blennow et al., 2006). We therefore included the MCI converters in the test set to see whether they would be classified as more AD-like than control-like 1 year before the clinical diagnosis of AD was given. Sixteen of the 22 subjects (73%) were classified as more AD-like. This indicates the potential of OPLS to predict conversion from MCI to AD, although the approach will need further improvement and validation. This result is similar to that of Plant et al. (2009), who found a classification accuracy of 75% using the voting feature intervals (VFI) classifier; however, their training set was much smaller than ours. Using nonlinear multivariate analysis, Fan et al. (2008) predicted decline in MMSE with an accuracy of 87% in MCI patients with AD-like atrophy patterns. Several studies have used multivariate analysis and machine learning to distinguish between patients and cognitively normal subjects. One study used linear discriminant analysis (LDA) with quantitative structural neuroimaging measures to discriminate between mild AD and controls (McEvoy et al., 2009). As in our study, they included a large number of subjects from different MRI sites, and they obtained a sensitivity of 87%, a specificity of 90% and a positive likelihood ratio of 12 when comparing the two groups. These results are comparable to ours. In another large multicentre study, Vemuri et al. (2008) used support vector machines (SVM) to classify subjects with probable AD and controls. Including ApoE in their analysis, they obtained a sensitivity of 88%, a specificity of 90% and a positive likelihood ratio of 9 for their training set, slightly lower than our result. However, for their test set they found a sensitivity of 86% and a specificity of 92%, which is higher than for our test set. Kloppel et al. (2008a,b) published two papers in which they used SVM classification to distinguish between AD and controls, using grey matter from the whole brain as input. These two studies show prediction accuracy similar to ours, but their samples were smaller and their AD subjects were more severely impaired, making the classification task easier. Plant et al. (2009) used three different classifiers: SVM, VFI and Bayes statistics. The classification accuracy they obtained for AD vs. controls was similar to ours, but their accuracy for MCI vs. controls was higher.
Again, their sample size was much smaller. The OPLS algorithm has also been applied successfully to discriminate between groups in non-MRI applications. Rantalainen et al. (2006) combined data from two different platforms (2D-DIGE proteomic and 1H-NMR metabolic data) to analyse blood plasma from mice with a prostate cancer xenograft and matched controls. This study demonstrated that data from different analytical platforms can be combined successfully to give a better understanding of in vivo model systems. Another study used gas chromatography–mass spectrometry data to differentiate between two transgenic poplar lines and wild type (Wiklund et al., 2008).

MRI measures of importance
In our study we considered 24 different MRI measures, but the number of correlations between pairs of MRI measures is far higher (276 distinct pairs), and these correlations are not considered in univariate modelling. Further, when a univariate test is applied to many variables, it is necessary to correct for multiple comparisons, which reduces the likelihood of obtaining significant results. With a multivariate method, all variables are analysed simultaneously in a single model. Braak and Braak (1991) defined six stages in the progression of AD according to the way in which the neurofibrillary pathology spreads. During the first two Braak stages the pathology is confined to the entorhinal/transentorhinal cortex, with minimal involvement of the hippocampus. In the third and fourth stages the disease spreads to the hippocampus and the medial temporal limbic areas, and in the final two stages the pathology extends to the isocortical association areas. In the comparison between MCI and controls (Fig. 4B), the MRI measures most important for the separation were manual hippocampal volume and temporal grey matter. As the disease progresses, structures in the medial temporal areas become more affected, as illustrated by the AD vs. controls model (Fig. 2B).
Frontal, parietal and occipital grey matter volumes are also highly affected. Finally, the ventricles are dilated as a result of brain atrophy. Comparing AD vs. MCI (Fig. 3B), the same areas are affected as discussed above, but not to the same extent as between AD and controls. A noticeable difference, however, is that temporal grey matter volumes have greater discriminating power than hippocampal volumes, demonstrating that structures other than the hippocampi are also of interest.
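The scale of the multiple-comparison problem raised under "MRI measures of importance" can be made concrete with a few lines of arithmetic. This is a minimal sketch: α = 0.05 is an assumed conventional level (the Results report Bonferroni-corrected p-values in Table 2 but do not restate the level here), and the helper names are hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    # Per-test significance level that keeps the family-wise error rate <= alpha.
    return alpha / n_tests


def bonferroni_adjust(p_value, n_tests):
    # Equivalent form: inflate each p-value by the number of tests, capped at 1.
    return min(1.0, p_value * n_tests)


n_measures = 24
n_pairs = comb(n_measures, 2)  # distinct pairs of MRI measures
alpha = 0.05                   # assumed conventional level, not taken from the paper

print(n_pairs, round(bonferroni_threshold(alpha, n_measures), 5))
```

With 24 separate univariate tests, each measure must reach roughly p < 0.002 to count as significant, and none of those tests uses the 276 pairwise relationships between measures. The multivariate model sidesteps the first problem by fitting one model to all measures and exploits the second by modelling their covariance directly.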

Conclusion
Quantitative structural MRI measurements in combination with multivariate data analysis can accurately separate patients with early Alzheimer's disease from cognitively normal elderly controls. The combination of manual hippocampal volume measurements and automated global and regional volume measures gave the best discriminatory result. The model also shows potential for predicting conversion from MCI to AD. To achieve higher discriminatory power between subjects with mild cognitive impairment and patients with AD or controls, it may be helpful in future to divide the MCI group into subgroups, given the heterogeneity within the group. Our approach may allow disease-related atrophy patterns to be obtained, and combinations of regional and global measures may have high diagnostic value. Combining all MRI measurements, including both regional and global variables, increased the positive likelihood ratio from 9 to 15 compared with manual hippocampal measurements alone, which had the highest predictive power of the two sets of MRI measures. The method should, however, be evaluated further, and it would be helpful to replicate our results in a larger external dataset to test the robustness of the model. Future work will involve studying the MCI group more thoroughly, to better pinpoint disease progression and examine heterogeneity within the group, and investigating the longitudinal follow-up scans of the subjects studied here. In conclusion, we have applied a powerful multivariate analysis technique (OPLS) to data collected at different sites across Europe, using MRI measurements obtained principally from fully automated analysis methods. There is a danger that analysis techniques applied to small MRI studies recruiting subjects from a single site are not widely applicable, since such studies do not reflect the heterogeneity found in multicentre studies such as this one.
Automated analysis techniques are only of interest if they are widely applicable. The multivariate method applied here (OPLS) allows all of the MRI measures to be analysed simultaneously, which enabled us to build robust OPLS models for the prediction of disease with high sensitivity and specificity. Ultimately, it would be desirable to make such techniques available to clinicians. Although this goal is still far from routine practice, the use of automated techniques does offer future promise. The techniques described are highly promising for biomarker discovery and validation. We conclude that global and regional patterns of atrophy may have greater potential than single global or regional volumes.

Acknowledgments
This study was supported by InnoMed (Innovative Medicines in Europe), an Integrated Project funded by the European Union under the Sixth Framework Programme priority FP6-2004-LIFESCIHEALTH-5, Life Sciences, Genomics and Biotechnology for Health. We also thank the foundation Gamla Tjänarinnor, the Swedish Alzheimer's Association, Swedish Brain Power, the Health Research Council of the Academy of Finland and the Stockholm Medical Image Laboratory and Education (SMILE). AS and SL were supported by funds from the NIHR Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust and Institute of Psychiatry, King's College London. We would like to thank the reviewers of the paper for their valuable comments.

References
Blennow, K., de Leon, M.J., Zetterberg, H., 2006. Alzheimer's disease. Lancet 368, 387–403.
Braak, H., Braak, E., 1991. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol. 82, 239–259.
Callen, D.J., Black, S.E., Gao, F., Caldwell, C.B., Szalai, J.P., 2001. Beyond the hippocampus: MRI volumetry confirms widespread limbic atrophy in AD. Neurology 57, 1669–1674.
Chaves, R., Ramírez, J., Górriz, J.M., López, M., Salas-Gonzalez, D., Álvarez, I., Segovia, F., 2009. SVM-based computer-aided diagnosis of the Alzheimer's disease using t-test NMSE feature selection with feature correlation weighting. Neurosci. Lett. 461, 293–297.
Cocosco, C.A., Zijdenbos, A.P., Evans, A.C., 2003. A fully automatic and robust brain MRI tissue classification method. Med. Image Anal. 7, 513–527.
Collins, D.L., Neelin, P., Peters, T.M., Evans, A.C., 1994. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. J. Comput. Assist. Tomogr. 18, 192–205.
Colliot, O., Chetelat, G., Chupin, M., Desgranges, B., Magnin, B., Benali, H., Dubois, B., Garnero, L., Eustache, F., Lehericy, S., 2008. Discrimination between Alzheimer disease, mild cognitive impairment, and normal aging by using automated segmentation of the hippocampus. Radiology 248, 194–201.
Desikan, R.S., Cabral, H.J., Hess, C.P., Dillon, W.P., Glastonbury, C.M., Weiner, M.W., Schmansky, N.J., Greve, D.N., Salat, D.H., Buckner, R.L., Fischl, B., 2009. Automated MRI measures identify individuals with mild cognitive impairment and Alzheimer's disease. Brain 132, 2048–2057.
Dubois, B., Feldman, H.H., Jacova, C., DeKosky, S.T., Barberger-Gateau, P., Cummings, J., Delacourte, A., Galasko, D., Gauthier, S., Jicha, G., Meguro, K., O'Brien, J., Pasquier, F., Robert, P., Rossor, M., Salloway, S., Stern, Y., Visser, P.J., Scheltens, P., 2007. Research criteria for the diagnosis of Alzheimer's disease: revising the NINCDS-ADRDA criteria. Lancet Neurol. 6, 734–746.
Eriksson, L., Johansson, E., Kettaneh-Wold, N., Trygg, J., Wikström, C., Wold, S., 2006. Multi- and Megavariate Data Analysis, Part I: Basic Principles and Applications, 2nd ed. Umetrics AB, Umeå.
Fan, Y., Shen, D., Davatzikos, C., 2005. Classification of Structural Images via High-Dimensional Image Warping, Robust Feature Extraction, and SVM. Medical Image Computing and Computer-Assisted Intervention, MICCAI 2005, pp. 1–8.
Fan, Y., Batmanghelich, N., Clark, C.M., Davatzikos, C., 2008. Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline. Neuroimage 39, 1731–1743.
Fox, N.C., Warrington, E.K., Freeborough, P.A., Hartikainen, P., Kennedy, A.M., Stevens, J.M., Rossor, M.N., 1996. Presymptomatic hippocampal atrophy in Alzheimer's disease. A longitudinal MRI study. Brain 119 (Pt 6), 2001–2007.
Frost, C., Kallis, C., 2009. Reply: a plea for confidence intervals and consideration of generalizability in diagnostic studies. Brain 132, e103.
Jack Jr., C.R., Petersen, R.C., O'Brien, P.C., Tangalos, E.G., 1992. MR-based hippocampal volumetry in the diagnosis of Alzheimer's disease. Neurology 42, 183–188.
Jack Jr., C.R., Petersen, R.C., Xu, Y.C., Waring, S.C., O'Brien, P.C., Tangalos, E.G., Smith, G.E., Ivnik, R.J., Kokmen, E., 1997. Medial temporal atrophy on MRI in normal aging and very mild Alzheimer's disease. Neurology 49, 786–794.
Jack Jr., C.R., Bernstein, M.A., Fox, N.C., Thompson, P., Alexander, G., Harvey, D., Borowski, B., Britson, P.J., Whitwell, J.L., Ward, C., Dale, A.M., Felmlee, J.P., Gunter, J.L., Hill, D.L., Killiany, R., Schuff, N., Fox-Bosetti, S., Lin, C., Studholme, C., DeCarli, C.S., Krueger, G., Ward, H.A., Metzger, G.J., Scott, K.T., Mallozzi, R., Blezek, D., Levy, J., Debbins, J.P., Fleisher, A.S., Albert, M., Green, R., Bartzokis, G., Glover, G., Mugler, J., Weiner, M.W., 2008. The Alzheimer's Disease Neuroimaging Initiative (ADNI): MRI methods. J. Magn. Reson. Imaging 27, 685–691.
Juottonen, K., Laakso, M.P., Partanen, K., Soininen, H., 1999. Comparative MR analysis of the entorhinal cortex and hippocampus in diagnosing Alzheimer disease. AJNR Am. J. Neuroradiol. 20, 139–144.
Kantarci, K., 2005. Magnetic resonance markers for early diagnosis and progression of Alzheimer's disease. Expert Rev. Neurother. 5, 663–670.
Killiany, R.J., Moss, M.B., Albert, M.S., Sandor, T., Tieman, J., Jolesz, F., 1993. Temporal lobe regions on magnetic resonance imaging identify patients with early Alzheimer's disease. Arch. Neurol. 50, 949–954.
Killiany, R.J., Hyman, B.T., Gomez-Isla, T., Moss, M.B., Kikinis, R., Jolesz, F., Tanzi, R., Jones, K., Albert, M.S., 2002. MRI measures of entorhinal cortex vs hippocampus in preclinical AD. Neurology 58, 1188–1196.
Kloppel, S., Stonnington, C.M., Barnes, J., Chen, F., Chu, C., Good, C.D., Mader, I., Mitchell, L.A., Patel, A.C., Roberts, C.C., Fox, N.C., Jack Jr., C.R., Ashburner, J., Frackowiak, R.S., 2008a. Accuracy of dementia diagnosis: a direct comparison between radiologists and a computerized method. Brain 131, 2969–2974.
Kloppel, S., Stonnington, C.M., Chu, C., Draganski, B., Scahill, R.I., Rohrer, J.D., Fox, N.C., Jack Jr., C.R., Ashburner, J., Frackowiak, R.S.J., 2008b. Automatic classification of MR scans in Alzheimer's disease. Brain 131, 681–689.
Laakso, M.P., Partanen, K., Riekkinen, P., Lehtovirta, M., Helkala, E.L., Hallikainen, M., Hanninen, T., Vainio, P., Soininen, H., 1996. Hippocampal volumes in Alzheimer's disease, Parkinson's disease with and without dementia, and in vascular dementia: an MRI study. Neurology 46, 678–681.
Laakso, M.P., Soininen, H., Partanen, K., Lehtovirta, M., Hallikainen, M., Hanninen, T., Helkala, E.L., Vainio, P., Riekkinen Sr., P.J., 1998. MRI of the hippocampus in Alzheimer's disease: sensitivity, specificity, and analysis of the incorrectly classified subjects. Neurobiol. Aging 19, 23–31.
Lehericy, S., Baulac, M., Chiras, J., Pierot, L., Martin, N., Pillon, B., Deweer, B., Dubois, B., Marsault, C., 1994. Amygdalohippocampal MR volume measurements in the early stages of Alzheimer disease. AJNR Am. J. Neuroradiol. 15, 929–937.
Lerch, J.P., Pruessner, J., Zijdenbos, A.P., Collins, D.L., Teipel, S.J., Hampel, H., Evans, A.C., 2008. Automated cortical thickness measurements from MRI can accurately separate Alzheimer's patients from normal elderly controls. Neurobiol. Aging 29, 23–30.
López, M.M., Ramírez, J., Górriz, J.M., Álvarez, I., Salas-Gonzalez, D., Segovia, F., Chaves, R., 2009. SVM-based CAD system for early detection of the Alzheimer's disease using kernel PCA and LDA. Neurosci. Lett. 464, 233–238.
McEvoy, L.K., Fennema-Notestine, C., Roddey, J.C., Hagler Jr., D.J., Holland, D., Karow, D.S., Pung, C.J., Brewer, J.B., Dale, A.M., 2009. Alzheimer disease: quantitative structural neuroimaging for detection and prediction of clinical and structural changes in mild cognitive impairment. Radiology 2511080924.
McKhann, G., Drachman, D., Folstein, M., Katzman, R., Price, D., Stadlan, E.M., 1984. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's Disease. Neurology 34, 939–944.
Morra, J.H., Tu, Z., Apostolova, L.G., Green, A.E., Avedissian, C., Madsen, S.K., Parikshak, N., Toga, A.W., Jack Jr., C.R., Schuff, N., Weiner, M.W., Thompson, P.M., 2008. Automated mapping of hippocampal atrophy in 1-year repeat MRI data from 490 subjects with Alzheimer's disease, mild cognitive impairment, and elderly controls. Neuroimage.
O'Brien, J.T., 2007. Role of imaging techniques in the diagnosis of dementia. Br. J. Radiol. 80 (Spec No 2), S71–S77.
Pearson, K., 1901. On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559–572.
Plant, C., Teipel, S.J., Oswald, A., Böhm, C., Meindl, T., Mourao-Miranda, J., Bokde, A.W., Hampel, H., Ewers, M., 2009. Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer's disease. Neuroimage 50, 162–174.
Qizilbash, S., Chui, Tariot, Brodaty, Kaye, Erkinjuntti, 2002. Evidence-based Dementia Practice. Blackwell Publishing, Oxford, pp. 20–23.
Rantalainen, M., Cloarec, O., Beckonert, O., Wilson, I.D., Jackson, D., Tonge, R., Rowlinson, R., Rayner, S., Nickson, J., Wilkinson, R.W., Mills, J.D., Trygg, J., Nicholson, J.K., Holmes, E., 2006. Statistically integrated metabonomic–proteomic studies on a human prostate cancer xenograft model in mice. J. Proteome Res. 5, 2642–2655.
Ries, M.L., Carlsson, C.M., Rowley, H.A., Sager, M.A., Gleason, C.E., Asthana, S., Johnson, S.C., 2008. Magnetic resonance imaging characterization of brain structure and function in mild cognitive impairment: a review. J. Am. Geriatr. Soc. 56, 920–934.
Rosen, W.G., Mohs, R.C., Davis, K.L., 1984. A new rating scale for Alzheimer's disease. Am. J. Psychiatry 141, 1356–1364.
Seab, J.P., Jagust, W.J., Wong, S.T., Roos, M.S., Reed, B.R., Budinger, T.F., 1988. Quantitative NMR measurements of hippocampal atrophy in Alzheimer's disease. Magn. Reson. Med. 8, 200–208.
Selkoe, D.J., 2001. Alzheimer's disease: genes, proteins, and therapy. Physiol. Rev. 81, 741–766.
Simmons, A., Westman, E., Muehlboeck, S., Mecocci, P., Vellas, B., Tsolaki, M., Kłoszewska, I., Wahlund, L.-O., Soininen, H., Lovestone, S., Evans, A., Spenger, C., 2009. MRI measures of Alzheimer's disease and the AddNeuroMed Study. Ann. N.Y. Acad. Sci. 1180, 47–55.
Simmons, A., Westman, E., Muehlboeck, S., Mecocci, P., Vellas, B., Tsolaki, M., Kloszewska, I., Wahlund, L.-O., Soininen, H., Lovestone, S., Evans, A., Spenger, C., for the AddNeuroMed consortium, 2010. The AddNeuroMed framework for multicentre MRI assessment of longitudinal changes in Alzheimer's disease: experience from the first 24 months. Int. J. Geriatr. Psychiatry.
Sled, J.G., Zijdenbos, A.P., Evans, A.C., 1998. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging 17, 87–97.
Trygg, J., Wold, S., 2002. Orthogonal projections to latent structures (O-PLS). J. Chemometr. 16, 119–128.
Vemuri, P., Gunter, J.L., Senjem, M.L., Whitwell, J.L., Kantarci, K., Knopman, D.S., Boeve, B.F., Petersen, R.C., Jack Jr., C.R., 2008. Alzheimer's disease diagnosis in individual subjects using structural MR images: validation studies. Neuroimage 39, 1186–1197.
Walsh, D.M., Tseng, B.P., Rydel, R.E., Podlisny, M.B., Selkoe, D.J., 2000. The oligomerization of amyloid beta-protein begins intracellularly in cells derived from human brain. Biochemistry 39, 10831–10839.
Westman, E., Spenger, C., Wahlund, L.-O., Lavebratt, C., 2007. Carbamazepine treatment recovered low N-acetylaspartate+N-acetylaspartylglutamate (tNAA) levels in the megencephaly mouse BALB/cByJ-Kv1.1mceph/mceph. Neurobiol. Dis. 26, 221–228.
Westman, E., Spenger, C., Oberg, J., Reyer, H., Pahnke, J., Wahlund, L.O., 2009. In vivo 1H magnetic resonance spectroscopy can detect metabolic changes in APP/PS1 mice after donepezil treatment. BMC Neurosci. 10, 33.
Wiklund, S., Johansson, E., Sjostrom, L., Mellerowicz, E.J., Edlund, U., Shockcor, J.P., Gottfries, J., Moritz, T., Trygg, J., 2008. Visualization of GC/TOF-MS-based metabolomics data for identification of biochemically interesting compounds using OPLS class models. Anal. Chem. 80, 115–122.
Wold, S., Ruhe, A., Wold, H., Dunn III, W.J., 1984. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM J. Sci. Stat. Comput. 5, 735–743.
Wold, S., Trygg, J., Berglund, A., Antti, H., 2001. Some recent developments in PLS modeling. Chemom. Intell. Lab. Syst. 58, 131–150.
Xu, Y., Jack Jr., C.R., O'Brien, P.C., Kokmen, E., Smith, G.E., Ivnik, R.J., Boeve, B.F., Tangalos, R.G., Petersen, R.C., 2000. Usefulness of MRI measures of entorhinal cortex versus hippocampus in AD. Neurology 54, 1760–1767.
Zijdenbos, A.P., Forghani, R., Evans, A.C., 2002. Automatic "pipeline" analysis of 3-D MRI data for clinical trials: application to multiple sclerosis. IEEE Trans. Med. Imaging 21, 1280–1291.