Brain Regions Showing White Matter Loss in Huntington’s Disease Are Enriched for Synaptic and Metabolic Genes


Accepted Manuscript

Peter McColgan, Sarah Gregory, Kiran K. Seunarine, Adeel Razi, Marina Papoutsi, Eileanoir Johnson, Alexandra Durr, Raymund A.C. Roos, Blair R. Leavitt, Peter Holmans, Rachael I. Scahill, Chris A. Clark, Geraint Rees, Sarah J. Tabrizi

PII: S0006-3223(17)32129-7
DOI: 10.1016/j.biopsych.2017.10.019
Reference: BPS 13364
To appear in: Biological Psychiatry
Received Date: 19 April 2017
Revised Date: 5 October 2017
Accepted Date: 7 October 2017

Please cite this article as: McColgan P., Gregory S., Seunarine K.K., Razi A., Papoutsi M., Johnson E., Durr A., Roos R.A., Leavitt B.R., Holmans P., Scahill R.I., Clark C.A., Rees G., Tabrizi S.J. and the Track-On HD Investigators, Brain regions showing white matter loss in Huntington’s disease are enriched for synaptic and metabolic genes, Biological Psychiatry (2017), doi: 10.1016/j.biopsych.2017.10.019.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Peter McColgan¹, Sarah Gregory¹, Kiran K Seunarine², Adeel Razi³,⁴, Marina Papoutsi¹, Eileanoir Johnson¹, Alexandra Durr⁵, Raymund AC Roos⁶, Blair R Leavitt⁷, Peter Holmans⁸, Rachael I Scahill¹, Chris A Clark², Geraint Rees³*, Sarah J Tabrizi¹,⁹* and the Track-On HD Investigators

*These authors contributed equally to this work

¹Huntington’s Disease Centre, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, WC1N 3BG, UK
²Developmental Imaging and Biophysics Section, UCL Institute of Child Health, London, WC1N 1EH, UK
³Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, London, WC1N 3BG, UK
⁴Department of Electronic Engineering, NED University of Engineering and Technology, Karachi, Pakistan
⁵APHP Department of Genetics, University Hospital Pitié-Salpêtrière, and ICM (Brain and Spine Institute) INSERM U1127, CNRS UMR7225, Sorbonne Universités – UPMC Paris VI UMR_S1127, Paris, France
⁶Department of Neurology, Leiden University Medical Centre, 2300RC Leiden, The Netherlands
⁷Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, Vancouver BC, V5Z 4H4 Canada
⁸MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, CF24 4HQ, UK
⁹National Hospital for Neurology and Neurosurgery, Queen Square, London, WC1N 3BG, UK

Short title: Transcription and connectivity loss in Huntington’s

Peter McColgan 1

Abstract word count: 231
Text word count: 3,997
Number of tables: 1
Number of figures: 5
Number of Supplemental Material files: 2 (1 PDF file, 1 ZIP file)
PDF file: contains Methods, Figures S1-S6, Tables S1-S2
ZIP file: contains supplemental Excel Files 1-9

Correspondence to:
Professor Sarah J. Tabrizi
UCL Huntington’s Disease Centre
Department of Neurodegenerative Disease
UCL Institute of Neurology and National Hospital for Neurology and Neurosurgery
Box 104
Queen Square
London WC1N 3BG
Email: [email protected]
Telephone: 0203 108 7474

Abstract

Background: The earliest white matter changes in Huntington’s disease are seen before disease onset in the premanifest stage around the striatum, within the corpus callosum and in posterior white matter tracts. While experimental evidence suggests these changes may be related to abnormal gene transcription, we lack an understanding of the biological processes driving this regional vulnerability.

Methods: Here, we investigate the relationship between regional transcription in the healthy brain, using the Allen Institute of Brain Science transcriptome atlas, and regional white matter connectivity loss at three time points over 24 months in premanifest Huntington’s disease relative to controls. The baseline cohort included 72 premanifest Huntington’s disease participants and 85 healthy controls.

Results: We show that loss of cortico-striatal, inter-hemispheric and intra-hemispheric white matter connections at baseline and over 24 months, in premanifest Huntington’s disease, is associated with gene expression profiles enriched for synaptic genes and metabolic genes. Cortico-striatal gene expression profiles are predominantly associated with motor, parietal and occipital regions, while inter-hemispheric expression profiles are associated with fronto-temporal regions. We also show that genes with known abnormal transcription in human Huntington’s disease and animal models are over-represented in synaptic gene expression profiles but not metabolic gene expression profiles.

Conclusions: These findings suggest a dual mechanism of white matter vulnerability in Huntington’s disease, where abnormal transcription of synaptic genes and metabolic disturbance not related to transcription may drive white matter loss.

Introduction

Huntington’s disease (HD) is a progressive fatal neurodegenerative disease caused by a CAG repeat expansion in the huntingtin gene on chromosome 4. Individuals with more than 39 CAG repeats are certain to develop HD, allowing investigation of the premanifest stage (preHD) many years before symptom onset (1). While the caudate and putamen show the earliest grey matter changes (2), white matter changes are seen around the striatum, within the corpus callosum and in the posterior white matter (WM) tracts (2-5). We have demonstrated a hierarchy of white matter vulnerability where cortico-striatal connections show the greatest changes in preHD relative to controls, followed by inter-hemispheric and intra-hemispheric connections (6).

Voxel based morphometry suggests (2, 7) that grey and white matter abnormalities in the striatum occur in parallel in those furthest from disease onset, but more recent work (5) suggests that grey matter atrophy precedes white matter atrophy in the striatum. However, as this was a cross-sectional study it is not yet possible to define a typical time lag. Thus patterns of WM loss in preHD are well established, but the underlying pathological processes are unclear.

Mutant huntingtin protein causes cellular dysfunction and ultimately neuronal cell death through several processes (8, 9) including downstream effects on synaptic signalling (10), cellular metabolism (11), mitochondrial dysfunction (12), immune activation (13) and alterations in transcription (14). Furthermore, transcription levels of genes involved in these processes are atypical in human HD and animal models (14, 15). Decreased expression of synaptic proteins in cortical pyramidal neurons of HD mouse models is linked to abnormal cortico-striatal connectivity (16), while changes in transcription levels of Brain Derived Neurotrophic Factor (BDNF), another protein involved in synaptic transmission, are associated with changes in cortico-cortical connectivity (17). Excitotoxic striatal lesion models of HD are consistent with these findings. Reduced BDNF is seen in the rat striatum after quinolinic acid injection (18) and reduced BDNF and nerve growth factor are seen after 3-nitropropionic acid treatment (19).

Some genes show a direct association with WM integrity. Loss of peroxisome proliferator-activated receptor gamma coactivator 1α (PGC1α), involved in the transcriptional regulation of energy metabolism, results in striatal degeneration and corpus callosum WM abnormalities in HD mouse models (20). Reduced transcription levels of myelin-related genes are associated with WM abnormalities in HD mouse models (21).

Given the relationship between WM connectivity and gene transcription in HD, here we investigated how regional gene transcription profiles of the healthy human brain, obtained from the Allen Institute of Brain Science (AIBS) human transcriptome atlas (22), were associated with WM connectivity loss in preHD. Based on the association between synaptic and metabolic genes and WM loss in HD (20, 21), we hypothesized that WM connectivity loss in preHD would be associated with regional transcription profiles enriched for synaptic and metabolic genes.

Methods and Materials

Overview

To test our hypothesis, WM connectivity loss was determined using diffusion weighted imaging (DWI) from a longitudinal cohort of preHD and control participants. Brains were parcellated into 70 cortical and 2 sub-cortical (caudate and putamen) regions of interest (ROI) based on the Desikan Freesurfer atlas (23). The caudate and putamen were chosen as these regions show the greatest changes in preHD (2). Whole brain tractography was performed using these parcellations to construct WM brain networks. We have recently published a longitudinal analysis using this cohort (6).

For each set of connections associated with a cortical ROI, WM connectivity loss was defined as either cortico-striatal (connections between cortex and caudate/putamen), inter-hemispheric (cortico-cortical connections between hemispheres) or intra-hemispheric (cortico-cortical connections within the same hemisphere) (see Figure 1). WM connectivity and rate of change in WM connectivity over 24 months were normalised for preHD relative to controls for each participant. Connectivity measures were then transformed to give atrophy and rate of atrophy measures. The resulting atrophy score was used in the cross-sectional analysis, while the rate of atrophy score was used in the longitudinal analysis.
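The normalisation step can be illustrated with a minimal sketch. The text does not give the exact transform, so the code below assumes one plausible reading: each preHD participant's connectivity for an ROI is z-scored against the control distribution, and the sign is flipped so that larger values mean greater loss. The function name `atrophy_scores` and the data layout (one row per participant, one column per ROI) are hypothetical.

```python
from statistics import mean, stdev


def atrophy_scores(prehd, controls):
    """Return one atrophy score per ROI (assumed definition, see lead-in).

    prehd, controls: lists of participants, each a list of connectivity
    values with one entry per ROI.
    """
    n_rois = len(controls[0])
    scores = []
    for r in range(n_rois):
        control_values = [row[r] for row in controls]
        mu = mean(control_values)
        sd = stdev(control_values)  # sample standard deviation of controls
        # z-score each preHD participant relative to controls
        z = [(row[r] - mu) / sd for row in prehd]
        # Assumption: flip the sign so that a positive score means loss
        scores.append(-mean(z))
    return scores
```

Applied to rates of change in connectivity instead of baseline connectivity, the same function would give a rate of atrophy score under the same assumptions.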

To compare regional WM loss in preHD with regional gene expression in the healthy brain, the 70 cortical ROIs (23) were matched to the closest AIBS ROI and gene expression data were averaged across RNA probes corresponding to the same gene. ROIs with gene expression values greater than two standard deviations above the mean or range were excluded; this resulted in the inclusion of 20,737 genes across 68 cortical ROIs.

Partial least squares (PLS) regression was used to investigate the relationship between regional gene expression and regional white matter loss. PLS is a multivariate technique used when the number of predictor variables (here, the expression levels of 20,737 genes) is much larger than the number of observations (here, 68 cortical ROIs). It has been used previously to investigate the relationship between gene expression and MRI-derived regional brain measures in healthy volunteers (24, 25). For our analysis the predictor variable comprised a gene x ROI matrix (20,737 x 68) and the response variable comprised a WM loss x ROI matrix: 68 x 4 for the cortico-striatal analysis (68 cortical ROIs x left and right caudate and putamen WM loss to each ROI region) and 68 x 1 for the inter- and intra-hemispheric analyses (68 cortical ROIs x inter/intra-hemispheric WM loss for each ROI). PLS identified components or patterns of regional gene expression having maximum covariance with regional white matter loss, such that the first few PLS components provide the greatest representation of the covariance. For each component individual genes are assigned weights based on their contribution to the variance explained (24).
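To make the PLS step concrete, the following is a minimal sketch of the first PLS component for the single-response case (as in the 68 x 1 inter- and intra-hemispheric analyses). It is not the analysis code used in this study, which was adapted from Whitaker et al. (25). It assumes rows are ROIs and columns are genes, and it uses the standard result that, for one response variable, the first weight vector is proportional to the covariance between each centred predictor and the centred response. The function name is hypothetical.

```python
import math


def pls_first_component(X, y):
    """First PLS component for a single response (illustrative sketch).

    X: list of ROIs, each a list of gene expression values.
    y: list with one WM loss value per ROI.
    Returns (gene_weights, roi_scores, fraction_of_y_variance_explained).
    """
    n, p = len(X), len(X[0])
    # Centre each gene (column) and the response
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]

    # Gene weight is proportional to covariance of expression with WM loss
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # ROI scores: projection of each ROI's expression profile onto w
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]

    # Fraction of response variance explained by regressing y on t
    ty = sum(a * b for a, b in zip(t, yc))
    r2 = ty * ty / (sum(a * a for a in t) * sum(b * b for b in yc))
    return w, t, r2
```

Sorting genes by the returned weights gives a ranked list of the kind used for enrichment analysis. The multi-response cortico-striatal case (68 x 4) needs a full PLS2 algorithm and is not covered by this sketch.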

This analysis provided a weight for each gene indicating its contribution to WM connectivity loss for each component or pattern. Using this information, genes were ranked according to their PLS weight. Gene enrichment analysis was then performed to identify the biological functions of genes with the highest weights using gene ontology (GO) terms (26). Here, the significance of a GO term was determined based on the rank of genes associated with that term.

Imaging Cohort

The cohort included preHD and control participants from the Track-On HD study (27), followed up at 3 time-points over 24 months at four sites (London, Leiden, Paris and Vancouver). Baseline participants included 72 preHD and 85 controls. For the longitudinal analysis only participants with diffusion data from all 3 time-points were included (56 preHD, 65 controls; Supplemental Methods).

MRI Acquisition

T1 and diffusion weighted images were acquired on two different 3T MRI scanners (Philips Achieva at Leiden and Vancouver and Siemens TIM Trio at London and Paris). Diffusion-weighted images were acquired with 42 unique gradient directions (b = 1000 sec/mm²; Supplemental Methods).

Diffusion Tractography

Whole brain probabilistic tractography was performed using MRtrix (28). The spherical deconvolution informed filtering of tractograms (SIFT2) algorithm (29) was used to reduce biases. To demonstrate our results were robust to varying methodologies, additional cross-sectional analyses used alternative connectome construction methodologies (see Supplemental Methods).

Mapping gene expression data to MRI space

Gene expression microarray data from the AIBS atlas (22) were used. Maybrain software (https://github.com/rittman/maybrain) matched centroids of MRI regions to the closest AIBS region. For the cross-sectional analyses, a leave-one-out approach and 3 out of 6 permutations of AIBS brain samples were also used to ensure results were robust to different combinations of AIBS subjects (see Supplemental Methods).
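The centroid matching can be pictured as a nearest-neighbour search. The sketch below is not the Maybrain API. It is a generic illustration that assumes both sets of centroids are expressed in the same coordinate space and uses Euclidean distance. The function name, region labels and coordinates are hypothetical.

```python
import math


def match_to_nearest(mri_centroids, aibs_centroids):
    """Map each MRI ROI to the label of the closest AIBS region.

    Both arguments are dicts of label -> (x, y, z) coordinates, assumed
    to be in the same coordinate space.
    """
    matches = {}
    for roi, centre in mri_centroids.items():
        # Pick the AIBS label with the smallest Euclidean distance
        nearest = min(
            aibs_centroids,
            key=lambda label: math.dist(centre, aibs_centroids[label]),
        )
        matches[roi] = nearest
    return matches


# Hypothetical coordinates, for illustration only
mri = {"roi_a": (0.0, 0.0, 0.0), "roi_b": (10.0, 0.0, 0.0)}
aibs = {"sample_1": (1.0, 0.0, 0.0), "sample_2": (9.0, 1.0, 0.0)}
print(match_to_nearest(mri, aibs))
```

In this form several MRI regions can map to the same AIBS region. Whether and how that is handled in the actual pipeline is not stated in the text.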

Statistical analysis

Partial least squares regression was used to investigate the association between the gene transcriptome of the healthy brain and WM connectivity loss in preHD. Code used to perform this analysis was adapted from Whitaker et al. (25). Random permutations of the gene predictor variable were also investigated to ensure results were not due to chance (see Supplemental Methods).

Gene ontology enrichment analysis

We used the gene ontology enrichment analysis and visualisation tool (GOrilla) (http://cbl-gorilla.cs.technion.ac.il) (26) to identify GO terms that were significantly enriched in the target gene list.

Overlap between gene profiles and Huntington’s disease related genes

To investigate similarities between gene profiles, we identified the significance of gene overlap between analyses using a hypergeometric distribution. Gene ontology enrichment analysis was also repeated with overlap genes removed to assess whether this affected the resulting GO terms. Overlap between genes in top gene ontology terms and HD genes was also investigated.
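A hypergeometric overlap test of this kind can be sketched as follows. The sketch assumes the usual upper-tail formulation: given a gene universe of size N and two gene lists of sizes a and b, the p-value is the probability of observing at least k shared genes if the second list were drawn at random. The function and parameter names are hypothetical, and the numbers in the usage line are illustrative, not taken from this study.

```python
from math import comb


def overlap_p_value(n_universe, n_list_a, n_list_b, n_overlap):
    """Upper-tail hypergeometric probability P(overlap >= n_overlap)."""
    total = comb(n_universe, n_list_b)
    largest_possible = min(n_list_a, n_list_b)
    tail = sum(
        # Ways to draw k genes from list A and the rest from outside it
        comb(n_list_a, k) * comb(n_universe - n_list_a, n_list_b - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total


# Illustrative numbers only: 1,000 genes, two lists of 100, 30 shared
print(overlap_p_value(1000, 100, 100, 30))
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that can affect very small tail probabilities.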

Enrichment for Huntington’s disease related genes

We investigated whether genes showing abnormal transcription in human HD and animal models of HD were enriched greater than chance in the first PLS components of the cortico-striatal, inter-hemispheric and intra-hemispheric analyses. HD gene lists were obtained from Langfelder et al. (30). Additionally, we investigated whether HD related genes were more strongly enriched in these gene lists than other biologically plausible gene sets, chosen at random. Gene sets for human supragranular genes, oligodendrocytes and cell cycle metabolism were also investigated (see Supplemental Methods).
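One way to test whether a gene set is enriched greater than chance in a ranked list is a permutation test on mean rank. The exact procedure used here is described in the Supplemental Methods, which are not part of this text, so the sketch below is an assumption about the general approach and not a reproduction of it. It compares the mean rank of the gene set with the mean ranks of randomly drawn sets of the same size. All names are hypothetical.

```python
import random


def enrichment_p_value(ranked_genes, gene_set, n_perm=2000, seed=0):
    """Permutation p-value that gene_set sits nearer the top than chance.

    ranked_genes: genes ordered from highest to lowest PLS weight.
    gene_set: genes of interest (e.g. an HD-related gene list).
    """
    rank = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    members = [g for g in gene_set if g in rank]
    if not members:
        raise ValueError("no genes from gene_set are in ranked_genes")
    observed = sum(rank[g] for g in members) / len(members)

    rng = random.Random(seed)  # fixed seed for reproducibility
    all_ranks = list(rank.values())
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_ranks, len(members))
        # Count random sets ranked at least as high as the observed set
        if sum(sample) / len(sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_perm + 1)
```

Replacing the random draws with draws from other biologically plausible gene sets would give the second kind of comparison described above.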

Results

Gene expression profiles of the healthy human brain explain the variance of regional white matter connectivity loss in preHD

For the majority of analyses the first PLS component accounted for a large percentage of the variance in regional WM loss. We therefore focused on this component. Gene expression data explained 66% of the variance of regional WM connectivity loss in the cortico-striatal cross-sectional analysis and 70% in the longitudinal analysis for the first component of the PLS and 11% and 6% respectively for the second component. For the inter-hemispheric analysis, gene expression explained 67% of the variance in WM loss cross-sectionally and 17% longitudinally for the first component and 9% and 60% respectively for the second component. For the intra-hemispheric analysis gene expression explained 24% cross-sectionally and 65% longitudinally for the first component and 47% and 11% respectively for the second component. See Supplemental File 1 for the first component PLS gene weights for these analyses.

For each analysis the first components of the PLS were explored. The second components were also explored if they accounted for a large proportion of the variance. Variances explained by the first component ranged between 45-69% for random permutations of the gene predictor matrix; however, gene and ROI weights were very different from the original analysis.

Expression profiles associated with cross-sectional variation in white matter connections in preHD relative to controls

Similar significant GO terms were seen for the cortico-striatal and inter-hemispheric analyses including modulation of chemical synaptic transmission, regulation of cell projection organisation and cell projection organisation. We refer to these as a synaptic profile. For the intra-hemispheric analysis the most significant GO terms included mRNA metabolic process, RNA processing and chromatin organisation (see Table 1 and Figure 2), which we refer to as a metabolic/chromatin profile. For the intra-hemispheric analysis the second component of the PLS was significantly associated with GO terms involved in myelination and lipid metabolism. See Supplemental File 2 for all significant GO terms for each analysis.

The leave-one-out analyses showed that modulation of chemical synaptic signalling and cell projection organisation were the most significant GO terms for cortico-striatal and inter-hemispheric connections for nearly all permutations. For intra-hemispheric connections, the GO terms mRNA metabolic process, RNA processing and chromatin organisation were among the most significant for all permutations. Similarly, the addition of Gaussian noise also revealed consistent results (see Supplemental File 3). The 3 out of 6 permutation analyses revealed similar findings across many of the 8 permutations (see Supplemental File 4). The use of FA weighting and the thresholded scale 60 easy Lausanne atlas resulted in a change from synaptic to metabolic/chromatin profiles for the cortico-striatal and inter-hemispheric connections. For intra-hemispheric connections FA weighting revealed a consistent metabolic/chromatin profile. For the thresholded scale 60 Lausanne atlas intra-hemispheric connections showed a synaptic profile. There was no change in profiles across consensus thresholds of 75% and 50%. Cross-sectional analyses using random permutations of genes revealed very different GO terms at minimal levels of significance, suggesting our results are not due to chance (see Supplemental File 5).

Expression profiles associated with longitudinal change in white matter connections in preHD relative to controls

For both cortico-striatal and inter-hemispheric analyses longitudinal change in white matter was associated with GO terms involving metabolism or chromatin organisation (see Supplemental Material Table S1 and Figures S1 and S2). For intra-hemispheric analysis longitudinal change was associated with GO terms involved in mitochondrial function, metabolism and synaptic transmission (see Supplemental Material Table S1 and Figure S3). The second component of the PLS for the inter-hemispheric analysis was significantly associated with a range of GO terms including immune function, development and protein folding (see Supplemental File 2). In summary, these results suggest regional gene expression profiles associated with loss of WM connectivity in preHD are involved in synaptic, metabolic and chromatin related biological processes.

Overlap between synaptic and metabolic gene profiles and HD related genes

A significant overlap of 346 genes (p < 0.001) was found between the top genes in the cortico-striatal and intra-hemispheric analyses. These were then compared to the striatum genes showing transcriptional abnormalities in HD humans and animal models. This revealed 8 genes in common, encoding proteins involved in cell cycle (CEP135), axon development (NEK1) and G-protein coupling (ADORA2A). See Supplemental File 6. Gene ontology enrichment analysis with overlap genes removed did not change the most significant GO terms. The gene ontology terms “modulation of chemical synaptic transmission” and “mRNA metabolic process” showed overlap of 7 genes. HD related genes showed overlap of 44 genes with the GO term modulation of chemical synaptic transmission and 7 genes with mRNA metabolic process. The overlaps were not greater than those expected by chance.

Dissociation of cortico-striatal, inter- and intra-hemispheric gene enrichment in the cortex

The next step in our analysis was to explore the spatial pattern of each gene expression profile in the brain. To determine what brain regions were enriched with each gene expression profile, we analysed PLS ROI weights from each analysis where higher weights related to greater gene profile enrichment (see Supplemental File 7 for ROI weights for each analysis). Cortical regions with the highest weights in the cortico-striatal analysis (cross-sectional) were predominantly in motor, parietal and occipital cortices. Conversely, cortical regions with the highest weights in the inter-hemispheric analysis (cross-sectional) were predominantly in frontal, temporal and insular cortices. Cortical regions with the highest weights in the intra-hemispheric analysis (cross-sectional) included frontal, temporal and occipital regions (see Supplemental Material Table S2 and Figure 3). Plotting cortico-striatal ROI weights against both inter-hemispheric and intra-hemispheric ROI weights revealed dissociation in terms of regions involved, where regions enriched in the cortico-striatal analysis were distinctly different from those enriched in the inter-hemispheric and intra-hemispheric analyses (see Figure 4). Cross-sectional analyses using random permutations of ROIs revealed very different distributions of ROI weights, suggesting our results are not due to chance (see Supplemental Figure S5 and Supplemental File 7).

Enrichment of genes showing abnormal transcription in HD is seen in the cortico-striatal and inter-hemispheric gene expression profiles

Our next step was to assess whether genes that show abnormal transcription in HD, both in the cortex and striatum, might be associated with white matter loss. The cortico-striatal gene list was significantly enriched for abnormal HD genes in the striatum (p < 0.001) and in the cortex (p < 0.001). The inter-hemispheric gene list was significantly enriched for genes in the striatum (p < 0.001) but not the cortex. No significant enrichment was seen for the intra-hemispheric gene list (see Figure 5). To ensure the significant difference for the striatum gene list was not related to the size of the gene data set we repeated the analysis using the top 25 most significant genes based on Hodges q-value. Results were consistent with the 515 gene list, showing significant enrichment for HD genes in the striatum for the cortico-striatal (p = 0.019) and inter-hemispheric (p = 0.004) analyses (see Supplemental Material Figure S4). Enrichment compared against biologically plausible gene sets revealed similar results, for both 515 and 25 striatum gene lists, with significant enrichment for cortico-striatal (p < 0.001) and inter-hemispheric analyses (p < 0.001) but not for the intra-hemispheric analysis. This suggests that abnormal transcription in HD may be associated with cortico-striatal and inter-hemispheric WM connectivity loss.

To further investigate the relationship between changes in gene expression in HD relative to controls and cortico-striatal WM loss we performed correlations between the log2 fold change in the Hodges (31), Durrenberger (32) and Langfelder (30) studies for the 515 striatum gene set and the PLS weights from the cross-sectional cortico-striatal analysis. This revealed negative correlations between PLS weights and log2 fold change (Hodges, rho = -0.23, p = 1.1 × 10⁻⁷; Durrenberger, rho = -0.23, p = 8.4 × 10⁻⁸; Langfelder, rho = -0.19, p = 1.6 × 10⁻⁵) (see Supplemental Material Figure S6). This suggests that genes associated with cortico-striatal white matter loss in preHD are also those that show reduced levels of transcription in human HD and animal models.
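The reported statistic is a rank correlation (rho). As an illustration of how such a value can be computed, the sketch below implements Spearman's rho as the Pearson correlation of average ranks, which handles ties. It is a generic implementation and not the code used in this study. The function names are hypothetical, and p-values are not computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(pls_weights, log2_fold_changes):
    """Spearman rank correlation between two per-gene measures."""
    return pearson(average_ranks(pls_weights), average_ranks(log2_fold_changes))
```

A negative rho in this setting means that genes with higher PLS weights tend to have more negative log2 fold change, that is, lower expression in HD relative to controls.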

Enrichment for other gene lists

We also investigated enrichment for genes associated with human supragranular cortex, oligodendrocytes and cell cycle metabolism. The cortico-striatal (CS) and inter-hemispheric (IH) gene lists were significantly enriched for human supragranular cortex genes (CS, p = 0.002, IH, p = 0.006) and oligodendrocyte genes (CS, p < 0.001, IH, p < 0.001) but not cell cycle metabolism genes. Conversely, the intra-hemispheric gene list was significantly enriched for cell cycle metabolism genes (p < 0.001) but not human supragranular or oligodendrocyte genes. This suggests a relationship between cortico-striatal white matter loss and abnormal transcription in oligodendrocytes. Additionally, abnormal transcription in cortical supragranular genes, which are implicated in long-range connectivity (33), may be linked to cortico-striatal and inter-hemispheric white matter connectivity loss.

Discussion

In this study, we find that regional variance in white matter loss in preHD is differentially associated with the pattern of expression of genes involved in synaptic, metabolic and chromatin related processes in the healthy human brain. Cortico-striatal and inter-hemispheric WM loss is associated with synaptic genes, whereas intra-hemispheric WM loss is associated with metabolic and chromatin-related genes. There is also a distinction between gene enrichment in cortical regions where enrichment associated with cortico-striatal connections is seen in more posterior regions such as motor, occipital and parietal cortices, whereas enrichment associated with inter-hemispheric connections is seen in frontal, temporal and insular cortices. We reveal that genes showing abnormal transcription in HD humans and animal models are over-represented in the ranked gene list associated with cortico-striatal and inter-hemispheric WM loss but not intra-hemispheric WM connection loss.

We focus on synaptic, metabolic and chromatin related genes to simplify interpretation of our results. However, specific GO terms such as DNA metabolism may relate to DNA repair (34). DNA repair genes, such as MSH3, have been linked to CAG instability (35), age of onset (36) and disease progression (37). The GO term mRNA metabolism may relate to splicing of mRNA, which has also been implicated in HD pathogenesis. Aberrant splicing of the mutant huntingtin gene leads to the generation of the pathogenic exon 1 HTT protein (38). We note that further work would be needed to link these specific gene sets directly to white matter loss in HD.

Several studies have analysed gene expression profiles both in human HD and animal models. Gene expression measured in post-mortem brain samples from HD patients was most affected in the caudate, followed by the motor cortex, while no abnormalities were detected in the prefrontal association cortex (31). The GO term showing greatest significance for both the caudate and motor cortex was synaptic transmission. Furthermore, significance for the GO terms metabolism and glucose metabolism was seen in the cortex, but not the caudate. These findings agree with the associations between synaptic genes and cortico-striatal WM connection loss and metabolic genes and intra-hemispheric WM connection loss that we demonstrate here.

In our previous longitudinal study, WM loss was greatest in cortico-striatal and inter-hemispheric connections in preHD relative to controls. No group differences were seen in intra-hemispheric connections (6). The analysis presented here is based on regional atrophy of connection subtype. Therefore cortico-striatal and inter-hemispheric regional atrophy is likely to be greater than intra-hemispheric regional atrophy. Furthermore cortico-striatal and inter-hemispheric connections have greater topographical lengths than intra-hemispheric connections (6). Therefore these similarities between cortico-striatal and inter-hemispheric connections may account for the similarity between gene profiles.

Changes from synaptic to metabolic profiles in cross-sectional vs. longitudinal, streamline volume vs. FA weighting and Desikan vs. scale 60 easy Lausanne atlas were seen for cortico-striatal and inter-hemispheric connections. We investigated this further, showing common genes highly ranked in both profiles. One explanation for this may be that atrophy scores cross-sectionally will be higher than longitudinal rate of atrophy scores. Similarly, atrophy scores in the Desikan 68-region atlas are likely to be larger than in the more finely parcellated easy Lausanne scale 60 (110-region) atlas. With respect to FA weighting, this metric is difficult to interpret in crossing fibre regions, which make up an estimated 60-90% of the human brain (39).

The gene ontology categories identified contain large numbers of genes. We therefore balance this data-driven approach by investigating whether gene profiles associated with regional white matter loss in preHD are enriched for genes known to show abnormal transcription in both human HD and animal models. Similar GO terms such as synaptic transmission and chromatin modification have been associated with functional brain networks in healthy participants (24, 40). This likely represents the close relationship between the healthy brain network and the perturbation of that network in neurodegeneration (41).

We acknowledge the limitations of diffusion tractography. To address these we used both CSD, which deals more effectively with crossing fibres than the diffusion tensor or multi-tensor methods (28), and SIFT2, which has higher reproducibility and is more representative of the underlying biology of WM connections than conventional methods (42). CSD performs well at the acquisition protocol specifications used in this study (b = 1000) (43, 44). At b = 1000 a minimum of 28 gradient directions is required (45). Therefore the angular coverage achieved using CSD at b = 1000 with 42 directions is more than sufficient.

The use of gene expression data from the healthy human brain to explain white matter loss in preHD is limited to the extent that transcription in preHD may differ from that seen in healthy brains. However, studies of post-mortem manifest HD brains show that transcription is most affected in the striatum, with limited abnormalities in the cortex (31). Indeed, the transcription of only 25 genes in the cortex is abnormal in both human and animal studies, compared to 515 in the striatum (30). We therefore mitigate the likely transcription abnormalities in preHD by using only cortical gene expression data from the AIBS transcriptome atlas (46).

We map the anatomical location of ROIs to corresponding regions in the AIBS atlas. However, the resolutions of these atlases differ, and thus we acknowledge that the correspondence may not be exact, which may be a limitation of our methodology. There are other human brain transcriptome atlases, such as Braineac (47) and the Human Brain Transcriptome Project (48). However, these atlases offer low resolution compared to the AIBS atlas, as only a small number of cortical regions have been sampled, so the analysis carried out in this study could not be reproduced using Braineac or the Human Brain Transcriptome Project atlas.

The utility of using information from the healthy human brain to inform us about the patterns and mechanisms of neurodegeneration has been demonstrated many times in neuroimaging. Functional connectivity and white matter networks from healthy participants can predict atrophy in Alzheimer’s disease, corticobasal syndrome, fronto-temporal dementia and Parkinson’s disease (41, 49-51). More recently, transcriptome data from the healthy brains of the AIBS atlas have been used to investigate the association between the expression of schizophrenia risk genes and white matter disconnectivity (52). The regional expression of the tau gene MAPT from the AIBS atlas has also been linked to the selective vulnerability of highly connected brain regions in Parkinson’s disease and progressive supranuclear palsy (53).

Conclusion

We show that cortico-striatal and inter-hemispheric WM connection loss is associated with the expression of synaptic genes in preHD, while intra-hemispheric WM loss is associated with metabolic genes. Genes showing abnormal transcription in HD are associated with the synaptic but not metabolic gene profiles. These findings have important implications for linking the earliest WM changes in preHD to the underlying pathological processes that may drive them.

Acknowledgements

Track-On HD Investigators

A Coleman, J Decolongon, M Fan, T Petkau (University of British Columbia, Vancouver); C Jauffret, D Justo, S Lehericy, K Nigaud, R Valabrègue (ICM and APHP, Pitié-Salpêtrière University Hospital, Paris); A Schoonderbeek, E P ‘t Hart (Leiden University Medical Centre, Leiden); DJ Hensman Moss, R Ghosh, H Crawford, M Papoutsi, C Berna, D Mahaleskshmi (University College London, London); R Reilmann, N Weber (George Huntington Institute, Munster); I Labuschagne, J Stout (Monash University, Melbourne); B Landwehrmeyer, M Orth, I Mayer (University of Ulm, Ulm); H Johnson (University of Iowa); D Craufurd (University of Manchester).

Financial disclosure

All authors report no biomedical financial interests or potential conflicts of interest.

Funding

This study was funded by the Wellcome Trust (GR, PMC) (091593/Z/10/Z, 515103) and supported by the National Institute for Health Research [NIHR] University College London Hospitals [UCLH] Biomedical Research Centre [BRC]. Track-On HD is funded by the CHDI Foundation, a not-for-profit organisation dedicated to finding treatments for Huntington’s disease. We would like to thank Timothy Rittman for guidance on using Maybrain software, and Dr Kirstie Whitaker and Dr Petra Vertes for making their code freely available and for guidance on its implementation.

References

1. Bates GP, Dorsey R, Gusella JF, Hayden MR, Kay C, Leavitt BR, et al. (2015): Huntington disease. Nat Rev Dis Primers. 1:15005.

2. Tabrizi SJ, Scahill RI, Durr A, Roos RA, Leavitt BR, Jones R, et al. (2011): Biological and clinical changes in premanifest and early stage Huntington's disease in the TRACK-HD study: the 12-month longitudinal analysis. Lancet Neurol. 10:31-42.

3. Dumas EM, van den Bogaard SJ, Ruber ME, Reilman RR, Stout JC, Craufurd D, et al. (2012): Early changes in white matter pathways of the sensorimotor cortex in premanifest Huntington's disease. Hum Brain Mapp. 33:203-212.

4. Di Paola M, Luders E, Cherubini A, Sanchez-Castaneda C, Thompson PM, Toga AW, et al. (2012): Multimodal MRI analysis of the corpus callosum reveals white matter differences in presymptomatic and early Huntington's disease. Cereb Cortex. 22:2858-2866.

5. Faria AV, Ratnanather JT, Tward DJ, Lee DS, van den Noort F, Wu D, et al. (2016): Linking white matter and deep gray matter alterations in premanifest Huntington disease. Neuroimage Clin. 11:450-460.

6. McColgan P, Seunarine KK, Gregory S, Razi A, Papoutsi M, Long JD, et al. (2017): Topological length of white matter connections predicts their rate of atrophy in premanifest Huntington's disease. JCI Insight. 2.

7. Tabrizi SJ, Langbehn DR, Leavitt BR, Roos RA, Durr A, Craufurd D, et al. (2009): Biological and clinical manifestations of Huntington's disease in the longitudinal TRACK-HD study: cross-sectional analysis of baseline data. Lancet Neurol. 8:791-801.

8. Ross CA, Tabrizi SJ (2011): Huntington's disease: from molecular pathogenesis to clinical treatment. Lancet Neurol. 10:83-98.

9. Saudou F, Humbert S (2016): The Biology of Huntingtin. Neuron. 89:910-926.

10. Plotkin JL, Surmeier DJ (2015): Corticostriatal synaptic adaptations in Huntington's disease. Curr Opin Neurobiol. 33C:53-62.

11. Lee JM, Ivanova EV, Seong IS, Cashorali T, Kohane I, Gusella JF, et al. (2007): Unbiased gene expression analysis implicates the huntingtin polyglutamine tract in extra-mitochondrial energy metabolism. PLoS Genet. 3:e135.

12. Tabrizi SJ, Workman J, Hart PE, Mangiarini L, Mahal A, Bates G, et al. (2000): Mitochondrial dysfunction and free radical damage in the Huntington R6/2 transgenic mouse. Ann Neurol. 47:80-86.

13. Andre R, Carty L, Tabrizi SJ (2016): Disruption of immune cell function by mutant huntingtin in Huntington's disease pathogenesis. Curr Opin Pharmacol. 26:33-38.

14. Seredenina T, Luthi-Carter R (2012): What have we learned from gene expression profiles in Huntington's disease? Neurobiol Dis. 45:83-98.

15. Miller JR, Lo KK, Andre R, Hensman Moss DJ, Trager U, Stone TC, et al. (2016): RNA-Seq of Huntington's disease patient myeloid cells reveals innate transcriptional dysregulation associated with proinflammatory pathway activation. Hum Mol Genet. 25:2893-2904.

16. Zucker B, Kama JA, Kuhn A, Thu D, Orlando LR, Dunah AW, et al. (2010): Decreased Lin7b expression in layer 5 pyramidal neurons may contribute to impaired corticostriatal connectivity in huntington disease. J Neuropathol Exp Neurol. 69:880-895.

17. Gambazzi L, Gokce O, Seredenina T, Katsyuba E, Runne H, Markram H, et al. (2010): Diminished activity-dependent brain-derived neurotrophic factor expression underlies cortical neuron microcircuit hypoconnectivity resulting from exposure to mutant huntingtin fragments. J Pharmacol Exp Ther. 335:13-22.

18. Fusco FR, Zuccato C, Tartari M, Martorana A, De March Z, Giampa C, et al. (2003): Co-localization of brain-derived neurotrophic factor (BDNF) and wild-type huntingtin in normal and quinolinic acid-lesioned rat brain. Eur J Neurosci. 18:1093-1102.

19. Espindola S, Vilches-Flores A, Hernandez-Echeagaray E (2012): 3-Nitropropionic acid modifies neurotrophin mRNA expression in the mouse striatum: 18S-rRNA is a reliable control gene for studies of the striatum. Neurosci Bull. 28:517-531.

20. Xiang Z, Valenza M, Cui L, Leoni V, Jeong HK, Brilli E, et al. (2011): Peroxisome-proliferator-activated receptor gamma coactivator 1 alpha contributes to dysmyelination in experimental models of Huntington's disease. J Neurosci. 31:9544-9553.

21. Teo RT, Hong X, Yu-Taeger L, Huang Y, Tan LJ, Xie Y, et al. (2016): Structural and molecular myelination deficits occur prior to neuronal loss in the YAC128 and BACHD models of Huntington disease. Hum Mol Genet. 25:2621-2632.

22. Hawrylycz M, Miller JA, Menon V, Feng D, Dolbeare T, Guillozet-Bongaarts AL, et al. (2015): Canonical genetic signatures of the adult human brain. Nat Neurosci. 18:1832-1844.

23. Desikan RS, Segonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, et al. (2006): An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 31:968-980.

24. Vertes PE, Rittman T, Whitaker KJ, Romero-Garcia R, Vasa F, Kitzbichler MG, et al. (2016): Gene transcription profiles associated with inter-modular hubs and connection distance in human functional magnetic resonance imaging networks. Philos Trans R Soc Lond B Biol Sci. 371.

25. Whitaker KJ, Vertes PE, Romero-Garcia R, Vasa F, Moutoussis M, Prabhu G, et al. (2016): Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. Proc Natl Acad Sci U S A. 113:9105-9110.

26. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009): GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 10:48.

27. Kloppel S, Gregory S, Scheller E, Minkova L, Razi A, Durr A, et al. (2015): Compensation in Preclinical Huntington's Disease: Evidence From the Track-On HD Study. EBioMedicine. 2:1420-1429.

28. Tournier JD, Calamante F, Connelly A (2012): MRtrix: Diffusion tractography in crossing fiber regions. Imaging Systems and Technology. 22:53-56.

29. Smith RE, Tournier JD, Calamante F, Connelly A (2015): SIFT2: Enabling dense quantitative assessment of brain white matter connectivity using streamlines tractography. Neuroimage. 119:338-351.

30. Langfelder P, Cantle JP, Chatzopoulou D, Wang N, Gao F, Al-Ramahi I, et al. (2016): Integrated genomics and proteomics define huntingtin CAG length-dependent networks in mice. Nat Neurosci. 19:623-633.

31. Hodges A, Strand AD, Aragaki AK, Kuhn A, Sengstag T, Hughes G, et al. (2006): Regional and cellular gene expression changes in human Huntington's disease brain. Hum Mol Genet. 15:965-977.

32. Durrenberger PF, Fernando FS, Kashefi SN, Bonnert TP, Seilhean D, Nait-Oumesmar B, et al. (2015): Common mechanisms in neurodegeneration and neuroinflammation: a BrainNet Europe gene expression microarray study. J Neural Transm (Vienna). 122:1055-1068.

33. Krienen FM, Yeo BT, Ge T, Buckner RL, Sherwood CC (2016): Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain. Proc Natl Acad Sci U S A. 113:E469-478.

34. Moss DJH, Pardinas AF, Langbehn D, Lo K, Leavitt BR, Roos R, et al. (2017): Identification of genetic variants associated with Huntington's disease progression: a genome-wide association study. Lancet Neurol.

35. Gonitel R, Moffitt H, Sathasivam K, Woodman B, Detloff PJ, Faull RL, et al. (2008): DNA instability in postmitotic neurons. Proc Natl Acad Sci U S A. 105:3467-3472.

36. Genetic Modifiers of Huntington's Disease C (2015): Identification of Genetic Factors that Modify Clinical Onset of Huntington's Disease. Cell. 162:516-526.

37. Moss DJH, Pardinas AF, Langbehn D, Lo K, Leavitt BR, Roos R, et al. (2017): Identification of genetic variants associated with Huntington's disease progression: a genome-wide association study. Lancet Neurol. 16:701-711.

38. Sathasivam K, Neueder A, Gipson TA, Landles C, Benjamin AC, Bondulich MK, et al. (2013): Aberrant splicing of HTT generates the pathogenic exon 1 protein in Huntington disease. Proc Natl Acad Sci U S A. 110:2366-2370.

39. Jeurissen B, Leemans A, Tournier JD, Jones DK, Sijbers J (2013): Investigating the prevalence of complex fiber configurations in white matter tissue with diffusion magnetic resonance imaging. Hum Brain Mapp. 34:2747-2766.

40. Richiardi J, Altmann A, Milazzo AC, Chang C, Chakravarty MM, Banaschewski T, et al. (2015): BRAIN NETWORKS. Correlated gene expression supports synchronous activity in brain networks. Science. 348:1241-1244.

41. Zhou J, Gennatas ED, Kramer JH, Miller BL, Seeley WW (2012): Predicting regional neurodegeneration from the healthy brain functional connectome. Neuron. 73:1216-1227.

42. Smith RE, Tournier JD, Calamante F, Connelly A (2015): The effects of SIFT on the reproducibility and biological accuracy of the structural connectome. Neuroimage. 104:253-265.

43. Ramirez-Manzanares A, Cook PA, Hall M, Ashtari M, Gee JC (2011): Resolving axon fiber crossings at clinical b-values: an evaluation study. Med Phys. 38:5239-5253.

44. Wilkins B, Lee N, Gajawelli N, Law M, Lepore N (2015): Fiber estimation and tractography in diffusion MRI: development of simulated brain images and comparison of multi-fiber analysis methods at clinical b-values. Neuroimage. 109:341-356.

45. Tournier JD, Calamante F, Connelly A (2013): Determination of the appropriate b value and number of gradient directions for high-angular-resolution diffusion-weighted imaging. NMR Biomed. 26:1775-1786.

46. Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, Miller JA, et al. (2012): An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 489:391-399.

47. Ramasamy A, Trabzuni D, Guelfi S, Varghese V, Smith C, Walker R, et al. (2014): Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat Neurosci. 17:1418-1428.

48. Kang HJ, Kawasawa YI, Cheng F, Zhu Y, Xu X, Li M, et al. (2011): Spatio-temporal transcriptome of the human brain. Nature. 478:483-489.

49. Mandelli ML, Vilaplana E, Brown JA, Hubbard HI, Binney RJ, Attygalle S, et al. (2016): Healthy brain connectivity predicts atrophy progression in non-fluent variant of primary progressive aphasia. Brain.

50. Seeley WW, Crawford RK, Zhou J, Miller BL, Greicius MD (2009): Neurodegenerative diseases target large-scale human brain networks. Neuron. 62:42-52.

51. Zeighami Y, Ulla M, Iturria-Medina Y, Dadar M, Zhang Y, Larcher KM, et al. (2015): Network structure of brain atrophy in de novo Parkinson's disease. Elife. 4.

52. Romme IA, de Reus MA, Ophoff RA, Kahn RS, van den Heuvel MP (2016): Connectome Disconnectivity and Cortical Gene Expression in Patients With Schizophrenia. Biol Psychiatry.

53. Rittman T, Rubinov M, Vertes PE, Patel AX, Ginestet CE, Ghosh BC, et al. (2016): Regional expression of the MAPT gene is associated with loss of hubs in brain networks and cognitive impairment in Parkinson disease and progressive supranuclear palsy. Neurobiol Aging. 48:153-160.

Figure legends and tables

Figure 1. Schematic illustrating sub-groups of regional white matter connectivity. (A) Cortico-striatal: connections between cortex and striatum (caudate and putamen) for each cortical region of interest (ROI). (B) Inter-hemispheric: connections to the opposite hemisphere for each cortical ROI. (C) Intra-hemispheric: connections within the same hemisphere for each cortical ROI. Light blue – left hemisphere, purple – right hemisphere, dark blue – caudate, yellow – putamen.

Figure 2. (A) Cortico-striatal cross-sectional analysis semantic similarity scatter plot: Significant gene ontology (GO) terms for biological processes associated with the first component of the partial least squares (PLS) analysis are plotted in semantic space, where similar terms are clustered together. The top 5 most significant GO terms are labelled for each analysis. Redundant GO terms and those associated with greater than 1000 genes have been excluded. Markers are scaled based on the log10 q-value for the significance of each GO term. Large blue circles are highly significant, while red circles are less significant (see colour bar).

(B) Inter-hemispheric cross-sectional analysis semantic similarity scatter plot: Significant gene ontology (GO) terms for biological processes associated with the first component of the partial least squares (PLS) analysis are plotted in semantic space, where similar terms are clustered together. The top 5 most significant GO terms are displayed for each analysis. Redundant GO terms and those associated with greater than 1000 genes have been excluded. Markers are scaled based on the log10 q-value for the significance of each GO term. Large blue circles are highly significant, while red circles are less significant (see colour bar).

(C) Intra-hemispheric cross-sectional analysis semantic similarity scatter plot: Significant gene ontology (GO) terms for biological processes associated with the first component of the partial least squares (PLS) analysis are plotted in semantic space, where similar terms are clustered together. The top 5 most significant GO terms are displayed for each analysis. Redundant GO terms and those associated with greater than 1000 genes have been excluded. Markers are scaled based on the log10 q-value for the significance of each GO term. Large blue circles are highly significant, while red circles are less significant (see colour bar).

Figure 3. ROI weights for cross-sectional partial least squares regression analyses. (A) Cortico-striatal (B) Inter-hemispheric (C) Intra-hemispheric. Brain regions are displayed on a brain mesh. The size and colour of each region indicate the size of its ROI weight (ranked from smallest to largest, 1-6; see colour map).

Figure 4. Dissociation of cortico-striatal and inter/intra-hemispheric gene enrichment in the cortex. (A) ROI weights for the first PLS component of the cross-sectional analysis for inter-hemispheric vs. cortico-striatal. (B) ROI weights for the first PLS component of the longitudinal analysis for inter-hemispheric vs. cortico-striatal. (C) ROI weights for the first PLS component of the cross-sectional analysis for intra-hemispheric vs. cortico-striatal. (D) ROI weights for the first PLS component of the longitudinal analysis for intra-hemispheric vs. cortico-striatal. Each red circle represents a cortical ROI.

Figure 5. Enrichment of genes showing abnormal transcription in Huntington’s disease in the first PLS components of cortico-striatal, inter-hemispheric and intra-hemispheric cross-sectional analyses. The red circle illustrates the mean weight (on the x-axis) for the gene list of interest in the first PLS component. The y-axis represents the number of permutations of random genes from the first PLS component. Gene lists over-expressed in the first PLS component have a mean greater than that of the random permutations (red circle to the right of the permutation distribution).
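The permutation comparison described in the Figure 5 legend can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the one-sided test, the default of 1000 permutations and the toy gene weights are all assumptions introduced here.

```python
import random
from statistics import mean


def permutation_enrichment(weights, gene_list, n_perm=1000, seed=0):
    """Compare the mean PLS1 weight of a gene list with random gene sets.

    weights   -- dict mapping gene name to its weight in the first PLS component
    gene_list -- genes of interest (e.g. genes with abnormal transcription in HD)
    Returns (observed mean, list of permuted means, one-sided p-value).
    """
    rng = random.Random(seed)
    all_genes = list(weights)
    members = [g for g in gene_list if g in weights]
    if not members:
        raise ValueError("no gene in gene_list has a PLS weight")
    observed = mean(weights[g] for g in members)
    # Null distribution: mean weight of random gene sets of the same size.
    null = [
        mean(weights[g] for g in rng.sample(all_genes, len(members)))
        for _ in range(n_perm)
    ]
    # One-sided p-value with a +1 correction (an assumption of this sketch).
    p_value = (1 + sum(m >= observed for m in null)) / (n_perm + 1)
    return observed, null, p_value


# Toy data (hypothetical): 100 genes whose weights equal their index.
toy_weights = {f"gene{i}": float(i) for i in range(100)}
toy_list = [f"gene{i}" for i in range(90, 100)]
observed, null, p_value = permutation_enrichment(toy_weights, toy_list)
```

In the figure, the observed mean corresponds to the red circle and the list of permuted means to the histogram it is compared against.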


Table 1. Cortico-striatal, inter-hemispheric and intra-hemispheric cross-sectional analyses: Gene ontology (GO) terms for biological processes associated with top ranking genes from the first component of the partial least squares (PLS) analysis. The top 5 most significant GO terms are displayed for each analysis. Full tables can be found in Supplementary file 2. Redundant GO terms and those associated with greater than 1000 genes have been excluded. B – total number of genes associated with a specific GO term, n – number of genes in the target set, b – number of genes in the intersection. Enrichment (E) = (b/n) / (B/total number of genes). See (26) for further details.

PLS1 Cortico-striatal Cross-sectional

GO Term     Description                                   P-value   FDR q-value  Enrichment  B    n     b
GO:0050773  regulation of dendrite development            8.05E-07  3.03E-03     2.18        124  3150  48
GO:0050804  modulation of chemical synaptic transmission  1.06E-06  3.19E-03     1.4         297  6419  151
GO:0031344  regulation of cell projection organization    1.88E-06  4.06E-03     1.29        549  6375  255
GO:0044057  regulation of system process                  3.31E-06  4.98E-03     1.33        481  5795  209
GO:0030030  cell projection organization                  4.31E-06  5.41E-03     1.24        699  6498  319

PLS1 Inter-hemispheric Cross-sectional

GO Term     Description                                   P-value   FDR q-value  Enrichment  B    n     b
GO:0050804  modulation of chemical synaptic transmission  1.40E-14  3.51E-11     1.74        297  5246  153
GO:0031344  regulation of cell projection organization    1.99E-13  2.73E-10     1.64        549  3924  199
GO:0043623  cellular protein complex assembly             7.20E-13  8.34E-10     1.65        371  4892  169
GO:0061024  membrane organization                         1.51E-11  1.19E-08     1.45        820  4221  283
GO:0030030  cell projection organization                  6.38E-11  3.85E-08     1.47        699  4281  248

PLS1 Intra-hemispheric Cross-sectional

GO Term     Description                                   P-value   FDR q-value  Enrichment  B    n     b
GO:0016071  mRNA metabolic process                        2.91E-33  1.12E-30     1.84        593  5085  313
GO:0006396  RNA processing                                4.39E-30  1.58E-27     1.65        806  5357  402
GO:0006325  chromatin organization                        1.11E-25  3.73E-23     1.79        657  4364  289
GO:0006397  mRNA processing                               4.33E-21  1.42E-18     1.78        402  5435  219
GO:0019083  viral transcription                           2.79E-20  8.77E-18     3.01        99   4044  68
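As a worked illustration of the enrichment definition in the Table 1 caption, E = (b/n) / (B/N), the short sketch below evaluates it for one tabulated row. The total number of genes N is not stated in this section, so the value used here is a hypothetical placeholder, chosen only so that the example lands close to the tabulated enrichment. The function name is also an assumption.

```python
def go_enrichment(b, n, B, N):
    """Enrichment E = (b/n) / (B/N).

    b -- genes in the intersection of the target set and the GO term
    n -- genes in the target set
    B -- genes associated with the GO term
    N -- total number of genes in the ranked list
    """
    if min(n, B, N) <= 0:
        raise ValueError("n, B and N must be positive")
    return (b / n) / (B / N)


# Hypothetical background size; NOT reported in the table.
N_ASSUMED = 17_700

# Row GO:0019083 (viral transcription): B = 99, n = 4044, b = 68.
enrichment = go_enrichment(b=68, n=4044, B=99, N=N_ASSUMED)
print(round(enrichment, 2))
```

Under this assumed N the result is close to the tabulated value of 3.01 for that row. A different background size would scale every enrichment value by the same factor.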


ACCEPTED MANUSCRIPT McColgan et al.

Supplement

Brain Regions Showing White Matter Loss in Huntington’s Disease Are Enriched for Synaptic and Metabolic Genes


Supplemental Information

Supplemental Methods


Imaging Cohort

Track-On is an extension of the Track-HD (1) study, but with only preHD and control participants carried over (early HD participants from Track-HD were excluded). Informed consent was obtained from each participant, and the study protocol was approved by the local ethics committees. Of the participants included, 31 preHD and 29 controls had participated previously in Track-HD (1). The preHD participants required a disease burden score (DBS) > 250 (2), on the basis of their medical records at the time of assessment. Controls were selected from the spouses or partners of preHD individuals or were gene-negative siblings, to ensure consistency of environments. For this study, we excluded participants who had manifest disease at baseline, were left-handed or ambidextrous, or had poor quality diffusion-weighted imaging (DWI) data, as defined by visual quality control. Therefore only preHD participants who had not yet developed the motor manifestations of HD were included.

MRI Acquisition

Data were acquired on two types of 3T MRI scanner (Philips Achieva at Leiden and Vancouver and Siemens TIM Trio at London and Paris), both using a 12-channel head coil. T1-weighted image volumes were acquired using a 3D MPRAGE acquisition sequence with the following imaging parameters: TR = 2200 ms (Siemens)/7.7 ms (Philips), TE = 2.2 ms (S)/3.5 ms (P), FA = 10° (S)/8° (P), FOV = 28 cm (S)/24 cm (P), matrix size 256x256 (S)/224x224 (P), 208 (S)/164 (P) sagittal slices to cover the entire brain with a slice thickness of 1.0 mm with no gap. Diffusion-weighted images were acquired with 42 unique gradient directions (b = 1000 sec/mm2). Eight images with no diffusion weighting (b = 0 sec/mm2) were acquired from the Siemens scanners and one from the Philips scanners. For the Siemens scanners, TE = 88 ms and TR = 13 s; for the Philips scanners, TE = 56 ms and TR = 11 s. Voxel size was 2 x 2 x 2 mm for the Siemens scanners and 1.96 x 1.96 x 2 mm for the Philips scanners. Seventy-five slices were collected for each diffusion-weighted and non-diffusion-weighted volume. Scanning time was approximately 12 minutes for T1-weighted and 10 minutes for diffusion-weighted acquisitions.

MRI Data Analysis


Structural MRI Data

Cortical and sub-cortical regions of interest (ROIs) were generated by segmenting a T1-weighted image using FreeSurfer (3). These included 70 cortical regions and 4 sub-cortical regions (caudate and putamen bilaterally). We chose to focus on the caudate and putamen sub-cortical structures based on observations from our cross-sectional structural connectivity study (4) and from the earlier Track-HD studies (5, 6), which show that the caudate and putamen are the sub-cortical structures most affected in preHD, both in terms of grey matter volume and white matter connections. While some studies have shown changes in the thalamus, globus pallidus and nucleus accumbens in preHD, these tend to occur in preHD participants closer to disease onset (7, 8). Furthermore, automatic segmentation of the globus pallidus, nucleus accumbens and amygdala is not sufficiently reliable (9).

We chose the Desikan FreeSurfer atlas as this is based on 40 subjects across a range of ages encompassing 4 groups: young adults, middle-aged adults, elderly adults and patients with Alzheimer’s disease. By including subjects with age-related and neurodegeneration-related atrophy, this atlas better accounts for inter-subject variability (3), particularly in the case of our cohort, which contains adults across a range of ages and those with preHD. We have used this atlas extensively in HD, for both cross-sectional and longitudinal connectome analyses (4, 10-12). Atlases with large numbers of ROIs demonstrate less reproducibility (13). While the AAL atlas is commonly used in graph theory studies, it is derived from a single subject who was young and healthy and is therefore not suitable for the cohort investigated here (14).

Data Pre-processing

For the diffusion data the b=0 image was used to generate a brain mask using FSL’s brain extraction tool (15). Eddy current correction was used to align the diffusion-weighted volumes to the first b=0 image and the gradient directions were updated to reflect the changes to the image orientations. Finally, diffusion tensor metrics were calculated and constrained spherical deconvolution (CSD) applied to the data as implemented in MRtrix (16). FreeSurfer Desikan atlas (3) ROIs were warped into diffusion space by mapping between the T1-weighted image and fractional anisotropy (FA) map using NiftyReg (17) and applying the resulting warp to each of the ROIs. A foreground mask was generated by combining FreeSurfer segmentations with the WM mask.

Diffusion Tensor Imaging Data

Diffusion Tractography

Whole brain probabilistic tractography was performed using the iFOD2 algorithm in MRtrix (16). Specifically, five million streamlines were randomly seeded throughout the WM, in all foreground voxels where FA>0.2. Streamlines were terminated when they either reached the cortical or subcortical grey-matter mask or exited the foreground mask. The spherical deconvolution informed filtering of tractograms (SIFT2) algorithm (18) was used to reduce biases. The resulting set of streamlines was used to construct the structural brain network. To demonstrate that our results were robust to varying methodologies, additional cross-sectional analyses were completed using the addition of Gaussian noise to connectomes, FA weighting of connections and the Easy Lausanne scale 60 atlas (110 ROIs) (13), with connectomes undergoing consensus-based thresholding at 75% and 50%. These values were chosen as they have been commonly used in structural connectomics (4, 19, 20).
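The consensus-based thresholding mentioned above is not defined further in this section. The sketch below assumes one common reading: a connection is retained only if it is non-zero in at least a given fraction of subjects (here 75% or 50%). The function name and the toy connectomes are hypothetical.

```python
def consensus_threshold(connectomes, fraction):
    """Keep a connection only if it is non-zero in at least `fraction` of subjects.

    connectomes -- list of square weight matrices (lists of lists), one per subject
    fraction    -- consensus level, e.g. 0.75 or 0.50
    Returns new matrices with non-consensus connections set to 0.0.
    """
    n_subjects = len(connectomes)
    size = len(connectomes[0])
    # keep[i][j] is True when enough subjects have a non-zero weight at (i, j).
    keep = [
        [
            sum(1 for c in connectomes if c[i][j] > 0) >= fraction * n_subjects
            for j in range(size)
        ]
        for i in range(size)
    ]
    return [
        [[c[i][j] if keep[i][j] else 0.0 for j in range(size)] for i in range(size)]
        for c in connectomes
    ]


def toy_connectome(w01, w02, w12):
    """Build a symmetric 3-ROI toy connectome from three edge weights."""
    return [[0.0, w01, w02], [w01, 0.0, w12], [w02, w12, 0.0]]


# Four hypothetical subjects: edge 0-1 present in 3, edge 0-2 in 2, edge 1-2 in 4.
subjects = [
    toy_connectome(1.0, 2.0, 3.0),
    toy_connectome(1.0, 2.0, 3.0),
    toy_connectome(1.0, 0.0, 3.0),
    toy_connectome(0.0, 0.0, 3.0),
]
at_75 = consensus_threshold(subjects, 0.75)
at_50 = consensus_threshold(subjects, 0.50)
```

With these toy data, edge 0-2 survives the 50% threshold but is removed at 75%, which is the kind of sensitivity the robustness analyses are meant to probe.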

Construction of Structural Connectivity Matrices

For structural connectivity matrices, ROIs were defined as connected if a fibre originated in ROI 1 and terminated in ROI 2. Structural connections were weighted by streamline count and a cross-sectional area multiplier as implemented in SIFT2 (18). Probabilistic tractography as implemented in MRtrix3 creates a connectome composed of one upper triangle of a connectivity matrix. This is then copied to the lower triangle to generate a symmetric matrix of 74x74. As there is no consensus in the literature regarding the optimal graph thresholding strategy (21) and results can vary widely based on the chosen approach (22), SIFT2 was our preferred method of bias correction. Indeed, the creators of SIFT2 argue against the use of matrix thresholding as it introduces an arbitrary threshold value (23). SIFT2 was chosen in preference to SIFT as it requires much less processing time and retains the full connectome. SIFT2 utilises information from the FOD to determine a cross-sectional area for each streamline, thereby generating streamline volume estimates between regions (18).

There is currently no consensus in the literature regarding volume normalisation in connectome studies. There is a suggestion that volume normalisation may overcompensate for volume-driven effects on streamline count (24). In keeping with this, in our previous study we analysed both volume normalised and un-normalised connectomes and showed that volume normalisation results in biologically implausible findings, which are likely spurious (4). In a subsequent study using the same data set presented here we performed two complementary tractography approaches: connectomics and voxel connectivity profiles (VCPs) (10). Volume normalisation was performed in the VCP analyses as the tractography is performed at the voxel level. Results between the two approaches were consistent, suggesting the limited amount of brain atrophy seen in preHD has a minimal effect on tractography. Previous work by our group has demonstrated low within-subject variability of diffusion metrics in manifest HD participants, suggesting atrophy does not cause significant distortion of the diffusion signal (25). Thus the more limited atrophy seen in preHD is unlikely to introduce systematic differences in connectome construction.
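The step of copying the upper triangle of the connectivity matrix to the lower triangle can be illustrated with a minimal Python sketch. This is not the MRtrix3 implementation; the function name `symmetrise_upper` and the toy 3x3 weights are hypothetical, and a small matrix stands in for the 74x74 connectome.

```python
def symmetrise_upper(upper):
    """Return a symmetric copy of a square matrix given its upper triangle.

    `upper` is a list of rows; entries below the diagonal are ignored.
    """
    n = len(upper)
    full = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            full[i][j] = upper[i][j]
            full[j][i] = upper[i][j]  # mirror across the diagonal
    return full


# Toy 3x3 connectome with made-up weights (the study used 74x74).
toy = [
    [0.0, 2.5, 1.0],
    [0.0, 0.0, 4.0],
    [0.0, 0.0, 0.0],
]
sym = symmetrise_upper(toy)
```

After this step `sym[i][j]` equals `sym[j][i]` for every pair of ROIs, so row sums and column sums give the same regional connection strength.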

Regional White Matter Atrophy

For each cortical brain region, connection strength was defined as either the sum of cortico-striatal connection weights, the sum of connection weights from regions in the opposite hemisphere (inter-hemispheric) or the sum of connection weights from regions in the same hemisphere (intra-hemispheric). Rate of change in connection strength over 24 months was defined in the same way. PreHD values were normalised relative to controls using a Z-score. These were then transformed to give positive atrophy and rate of atrophy measures, where higher scores represent greater connection atrophy. The atrophy score was used in the cross-sectional analysis, while the rate of atrophy score was used in the longitudinal analysis.

Cross-sectional Analysis

For the cross-sectional analysis a Z-score was calculated as follows:

$$Z_i^k = \frac{C_i^k - \mu(C_i^h)}{\sigma(C_i^h)}$$

where i indexes the regional connection, k is preHD, h is healthy controls, C is connection strength, μ is mean and σ is standard deviation. This was then transformed to produce atrophy measures between -1 and 1, where positive measures represent greatest atrophy, using the following equation:

$$\tilde{Z}_i^k = \tanh(-Z_i^k)$$

This resulted in a transformed Z-score for each cortical region for each preHD participant for cortico-striatal, inter-hemispheric and intra-hemispheric connections. An average was then calculated across the preHD group, resulting in a single transformed Z-score for each cortical region.
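The normalisation and tanh transform described above can be sketched in Python for a single cortical region. This is an illustration under stated assumptions, not the study's MATLAB code: the sign is flipped before the tanh so that lower connection strength in preHD gives a positive score, the sample standard deviation is used, and all input values are made up.

```python
import math
import statistics


def atrophy_scores(prehd_strengths, control_strengths):
    """Transformed Z-scores for one region: tanh(-(C_k - mean_h) / sd_h)."""
    mu = statistics.mean(control_strengths)
    sd = statistics.stdev(control_strengths)  # sample SD; an assumption
    # Sign flip (assumed): weaker connections than controls -> positive score.
    return [math.tanh(-(c - mu) / sd) for c in prehd_strengths]


controls = [10.0, 12.0, 11.0, 13.0, 9.0]  # made-up control strengths
prehd = [8.0, 11.0, 14.0]                 # made-up preHD strengths
scores = atrophy_scores(prehd, controls)
regional_atrophy = statistics.mean(scores)  # group average for the region
```

The tanh bounds every score between -1 and 1, which limits the influence of extreme Z-scores on the group average.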

Longitudinal Analysis

For each preHD participant and for each connection, a least squares line was fitted over the regional connection strengths across time points and the rate of connection atrophy was defined as the gradient of the least squares line. A Z-score was then calculated using the following equation:

$$Z_i^k = \frac{R_i^k - \mu(R_i^h)}{\sigma(R_i^h)}$$

where R is the rate of change of connection strength. This was then transformed to produce rate of atrophy measures between -1 and 1, using the following equation:

$$\tilde{Z}_i^k = \tanh(-Z_i^k)$$

This resulted in a transformed Z-score of rate of regional atrophy for cortico-striatal, inter-hemispheric and intra-hemispheric connections for each preHD participant. An average was then calculated across the preHD group, resulting in a single transformed Z-score for each cortical region.
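The gradient of the least squares line used as the rate of change can be computed as in the following Python sketch. The visit times (0, 12 and 24 months) and the connection strengths are assumed values for illustration, and `least_squares_slope` is a hypothetical helper name.

```python
def least_squares_slope(times, values):
    """Gradient of the ordinary least squares line through (times, values)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


months = [0, 12, 24]            # assumed visit times
strength = [100.0, 96.0, 92.0]  # made-up regional connection strengths
rate = least_squares_slope(months, strength)  # change in strength per month
```

A negative `rate` indicates declining connection strength; the per-participant rates can then be normalised against the control rates in the same way as the cross-sectional strengths.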

Mapping Gene Expression Data to MRI Space

Gene expression microarray data were obtained from the Allen Human Brain Atlas (26). This atlas is based on data from 6 post-mortem human brains with no known neuropsychiatric or neuropathological history (H0351.2001, H0351.2002, H0351.1009, H0351.1012, H0351.1015, H0351.1016). Five donors were male and one was female, with a mean age of 42.5 years. Three were Caucasian, two were African-American and one was Hispanic. These data are freely available to download from AIBS (http://human.brain-map.org/static/download). Maybrain software (https://github.com/rittman/maybrain) was used to match centroids of MRI regions to the closest AIBS region. The nearest gene expression profile to the ROI coordinates was used as the expression profile for that ROI. Therefore, for each ROI only one tissue sample was used from the AIBS atlas. The sample coverage for the AIBS atlas varied from 255-291 cortical samples for the 4 participants with data from one hemisphere. For the 2 participants with data from both hemispheres, one had 412 samples and the other 528. Probes that did not match to gene symbols in the AIBS data were excluded, resulting in 20,737 genes included in the analysis. Expression data were then averaged across all samples from all donors. Data were also averaged across both hemispheres as two donors had data for both hemispheres, while four only had data for the left hemisphere. The maximum standard deviation across subjects for each gene probe in each brain region ranged from 0.1 to 4.6 (see Supplemental File 8). To account for this variability, the mean and range of expression values for each brain region were calculated and regions were excluded if they had values greater than two standard deviations from either the mean or range. This resulted in the exclusion of two brain regions (right pars orbitalis and right rostral middle frontal), leaving a total of 68 cortical ROIs included in the analysis. Expression data were then normalised by calculating the Z-score across the 68 FreeSurfer regions. Similar approaches to those outlined above have been used when matching AIBS data to MRI atlases in other studies (27-29). Genetic data from outlier regions are likely to be unreliable. While it is difficult to pinpoint the exact reason for outlier regions in these analyses, it may be that outlier regions represent sub-optimal matching between the AIBS and MRI atlases.
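The idea of assigning each ROI the expression profile of its nearest tissue sample can be sketched as a nearest-neighbour search on coordinates. This is not the Maybrain API; the function `nearest_sample`, the region names and all coordinates are hypothetical, and Euclidean distance in a shared coordinate space is assumed.

```python
import math


def nearest_sample(roi_centroid, sample_coords):
    """Index of the tissue sample closest (Euclidean) to an ROI centroid."""
    return min(range(len(sample_coords)),
               key=lambda i: math.dist(roi_centroid, sample_coords[i]))


# Hypothetical coordinates in a shared space (mm); not real AIBS data.
samples = [(-40.0, 20.0, 30.0), (10.0, -60.0, 5.0), (35.0, 15.0, 25.0)]
roi_centroids = {
    "L.toy_region": (-38.0, 22.0, 28.0),
    "R.toy_region": (33.0, 10.0, 27.0),
}
matches = {name: nearest_sample(c, samples)
           for name, c in roi_centroids.items()}
```

Because each ROI receives exactly one sample, a poorly placed centroid can pick up an unrepresentative sample, which is one way sub-optimal matching between atlases could arise.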

To investigate how robust results were to different combinations of AIBS participants, we also performed cortico-striatal, inter-hemispheric and intra-hemispheric cross-sectional analyses using a leave-one-out approach. Average gene expression was calculated for 5 participants, leaving one participant out in turn. A leave-one-out approach has been used in a previous study investigating regional gene expression and functional connectivity using the Allen Institute of Brain Science human transcriptome atlas (28). We also repeated the cross-sectional analyses using permutations of 3 out of 6 AIBS brain samples, resulting in a total of 8 permutations.
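The leave-one-out averaging can be sketched as follows for a single gene in a single ROI. The donor IDs are those listed for the atlas, but the expression values are made up and the helper name `leave_one_out_means` is hypothetical.

```python
def leave_one_out_means(donor_values):
    """Mean over donors with each donor left out in turn.

    `donor_values` maps donor ID -> expression value for one gene in one ROI.
    """
    total = sum(donor_values.values())
    n = len(donor_values)
    return {left_out: (total - value) / (n - 1)
            for left_out, value in donor_values.items()}


# Donor IDs from the atlas; the expression values are invented.
expr = {
    "H0351.2001": 1.0, "H0351.2002": 2.0, "H0351.1009": 3.0,
    "H0351.1012": 4.0, "H0351.1015": 5.0, "H0351.1016": 6.0,
}
loo = leave_one_out_means(expr)  # six averages, each over five donors
```

Each of the six averaged expression sets would then be passed through the same downstream analysis to check whether any single donor drives the result.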

Statistical Analysis

All statistical analysis was performed in MATLAB v8.3. Partial least squares regression was used to investigate the association between the gene transcriptome of the healthy brain and WM connectivity loss in preHD both cross-sectionally and longitudinally. Code used to perform this analysis was adapted from Whitaker et al. (29). The original code is freely available (https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016).

Partial least squares regression is a multivariate technique used to identify associations between response and predictor variables. In our case the predictor variable was a 20,737 gene x 68 ROI matrix, as outlined above. For the cortico-striatal analysis the MRI response variable was a 4 x 68 matrix of left and right caudate and putamen WM connectivity loss (preHD relative to controls) to 68 cortical ROIs. This was performed for both white matter atrophy (cross-sectional) and rate of white matter atrophy (longitudinal). For the inter-hemispheric analysis the MRI response variable was a vector of 1 x 68, representing WM inter-hemispheric connectivity loss for each cortical ROI. Similarly, for the intra-hemispheric analysis the MRI response variable was a vector of 1 x 68, representing WM intra-hemispheric connectivity loss for each cortical ROI. For a cortical region, inter-hemispheric connectivity was calculated as the sum of streamline volumes between that region and regions in the opposite hemisphere. Similarly, intra-hemispheric connectivity was defined as the sum of streamline volumes between that region and regions in the same hemisphere. Atrophy scores were then calculated using Z-scores and the tanh transform as described above.

As the greatest amount of variance was explained by the first PLS component, genes were ranked based on their contribution to this component. The error in estimating the weight of each gene was assessed by bootstrapping, and the ratio of the weight of each gene to its bootstrap standard deviation was used to rank the genes in descending order based on their contribution to the first component.
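The ranking of genes by the ratio of their first-component weight to its bootstrap standard deviation can be sketched in plain Python. This is a simplified illustration and not the adapted MATLAB code: it covers only the single-response case (as in the inter- and intra-hemispheric analyses), where the first PLS weight vector is proportional to X'y on mean-centred data. The function names, the 200 bootstrap draws, the resampling of ROIs and the toy data (8 ROIs, 3 genes) are all assumptions.

```python
import random
import statistics


def pls1_first_weights(X, y):
    """First-component PLS weights for a single response.

    X: list of observations (ROIs), each a list of predictor values (genes).
    y: one response per observation. The weights are X'y on mean-centred
    data, scaled to unit length (all zeros if the data are degenerate).
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    w = [sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    return [v / norm for v in w]


def bootstrap_ratios(X, y, n_boot=200, seed=0):
    """Weight of each gene divided by its bootstrap standard deviation."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    weights = pls1_first_weights(X, y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample ROIs
        draws.append(pls1_first_weights([X[i] for i in idx],
                                        [y[i] for i in idx]))
    sds = [statistics.stdev([d[j] for d in draws]) for j in range(p)]
    return [weights[j] / sds[j] for j in range(p)]


# Toy data: gene 0 tracks y, gene 1 opposes y, gene 2 is noise.
y = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
X = [[0.01, 0.02, 0.03], [0.19, -0.21, -0.02], [0.42, -0.39, 0.01],
     [0.58, -0.61, -0.04], [0.81, -0.78, 0.02], [1.02, -1.01, 0.00],
     [1.18, -1.22, -0.01], [1.41, -1.39, 0.03]]
weights = pls1_first_weights(X, y)
ratios = bootstrap_ratios(X, y)
ranked = sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)
```

In this toy example the gene that tracks the response ranks first and the gene that opposes it ranks last, which mirrors how a descending ranked list places positively and negatively weighted genes at opposite ends.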

Random permutations of the gene predictor variable were also investigated to ensure results were not due to chance. To do this, the randperm function in MATLAB was used to randomly reorder the predictor variable both in terms of genes and ROIs. Cross-sectional analyses were then re-run using the resulting predictor variables.

Partial least squares regression (PLS) is well suited for high dimensional data as it combines principal component analysis (for dimension reduction) with linear regression. It is also well suited to the case when the number of predictor variables far exceeds the number of observations, which is exactly the scenario here, with 20,737 gene expression profiles (predictor variables) and 68 brain regions (observations). In comparison, other multivariate methods such as canonical variate analysis (CVA) or linear discriminant analysis (LDA) require around 4-8 times more observations than predictor variables. Boulesteix et al. (30) have previously shown the utility of this approach in high dimensional datasets, for example in tumor classification from transcriptome data, identification of relevant genes, survival analysis and modeling of gene networks and transcription factor activities. Several previous studies have used PLS for the large gene expression datasets from the Allen Institute of Brain Science (AIBS) mouse and human brain transcriptome atlases (28, 29, 31).

Gene Ontology Enrichment Analysis

We used the gene ontology enrichment analysis and visualisation tool (GOrilla) (http://cbl-gorilla.cs.technion.ac.il) (32) to identify GO terms that were significantly enriched in the target gene list, based on the first PLS component. GOrilla GO terms are updated weekly. The target gene list is defined by finding the optimal hypergeometric tail probability over all possible partitions induced by gene ranking (see (32) for further details). Significance of a GO term is determined based on the rank of genes associated with that GO term and a false discovery rate (FDR) correction for multiple comparisons. This was performed for the first PLS component for the cortico-striatal, inter-hemispheric and intra-hemispheric analyses, both cross-sectionally and longitudinally. We also removed general GO terms by excluding those with greater than 1000 genes in their classification, in keeping with other studies in the literature (28, 29). This allowed us to focus on specific gene sets as opposed to GO terms encompassing thousands of genes covering a range of processes. The reduce and visualize gene ontology tool REViGO (33) (http://revigo.irb.hr) was then used to summarise significant GO terms by removing redundant terms.



Overlap Between Gene Profiles and Huntington’s Disease Related Genes

To investigate similarities between gene profiles in each analysis we identified which genes overlap in the top ranked 7,000 genes (based on target gene lists from top GO terms) from the cross-sectional cortico-striatal analysis and the intra-hemispheric analysis. We also assessed whether this overlap was greater than expected by chance using a hypergeometric distribution as implemented in https://github.com/brentp/bio-playground/blob/master/utils/list_overlap_p.py. Gene ontology enrichment analysis was also repeated with overlap genes removed to assess whether this affected the resulting GO terms.
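The hypergeometric test for list overlap can be sketched with exact integer arithmetic in Python. This is an independent illustration and does not reproduce the interface of the linked script; the function name `overlap_p_value` and the small toy numbers are assumptions.

```python
from math import comb


def overlap_p_value(total, size_a, size_b, overlap):
    """P(overlap >= observed) when two gene lists are drawn at random.

    Hypergeometric upper tail: list A marks `size_a` of `total` genes and
    list B is a random draw of `size_b` genes from the same background.
    """
    upper = min(size_a, size_b)
    hits = sum(comb(size_a, k) * comb(total - size_a, size_b - k)
               for k in range(overlap, upper + 1))
    return hits / comb(total, size_b)


# Toy example: 20 background genes, two lists of 5, sharing 3 genes.
p = overlap_p_value(total=20, size_a=5, size_b=5, overlap=3)
```

For the analysis described above, `total` would correspond to the 20,737 genes and both list sizes to 7,000, with `overlap` set to the number of shared genes.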

To further assess the relationship between gene ontologies we investigated the overlap between genes in the top gene ontology terms across analyses: “modulation of chemical synaptic transmission” and “mRNA metabolic process”. Finally we investigated the overlap between top gene ontology terms and HD related genes. Gene lists for HD related genes for

both the striatum and cortex were obtained from (34).

Cortical Regional Enrichment

We used ROI weights from the PLS analysis to assess which cortical regions were enriched for genes in the first PLS component for the cortico-striatal, inter-hemispheric and intra-hemispheric analyses. ROI weights were plotted for each analysis using BrainNet Viewer (35).

Enrichment for Huntington’s Disease Related Genes

We also investigated whether genes showing abnormal transcription in human and animal models of HD were enriched greater than chance in the first PLS components of the cortico-striatal, inter-hemispheric and intra-hemispheric analyses. Gene lists were obtained from (34). These included 515 genes in the striatum and 25 in the cortex.

Gene lists for the striatum include the 6-month allelic series striatum from Langfelder et al. (34) and the human caudate nucleus (CN) data sets by Durrenberger et al. (36) and Hodges et al. (37). Each striatal gene satisfies the following criteria: FDR<0.05 in the allelic series striatum, FDR<0.1 in each of the human data sets, and same sign of fold change across all 3 data sets. For the cortex the gene lists include the allelic series 6-month cortex, Brodmann area (BA) 4 and BA9 data by Hodges et al. (37), and prefrontal cortex (PFC) and visual cortex (VC) data from the Harvard Brain Tissue Resource Centre (38). Each cortical gene satisfies the following criteria: FDR<0.05 in the allelic series cortex, FDR<0.1 in at least 3 of the 4 human data sets, and same sign of fold change in the allelic series cortex and at least 3 of the 4 human data sets. Genes in the Langfelder lists not included in the AIBS gene set were excluded; this resulted in the exclusion of 28 striatum genes.

The mean PLS weight of candidate gene sets was compared against the mean PLS weight of 1000 random permutations of genes. A p-value was calculated based on the number of times in 1000 that the random gene list showed a higher mean rank than the candidate gene list. We also investigated whether HD related genes were more strongly enriched in these gene lists than other biologically plausible gene sets, chosen at random. In order to do this, gene sets from known gene ontologies were downloaded from the Molecular Signatures Database (MSigDB) (http://software.broadinstitute.org/gsea/msigdb/). A p-value was calculated based on the number of times that the MSigDB gene list showed a higher mean rank than the candidate gene list. This was performed for the 515 striatum HD genes and MSigDB gene lists truncated at 515 (306 lists in total). In order to investigate smaller alternative gene sets, the top 25 striatum HD genes were also compared with MSigDB gene lists truncated at 25 (3,633 lists in total).
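The comparison of a candidate gene set against random gene sets of the same size can be sketched as a simple permutation test. This is an illustration under assumptions: it compares mean PLS weights (not ranks), draws random sets without replacement from all genes, and uses invented gene names and weights; `permutation_p_value` is a hypothetical helper.

```python
import random


def permutation_p_value(weights, gene_set, n_perm=1000, seed=0):
    """Fraction of random gene sets whose mean weight exceeds the candidate's.

    weights: dict gene -> PLS weight; gene_set: candidate genes in `weights`.
    """
    rng = random.Random(seed)
    genes = list(weights)
    k = len(gene_set)
    observed = sum(weights[g] for g in gene_set) / k
    higher = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)  # random set of the same size
        if sum(weights[g] for g in sample) / k > observed:
            higher += 1
    return higher / n_perm


# Invented weights: gene "g0" has weight 0, "g1" has weight 1, and so on.
toy_weights = {f"g{i}": float(i) for i in range(100)}
p_top = permutation_p_value(toy_weights, [f"g{i}" for i in range(95, 100)])
p_bottom = permutation_p_value(toy_weights, [f"g{i}" for i in range(5)])
```

A small p-value indicates that few random gene sets have a higher mean weight than the candidate set, as for the top-weighted toy genes here.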



To further investigate the relationship between changes in gene expression in HD relative to controls and cortico-striatal WM loss, we performed correlations between the log2 fold change in the Hodges (37), Durrenberger (36) and Langfelder (34) studies for the 515 striatum gene set and the PLS weights from the cross-sectional cortico-striatal analysis.

Enrichment for Alternative Gene Sets

Enrichment of the PLS components of the cortico-striatal, inter-hemispheric and intra-hemispheric analyses was also tested for a range of other gene sets. We included a set of human supragranular genes (n = 19) as these have been implicated in long-range connectivity (39) and we have previously shown cortico-striatal connections to have the longest topological length of the white matter connection subtypes investigated here (10). Genes specific to oligodendrocytes (n = 94) (40) were also included to investigate whether white matter loss may be driven by axonal or myelination dysfunction. Finally, genes involved in cell cycle metabolism (n = 252) (http://www.bmrb.wisc.edu/data_library/Genes/Metabolic_Pathways/Cell_cycle.html) were included as mutant huntingtin has been shown to cause cell cycle abnormalities (41).



Supplemental Figures

Figure S1. Cortico-striatal longitudinal analysis semantic similarity scatter plot: Significant gene ontology (GO) terms for biological processes associated with the first component of the partial least squares (PLS) analysis are plotted in semantic space, where similar terms are clustered together. The top 5 most significant GO terms are labelled for each analysis. Redundant GO terms and those associated with greater than 1000 genes have been excluded. Markers are scaled based on the log10 q-value for the significance of each GO term. Large blue circles are highly significant, while red circles are less significant (see colour bar).



Figure S2. Inter-hemispheric longitudinal analysis semantic similarity scatter plot: Significant gene ontology (GO) terms for biological processes associated with the first component of the partial least squares (PLS) analysis are plotted in semantic space, where similar terms are clustered together. The top 5 most significant GO terms are labelled for each analysis. Redundant GO terms and those associated with greater than 1000 genes have been excluded. Markers are scaled based on the log10 q-value for the significance of each GO term. Large blue circles are highly significant, while red circles are less significant (see colour bar).



Figure S3. Intra-hemispheric longitudinal analysis semantic similarity scatter plot: Significant gene ontology (GO) terms for biological processes associated with the first component of the partial least squares (PLS) analysis are plotted in semantic space, where similar terms are clustered together. The top 5 most significant GO terms are labelled for each analysis. Redundant GO terms and those associated with greater than 1000 genes have been excluded. Markers are scaled based on the log10 q-value for the significance of each GO term. Large blue circles are highly significant, while red circles are less significant (see colour bar).



Figure S4. Enrichment of top 25 striatum genes showing abnormal transcription in Huntington’s disease (as defined by lowest Hodges q-value) in the first PLS components of cortico-striatal cross-sectional analyses. Red circle illustrates the mean weight (on the x-axis) for the gene list of interest in the first PLS component. The y-axis represents the number of permutations of random genes from the first PLS component. Gene lists over-expressed in the first PLS component have a mean greater than that of the random permutations (red circle to the right of the permutation distribution).



Figure S5. Random permutation ROI weights for cross-sectional partial least squares regression analyses. (a) Cortico-striatal (b) Inter-hemispheric (c) Intra-hemispheric. Brain regions displayed on brain mesh. Size and colour of region indicates size of ROI weight (ranked from smallest-largest, 1-6). See colour map.



Figure S6. Correlation between PLS1 cortico-striatal weights and log2 fold change in human HD (Hodges and Durrenberger) and animal HD model (Langfelder) studies. The red line represents a least squares regression line, rho = correlation coefficient, p = p-value and df = degrees of freedom.



Table S1. Cortico-striatal, inter-hemispheric and intra-hemispheric longitudinal analysis: Gene ontology (GO) terms for biological processes associated with top ranking genes from the first component of the partial least squares (PLS) analysis. The top 5 most significant GO terms are displayed for each analysis. Full tables can be found in supplementary file 2. Redundant GO terms and those associated with greater than 1000 genes have been excluded. B – total number of genes associated with a specific GO term, n – number of genes in target set, b – number of genes in the intersection. Enrichment (E) = (b/n) / (B/total number of genes). See (32) for further details.

PLS1 Cortico-striatal Longitudinal

| GO Term | Description | P-value | FDR q-value | Enrichment | B | n | b |
|---|---|---|---|---|---|---|---|
| GO:0016071 | mRNA metabolic process | 3.51E-33 | 1.35E-30 | 1.81 | 593 | 5324 | 322 |
| GO:0006396 | RNA processing | 7.90E-29 | 2.77E-26 | 1.64 | 806 | 5324 | 397 |
| GO:0019083 | viral transcription | 3.28E-28 | 1.12E-25 | 2.86 | 99 | 5251 | 84 |
| GO:0006325 | chromatin organization | 9.85E-26 | 3.22E-23 | 1.77 | 657 | 4520 | 296 |
| GO:0000184 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 2.16E-19 | 6.78E-17 | 2.53 | 102 | 5298 | 77 |

PLS1 Inter-hemispheric Longitudinal

| GO Term | Description | P-value | FDR q-value | Enrichment | B | n | b |
|---|---|---|---|---|---|---|---|
| GO:0016071 | mRNA metabolic process | 3.98E-16 | 1.67E-13 | 1.48 | 593 | 6539 | 323 |
| GO:0006325 | chromatin organization | 4.09E-14 | 1.47E-11 | 1.4 | 657 | 6476 | 337 |
| GO:0006396 | RNA processing | 7.78E-14 | 2.66E-11 | 1.36 | 806 | 6410 | 397 |
| GO:0006397 | mRNA processing | 6.16E-12 | 2.02E-09 | 1.49 | 402 | 6323 | 213 |
| GO:0016569 | covalent chromatin modification | 4.62E-09 | 1.36E-06 | 1.38 | 455 | 6476 | 230 |

PLS1 Intra-hemispheric Longitudinal

| GO Term | Description | P-value | FDR q-value | Enrichment | B | n | b |
|---|---|---|---|---|---|---|---|
| GO:0022904 | respiratory electron transport chain | 2.29E-13 | 1.15E-09 | 2.71 | 92 | 3917 | 55 |
| GO:0050804 | modulation of chemical synaptic transmission | 2.95E-11 | 6.35E-08 | 1.84 | 297 | 3658 | 113 |
| GO:0006091 | generation of precursor metabolites and energy | 1.42E-10 | 2.38E-07 | 1.64 | 263 | 5414 | 132 |
| GO:0009117 | nucleotide metabolic process | 4.97E-10 | 7.48E-07 | 1.88 | 418 | 2364 | 105 |
| GO:0070271 | protein complex biogenesis | 9.42E-10 | 1.01E-06 | 2.46 | 81 | 4186 | 47 |
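The enrichment formula given in the Table S1 caption can be written as a one-line function. In the sketch below, b, n and B are taken from the "mRNA metabolic process" row of the cortico-striatal table, whereas the background size N is a hypothetical value chosen for illustration; the source does not report the total number of genes used by GOrilla.

```python
def enrichment(b, n, B, N):
    """Enrichment as defined in the caption: (b / n) / (B / N)."""
    return (b / n) / (B / N)


# b, n, B from the table row; N = 17740 is an assumed background size,
# not a value reported in the source.
e = enrichment(b=322, n=5324, B=593, N=17740)
```

With this assumed background the result is close to the tabulated enrichment of 1.81, and an enrichment above 1 means the GO term is over-represented in the target set relative to the background.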



Table S2. ROI weights from first PLS components. CS – cortico-striatal (basal ganglia), IH – inter-hemispheric, IA – intra-hemispheric, cross – cross-sectional, long – longitudinal. Weights are ordered by the cortico-striatal cross-sectional analysis, decreasing from strongest to weakest.

| Region | CS cross | IH cross | IA cross | CS long | IH long | IA long |
|---|---|---|---|---|---|---|
| R.inferiorparietal | 0.22034 | 0.028159 | -0.077541 | -0.11582 | -0.068989 | 0.15782 |
| R.precentral | 0.20856 | 0.047965 | -0.088025 | -0.12867 | -0.059647 | 0.17176 |
| R.superiorparietal | 0.20856 | 0.047965 | -0.088025 | -0.12867 | -0.059647 | 0.17176 |
| L.cuneus | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.inferiorparietal | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.isthmuscingulate | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.lateraloccipital | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.paracentral | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.pericalcarine | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.posteriorcingulate | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.precuneus | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.superiorparietal | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| L.supramarginal | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| R.isthmuscingulate | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| R.paracentral | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| R.posteriorcingulate | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| R.precuneus | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| R.supramarginal | 0.15725 | 0.10562 | -0.12229 | -0.12238 | -0.10801 | 0.13927 |
| R.caudalmiddlefrontal | 0.14542 | 0.10192 | -0.093706 | -0.090327 | -0.16404 | 0.11031 |
| R.postcentral | 0.14542 | 0.10192 | -0.093706 | -0.090327 | -0.16404 | 0.11031 |
| R.cuneus | 0.12658 | 0.097045 | -0.10093 | -0.1211 | -0.078746 | 0.10593 |
| L.postcentral | 0.11425 | 0.099365 | -0.10889 | -0.12017 | -0.094212 | 0.1137 |
| L.caudalmiddlefrontal | 0.054748 | 0.11976 | -0.11586 | -0.10097 | -0.12061 | 0.097573 |
| L.transversetemporal | 0.030688 | 0.11459 | -0.10071 | -0.10298 | -0.055668 | 0.067514 |
| R.caudalanteriorcingulate | 0.030688 | 0.11459 | -0.10071 | -0.10298 | -0.055668 | 0.067514 |
| L.precentral | 0.0041809 | 0.1269 | -0.10137 | -0.089321 | -0.092781 | 0.062488 |
| R.superiorfrontal | 0.0041809 | 0.1269 | -0.10137 | -0.089321 | -0.092781 | 0.062488 |
| L.caudalanteriorcingulate | 0.00108 | 0.11641 | -0.10077 | -0.099388 | -0.054861 | 0.055352 |
| L.parsopercularis | 0.00108 | 0.11641 | -0.10077 | -0.099388 | -0.054861 | 0.055352 |
| L.superiortemporal | 0.00108 | 0.11641 | -0.10077 | -0.099388 | -0.054861 | 0.055352 |
| L.insula | 0.00108 | 0.11641 | -0.10077 | -0.099388 | -0.054861 | 0.055352 |
| L.parsorbitalis | -0.013671 | 0.13099 | -0.10188 | -0.083891 | -0.11266 | 0.051551 |
| L.parstriangularis | -0.013671 | 0.13099 | -0.10188 | -0.083891 | -0.11266 | 0.051551 |
| L.rostralmiddlefrontal | -0.026883 | 0.1453 | -0.118 | -0.095567 | -0.12 | 0.061948 |
| L.superiorfrontal | -0.026883 | 0.1453 | -0.118 | -0.095567 | -0.12 | 0.061948 |
| L.frontalpole | -0.026883 | 0.1453 | -0.118 | -0.095567 | -0.12 | 0.061948 |
| R.frontalpole | -0.026883 | 0.1453 | -0.118 | -0.095567 | -0.12 | 0.061948 |
| L.entorhinal | -0.077019 | -0.16519 | 0.12322 | 0.10631 | 0.20821 | -0.10852 |
| L.medialorbitofrontal | -0.077019 | -0.16519 | 0.12322 | 0.10631 | 0.20821 | -0.10852 |
| L.temporalpole | -0.077019 | -0.16519 | 0.12322 | 0.10631 | 0.20821 | -0.10852 |
| R.entorhinal | -0.077019 | -0.16519 | 0.12322 | 0.10631 | 0.20821 | -0.10852 |
| R.lateralorbitofrontal | -0.077019 | -0.16519 | 0.12322 | 0.10631 | 0.20821 | -0.10852 |
| R.medialorbitofrontal | -0.077019 | -0.16519 | 0.12322 | 0.10631 | 0.20821 | -0.10852 |
| R.temporalpole | -0.077019 | -0.16519 | 0.12322 | 0.10631 | 0.20821 | -0.10852 |
| L.fusiform | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| L.inferiortemporal | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| L.lateralorbitofrontal | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| L.lingual | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| L.middletemporal | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| L.parahippocampal | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| L.rostralanteriorcingulate | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| L.hippocampus | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| R.hippocampus | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| R.parahippocampal | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| R.rostralanteriorcingulate | -0.11678 | -0.11347 | 0.12528 | 0.14275 | 0.07509 | -0.13103 |
| R.fusiform | -0.12353 | -0.094844 | 0.14257 | 0.16319 | -0.0012139 | -0.15301 |
| R.lateraloccipital | -0.12353 | -0.094844 | 0.14257 | 0.16319 | -0.0012139 | -0.15301 |
| R.lingual | -0.12353 | -0.094844 | 0.14257 | 0.16319 | -0.0012139 | -0.15301 |
| R.pericalcarine | -0.12353 | -0.094844 | 0.14257 | 0.16319 | -0.0012139 | -0.15301 |
| R.transversetemporal | -0.12353 | -0.094844 | 0.14257 | 0.16319 | -0.0012139 | -0.15301 |
| L.bankssts | -0.12417 | -0.11555 | 0.12622 | 0.12119 | 0.089642 | -0.11748 |
| R.parstriangularis | -0.12699 | -0.1361 | 0.096136 | 0.094345 | 0.14245 | -0.058084 |
| R.bankssts | -0.13821 | -0.14829 | 0.15138 | 0.11965 | 0.19037 | -0.13649 |
| R.inferiortemporal | -0.13821 | -0.14829 | 0.15138 | 0.11965 | 0.19037 | -0.13649 |
| R.middletemporal | -0.13821 | -0.14829 | 0.15138 | 0.11965 | 0.19037 | -0.13649 |
| R.parsopercularis | -0.13821 | -0.14829 | 0.15138 | 0.11965 | 0.19037 | -0.13649 |
| R.superiortemporal | -0.13821 | -0.14829 | 0.15138 | 0.11965 | 0.19037 | -0.13649 |
| R.insula | -0.13821 | -0.14829 | 0.15138 | 0.11965 | 0.19037 | -0.13649 |

Supplemental References Tabrizi SJ, Langbehn DR, Leavitt BR, Roos RA, Durr A, Craufurd D, et al. (2009): Biological and clinical manifestations of Huntington's disease in the longitudinal TRACK-HD study: cross-sectional analysis of baseline data. Lancet Neurol. 8:791-801.

2.

Penney JB, Jr., Vonsattel JP, MacDonald ME, Gusella JF, Myers RH (1997): CAG repeat number governs the development rate of pathology in Huntington's disease. Ann Neurol. 41:689-692.

3.

Desikan RS, Segonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, et al. (2006): An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 31:968-980.

4.

McColgan P, Seunarine KK, Razi A, Cole JH, Gregory S, Durr A, et al. (2015): Selective vulnerability of Rich Club brain regions is an organizational principle of structural connectivity loss in Huntington's disease. Brain. 138:3327-3344.

5.

Tabrizi SJ, Scahill RI, Durr A, Roos RA, Leavitt BR, Jones R, et al. (2011): Biological and clinical changes in premanifest and early stage Huntington's disease in the TRACKHD study: the 12-month longitudinal analysis. Lancet Neurol. 10:31-42.

6.

Tabrizi SJ, Reilmann R, Roos RA, Durr A, Leavitt B, Owen G, et al. (2012): Potential endpoints for clinical trials in premanifest and early Huntington's disease in the TRACKHD study: analysis of 24 month observational data. Lancet Neurol. 11:42-53.

7.

Faria AV, Ratnanather JT, Tward DJ, Lee DS, van den Noort F, Wu D, et al. (2016): Linking white matter and deep gray matter alterations in premanifest Huntington disease. Neuroimage Clin. 11:450-460.

8.

van den Bogaard SJ, Dumas EM, Acharya TP, Johnson H, Langbehn DR, Scahill RI, et al. (2011): Early atrophy of pallidum and accumbens nucleus in Huntington's disease. J Neurol. 258:412-420.

9.

Hibar DP, Stein JL, Renteria ME, Arias-Vasquez A, Desrivieres S, Jahanshad N, et al. (2015): Common genetic variants influence human subcortical brain structures. Nature. 520:224-229.

EP

TE D

M AN U

SC

RI PT

1.

10. McColgan P, Seunarine KK, Gregory S, Razi A, Papoutsi M, Long JD, et al. (2017): Topological length of white matter connections predicts their rate of atrophy in premanifest Huntington's disease. JCI Insight. 2.

AC C

11. McColgan P, Razi A, Gregory S, Seunarine KK, Durr A, R ACR, et al. (2017): Structural and functional brain network correlates of depressive symptoms in premanifest Huntington's disease. Hum Brain Mapp. 38:2819-2829. 12. McColgan P, Gregory S, Razi A, Seunarine KK, Gargouri F, Durr A, et al. (2017): White matter predicts functional connectivity in premanifest Huntington's disease. Ann Clin Transl Neurol. 4:106-118. 13. Cammoun L, Gigandet X, Meskaldji D, Thiran JP, Sporns O, Do KQ, et al. (2012): Mapping the human connectome at multiple scales with diffusion spectrum MRI. J Neurosci Methods. 203:386-397. 14. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, et al. (2002): Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 15:273-289.

23

ACCEPTED MANUSCRIPT McColgan et al.

Supplement

15. Smith SM (2002): Fast robust automated brain extraction. Hum Brain Mapp. 17:143-155. 16. Tournier JD, Calamante F, Connelly A (2012): MRtrix: Diffusion tractography in crossing fiber regions. Imaging Systems and Technology. 22:53-56.

RI PT

17. Modat M, Ridgway GR, Taylor ZA, Lehmann M, Barnes J, Hawkes DJ, et al. (2010): Fast free-form deformation using graphics processing units. Comput Methods Programs Biomed. 98:278-284. 18. Smith RE, Tournier JD, Calamante F, Connelly A (2015): SIFT2: Enabling dense quantitative assessment of brain white matter connectivity using streamlines tractography. Neuroimage. 119:338-351. 19. van den Heuvel MP, Sporns O (2011): Rich-club organization of the human connectome. J Neurosci. 31:15775-15786.

20. van den Heuvel MP, Kahn RS, Goni J, Sporns O (2012): High-cost, high-capacity backbone for global brain communication. Proc Natl Acad Sci U S A. 109:11372-11377.

21. Qi S, Meesters S, Nicolay K, Romeny BM, Ossenblok P (2015): The influence of construction methodology on structural brain network measures: A review. J Neurosci Methods. 253:170-182.

22. Garrison KA, Scheinost D, Finn ES, Shen X, Constable RT (2015): The (in)stability of functional brain network measures across thresholds. Neuroimage.

23. Yeh CH, Smith RE, Liang X, Calamante F, Connelly A (2016): Correction for diffusion MRI fibre tracking biases: The consequences for structural connectomic metrics. Neuroimage.

24. Zalesky A, Fornito A (2009): A DTI-derived measure of cortico-cortical connectivity. IEEE Trans Med Imaging. 28:1023-1036.

25. Cole JH, Farmer RE, Rees EM, Johnson HJ, Frost C, Scahill RI, et al. (2014): Test-Retest Reliability of Diffusion Tensor Imaging in Huntington's Disease. PLoS Curr. 6.

26. Hawrylycz M, Miller JA, Menon V, Feng D, Dolbeare T, Guillozet-Bongaarts AL, et al. (2015): Canonical genetic signatures of the adult human brain. Nat Neurosci. 18:1832-1844.

27. Rittman T, Rubinov M, Vertes PE, Patel AX, Ginestet CE, Ghosh BC, et al. (2016): Regional expression of the MAPT gene is associated with loss of hubs in brain networks and cognitive impairment in Parkinson disease and progressive supranuclear palsy. Neurobiol Aging. 48:153-160.

28. Vertes PE, Rittman T, Whitaker KJ, Romero-Garcia R, Vasa F, Kitzbichler MG, et al. (2016): Gene transcription profiles associated with inter-modular hubs and connection distance in human functional magnetic resonance imaging networks. Philos Trans R Soc Lond B Biol Sci. 371.

29. Whitaker KJ, Vertes PE, Romero-Garcia R, Vasa F, Moutoussis M, Prabhu G, et al. (2016): Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. Proc Natl Acad Sci U S A. 113:9105-9110.

30. Boulesteix AL, Strimmer K (2007): Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform. 8:32-44.

31. Rubinov M, Ypma RJ, Watson C, Bullmore ET (2015): Wiring cost and topological participation of the mouse brain connectome. Proc Natl Acad Sci U S A. 112:10032-10037.

32. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009): GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 10:48.

33. Supek F, Bosnjak M, Skunca N, Smuc T (2011): REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One. 6:e21800.

34. Langfelder P, Cantle JP, Chatzopoulou D, Wang N, Gao F, Al-Ramahi I, et al. (2016): Integrated genomics and proteomics define huntingtin CAG length-dependent networks in mice. Nat Neurosci. 19:623-633.

35. Xia M, Wang J, He Y (2013): BrainNet Viewer: a network visualization tool for human brain connectomics. PLoS One. 8:e68910.

36. Durrenberger PF, Fernando FS, Kashefi SN, Bonnert TP, Seilhean D, Nait-Oumesmar B, et al. (2015): Common mechanisms in neurodegeneration and neuroinflammation: a BrainNet Europe gene expression microarray study. J Neural Transm (Vienna). 122:1055-1068.

37. Hodges A, Strand AD, Aragaki AK, Kuhn A, Sengstag T, Hughes G, et al. (2006): Regional and cellular gene expression changes in human Huntington's disease brain. Hum Mol Genet. 15:965-977.

38. Zhang B, Gaiteri C, Bodea LG, Wang Z, McElwee J, Podtelezhnikov AA, et al. (2013): Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease. Cell. 153:707-720.

39. Krienen FM, Yeo BT, Ge T, Buckner RL, Sherwood CC (2016): Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain. Proc Natl Acad Sci U S A. 113:E469-478.

40. Cahoy JD, Emery B, Kaushal A, Foo LC, Zamanian JL, Christopherson KS, et al. (2008): A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci. 28:264-278.

41. Molina-Calavita M, Barnat M, Elias S, Aparicio E, Piel M, Humbert S (2014): Mutant huntingtin affects cortical progenitor cell division and development of the mouse neocortex. J Neurosci. 34:10034-10040.


