Sample sizes and population differences in brain template construction


Journal Pre-proof

Sample sizes and population differences in brain template construction

Guoyuan Yang, Sizhong Zhou, Jelena Bozek, Hao-Ming Dong, Meizhen Han, Xi-Nian Zuo, Hesheng Liu, Jia-Hong Gao

PII: S1053-8119(19)30909-7

DOI: https://doi.org/10.1016/j.neuroimage.2019.116318

Reference: YNIMG 116318

To appear in: NeuroImage

Received Date: 24 May 2019

Revised Date: 1 October 2019

Accepted Date: 26 October 2019

Please cite this article as: Yang, G., Zhou, S., Bozek, J., Dong, H.-M., Han, M., Zuo, X.-N., Liu, H., Gao, J.-H., Sample sizes and population differences in brain template construction, NeuroImage (2019), doi: https://doi.org/10.1016/j.neuroimage.2019.116318. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Inc.

Sample sizes and population differences in brain template construction

Guoyuan Yang1,2,3, Sizhong Zhou1,2,3, Jelena Bozek4, Hao-Ming Dong5, Meizhen Han1,2,3, Xi-Nian Zuo5,6,7, Hesheng Liu8,9, Jia-Hong Gao1,2,3,10*

1 Beijing City Key Lab for Medical Physics and Engineering, Institute of Heavy Ion Physics, School of Physics, Peking University, Beijing, China

2 Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China

3 McGovern Institute for Brain Research, Peking University, Beijing, China

4 Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia

5 Department of Psychology, University of Chinese Academy of Sciences (UCAS), Beijing, China

6 CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China

7 Key Laboratory of Brain and Education, Nanning Normal University, Nanning, China

8 Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA

9 Beijing Institute for Brain Disorders, Capital Medical University, Beijing, China

10 Shenzhen Institute of Neuroscience, Shenzhen, China

* Address correspondence to Jia-Hong Gao, Ph.D.

Address: Center for MRI Research, Peking University, Beijing, China, 100871

E-mail: [email protected]

Abstract

Spatial normalization or deformation to a standard brain template is routinely used as a key module in various pipelines for the processing of magnetic resonance imaging (MRI) data. Brain templates are often constructed using MRI data from a limited number of subjects. Individual brains show significant variability in their morphology; thus, sample size and population differences are two key factors that influence brain template construction. To address these influences, we employed two independent groups from the Human Connectome Project (HCP) and the Chinese Human Connectome Project (CHCP) to quantify the impacts of sample size and population on brain template construction. We first assessed the effect of sample size on the construction of volumetric brain templates using data subsets from the HCP and CHCP datasets. We applied a voxel-wise index of the deformation variability and a logarithmically transformed Jacobian determinant to quantify the variability associated with the template construction and modeled the brain template variability as a power function of the sample size. At the system level, the frontoparietal control network and dorsal attention network demonstrated higher deformation variability and logged Jacobian determinants, whereas other primary networks showed lower variability. To investigate the population differences, we constructed Caucasian and Chinese standard brain atlases (namely, US200 and CN200). The two demographically matched templates exhibited dramatic differences in their deformation variability and logged Jacobian determinant, particularly in language-related areas, bilaterally in the supramarginal gyri and inferior frontal gyri. Using independent data from the HCP and CHCP, we examined the segmentation and registration accuracy and observed a significant reduction in the performance of brain segmentation and registration when population-mismatched templates were used in the spatial normalization.
Our findings provide evidence to support the use of population-matched templates in human brain mapping studies. The US200 and CN200 templates have been released on the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC) website (https://www.nitrc.org/projects/us200_cn200/).

Key words: MRI, brain template, variability, sample size, population


1. Introduction

Magnetic resonance imaging (MRI) has profoundly advanced the understanding of human brain structure and function. When performing a group-level analysis with MRI, spatial normalization or registration/deformation is an indispensable step that requires a standard brain template, such as the International Consortium for Brain Mapping template (ICBM152) (Evans et al., 2012). The adoption of an age- and population-matched brain template has been suggested to reduce the variability of spatial deformation during image registration and to preserve more characteristics of individual brains (Xie et al., 2015; Yoon et al., 2009; Zhao et al., 2019). To date, various widely used adult brain templates (e.g., ICBM152 and ICBM452 (Evans et al., 2012), Richards’ age-specific templates (Richards et al., 2016), NIH-PD atlases (Fonov et al., 2011), Chinese56 (Tang et al., 2010), Chinese2020 (Liang et al., 2015), and the Indian brain template (Bhalerao et al., 2018)) have been constructed based on sample sizes ranging from dozens to thousands of images. However, how the sample size affects brain template variability remains unclear. It is not surprising that the use of a larger sample size for template construction yields a more representative brain template in standard stereotaxic space. However, obtaining a large MRI sample is both costly and time-consuming, and the sample size that is sufficiently large for the construction of a robust brain template in stereotaxic space remains an unanswered question. Answering this question would provide a guideline for the choice of sample size for specific brain template construction. We hypothesize that an extremely large sample size is not necessary and that an increase in sample size beyond a certain threshold would not have a strong effect on the template variability. In other words, the brain template variability would converge and reach a plateau once an appropriate sample size is used.
Genetic, cultural and environmental factors have been shown to exert population effects on the construction of brain templates (Tang et al., 2010; Tang et al., 2018). Differences have been found between Caucasian and Asian brain features, such as brain size, shape, AC-PC distance and brain structure volumes (Bai et al., 2012; Lee et al., 2005; Tang et al., 2010; Xie et al., 2015a). These anatomical differences result in imprecise registration when transforming individual images into a population-mismatched template (e.g., Chinese subjects registered to a Caucasian brain template) (Richards and Xie, 2015; Xie et al., 2015b). However, previous studies did not consider the sample-size effects on brain template variability. Thus, to better understand the differences between templates across populations, we constructed age- and sex-matched Chinese and Caucasian brain templates with an appropriate sample size. This makes it possible to explore the morphological differences between brain templates in a quantitative manner and to investigate the segmentation and registration accuracies between Chinese and Caucasian templates.

In the present study, we first investigated the effect of sample size on the deformation variability in the brain templates using both the Human Connectome Project (HCP) (https://db.humanconnectome.org) and the Chinese Human Connectome Project (CHCP) datasets. Specifically, brain templates were constructed from subsets of the HCP and CHCP datasets covering a range of sample sizes. We used the deformation variability of the nonlinear registration to measure the template variability introduced by different sample sizes. Furthermore, the detailed regional brain template variability associated with increases in the sample size was analyzed using the Yeo network atlas (Yeo et al., 2011).

In addition, to assess the morphological differences between Caucasian and Chinese adult brain templates, we constructed age-, sex- and sample size-matched Caucasian (US200) and Chinese (CN200) adult brain/head volumetric templates with the tissue probability maps of gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) using the iterative procedure introduced by Xie et al. (2015) and Fonov et al. (2011). To control for the effects of differences in the MRI acquisition protocols, our CHCP dataset was also obtained using a 3.0 T MRI scanner with the same 3D MPRAGE sequence as that used to acquire the HCP dataset.
We compared the morphological differences between Caucasian and Chinese brain templates and further quantified the segmentation accuracies as well as the registration bias introduced by using population-mismatched brain templates. The code used for brain atlas analysis is available for download from https://github.com/Yangguoyuan/CBTV.

2. Materials and methods

2.1 Participants and image acquisition

This study included three datasets of healthy populations (Table 1). The first dataset was the HCP S1200, which is publicly available from the Human Connectome Project supported by the WU-Minn Consortium (Van Essen et al., 2013). In HCP S1200, twins have very similar brain anatomical features, but siblings do not. To remove the influence of twins, we chose one participant from each twin pair and included siblings. In total, 800 healthy volunteers (400 females, age range: 22 - 35 years, mean ± SD: 27.8 ± 3.2 years) were used in this study. All the participants provided written informed consent, and the protocol was approved by the local institutional review board at Washington University. MR images from each participant were acquired using a customized Siemens 3T Connectome Skyra scanner. Structural images were acquired with a 3D T1-weighted sequence (MPRAGE: TR = 2.4 s, TI = 1 s, TE = 2.14 ms, flip angle = 8°, FOV = 224 × 224 mm, matrix = 320 × 320, slices = 256). The final isotropic resolution of the data was 0.7 × 0.7 × 0.7 mm.

The second dataset included 250 native Chinese healthy volunteers (121 females, age range: 19 - 37 years, mean ± SD: 21.5 ± 2.4 years), all of whom were right-handed. None of the participants reported a history of psychiatric or neurological disorders. The dataset was collected within a project that aims to build a public research database through a series of studies focusing on the connections within the Chinese brain and is referred to as the CHCP dataset. All of the MRI data were acquired using a Siemens Prisma 3T scanner. High-resolution T1-weighted MR images were acquired using a 3D sequence (MPRAGE: TR = 2.4 s, TI = 1 s, TE = 2.22 ms, flip angle = 8°, 224 axial slices with thickness = 0.8 mm, axial FOV = 256 × 256 mm, and data matrix = 320 × 320). The final isotropic resolution of the data was 0.8 × 0.8 × 0.8 mm.
In addition, to check whether the global brain tissue volume differences between Caucasian and Chinese participants were confounded by the difference in data acquisition between the HCP and CHCP datasets, we collected a third dataset with acquisition parameters that differ from those of both the CHCP and HCP datasets. The third dataset included 32 native Chinese healthy volunteers (16 females, age range: 18 - 28 years, mean ± SD: 24.7 ± 3.1 years) scanned using a Siemens Prisma 3T scanner. The dataset was used as a Chinese validation dataset and is referred to as the CNV dataset. The detailed scanning parameters were as follows: 3D T1 MPRAGE sequence: TR = 2.5 s, TI = 1.1 s, TE = 2.98 ms, flip angle = 7°, 192 axial slices with thickness = 1 mm, axial FOV = 224 × 256 mm, and data matrix = 224 × 256. The final isotropic resolution was 1 × 1 × 1 mm. All the Chinese participants scanned at Peking University provided written informed consent, and the study was conducted under the approval of the Institutional Review Board of Peking University.

This study can be divided into two distinct parts: sample size assessment and template construction. For these, we partitioned the data into several sets. An overview of the dataset partition is given in Table 2. When investigating the effect of sample size, we did not control for ethnicity because we focused only on the sample size effect. Thus, we used the HCP dataset, which includes multiple ethnicities and a large number of participants (800). However, we did control for ethnicity in the second part of our study, namely, in the template construction and anatomical feature analysis. In this part of the study, we selected 200 Caucasian participants from the HCP dataset that could be considered representative of the Caucasian population for construction of the US200 atlas. Similarly, for construction of the CN200 template, we used 200 participants of Han Chinese ethnicity from the CHCP dataset. Furthermore, as independent test sets, we used 50 images (25 males and 25 females) from the HCP (Caucasian population) and 50 images (25 males and 25 females) from the CHCP (Han ethnicity) datasets that were not included in the construction of the US200 and CN200 brain templates. To eliminate the influence of age and gender, all the subsets were matched in terms of age and gender. We also constructed gender-specific templates using the HCP and CHCP datasets.

Finally, the CNV dataset, which has different acquisition parameters from the CHCP and HCP datasets, was used to calculate the global brain tissue volume measurements and compare them with those of the CHCP and HCP datasets.

2.2 Data preprocessing

For the HCP, CHCP and CNV datasets, the ‘minimal preprocessing pipelines’ for structural MRI data were applied (https://github.com/Washington-University/HCPpipelines/releases), and all T1-weighted images were processed using FSL (FMRIB Software Library), Connectome Workbench software and FreeSurfer (Glasser et al., 2013). All T1-weighted MR images were subjected to quality control procedures (Marcus et al., 2013). The preprocessing started with gradient nonlinearity distortion correction using a proprietary Siemens gradient coefficient file. A robust initial brain extraction was then performed using the linear and nonlinear registration deformation field. After brain extraction, bias field correction was applied to each extracted brain image to remove the effect of magnetic field inhomogeneity. Finally, each T1-weighted image was “AC-PC aligned”, i.e., it was rigidly aligned to the ICBM152 space.

2.3 Template construction method

The templates were constructed based on a diffeomorphic template building framework using the symmetric image normalization (SyN) algorithm in ANTs software (Avants et al., 2008; Avants et al., 2009) (https://github.com/ANTsX/ANTs). The brain templates were constructed using multiple iterations, as demonstrated by Xie et al. (2015) and Fonov et al. (2011). First, after rigid alignment of the T1-weighted images to the ICBM152 space, all images from a dataset with a particular sample size were linearly averaged to obtain the initial brain template (A0). Each T1-weighted image was then registered to this initial averaged brain template A0 using the affine and nonlinear SyN registration methods in ANTs software. By averaging all registered images, we generated an initial mean template (A1). Subsequently, each T1-weighted image was linearly and nonlinearly registered to this mean template (A1), and by averaging all of these registered images, we obtained a refined brain template (A2). Brain template A2 was used as the target in the next registration step. According to a previous study (Avants et al., 2011), a hierarchical scheme with three iterations at 50 × 90 × 20 registration steps was adopted during the construction process. To increase the stability and sharpness of the template, we increased the number of iterations to seven at 100 × 70 × 50 × 10 registration steps. The last averaged template was selected as the final brain template.
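The register-then-average loop described above can be illustrated with a deliberately simplified toy. The sketch below is not the ANTs procedure: it works on 1D signals, and an integer circular shift chosen by maximum correlation stands in for the affine and SyN registration steps. The function names, the Gaussian test signals and the number of iterations are all assumptions made for illustration only.

```python
import numpy as np


def best_shift(moving, fixed, max_shift=10):
    """Toy 'registration': integer circular shift that best aligns moving to fixed."""
    shifts = list(range(-max_shift, max_shift + 1))
    scores = [float(np.dot(np.roll(moving, s), fixed)) for s in shifts]
    return shifts[int(np.argmax(scores))]


def build_template(signals, n_iter=3, max_shift=10):
    """Iteratively refine a template: register every input to it, then re-average."""
    signals = np.asarray(signals, dtype=float)
    template = signals.mean(axis=0)  # A0: plain average of the inputs
    for _ in range(n_iter):
        registered = [np.roll(s, best_shift(s, template, max_shift)) for s in signals]
        template = np.mean(registered, axis=0)  # A1, A2, ...
    return template


# Hypothetical data: the same Gaussian bump placed at five different positions.
x = np.arange(64)
signals = [np.exp(-((x - 32 - d) ** 2) / (2 * 3.0 ** 2)) for d in (-4, -2, 0, 2, 4)]
blurry = np.mean(signals, axis=0)   # what plain averaging (A0) gives
sharp = build_template(signals)     # what the iterative procedure gives
```

In this toy, the refined template has a higher peak than the plain average because the inputs are aligned before averaging, which mirrors the gain in template sharpness that the iterative scheme aims for.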

2.4 Evaluation of the effect of sample size on brain template variability

We used both the HCP and CHCP datasets to investigate the effects of the sample size. Specifically, we constructed brain templates using a series of different sample sizes, i.e., different numbers of randomly selected images were used to build a brain template. The sample sizes for the HCP dataset ranged from 20 to 400 images (20, 40, 60, 80, 100, 150, 200, 250, 300, 350, and 400), and for the CHCP dataset, the sample sizes ranged from 20 to 100 images (20, 40, 60, 80, and 100). We repeated the sampling eight times and thereby obtained eight samples per sample size; averaging the deformation value and logged Jacobian determinant over these samples in the subsequent analysis reduces the sampling effect. In each sample set, we included equal numbers of male and female participants. We constructed templates from each sample set using the method described in Section 2.3. Using the SyN nonlinear registration method, we then calculated the deformation field for each pair of templates for each sample size. We subsequently obtained the variability represented by the deformation value and the logged Jacobian determinant for each sample size and illustrated the relationship between the sample size and brain template variability for the HCP and CHCP datasets.

In nonlinear brain registration, a deformation field represents the amount of displacement needed to move a voxel location from the original MR image to the target brain template to achieve the registration criteria. A large deformation field indicates a large anatomical difference between the original and target brain images. Thus, the deformation field can be used to characterize the difference between two brain templates. First, we obtained the absolute values of the deformation field in each dimension for each voxel. Second, by summing the absolute values over the three dimensions in each voxel, we obtained the Manhattan distance as each voxel’s displacement distance. Finally, we averaged the displacement distance of all voxels in the deformation field to obtain the final deformation value. It should be noted that both the Manhattan distance and the Euclidean distance can represent the voxel displacement; however, as an L^1 metric, the Manhattan distance avoids the squaring and square-root operations of the Euclidean distance, which can reduce the computational cost. Therefore, we used the Manhattan distance to represent the voxel’s displacement distance. For a deformation field, the mean deformation value (mDV) was defined as follows:

\[ \mathrm{mDV} = \frac{1}{N} \sum_{i=1}^{N} \left( |V_x^i| + |V_y^i| + |V_z^i| \right), \qquad (1) \]

Specifically, i is the ith voxel of the deformation field matrix, and the vector (V_x^i, V_y^i, V_z^i) represents the amount of displacement in the ith voxel, in three dimensions, when moving from the source image to the target image to meet the registration criteria. N is the total number of voxels in each deformation field matrix.

We also calculated the mean of the logarithmically transformed Jacobian determinant (mLJD) as another index to represent the brain template variability. As a common measure of the volume change from original MR images to target images, the Jacobian determinant of the deformation field can be evaluated through univariate voxel-wise statistical parametric mapping. By definition, the Jacobian determinant is used as a local index of tissue expansion (Jacobian determinant > 1) or shrinkage (Jacobian determinant < 1) relative to the target (Chung et al., 2001). We opted to use the logged Jacobian determinant because its null distribution is closer to normal than that of the Jacobian determinant, which has a skewed distribution that is bounded below by zero (Ashburner and Friston, 2000; Leow et al., 2007; Yanovsky et al., 2007). The Jacobian determinant was defined as follows:

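\[ \det(J(i,j,k)) = \begin{vmatrix} \dfrac{\partial U_i}{\partial x} & \dfrac{\partial U_i}{\partial y} & \dfrac{\partial U_i}{\partial z} \\ \dfrac{\partial U_j}{\partial x} & \dfrac{\partial U_j}{\partial y} & \dfrac{\partial U_j}{\partial z} \\ \dfrac{\partial U_k}{\partial x} & \dfrac{\partial U_k}{\partial y} & \dfrac{\partial U_k}{\partial z} \end{vmatrix}, \qquad (2) \]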

Specifically, det represents the determinant operator. U is the deformation field, and J(i, j, k) is the Jacobian matrix of a voxel’s displacement vector. After logarithmically transforming and averaging across the entire deformation field matrix, the mLJD was defined as follows:

\[ \mathrm{mLJD} = \frac{1}{N} \sum_{i=1}^{N} \log\left( |\det(J)| \right), \qquad (3) \]
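As a rough illustration of Eqs. (1) and (3), the two indices can be computed from a displacement field with NumPy. This is a minimal sketch, not the authors' code: the array layout (X, Y, Z, 3), the function names and the `spacing` parameter are assumptions. It also assumes that the Jacobian is taken of the full mapping x + U(x), i.e. the identity matrix is added to the displacement gradient, so that a zero displacement gives a determinant of 1 and a log of 0.

```python
import numpy as np


def mean_deformation_value(disp):
    """Eq. (1): mean over voxels of |Vx| + |Vy| + |Vz| (Manhattan distance).

    disp is assumed to have shape (X, Y, Z, 3), one displacement vector per voxel.
    """
    disp = np.asarray(disp, dtype=float)
    return float(np.abs(disp).sum(axis=-1).mean())


def mean_log_jacobian(disp, spacing=(1.0, 1.0, 1.0)):
    """Eq. (3): mean over voxels of log|det(J)|.

    Assumption: J is the Jacobian of the mapping x + U(x), so the identity
    is added to the finite-difference gradient of the displacement field.
    """
    disp = np.asarray(disp, dtype=float)
    # grads[..., a, b] = d U_a / d x_b, estimated by finite differences
    grads = np.stack(
        [
            np.stack(np.gradient(disp[..., a], *spacing, axis=(0, 1, 2)), axis=-1)
            for a in range(3)
        ],
        axis=-2,
    )
    jac = grads + np.eye(3)
    det = np.linalg.det(jac)
    return float(np.log(np.abs(det)).mean())


# Hypothetical check: a uniform 10% expansion, U(x) = 0.1 * x.
grid = np.stack(
    np.meshgrid(np.arange(6.0), np.arange(5.0), np.arange(4.0), indexing="ij"),
    axis=-1,
)
expansion = 0.1 * grid
mljd = mean_log_jacobian(expansion)
```

For the uniform expansion above, every voxel has a determinant of 1.1 cubed, so the mLJD equals 3 log(1.1); a field of constant displacement has an mLJD of 0 but a nonzero mDV, which shows that the two indices capture different aspects of a deformation.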

For each pair of HCP and CHCP brain templates, we obtained the mDV and mLJD, which represent the brain template difference associated with different sample sizes. It is important to ascertain the pattern of the changes in the mDV and mLJD due to the sample size, i.e., to determine which function/model can accurately fit our observed data. Here, we applied three widely used candidate function/model forms to fit the mDV and mLJD and assess the prediction accuracies: a power function (f(x) = a × x^b + c), an exponential function (f(x) = a × e^(b×x) + c) and a linear function (f(x) = a × x + b). The r2 value was used to evaluate the goodness-of-fit, and an r2 value closer to 1 indicates a better fit. To better understand the trend of the model, we calculated the derivative of the best-performing model. We also calculated the normalized derivative of both the mDV and mLJD indexes to quantify the percentage of convergence. It was calculated by first obtaining a reference derivative value, defined as the derivative at a sample size of 20. The normalized derivative was then obtained by dividing the actual derivative values by the reference value.

Furthermore, considering the spatially distributed regional brain variations that potentially result from genetic and environmental effects during development, we extended the analysis of global brain template variability in the HCP dataset to individual brain regions using the seven resting-state networks atlas from Yeo and colleagues (Yeo et al., 2011). Specifically, the ICBM152 nonlinear template was first warped into the mean of the US templates using a nonlinear SyN registration method. The generated transformation field map was applied to the seven-network atlas to obtain an adjusted brain network from the US templates. For each adjusted brain network, we then calculated the mDV and mLJD corresponding to each specific sample size. We subsequently obtained the spatially distributed brain variability with increasing sample size.
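The goodness-of-fit and normalized-derivative calculations described in this section can be sketched in a few lines of plain Python. For the power model f(x) = a × x^b + c, the derivative is a × b × x^(b−1), so the normalized derivative relative to a sample size of 20 reduces to (x/20)^(b−1) and depends only on b. The function names below are illustrative, and the parameter values a = 2.0 and b = −0.5 are hypothetical; they are not the fitted values reported in the paper.

```python
def power_derivative(n, a, b):
    """Derivative of f(n) = a * n**b + c with respect to n."""
    return a * b * n ** (b - 1)


def normalized_derivative(n, a, b, n_ref=20):
    """Derivative at n divided by the reference derivative at n_ref."""
    return power_derivative(n, a, b) / power_derivative(n_ref, a, b)


def first_size_below(sizes, a, b, threshold=0.05, n_ref=20):
    """Smallest tested sample size whose normalized derivative is <= threshold."""
    for n in sorted(sizes):
        if normalized_derivative(n, a, b, n_ref) <= threshold:
            return n
    return None


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical parameters, chosen only to exercise the functions.
hcp_sizes = [20, 40, 60, 80, 100, 150, 200, 250, 300, 350, 400]
plateau = first_size_below(hcp_sizes, a=2.0, b=-0.5)
```

Because the normalized derivative of a power model depends only on the exponent, the sample size at which a given plateau threshold is reached is determined entirely by the fitted value of b.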

2.5 Caucasian and Chinese brain templates with tissue probability maps

We constructed the Caucasian and Chinese adult brain/head volumetric templates US200 and CN200 from the corresponding subsets (as described in Section 2.1 and Table 2). The method used for brain template construction was the diffeomorphic template building framework described in Section 2.3. Furthermore, to obtain the tissue probability maps for the brain templates, we used the FMRIB Automated Segmentation Tool (FAST) to segment each subject’s brain into WM, GM and CSF (Zhang et al., 2001). These tissue maps were warped onto the brain templates using the corresponding deformation fields obtained through SyN nonlinear registration between individual T1 images and brain templates. We ultimately obtained a template tissue probability map by averaging the individual registered tissue probability maps of all the subjects.

2.5.1 Evaluation of the heterogeneity of intersubject variability between Caucasian and Chinese populations

Individual brain morphological variability is the key factor that determines the optimal number of subjects for brain template construction in a specific population. Quantifying intersubject brain morphological variability would therefore contribute to the understanding of sample size effects in different populations. Thus, we investigated the heterogeneity of intersubject brain morphological variability of participants from the US200 and CN200 subsets. We used a deformation-based morphometric analysis framework based on the SyN tool in ANTs (Croxson et al., 2018) to quantify intersubject variability. Specifically, for both the HCP and CHCP datasets, the processing was performed using the following steps:
i) We first used the US200 and CN200 as the population-specific standard brain templates.
ii) HCP and CHCP individual T1 maps were registered to their corresponding templates, and sets of 3D diffeomorphic deformation maps were extracted for each subject corresponding to the orthogonal projection of the deformations for each voxel.
iii) Variability was represented as the strength of the mDV and mLJD indexes.
iv) Using each individual’s DK atlas in the gray matter ribbon, we calculated the regional distributions of individual variability across 68 DK areas.
v) Caucasian and Chinese intersubject variability maps were obtained by averaging all subjects’ variability maps across DK areas in the HCP and CHCP datasets, respectively.
Finally, we analysed the association of the intersubject variability between Caucasian and Chinese brain anatomy.

2.5.2 Morphological differences between the Caucasian and Chinese sets and templates

We performed morphological comparisons to assess the following: (1) global differences between the populations, (2) morphological differences between the US200 and CN200 templates, and (3) gender-specific individual and template morphological differences in the Caucasian and Chinese populations.

To determine the global brain anatomical differences between the Caucasian and Chinese populations, we used participants from the US200 and CN200 subsets. Three global brain features, namely length, width and height, were measured using a previously described method (Tang et al., 2010). Additionally, using each subject’s brain segmentation results, we calculated the WM, GM and CSF tissue volumes. A statistical comparison between groups based on the brain shape and size indexes was conducted using a two-sample t-test. Analogous comparisons were also conducted in gender-specific subsets of participants from the US200 and CN200. Furthermore, we used the CNV dataset to check whether the global brain tissue volume differences between Caucasian and Chinese participants were confounded by the difference in data acquisition. Specifically, we compared the global brain tissue volume measurements between the CNV dataset and a subset of the HCP dataset using the same method as that used between the HCP and CHCP datasets. The new HCP subset was extracted from the HCP dataset with the same age and gender distribution and the same number of subjects as in the CNV dataset. Analogous comparisons were also conducted between the CNV dataset and a subset of the CHCP dataset.

Second, to assess the morphological differences between the US200 and CN200 templates, both templates were rigidly registered to the ICBM152 space using a six-parameter rigid transformation, which ensured that both templates’ anterior and posterior commissures were in the same transverse plane. We then measured the length of the whole brain from the anterior pole to the posterior pole on the AC-PC plane, the width of the whole brain through the middle point of the AC-PC line, and the height of the whole brain from the superior pole to the inferior pole on the coronal plane. To evaluate gender-specific templates, we also constructed Caucasian female (US100(F)) and male (US100(M)) as well as Chinese female (CN100(F)) and male (CN100(M)) brain templates. Each template’s length, width and height were then measured using the method described for the comparison of the US200 and CN200 templates. These comparisons were performed between the US100(F) and US100(M) templates and between the CN100(F) and CN100(M) templates.

We further examined the regional distributions of anatomical differences between the US200 and CN200 based on the mDV and mLJD in 82 Brodmann areas. Specifically, the Brodmann atlas was nonlinearly warped into the US200 and CN200 templates using the ICBM152 template as an intermediary, which divided the US200 and CN200 templates into 82 areas. We subsequently calculated the mDV and mLJD of the transformation between US200 and CN200 in each Brodmann area to assess the anatomical differences that influence the nonlinear registration between brain templates. A larger mDV and mLJD indicate a greater anatomical difference between the templates in the observed area. The same comparisons were also conducted between the Caucasian and Chinese gender-specific templates.

2.5.3 Tissue segmentation accuracy between the Caucasian and Chinese templates

To further assess the effect of using population-matched and population-mismatched brain templates on the accuracy of the segmentation of WM, GM and CSF, we conducted a cross-population validation by comparing the registration-based segmentation accuracy obtained with the constructed US200 and CN200 templates (Luo et al., 2014; Zhao et al., 2019). We compared the accuracy of the registration-based segmentation of tissues in images from the HCP and CHCP test sets with the constructed US200 and CN200 templates as registration targets. Specifically, for both the HCP and CHCP test sets, we first performed an automatic segmentation of individual T1 images without any prior information using FAST (Zhang et al., 2001). These individual automatic-segmentation probabilistic tissue maps of WM, GM and CSF were used as a reference for the registration-based segmentation comparison. Using an a priori template, registration-based segmentation was then conducted separately for the US200 and CN200 brain templates. A range of tissue probability thresholds (p from 0.1 to 0.4, at 0.1 intervals) was applied to generate the final tissue classification. We then calculated the Dice coefficient (Dice, 1945) for WM, GM and CSF between the individual-based and the US200 and CN200 template-based tissue probability maps. A paired t-test was performed on the Dice coefficients to evaluate whether the population-matched templates could achieve segmentation improvements.
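The Dice comparison over a range of probability thresholds can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes that the reference and template-based probability maps are binarized at the same threshold before the overlap is computed, which the text does not state explicitly.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def dice_over_thresholds(ref_prob, tmpl_prob, thresholds=(0.1, 0.2, 0.3, 0.4)):
    """Binarize both probability maps at each threshold and compute Dice."""
    ref_prob = np.asarray(ref_prob, dtype=float)
    tmpl_prob = np.asarray(tmpl_prob, dtype=float)
    return {t: dice(ref_prob > t, tmpl_prob > t) for t in thresholds}


# Hypothetical one-dimensional "tissue maps" with four voxels each.
reference = np.array([0.9, 0.35, 0.05, 0.0])
template_based = np.array([0.8, 0.15, 0.25, 0.0])
scores = dice_over_thresholds(reference, template_based)
```

Per-subject Dice values obtained this way for the matched and mismatched templates are the kind of paired samples on which a paired t-test can then be run.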

2.5.4 Image registrations to the Caucasian and Chinese templates

We used the HCP and CHCP test sets to analyze linear and nonlinear registration performance with population-matched templates (e.g., registration of images from the HCP test set to the US200 template) and population-mismatched templates (e.g., registration of images from the HCP test set to the CN200 template). First, after linear registration of the images from the HCP and CHCP test sets to the US200 and CN200 brain templates, we examined, for each subject, the scale values obtained from the 3D linear affine registration (12-parameter affine transformation). The scale value describes the magnitude of stretch applied when registering the source image to the target image. A smaller stretch represents a more accurate linear registration, and a scale value of 1.0 is obtained when the source image has the same size as the target image (i.e., the template) in a given dimension (i.e., sagittal, coronal or axial). Second, we examined the differences in the mDV and mLJD obtained from the nonlinear SyN registration, which was performed after the linear affine registration. Small values of mDV and mLJD indicate that little bias is introduced during nonlinear registration. Finally, a mixed-design ANOVA was performed to test the effects of population (HCP test set and CHCP test set) and template (US200 and CN200) on both the linear and nonlinear registration results. Post hoc tests were performed with Bonferroni correction. We further analyzed the global brain shape, including the length, width and height, of the original images and of the images from the HCP and CHCP test sets that were nonlinearly registered to population-matched or population-mismatched templates. A paired t-test was performed on the brain shape indexes measured in the original and registered images.
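As an illustration of what a per-axis scale value means, the sketch below recovers scale factors from the 3 × 3 linear part of an affine transformation. It assumes that the linear part factors as a rotation times a diagonal scaling (no shear), in which case the column norms equal the scale factors. How the registration software actually reports its scale parameters is not covered here, and all names and numbers are hypothetical.

```python
import math

def axis_scales(linear):
    """Per-axis scale factors as the column norms of a 3x3 linear part.

    Assumes linear = rotation @ diag(s0, s1, s2), i.e. no shear component.
    """
    return tuple(
        math.sqrt(sum(linear[row][col] ** 2 for row in range(3)))
        for col in range(3)
    )

def rotation_z(theta):
    """Rotation by theta radians about the third axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Invented example: a small rotation combined with anisotropic scaling.
true_scales = (1.05, 0.93, 0.98)
diag = [[true_scales[0], 0.0, 0.0],
        [0.0, true_scales[1], 0.0],
        [0.0, 0.0, true_scales[2]]]
linear = matmul3(rotation_z(math.radians(10.0)), diag)

recovered = axis_scales(linear)
stretch = [abs(s - 1.0) for s in recovered]  # distance from "no stretch"
```

A scale factor of exactly 1.0 on every axis, as returned for the identity matrix, corresponds to the case in which no stretching is needed.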

3. Results

3.1 Template variability influenced by the sample size

The effect of the sample size on brain template variability is presented in Fig. 1. We averaged the mDV and mLJD obtained for the various sample sizes of the HCP and CHCP datasets separately. Fig. 1 (a) and (b) show the averaged mDV and mLJD maps for the HCP dataset projected onto the US200 template for sample sizes of 40 and 200; the maps demonstrate that a small sample size leads to high brain template variability. As shown in Fig. 1 (c) and (d), an increase in the number of subjects resulted in asymptotic decreases in the mDV and mLJD for both the HCP and CHCP datasets. In terms of model fitting performance, the power relationship model provided the best fit to the mean and SD of the mDV and mLJD for both the HCP and CHCP datasets (Table 3). The model fitting parameters are shown in Table 4. With the fitted parameters (a negative exponent and a constant term close to zero), the power relationship model approaches zero as the sample size increases. This agrees with the theoretical expectation that the variability should reach zero when the sample size is large enough, and we therefore chose the power relationship as the final model. As shown in Fig. 1 (c) and (d), the normalized derivative is lower than 14% when the sample size is equal to 100. To define the convergence value for the optimal sample size precisely, we set the plateau threshold to 5% of the change in the normalized derivative. This criterion was reached for both the mDV and mLJD indexes with 200 samples in both the HCP and CHCP datasets.

We further investigated the effect of sample size on brain template variability using the resting-state network atlas. The analysis of the spatially distributed brain template variability in association with the sample size (Fig. 2) revealed that the regional brain template variability, in terms of both the mDV and mLJD indexes, exhibited an explicit separation phenomenon. Specifically, the highest level of structural variability was found in the association cortical areas of the dorsal attention and frontoparietal control networks, whereas the limbic system showed the least variability. The sensory-motor and visual systems exhibited structural variability that was moderately lower than that of the dorsal attention and frontoparietal control networks but higher than that of the limbic network.
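The plateau criterion for the power relationship model can be sketched as follows. For f(n) = a × n^b + c, the derivative is a × b × n^(b − 1), so its ratio to the derivative at a reference sample size depends only on b. The reference size of 20 and the candidate grid are assumptions of this sketch, not values taken from this section, so the numbers it produces are illustrative only; a and b are the HCP mDV values from Table 4.

```python
def power_model(n, a, b, c=0.0):
    """Power relationship f(n) = a * n**b + c."""
    return a * n ** b + c

def normalized_derivative(n, b, n_ref):
    """|f'(n)| / |f'(n_ref)| for the power model; a and c cancel out."""
    return (n / n_ref) ** (b - 1.0)

def first_size_below(threshold, b, n_ref, candidates):
    """Smallest candidate sample size whose normalized derivative is below threshold."""
    for n in candidates:
        if normalized_derivative(n, b, n_ref) < threshold:
            return n
    return None

A_HCP_MDV, B_HCP_MDV = 6.200, -0.351   # Table 4, power function, HCP mDV
N_REF = 20                             # assumed reference sample size (hypothetical)
sizes = range(20, 401, 20)             # assumed candidate grid (hypothetical)

ratio_at_100 = normalized_derivative(100, B_HCP_MDV, N_REF)
plateau_size = first_size_below(0.05, B_HCP_MDV, N_REF, sizes)
predicted_mdv_at_200 = power_model(200, A_HCP_MDV, B_HCP_MDV)
```

Because the ratio depends on the chosen reference size and on the fitted exponent, other rows of Table 4 or another normalization would shift the sample size at which the 5% threshold is crossed.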

3.2 Heterogeneity of intersubject variability between Caucasian and Chinese

The results for morphological intersubject variability are presented in Fig. 3. We used the mDV and mLJD to quantify the intersubject variability and projected them onto a 3D inflated surface view. Intersubject variability in cortical structural anatomy was strongly correlated between the Caucasian and Chinese datasets for both the mDV (Fig. 3, r = 0.97, p < 0.001) and the mLJD (Fig. 3, r = 0.95, p < 0.001). Specifically, in the mDV maps, the bilateral inferior parietal gyri, bilateral superior parietal gyri and bilateral caudal middle frontal gyri showed large intersubject variability in both the HCP and CHCP datasets. Similarly, in the mLJD maps, the bilateral caudal middle frontal gyri, the bilateral superior parietal gyri, the bilateral precentral gyri and the bilateral superior temporal sulcus showed large anatomical intersubject variability in both the HCP and CHCP datasets. We also performed two-sample t-tests between the HCP and CHCP datasets in each DK atlas area for both the mDV and mLJD indexes, with Bonferroni correction for the number of brain areas (p < 0.001/68 = 1.47 × 10⁻⁵). Only two and three brain areas showed a significant difference in intersubject variability for the mDV and mLJD, respectively. In other words, 96% of brain areas showed no significant difference in brain anatomy heterogeneity between Caucasian and Chinese.
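The corrected threshold and the share of areas without a significant difference follow from simple arithmetic, sketched below. The list of p-values is invented purely to exercise the functions (three of 68 values fall below the threshold, as reported for the mLJD); it does not reproduce the study's per-area statistics.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def fraction_not_significant(p_values, threshold):
    """Share of tests whose p-value does not fall below the threshold."""
    kept = sum(1 for p in p_values if p >= threshold)
    return kept / len(p_values)

N_AREAS = 68                                # number of brain areas in the correction
threshold = bonferroni_threshold(0.001, N_AREAS)

# Invented p-values: three areas below the threshold, 65 clearly above it.
p_values = [1e-6, 5e-6, 1e-5] + [0.2] * 65
share = fraction_not_significant(p_values, threshold)
```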

3.3 Morphological differences in global brain features between Caucasian and Chinese participants and brain templates

Significant differences in structural volume were found between the Caucasian and Chinese brain images (Fig. 4a). The two groups also differed significantly in brain length, width and height (Table 5): the Caucasian brains were significantly longer but had a smaller width and a smaller height than the Chinese brains. The global volume measurements revealed that the Caucasian brains had lower total GM (p < 0.001, t = 6.26, two-sample t-test), total CSF (p < 0.001, t = 10.69, two-sample t-test) and total brain volumes (p < 0.001, t = 6.13, two-sample t-test) than the Chinese brains, whereas no significant difference in total WM volume was found. To check whether these volume differences were confounded by differences in data acquisition, we repeated the global volume analysis with the HCP and CNV datasets. The results were similar to those obtained with the CHCP dataset: Caucasian brains had lower total GM (p < 0.001, t = 3.83, two-sample t-test), total CSF (p < 0.001, t = 4.85, two-sample t-test) and total brain volumes (p < 0.001, t = 4.76, two-sample t-test) than the Chinese brains (Fig. 4b). In contrast, there were no significant differences in the global GM, WM and CSF measurements between the CNV dataset and a subset of the CHCP dataset (Fig. 4c, all p-values > 0.01). Fig. 4 (d) and (e) show significant gender effects on structural volume in both Caucasian and Chinese brains. Female brains had lower total GM, WM, CSF and total brain volumes than male brains in both the HCP and CHCP datasets. The global measurements of gender-specific individual brains in the HCP and CHCP datasets show that female brains were shorter in length and smaller in width and height than male brains (Table 5).

Fig. 5 shows the US200 and CN200 brain/head templates and tissue probability maps in detail. Visual inspection revealed different morphological structures between the US200 and CN200 brain templates in the gyri and sulci and obvious anatomical differences at the surface of the GM/WM junction. Quantitative morphological differences between the US200 and CN200 brain templates are shown in Table 6 and Fig. 6, where the two templates were aligned to the same AC-PC position using rigid registration in the ANTs software to preserve the original morphological information. The Chinese brain template had a smaller length, greater width and greater height than the Caucasian brain template. The gender-specific templates, US100(F), US100(M), CN100(F) and CN100(M), are shown in Fig. 7, and their quantitative global brain measurements are shown in Table 6. The female brain template was smaller in length, width and height than the male brain template in both the Caucasian and Chinese datasets.

The regional distributions of the anatomical differences between the two templates, measured with the mDV and mLJD in 82 Brodmann regions, are shown in Table 7 and Fig. 8. The regions with large anatomical differences are located bilaterally in the supramarginal gyri (part of Wernicke’s area), inferior frontal gyri (part of Broca’s area) and superior temporal cortex. The regional anatomical differences between the gender-specific templates are shown in Fig. 9. In the Caucasian templates, the regions showing the greatest anatomical differences are mainly located in the right superior frontal gyrus, the bilateral supramarginal gyri and the bilateral angular gyri. Similarly, in the Chinese templates, the differences are mainly located in the bilateral superior frontal gyri, the bilateral supramarginal gyri and the bilateral angular gyri.

3.4 Population-matched templates produce higher tissue segmentation accuracy

As indicated by the results shown in Fig. 10, the use of a population-matched brain template produced more accurate tissue segmentation in the classification of GM, WM and CSF with probability thresholds ranging from 0.1 to 0.4, and this finding was obtained with both the HCP and CHCP test sets (all p-values < 0.001).

3.5 Linear registration parameters

Fig. 11 shows the scale values obtained after registering the test set images to the different brain templates. A scale value close to 1.0 was obtained when a subject’s brain images were registered to the population-matched template, which indicates that less stretching was needed for the registration. In contrast, the scale value clearly deviated from 1.0 when the brain images were registered to the population-mismatched template. A significant interaction between population and template was observed in all three directions (sagittal direction, Fx = 86.72, p < 0.001; coronal direction, Fy = 29.79, p < 0.001; and axial direction, Fz = 41.41, p < 0.001).


3.6 Nonlinear registration parameters

Fig. 12 (a) and (b) show the averaged mDV and mLJD mapped onto the US200 and CN200 templates. The maps were averaged over the deformation fields obtained by registering images from the HCP test set to the US200 brain template, the HCP test set to the CN200 brain template, the CHCP test set to the CN200 brain template and the CHCP test set to the US200 brain template. Fig. 12 (c) and (d) show violin plots of the mDV and mLJD. The results show that nonlinear registration to the population-matched brain template generates a smaller deformation field, which improves the registration accuracy. The mixed-design ANOVA showed significant interaction effects between population and template on the mDV (F = 220.01, p < 0.001) and the mLJD (F = 4187.87, p < 0.001).

Moreover, we compared the brain anatomical features of the original brain images with those of the images registered to the population-matched or population-mismatched brain templates (Table 8). The results indicated better alignment and smaller deformations with the population-matched template than with the population-mismatched template. Specifically, for the Caucasian dataset, significant differences in brain length (p < 0.001, t = 7.07, paired t-test), width (p < 0.001, t = 9.53, paired t-test) and height (p < 0.001, t = 4.43, paired t-test) were found between the original Caucasian brain images and those obtained after registration to the CN200 template. In contrast, no significant differences in length, width and height were found between the original Caucasian brains and those obtained after registration to the US200 brain template. Similar results were obtained for the Chinese dataset: significant differences in brain length (p < 0.001, t = 8.47, paired t-test), width (p < 0.001, t = 16.75, paired t-test) and height (p < 0.001, t = 5.05, paired t-test) were found between the original brain images and those obtained after registration to the US200 template, whereas no significant differences in brain length, width and height were found between the original Chinese brain images and those obtained after registration to the CN200 brain template.
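To make the two deformation indexes concrete, the sketch below computes a mean displacement magnitude and a mean absolute log Jacobian determinant from a dense displacement field. The exact definitions of mDV and mLJD belong to the methods and are not restated in this section, so the formulas used here (Euclidean norm of the displacement; absolute value of the log Jacobian determinant before averaging) are assumptions, and the field is a synthetic 10% uniform expansion rather than registration output.

```python
import numpy as np

def deformation_value(displacement):
    """Voxelwise Euclidean norm of a displacement field of shape (X, Y, Z, 3)."""
    return np.linalg.norm(displacement, axis=-1)

def log_jacobian_determinant(displacement, spacing=(1.0, 1.0, 1.0)):
    """Voxelwise log det(I + grad u) for the mapping x -> x + u(x).

    Requires a positive determinant everywhere (no folding).
    """
    jac = np.zeros(displacement.shape[:3] + (3, 3))
    for i in range(3):
        # Partial derivatives of displacement component i along each axis.
        grads = np.gradient(displacement[..., i], *spacing)
        for j in range(3):
            jac[..., i, j] = grads[j]
    jac += np.eye(3)  # add the identity to obtain the Jacobian of x + u(x)
    return np.log(np.linalg.det(jac))

# Synthetic field: every coordinate is stretched by 10% (u = 0.1 * x).
axes = [np.arange(8.0)] * 3
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
expansion = 0.1 * grid

mean_dv = deformation_value(expansion).mean()
mean_abs_ljd = np.abs(log_jacobian_determinant(expansion)).mean()
```

For this synthetic field the Jacobian is 1.1 times the identity at every voxel, so the log determinant equals 3 × ln(1.1) everywhere; a zero displacement field gives zero for both indexes.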


4. Discussion

In this study, we analyzed the changes in volumetric brain template variability with respect to changes in the sample size. We found that a sample size of 200 images was sufficient for brain template construction when both cost and brain template variability were considered. An analysis of the US200 and CN200 brain templates also revealed significant morphological differences between the two templates bilaterally in the supramarginal gyri and inferior frontal gyri. Finally, MRI segmentation and registration accuracy analyses showed higher segmentation and registration accuracies with the population-matched templates than with the population-mismatched templates, which indicates the importance and necessity of constructing a population-matched template.

The separation phenomenon of spatially distributed regional brain variations reflects a heterogeneous pattern of intersubject structural variability across the human cerebral cortex. The results demonstrated that structural variability has a specific distribution, with the heteromodal association cortex being the most variable and the sensorimotor and limbic regions being relatively stable. In principle, large mDV or mLJD values that converge slowly in a brain area indicate high intersubject structural variability, and thus more subjects are needed during brain template construction to obtain a stable and representative template for that area. Conversely, small mDV or mLJD values that converge quickly in a brain region reflect low intersubject structural variability, which implies that this region would achieve stability even with a small sample size during brain template construction.
These regional differences are consistent with previous evidence that neural systems supporting higher-order association and integration processes are more variable than those implicated in unimodal processing (Amunts et al., 1999; Kaas, 2006; Mueller et al., 2013; Smaers et al., 2011). According to evolutionary and postnatal expansion studies, human cortical expansion is nonuniform: high-expansion regions include the lateral temporal, parietal and frontal cortices, and low-expansion regions are concentrated in the sensorimotor, visual and limbic cortices (Hill et al., 2010a; Hill et al., 2010b). These findings are also consistent with the lower variability that we observed in the sensorimotor, visual and limbic brain regions. By extension, the detailed spatially distributed regional brain variations could help us understand which brain template regions reach a given accuracy and stability at a certain sample size.

The heterogeneity of brain anatomy is the key factor that determines the optimal number of subjects for a template of a specific population. In other words, populations with different degrees of anatomical heterogeneity would require different optimal sample sizes for brain template construction. Thus, in this study, we quantified the heterogeneity of brain anatomy as the intersubject variability measured by the mDV and mLJD indexes in both Caucasian and Chinese brains. We found that the heterogeneity in cortical structural anatomy was significantly correlated between Caucasian and Chinese for both the mDV and the mLJD. Furthermore, the comparison of intersubject variability between Caucasian and Chinese subjects in specific brain areas showed that 96% of brain areas exhibited no significant difference in brain anatomy heterogeneity. These results indicate that Caucasian and Chinese brains share a similar pattern of anatomical heterogeneity. Hence, the sample size effect in the Caucasian brain template can be considered similar to the sample size effect in the Chinese brain template.

Previous studies have reported global differences in brain shape, size and volume between Chinese and Caucasian populations (Tang et al., 2010; Xie et al., 2015). In this study, taking the sample size effect into account, we extracted age- and gender-matched samples of appropriate size from the HCP and CHCP datasets to construct the US200 Caucasian and CN200 Chinese brain templates.
Consistent with previous studies that have reported global differences in brain shape, size and volume between Chinese and Caucasian populations (Tang et al., 2010; Xie et al., 2015), our results identified significant morphological differences in global brain features. Moreover, our study extended these global structural differences to a detailed regional level in adult brain templates. We found large anatomical differences in brain regions located bilaterally in the supramarginal gyri (part of Wernicke’s area), inferior frontal gyri (part of Broca’s area) and superior temporal cortex, which is consistent with the results of a previous pediatric brain template study (Zhao et al., 2019). In the comparison of gender-specific templates, we found that the regions showing the greatest anatomical differences are mainly located in the right superior frontal gyrus, the bilateral supramarginal gyri and the bilateral angular gyri in both the Caucasian and Chinese datasets, which is consistent with the gender differences reported in a pediatric brain template study (Zhao et al., 2019).

Furthermore, to illustrate the necessity of using a population-matched brain template, we assessed the segmentation accuracies and the linear and nonlinear registrations of the HCP and CHCP test sets using both the US200 and CN200 brain templates. When the HCP images were linearly registered to the CN200 brain template, the axial and sagittal scale values increased, and the coronal scale values decreased. Conversely, when the CHCP images were linearly registered to the US200 brain template, the axial and sagittal scale values decreased, and the coronal scale values increased. This finding indicated that the brains from the CHCP test set were smaller in length but greater in width and height than the brains from the HCP test set. In nonlinear registration, the deformation field represents the voxel displacement during the registration of two images and thus reflects the global nonlinear shape differences (Ashburner and Friston, 2000). Thus, a smaller deformation indicates that fewer distortions were introduced when registering the source image to the target image. Our results show that the use of a population-matched template as the target resulted in scale values closer to 1.0 and a smaller deformation field (represented by the mDV and mLJD) compared with the use of population-mismatched templates. Furthermore, a population-matched brain template improved the tissue segmentation. These findings suggest that using a population-matched brain template reduces brain segmentation and registration bias and improves the accuracy of the subsequent statistical analysis of the data (Shi et al., 2017).

Compared with the scalar mDV and mLJD indexes, which we used to evaluate voxel displacement and volume change, respectively, the displacement vector field (deformation tensor matrix) is a multivariate measure that captures orientation-specific characteristics of the deformation field (Studholme and Cardenas, 2007). Such orientation components of the deformation tensor might reveal anatomical regions in which changes are predominantly anterior-posterior or medial-lateral and can provide an orientation-sensitive correlation with clinical variables of interest. In this study, we focused on regional variability averaged over the three orientations and did not decompose the deformation field into its three directions. Thus, we selected the deformation value and the logged Jacobian determinant to measure voxel displacement and volume change across the three orientations.

Several limitations of the current study should be noted. First, the CHCP dataset used in this study is still limited in size because data from only 250 adults have been collected thus far. However, the specific form of the observed sample size effect was well replicated across the HCP and CHCP datasets when the sample size was less than 100, which illustrates that the effect of sample size on brain template variability is consistent in the HCP and CHCP datasets. In the future, datasets with larger sample sizes and other populations need to be explored. Second, due to the limited size of the CHCP dataset, the gender-specific templates were constructed with a sample size of only 100, which might introduce additional template variability. However, the normalized derivative was lower than 14% when the sample size was equal to 100, which means that the sample size effect was already relatively small at 100 samples. Furthermore, in light of the gender differences in brain structure (Cosgrove et al., 2007; Giedd et al., 1996; Luders and Toga, 2010; Reiss et al., 1996), we may expect single-gender groups to exhibit smaller heterogeneity than a group including both genders, which would imply a smaller convergence sample size for gender-specific groups. Third, the proposed CN200 brain templates were generated using structural MRI data collected with a 3.0 T MRI scanner. Future studies should determine whether these brain templates are appropriate for images collected using 1.5 T MRI scanners.
Fourth, volumetric templates are suitable only for volume-based analyses, whereas cortical surface atlases allow the analysis of the highly variable and convoluted cerebral cortex and could be used for investigating differences in the cortical anatomy between Caucasian and Chinese (Bozek et al., 2018; Van Essen and Dierker, 2007). In the future, a corresponding cortical surface atlas should accompany the brain volumetric atlas for a specific population. Furthermore, it is necessary to investigate the differences in the cerebral cortex and brain surface across populations and to construct population-specific cortical surface atlases.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2017YFC0108901); the National Natural Science Foundation of China (81790650, 81790651, 81430037, 81727808 and 31421003); and the Beijing Municipal Science & Technology Commission (Z171100000117012). Use of the Human Connectome Project (https://db.humanconnectome.org) dataset is acknowledged. JB has been supported by the European Regional Development Fund under the grant KK.01.1.1.01.0009 (DATACROSS). The authors also thank the National Center for Protein Sciences at Peking University in Beijing, China, for assistance with MRI data acquisition and data analyses.

Declaration of interest

The authors have no financial or competing interests to declare.


References

Amunts, K., Schleicher, A., Bürgel, U., Mohlberg, H., Uylings, H.B., Zilles, K., 1999. Broca's region revisited: cytoarchitecture and intersubject variability. Journal of Comparative Neurology 412, 319-341.
Ashburner, J., Friston, K.J., 2000. Voxel-based morphometry-the methods. Neuroimage 11, 805-821.
Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C., 2008. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal 12, 26-41.
Avants, B.B., Tustison, N., Song, G., 2009. Advanced normalization tools (ANTS). Insight J 2, 1-35.
Avants, B.B., Tustison, N.J., Song, G., Cook, P.A., Klein, A., Gee, J.C., 2011. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033-2044.
Bai, J., Abdul-Rahman, M.F., Rifkin-Graboi, A., Chong, Y.S., Kwek, K., Saw, S.M., Godfrey, K.M., Gluckman, P.D., Fortier, M.V., Meaney, M.J., Qiu, A., 2012. Population differences in brain morphology and microstructure among Chinese, Malay, and Indian neonates. PLoS One 7, e47816.
Bhalerao, G.V., Parlikar, R., Agrawal, R., Shivakumar, V., Kalmady, S.V., Rao, N.P., Agarwal, S.M., Narayanaswamy, J.C., Reddy, Y.C.J., Venkatasubramanian, G., 2018. Construction of population-specific Indian MRI brain template: morphometric comparison with Chinese and Caucasian templates. Asian J Psychiatr 35, 93-100.
Bozek, J., Makropoulos, A., Schuh, A., Fitzgibbon, S., Wright, R., Glasser, M.F., Coalson, T.S., O'Muircheartaigh, J., Hutter, J., Price, A.N., Cordero-Grande, L., Teixeira, R., Hughes, E., Tusor, N., Baruteau, K.P., Rutherford, M.A., Edwards, A.D., Hajnal, J.V., Smith, S.M., Rueckert, D., Jenkinson, M., Robinson, E.C., 2018. Construction of a neonatal cortical surface atlas using multimodal surface matching in the developing human connectome project. Neuroimage 179, 11-29.
Chung, M.K., Worsley, K.J., Paus, T., Cherif, C., Collins, D.L., Giedd, J.N., Rapoport, J.L., Evans, A.C., 2001. A unified statistical approach to deformation-based morphometry. Neuroimage 14, 595-606.
Cosgrove, K.P., Mazure, C.M., Staley, J.K., 2007. Evolving knowledge of sex differences in brain structure, function, and chemistry. Biological psychiatry 62, 847-855.
Croxson, P.L., Forkel, S.J., Cerliani, L., Thiebaut de Schotten, M., 2018. Structural Variability Across the Primate Brain: A Cross-Species Comparison. Cereb Cortex 28, 3829-3841.
Dice, L.R., 1945. Measures of the amount of ecologic association between species. Ecology 26, 297-302.
Evans, A.C., Janke, A.L., Collins, D.L., Baillet, S., 2012. Brain templates and atlases. Neuroimage 62, 911-922.
Fonov, V., Evans, A.C., Botteron, K., Almli, C.R., McKinstry, R.C., Collins, D.L., Brain Development Cooperative, G., 2011. Unbiased average age-appropriate atlases for pediatric studies. Neuroimage 54, 313-327.
Giedd, J.N., Snell, J.W., Lange, N., Rajapakse, J.C., Casey, B., Kozuch, P.L., Vaituzis, A.C., Vauss, Y.C., Hamburger, S.D., Kaysen, D., 1996. Quantitative magnetic resonance imaging of human brain development: ages 4–18. Cereb Cortex 6, 551-559.
Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., Coalson, T.S., Fischl, B., Andersson, J.L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J.R., Van Essen, D.C., Jenkinson, M., Consortium, W.U.-M.H., 2013. The minimal preprocessing pipelines for the human connectome project. Neuroimage 80, 105-124.
Hill, J., Inder, T., Neil, J., Dierker, D., Harwell, J., Van Essen, D., 2010a. Similar patterns of cortical expansion during human development and evolution. Proc Natl Acad Sci U S A 107, 13135-13140.
Hill, J., Dierker, D., Neil, J., Inder, T., Knutsen, A., Harwell, J., Coalson, T., Van Essen, D., 2010b. A surface-based analysis of hemispheric asymmetries and folding of cerebral cortex in term-born human infants. Journal of Neuroscience 30, 2268-2276.
Kaas, J.H., 2006. Evolution of the neocortex. Current Biology 16, R910-R914.
Lee, J.S., Lee, D.S., Kim, J., Kim, Y.K., Kang, E., Kang, H., Kang, K.W., Lee, J.M., Kim, J.-J., Park, H.-J., Kwon, J.S., Kim, S.I., Yoo, T.W., Chang, K.-H., Lee, M.C., 2005. Development of Korean standard brain templates. J Korean Med Sci 20, 483-488.
Leow, A.D., Yanovsky, I., Chiang, M.C., Lee, A.D., Klunder, A.D., Lu, A., Becker, J.T., Davis, S.W., Toga, A.W., Thompson, P.M., 2007. Statistical properties of Jacobian maps and the realization of unbiased large-deformation nonlinear image registration. IEEE Trans Med Imaging 26, 822-832.
Liang, P., Shi, L., Chen, N., Luo, Y., Wang, X., Liu, K., Mok, V.C., Chu, W.C., Wang, D., Li, K., 2015. Construction of brain atlases based on a multi-center MRI dataset of 2020 Chinese adults. Sci Rep 5, 18216.
Luders, E., Toga, A.W., 2010. Sex differences in brain anatomy. Progress in brain research. Elsevier, pp. 2-12.
Luo, Y., Shi, L., Weng, J., He, H., Chu, W.C., Chen, F., Wang, D., 2014. Intensity and sulci landmark combined brain atlas construction for Chinese pediatric population. Hum Brain Mapp 35, 3880-3892.
Marcus, D.S., Harms, M.P., Snyder, A.Z., Jenkinson, M., Wilson, J.A., Glasser, M.F., Barch, D.M., Archie, K.A., Burgess, G.C., Ramaratnam, M., Hodge, M., Horton, W., Herrick, R., Olsen, T., McKay, M., House, M., Hileman, M., Reid, E., Harwell, J., Coalson, T., Schindler, J., Elam, J.S., Curtiss, S.W., Van Essen, D.C., Consortium, W.U.-M.H., 2013. Human connectome project informatics: quality control, database services, and data visualization. Neuroimage 80, 202-219.
Mueller, S., Wang, D., Fox, M.D., Yeo, B.T., Sepulcre, J., Sabuncu, M.R., Shafee, R., Lu, J., Liu, H., 2013. Individual variability in functional connectivity architecture of the human brain. Neuron 77, 586-595.
Reiss, A.L., Abrams, M.T., Singer, H.S., Ross, J.L., Denckla, M.B., 1996. Brain development, gender and IQ in children: a volumetric imaging study. Brain 119, 1763-1774.
Richards, J.E., Sanchez, C., Phillips-Meek, M., Xie, W., 2016. A database of age-appropriate average MRI templates. Neuroimage 124, 1254-1259.
Richards, J.E., Xie, W., 2015. Brains for all the ages: structural neurodevelopment in infants and children from a life-span perspective. Adv Child Dev Behav 48, 1-52.
Shi, L., Liang, P., Luo, Y., Liu, K., Mok, V.C.T., Chu, W.C.W., Wang, D., Li, K., 2017. Using large-scale statistical Chinese brain template (Chinese2020) in popular neuroimage analysis toolkits. Front Hum Neurosci 11, 414.
Smaers, J., Steele, J., Case, C., Cowper, A., Amunts, K., Zilles, K., 2011. Primate prefrontal cortex evolution: human brains are the extreme of a lateralized ape trend. Brain, behavior and evolution 77, 67-78.
Studholme, C., Cardenas, V., 2007. Population based analysis of directional information in serial deformation tensor morphometry. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 311-318.
Tang, Y., Hojatkashani, C., Dinov, I.D., Sun, B., Fan, L., Lin, X., Qi, H., Hua, X., Liu, S., Toga, A.W., 2010. The construction of a Chinese MRI brain atlas: a morphometric comparison study between Chinese and Caucasian cohorts. Neuroimage 51, 33-41.
Tang, Y., Zhao, L., Lou, Y., Shi, Y., Fang, R., Lin, X., Liu, S., Toga, A., 2018. Brain structure differences between Chinese and Caucasian cohorts: a comprehensive morphometry study. Hum Brain Mapp 39, 2147-2155.
Van Essen, D.C., Dierker, D.L., 2007. Surface-based and probabilistic atlases of primate cerebral cortex. Neuron 56, 209-225.
Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K., Consortium, W.U.-M.H., 2013. The WU-Minn human connectome project: an overview. Neuroimage 80, 62-79.
Wang, X., Chen, N., Zuo, Z., Xue, R., Jing, L., Yan, Z., Shen, D., Li, K., 2013. Probabilistic MRI brain anatomical atlases based on 1,000 Chinese subjects. PLoS One 8, e50939.
Xie, W., Richards, J.E., Lei, D., Lee, K., Gong, Q., 2015a. Comparison of the brain development trajectory between Chinese and U.S. children and adolescents. Frontiers in Systems Neuroscience 8.
Xie, W., Richards, J.E., Lei, D., Zhu, H., Lee, K., Gong, Q., 2015b. The construction of MRI brain/head templates for Chinese children from 7 to 16 years of age. Dev Cogn Neurosci 15, 94-105.
Yanovsky, I., Thompson, P.M., Osher, S., Leow, A.D., 2007. Topology preserving log-unbiased nonlinear image registration: Theory and implementation. 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 1-8.
Yeo, B.T., Krienen, F.M., Sepulcre, J., Sabuncu, M.R., Lashkari, D., Hollinshead, M., Roffman, J.L., Smoller, J.W., Zollei, L., Polimeni, J.R., Fischl, B., Liu, H., Buckner, R.L., 2011. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J Neurophysiol 106, 1125-1165.
Yoon, U., Fonov, V.S., Perusse, D., Evans, A.C., Brain Development Cooperative, G., 2009. The effect of template choice on morphometric analysis of pediatric brain data. Neuroimage 45, 769-777.
Zhang, Y., Brady, M., Smith, S., 2001. Segmentation of brain MR images through a hidden markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 20, 45-57.
Zhao, T., Liao, X., Fonov, V.S., Wang, Q., Men, W., Wang, Y., Qin, S., Tan, S., Gao, J.H., Evans, A., Tao, S., Dong, Q., He, Y., 2019. Unbiased age-specific structural brain atlases for Chinese pediatric population. Neuroimage 189, 55-70.


Tables

Table 1. Demographic information of the HCP, CHCP and CNV subsets used in this study.

| Dataset | Age range (Mean ± SD) (years) | Number of subjects | Gender (female/male) |
| --- | --- | --- | --- |
| HCP dataset | 22-35 (27.8 ± 3.2) | 800 | 400/400 |
| CHCP dataset | 19-37 (21.5 ± 2.4) | 250 | 121/129 |
| CNV dataset | 18-28 (24.7 ± 3.1) | 32 | 16/16 |

Table 2. Partition scheme of the HCP and CHCP subsets used in this study.

| Dataset | Sample size effect | Template construction | Test sets |
|---|---|---|---|
| HCP dataset | 800 (multi-racial) | 200 (U.S. Caucasian) | 50 (U.S. Caucasian) |
| CHCP dataset | 200 (Chinese Han) | 200 (Chinese Han) | 50 (Chinese Han) |

Table 3. Model fitting performance of brain template variability in the HCP and CHCP datasets.

| | Power function (mDV / mLJD) | Exponential function (mDV / mLJD) | Linear function (mDV / mLJD) |
|---|---|---|---|
| r² (HCP) | 0.92 / 0.94 | 0.90 / 0.91 | 0.80 / 0.82 |
| r² (CHCP) | 0.83 / 0.83 | 0.81 / 0.82 | 0.72 / 0.79 |


Table 4. The model fitting parameters of brain template variability in the HCP and CHCP datasets.

| Parameters | Power function ($f(x) = a \times x^{b} + c$) | Exponential function ($f(x) = a \times e^{b \times x} + c$) | Linear function ($f(x) = a \times x + b$) |
|---|---|---|---|
| a, b, c (HCP, mDV) | 6.200, -0.351, 3.2e-13 | 1.456, -0.007, 0.603 | -0.003, 1.709, 0 |
| a, b, c (HCP, mLJD) | 0.086, -0.248, 4.9e-07 | 0.024, -0.009, 0.018 | -4.4e-05, 0.034, 0 |
| a, b, c (CHCP, mDV) | 6.448, -0.377, 2.2e-14 | 1.823, -0.023, 0.919 | -0.014, 2.252, 0 |
| a, b, c (CHCP, mLJD) | 0.088, -0.225, 7.8e-05 | 0.028, -0.020, 0.026 | -2.1e-04, 0.048, 0 |
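
To make the three fitted models in Table 4 easier to compare, the following minimal Python sketch evaluates them for the mDV index at a few sample sizes, using the parameters listed in the table. The function and variable names are illustrative and not taken from the study. The slope helper returns the plain derivative of the power model; how the derivative was normalized for Figure 1 is not described in this section, so no normalization is attempted here.

```python
import math

# Fitted parameters copied from Table 4 (mDV index only).
POWER_PARAMS = {"HCP": (6.200, -0.351, 3.2e-13), "CHCP": (6.448, -0.377, 2.2e-14)}
EXP_PARAMS = {"HCP": (1.456, -0.007, 0.603), "CHCP": (1.823, -0.023, 0.919)}
LINEAR_PARAMS = {"HCP": (-0.003, 1.709), "CHCP": (-0.014, 2.252)}


def power_model(x, a, b, c):
    """Power function f(x) = a * x**b + c."""
    return a * x ** b + c


def exponential_model(x, a, b, c):
    """Exponential function f(x) = a * exp(b * x) + c."""
    return a * math.exp(b * x) + c


def linear_model(x, a, b):
    """Linear function f(x) = a * x + b."""
    return a * x + b


def power_slope(x, a, b):
    """Derivative of the power model with respect to x (not normalized)."""
    return a * b * x ** (b - 1)


if __name__ == "__main__":
    for dataset in ("HCP", "CHCP"):
        for n in (40, 100, 200):
            p = power_model(n, *POWER_PARAMS[dataset])
            e = exponential_model(n, *EXP_PARAMS[dataset])
            lin = linear_model(n, *LINEAR_PARAMS[dataset])
            a, b, _ = POWER_PARAMS[dataset]
            print(f"{dataset} N={n}: power={p:.3f} exp={e:.3f} "
                  f"linear={lin:.3f} slope={power_slope(n, a, b):.5f}")
```

Under the power model the predicted mDV keeps decreasing as the sample size grows, while the magnitude of the slope shrinks, which is the flattening behaviour that the sample size analysis examines.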

Table 5. Global brain measurements compared between individual Caucasian and Chinese brains, and between female and male brains within the Caucasian and Chinese groups.

| | Caucasian brains (n = 200) | Chinese brains (n = 200) | t | p |
|---|---|---|---|---|
| Length (mm) | 171.13 ± 7.98 | 162.76 ± 6.64 | 12.13 | < 0.001 |
| Width (mm) | 130.57 ± 5.66 | 140.61 ± 6.27 | 19.36 | < 0.001 |
| Height (mm) | 107.01 ± 4.87 | 109.14 ± 4.21 | 5.05 | < 0.001 |

| | Caucasian brains (F) (n = 100) | Caucasian brains (M) (n = 100) | t | p |
|---|---|---|---|---|
| Length (mm) | 166.08 ± 6.07 | 176.17 ± 6.33 | 13.18 | < 0.001 |
| Width (mm) | 128.01 ± 4.85 | 133.13 ± 5.23 | 7.58 | < 0.001 |
| Height (mm) | 104.63 ± 3.97 | 109.38 ± 4.52 | 8.44 | < 0.001 |

| | Chinese brains (F) (n = 100) | Chinese brains (M) (n = 100) | t | p |
|---|---|---|---|---|
| Length (mm) | 161.97 ± 6.14 | 165.60 ± 5.25 | 4.01 | < 0.001 |
| Width (mm) | 138.44 ± 6.74 | 143.66 ± 5.02 | 5.65 | < 0.001 |
| Height (mm) | 110.08 ± 3.89 | 112.91 ± 4.65 | 4.18 | < 0.001 |

Table 6. Global brain measurements between the Caucasian brain template (US200) and Chinese brain template (CN200), Caucasian female (US100(F)) and male (US100(M)), Chinese female (CN100(F)) and male (CN100(M)).

| Template | Length (mm) | Width (mm) | Height (mm) |
|---|---|---|---|
| US200 | 171 | 130 | 107 |
| CN200 | 162 | 141 | 110 |
| US100(M) | 176 | 132 | 111 |
| US100(F) | 166 | 127 | 106 |
| CN100(M) | 164 | 143 | 112 |
| CN100(F) | 161 | 138 | 109 |


Table 7. Regions showing large anatomical differences between the US200 and CN200 templates.

| Brodmann area | mDV | Brodmann area | mLJD |
|---|---|---|---|
| Left angular gyrus, Wernicke’s area (BA 39) | 4.161 | Left angular gyrus, Wernicke’s area (BA 39) | 0.109 |
| Right angular gyrus, Wernicke’s area (BA 39) | 3.561 | Right angular gyrus, Wernicke’s area (BA 39) | 0.090 |
| Left primary visual cortex (BA 18) | 3.001 | Right superior temporal cortex (BA 40) | 0.078 |
| Right temporal pole (BA 38) | 2.937 | Left superior temporal cortex (BA 40) | 0.075 |
| Left medial occipital cortex (BA 17) | 2.795 | Left inferior frontal cortex, Broca’s area (BA 44) | 0.064 |
| Right medial temporal cortex (BA 27) | 2.766 | Right inferior frontal cortex, Broca’s area (BA 44) | 0.062 |
| Left junction of insula, frontal and parietal cortex (BA 43) | 2.715 | Right lateral temporal cortex (BA 21) | 0.061 |
| Right posterior temporal cortex (BA 37) | 2.649 | Right superior temporal gyrus (BA 22) | 0.060 |
| Left inferior frontal cortex, Broca’s area (BA 44) | 2.627 | Left dorsal-lateral prefrontal cortex (BA 9) | 0.060 |
| | | Right anterior-medial temporal cortex (BA 28) | 0.060 |
| | | Left superior parietal cortex (BA 7) | 0.060 |

Regions with indexes greater than the mean plus one standard deviation are listed in descending order in the table.
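
The selection rule stated in the table note (index greater than the mean plus one standard deviation, listed in descending order) can be sketched in a few lines of Python. The region labels and values below are hypothetical placeholders, not data from the study, and the use of the population standard deviation rather than the sample standard deviation is an assumption, since the note does not say which was used.

```python
import statistics


def select_large_differences(region_values):
    """Return (region, value) pairs above mean + 1 SD, largest first.

    region_values: dict mapping a region label to its index value.
    Assumption: the population standard deviation (pstdev) is used.
    """
    values = list(region_values.values())
    threshold = statistics.mean(values) + statistics.pstdev(values)
    selected = [(name, v) for name, v in region_values.items() if v > threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical regional index values, for illustration only.
    example = {
        "region_A": 1.0, "region_B": 1.2, "region_C": 0.9, "region_D": 4.0,
        "region_E": 1.1, "region_F": 3.5, "region_G": 1.0, "region_H": 0.8,
        "region_I": 1.0, "region_J": 1.1,
    }
    for name, value in select_large_differences(example):
        print(name, value)
```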


Table 8. Global brain size changes in the Caucasian and Chinese test sets between the original brain images and the images obtained after registration to the US200 and CN200 templates, and the respective p-values.

| HCP test set | Length (mm) | Width (mm) | Height (mm) |
|---|---|---|---|
| Original brains (HCP) | 169.88 ± 8.58 | 131.68 ± 6.16 | 107.26 ± 4.01 |
| Registered to US200 | 171.02 ± 0.14 | 130.20 ± 0.40 | 106.50 ± 0.67 |
| Registered to CN200 | 161.16 ± 0.37 | 140.08 ± 0.27 | 109.62 ± 0.82 |
| p (Original vs. Registered to US200) | 0.351 | 0.102 | 0.184 |
| p (Original vs. Registered to CN200) | < 0.001 | < 0.001 | < 0.001 |

| CHCP test set | Length (mm) | Width (mm) | Height (mm) |
|---|---|---|---|
| Original brains (CHCP) | 161.92 ± 6.11 | 139.82 ± 4.37 | 108.70 ± 4.27 |
| Registered to CN200 | 160.86 ± 0.35 | 138.66 ± 0.47 | 107.86 ± 0.35 |
| Registered to US200 | 169.24 ± 0.43 | 129.30 ± 0.46 | 105.60 ± 0.53 |
| p (Original vs. Registered to CN200) | 0.234 | 0.071 | 0.171 |
| p (Original vs. Registered to US200) | < 0.001 | < 0.001 | < 0.001 |


Figure legends

Figure 1. Effect of sample size on brain template variability. Left: Averaged (a) mDV and (b) mLJD maps of the HCP dataset projected onto the US200 template obtained with sample sizes of 40 and 200 (N = number of subjects). Right: Relationship between the sample size and (c) mDV and (d) mLJD. The circles indicate the HCP dataset, and the triangles indicate the CHCP dataset. (The dashed lines represent the values of the normalized derivative at sample sizes of 100 and 200.)

Figure 2. Associations of the mDV (a) and mLJD (b) of a spatially distributed regional brain template with increases in the number of subjects in the sample based on the seven resting-state networks atlas (N = number of subjects).

Figure 3. Regional intersubject variability based on the HCP (a) and CHCP (b) datasets, quantified using the mDV and mLJD indexes and shown in a 3D inflated surface view. The colors ranging from red to yellow represent low to high index values, respectively. (c) Correlation between the mDV and mLJD indexes derived for the HCP and CHCP datasets across the whole brain cortex. The visualizations of the 3D inflated surface views were generated using Workbench software (www.humanconnectome.org/software/connectome-workbench).

Figure 4. (a) Differences in global brain tissue volume measurements between the Caucasian (HCP) and Chinese (CHCP) brain images. (b) The validation of global brain tissue volume measurements between HCP and CNV datasets. (c) The validation of global brain tissue volume measurements between CHCP and CNV datasets. (d) Differences in global brain tissue measurements between different genders in the HCP dataset. (e) Differences in global brain tissue measurements between different genders in the CHCP dataset. (*p < 0.001).


Figure 5. Example slices of brain templates and tissue probability atlases. Top: The Caucasian head/brain template (US200) and its tissue probability atlases. Bottom: Chinese head/brain template (CN200) and its tissue probability atlases.

Figure 6. Morphological differences between the Caucasian (US200) and Chinese (CN200) brain templates.

Figure 7. Example slices of the gender-specific templates. These templates include the Caucasian female (US100 (F)) and male (US100(M)) brain templates, and the Chinese female (CN100 (F)) and male (CN100 (M)) brain templates.

Figure 8. Regional anatomical differences between the US200 and CN200 brain templates. Distributions of the regional anatomical differences as indicated by (a) mDV and (b) mLJD are shown in a 3D inflated surface view, and the colors ranging from red to yellow represent low to high index values, respectively. According to both indexes the regions with large anatomical differences are located bilaterally in the supramarginal gyri (part of Wernicke’s area), inferior frontal gyri (part of Broca’s area) and superior temporal cortex. The visualizations of the 3D surface views were generated using Workbench software (www.humanconnectome.org/software/connectome-workbench).

Figure 9. Regional anatomical differences between gender-specific templates in both Caucasian and Chinese. Distributions of the regional anatomical differences as indicated by the mean deformation value (mDV, panel a) and mean logged Jacobian determinant (mLJD, panel b) between US100(F) and US100(M). Regional anatomical differences represented by the mean deformation value (mDV, panel c) and mean logged Jacobian determinant (mLJD, panel d) between CN100(F) and CN100(M). The visualizations of the 3D inflated surface views were generated using Workbench software (www.humanconnectome.org/software/connectome-workbench).

Figure 10. Tissue segmentation accuracies of the US200 and CN200 brain templates for the HCP and CHCP test sets. Dice coefficients between each individual-based segmentation (obtained with the FAST automatic segmentation without any prior information) and the population-matched and population-mismatched brain template-based segmentations were calculated using cross-population validation of the HCP (left column) and CHCP (right column) test sets. The population-matched brain template shows a significantly higher Dice overlap with the individual segmentations (the reference) than the population-mismatched brain template in the tissue classifications of GM, WM and CSF, with probability thresholds ranging from 0.1 to 0.4. Two-sample t-tests comparing the population-matched and population-mismatched templates all yielded p < 0.001.
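
As an illustration of the overlap measure used in Figure 10, the following minimal Python sketch binarizes a tissue probability map at a given threshold and computes the Dice coefficient against a reference mask. The voxel values are hypothetical, the function names are illustrative, and treating a voxel as tissue when its probability is strictly greater than the threshold is an assumption; the legend does not specify the comparison.

```python
def binarize(probabilities, threshold):
    """Turn a flat list of tissue probabilities into a boolean mask.

    Assumption: a voxel counts as tissue when probability > threshold.
    """
    return [p > threshold for p in probabilities]


def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two equal-length boolean masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Two empty masks are treated as a perfect match by convention.
    return 2.0 * intersection / total if total else 1.0


if __name__ == "__main__":
    # Hypothetical four-voxel example, not data from the study.
    template_probabilities = [0.05, 0.2, 0.5, 0.9]
    individual_mask = [False, False, True, True]
    for threshold in (0.1, 0.2, 0.3, 0.4):
        mask = binarize(template_probabilities, threshold)
        print(threshold, dice_coefficient(mask, individual_mask))
```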

Figure 11. Sagittal, coronal, and axial scale values following linear registration of the test sets from the HCP and CHCP datasets to population-matched and population-mismatched brain templates. (a) Fifty T1 images from the HCP test set linearly registered to the US200 and CN200 brain templates. (b) Fifty T1 images from the CHCP test set linearly registered to the CN200 and US200 brain templates.

Figure 12. mDV and mLJD values obtained using the nonlinear SyN registration method following registrations of the test sets from the HCP and CHCP datasets to the US200 and CN200 brain templates. The averaged deformation fields projected onto the US200 and CN200 templates obtained by averaging the (a) mDV and (b) mLJD from the nonlinear registration of the HCP and CHCP test set images to the US200 and CN200 brain templates are shown. Violin plots of the (c) mDV and (d) mLJD are shown.
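
For readers unfamiliar with the two indexes, the sketch below shows one plausible way to compute a mean deformation value and a mean logged Jacobian determinant from a nonlinear registration result. It rests on assumptions that this section does not confirm: mDV is taken as the mean Euclidean length of the voxel displacement vectors, and mLJD as the mean of the (optionally absolute) natural logarithm of the Jacobian determinants. The function names and input values are hypothetical.

```python
import math


def mean_deformation_value(displacements):
    """Mean Euclidean length of (dx, dy, dz) displacement vectors.

    Assumption: mDV is the average displacement magnitude over voxels.
    """
    lengths = [math.sqrt(dx * dx + dy * dy + dz * dz)
               for dx, dy, dz in displacements]
    return sum(lengths) / len(lengths)


def mean_logged_jacobian(jacobians, absolute=True):
    """Mean of log(J) over voxels, where every J must be positive.

    Assumption: absolute=True averages |log J| so that local expansion
    and local contraction do not cancel each other out.
    """
    logs = [math.log(j) for j in jacobians]
    if absolute:
        logs = [abs(v) for v in logs]
    return sum(logs) / len(logs)


if __name__ == "__main__":
    # Hypothetical two-voxel example, not data from the study.
    displacements = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
    jacobians = [math.e, 1.0 / math.e]
    print(mean_deformation_value(displacements))
    print(mean_logged_jacobian(jacobians))
```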
