Genome-wide scanning reveals genetic diversity and signatures of selection in Chinese indigenous cattle breeds


Accepted Manuscript

L. Xu, W.G. Zhang, H.X. Shen, Y. Zhang, Y.M. Zhao, Y.T. Jia, X. Gao, B. Zhu, L.Y. Xu, L.P. Zhang, H.J. Gao, J.Y. Li, Y. Chen

PII: S1871-1413(18)30232-4
DOI: https://doi.org/10.1016/j.livsci.2018.08.005
Reference: LIVSCI 3511

To appear in: Livestock Science

Received date: 2 May 2018
Revised date: 6 August 2018
Accepted date: 6 August 2018

Please cite this article as: L. Xu, W.G. Zhang, H.X. Shen, Y. Zhang, Y.M. Zhao, Y.T. Jia, X. Gao, B. Zhu, L.Y. Xu, L.P. Zhang, H.J. Gao, J.Y. Li, Y. Chen, Genome-wide scanning reveals genetic diversity and signatures of selection in Chinese indigenous cattle breeds, Livestock Science (2018), doi: https://doi.org/10.1016/j.livsci.2018.08.005

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

- 724 individuals were collected from 20 Chinese indigenous cattle breeds.
- Southern cattle have abundant ROH segments and high IBS values.
- Southern cattle have lower genetic diversity than central and northern cattle.
- Selective sweeps related to growth and environmental adaptation were detected.

L. Xu1, W.G. Zhang1, H.X. Shen2, Y. Zhang3, Y.M. Zhao4, Y.T. Jia5, X. Gao1, B. Zhu1, L.Y. Xu1, L.P. Zhang1, H.J. Gao1, J.Y. Li1*, Y. Chen1*

1 Cattle Genetics and Breeding Team, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China;
2 Animal Husbandry and Veterinary Bureau of Yiling, Yichang 44300, China;
3 Xinjiang Academy of Animal Science, Urumqi 830011, China;
4 Jilin Academy of Animal Science, Changchun 130124, China;
5 Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei 230031, China;

* Correspondence: [email protected] (Y.C.); [email protected] (Y.L.); Tel.: +86-010-6281-6065

Email addresses of other authors:
Ling Xu: [email protected]
Wengang Zhang: [email protected]
Hongxue Shen: [email protected]
Yang Zhang: [email protected]
Yumin Zhao: [email protected]
Yutang Jia: [email protected]
Xue Gao: [email protected]
Bo Zhu: [email protected]
Lingyang Xu: [email protected]
Lupei Zhang: [email protected]
Huijiang Gao: [email protected]

Abstract

Chinese indigenous cattle possess abundant genetic resources and an extensive gene pool, with 53 indigenous breeds generally classified into northern, central, and southern cattle breeds. To characterize the population genetic diversity and signatures of selection of Chinese indigenous cattle, we collected 724 cattle from 20 geographically representative Chinese indigenous cattle breeds and genotyped all samples using the GeneSeek Genomic Profiler Bovine LD array (GGP-LD, n = 30,125). Runs of homozygosity (ROH) and identical by state (IBS) analyses were performed to investigate genetic diversity in Chinese indigenous cattle. The integrated Haplotype Score (iHS) and FST-based d_i methods were used to reveal candidate selective sweeps. Our results showed that southern cattle breeds have more abundant ROH segments and higher IBS values than northern and central cattle breeds. We also detected many potential selective sweeps in Chinese indigenous cattle. The genes within intervals spanning the candidate regions are associated with growth and development (NCAPG, LAP3, LCORL, IBSP and MEPE), fertility and reproduction (ABCG2, CATSPER4 and H1FOO), immune functions (AZU1, PROC and LRP1) and environmental adaptation (RBFA, BARX2). Overall, these findings provide new insights into the level of genetic diversity of Chinese indigenous cattle and suggest a role of natural and artificial selection in shaping their genomic variability.

Keywords: Chinese indigenous cattle; Genetic diversity; Signatures of selection


1. Introduction

Cattle represent an excellent model of domestication among the four major Old World livestock species (cattle, goat, sheep and pig), and play an economically important role in agriculture by providing meat, milk, hides for leather, and draught power for pulling carts and ploughing. According to available genetic and archaeological evidence, cattle were domesticated around 8,000-10,000 years ago in the Near East and South Asia, and spread rapidly over nearly all inhabited continents along with human migration (Bradley et al., 1996; Jared and Peter, 2003; Troy et al., 2001; Zhang et al., 2013). Since then, the combined effects of natural and artificial selection have acted on cattle and resulted in marked changes in behavioral, morphological and physiological characteristics.

In China, cattle possess abundant genetic resources and an extensive gene pool (Yu et al., 1999). According to the geographic distribution of Chinese cattle breeds described in Animal Genetic Resources in China: Bovines, the 53 indigenous breeds (Rischkowsky and Pilling, 2007) are generally classified into three types: northern, central, and southern breeds (Zhang et al., 2011). Chinese indigenous cattle are known to be well adapted to diverse environmental conditions and to be resistant or tolerant to disease. Currently, under production-oriented breeding, they are mainly developed for meat, milk, and fertility. Thus, long-term natural selection and artificial selection of varying intensity acting on Chinese indigenous cattle have left detectable signatures of selection on their genomes, and the functionally important candidate regions and genes that contribute to phenotypic diversity can be localized.

Recently, with the genomic data generated by high-throughput genotyping and sequencing technologies, a number of selective sweep studies using different statistical methods have efficiently characterized putative candidate regions and genes in cattle (Daetwyler et al., 2014; Qanbari et al., 2014), pig (Groenen et al., 2012; Rubin et al., 2012), sheep (Kijas et al., 2012) and chicken (Rubin et al., 2010). In Chinese indigenous cattle, Gao et al. (2017) performed a whole-genome selective sweep scan of 437 Chinese indigenous cattle using the population differentiation statistic XtX and detected candidate genes involved in environmental adaptation (TNFRSF19, RFX2). Meanwhile, Mei et al. (2017) adopted the allele frequency spectrum-based Tajima's D and the d_i method to localize signatures of selection in six Chinese indigenous breeds with resequencing data. They detected genes within intervals spanning the candidate regions that were associated with coat color (RECC2, MC1R), dairy traits (NCAPG, PAG1) and meat production (BBS2, R3HDM1). These studies identified potential signatures of selection in Chinese indigenous cattle. However, the number of samples and breeds involved was limited and, most importantly, features of recent selection reflected in the properties of haplotypes segregating within populations were not taken into account.

Here, to better target candidate regions and genes subjected to positive selection and to investigate genetic diversity in Chinese indigenous cattle, we collected 724 individuals from 20 representative breeds, including six northern, eight central and six southern cattle breeds, and obtained SNP genotyping data from GeneSeek Genomic Profiler Bovine LD (GGP-LD) assays (n = 30,125). Previous studies of the genetic diversity of Chinese indigenous cattle were mainly based on microsatellites (Zhang et al., 2007; Zhou et al., 2005) and mtDNA (Cai et al., 2007; Lai et al., 2006). In this study, we used whole-genome genotyping data for runs of homozygosity (ROH) and identical by state (IBS) analyses to elucidate the genetic diversity of indigenous cattle. The whole-genome scan for possible signatures of selection was performed with two complementary methods: the Extended Haplotype Homozygosity (EHH)-based integrated Haplotype Score (iHS) (Sabeti et al., 2002; Voight et al., 2006) and the FST-based d_i (Akey et al., 2010; Weir and Cockerham, 1984). A set of candidate regions and genes related to growth and development, fertility and reproduction, immune functions and environmental adaptation were identified. The results illustrate the level of genetic diversity in Chinese indigenous cattle and pinpoint potential selection hotspots on the bovine genome.

2. Materials and Methods

2.1. Population, SNP genotyping and quality control

According to the description of the geographic distribution of Chinese cattle breeds in Animal Genetic Resources in China: Bovines, a total of 724 individuals of 20 representative indigenous breeds were collected from 14 provinces or municipalities, including six northern breeds, eight central breeds and six southern breeds (Fig. 1A). The six northern cattle breeds were Chinese Simmental (CS), Chinese Caoyuan red (CR), Yanhuang (YH), Menggu (MG), Liaoyu white (LW) and Xinjiang Brown (XB). The eight central cattle breeds were Qinchuan (QC), Nanyang (NY), Jinnan (JN), Luxi (LX), Huangpi (HP), Zaobei (ZB), Wuling (WL) and Yiling (YL). The six southern cattle breeds were Dabieshan (DS), Wenshan (WS), Dianzhong (DZ), Zhaotong (ZT), Nandan (ND) and Longlin (LL). The abbreviation, sample size and geographic region of each breed are summarized in Table 1.

For each individual, ~5 mL of venous blood was collected from the jugular vein and stored at -20°C. DNA was extracted using a TIANamp Blood DNA Kit (Tiangen Biotech Co., Ltd., Beijing, China), and qualified DNA samples were genotyped using the GeneSeek Genomic Profiler Bovine LD Chip (GGP Bovine LDv4). Genotypes for 30,330 SNPs were scanned on the iScan platform and analyzed using GenomeStudio software; the average SNP spacing across the genome is approximately 89 kb.

Quality control of the SNP genotype data was performed using PLINK v1.07 (Purcell et al., 2007) (http://pngu.mgh.harvard.edu/purcell/plink/). Individuals and loci were retained only if they met all of the following five criteria: (1) marker call rate > 0.90; (2) minor allele frequency (MAF) > 0.01; (3) p-value of the Hardy-Weinberg equilibrium (HWE) test higher than 10^-6; (4) SNP located on an autosome; and (5) individual call rate > 0.95. Finally, 724 individuals and 23,748 SNPs were left for downstream analysis.

2.2. Principal components analysis and linkage disequilibrium analysis

Principal component analysis (PCA) was performed using EigenStrat (Price et al., 2006). Before the analysis, to ensure that high LD did not distort the PCA result, SNPs were pruned with a window size of 50 SNPs, a step of 5 SNPs and an r^2 threshold of 0.25, resulting in 9,825 independent SNPs. We also quantified the degree of linkage disequilibrium (LD) for all SNP pairs within 1.5-Mb windows as the squared correlation coefficient (r^2) using PLINK v1.07.

2.3. Genetic diversity analysis

Four statistics, observed heterozygosity (Ho), expected heterozygosity (He), inbreeding coefficient (F) and proportion of polymorphic SNPs (Pn), were calculated. Identical by state (IBS) and runs of homozygosity (ROH) were estimated for each breed to assess genetic relatedness and the level of genomic homozygosity. ROH segments were defined according to the following requirements. First, the length of an ROH should be more than 1000 kb, since very short ROH are common and often occur due to LD. Second, an ROH should contain more than 20 homozygous SNPs and no more than two SNPs with a possible heterozygous genotype. Last, the distance between two consecutive homozygous SNPs within an ROH should be less than 1000 kb; otherwise, the ROH was split in two. All of this work was completed with PLINK v1.07.

2.4. Selective sweep, gene annotation and functional analysis

Evidence of positive selection was investigated using two complementary methods, the integrated Haplotype Score (iHS) and the FST-based d_i. First, within-population signatures of selection were scanned in each breed using the iHS method, which is based on the extent of local long haplotypes carrying the ancestral or derived allele and favors variants that have not yet reached fixation (Voight et al., 2006). For this analysis, haplotypes were phased using Beagle (Browning and Browning, 2007), and the iHS score was calculated for each SNP within each breed using Selscan (Szpiech and Hernandez, 2014). The standardized iHS was calculated as follows:

$$\mathrm{iHS} = \frac{\ln\left(\frac{iHH_A}{iHH_D}\right) - E\left[\ln\left(\frac{iHH_A}{iHH_D}\right)\right]}{SD\left[\ln\left(\frac{iHH_A}{iHH_D}\right)\right]} \tag{1}$$

where iHH_A and iHH_D represent the integrated Extended Haplotype Homozygosity (EHH) scores for the ancestral and derived core alleles, respectively. The top 0.5% of |iHS| scores were used to infer genomic candidate regions under recent positive selection.
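As a minimal sketch of the standardization in Eq. (1), assume the integrated haplotype homozygosity values for the ancestral and derived alleles are already available for each SNP (in practice they come from a tool such as Selscan). The function names and the toy input below are hypothetical. For brevity the mean and SD are taken over all SNPs at once, whereas standardization is often done within derived-allele-frequency bins.

```python
import math
import statistics


def standardized_ihs(ihh_ancestral, ihh_derived):
    """Return (ln(iHH_A / iHH_D) - mean) / SD for every SNP."""
    raw = [math.log(a / d) for a, d in zip(ihh_ancestral, ihh_derived)]
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    return [(x - mean) / sd for x in raw]


def top_fraction(scores, fraction=0.005):
    """Indices of the SNPs whose |score| lies in the top `fraction`."""
    n_keep = max(1, math.ceil(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(order[:n_keep])


# Toy data (hypothetical): 199 SNPs with equal iHH for both alleles and
# one SNP whose ancestral allele sits on an unusually long haplotype.
ihh_a = [1.0] * 199 + [8.0]
ihh_d = [1.0] * 200
scores = standardized_ihs(ihh_a, ihh_d)
candidates = top_fraction(scores)
```

Here `candidates` holds the indices of the SNPs that would be carried forward as candidate sweep signals under the top 0.5% rule.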

Second, the d_i statistic, based on unbiased estimates of FST, was applied to investigate population differentiation among northern, central and southern cattle breeds. This method is robust whether selection acts on newly arisen or pre-existing variation (Akey et al., 2010). Because small samples may carry less representative genetic information, breeds with fewer than 20 individuals were not included. Consequently, three cattle breed groups were used for this analysis: northern cattle breeds (CS, CR, YH and LW), central cattle breeds (QC, JN, HP, ZB, WL and YL) and southern cattle breeds (WS, DZ, ZT, ND and LL). Briefly, for each SNP in a comparison, we calculated the expected value of FST with the Genepop software (Rousset, 2008). The d_i statistic was then calculated as described by Akey et al. (2010):

$$d_i = \sum_{j \neq i} \frac{F_{ST}^{ij} - E\left[F_{ST}^{ij}\right]}{sd\left[F_{ST}^{ij}\right]} \tag{2}$$

where E[F_ST^ij] and sd[F_ST^ij] denote the expected value and standard deviation of FST between groups i and j calculated from all SNPs. For each group, d_i values were averaged over the SNPs in overlapping windows of 500 kb with a 250-kb step. The top 1% of windows ranked by d_i value were defined as candidate selective sweep regions.
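The calculation in Eq. (2) and the window averaging can be sketched as follows, assuming per-SNP pairwise FST values have already been estimated (for example with Genepop). The function names, the dictionary layout and the toy values are hypothetical, and the population standard deviation is used here as one possible choice for sd.

```python
import statistics


def d_i(fst_by_pair, group):
    """Per-SNP d_i for `group`: sum over the other groups j of
    (FST_ij - mean(FST_ij)) / sd(FST_ij), mean and sd taken over all SNPs."""
    pairs = [pair for pair in fst_by_pair if group in pair]
    n_snps = len(fst_by_pair[pairs[0]])
    total = [0.0] * n_snps
    for pair in pairs:
        values = fst_by_pair[pair]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        for k, value in enumerate(values):
            total[k] += (value - mean) / sd
    return total


def window_means(positions, values, size=500_000, step=250_000):
    """Average values in overlapping windows; returns (window_start, mean)
    for every window that contains at least one SNP."""
    out = []
    start = 0
    last = max(positions)
    while start <= last:
        inside = [v for p, v in zip(positions, values) if start <= p < start + size]
        if inside:
            out.append((start, statistics.fmean(inside)))
        start += step
    return out


# Toy input (hypothetical): FST at four SNPs for each pair of groups.
fst = {
    frozenset({"north", "central"}): [0.02, 0.03, 0.02, 0.30],
    frozenset({"north", "south"}): [0.05, 0.04, 0.06, 0.45],
    frozenset({"central", "south"}): [0.03, 0.03, 0.02, 0.04],
}
d_north = d_i(fst, "north")
windows = window_means([100_000, 300_000, 600_000, 900_000], d_north)
```

In this toy input the fourth SNP is strongly differentiated in both comparisons that involve the northern group, so it receives the largest d_i and dominates the last window.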

Finally, candidate regions under selection were retrieved from the Ensembl genome browser (http://www.ensembl.org/) using the Bos_taurus_UMD_3.1.1 reference genome assembly to annotate candidate genes. Gene ontology enrichment and functional annotation of candidate genes were performed with the DAVID database (Huang et al., 2009a; Huang et al., 2009b) to identify significantly relevant pathways, biological processes, cellular components and molecular functions.

3. Results

3.1. Population structure pattern

After quality control, 724 individuals of 20 Chinese indigenous cattle breeds remained (Table 1), and a total of 23,749 SNPs were used in the final analyses. The population structure was then inferred by principal component analysis (PCA), shown as a scatter plot in Fig. 1B. PC1, accounting for 2.98% of total variation, separated all individuals into three distinct clusters, which was consistent with the geographic distribution of the breeds. The central cattle breeds were located between the southern and northern breeds, and individuals of the central breeds NY, LX, YL and WL clustered with the southern cattle breeds LL and ZT. PC2, accounting for 1.44% of total variation, separated the southern breed DS from the other southern breeds (WS, DZ, ZT, LL and ND) as an independent branch. Likewise, the northern breeds MG and LW were separated from the other northern breeds by PC2.

Table 1

Description of observed and expected heterozygosities, proportion of polymorphic SNPs and inbreeding coefficient in 20 Chinese indigenous cattle breeds.

| Breed | Breed abbr. | Number of individuals | Region1 | Type | Ho | He | Pn | F-gene | F-ROH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chinese Simmental | CS | 106 | North | Dual-purpose | 0.34 | 0.34 | 0.97 | 0 | 0.060 |
| Chinese Caoyuan red | CR | 26 | North | Dual-purpose | 0.34 | 0.33 | 0.89 | -0.028 | 0.093 |
| Yanhuang | YH | 33 | North | Dual-purpose | 0.34 | 0.30 | 0.91 | -0.015 | 0.046 |
| Menggu | MG | 15 | North | Dual-purpose | 0.34 | 0.33 | 0.91 | -0.026 | 0.055 |
| Liaoyu white | LW | 20 | North | Dual-purpose | 0.35 | 0.34 | 0.90 | -0.030 | 0.059 |
| Xinjiang Brown | XB | 18 | North | Dual-purpose | 0.31 | 0.29 | 0.80 | -0.071 | 0.149 |
| Qinchuan | QC | 50 | Central | meat | 0.37 | 0.38 | 0.99 | 0.020 | 0.044 |
| Nanyang | NY | 13 | Central | meat | 0.36 | 0.34 | 0.95 | -0.044 | 0.057 |
| Jinnan | JN | 76 | Central | meat | 0.37 | 0.37 | 0.99 | -0.005 | 0.040 |
| Luxi | LX | 15 | Central | meat | 0.36 | 0.35 | 0.96 | -0.020 | 0.055 |
| Huangpi | HP | 25 | Central | meat | 0.44 | 0.38 | 0.98 | -0.014 | 0.014 |
| Zaobei | ZB | 35 | Central | meat | 0.35 | 0.36 | 0.99 | 0.026 | 0.070 |
| Wuling | WL | 30 | Central | meat | 0.35 | 0.35 | 0.98 | 0.004 | 0.055 |
| Yiling | YL | 39 | Central | meat | 0.33 | 0.33 | 0.98 | 0.005 | 0.077 |
| Dabieshan | DS | 47 | South | meat | 0.33 | 0.33 | 0.98 | 0.011 | 0.086 |
| Wenshan | WS | 50 | South | meat | 0.32 | 0.33 | 0.97 | 0.029 | 0.098 |
| Dianzhong | DZ | 49 | South | meat | 0.31 | 0.33 | 0.94 | 0.051 | 0.140 |
| Zhaotong | ZT | 41 | South | meat | 0.36 | 0.36 | 0.98 | 0.015 | 0.057 |
| Nandan | ND | 19 | South | meat | 0.26 | 0.26 | 0.82 | 0.020 | 0.184 |
| Longlin | LL | 17 | South | meat | 0.30 | 0.30 | 0.90 | -0.003 | 0.115 |

1 North refers to northern China; Central refers to central China; and South refers to southern China.
2 Combined data set (Ho, He, Pn, F-gene, F-ROH): the proportion of polymorphic SNPs (Pn), observed heterozygosities (Ho), expected heterozygosities (He) and inbreeding coefficients (F-gene, F-ROH) were calculated using SNPs after quality control.

Fig. 1. Geographical distribution and principal component analysis (PCA) of the 20 indigenous cattle breeds. (A) Geographical distribution of the 20 selected Chinese indigenous cattle breeds, including six northern, eight central and six southern cattle breeds. (B) PCA of the 724 individuals of the 20 breeds studied, in which PC1 explained 2.98% and PC2 explained 1.44% of total variation.

3.2. Genetic diversity assessment

The proportion of polymorphic SNPs (Pn), expected heterozygosity (He), and observed heterozygosity (Ho) ranged from 0.80 (XB) to 0.99 (QC), 0.26 (ND) to 0.38 (HP), and 0.26 (ND) to 0.44 (HP), respectively (Table 1). Pn was generally high, with more than 90% of loci polymorphic in all breeds except XB, CR and ND. In terms of Ho, the average value of the northern breeds (CS, CR, YH, MG and LW) was higher than that of the southern breeds (WS, DS, DZ, LL, and ND), but lower than that of most central breeds. A lower Ho value tends to reflect a lower level of genetic variability, as seen in the lowest Ho values in the southern breeds ND (Ho = 0.26) and LL (Ho = 0.30). In addition, according to the F-gene and F-ROH inbreeding coefficient estimates, southern breeds presented a higher inbreeding level than northern and central breeds; for instance, the southern breed DZ had the highest F-gene estimate (0.051) together with a high F-ROH estimate (0.140).
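For reference, the diversity statistics reported here can be sketched from genotype counts as follows. This is an illustration only: it assumes biallelic SNPs coded as 0, 1 or 2 copies of one allele with no missing calls, uses F = 1 - Ho/He as one common estimator (not necessarily the estimator PLINK applies), and the function name and toy genotypes are hypothetical.

```python
def diversity_stats(genotypes):
    """genotypes: one list per SNP holding 0/1/2 allele counts per animal.
    Returns (Ho, He, Pn, F) averaged over SNPs, with F = 1 - Ho / He."""
    ho_sum = 0.0
    he_sum = 0.0
    polymorphic = 0
    for snp in genotypes:
        n = len(snp)
        p = sum(snp) / (2 * n)                       # frequency of the counted allele
        ho_sum += sum(1 for g in snp if g == 1) / n  # share of heterozygotes
        he_sum += 2 * p * (1 - p)                    # Hardy-Weinberg expectation
        polymorphic += 0 < p < 1
    m = len(genotypes)
    ho = ho_sum / m
    he = he_sum / m
    return ho, he, polymorphic / m, 1 - ho / he


# Toy data (hypothetical): three SNPs typed in four animals.
ho, he, pn, f = diversity_stats([[0, 1, 1, 2], [0, 0, 2, 2], [2, 2, 2, 2]])
```

A positive F, as in this toy example, corresponds to fewer heterozygotes than expected, which is the pattern reported for the southern breeds.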

Runs of homozygosity (ROH) are contiguous stretches of homozygous genotypes that are present in an individual because the parents transmitted identical haplotypes to their offspring (Purfield et al., 2012). First, we divided ROH into seven length categories (1-5 Mb, 5-10 Mb, 10-15 Mb, 15-20 Mb, 20-25 Mb, 25-30 Mb and >30 Mb) to reflect the distribution of ROH length within breeds (Fig. 2A). The average ROH length of each category was calculated for each breed by summing all ROH segments per individual in that length category and dividing by the number of individuals of the respective breed. Among the 20 cattle breeds, the average length in the short ROH category (1-5 Mb) ranged from 25.78 Mb (HP) to 315.5 Mb (ND). Notably, the southern breeds ND, LL and DS presented longer average ROH length in the 1-5 Mb category than the northern breeds (MG, CR and YH) and central breeds (QC, JN and HP). In the long ROH category (>30 Mb), most breeds had low average ROH length, whereas the southern breeds DZ and ND and the northern breed XB displayed an abundance of long ROH segments. Second, the sum of ROH length (in Mb) per individual genome was calculated to assess the ROH content of each breed (Fig. 2B). Southern breeds had more ROH content than northern and central breeds. Remarkably, among the 724 individuals, the three most homozygous individuals came from the southern breeds DZ and ND, with ROH content of 1020 Mb, 1000 Mb and 918.54 Mb, respectively, which is more than a quarter of the whole genome. Furthermore, individuals of ND generally presented abundant ROH content and had the highest average sum of ROH, followed by XB (373.39 Mb) and DZ (349.82 Mb). By contrast, individuals of the central breed HP displayed the lowest average sum of ROH (34.79 Mb), and the individual with the least ROH content had only 4.87 Mb. Finally, we counted the number of ROH for each breed. As expected, southern breeds had more ROH segments than northern and central breeds. The numbers of ROH segments in ND, LL and DS were 139, 104 and 79, in contrast to 34, 34 and 14 in QC, JN and HP. Additionally, ROH content was summarized across all chromosomes: the number of ROH per chromosome was greatest for chromosome 5 (2276) and least for chromosome 26 (601). On the other hand, the fraction of the chromosome covered by ROH was greatest on chromosomes 13 and 19, with 9.61% and 9.48% of the chromosome consisting of ROH, respectively.
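The ROH criteria given in Section 2.3 (length above 1000 kb, more than 20 homozygous SNPs, at most two heterozygous SNPs, and neighbouring SNPs less than 1000 kb apart) can be illustrated with a simplified greedy scan along one chromosome. This is a sketch under stated assumptions, not PLINK's sliding-window procedure: the function name, the 0/1/2 genotype coding and the toy data are hypothetical, and missing genotypes are not handled.

```python
def find_roh(positions, genotypes, min_len=1_000_000, min_hom=20,
             max_het=2, max_gap=1_000_000):
    """Greedy scan of one chromosome.

    positions: sorted base-pair coordinates; genotypes: 0 or 2 for
    homozygous calls and 1 for heterozygous calls.
    Returns (start_bp, end_bp, n_snps) for each run that is longer than
    min_len, has more than min_hom homozygous SNPs, at most max_het
    heterozygous SNPs and no gap of max_gap or more between neighbours.
    """
    runs = []
    start, hets = 0, 0
    n = len(positions)
    for i in range(n + 1):
        at_end = i == n
        gap = (not at_end and i > start
               and positions[i] - positions[i - 1] >= max_gap)
        extra_het = (not at_end and not gap
                     and genotypes[i] == 1 and hets == max_het)
        if at_end or gap or extra_het:
            last = i - 1
            n_hom = (last - start + 1) - hets
            if (last >= start and n_hom > min_hom
                    and positions[last] - positions[start] > min_len):
                runs.append((positions[start], positions[last], last - start + 1))
            # Restart after the offending heterozygote, or at the SNP after the gap.
            start, hets = (i + 1, 0) if extra_het else (i, 0)
            if extra_het:
                continue
        if not at_end and genotypes[i] == 1:
            hets += 1
    return runs


# Toy chromosome (hypothetical): 30 SNPs spaced 50 kb apart, one heterozygote.
positions = [k * 50_000 for k in range(30)]
genotypes = [1 if k == 10 else 0 for k in range(30)]
runs = find_roh(positions, genotypes)
total_mb = sum(end - begin for begin, end, _ in runs) / 1e6
```

Summing `total_mb` over chromosomes gives the per-individual ROH content of the kind plotted in Fig. 2B.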

Fig. 2. Genetic diversity assessment for each breed. (A) Average sum of runs of homozygosity (ROH) of each breed in different ROH length categories. (B) Sum of ROH length (Mb) per individual genome in each breed. (C) Genetic relatedness shown as the average IBS value for pairwise comparisons of breeds.

The average identical by state (IBS) value of each breed was generated to show the genetic relatedness between indigenous cattle breeds (Fig. 2C). The mean pairwise IBS values between breeds ranged from 0.04 (ND vs LW) to 0.47 (ND vs LL). The mean IBS between northern and southern breeds was generally lower than 0.15, and the values for the southern cattle breeds ND and LL versus most northern breeds (LW, XB, MG and YH) did not exceed 0.10. In contrast, cattle breeds from the same geographic region (north, central or south) were related to each other (IBS > 0.33), for example ND vs LL (0.47) and DZ vs WS (0.37). Within breeds, individuals of southern breeds displayed more relatedness than those of central breeds. The highest and lowest average IBS values were observed within ND (0.52) and QC (0.33), respectively. Taken together, northern and southern breeds formed distinct clusters based on the IBS analysis. This was in accordance with the PCA pattern, which showed a clear demarcation between the northern and southern breed clusters.

3.3. Linkage disequilibrium assessment

Linkage disequilibrium analysis of the SNP panel revealed a non-uniform distribution of LD in Chinese indigenous cattle. As shown in Fig. 3, the level of LD decreased as the physical distance between markers increased and gradually reached a steady state when the distance extended to 1 Mb across all breeds. At an inter-marker distance of 1.5 Mb, the LD level ranged from 0.06 (QC) to 0.19 (XB). From 0 to 1 Mb, XB, CR and LW presented higher LD levels and slower LD decay, whereas QC, JN, YL and CS had lower LD levels that dropped quickly with increasing distance. XB displayed the smoothest LD decay curve, with r^2 decreasing from 0.52 to 0.20, followed by CR (0.46 to 0.13) and LW (0.45 to 0.12), which indicated a smaller effective population size in XB. In contrast, QC displayed rapid decay, with r^2 decreasing from 0.34 to 0.07, which may be attributed to long-term natural selection acting on this breed.
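The r^2 values behind such decay curves can be illustrated with the squared Pearson correlation of genotype allele counts at two SNPs, a common approximation when haplotype phase is unknown. The function name and toy genotypes are hypothetical, and the sketch assumes both SNPs are polymorphic in the sample.

```python
def r_squared(snp_a, snp_b):
    """Squared Pearson correlation between two vectors of 0/1/2 allele counts."""
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    return cov * cov / (var_a * var_b)


# Toy genotypes (hypothetical) for four animals.
linked = r_squared([0, 1, 2, 1], [0, 1, 2, 1])       # identical genotype patterns
independent = r_squared([0, 2, 0, 2], [0, 0, 2, 2])  # uncorrelated patterns
```

Averaging such pairwise values within bins of inter-marker distance gives a decay curve of the kind shown in Fig. 3.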

Fig. 3. Average linkage disequilibrium (r^2) as a function of average genomic distance for 20 Chinese indigenous breeds. The level of LD was estimated for SNP pairs at distances < 1500 kb.

3.4. Signatures of selection

iHS scores were computed over the whole genome for the 20 indigenous cattle breeds to infer recent selective sweeps within populations. Fig. 4 depicts the distribution of iHS scores for some representative breeds to visualize candidate selective sweeps. Table 2 summarizes the main candidate regions and genes with significant |iHS| values in each breed. For example, the selective sweeps in the central breeds of Hubei province (HP, ZB, WL and YL) showed that similar selection events occurred in two significant candidate regions. One region, on chromosome 3 (32.87-33.37 Mb), spanned members of the potassium channel gene family (KCNA2, KCNA3, KCNC4, KCNA10), and the other, on chromosome 7 (44.90-45.30 Mb), contained the AZU1 and KISS1R genes. These results suggest that alleles in the two regions have undergone positive selection, with higher LD and haplotype homozygosity.

Fig. 4. Circos plot of genome-wide iHS values illustrating the candidate regions under positive selection. Six breeds are displayed: YH, CS, QC, WL, DS and WS, from the outer circle to the inner circle. Red indicates the potential selective sweep regions in each breed.

Table 2

Main candidate regions and genes detected by iHS in each breed.

| Chromosome | Region (Mb) | iHS value | Gene name |
| --- | --- | --- | --- |
| 6 | 38.57-38.97 | 5.49 | NCAPG, ABCG2, LAP3, LCORL, FAM184B, MED28 |
| 6 | 41.71-43.01 | 4.61 | KCNIP4, SLIT2 |
| 2 | 127.3-127.6 | 5.10 | TRIM63, SLC30A2, EXTL1, PDIK1L |
| 22 | 56.42-56.82 | 4.12 | H1FOO |
| 22 | 58.30-58.90 | 4.11 | WNT7A |
| 2 | 49.45-50.45 | 4.83 | IWS1, PROC |
| 3 | 32.87-33.37 | 5.43 | KCNA2, KCNA3, KCNC4, KCNA10 |
| 29 | 29.05-29.75 | 4.14 | CDON, PATE2 |
| 29 | 50.22-50.62 | 3.72 | TNNT2, TNNT3, LSP1 |
| 5 | 117.2-118.0 | 3.70 | CDPF1, CERK, GRAMD4, GTSE1, PKDREJ, PPARA |
| 6 | 38.00-38.40 | 5.16 | IBSP, MEPE |
| 3 | 33.46-33.86 | 4.35 | STRIP1, ALX3 |
| 23 | 2.47-2.87 | 4.77 | PRIM2, U6 |
| 5 | 55.90-56.20 | 4.71 | ARHGEF25, B4GALNT1, DTX3 |
| 3 | 32.87-33.37 | 5.42 | KCNA2, KCNA3, KCNC4, KCNA10 |
| 7 | 45.05-45.50 | 4.14 | AZU1, KISS1R |
| 16 | 25.75-26.15 | 4.66 | DUSP10 |
| 3 | 32.87-33.37 | 4.29 | KCNA2, KCNA3, KCNC4, KCNA10 |
| 7 | 45.05-45.50 | 4.01 | AZU1, KISS1R |
| 11 | 10.77-11.27 | 3.90 | ALMS1 |
| 3 | 32.87-33.37 | 4.64 | KCNA2, KCNA3, KCNC4, KCNA10 |
| 7 | 44.90-45.30 | 4.17 | AZU1, KISS1R |
| 18 | 67.00-67.50 | 4.03 | NLRP5, ZNF787 |
| 2 | 127.4-127.5 | 3.60 | CATSPER4, CNKSR1, LLGL2 |
| 15 | 24.06-26.46 | 5.96 | NCAM1, TTC12 |
| 3 | 32.87-33.37 | 5.14 | KCNA2, KCNA3, KCNC4, KCNA10 |
| 6 | 38.57-38.97 | 5.11 | NCAPG, ABCG2, LAP3, LCORL, FAM184B, MED28 |
| 15 | 24.06-26.46 | 4.23 | NCAM1, TTC12 |
| 8 | 74.75-75.25 | 3.97 | PPP2R2A, BNIP3L, DPYSL2 |
| 20 | 22.15-22.45 | 5.75 | - |
| 23 | 39.30-39.70 | 4.67 | KIF13A, U6 |
| 21 | 51.54-51.74 | 5.32 | SLC24A4 |
| 1 | 132.3-132.6 | 4.79 | - |
| 3 | 32.87-33.37 | 4.78 | KCNA2, KCNA3, KCNC4, KCNA10 |
| 7 | 44.90-45.30 | 4.17 | AZU1, KISS1R |
| 18 | 62.46-62.86 | 4.61 | ISOC2, NAT14, RPL28, SHISA7, SSC5D |

Breed labels given in the table (row assignment not recoverable from the source): YB, XB, CS, CR, WL, YL, ZB, HP, MG, LW, LX, ZT, ND, DS, DZ, JN, QC.

Cells of six further rows that could not be matched to one another: chromosomes 19, 23, 19, 5, 9 and 20; regions (Mb) 119.2-119.5, 22.15-22.45, 39.13-39.23, 56.42-56.72, 10.88-11.18 and 56.20-56.90; iHS values 3.43, 4.65, 5.03, 4.29, 4.92 and 5.75; gene names TBX2; DEK, TPMT, NHLRC1; ITGB4, LLGL2, RECQL5, SMIM5, SMIM6; DCTN2, DDIT3, MARS, ARHGAP9, NXPH4, R3HDM2, RDH16, SHMT2, STAC3, TAC3; and two entries without a gene (-).

In addition, the FST-based d_i statistic was used to detect signatures of selection among northern, central and southern cattle breeds (Akey et al., 2010). In total, 66, 87 and 90 genes within candidate regions were detected in northern, central and southern cattle breeds, respectively. Five candidate regions with significant d_i values in each of the northern, central and southern cattle breed groups are summarized in Table 3. Overall, a total of 223 candidate genes participated in 38 significant functional terms, including biological processes (13 terms), cellular components (5 terms), molecular functions (6 terms) and KEGG pathways (14 terms). The top three pathways were the AMPK signaling pathway, insulin resistance and adrenergic signaling in cardiomyocytes, each with a P-value lower than 0.01.

Table 3

Candidate regions and genes detected by d_i in northern, central and southern cattle breeds.

| Breed1 | Chromosome | Region (Mb) | d_i value | Gene name |
| --- | --- | --- | --- | --- |
| Northern cattle breeds | 24 | 0.00-0.50 | 5.43 | ADNP2, RBFA |
| Northern cattle breeds | 7 | 62.75-63.25 | 4.91 | PDGFRB, CAMK2A, TCOF1 |
| Northern cattle breeds | 27 | 32.50-33.00 | 4.46 | FGFR1 |
| Northern cattle breeds | 23 | 19.50-20.00 | 4.38 | RCAN2, PLA2G7 |
| Northern cattle breeds | 13 | 58.75-59.25 | 4.27 | PCK1 |
| Central cattle breeds | 24 | 0.00-0.50 | 5.94 | ADNP2, RBFA |
| Central cattle breeds | 13 | 58.50-58.60 | 4.85 | SYCP2, PHACTR3 |
| Central cattle breeds | 8 | 60.25-60.75 | 4.42 | TPM2, TLN1 |
| Central cattle breeds | 7 | 51.50-52.00 | 4.34 | SLC23A1, CXXC5 |
| Central cattle breeds | 4 | 65.75-66.25 | 3.80 | GHRHR, AQP1, GARS |
| Southern cattle breeds | 2 | 27.25-27.75 | 4.53 | ABCB11, G6PC2 |
| Southern cattle breeds | 3 | 19.25-19.75 | 4.18 | TUFT1, CGN, SELENBP1, TMOD4, TNFAIP8L2, GABPB2 |
| Southern cattle breeds | 29 | 33.00-33.50 | 4.08 | BARX2 |
| Southern cattle breeds | 16 | 67.00-67.50 | 4.00 | MMEL1 |
| Southern cattle breeds | 5 | 26.00-26.50 | 3.89 | HOXC4, HOXC5, HOXC6, HOXC8, HOXC9, HOXC10 |

1 Northern cattle breeds contain CS, CR, YH and LW; Central cattle breeds contain QC, JN, HP, ZB, WL and YL; Southern cattle breeds contain WS, DZ, ZT, ND and LL.

4. Discussion

Our study investigated the genetic diversity and signatures of positive selection in 724 individuals from 20 Chinese cattle breeds using genome-wide SNP data. The population genetic analyses showed that the genetic diversity of southern cattle breeds was lower than that of central and northern breeds. This was supported by the greater abundance of ROH segments in southern cattle than in central and northern cattle, and by the higher level of genetic relatedness reflected in the high IBS values within the southern population.

Previous studies based on microsatellites and mtDNA showed that genetic diversity varies among Chinese cattle and confirmed that southern cattle display low nucleotide and haplotype diversity (Cai et al., 2007; Lai et al., 2006). In the present study, southern breeds showed lower average observed heterozygosity (Ho) than central breeds. Gao et al. (2017) assessed Ho in Chinese indigenous cattle using a 50K SNP assay and likewise found that central and northern cattle carry more genomic variation than southern cattle. The higher sum of ROH length across southern cattle genomes further supports their lower level of genetic diversity. Among all indigenous breeds, the southern breed ND had the highest average sum of ROH length in the short category (1-5 Mb), which indicates more ancient genetic relatedness than in other breeds (McQuillan et al., 2008). Because a previous study identified a high correlation between pedigree-based and ROH-based inbreeding coefficients (Kayser et al., 2010), the highest average sum of ROH length in the long category (>30 Mb) suggests that the southern breed DZ has experienced more recent inbreeding, which is consistent with DZ having the highest estimated F-gene value (0.051). In contrast, central cattle presented high Ho values and few ROH segments, evidence of a high level of genetic diversity. This is consistent with previous reports that admixture between B. taurus and B. indicus in central breeds is likely to have increased genomic variation (Lei et al., 2000; Lei et al., 2006).
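The ROH summaries discussed above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes ROH segments have already been called for one animal, and it uses the common definition of F_ROH as total ROH length divided by autosomal genome length (McQuillan et al., 2008). The function name, the intermediate 5-30 Mb class and the 2,500 Mb autosome length used in the usage note are hypothetical placeholders.

```python
def summarise_roh(roh_lengths_mb, genome_length_mb, classes):
    """Sum ROH length per length class and compute F_ROH for one animal.

    classes: list of (label, lower_mb, upper_mb); a segment of length L
             is assigned to the first class with lower_mb <= L < upper_mb.
    """
    sums = {label: 0.0 for label, _, _ in classes}
    for length in roh_lengths_mb:
        for label, lower, upper in classes:
            if lower <= length < upper:
                sums[label] += length
                break
    # F_ROH: proportion of the autosomal genome covered by ROH
    f_roh = sum(roh_lengths_mb) / genome_length_mb
    return sums, f_roh


# Only the 1-5 Mb and >30 Mb classes are named in the text;
# the middle class is an assumption made for this example.
CLASSES = [("1-5Mb", 1, 5), ("5-30Mb", 5, 30), (">30Mb", 30, float("inf"))]
```

Calling `summarise_roh([1.2, 3.4, 7.5, 32.0], 2500.0, CLASSES)` returns the per-class sums and the F_ROH of a single animal; breed-level figures would be averages of these values across animals.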

Moreover, the central breeds QC, JN, LX and NY have a long history of domestication and breeding and show good adaptability and performance in various environments, which in turn contributes to their higher genetic diversity. On the other hand, the homozygosity of southern breeds may be overestimated because of SNP ascertainment bias, leading to more ROH segments being detected in southern breeds. In addition, the genetic relatedness analysis revealed closer relatedness among southern cattle breeds, such as ND vs LL (IBS=0.47) and DZ vs WS (IBS=0.37), which is consistent with the high average ROH length of southern breeds in the 1-5 Mb category. This is probably attributable to southern breeds sharing recent common ancestors (Gao et al., 2017).

Relatedness was lowest between northern and southern cattle, which is consistent with the reported north-south gradient of taurine and indicine ancestry in Chinese indigenous cattle (Lei et al., 2006). Previous studies based on population structure analysis suggested an E. taurine origin for northern cattle breeds and an A. indicine origin for southern cattle breeds (Gao et al., 2017; Zhang et al., 2007). According to historical domestication records, E. taurine cattle spread into northern China between approximately 3,000 and 2,000 BC and subsequently expanded to the central plains between 2,500 and 1,900 BC, whereas A. indicine cattle appeared in southern China after 1,500 BC (Flad et al., 2009; Cai et al., 2014; Yue et al., 2014; Chen et al., 2010). The migrations of A. indicine and E. taurine cattle therefore historically shaped the southern and northern indigenous breeds, respectively. Additionally, natural barriers such as the Qinling Mountains may have hampered the southward flow of northern cattle breeds (Cai et al., 2006).

Moreover, the extent and decay pattern of LD was breed-specific in indigenous cattle because of each breed's unique selection history and population structure. Most Chinese indigenous cattle exhibited rapid LD decay and a low level of LD, with the exception of XB, which showed the slowest LD decay. This may be explained by the fact that XB, as a cultivated breed, has been subjected to a period of intensive selection for breeding objectives (Zhang et al., 2011). Nevertheless, compared with worldwide cattle breeds under strong commercial selection, such as Holstein, Simmental and Angus, the LD of Chinese indigenous breeds was relatively low (de Roos et al., 2008; McKay et al., 2007; Porto-Neto et al., 2014; Sargolzaei et al., 2008).
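The LD decay comparison above can be made concrete with a small sketch. It assumes that LD is measured as r^2 between pairs of biallelic SNPs on phased haplotypes coded 0/1 and that pairwise values are averaged in bins of inter-marker distance; this section does not restate which LD measure, bin width or maximum distance was used, so the function names and default values below are illustrative assumptions.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def ld_decay(haplotypes, positions_bp, bin_bp=10_000, max_dist_bp=100_000):
    """Mean r^2 per distance bin on one chromosome.

    haplotypes: one list of 0/1 alleles per SNP, in the same order as
                positions_bp. Returns {bin start in bp: mean r^2}.
    """
    totals = {}
    for i in range(len(positions_bp)):
        for j in range(i + 1, len(positions_bp)):
            dist = abs(positions_bp[j] - positions_bp[i])
            if dist > max_dist_bp:
                continue
            b = dist // bin_bp
            total, count = totals.get(b, (0.0, 0))
            totals[b] = (total + r_squared(haplotypes[i], haplotypes[j]), count + 1)
    return {b * bin_bp: total / count for b, (total, count) in sorted(totals.items())}
```

A breed with rapid LD decay would show mean r^2 dropping quickly across successive distance bins, whereas a breed such as XB would retain higher values at longer distances.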

Previous scans for signatures of selection in Chinese indigenous cattle focused on the allele frequency spectrum (Gao et al., 2017; Mei et al., 2017). For a more comprehensive investigation of potential selection signals, our study adopted two complementary methods, the integrated haplotype score (iHS) and di. Notably, only two genes, U4 and U6, both involved in the regulation of mRNA processing, were detected by both methods; no other candidate genes overlapped. The lack of overlap between iHS and di may be explained by the greater power of iHS to detect regions where alleles are at intermediate frequency rather than having reached fixation (Voight et al., 2006). In addition, the two methods target selection on different time scales: iHS is suited to detecting signatures of recent selection, whereas di is suited to earlier selection events (Akey et al., 2010). Overall, our study identified genes in the putative candidate regions that are involved in many biological processes in Chinese indigenous cattle, including growth and development, fertility and reproduction, immune function and environmental adaptation.
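For reference, the iHS statistic contrasted with di above is conventionally defined as in Voight et al. (2006). The notation below restates that definition as an assumption, since the formula is not repeated in this section: iHH_A and iHH_D are the integrated extended haplotype homozygosity of the ancestral and derived alleles at the core SNP, and the expectation and standard deviation are estimated from SNPs whose derived allele frequency p matches that of the core SNP.

```latex
\[
\mathrm{iHS} =
\frac{\ln\!\left(\dfrac{iHH_A}{iHH_D}\right)
      - \mathrm{E}_p\!\left[\ln\!\left(\dfrac{iHH_A}{iHH_D}\right)\right]}
     {\mathrm{SD}_p\!\left[\ln\!\left(\dfrac{iHH_A}{iHH_D}\right)\right]}
\]
```

A large absolute iHS marks a SNP at which one allele sits on unusually long haplotypes, a pattern expected while a selected allele is still segregating at intermediate frequency, which fits the explanation given for the limited overlap with di.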

Using the iHS method, several candidate genes related to growth and development were identified as targets of positive selection. NCAPG, LAP3 and LCORL on chromosome 6, which are involved in body weight and height, were detected in the northern breed CS and the central breed HP. NCAPG affects bovine carcass weight (Setoguchi et al., 2009), fetal growth (Eberlein et al., 2009) and body frame size at puberty in cattle (Setoguchi et al., 2011), and has been considered a bovine carcass weight QTL (Eberlein et al., 2009; Setoguchi et al., 2009). In addition, many genome-wide association studies have found significant associations between markers located in the NCAPG-LCORL locus and LAP3 and average daily gain in cattle (Lindholm-Perry et al., 2013; Lindholm-Perry et al., 2011) and other livestock (Al-Mamun et al., 2015; Liu et al., 2013; Tetens et al., 2013). Mei et al. (2017), however, characterized NCAPG as responding to selection for milk production in dairy cattle.

We also found that a panel of candidate genes associated with skeletal development (BMPR1A, PKD2, IBSP and MEPE) was positively selected in the northern breed LW and the southern breed DS. The IBSP and MEPE genes have been primarily associated with bone and cartilage morphogenesis (Rowe et al., 2000). MEPE, which belongs to a cluster of bone-tooth mineral extracellular matrix (ECM) phosphoglycoproteins, plays a role in bone-related traits in humans, mice and cattle (Bouleftour et al., 2015; Ormsby et al., 2014; Zelenchuk et al., 2015). IBSP encodes a significant component of the bone extracellular matrix and has been shown to influence skeletal development in knockout mice (Rivadeneira et al., 2009).

Other candidate genes related to fertility and reproduction were also identified, including ABCG2 in CS, LW, WL and HP, CATSPER in ZB, H1foo in CR and NLRP5 in ZB. Sperm ion channel protein 4 (CATSPER4) is essential for sperm hyperactivated motility and male fertility (Qi et al., 2007) and influences the function of ejaculated sperm in cattle (Johnson et al., 2017). H1foo is crucial for bovine oocyte maturation (Yun et al., 2015) and participates in the activation or repression of genes during oogenesis and embryo development before embryonic genome activation (McGraw et al., 2006).

Chinese indigenous cattle show great adaptability and performance in various agro-ecological environments. The whole-genome scan captured a set of important selection signals related to immune function (SH3BGRL3 in ZB and ZT; PROC in YH; LRP1 in DS and LX; ELANE in YL, ZB and ZT; AZU1 in YL, ZB and WL). Interestingly, the innate immunity gene AZU1, which increases vascular permeability, binds endotoxin and is chemotactic for monocytes (Müller et al., 2005), was identified in three central breeds from Hubei province (YL, ZB and WL). This suggests that similar selection pressures may have acted on their genomes. Notably, candidate genes encoding voltage-gated potassium channels (KCNA2, KCNA3, KCNC4 and KCNA10) were found in many indigenous cattle breeds. These channels have diverse functions, including neurotransmitter release, regulation of heart rate, insulin secretion and neuronal excitability; some, such as KCNA3, are expressed in T and B lymphocytes and are involved in autoimmunity (DeCoursey et al., 1984; Gutman et al., 2005). Overall, these putative candidate genes may reflect genomic hotspots of selection related to immune function in indigenous cattle.

Selective sweeps based on population differentiation were detected for northern, central and southern cattle breeds. We found that candidate genes residing in significantly selected regions directly or indirectly influence environmental adaptation, morphological development and meat quality (Table 3). Similarly, Gao et al. (2017) detected potential selective sweeps associated with morphological development and environmental adaptation in Chinese indigenous cattle. The candidate region with the highest di value in northern cattle breeds contained the ADNP2 and RBFA genes; RBFA is a cold-shock protein whose absence triggers the cold-shock response (Dammel and Noller, 1995; Jones and Inouye, 1996). Positive selection acting on this candidate region may therefore have helped northern cattle adapt to cold environments. In southern cattle breeds, the candidate regions included BARX2, a member of the Bar class of homeobox genes that participates in hair follicle development in humans and animals (Olson et al., 2005; Sander et al., 2000). This is consistent with the fact that southern cattle have sparse, short hair, which helps them lose heat and adapt to hot and humid environments. Moreover, Ai et al. (2015) scanned for selection signals in southern Chinese indigenous pigs using whole-genome sequencing data and likewise found BARX2 to be under positive selection. In addition, we detected a cluster of homeobox genes (HOXC4, HOXC5, HOXC6, HOXC8, HOXC9 and HOXC10) in southern cattle breeds. Homeobox genes are key regulators of morphogenesis in all multicellular organisms; for instance, their spatial and temporal deployment is responsible for the development of the spinal axis, hair follicles, teeth and mammary glands (Duverger and Morasso, 2008; Stelnicki et al., 1998). In central cattle breeds, TPM2 and GARS, which are associated with meat quality, were localized in two separate candidate regions. Choi et al. (2015) previously identified TPM2 as related to lipid accumulation and marbling in Hanwoo using whole-genome resequencing data. Muscle inosine monophosphate (IMP) is one of the most important flavour components of meat, and several studies suggest that the GARS-AIRS-GART genes influence IMP content in chickens (Shu et al., 2009; Ye et al., 2010). These two putative signatures of selection may be in accordance with the fact that the central breeds LX and JN are well known for their pronounced marbling and distinctive meat flavor.

5. Conclusions

In this study, we comprehensively investigated the genome-wide genetic diversity and signatures of selection of 20 representative Chinese indigenous cattle breeds. The southern cattle breeds presented lower genetic diversity than central and northern cattle breeds, with abundant runs of homozygosity (ROH) and high identity-by-state (IBS) values. We also detected many potential selective sweeps within or between populations using the iHS and di methods. The genes within the intervals spanning the candidate regions are associated with growth and development, fertility and reproduction, immune function and environmental adaptation. In general, our study provides new insights into the level of genetic diversity of Chinese indigenous cattle and suggests a role of natural and artificial selection in shaping their genomic variability.

Acknowledgments

This work was funded in part by the National Natural Science Foundation of China (31402039), the National Beef Cattle Industrial Technology System (CARS-37), the Species and Breed Resources Conservation of China's Ministry of Agriculture (2017-2019), the Agricultural Science and Technology Innovation Program of Chinese Academy of Agricultural Sciences (ASTIPIAS03), the Cattle Breeding Innovative Research Team of Chinese Academy of Agricultural Sciences (cxgc-ias-03), and China Scholarship Council (CSC). We are grateful to all scientists and staff of the National Beef Cattle Industrial Technology System in China for supporting the work.

References

Ai, H., Fang, X., Yang, B., Huang, Z., Chen, H., Mao, L., Zhang, F., Zhang, L., Cui, L., He, W., 2015. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat Genet. 47, 217-225.

Akey, J.M., Ruhe, A.L., Akey, D.T., Wong, A.K., Connelly, C.F., Madeoy, J., Nicholas, T.J., Neff, M.W., 2010. Tracking footprints of artificial selection in the dog genome. Proc Natl Acad Sci U S A. 107, 1160-1165.

Al-Mamun, H.A., Kwan, P., Clark, S.A., Ferdosi, M.H., Tellam, R., Gondro, C., 2015. Genome-wide association study of body weight in Australian Merino sheep reveals an orthologous region on OAR6 to human and bovine genomic regions affecting height and weight. Genetics Selection Evolution. 47, doi: 10.1186/s12711-12015-10142-12714.

Bouleftour, W., Bouet, G., Granito, R.N., Thomas, M., Linossier, M.T., Vanden-Bossche, A., Aubin, J.E., Lafage-Proust, M.H., Vico, L., Malaval, L., 2015. Blocking the expression of both bone sialoprotein (BSP) and osteopontin (OPN) impairs the anabolic action of PTH in mouse calvaria bone. J Cell Physiol. 230, 568-577.

Bradley, D.G., MacHugh, D.E., Cunningham, P., Loftus, R.T., 1996. Mitochondrial diversity and the origins of African and European cattle. Proc Natl Acad Sci U S A. 93, 5131-5135.

Browning, S.R., Browning, B.L., 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81, 1084-1097.

Cai, X., Chen, H., Lei, C., Wang, S., Xue, K., Zhang, B., 2007. mtDNA diversity and genetic lineages of eighteen cattle breeds from Bos taurus and Bos indicus in China. Genetica. 131, 175-183.

Cai, D.W., Sun, Y., Tang, Z.W., Hu, S.M., Li, W.Y., Zhao, X.B., Xiang, H., Zhou, H., 2014. The origins of Chinese domestic cattle as revealed by ancient DNA analysis. J Archaeol Sci. 41, 423-434.

Cai, X., Chen, H., Wang, S., Xue, K., Lei, C., 2006. Polymorphisms of two Y chromosome microsatellites in Chinese cattle. Genet Sel Evol. 38, 525-534.

Chen, S.Y., Lin, B.Z., Baig, M., Mitra, B., Lopes, R.J., Santos, A.M., Magee, D.A., Azevedo, M., Tarroso, P., Sasazaki, S., 2010. Zebu Cattle Are an Exclusive Legacy of the South Asia Neolithic. Molecular Biology and Evolution. 27, 1-6.

Choi, J.W., Choi, B.H., Lee, S.H., Lee, S.S., Kim, H.C., Yu, D., Chung, W.H., Lee, K.T., Chai, H.H., Cho, Y.M., 2015. Whole-Genome Resequencing Analysis of Hanwoo and Yanbian Cattle to Identify Genome-Wide SNPs and Signatures of Selection. Mol Cells. 38, 466-473.

Daetwyler, H.D., Capitan, A., Pausch, H., Stothard, P., van Binsbergen, R., Brondum, R.F., Liao, X., Djari, A., Rodriguez, S.C., Grohs, C., 2014. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat Genet. 46, 858-865.

Dammel, C.S., Noller, H.F., 1995. Suppression of a cold-sensitive mutation in 16S rRNA by overexpression of a novel ribosome-binding factor, RbfA. Genes & Development. 9, 626-637.

de Roos, A.P., Hayes, B.J., Spelman, R.J., Goddard, M.E., 2008. Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics. 179, 1503-1512.

DeCoursey, T.E., Chandy, K.G., Gupta, S., Cahalan, M.D., 1984. Voltage-gated K+ channels in human T lymphocytes: a role in mitogenesis? Nature. 307, 465-468.

Duverger, O., Morasso, M.I., 2008. Role of homeobox genes in the patterning, specification, and differentiation of ectodermal appendages in mammals. J Cell Physiol. 216, 337-346.

Eberlein, A., Takasuga, A., Setoguchi, K., Pfuhl, R., Flisikowski, K., Fries, R., Klopp, N., Fürbass, R., Weikard, R., Kühn, C., 2009. Dissection of genetic factors modulating fetal growth in cattle indicates a substantial role of the non-SMC condensin I complex, subunit G (NCAPG) gene. Genetics. 183, 951-964.

Flad, R., Yuan, J., Li, S., 2009. On the source and features of the Neolithic domestic animals in the Gansu and Qinghai Region, China. Archaeology. 5.

Gao, Y., Gautier, M., Ding, X., Zhang, H., Wang, Y., Wang, X., Faruque, M.O., Li, J., Ye, S., Gou, X., 2017. Species composition and environmental adaptation of indigenous Chinese cattle. Sci Rep. 7, 16196.

Groenen, M.A., Archibald, A.L., Uenishi, H., Tuggle, C.K., Takeuchi, Y., Rothschild, M.F., Rogel-Gaillard, C., Park, C., Milan, D., Megens, H.J., 2012. Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 491, 393-398.

Gutman, G.A., Chandy, K.G., Grissmer, S., Lazdunski, M., Mckinnon, D., Pardo, L.A., Robertson, G.A., Rudy, B., Sanguinetti, M.C., Stühmer, W., 2005. International Union of Pharmacology. LIII. Nomenclature and molecular relationships of voltage-gated potassium channels. Pharmacological Reviews. 57, 473-508.

Huang, D.W., Sherman, B.T., Lempicki, R.A., 2009a. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research. 37, 1-13.

Huang, D.W., Sherman, B.T., Lempicki, R.A., 2009b. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols. 4, 44-57.

Jared, D., Peter, B., 2003. Farmers and Their Languages: The First Expansions. Science. 5619, 597-603.

Johnson, G.P., English, A.-M., Cronin, S., Hoey, D.A., Meade, K.G., Fair, S., 2017. Genomic identification, expression profiling, and functional characterization of CatSper channels in the bovine. Biology of Reproduction. 97, 302-312.

Jones, P.G., Inouye, M., 1996. RbfA, a 30S ribosomal binding factor, is a cold-shock protein whose absence triggers the cold-shock response. Molecular Microbiology. 21, 1207-1218.

Kayser, M., Kirin, M., McQuillan, R., Franklin, C.S., Campbell, H., McKeigue, P.M., Wilson, J.F., 2010. Genomic Runs of Homozygosity Record Population History and Consanguinity. PLoS One. 5, doi: 10.1371/journal.pone.0013996.

Kijas, J.W., Lenstra, J.A., Hayes, B., Boitard, S., Porto Neto, L.R., San Cristobal, M., Servin, B., McCulloch, R., Whan, V., Gietzen, K., 2012. Genome-wide analysis of the world's sheep breeds reveals high levels of historic mixture and strong recent selection. PLoS Biol. 10, doi: 10.1371/journal.pbio.1001258.

Lai, S.J., Liu, Y.P., Liu, Y.X., Li, X.W., Yao, Y.G., 2006. Genetic diversity and origin of Chinese cattle revealed by mtDNA D-loop sequence variation. Mol Phylogenet Evol. 38, 146-154.

Lei, C.Z., Chen, H., Hu, S.R., 2000. Studies on Y chromosome polymorphism and the origin and classification of Chinese yellow cattle. Acta Agriculturae Boreali-occidentalis Sinica. 9, 43-47.

Lei, C.Z., Chen, H., Zhang, H.C., Cai, X., Liu, R.Y., Luo, L.Y., Wang, C.F., Zhang, W., Ge, Q.L., Zhang, R.F., 2006. Origin and phylogeographical structure of Chinese cattle. Anim Genet. 37, 579-582.

Lindholm-Perry, A.K., Kuehn, L.A., Oliver, W.T., Sexten, A.K., Miles, J.R., Rempel, L.A., Cushman, R.A., Freetly, H.C., 2013. Adipose and muscle tissue gene expression of two genes (NCAPG and LCORL) located in a chromosomal region associated with cattle feed intake and gain. PLoS One. 8, doi: 10.1371/journal.pone.0080882.

Lindholm-Perry, A.K., Sexten, A.K., Kuehn, L.A., Smith, T.P., King, D.A., Shackelford, S.D., Wheeler, T.L., Ferrell, C.L., Jenkins, T.G., Snelling, W.M., 2011. Association, effects and validation of polymorphisms within the NCAPG-LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle. BMC Genetics. 12, doi: 10.1186/1471-2156-1112-1103.

Liu, R., Sun, Y., Zhao, G., Wang, F., Wu, D., Zheng, M., Chen, J., Zhang, L., Hu, Y., Wen, J., 2013. Genome-wide association study identifies loci and candidate genes for body composition and meat quality traits in Beijing-You chickens. PLoS One. 8, doi: 10.1371/journal.pone.0061172.

Müller, C., Autenrieth, I., Peschel, A., 2005. Intestinal epithelial barrier and mucosal immunity. Cellular and Molecular Life Sciences. 62, 1297-1307.

McGraw, S., Vigneault, C., Tremblay, K., Sirard, M.A., 2006. Characterization of linker histone H1FOO during bovine in vitro embryo development. Molecular Reproduction and Development. 73, 692-699.

McKay, S.D., Schnabel, R.D., Murdoch, B.M., Matukumalli, L.K., Aerts, J., Coppieters, W., Crews, D., Neto, E.D., Gill, C.A., Gao, C., 2007. Whole genome linkage disequilibrium maps in cattle. BMC Genetics. 8, 74-85.

McQuillan, R., Leutenegger, A.L., Abdel-Rahman, R., Franklin, C.S., Pericic, M., Barac-Lauc, L., Smolej-Narancic, N., Janicijevic, B., Polasek, O., Tenesa, A., 2008. Runs of homozygosity in European populations. Am J Hum Genet. 83, 359-372.

Mei, C., Wang, H., Liao, Q., Wang, L., Cheng, G., Wang, H., Zhao, C., Zhao, S., Song, J., Guang, X., 2017. Genetic architecture and selection of Chinese cattle revealed by whole genome resequencing. Mol Biol Evol.

Olson, L.E., Zhang, J., Taylor, H., Rose, D.W., Rosenfeld, M.G., 2005. Barx2 functions through distinct corepressor classes to regulate hair follicle remodeling. Proceedings of the National Academy of Sciences of the United States of America. 102, 3708-3713.

Ormsby, R.T., Findlay, D.M., Kogawa, M., Anderson, P.H., Morris, H.A., Atkins, G.J., 2014. Analysis of vitamin D metabolism gene expression in human bone: evidence for autocrine control of bone remodelling. J Steroid Biochem Mol Biol. 144 Pt A, 110-113.

Porto-Neto, L.R., Kijas, J.W., Reverter, A., 2014. The extent of linkage disequilibrium in beef cattle breeds using high-density SNP genotypes. Genetics Selection Evolution. 46, 22-26.

Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A., Reich, D., 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38, 904-909.

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., de Bakker, P.I.W., Daly, M.J., 2007. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics. 81, 559-575.

Purfield, D.C., Berry, D.P., McParland, S., Bradley, D.G., 2012. Runs of homozygosity and population history in cattle. BMC Genetics. 13, 70-80.

Qanbari, S., Pausch, H., Jansen, S., Somel, M., Strom, T.M., Fries, R., Nielsen, R., Simianer, H., 2014. Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 10, doi: 10.1371/journal.pgen.1004148.

Qi, H., Moran, M.M., Navarro, B., Chong, J.A., Krapivinsky, G., Krapivinsky, L., Kirichok, Y., Ramsey, I.S., Quill, T.A., Clapham, D.E., 2007. All four CatSper ion channel proteins are required for male fertility and sperm cell hyperactivated motility. Proceedings of the National Academy of Sciences. 104, 1219-1223.

Rischkowsky, B., Pilling, D., 2007. The state of the world's animal genetic resources for food and agriculture. Food & Agriculture Org.

Rivadeneira, F., Styrkársdottir, U., Estrada, K., Halldórsson, B.V., Hsu, Y.-H., Richards, J.B., Zillikens, M.C., Kavvoura, F.K., Amin, N., Aulchenko, Y.S., 2009. Twenty bone-mineral-density loci identified by large-scale meta-analysis of genome-wide association studies. Nature Genetics. 41, 1199.

Rousset, F., 2008. genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Mol Ecol Resour. 8, 103-106.

Rowe, P.S., De Zoysa, P.A., Dong, R., Wang, H.R., White, K.E., Econs, M.J., Oudet, C.L., 2000. MEPE, a new gene expressed in bone marrow and tumors causing osteomalacia. Genomics. 67, 54-68.

Rubin, C.J., Megens, H.J., Martinez Barrio, A., Maqbool, K., Sayyab, S., Schwochow, D., Wang, C., Carlborg, O., Jern, P., Jorgensen, C.B., 2012. Strong signatures of selection in the domestic pig genome. Proc Natl Acad Sci U S A. 109, 19529-19536.

Rubin, C.J., Zody, M.C., Eriksson, J., Meadows, J.R., Sherwood, E., Webster, M.T., Jiang, L., Ingman, M., Sharpe, T., Ka, S., 2010. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. 464, 587-591.

Sabeti, P.C., Reich, D.E., Higgins, J.M., Levine, H.Z., Richter, D.J., Schaffner, S.F., Gabriel, S.B., Platko, J.V., Patterson, N.J., McDonald, G.J., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature. 419, 832-837.

Sander, G., Simon Bawden, C., Hynd, P.I., Nesci, A., Rogers, G., Powell, B.C., 2000. Expression of the Homeobox Gene, Barx2, in Wool Follicle Development. Journal of Investigative Dermatology. 115, 753-756.

Sargolzaei, M., Schenkel, F., Jansen, G., Schaeffer, L., 2008. Extent of linkage disequilibrium in Holstein cattle in North America. Journal of Dairy Science. 91, 2106-2117.

Setoguchi, K., Furuta, M., Hirano, T., Nagao, T., Watanabe, T., Sugimoto, Y., Takasuga, A., 2009. Cross-breed comparisons identified a critical 591-kb region for bovine carcass weight QTL (CW-2) on chromosome 6 and the Ile-442-Met substitution in NCAPG as a positional candidate. BMC Genet. 10, doi: 10.1186/1471-2156-1110-1143.

Setoguchi, K., Watanabe, T., Weikard, R., Albrecht, E., Kuhn, C., Kinoshita, A., Sugimoto, Y., Takasuga, A., 2011. The SNP c.1326T>G in the non-SMC condensin I complex, subunit G (NCAPG) gene encoding a p.Ile442Met variant is associated with an increase in body frame size at puberty in cattle. Anim Genet. 42, 650-655.

Shu, J.T., Bao, W.B., Zhang, X.Y., Ji, C.J., Han, W., Chen, K.W., 2009. Combined effect of mutations in ADSL and GARS-AIRS-GART genes on IMP content in chickens. British Poultry Science. 50, 680-686.

Stelnicki, E.J., Komuves, L.G., Kwong, A.O., Holmes, D., Klein, P., Rozenfeld, S., Lawrence, H.J., Adzick, N.S., Harrison, M., Largman, C., 1998. HOX homeobox genes exhibit spatial and temporal changes in expression during human skin development. J Invest Dermatol. 110, 110-115.

Szpiech, Z.A., Hernandez, R.D., 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31, 2824-2837.

Tetens, J., Widmann, P., Kühn, C., Thaller, G., 2013. A genome-wide association study indicates LCORL/NCAPG as a candidate locus for withers height in German Warmblood horses. Anim Genet. 44, 467-471.

Troy, C.S., MacHugh, D.E., Bailey, J.F., Magee, D.A., 2001. Genetic evidence for Near-Eastern origins of European cattle. Nature. 410, 1088-1091.

Voight, B.F., Kudaravalli, S., Wen, X., Pritchard, J.K., 2006. A map of recent positive selection in the human genome. PLoS Biol. 4, doi: 10.1371/journal.pbio.0040072.

Weir, B.S., Cockerham, C.C., 1984. Estimating F-Statistics for the Analysis of Population Structure. Evolution. 38, 1358-1370.

Ye, M.H., Chen, J.L., Zhao, G.P., Zheng, M.Q., Wen, J., 2010. Correlation between polymorphisms in ADSL and GARS-AIRS-GART genes with inosine 5′-monophosphate (IMP) contents in Beijing-you chickens. British Poultry Science. 51, 609-613.

Yue, X.P., Li, R., Liu, L., Zhang, Y.S., Huang, J.P., Chang, Z.H., Dang, R.H., Lan, X.Y., Chen, H., Lei, C.Z., 2014. When and how did Bos indicus introgress into Mongolian cattle? Gene. 537, 214-219.

Yu, Y., Nie, L., He, Z.Q., Wen, J.K., Jian, C.S., Zhang, Y.P., 1999. Mitochondrial DNA variation in cattle of South China: origin and introgression. Anim Genet. 30, 245-250.

Yun, Y., An, P., Ning, J., Zhao, G.-M., Yang, W.-L., Lei, A.-M., 2015. H1foo is essential for in vitro meiotic maturation of bovine oocytes. Zygote. 23, 416-425.

Zelenchuk, L.V., Hedge, A.M., Rowe, P.S., 2015. Age dependent regulation of bone-mass and renal function by the MEPE ASARM-motif. Bone. 79, 131-142.

Zhang, Y., 2011. Animal Genetic Resources in China: Bovines, 1st ed. China Agricultural Press, Beijing, China, pp. 1-23.

Zhang, G.X., Wang, Z.G., Chen, W.S., Wu, C.X., Han, X., Chang, H., Zan, L.S., Li, R.L., Wang, J.H., Song, W.T., 2007. Genetic diversity and population structure of indigenous yellow cattle breeds of China using 30 microsatellite markers. Anim Genet. 38, 550-559.

Zhang, H., Paijmans, J.L., Chang, F., Wu, X., Chen, G., Lei, C., Yang, X., Wei, Z., Bradley, D.G., Orlando, L., 2013. Morphological and genetic evidence for early Holocene cattle management in northeastern China. Nat Commun. 4, doi: 10.1038/ncomms3755.

Zhou, G.L., Jin, H.G., Zhu, Q., Guo, S.L., Wu, Y.H., 2005. Genetic diversity analysis of five cattle breeds native to China using microsatellites. Journal of Genetics. 84, 77-80.