Accepted Manuscript
Genome-wide scanning reveals genetic diversity and signatures of selection in Chinese indigenous cattle breeds
L. Xu, W.G. Zhang, H.X. Shen, Y. Zhang, Y.M. Zhao, Y.T. Jia, X. Gao, B. Zhu, L.Y. Xu, L.P. Zhang, H.J. Gao, J.Y. Li, Y. Chen
PII: S1871-1413(18)30232-4
DOI: https://doi.org/10.1016/j.livsci.2018.08.005
Reference: LIVSCI 3511
To appear in: Livestock Science
Received date: 2 May 2018
Revised date: 6 August 2018
Accepted date: 6 August 2018
Please cite this article as: L. Xu, W.G. Zhang, H.X. Shen, Y. Zhang, Y.M. Zhao, Y.T. Jia, X. Gao, B. Zhu, L.Y. Xu, L.P. Zhang, H.J. Gao, J.Y. Li, Y. Chen, Genome-wide scanning reveals genetic diversity and signatures of selection in Chinese indigenous cattle breeds, Livestock Science (2018), doi: https://doi.org/10.1016/j.livsci.2018.08.005
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights
• Collected 724 individuals from 20 Chinese indigenous cattle breeds.
• Southern cattle have abundant ROH segments and high IBS values.
• Southern cattle have lower genetic diversity than central and northern cattle.
• Detected selective sweeps related to growth and environmental adaptation.
Genome-wide scanning reveals genetic diversity and signatures of selection in Chinese indigenous cattle breeds
L. Xu1, W.G. Zhang1, H.X. Shen2, Y. Zhang3, Y.M. Zhao4, Y.T. Jia5, X. Gao1, B. Zhu1, L.Y. Xu1, L.P. Zhang1, H.J. Gao1, J.Y. Li1*, Y. Chen1*
1 Cattle Genetics and Breeding Team, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China;
2 Animal Husbandry and Veterinary Bureau of Yiling, Yichang 44300, China;
3 Xinjiang Academy of Animal Science, Urumqi 830011, China;
4 Jilin Academy of Animal Science, Changchun 130124, China;
5 Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei 230031, China;
* Correspondence: [email protected] (Y.C.); [email protected] (Y.L.); Tel.: +86-010-6281-6065
Email addresses of other authors:
Ling Xu: [email protected]
Wengang Zhang: [email protected]
Hongxue Shen: [email protected]
Yang Zhang: [email protected]
Yumin Zhao: [email protected]
Yutang Jia: [email protected]
Xue Gao: [email protected]
Bo Zhu: [email protected]
Lingyang Xu: [email protected]
Lupei Zhang: [email protected]
Huijiang Gao: [email protected]
Abstract
Chinese indigenous cattle exhibit abundant genetic resources and an extensive gene pool, with 53 indigenous breeds generally classified into northern, central and southern cattle breeds. To determine the population genetic diversity and signatures of selection of Chinese indigenous cattle, we collected 724 cattle from 20 geographically representative Chinese indigenous cattle breeds and genotyped all samples using the GeneSeek Genomic Profiler Bovine LD (GGP-LD, n = 30,125). Runs of homozygosity (ROH) and identical by state (IBS) analyses were performed to investigate genetic diversity in Chinese indigenous cattle. Meanwhile, the integrated Haplotype Score (iHS) and FST-based di methods were used to reveal candidate selective sweeps. Our results showed that southern cattle breeds have abundant ROH segments and higher IBS values in comparison with northern and central cattle breeds. We also detected many potential selective sweeps in Chinese indigenous cattle. The genes within intervals spanning the candidate regions are associated with growth and development (NCAPG, LAP3, LCORL, IBSP and MEPE), fertility and reproduction (ABCG2, CATSPER4 and H1FOO), immune functions (AZU1, PROC and LRP1) and environmental adaptation (RBFA, BARX2). Overall, these findings provide new insights into the level of genetic diversity of Chinese indigenous cattle, and suggest a role of natural and artificial selection in shaping their genomic variability.
Keywords: Chinese indigenous cattle; Genetic diversity; Signatures of selection
1. Introduction
Cattle represent an excellent model of domestication among the four major Old World livestock species (cattle, goat, sheep and pig), and play an economically important role in agriculture, providing meat, milk, hides for leather, and draught power for pulling carts and ploughing. According to the available genetic and archaeological evidence, cattle were domesticated around 8,000-10,000 years ago in the Near East and South Asia, and spread rapidly over nearly all inhabited continents along with human migration (Bradley et al., 1996; Jared and Peter, 2003; Troy et al., 2001; Zhang et al., 2013). Since then, the combined effects of natural and artificial selection have operated on cattle and resulted in marked changes in behavioral, morphological and physiological characteristics.
In China, cattle exhibit abundant genetic resources and an extensive gene pool (Yu et al., 1999). According to the geographic dispersal of Chinese cattle breeds described in Animal Genetic Resources in China: Bovines, the 53 indigenous breeds (Rischkowsky and Pilling, 2007) are generally classified into three types: northern, central and southern breeds (Zhang et al., 2011). Chinese indigenous cattle are known to be well adapted to diverse environmental conditions and to be resistant or tolerant to disease. Currently, under production-oriented breeding, they are mainly developed for meat, milk and fertility. Thus, long-term natural selection and artificial selection of varying intensity acting on Chinese indigenous cattle have left detectable signatures of selection on their genome, and the functionally important candidate regions and genes
that contribute to phenotypic diversity can be localized.
Recently, with the genomic data generated by powerful high-throughput genotyping and sequencing technologies, a number of selective sweep studies using different statistical methods have efficiently characterized putative candidate regions and genes in cattle (Daetwyler et al., 2014; Qanbari et al., 2014), pig (Groenen et al., 2012; Rubin et al., 2012), sheep (Kijas et al., 2012) and chicken (Rubin et al., 2010). In Chinese indigenous cattle, Gao et al. (2017) performed a whole-genome selective sweep scan of 437 Chinese indigenous cattle using the population differentiation statistic XtX and detected candidate genes involved in environmental adaptation (TNFRSF19, RFX2). Meanwhile, Mei et al. (2017) used Tajima's D (based on the allele frequency spectrum) and the di method to localize signatures of selection in six Chinese indigenous breeds with resequencing data. They detected genes within the intervals spanning candidate regions associated with coat color (RECC2, MC1R), dairy traits (NCAPG, PAG1) and meat production (BBS2, R3HDM1). These studies have identified potential signatures of selection in Chinese indigenous cattle. However, the numbers of samples and breeds involved were limited and, most importantly, signatures of recent selection captured by the properties of haplotypes segregating within populations were not taken into account. Here, to better target candidate regions and genes subjected to positive selection and to investigate genetic diversity in Chinese indigenous cattle, we collected 724 individuals from 20 representative breeds, including six northern cattle
breeds, eight central cattle breeds and six southern cattle breeds, and obtained SNP genotyping data from GeneSeek Genomic Profiler Bovine LD (GGP-LD) assays (n = 30,125). Previous studies of the genetic diversity of Chinese indigenous cattle were mainly based on microsatellites (Zhang et al., 2007; Zhou et al., 2005) and mtDNA (Cai et al., 2007; Lai et al., 2006). In this study, we used whole-genome genotyping data for runs of homozygosity (ROH) and identical by state (IBS) analyses to elucidate the genetic diversity of indigenous cattle. The genome-wide scan for possible signatures of selection was performed using two complementary methods: the Extended Haplotype Homozygosity (EHH)-based integrated Haplotype Score (iHS) (Sabeti et al., 2002; Voight et al., 2006) and the FST-based di (Akey et al., 2010; Weir and Cockerham, 1984). A set of candidate regions and genes related to growth and development, fertility and reproduction, immune functions and environmental adaptation were identified. The results illustrate the level of genetic diversity in Chinese indigenous cattle and identify potential selection hotspots on the bovine genome.
2. Materials and Methods
2.1. Population, SNP genotyping and quality control
According to the description of the geographic dispersal of Chinese cattle breeds in Animal Genetic Resources in China: Bovines, a total of 724 individuals from 20 representative indigenous breeds were collected from 14 provinces or municipalities, including six northern breeds, eight central breeds and six southern breeds (Fig. 1A). The six northern cattle breeds comprised Chinese Simmental (CS), Chinese
Caoyuan red (CR), Yanhuang (YH), Menggu (MG), Liaoyu white (LW) and Xinjiang Brown (XB). The eight central cattle breeds included Qinchuan (QC), Nanyang (NY), Jinnan (JN), Luxi (LX), Huangpi (HP), Zaobei (ZB), Wuling (WL) and Yiling (YL). The six southern cattle breeds comprised Dabieshan (DS), Wenshan (WS), Dianzhong (DZ), Zhaotong (ZT), Nandan (ND) and Longlin (LL). The abbreviation, sample size and geographic distribution of each breed are summarized in Table 1.
For each individual, ~5 mL of venous blood was collected from the jugular vein and stored at -20 °C. DNA was extracted using a TIANamp Blood DNA Kit (Tiangen Biotech Company Limited, Beijing, China), and qualified DNA samples were genotyped using the GeneSeek Genomic Profiler Bovine LD Chip (GGP Bovine LDv4). A total of 30,330 SNPs were scanned on the iScan platform and analyzed using GenomeStudio software; the average SNP spacing across the genome is approximately 89 kb.
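The per-individual and per-SNP filters listed in the quality-control paragraph that follows can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not a reimplementation of PLINK: genotypes are assumed to be coded as 0/1/2 allele counts with -1 for missing calls, cattle autosomes are assumed to be numbered 1-29, all filters are applied in a single pass (individuals first), and the HWE filter uses a 1-df chi-square approximation as a simple stand-in that may differ from the test PLINK v1.07 applies. The function names are hypothetical.

```python
import math
import numpy as np


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """1-df chi-square HWE p-value from genotype counts (stand-in, not PLINK's test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of the allele coded by genotype 0
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:                   # monomorphic SNP: no test possible
        return 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-square with 1 df


def qc_filter(G, chrom, snp_call=0.90, ind_call=0.95, maf_min=0.01, hwe_min=1e-6):
    """Return boolean masks (individuals kept, SNPs kept).

    G: individuals x SNPs array of 0/1/2 allele counts, -1 = missing (assumed coding).
    chrom: per-SNP chromosome numbers; autosomes are assumed to be 1-29.
    """
    G = np.asarray(G)
    ind_ok = (G >= 0).mean(axis=1) > ind_call            # criterion (5)
    Gk = G[ind_ok]
    called = Gk >= 0
    n_called = called.sum(axis=0)
    snp_ok = called.mean(axis=0) > snp_call              # criterion (1)
    alt = np.where(called, Gk, 0).sum(axis=0)
    p = alt / np.maximum(2 * n_called, 1)
    maf = np.minimum(p, 1 - p)                          # criterion (2)
    counts = zip((Gk == 0).sum(axis=0), (Gk == 1).sum(axis=0), (Gk == 2).sum(axis=0))
    hwe_p = np.array([hwe_chisq_p(int(a), int(b), int(c)) for a, b, c in counts])
    autosomal = np.isin(np.asarray(chrom), np.arange(1, 30))   # criterion (4)
    keep_snp = snp_ok & (maf > maf_min) & (hwe_p > hwe_min) & autosomal  # + criterion (3)
    return ind_ok, keep_snp
```

Applying the SNP filters only to individuals that already passed the call-rate filter is one reasonable ordering; other tools may order these steps differently, which can change borderline SNPs.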
Quality control of SNP genotype data was performed using PLINK v1.07 software (Purcell et al., 2007) (http://pngu.mgh.harvard.edu/purcell/plink/). We retained only individuals and loci that met all of the following five criteria: (1) SNP call rate > 0.90; (2) minor allele frequency (MAF) > 0.01; (3) Hardy-Weinberg equilibrium (HWE) test p-value > 10^-6; (4) SNP located on an autosome; and (5) individual call rate > 0.95. Finally, 724 individuals and 23,748 SNPs were retained for downstream analysis.
2.2. Principal component analysis and linkage disequilibrium analysis
Principal component analysis (PCA) was performed using EigenStrat (Price et al., 2006). Before the analysis, to ensure that high LD did not distort the PCA result, SNPs were pruned using a window size of 50 SNPs, a step of 5 SNPs and an r2 threshold of 0.25, resulting in 9,825 independent SNPs. We also quantified the degree of linkage disequilibrium (LD) for all SNP pairs within a 1.5 Mb window using the squared correlation coefficient (r2) in PLINK v1.07 software.
2.3. Genetic diversity analysis
Four statistics were calculated: observed heterozygosity (Ho), expected heterozygosity (He), inbreeding coefficient (F) and the proportion of polymorphic SNPs (Pn). Identical by state (IBS) and runs of homozygosity (ROH) were estimated for each breed to assess genetic relatedness and genomic homozygosity levels. ROH segments were defined by the following requirements. First, an ROH should be longer than 1000 kb, because very short and common ROH occur frequently due to LD. Second, an ROH should contain more than 20 homozygous SNPs, with no more than two possibly heterozygous SNPs. Last, the distance between two consecutive homozygous SNPs within an ROH should be less than 1000 kb; otherwise, the ROH is split in two. All of these analyses were performed using PLINK v1.07 software.
2.4. Selective sweep, gene annotation and functional analysis
Evidence of segregating positive selection was investigated using two complementary methods, the integrated Haplotype Score (iHS) and the FST-based di. Firstly,
scanning of within-population signatures of selection was conducted for each breed using the iHS method, which is based on the extent of local long haplotypes carrying the ancestral or derived allele and favors variants that have not yet reached fixation (Voight et al., 2006). For this analysis, haplotypes were phased using Beagle (Browning and Browning, 2007), and the iHS score was calculated for each SNP within each breed using Selscan software (Szpiech and Hernandez, 2014). The standardized iHS was calculated as follows:

$$\mathrm{iHS} = \frac{\ln\left(\frac{iHH_A}{iHH_D}\right) - E_p\left[\ln\left(\frac{iHH_A}{iHH_D}\right)\right]}{SD_p\left[\ln\left(\frac{iHH_A}{iHH_D}\right)\right]} \tag{1}$$

where iHH_A and iHH_D represent the integrated Extended Haplotype Homozygosity (EHH) scores for the ancestral and derived core alleles, respectively, and E_p[·] and SD_p[·] are the mean and standard deviation of ln(iHH_A/iHH_D) over SNPs with a derived allele frequency p similar to that of the core SNP. The top 0.5% of |iHS| scores were used to infer genomic candidate regions under recent positive selection.
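Equation (1) and the top-0.5% rule can be sketched as follows, assuming the integrated EHH values (or their log ratio) and derived allele frequencies have already been obtained, for example from selscan. The 20 equal-width frequency bins are an assumption for illustration; selscan provides its own normalization step whose binning may differ. Function names are hypothetical.

```python
import numpy as np


def unstandardized_ihs(ihh_a, ihh_d):
    """ln(iHH_A / iHH_D) for each core SNP."""
    return np.log(np.asarray(ihh_a, float) / np.asarray(ihh_d, float))


def standardize_ihs(unstd, daf, n_bins=20):
    """Equation (1): z-score ln(iHH_A/iHH_D) within derived-allele-frequency bins.

    n_bins=20 is an arbitrary illustrative choice, not the paper's setting.
    """
    unstd = np.asarray(unstd, float)
    daf = np.asarray(daf, float)
    bins = np.minimum((daf * n_bins).astype(int), n_bins - 1)
    out = np.full_like(unstd, np.nan)
    for b in np.unique(bins):
        m = bins == b
        sd = unstd[m].std()                  # population SD (ddof=0)
        if sd > 0:                           # bins without spread stay NaN
            out[m] = (unstd[m] - unstd[m].mean()) / sd
    return out


def top_ihs(ihs, frac=0.005):
    """Indices of SNPs whose |iHS| exceeds the (1 - frac) quantile (top 0.5% by default)."""
    a = np.abs(np.asarray(ihs, float))
    thr = np.nanquantile(a, 1 - frac)
    return np.flatnonzero(a > thr)
```

Binning by derived allele frequency matters because the raw log ratio depends strongly on allele frequency; standardizing within bins makes scores comparable across SNPs before the genome-wide threshold is applied.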
Secondly, di, based on unbiased estimates of FST, was applied to investigate population differentiation among northern, central and southern cattle breeds. This method is robust whether selection acts on newly arisen or pre-existing variants (Akey et al., 2010). Because small populations may provide less representative genetic information, breeds with fewer than 20 individuals were excluded. Consequently, three cattle breed groups were used for this analysis: northern cattle breeds (CS, CR, YH and LW), central cattle breeds (QC, JN, HP, ZB, WL and YL) and southern cattle breeds (WS, DZ, ZT, ND and LL). Briefly, for each SNP in a comparison, we calculated the expected value of FST with the Genepop software
(Rousset, 2008). The di statistic was then calculated as described by Akey et al. (2010):

$$d_i = \sum_{j \neq i} \frac{F_{ST}^{ij} - E\left[F_{ST}^{ij}\right]}{SD\left[F_{ST}^{ij}\right]} \tag{2}$$

where E[F_ST^ij] and SD[F_ST^ij] denote the expected value and standard deviation of FST between groups i and j calculated from all SNPs. For each group, di values were averaged over the SNPs in overlapping 500 kb windows sliding by 250 kb. The top 1% of windows with significant di values were then defined as candidate selective sweep regions.
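Equation (2) and the 500 kb / 250 kb window averaging can be sketched as below. Per-SNP FST values are assumed to be precomputed (the text uses Genepop), group names are hypothetical labels, the standard deviation uses NumPy's default population form (ddof=0), and windows are built on a single chromosome starting at position 0.

```python
import numpy as np


def d_i(fst, groups):
    """Equation (2). fst maps a sorted pair of group names to per-SNP FST values."""
    z = {}
    for pair, vals in fst.items():
        v = np.asarray(vals, float)
        z[pair] = (v - v.mean()) / v.std()   # (FST - E[FST]) / SD[FST] over all SNPs
    return {gi: sum(z[tuple(sorted((gi, gj)))] for gj in groups if gj != gi)
            for gi in groups}


def window_means(pos_bp, values, window=500_000, step=250_000):
    """Average per-SNP values in overlapping windows on one chromosome."""
    pos = np.asarray(pos_bp)
    vals = np.asarray(values, float)
    out = []
    start = 0
    while start <= pos.max():
        m = (pos >= start) & (pos < start + window)
        if m.any():                          # skip windows without SNPs
            out.append((start, float(vals[m].mean())))
        start += step
    return out


def top_windows(windows, frac=0.01):
    """Windows whose mean di lies in the top `frac` (1% by default)."""
    means = np.array([m for _, m in windows])
    thr = np.quantile(means, 1 - frac)
    return [w for w in windows if w[1] >= thr]
```

Keying the FST dictionary by sorted name pairs lets each pairwise comparison be stored once and reused for both groups it involves.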
Finally, candidate regions under selection were retrieved from the Ensembl genome browser (http://www.ensembl.org/) using the Bos_taurus_UMD_3.1.1 reference genome assembly to annotate candidate genes. Gene ontology enrichment and functional annotation of candidate genes were performed with the DAVID database (Huang et al., 2009a; Huang et al., 2009b) to identify significantly relevant pathways, biological processes, cellular components and molecular functions.
3. Results
3.1. Population structure pattern
After quality control, 724 individuals from 20 Chinese indigenous cattle breeds remained (Table 1), and a total of 23,749 SNPs were used in the final analyses. The population structure pattern was then inferred by principal component analysis (PCA), shown in the scatter plot of Fig. 1B. PC1, accounting for 2.98% of the total variation, separated all individuals into three distinct clusters, consistent with the geographic dispersal of the breeds. The central cattle breeds were located between the southern and northern breeds, and individuals of the central breeds NY, LX, YL and WL clustered with the southern breeds LL and ZT. PC2, accounting for 1.44% of the total variation, separated the southern breed DS from the other southern breeds (WS, DZ, ZT, LL and ND), forming an independent branch. Likewise, the northern breeds MG and LW were separated from the other northern breeds by PC2.
Table 1
Description of observed and expected heterozygosities, proportion of polymorphic SNPs and inbreeding coefficients in 20 Chinese indigenous cattle breeds.
Breed | Breed abbr. | Number of individuals | Region¹ | Type | Ho² | He² | Pn² | F-gene² | F-ROH²
Chinese Simmental | CS | 106 | North | Dual-purpose | 0.34 | 0.34 | 0.97 | 0 | 0.060
Chinese Caoyuan red | CR | 26 | North | Dual-purpose | 0.34 | 0.33 | 0.89 | -0.028 | 0.093
Yanhuang | YH | 33 | North | Dual-purpose | 0.34 | 0.30 | 0.91 | -0.015 | 0.046
Menggu | MG | 15 | North | Dual-purpose | 0.34 | 0.33 | 0.91 | -0.026 | 0.055
Liaoyu white | LW | 20 | North | Dual-purpose | 0.35 | 0.34 | 0.90 | -0.030 | 0.059
Xinjiang Brown | XB | 18 | North | Dual-purpose | 0.31 | 0.29 | 0.80 | -0.071 | 0.149
Qinchuan | QC | 50 | Central | Meat | 0.37 | 0.38 | 0.99 | 0.020 | 0.044
Nanyang | NY | 13 | Central | Meat | 0.36 | 0.34 | 0.95 | -0.044 | 0.057
Jinnan | JN | 76 | Central | Meat | 0.37 | 0.37 | 0.99 | -0.005 | 0.040
Luxi | LX | 15 | Central | Meat | 0.36 | 0.35 | 0.96 | -0.020 | 0.055
Huangpi | HP | 25 | Central | Meat | 0.44 | 0.38 | 0.98 | -0.014 | 0.014
Zaobei | ZB | 35 | Central | Meat | 0.35 | 0.36 | 0.99 | 0.026 | 0.070
Wuling | WL | 30 | Central | Meat | 0.35 | 0.35 | 0.98 | 0.004 | 0.055
Yiling | YL | 39 | Central | Meat | 0.33 | 0.33 | 0.98 | 0.005 | 0.077
Dabieshan | DS | 47 | South | Meat | 0.33 | 0.33 | 0.98 | 0.011 | 0.086
Wenshan | WS | 50 | South | Meat | 0.32 | 0.33 | 0.97 | 0.029 | 0.098
Dianzhong | DZ | 49 | South | Meat | 0.31 | 0.33 | 0.94 | 0.051 | 0.140
Zhaotong | ZT | 41 | South | Meat | 0.36 | 0.36 | 0.98 | 0.015 | 0.057
Nandan | ND | 19 | South | Meat | 0.26 | 0.26 | 0.82 | 0.020 | 0.184
Longlin | LL | 17 | South | Meat | 0.30 | 0.30 | 0.90 | -0.003 | 0.115
¹ Northern refers to northern China; Central refers to central China; and Southern refers to southern China.
² Combined data set: proportion of polymorphic SNPs (Pn), observed heterozygosities (Ho), expected heterozygosities (He) and inbreeding coefficients (F-gene, F-ROH) were calculated using SNPs after quality control.
Fig. 1. Geographical distribution and principal component analysis (PCA) of 20 indigenous cattle breeds. (A) Geographical distribution of the 20 selected indigenous Chinese cattle breeds, including six northern, eight central and six southern cattle breeds. (B) PCA of the 724 individuals from the 20 breeds studied, in which PC1 explained 2.98% and PC2 explained 1.44% of the total variation.
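The LD pruning and PCA described in Section 2.2 can be sketched in NumPy as follows. This is a simplified stand-in, not PLINK or EIGENSOFT: the pruning drops the later SNP of any pair with r2 > 0.25 inside each 50-SNP window that advances 5 SNPs, r2 is the squared Pearson correlation of 0/1/2 genotype codes, and the normalization (g - 2p)/sqrt(p(1 - p)) follows the EIGENSTRAT form but uses the plain sample allele frequency. The returned variance fractions correspond to quantities such as the 2.98% (PC1) and 1.44% (PC2) reported above. Function names are hypothetical.

```python
import numpy as np


def prune_ld(G, window=50, step=5, r2_max=0.25):
    """Greedy window-based LD pruning; returns indices of retained SNP columns.

    Simplification: within each window, the later SNP of any pair with
    r2 > r2_max is dropped (PLINK's rule for which SNP to drop may differ).
    """
    G = np.asarray(G, float)
    n_snps = G.shape[1]
    removed = np.zeros(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        idx = np.arange(start, min(start + window, n_snps))
        idx = idx[~removed[idx]]
        if len(idx) < 2:
            continue
        r2 = np.corrcoef(G[:, idx], rowvar=False) ** 2
        for a in range(len(idx)):
            if removed[idx[a]]:
                continue
            for b in range(a + 1, len(idx)):
                if not removed[idx[b]] and r2[a, b] > r2_max:
                    removed[idx[b]] = True
    return np.flatnonzero(~removed)


def eigenstrat_pca(G, n_pcs=2):
    """PCA on EIGENSTRAT-style normalized genotypes; returns PC scores and variance fractions."""
    G = np.asarray(G, float)
    p = G.mean(axis=0) / 2
    poly = (p > 0) & (p < 1)                       # monomorphic SNPs cannot be scaled
    X = (G[:, poly] - 2 * p[poly]) / np.sqrt(p[poly] * (1 - p[poly]))
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    frac = s ** 2 / np.sum(s ** 2)                 # share of total variance per PC
    return U[:, :n_pcs] * s[:n_pcs], frac[:n_pcs]
```

Pruning first keeps blocks of highly correlated SNPs from dominating the leading components, which is the concern stated in Section 2.2.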
3.2. Genetic diversity assessment
The proportion of polymorphic SNPs (Pn), expected heterozygosity (He) and observed heterozygosity (Ho) ranged from 0.80 (XB) to 0.99 (QC), 0.26 (ND) to 0.38 (HP), and 0.26 (ND) to 0.44 (HP), respectively (Table 1). Pn was generally high, with more than 90% of loci polymorphic in all breeds except XB, CR and ND. In terms of Ho, the average value of the northern breeds (CS, CR, YH, MG and LW) was higher than that of the southern breeds (WS, DS, DZ, LL and ND), but lower than that of most central breeds. A lower Ho value tends to reflect a lower level of genetic variability, as in the southern breeds ND (Ho = 0.26) and LL (Ho = 0.30), which had the lowest Ho values. In addition, according to the F-gene and F-ROH inbreeding coefficient estimates, southern breeds presented a higher inbreeding level than northern and central breeds; for instance, the southern breed DZ had the highest F-gene estimate (F-gene = 0.051; F-ROH = 0.140).
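The breed-level statistics in Table 1 can be illustrated with the short sketch below. The text does not state which estimator underlies F-gene, so the population-level F = 1 - Ho/He used here is an assumption; He is the simple 2p(1 - p) without small-sample correction, and the genotype matrix is assumed to contain 0/1/2 codes with no missing calls. The function name is hypothetical.

```python
import numpy as np


def diversity_stats(G):
    """Breed-level Pn, Ho, He and F from an individuals x SNPs 0/1/2 matrix."""
    G = np.asarray(G, float)
    p = G.mean(axis=0) / 2                      # allele frequency per SNP
    ho = (G == 1).mean(axis=0).mean()           # observed heterozygosity
    he = (2 * p * (1 - p)).mean()               # expected heterozygosity under HWE
    pn = ((p > 0) & (p < 1)).mean()             # proportion of polymorphic SNPs
    f = 1 - ho / he                             # assumed estimator: F = 1 - Ho/He
    return {"Pn": float(pn), "Ho": float(ho), "He": float(he), "F": float(f)}
```

A breed with Ho above He, such as HP in Table 1, would obtain a negative F under this estimator, matching the sign convention of the negative F-gene values reported there.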
Runs of homozygosity (ROH) are contiguous stretches of homozygous genotypes that are present in an individual because its parents transmitted identical haplotypes to their offspring (Purfield et al., 2012). First, we divided ROH lengths into seven categories (1-5 Mb, 5-10 Mb, 10-15 Mb, 15-20 Mb, 20-25 Mb, 25-30 Mb and >30 Mb) to reflect the distribution of ROH length within breeds (Fig. 2A). The average ROH length of each category was calculated for each breed by summing all ROH segments per individual in that length category and dividing by the number of individuals in the respective breed. Among the 20 cattle breeds, the average length in the short ROH category (1-5 Mb) ranged from 25.78 Mb (HP) to 315.5 Mb (ND). Notably, the southern breeds ND, LL and DS presented a longer average ROH length in the 1-5 Mb category than the northern breeds (MG, CR and YH) and the central breeds (QC, JN and HP). In the long ROH category (>30 Mb), most breeds had a low average ROH length, whereas the southern breeds DZ and ND and the northern breed XB displayed abundant long ROH segments. Second, the sum of ROH length (in Mb) per individual genome was calculated to assess the ROH content of each breed (Fig. 2B). We found that the southern breeds had more ROH content than the northern and central breeds. Remarkably, among the 724 individuals, the three most homozygous individuals also came from the southern breeds DZ and ND, with ROH contents of 1020 Mb, 1000 Mb and 918.54 Mb, respectively, which is more than a quarter of the whole genome. Furthermore, individuals of ND generally presented abundant ROH content and had the highest average sum of ROH on their genomes, followed by XB (373.39 Mb) and DZ (349.82 Mb). By contrast, individuals of the central breed HP displayed the lowest average sum of ROH (34.79 Mb), and the lowest individual value was only 4.87 Mb. Finally, we counted the number of ROH for each breed. As expected, southern breeds had more ROH segments than northern and central breeds: the numbers of ROH segments in ND, LL and DS were 139, 104 and 79, in contrast to 34, 34 and 14 in QC, JN and HP. Additionally, ROH content was summarized across all chromosomes: the number of ROH was greatest on chromosome 5 (2276) and lowest on chromosome 26 (601). On the other hand, the fraction of the chromosome covered by ROH was greatest on chromosomes 13 and 19, with 9.61% and 9.48% of the chromosome consisting of ROH, respectively.
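The ROH definition in Section 2.3 (longer than 1000 kb, more than 20 homozygous SNPs, at most two heterozygous calls, and a split when consecutive SNPs are more than 1000 kb apart) and the length categories used above can be sketched as a simple greedy scan along one chromosome. This is not PLINK's sliding-window ROH algorithm: the handling of heterozygous calls at run edges is an assumption, genotypes are assumed to be 0/1/2 codes with no missing calls, and F-ROH is taken here as total ROH length divided by an autosomal length supplied by the caller. Function names are hypothetical.

```python
# Greedy sketch of the Section 2.3 ROH rules (not PLINK's algorithm).
# Positions are in kb; genotypes are 0/1/2 with 1 = heterozygous (assumed coding).

ROH_CLASSES_MB = [(1, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, float("inf"))]


def call_roh(pos_kb, geno, min_len_kb=1000, min_hom=20, max_het=2, max_gap_kb=1000):
    """Return (start_kb, end_kb, n_homozygous) for each ROH on one chromosome."""
    segments = []

    def emit(start, end, n_hom):
        # Keep runs longer than min_len_kb with more than min_hom homozygous SNPs.
        if start is not None and pos_kb[end] - pos_kb[start] > min_len_kb and n_hom > min_hom:
            segments.append((pos_kb[start], pos_kb[end], n_hom))

    start = end = None
    n_hom = n_het = 0
    for i, g in enumerate(geno):
        # A gap larger than max_gap_kb between consecutive SNPs splits the run.
        if start is not None and pos_kb[i] - pos_kb[i - 1] > max_gap_kb:
            emit(start, end, n_hom)
            start = end = None
            n_hom = n_het = 0
        if g == 1:
            # Heterozygous calls are tolerated up to max_het per run.
            if start is not None:
                n_het += 1
                if n_het > max_het:
                    emit(start, end, n_hom)
                    start = end = None
                    n_hom = n_het = 0
        else:
            if start is None:
                start = i
            end = i                          # runs always end on a homozygous SNP
            n_hom += 1
    emit(start, end, n_hom)
    return segments


def roh_length_by_class(segments):
    """Total ROH length (Mb) in each of the seven length categories."""
    totals = [0.0] * len(ROH_CLASSES_MB)
    for start_kb, end_kb, _ in segments:
        length_mb = (end_kb - start_kb) / 1000
        for k, (lo, hi) in enumerate(ROH_CLASSES_MB):
            if lo <= length_mb < hi:
                totals[k] += length_mb
                break
    return totals


def f_roh(segments, autosome_len_kb):
    """F_ROH as total ROH length over autosomal genome length (assumed definition)."""
    return sum(end - start for start, end, _ in segments) / autosome_len_kb
```

Summing `roh_length_by_class` over individuals and dividing by breed size gives per-category averages of the kind plotted in Fig. 2A.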
Fig. 2. Genetic diversity assessment for each breed. (A) The average sum of runs of homozygosity (ROH) of each breed in different ROH length categories. (B) The sum of ROH length (Mb) per individual genome in each breed. (C) Genetic relatedness shown as the average IBS value for pairwise comparisons of breeds.
The average identical by state (IBS) value of each breed was calculated to show genetic relatedness between indigenous cattle breeds (Fig. 2C). The mean IBS values of pairwise breed comparisons ranged from 0.04 (ND vs LW) to 0.47 (ND vs LL). The mean IBS between northern and southern breeds was generally lower than 0.15, and the values of the southern breeds ND and LL versus most northern breeds (LW, XB, MG and YH) did not exceed 0.10. In contrast, cattle breeds from the same geographic region (north, central or south) were related to each other (IBS > 0.33), such as ND vs LL (0.47) and DZ vs WS (0.37). Within breeds, individuals of southern breeds displayed more relatedness than those of central breeds. The highest and lowest average within-breed IBS values were observed in ND (0.52) and QC (0.33), respectively. Taken together, northern and southern breeds formed distinct clusters based on the IBS analysis, in accordance with the PCA pattern showing a clear demarcation between northern and southern breed clusters.
3.3. Linkage disequilibrium assessment
Linkage disequilibrium analysis of the SNP panel revealed a non-uniform distribution of LD in Chinese indigenous cattle. As shown in Fig. 3, the level of LD decreased as the physical distance between markers increased and gradually reached a steady state at around 1 Mb in all breeds. At an inter-marker distance of 1.5 Mb, the LD level ranged from 0.06 (QC) to 0.19 (XB). From 0 to 1 Mb, XB, CR and LW presented higher LD levels and slower LD decay, whereas QC, JN, YL and CS had lower LD levels that dropped quickly with increasing distance. XB displayed the smoothest LD decay curve, with r2 decreasing from 0.52 to 0.2, followed by CR (0.46 to 0.13) and LW (0.45 to 0.12), which indicated a smaller effective population size in XB. In contrast, QC displayed a quick decay, with r2 decreasing from 0.34 to 0.07, which may be attributed to long-term natural selection acting on this breed.
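The LD decay curves in Fig. 3 summarize r2 by inter-marker distance; a minimal sketch of that binning is below. It assumes one chromosome with positions in kb sorted in ascending order, computes r2 as the squared Pearson correlation of 0/1/2 genotype codes (an assumed stand-in for PLINK's r2 calculation), and uses an arbitrary 100 kb bin width. The function name is hypothetical.

```python
import numpy as np


def ld_decay(G, pos_kb, max_dist_kb=1500, bin_kb=100):
    """Mean r2 per distance bin for SNP pairs closer than max_dist_kb on one chromosome."""
    G = np.asarray(G, float)
    pos_kb = np.asarray(pos_kb, float)
    n_bins = int(np.ceil(max_dist_kb / bin_kb))
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    n_snps = G.shape[1]
    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            dist = pos_kb[j] - pos_kb[i]
            if dist >= max_dist_kb:
                break                        # later SNPs are even farther away
            r = np.corrcoef(G[:, i], G[:, j])[0, 1]
            k = int(dist // bin_kb)
            sums[k] += r * r
            counts[k] += 1
    with np.errstate(invalid="ignore"):
        return sums / counts                 # NaN for bins without SNP pairs
```

Plotting the returned bin means against bin midpoints, breed by breed, reproduces the shape of curves like those in Fig. 3.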
Fig. 3. Average linkage disequilibrium (r2) as a function of genomic distance for 20 Chinese indigenous breeds. The level of LD was estimated for SNP pairs less than 1500 kb apart.
3.4. Signatures of selection
iHS scores were computed across the whole genome for the 20 indigenous cattle breeds to infer recent selective sweeps within populations. Fig. 4 depicts the distribution of iHS scores for some representative breeds to visualize candidate selective sweeps. Table 2 summarizes the main candidate regions and genes with significant |iHS| values in each breed. For example, the evidence of selective sweeps in the central breeds (HP, ZB, WL and YL) of Hubei province showed that similar selection events occurred in two significant candidate regions. One region, on chromosome 3 (32.87-33.37 Mb), spanned the potassium channel gene family (KCNA2, KCNA3, KCNC4, KCNA10), and the other, on chromosome 7 (44.90-45.30 Mb), contained the AZU1 and KISS1R genes. These results suggest that alleles in the two regions have undergone positive selection, with higher LD levels and haplotype homozygosity.
Fig. 4. Circos plot of genome-wide iHS values illustrating candidate regions under positive selection. Six breeds are displayed, YH, CS, QC, WL, DS and WS, from the outer circle to the inner circle. Red indicates potential selective sweep regions in each breed.
Table 2
Main candidate regions and genes detected by iHS in each breed.
Chromosome
Region (Mb)
6
38.57-38.97
Breed
iHS 5.49
YB XB
NCAPG, ABCG2, LAP3, LCORL, FAM184B, MED28
CS
CR
Gene Name
value
6
41.71-43.01
4.61
KCNIP4, SLIT2
2
127.3-127.6
5.10
TRIM63, SLC30A2, EXTL1, PDIK1L
22
56.42-56.82
4.12
H1FOO
22
58.30-58.90
4.11
WNT7A
2
49.45-50.45
4.83
IWS1, PROC
3
32.87-33.37
5.43
KCNA2, KCNA3, KCNC4, KCNA10
29
29.05-29.75
4.14
CDON, PATE2
29
50.22-50.62
3.72
TNNT2, TNNT3, LSP1
5
117.2-118.0
3.70
CDPF1, CERK, GRAMD4, GTSE1, PKDREJ, PPARA
WL
YL
ZB
HP
3.43
TBX2
6
38.00-38.40
5.16
IBSP, MEPE
3
33.46-33.86
4.35
STRIP1, ALX3
23
2.47-2.87
4.77
PRIM2, U6
5
55.90-56.20
4.71
ARHGEF25, B4GALNT1, DTX3,
3
32.87-33.37
5.42
KCNA2, KCNA3, KCNC4, KCNA10
7
45.05-45.50
4.14
AZU1, KISS1R
16
25.75-26.15
4.66
DUSP10
3
32.87-33.37
4.29
KCNA2, KCNA3, KCNC4, KCNA10
7
45.05-45.50
4.01
AZU1, KISS1R
11
10.77-11.27
3.90
ALMS1
3
32.87-33.37
4.64
KCNA2, KCNA3, KCNC4, KCNA10
7
44.90-45.30
4.17
AZU1, KISS1R
18
67.00-67.50
4.03
NLRP5, ZNF787
2
127.4-127.5
3.60
CATSPER4, CNKSR1, LLGL2
15
24.06-26.46
5.96
NCAM1, TTC12
3
32.87-33.37
5.14
KCNA2, KCNA3, KCNC4, KCNA10
6
38.57-38.97
5.11
NCAPG, ABCG2, LAP3, LCORL, FAM184B,
MG
119.2-119.5
LW
19
MED28
22.15-22.45
23
39.13-39.23
19
56.42-56.72
5
LX
ZT
ND
-
4.65
DEK, TPMT, NHLRC1
5.03
ITGB4, LLGL2, RECQL5, SMIM5, SMIM6
4.29
DCTN2, DDIT3, MARS, ARHGAP9, NXPH4, R3HDM2, RDH16, SHMT2, STAC3, TAC3
4.92
-
15
24.06-26.46
4.23
NCAM1, TTC12
8
74.75-75.25
3.97
PPP2R2A, BNIP3L, DPYSL2
20
22.15-22.45
5.75
-
23
39.30-39.70
4.67
KIF13A, U6
21
51.54-51.74
5.32
SLC24A4
1
132.3-132.6
4.79
-
3
32.87-33.37
4.78
KCNA2, KCNA3, KCNC4, KCNA10
7
44.90-45.30
4.17
AZU1, KISS1R
18
62.46-62.86
4.61
ISOC2, NAT14, RPL28, SHISA7, SSC5D
DS
DZ
10.88-11.18
9
56.20-56.90
5.75
JN
20
QC
In addition, the FST-based di statistic was used to detect signatures of selection among northern, central and southern cattle breeds (Akey et al., 2010). The results showed that 66, 87 and 90 genes within candidate regions were detected in northern, central and southern cattle breeds, respectively. Five candidate regions with significant di values for each of the northern, central and southern groups are summarized in Table 3. Overall, we found a total of 223 candidate genes involved in 38 significant functional terms, including biological processes (13 terms), cellular components (5 terms), molecular functions (6 terms) and KEGG pathways (14 terms). The top three pathways were the AMPK signaling pathway, insulin resistance and adrenergic signaling in cardiomyocytes, with P-values lower than 0.01.
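DAVID reports enrichment with its EASE score, a modified Fisher exact test. As a hedged illustration of the underlying calculation only, the sketch below computes the plain one-sided hypergeometric (Fisher) P-value for a single term; it does not reproduce DAVID's EASE modification, and the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_background, n_term, n_list):
    """One-sided P(X >= k) for X ~ Hypergeometric(n_background, n_term, n_list).

    n_background: annotated genes in the background
    n_term:       background genes annotated to the term
    n_list:       genes in the candidate list
    k:            candidate genes annotated to the term
    """
    total = comb(n_background, n_list)
    upper = min(n_term, n_list)
    hits = sum(comb(n_term, x) * comb(n_background - n_term, n_list - x)
               for x in range(k, upper + 1))   # math.comb returns 0 when x > n
    return hits / total
```

For example, if all 5 genes in a 5-gene list fall in a term covering 5 of 10 background genes, the call returns 1/252 (about 0.004).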
Table 3
Candidate regions and genes detected by di in northern, central and southern cattle breeds.
Breed¹ | Chromosome | Region (Mb) | di value | Gene name
Northern cattle breeds | 24 | 0.00-0.50 | 5.43 | ADNP2, RBFA
Northern cattle breeds | 7 | 62.75-63.25 | 4.91 | PDGFRB, CAMK2A, TCOF1
Northern cattle breeds | 27 | 32.50-33.00 | 4.46 | FGFR1
Northern cattle breeds | 23 | 19.50-20.0 | 4.38 | RCAN2, PLA2G7
Northern cattle breeds | 13 | 58.75-59.25 | 4.27 | PCK1
Central cattle breeds | 24 | 0.00-0.50 | 5.94 | ADNP2, RBFA
Central cattle breeds | 13 | 58.50-58.60 | 4.85 | SYCP2, PHACTR3
Central cattle breeds | 8 | 60.25-60.75 | 4.42 | TPM2, TLN1
Central cattle breeds | 7 | 51.50-52.00 | 4.34 | SLC23A1, CXXC5
Central cattle breeds | 4 | 65.75-66.25 | 3.80 | GHRHR, AQP1, GARS
Southern cattle breeds | 2 | 27.25-27.75 | 4.53 | ABCB11, G6PC2
Southern cattle breeds | 3 | 19.25-19.75 | 4.18 | TUFT1, CGN, SELENBP1, TMOD4, TNFAIP8L2, GABPB2
Southern cattle breeds | 29 | 33.00-33.50 | 4.08 | BARX2
Southern cattle breeds | 16 | 67.00-67.50 | 4.00 | MMEL1
Southern cattle breeds | 5 | 26.00-26.50 | 3.89 | HOXC4, HOXC5, HOXC6, HOXC8, HOXC9, HOXC10
¹ Northern cattle breeds contain CS, CR, YH and LW; central cattle breeds contain QC, JN, HP, ZB, WL and YL; southern cattle breeds contain WS, DZ, ZT, ND and LL.
4. Discussion
Our study investigated the genetic diversity and signatures of positive selection for 724 individuals from 20 Chinese cattle breeds using genome-wide SNP data. The population genetic analyses showed that the genetic diversity of southern cattle breeds was lower than that of central and northern breeds. This was supported by the abundant ROH segments in southern cattle compared with central and northern cattle, and by a higher level of genetic relatedness reflected by high IBS values within southern populations.
Previous genetic diversity studies based on microsatellites and mtDNA have characterized the variation in genetic diversity among Chinese cattle and confirmed that southern cattle display low nucleotide and haplotype diversity (Cai et al., 2007; Lai et al., 2006). In the present study, southern breeds showed lower average observed heterozygosity (Ho) values than central breeds. Gao et al. (2017) assessed Ho for Chinese indigenous cattle using a 50K SNP assay and found that central and northern cattle have more genomic variation than southern cattle. Meanwhile, the higher sum of ROH length across southern cattle genomes confirmed their lower level of genetic diversity. Among all indigenous cattle breeds, the southern breed ND had the highest average sum of ROH length in the short length category (1-5 Mb), which indicates more ancient genetic relatedness compared with other breeds (McQuillan et al., 2008). Since a previous study identified a high correlation between pedigree-based and ROH-based inbreeding coefficients (Kayser et al., 2010), the southern breed DZ appears more likely to have experienced recent inbreeding within the population because of its highest average sum of ROH length in the long length category (>30 Mb), and this is consistent with the highest F-gene estimated
value of DZ (0.051). However, central cattle presented high Ho values and few ROH segments, evidence of a high level of genetic diversity. This is consistent with previous studies showing that B. taurus and B. indicus admixture events in central breeds are likely to have increased genomic variation (Lei et al., 2000; Lei et al., 2006). Moreover, the central breeds QC, JN, LX and NY have a long history of domestication and breeding, with better adaptability and performance in various environments, which in turn contributes to their higher genetic diversity. On the other hand, the homozygosity of southern breeds may be overestimated because of SNP ascertainment bias, leading to more ROH segments in southern breeds. In addition, the genetic relatedness analysis revealed stronger relatedness among southern cattle breeds, such as ND vs LL (IBS = 0.47) and DZ vs WS (IBS = 0.37), which was consistent with the result that southern cattle breeds held a high average ROH length in the 1-5 Mb category. This is probably attributable to the fact that southern breeds share recent common ancestors (Gao et al., 2017). The relatedness between northern and southern cattle was the smallest across the genome, consistent with research showing a north-south gradient of taurine and indicine ancestry in Chinese indigenous cattle (Lei et al., 2006). Previous studies based on population structure analysis suggested an origin of the E. taurine lineage in northern cattle breeds and of the A. indicine lineage in southern cattle breeds (Gao et al., 2017; Zhang et al., 2007). According to historical domestication information, E. taurine cattle spread into northern China between approximately 3,000 and 2,000 BC, and
ACCEPTED MANUSCRIPT
subsequently expanded to the central plains between 2,500 and 1,900 BC, nevertheless, A. indicine cattle appeared in the southern China after 1,500 BC (Flad et al., 2009; Cai etal., 2014; Yue et al., 2014; Chen et al., 2009). Therefore, the migration events of A. indicine and E. taurine historically shaped southern and northern indigenous breeds,
CR IP T
respectively. Additionally, the natural barriers, like Qinling Mountains, may hamper northern cattle breeds in the flow to the south direction (Cai et al., 2006). Moreover, the extent and decay pattern of LD was breed-specific in indigenous cattle breeds
AN US
because of unique selection history and population structure. Most of Chinese indigenous cattle exhibited a quick LD decay and low level of LD, especially XB with the slowest LD decay rate. This may be explained by the fact that XB as a cultivated
M
breed has subjected to a period of intensive breeding objective selection (Zhang et al.,
ED
2011). Nevertheless, compared with worldwide cattle breeds subjected to high-strength selection with commercial purposes, such as Holstein, Simmental and
PT
Angus, the LD of Chinese indigenous breeds were relatively lower (de Roos et al.,
CE
2008; McKay et al., 2007; Porto-Neto et al., 2014; Sargolzaei et al., 2008).
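LD decay curves such as those compared above are usually built by computing r² for SNP pairs and averaging it within physical-distance bins. The sketch below assumes unphased 0/1/2 genotypes and uses the squared Pearson correlation of genotype dosages, a common approximation for unphased data. The LD estimator, distance bins and maximum distance used in the manuscript are not stated in this section, so `bin_edges_kb` and `max_dist_kb` are hypothetical inputs, as are the function names.

```python
import numpy as np


def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs' 0/1/2 genotype vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))
    x, y = x[ok], y[ok]
    if x.std() == 0 or y.std() == 0:
        return np.nan  # monomorphic among the retained animals: r2 undefined
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def ld_decay(genotypes, positions_kb, bin_edges_kb, max_dist_kb):
    """Mean r2 of SNP pairs grouped into physical-distance bins.

    genotypes: (n_animals, n_snps) array for one chromosome of one breed.
    positions_kb: SNP positions in kb; bin_edges_kb: distance bin edges in kb.
    """
    g = np.asarray(genotypes, dtype=float)
    pos = np.asarray(positions_kb, dtype=float)
    n_snps = g.shape[1]
    dists, r2s = [], []
    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            d = abs(pos[j] - pos[i])
            if d > max_dist_kb:
                continue  # pairs beyond the maximum distance are ignored
            r2 = genotype_r2(g[:, i], g[:, j])
            if not np.isnan(r2):
                dists.append(d)
                r2s.append(r2)
    dists, r2s = np.array(dists), np.array(r2s)
    means = []
    for low, high in zip(bin_edges_kb[:-1], bin_edges_kb[1:]):
        sel = (dists >= low) & (dists < high)
        means.append(float(r2s[sel].mean()) if sel.any() else np.nan)
    return means
```

A slower decay, as described for XB, would show up as mean r² values that stay higher in the more distant bins.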
Previous studies scanning for signatures of selection in Chinese indigenous cattle focused on the allele frequency spectrum (Gao et al., 2017; Mei et al., 2017). For a more comprehensive investigation of potential selection signals, our study adopted two complementary methods, the integrated haplotype score (iHS) and di. Notably, only two genes, U4 and U6, which are involved in the regulation of mRNA processing, were detected by both methods; no other candidate genes overlapped. The lack of overlap between iHS and di may be explained by the greater power of iHS to detect regions where the selected allele is at intermediate frequency rather than fixed (Voight et al., 2006). In addition, the two methods capture selection on different time scales: iHS is suited to detecting signatures of recent selection, whereas di is suited to earlier selection events (Akey et al., 2010). Overall, our study identified genes in putative candidate regions involved in many biological processes, including growth and development, fertility and reproduction, immune function and environmental adaptation in Chinese indigenous cattle.
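The contrast between the two methods is easier to see from how each score is formed. The sketch below is a simplified illustration, not the pipeline used in this study. It assumes that integrated EHH values (iHH) for the ancestral and derived alleles have already been computed, for example with selscan (Szpiech and Hernandez, 2014), and that per-SNP pairwise F_ST values are available for every population pair. iHS is ln(iHH_A/iHH_D), standardized within derived-allele-frequency bins (Voight et al., 2006). di sums, over the other populations, each pairwise F_ST z-scored against its genome-wide mean and standard deviation (Akey et al., 2010). The number of frequency bins, the function names and the omission of window averaging are assumptions.

```python
import numpy as np


def standardized_ihs(ihh_ancestral, ihh_derived, derived_freq, n_bins=20):
    """iHS in the sense of Voight et al. (2006).

    Unstandardized score ln(iHH_A / iHH_D), then centred and scaled within
    equal-width bins of derived allele frequency (n_bins is an assumption).
    """
    u = np.log(np.asarray(ihh_ancestral, float) / np.asarray(ihh_derived, float))
    p = np.asarray(derived_freq, float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ihs = np.full_like(u, np.nan)
    for b in np.unique(bins):
        sel = bins == b
        sd = u[sel].std()
        if sd > 0:  # a bin with one SNP or identical scores cannot be scaled
            ihs[sel] = (u[sel] - u[sel].mean()) / sd
    return ihs


def di_statistic(fst_by_pair, focal):
    """di of Akey et al. (2010) for population `focal`, per SNP.

    fst_by_pair maps (pop_a, pop_b) to a per-SNP pairwise F_ST array over the
    same SNPs. For every pair containing `focal`, F_ST is z-scored against its
    genome-wide mean and SD; the z-scores are summed over the other populations.
    """
    total = None
    for (a, b), fst in fst_by_pair.items():
        if focal not in (a, b):
            continue
        fst = np.asarray(fst, float)
        sd = np.nanstd(fst)
        if not sd > 0:
            continue  # skip a pair with no variation in F_ST (z undefined)
        z = (fst - np.nanmean(fst)) / sd
        total = z if total is None else total + z
    if total is None:
        raise KeyError(f"no informative F_ST pairs involve {focal!r}")
    return total
```

Because iHS is scaled within frequency bins, it can flag long haplotypes around alleles at intermediate frequency, whereas di grows only where allele frequencies have diverged strongly between populations. This is one way to read the small overlap between the two scans.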
Using the iHS method, several candidate genes related to growth and development were identified as targets of positive selection. NCAPG, LAP3 and LCORL on chromosome 6, which are involved in body weight and height, were found in the northern breed CS and the central breed HP. NCAPG affects bovine carcass weight (Setoguchi et al., 2009), fetal growth (Eberlein et al., 2009) and body frame size at puberty in cattle (Setoguchi et al., 2011), and has been considered a candidate gene for a bovine carcass weight QTL (Eberlein et al., 2009; Setoguchi et al., 2009). Meanwhile, many genome-wide association studies have found significant associations of several markers in the NCAPG-LCORL locus and LAP3 with average daily gain in cattle (Lindholm-Perry et al., 2013; Lindholm-Perry et al., 2011) and with related traits in other livestock (Al-Mamun et al., 2015; Liu et al., 2013; Tetens et al., 2013). However, Mei et al. (2017) reported NCAPG as responding to selection for milk production in dairy cattle. We also found a panel of candidate genes associated with skeletal development (BMPR1A, PKD2, IBSP and MEPE) that were positively selected in the northern breed LW and the southern breed DS. IBSP and MEPE have been primarily associated with bone and cartilage morphogenesis (Rowe et al., 2000). MEPE, a member of a cluster of genes encoding bone-tooth mineral extracellular matrix (ECM) phosphoglycoproteins, plays a role in bone-related traits in humans, mice and cattle (Bouleftour et al., 2015; Ormsby et al., 2014; Zelenchuk et al., 2015). IBSP encodes a major component of the bone extracellular matrix and has been shown to influence skeletal development in knockout mice (Rivadeneira et al., 2009).
Other candidate genes related to fertility and reproduction were identified, including ABCG2 in CS, LW, WL and HP, CATSPER in ZB, H1foo in CR and NLRP5 in ZB. The sperm ion channel protein CATSPER4 is essential for sperm hyperactivated motility and male fertility (Qi et al., 2007) and influences the function of ejaculated sperm in cattle (Johnson et al., 2017). H1foo is crucial for bovine oocyte maturation (Yun et al., 2015) and participates in the activation or repression of genes during oogenesis and embryo development before embryonic genome activation (McGraw et al., 2006).
Chinese indigenous cattle show great adaptability and performance in various
agro-ecological environments. A scan of their whole genomes captured a set of important selection signatures related to immune function (SH3BGRL3 in ZB and ZT; PROC in YH; LRP1 in DS and LX; ELANE in YL, ZB and ZT; AZU1 in YL, ZB and WL). Interestingly, the innate immunity gene AZU1, which increases vascular permeability, binds endotoxin and is chemotactic for monocytes (Müller et al., 2005), was identified in three central breeds from Hubei province (YL, ZB and WL). This suggests that similar selection pressures may have acted on their genomes. Notably, candidate genes encoding voltage-gated potassium channels (KCNA2, KCNA3, KCNC4 and KCNA10) were found in many indigenous cattle breeds. These channels have diverse functions, including neurotransmitter release, regulation of heart rate, insulin secretion and neuronal excitability; some, such as KCNA3, are also expressed in T and B lymphocytes and are involved in autoimmunity (DeCoursey et al., 1984; Gutman et al., 2005). Overall, these putative candidate genes may reflect genomic selection hotspots of indigenous cattle involved in immune function.
Selective sweeps based on population differentiation were detected for northern, central and southern cattle breeds. We found candidate genes residing in significantly selected regions that directly or indirectly influence environmental adaptation, morphological development and meat quality (Table 3). Similarly, Gao et al.
(2017) also detected potential selective sweeps associated with morphological development and environmental adaptation in Chinese indigenous cattle. The candidate region with the highest di value in northern cattle breeds contained the ADNP2 and RBFA genes. RBFA is homologous to RbfA, a bacterial cold-shock protein whose absence triggers the cold-shock response (Dammel and Noller, 1995; Jones and Inouye, 1996). Therefore, positive selection acting on this candidate region may have benefited the adaptation of northern cattle to cold environments. In southern cattle breeds, the candidate regions included BARX2, a member of the Bar class of homeobox genes that participates in hair follicle development in humans and animals (Olson et al., 2005; Sander et al., 2000). This is consistent with southern cattle having sparse, short hair that helps them dissipate heat and adapt to hot, humid environments. Moreover, Ai et al. (2015) scanned for selection signals in southern Chinese indigenous pigs using whole-genome sequencing data and found that BARX2 had similarly been subjected to positive selection. In addition, we detected a cluster of homeobox genes (HOXC4, HOXC5, HOXC6, HOXC8, HOXC9 and HOXC10) in southern cattle breeds. These genes are key regulators of morphogenesis in multicellular organisms; for instance, the spatial and temporal deployment of homeobox genes is responsible for the development of the spinal axis, hair follicles, teeth and mammary glands (Duverger and Morasso, 2008; Stelnicki et al., 1998). In central cattle breeds, TPM2 and GARS, genes associated with meat quality, were located in two separate candidate regions. Previously, Choi et al. (2015) identified TPM2 as related to lipid accumulation and marbling in Hanwoo using whole-genome resequencing data. Muscle inosine monophosphate (IMP) is one of the most important flavor components in meat, and several studies have suggested that the GARS-AIRS-GART genes influence IMP content in chickens (Shu et al., 2009; Ye et al., 2010). These two putative signatures of selection may be in accordance with the fact that the central breeds LX and JN are well known for pronounced marbling and distinctive meat flavor.
5. Conclusions
In this study, we comprehensively investigated the genome-wide genetic diversity and signatures of selection of 20 representative Chinese indigenous cattle breeds. The southern cattle breeds presented lower genetic diversity than the central and northern cattle breeds, with abundant runs of homozygosity (ROH) and high identity-by-state (IBS) values. We also detected many potential selective sweeps within and between populations using the iHS and di methods. The genes within the candidate regions are associated with growth and development, fertility and reproduction, immune function and environmental adaptation. Overall, our study provides new insights into the genetic diversity of Chinese indigenous cattle and suggests a role for natural and artificial selection in shaping their genomic variability.
Acknowledgments
This work was funded in part by the National Natural Science Foundation of China (31402039), the National Beef Cattle Industrial Technology System (CARS-37), the Species and Breed Resources Conservation of China's Ministry of Agriculture (2017-2019), the Agricultural Science and Technology Innovation Program of Chinese Academy of Agricultural Sciences (ASTIPIAS03), the Cattle Breeding Innovative Research Team of Chinese Academy of Agricultural Sciences (cxgc-ias-03), and the China Scholarship Council (CSC). We are grateful to all scientists and staff of the National Beef Cattle Industrial Technology System in China for supporting this work.
References
Ai, H., Fang, X., Yang, B., Huang, Z., Chen, H., Mao, L., Zhang, F., Zhang, L., Cui, L., He, W., 2015. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat Genet. 47, 217-225.
Akey, J.M., Ruhe, A.L., Akey, D.T., Wong, A.K., Connelly, C.F., Madeoy, J., Nicholas, T.J., Neff, M.W., 2010. Tracking footprints of artificial selection in the dog genome. Proc Natl Acad Sci U S A. 107, 1160-1165.
Al-Mamun, H.A., Kwan, P., Clark, S.A., Ferdosi, M.H., Tellam, R., Gondro, C., 2015. Genome-wide association study of body weight in Australian Merino sheep reveals an orthologous region on OAR6 to human and bovine genomic regions affecting height and weight. Genetics Selection Evolution. 47, doi: 10.1186/s12711-12015-10142-12714.
Bouleftour, W., Bouet, G., Granito, R.N., Thomas, M., Linossier, M.T., Vanden-Bossche, A., Aubin, J.E., Lafage-Proust, M.H., Vico, L., Malaval, L., 2015. Blocking the expression of both bone sialoprotein (BSP) and osteopontin (OPN) impairs the anabolic action of PTH in mouse calvaria bone. J Cell Physiol. 230, 568-577.
Bradley, D.G., MacHugh, D.E., Cunningham, P., Loftus, R.T., 1996. Mitochondrial diversity and the origins of African and European cattle. Proc Natl Acad Sci U S A. 93, 5131-5135.
Browning, S.R., Browning, B.L., 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81, 1084-1097.
Cai, X., Chen, H., Lei, C., Wang, S., Xue, K., Zhang, B., 2007. mtDNA diversity and genetic lineages of eighteen cattle breeds from Bos taurus and Bos indicus in China. Genetica. 131, 175-183.
Cai, D.W., Sun, Y., Tang, Z.W., Hu, S.M., Li, W.Y., Zhao, X.B., Xiang, H., Zhou, H., 2014. The origins of Chinese domestic cattle as revealed by ancient DNA analysis. J Archaeol Sci. 41, 423-434.
Cai, X., Chen, H., Wang, S., Xue, K., Lei, C., 2006. Polymorphisms of two Y chromosome microsatellites in Chinese cattle. Genet Sel Evol. 38, 525-534.
Chen, S.Y., Lin, B.Z., Baig, M., Mitra, B., Lopes, R.J., Santos, A.M., Magee, D.A., Azevedo, M., Tarroso, P., Sasazaki, S., 2010. Zebu Cattle Are an Exclusive Legacy of the South Asia Neolithic. Molecular biology and evolution. 27, 1-6.
Choi, J.W., Choi, B.H., Lee, S.H., Lee, S.S., Kim, H.C., Yu, D., Chung, W.H., Lee, K.T., Chai, H.H., Cho, Y.M., 2015. Whole-Genome Resequencing Analysis of Hanwoo and Yanbian Cattle to Identify Genome-Wide SNPs and Signatures of Selection. Mol Cells. 38, 466-473.
Daetwyler, H.D., Capitan, A., Pausch, H., Stothard, P., van Binsbergen, R., Brondum, R.F., Liao, X., Djari, A., Rodriguez, S.C., Grohs, C., 2014. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat Genet. 46, 858-865.
Dammel, C.S., Noller, H.F., 1995. Suppression of a cold-sensitive mutation in 16S rRNA by overexpression of a novel ribosome-binding factor, RbfA. Genes & Development. 9, 626-637.
de Roos, A.P., Hayes, B.J., Spelman, R.J., Goddard, M.E., 2008. Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics. 179, 1503-1512.
DeCoursey, T.E., Chandy, K.G., Gupta, S., Cahalan, M.D., 1984. Voltage-gated K+ channels in human T lymphocytes: a role in mitogenesis? Nature. 307, 465-468.
Duverger, O., Morasso, M.I., 2008. Role of homeobox genes in the patterning, specification, and differentiation of ectodermal appendages in mammals. J Cell Physiol. 216, 337-346.
Eberlein, A., Takasuga, A., Setoguchi, K., Pfuhl, R., Flisikowski, K., Fries, R., Klopp, N., Fürbass, R., Weikard, R., Kühn, C., 2009. Dissection of genetic factors modulating fetal growth in cattle indicates a substantial role of the non-SMC condensin I complex, subunit G (NCAPG) gene. Genetics. 183, 951-964.
Flad, R., Yuan, J., Li, S., 2009. On the source and features of the Neolithic domestic animals in the Gansu and Qinghai Region, China. Archaeology. 5.
Gao, Y., Gautier, M., Ding, X., Zhang, H., Wang, Y., Wang, X., Faruque, M.O., Li, J., Ye, S., Gou, X., 2017. Species composition and environmental adaptation of indigenous Chinese cattle. Sci Rep. 7, 16196.
Groenen, M.A., Archibald, A.L., Uenishi, H., Tuggle, C.K., Takeuchi, Y., Rothschild, M.F., Rogel-Gaillard, C., Park, C., Milan, D., Megens, H.J., 2012. Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 491, 393-398.
Gutman, G.A., Chandy, K.G., Grissmer, S., Lazdunski, M., Mckinnon, D., Pardo, L.A., Robertson, G.A., Rudy, B., Sanguinetti, M.C., Stühmer, W., 2005. International Union of Pharmacology. LIII. Nomenclature and molecular relationships of voltage-gated potassium channels. Pharmacological reviews. 57, 473-508.
Huang, D.W., Sherman, B.T., Lempicki, R.A., 2009a. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research. 37, 1-13.
Huang, D.W., Sherman, B.T., Lempicki, R.A., 2009b. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols. 4, 44-57.
Jared, D., Peter, B., 2003. Farmers and Their Languages: The First Expansions. Science. 5619, 597-603.
Johnson, G.P., English, A.-M., Cronin, S., Hoey, D.A., Meade, K.G., Fair, S., 2017. Genomic identification, expression profiling, and functional characterization of CatSper channels in the bovine. Biology of Reproduction. 97, 302-312.
Jones, P.G., Inouye, M., 1996. RbfA, a 30S ribosomal binding factor, is a cold-shock protein whose absence triggers the cold-shock response. Molecular microbiology. 21, 1207-1218.
Kayser, M., Kirin, M., McQuillan, R., Franklin, C.S., Campbell, H., McKeigue, P.M., Wilson, J.F., 2010. Genomic Runs of Homozygosity Record Population History and Consanguinity. PLoS ONE. 5, doi: 10.1371/journal.pone.0013996.
Kijas, J.W., Lenstra, J.A., Hayes, B., Boitard, S., Porto Neto, L.R., San Cristobal, M., Servin, B., McCulloch, R., Whan, V., Gietzen, K., 2012. Genome-wide analysis of the world's sheep breeds reveals high levels of historic mixture and strong recent selection. PLoS Biol. 10, doi: 10.1371/journal.pbio.1001258.
Lai, S.J., Liu, Y.P., Liu, Y.X., Li, X.W., Yao, Y.G., 2006. Genetic diversity and origin of Chinese cattle revealed by mtDNA D-loop sequence variation. Mol Phylogenet Evol. 38, 146-154.
Lei, C.Z., Chen, H., Hu, S.R., 2000. Studies on Y chromosome polymorphism and the origin and classification of Chinese yellow cattle. Acta Agriculturae Boreali-occidentalis Sinica. 9, 43-47.
Lei, C.Z., Chen, H., Zhang, H.C., Cai, X., Liu, R.Y., Luo, L.Y., Wang, C.F., Zhang, W., Ge, Q.L., Zhang, R.F., 2006. Origin and phylogeographical structure of Chinese cattle. Anim Genet. 37, 579-582.
Lindholm-Perry, A.K., Kuehn, L.A., Oliver, W.T., Sexten, A.K., Miles, J.R., Rempel, L.A., Cushman, R.A., Freetly, H.C., 2013. Adipose and muscle tissue gene expression of two genes (NCAPG and LCORL) located in a chromosomal region associated with cattle feed intake and gain. PloS one. 8, doi: 10.1371/journal.pone.0080882.
Lindholm-Perry, A.K., Sexten, A.K., Kuehn, L.A., Smith, T.P., King, D.A., Shackelford, S.D., Wheeler, T.L., Ferrell, C.L., Jenkins, T.G., Snelling, W.M., 2011. Association, effects and validation of polymorphisms within the NCAPG-LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle. BMC genetics. 12, doi: 10.1186/1471-2156-1112-1103.
Liu, R., Sun, Y., Zhao, G., Wang, F., Wu, D., Zheng, M., Chen, J., Zhang, L., Hu, Y., Wen, J., 2013. Genome-wide association study identifies loci and candidate genes for body composition and meat quality traits in Beijing-You chickens. PLoS One. 8, doi: 10.1371/journal.pone.0061172.
Müller, C., Autenrieth, I., Peschel, A., 2005. Intestinal epithelial barrier and mucosal immunity. Cellular and molecular life sciences. 62, 1297-1307.
McGraw, S., Vigneault, C., Tremblay, K., Sirard, M.A., 2006. Characterization of linker histone H1FOO during bovine in vitro embryo development. Molecular reproduction and development. 73, 692-699.
McKay, S.D., Schnabel, R.D., Murdoch, B.M., Matukumalli, L.K., Aerts, J., Coppieters, W., Crews, D., Neto, E.D., Gill, C.A., Gao, C., 2007. Whole genome linkage disequilibrium maps in cattle. BMC genetics. 8, 74-85.
McQuillan, R., Leutenegger, A.L., Abdel-Rahman, R., Franklin, C.S., Pericic, M., Barac-Lauc, L., Smolej-Narancic, N., Janicijevic, B., Polasek, O., Tenesa, A., 2008. Runs of homozygosity in European populations. Am J Hum Genet. 83, 359-372.
Mei, C., Wang, H., Liao, Q., Wang, L., Cheng, G., Wang, H., Zhao, C., Zhao, S., Song, J., Guang, X., 2017. Genetic architecture and selection of Chinese cattle revealed by whole genome resequencing. Mol Biol Evol.
Olson, L.E., Zhang, J., Taylor, H., Rose, D.W., Rosenfeld, M.G., 2005. Barx2 functions through distinct corepressor classes to regulate hair follicle remodeling. Proceedings of the National Academy of Sciences of the United States of America. 102, 3708-3713.
Ormsby, R.T., Findlay, D.M., Kogawa, M., Anderson, P.H., Morris, H.A., Atkins, G.J., 2014. Analysis of vitamin D metabolism gene expression in human bone: evidence for autocrine control of bone remodelling. J Steroid Biochem Mol Biol. 144 Pt A, 110-113.
Porto-Neto, L.R., Kijas, J.W., Reverter, A., 2014. The extent of linkage disequilibrium in beef cattle breeds using high-density SNP genotypes. Genetics Selection Evolution. 46, 22-26.
Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A., Reich, D., 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38, 904-909.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., de Bakker, P.I.W., Daly, M.J., 2007. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics. 81, 559-575.
Purfield, D.C., Berry, D.P., McParland, S., Bradley, D.G., 2012. Runs of homozygosity and population history in cattle. Bmc Genetics. 13, 70-80.
Qanbari, S., Pausch, H., Jansen, S., Somel, M., Strom, T.M., Fries, R., Nielsen, R., Simianer, H., 2014. Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 10, doi: 10.1371/journal.pgen.1004148.
Qi, H., Moran, M.M., Navarro, B., Chong, J.A., Krapivinsky, G., Krapivinsky, L., Kirichok, Y., Ramsey, I.S., Quill, T.A., Clapham, D.E., 2007. All four CatSper ion channel proteins are required for male fertility and sperm cell hyperactivated motility. Proceedings of the National Academy of Sciences. 104, 1219-1223.
Rischkowsky, B., Pilling, D., 2007. The state of the world's animal genetic resources for food and agriculture. Food & Agriculture Org.
Rivadeneira, F., Styrkársdottir, U., Estrada, K., Halldórsson, B.V., Hsu, Y.-H., Richards, J.B., Zillikens, M.C., Kavvoura, F.K., Amin, N., Aulchenko, Y.S., 2009. Twenty bone-mineral-density loci identified by large-scale meta-analysis of genome-wide association studies. Nature Genetics. 41, 1199.
Rousset, F., 2008. genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Mol Ecol Resour. 8, 103-106.
Rowe, P.S., De Zoysa, P.A., Dong, R., Wang, H.R., White, K.E., Econs, M.J., Oudet, C.L., 2000. MEPE, a new gene expressed in bone marrow and tumors causing osteomalacia. Genomics. 67, 54-68.
Rubin, C.J., Megens, H.J., Martinez Barrio, A., Maqbool, K., Sayyab, S., Schwochow, D., Wang, C., Carlborg, O., Jern, P., Jorgensen, C.B., 2012. Strong signatures of selection in the domestic pig genome. Proc Natl Acad Sci U S A. 109, 19529-19536.
Rubin, C.J., Zody, M.C., Eriksson, J., Meadows, J.R., Sherwood, E., Webster, M.T., Jiang, L., Ingman, M., Sharpe, T., Ka, S., 2010. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. 464, 587-591.
Sabeti, P.C., Reich, D.E., Higgins, J.M., Levine, H.Z., Richter, D.J., Schaffner, S.F., Gabriel, S.B., Platko, J.V., Patterson, N.J., McDonald, G.J., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature. 419, 832-837.
Sander, G., Simon Bawden, C., Hynd, P.I., Nesci, A., Rogers, G., Powell, B.C., 2000. Expression of the Homeobox Gene, Barx2, in Wool Follicle Development. Journal of Investigative Dermatology. 115, 753-756.
Sargolzaei, M., Schenkel, F., Jansen, G., Schaeffer, L., 2008. Extent of linkage disequilibrium in Holstein cattle in North America. Journal of Dairy Science. 91, 2106-2117.
Setoguchi, K., Furuta, M., Hirano, T., Nagao, T., Watanabe, T., Sugimoto, Y., Takasuga, A., 2009. Cross-breed comparisons identified a critical 591-kb region for bovine carcass weight QTL (CW-2) on chromosome 6 and the Ile-442-Met substitution in NCAPG as a positional candidate. BMC Genet. 10, doi: 10.1186/1471-2156-1110-1143.
Setoguchi, K., Watanabe, T., Weikard, R., Albrecht, E., Kuhn, C., Kinoshita, A., Sugimoto, Y., Takasuga, A., 2011. The SNP c.1326T>G in the non-SMC condensin I complex, subunit G (NCAPG) gene encoding a p.Ile442Met variant is associated with an increase in body frame size at puberty in cattle. Anim Genet. 42, 650-655.
Shu, J.T., Bao, W.B., Zhang, X.Y., Ji, C.J., Han, W., Chen, K.W., 2009. Combined effect of mutations in ADSL and GARS-AIRS-GART genes on IMP content in chickens. British Poultry Science. 50, 680-686.
Stelnicki, E.J., Komuves, L.G., Kwong, A.O., Holmes, D., Klein, P., Rozenfeld, S., Lawrence, H.J., Adzick, N.S., Harrison, M., Largman, C., 1998. HOX homeobox genes exhibit spatial and temporal changes in expression during human skin development. J Invest Dermatol. 110, 110-115.
Szpiech, Z.A., Hernandez, R.D., 2014. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol. 31, 2824-2837.
Tetens, J., Widmann, P., Kühn, C., Thaller, G., 2013. A genome-wide association study indicates LCORL/NCAPG as a candidate locus for withers height in German Warmblood horses. Anim Genet. 44, 467-471.
Troy, C.S., MacHugh, D.E., Bailey, J.F., Magee, D.A., 2001. Genetic evidence for Near-Eastern origins of European cattle. Nature. 410, 1088-1091.
Voight, B.F., Kudaravalli, S., Wen, X., Pritchard, J.K., 2006. A map of recent positive selection in the human genome. PLoS Biol. 4, doi: 10.1371/journal.pbio.0040072.
Weir, B.S., Cockerham, C.C., 1984. Estimating F-Statistics for the Analysis of Population Structure. Evolution. 38, 1358-1370.
Ye, M.H., Chen, J.L., Zhao, G.P., Zheng, M.Q., Wen, J., 2010. Correlation between polymorphisms in ADSL and GARS-AIRS-GART genes with inosine 5′-monophosphate (IMP) contents in Beijing-you chickens. British Poultry Science. 51, 609-613.
Yue, X.P., Li, R., Liu, L., Zhang, Y.S., Huang, J.P., Chang, Z.H., Dang, R.H., Lan, X.Y., Chen, H., Lei, C.Z., 2014. When and how did Bos indicus introgress into Mongolian cattle? Gene. 537, 214-219.
Yu, Y., Nie, L., He, Z.Q., Wen, J.K., Jian, C.S., Zhang, Y.P., 1999. Mitochondrial DNA variation in cattle of South China: origin and introgression. Anim Genet. 30, 245-250.
Yun, Y., An, P., Ning, J., Zhao, G.-M., Yang, W.-L., Lei, A.-M., 2015. H1foo is essential for in vitro meiotic maturation of bovine oocytes. Zygote. 23, 416-425.
Zelenchuk, L.V., Hedge, A.M., Rowe, P.S., 2015. Age dependent regulation of bone-mass and renal function by the MEPE ASARM-motif. Bone. 79, 131-142.
Zhang, Y., 2011. Animal Genetic Resources in China: Bovines, 1st ed. China Agricultural Press, Beijing, China, pp. 1-23.
Zhang, G.X., Wang, Z.G., Chen, W.S., Wu, C.X., Han, X., Chang, H., Zan, L.S., Li, R.L., Wang, J.H., Song, W.T., 2007. Genetic diversity and population structure of indigenous yellow cattle breeds of China using 30 microsatellite markers. Anim Genet. 38, 550-559.
Zhang, H., Paijmans, J.L., Chang, F., Wu, X., Chen, G., Lei, C., Yang, X., Wei, Z., Bradley, D.G., Orlando, L., 2013. Morphological and genetic evidence for early Holocene cattle management in northeastern China. Nat Commun. 4, doi: 10.1038/ncomms3755.
Zhou, G.L., Jin, H.G., Zhu, Q., Guo, S.L., Wu, Y.H., 2005. Genetic diversity analysis of five cattle breeds native to China using microsatellites. Journal of Genetics. 84, 77-80.