Genome-wide association study for milk production in Egyptian buffalo

Genome-wide association study for milk production in Egyptian buffalo

Author’s Accepted Manuscript Genome-Wide Association study Production in Egyptian Buffalo for Milk Nermin El-Halawany, Hamdy Abdel-Shafy, AbdEl-Mon...

1MB Sizes 1 Downloads 74 Views

Author’s Accepted Manuscript Genome-Wide Association study Production in Egyptian Buffalo

for

Milk

Nermin El-Halawany, Hamdy Abdel-Shafy, AbdEl-Monsif A. Shawky, Magdy A. Abdel-Latif, Ahmed F.M. Al-Tohamy, Omaima M. Abd ElMoneim www.elsevier.com/locate/livsci

PII: DOI: Reference:

S1871-1413(17)30029-X http://dx.doi.org/10.1016/j.livsci.2017.01.019 LIVSCI3141

To appear in: Livestock Science Received date: 20 December 2016 Accepted date: 30 January 2017 Cite this article as: Nermin El-Halawany, Hamdy Abdel-Shafy, Abd-El-Monsif A. Shawky, Magdy A. Abdel-Latif, Ahmed F.M. Al-Tohamy and Omaima M. Abd El-Moneim, Genome-Wide Association study for Milk Production in Egyptian Buffalo, Livestock Science, http://dx.doi.org/10.1016/j.livsci.2017.01.019 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Genome-Wide Association study for Milk Production in Egyptian Buffalo Nermin El-Halawanya,1*, Hamdy Abdel-Shafyb,1, Abd-El-Monsif A. Shawkya, Magdy A. AbdelLatifc, Ahmed F.M. Al-Tohamya, Omaima M. Abd El-Moneima a

Department of Cell Biology, National Research Centre, Dokki, Egypt

b

c

Department of Animal Production, Faculty of Agriculture, Cairo University, Giza, Egypt

Department of Buffalo Research, Animal Production Research Institute, Doki, Giza, Egypt.

[email protected] [email protected] *

Corresponding author: Department of Cell Biology, National Research Centre, Al Bhouth St. Dokki Giza Egypt, 12622

Abstract With the aim of characterizing the genetic background of Egyptian buffalo and identifying genomic regions and potential causative mutations associated with milk yield, we performed a Genome-Wide Association Study (GWAS) in Egyptian buffalo using Axiom Buffalo Genotyping Array 90K. This study was carried out with 250 buffalo cows using 89,069 daily milk records. After quality control, a total of 42,269 single nucleotide polymorphisms (SNPs) remained for further analysis. Genome-wide analysis was performed in the way of SNP-by-SNP, through regressing the observations of an average daily milk yield deviations on SNP alleles.

1

1st and 2nd authors contributed equally to this article. 1

Several genomic regions were detected with suggestive signals of association on chromosomes BTA1, BTA5, BTA6, and BTA27. The most significant SNP (Affx-79526274) was located on chromosome BTA27. The convincingly associated SNPs were located within or close to several candidate genes. A GO analysis ranked immune response at the top of all biological process associated with those genes. This is the first GWAS in Egyptian buffalo. Although a small sample size was used in this study, several suggestive genomic loci associated with daily milk production were detected. Further work is required on a larger sample size with fine mapping of identified QTL to detect potential candidate regions. Keywords: GWAS; milk yield; Egyptian buffalo

Introduction Animal agriculture makes up a significant portion of the Egyptian agricultural sector. It contributes 24.5% of the total agricultural output (SADS 2009). Among domestic animals, the water buffalo (Bubalus bubalis), particularly the river buffalo, holds great economic importance as a provider of milk, meat, hide and even horns (Cockrill 1981; Michelizzi et al. 2010a). According to estimates by the Food and Agriculture Organization of the United Nations, the global water buffalo population has increased 98% in recent decades, from 88 million in 1961 to 174 million in 2005. In Egypt the total number of buffaloes was approximately 3.95 million in 2009. The aggregate share of buffalo milk, from all types of production systems is about 81% of total milk production in Egypt. In recent years, the scientific community concerned with water buffalo has generated genome resources, including whole genome mapping and whole genome sequencing (Michelizzi et al. 2

2010a). Radiation hybrid (RH) and in situ hybridization mapping approaches have been used to map the river buffalo genome (Amaral et al. 2007; Goldammer et al. 2007; Miziara et al. 2007; Stafuzza et al. 2007; Amaral et al. 2008; Di Meo et al. 2008; Ianella et al. 2008; Stafuzza et al. 2009). As for nuclear genome sequencing, there are two water buffalo genomes sequences (Tantia et al. 2011; Zimin et al. 2013) now available on the NCBI platform in scaffolds. However, the sequences are not displayed in chromosomes and genes are not annotated (UMD_CASPUR_WB_2.0;

http://www.ncbi.nlm.nih.gov/assembly/GCF_000471725.1/).

Recently, de novo assembly was performed on Chinese swamp buffalo by RNA Sequencing (Deng et al. 2016). Although there are many advantages to farming water buffalo, they remain underutilized. Water buffalo breeders and farmers face many and particular challenges and problems such as poor reproductive efficiency (Gordon 1996), and low rates of calf survival (Khatun et al. 2009). Therefore, genome wide discovery of SNPs in water buffalo is required to promote and utilize gene technologies for genetic improvement, in terms of milk yields, of the animals. Recently, the development of genome-wide association studies (GWAS) (Hirschhorn & Daly 2005) has led to the identification of markers that could enable more accurate breeding value estimation and supports the understanding of genetic control of traits through identification of candidate genes (Pryce et al. 2010). In cattle, many reports describe the use of GWAS for several economically important traits including milk yield (Hayes et al. 2009; Bolormaa et al. 2010; Jiang et al. 2010; Mai et al. 2010; Iso-Touru et al. 2016), milk quality (Pryce et al. 2010; Bouwman et al. 2011; Schopen et al. 2011), fertility (Feugang et al. 2009; Huang et al. 2010; Sahana et al. 2010; Sahana et al. 2011), growth rate (Snelling et al. 2010; Bolormaa et al. 2011a), meat quality and carcass traits (Bolormaa et al. 2011b; Lu et al. 2013; Mudadu et al. 2016).

3

However, GWAS for buffalo are limited; some studies have been carried out to transfer single nucleotide polymorphisms (SNP) information from cattle to buffalo, since they are in a close evolutionary relationship (Hernandez Fernandez & Vrba 2005). The Illumina Bovine SNP50 BeadChip and the 777 k cattle SNP chip (Illumina) were used to genotype 10 water buffaloes (Michelizzi et al. 2010b) and 384 buffaloes (Borquis et al. 2014) respectively. Furthermore, cattle SNP chips were used to identify SNP associated with production and reproduction traits in buffaloes using the 54 k cattle SNP chip (Wu et al. 2013) and a 777 k cattle SNP chip (Venturini et al. 2014). More recently, whole genome sequencing of several buffaloes from different breeds made possible development of a commercial buffalo SNP chip array (Axiom ® Buffalo Genotyping Array 90 K Affymetrix). The SNP position and annotation to genes used the cattle genome (UMD3.1) as a reference. Therefore, the number of chromosomes in this array is 60 instead of 50, as is already known for river buffalo or 48 for swamp buffalo. This chip was used to find the genomic loci associated with total milk yield in Italian buffalo (Iamartino et al. 2013) and Brazilian Murrah buffalo (de Camargo et al. 2015). However, comparing the results of the GWAS in different buffalo breeds results in differences regarding the reported associated SNPs and genes (de Camargo et al. 2015). Hence, a continuous search for new SNPs and candidate genes related to economic traits in different buffaloes breeds should continue. The objective of this study was therefore to perform a GWAS for the milk yield trait in Egyptian buffalo using Axiom Buffalo Genotyping Array 90K. This will make possible a further characterization of the genetic background of Egyptian buffalo and possibly identify genomic regions and causative mutations associated with milk yield.

4

Materials and Methods Animals and phenotypes This study was carried out with 250 buffalo cows (96 genotyped and 154 contemporary animals) using 89,069 daily milk records. The buffalo were calved between 1998 and 2016, with more than 88% calved after 2006. The buffalo were kept on two dairy farms in the Delta region of Egypt. To ensure a homogenous data set, a number of editing steps were performed (Table 1). Table 1. Editing process for the daily milk records in the buffalo cows. Criteria for deletion

Deleted records

Remaining records

-----

89,069

29

89,040

Records in lactation number greater than 5

3,797

85,243

DIM less than 5 and greater than 290

7,487

77,756

458

77,298

Total records Records in lactation number less than 1

Test days less than 20 records per lactation DIM: days in milk.

The editing process was carried out to exclude records with irrecoverable errors like unknown calving date and a lactation number less than 1 (29 records) and records from the six and later lactations (3,797 records). Other criteria were set to include only records between 5 and 290 days in milk (7,487 records deleted) and lactations with a minimum of 20 test days per lactation (458 records deleted). After editing, 233 buffalo cows with 77,298 records were used. The age at first calving of these animals was greater than 690 days. The descriptive statistics of the data set after filtering options are presented in Table 2.

5

Table 2. Characterization of daily milk yield over the first five lactations in the buffalo cows. Lactation

Animals

Daily milk yield Records

Mean

Min

Max

SD

SE

165

36,721

10.6

0.5

22.5

3.03

0.02

2nd

122

21,432

13.0

0.5

28.0

4.20

0.03

3rd

94

11,152

13.0

0.5

27.0

4.58

0.04

4th

69

6,041

12.3

0.5

26.5

5.19

0.07

5th

33

1,952

11.2

0.5

23.0

5.80

0.13

1

st

Min: minimum, Max: maximum, SD: standard deviation, SE: standard error.

Genotypes Blood samples were collected aseptically from the jugular vein and chilled immediately on ice. Genomic DNA was isolated from blood using the phenol-chloroform extraction and ethanol precipitation methods (Sambrook 1989). The prepared DNA samples of 96 animals were sent to GeneSeek, Inc. (Lincoln, NE, USA) for genotyping using Axiom Buffalo Genotyping Array 90K according to its standard protocol. The raw intensity genotyping files (CEL format) were extracted and converted to the Ped/Map format using AffyPipe (Nicolazzi et al. 2014). Stringent quality control criteria were applied to exclude markers and individuals with insufficient genotyping quality. Quality control was performed using AffyPipe (Nicolazzi et al. 2014) and PLINK (Purcell et al. 2007). In this respect, SNPs with unknown physical position, low minor allele frequency (MAF<0.05), or missing genotype per SNP (GENO>0.10) were excluded. In addition, markers significantly (P<0.001) deviated from Hardy-Weinberg proportion were eliminated. To reduce the redundancy

6

between SNPs that are in high LD, we excluded one of a pair of SNPs if the LD of r2 > 0.5 within a window of 50 SNPs and moving 5 SNPs per set using PLINK (Purcell et al. 2007).

Statistical analyses Estimation of average yield deviation A yield deviation is a weighted average of the animal's milk yields adjusted for non-genetic factors. In this respect, contemporary animals can be used to increase the accuracy of adjustment for environmental effects (VanRaden & Wiggans 1991). The yield deviation for each animal was estimated using a mixed model procedure implemented in SAS software, version 9.4 (SAS 2014, SAS Institute Inc., Cary, NC, USA), then deviations were averaged for each individual over lactations. Daily milk records within a lactation of a buffalo cow were treated as repeated measurements. The spatial power covariance structure was used as an appropriate fit for the data set. The final model was: Yijklmo= μ + ACi + Lacj + HYSk + bl1 (DIM) + bl2 (exp -0.05*DIM)+ pem + Ɛijklmo where Yijklmo is a daily milk yield; μ is the overall mean of observations; ACi is the fixed effect of age at calving; Lacj is the fixed effect of lactation number; HYSk is the fixed combined effect between herd (h), year (y), and season (s) of calving, where seasons were defined by calendar quarters: January to March, April to June, July to September, and October to December; b l1 and bl2 are two regression coefficient associated with the fixed lactation function, where DIM is days in milk (Wilmink 1987); pem is the random permanent environmental effect, where Legendre polynomials of order four were used (Legendre polynomials were standardized in the period from 5 to 290 days after calving); and Ɛijklmo is the residual term denoting the difference between observed and predicted daily milk yield for a buffalo cow at each DIM.

7

Genome-wide association study (GWAS) A potential population stratification was accounted for by using a multidimensional scaling (MDS) procedure implemented in PLINK (Purcell et al. 2007), where a pairwise population concordance (PPC) test was performed based on an identity-by-state (IBS) similarity matrix. In this respect, LD pruning was applied in the first instance to include only independent SNPs using a threshold value of r2 >0.2 within a window of 50 SNPs and moving 5 SNPs per set, and then IBS was performed using all SNPs remained after the initial pruning (Anderson et al. 2010; Laurie et al. 2010). The GWAS was performed using the linear regression model as implemented in PLINK (Purcell et al. 2007), where the average daily milk deviations were regressed on the number of copies of the alleles using the PLINK–linear option with population stratification as covariates. The results of associations were used to generate Manhattan plots using Haploview, version 4.1 (Barrett et al. 2005). False discovery rate (FDR) implemented in PLINK was used to adjust the threshold below which a P value is considered significant (Benjamini & Hochberg 1995). A genome-wide complex trait analysis (GCTA) software v1.25.3 was used to estimate the heritability of the average daily milk deviation (Yang et al. 2011). Heritability is the proportion of phenotypic variation due to the genetic variance among individuals. To estimate the heritability explained by genome-wide SNPs, a genetic relationship matrix (GRM) was created and fitted subsequently in a mixed linear model through the restricted maximum likelihood (REML) method (Yang et al. 2011).

QTL and candidate genes

8

Since the current array aligned to the bovine genome, a list of previously reported QTL for milk yield

traits

was

obtained

from

animal

QTLdb,

release

30

(Hu

et

al.

2016)

(http://www.animalgenome.org/QTLdb).

In our data set, linkage disequilibrium (LD) among markers was found to be up to 1 Mb (Table 4). Therefore, we used a 1 Mb window around detected genomic regions to search for potential candidate genes. The gene search was performed via the Ensembl data base (UMD3.1, Ensembl data base build 85). SNPs were assigned to genes using Ensembl Perl API tools (http://www.ensembl.org) using a Perl script (http://www.perl.org). Integrative information of annotations for a given gene list was performed by GeneCodis (http://genecodis.cnb.csic.es), which uses information from different bioinformatics tools and functional enrichment analyses (Carmona-Saez et al. 2007; Nogales-Cadenas et al. 2009; Tabas-Madrid et al. 2012).

Results and Discussion Genotypes for genome-wide scan The Axiom Buffalo Genotyping Array used in this study featured 90,000 SNPs across all chromosomes with an average spacing of one SNP every 29.56 kb across all loci (median spacing of 28.4 kb, a minimum distance of 0.008 kb and a maximum distance of 1.07 Mb). In this array, markers were selected from sequencing initiatives of multiple buffalo breeds (Mediterranean, Murrah, Jaffarabadi, and Nili-Ravi). As the reference buffalo genome is not yet available, the sequence reads were aligned to the latest reference assembly of the bovine genome (UMD3.1, University of Maryland bovine genome assembly). Therefore, SNPs are located on 30 chromosome pairs including the X-chromosome instead of the 25 pairs already known for river buffalo (Iamartino et al. 2013). 9

During quality control, we excluded 48 SNPs with unknown physical position and 11,112 SNPs and one individual with low call rate (<90%). Further, 10,317 markers with MAF below 5% were eliminated. 647 markers significantly violated from Hardy-Weinberg proportion (P<0.001), which could be due to genotyping error, were also deleted. LD pruning with a step of five SNPs and r2 threshold of 0.5 was used, which led to the filtering out of 25,607 markers (Table 3). LD pruning is important to ensure independency among tested SNPs when GWAS is performed (Anderson et al. 2010; Laurie et al. 2010). Table 3. Editing process for the SNPs used for GWAS in the Egyptian buffalo. Quality control method

Deleted SNPs

Remaining SNPs

-----

90,000

48

89,952

SNPs with MAF<0.05

10,317

79,635

SNPs with low call rate <90%

11,112

68,523

647

67,876

25,607

42,269

Original SNPs Un-positioned SNPs

SNPs deviated from HWE 2

LD pruning (r >0.5)

MAF: minor allele frequency, HWE: Hardy-Weinberg equilibrium, LD: linkage disequilibrium

A total of 42,269 SNPs (46.97%) and 95 animals passed the quality control with the number of SNPs per chromosome ranging from 1,187 (on chromosomes BTA25) to 4,218 (on chromosome BTA1) as shown in Table 4. This subset of SNPs covered 2,646.90 Mb of the buffalo genome with an average MAF of 0.30 and an average probe spacing of 62.66 kb between adjacent markers (median spacing of 42.6 kb, a minimum distance of 0.09 kb and a maximum distance of 2.35 Mb). The genotyping rate for the individuals in the remaining data set was 99.8%.

10

Table 4. Summary statistics of SNPs used for GWAS in the buffalo cows. Gap size [kb]#* Chr length* Filtered* [Mb] Mean SD Max 2,442 158.13 64.78 57.01 764.85

Number of SNPs

LD between SNPs* 2

Chr 1

Initial 5,404

2

4,673

2,190

136.04

62.15

60.03

1,628.45

0.31

231.33

997.59

3

4,130

1,931

121.20

62.80

57.51

812.17

0.30

224.79

995.61

4

4,070

1,951

120.34

61.71

52.59

841.11

0.30

222.57

963.18

5

4,096

1,862

120.38

64.69

58.82

716.58

0.30

231.73

997.74

6

3,992

1,915

119.22

62.29

54.32

929.84

0.30

225.90

989.45

7

3,826

1,865

112.40

60.30

63.33

1,236.58

0.30

219.01

993.97

8

3,800

1,693

112.83

66.68

66.00

1,145.05

0.31

241.32

998.38

9

3,492

1,763

105.48

59.87

43.13

351.62

0.31

223.59

987.07

10

3,500

1,642

103.73

63.21

52.72

685.71

0.31

230.46

974.99

11

3,581

1,745

107.13

61.43

49.90

564.48

0.30

229.20

970.12

12

3,061

1,349

90.92

67.45

81.26

1,298.12

0.31

226.92

989.12

13

2,822

1,459

83.65

57.37

50.80

754.19

0.31

210.25

997.74

14

2,901

1,409

83.34

59.19

48.69

475.99

0.30

220.22

990.33

15

2,877

1,346

84.90

63.13

64.08

924.26

0.30

229.22

983.36

16

2,779

1,277

80.91

63.41

65.42

893.43

0.30

212.82

996.73

17

2,558

1,267

74.98

59.23

50.39

492.02

0.31

224.20

988.05

18

2,260

1,175

65.55

55.83

52.15

652.94

0.31

193.55

998.31

19

2,167

1,088

63.56

58.47

83.57

1,839.44

0.30

199.20

999.53

20

2,392

1,163

71.78

61.77

50.61

692.39

0.31

232.75

934.27

21

2,402

1,150

71.21

61.97

65.68

971.26

0.31

215.82

944.63

22

2,051

1,087

61.13

56.29

42.70

346.33

0.30

202.08

998.98

23

1,764

859

51.92

60.51

62.05

746.12

0.30

220.86

953.70

24

2,122

1,125

62.16

55.30

39.36

307.18

0.30

202.67

976.75

25

1,507

825

42.59

51.69

36.91

310.35

0.30

188.16

982.65

26

1,756

902

50.97

56.58

44.76

499.30

0.31

208.69

996.42

27

1,516

799

45.05

56.46

59.45

694.16

0.30

190.53

881.25

28

1,586

753

45.84

60.96

64.17

1,043.31

0.31

218.44

992.00

29

1,735

822

50.84

61.92

71.51

1,090.03

0.31

218.54

985.03

30

5,132

1,415

148.72

105.17 190.87

2,346.79

0.31

279.40

999.03

UnPos

48

---

---

---

---

---

---

---

---

Total

90,000

42,269

2,646.90

62.66

67.10

2,346.79

0.31

223.65

999.53

11

Mean r Mean dist [kb] Max dist [kb] 0.31 246.16 995.88

Chr: chromosome, Mb: mega base, kb: kilo base, LD: linkage disequilibrium, SD: standard deviation, Max: maximum, Min: minimum, r2: correlation coefficient between pairs of SNPs, dist: distance between the two SNPs that are in LD, UnPos: un-positioned SNPs. * SNPs after applying filtering options (see materials and methods for further details), #

Distance between adjacent SNPs.

Genome-wide association In this study, we performed GWAS in the way of SNP-by-SNP, through regressing the observations of an average daily milk yield deviations on a SNP alleles. After correction for population sub-structure, the genomic inflation factor (λ) reduced from 1.20 to 1.09, which shows that the MDS based analysis successfully eliminated spurious associations that could have resulted from population sub-structure. In this respect, the model detected several genomic regions with suggestive signals of association on chromosomes BTA1, BTA5, BTA6, and BTA27, where several suggestive SNPs are located within narrow chromosomal regions. As shown in Figure 1 and Table 5, the most significant SNP (Affx-79526274) was located on chromosome BTA27 (P≤ 1.87 x 10-6) at 30.14 Mb. In addition, seven suggestive SNPs were located around that region (between 28.45 and 44.32 Mb). Other convincing associations were located on chromosomes BTA1 (two SNPs between 20.99 and 21.05 Mb), BTA5 (two SNPs between 72.54 and 108.49 Mb), and BTA6 (two SNPs between 79.04 and 91.63 Mb).

Table 5. Top 20 suggestive SNPs associated with average daily milk production in Egyptian buffalo. SNP ID

Chr.

Positions [bp]

MA

MAF

β

P-value

Affx-79601724

1

20,997,981

T

0.3263

-0.728

0.0001666

Affx-79610950

1

21,052,931

G

0.4263

-0.841

0.0002086

12

Affx-79560372

1

88,969,774

C

0.06842

1.675

0.0002301

Affx-79533574

2

91,548,900

G

0.3526

0.718

0.0001266

Affx-79604933

4

91,085,346

C

0.4579

0.823

5.81E-05

Affx-79549038

5

72,540,057

C

0.2947

0.908

3.85E-05

Affx-79548204

5

108,492,873

C

0.1474

1.049

4.39E-05

Affx-79555281

6

79,047,120

A

0.4474

-0.781

0.0001943

Affx-79556673

6

91,635,711

C

0.1368

-0.946

0.0001874

Affx-79580064

7

17,310,633

C

0.3421

0.706

6.67E-05

Affx-79590952

11

102,297,706

G

0.4579

0.678

0.0002655

Affx-79561040

22

46,577,163

C

0.3105

-0.758

0.0001471

Affx-79557143

23

38,192,371

G

0.1684

-0.857

0.0002434

Affx-79587840

26

35,808,365

T

0.3158

-0.711

0.0001498

Affx-79605750

27

28,451,673

G

0.1316

-1.440

0.0001541

Affx-79526274

27

30,143,588

A

0.4158

-0.845

1.87E-06

Affx-79570034

27

36,811,120

G

0.2895

-0.794

8.96E-05

Affx-79541125

27

37,203,845

G

0.4211

0.714

0.0002398

Affx-79595574

27

37,722,254

T

0.2474

-0.980

2.96E-05

Affx-79527896

27

40,550,549

C

0.4105

-0.738

2.09E-05

Chr: chromosome, MA: minor allele, MAF: minor allele frequency, β: change per minor allele. Positions are according to University of Maryland bovine genome assembly Build 3.1 (UMD3.1).

Interestingly, the effect of minor alleles for the top five suggestive SNPs on chromosome BTA27, the two SNPs on chromosome BTA1 and the two SNPs on chromosome BTA6 was negative, ranging from -1.44 to -0.74, while the effect of minor alleles for the two SNPs located on chromosome BTA5 was positive, ranging from 0.91 to 1.05. The total genetic variance explained by all SNPs used for GWAS was only 0.23. The heritability estimate based on genetic relationship was 0.21 with standard error of 0.32, which is relatively large due to the small sample size. Similar values of heritability for milk yield traits were previously estimated in Brazilian, Egyptian, and Italian buffaloes and ranged from 0.13 to 0.25 using classical animal model and pedigree (Rosati & Van Vleck 2002; Aspilcueta-Borquis et al. 2010; El-Bramony 2011; de Camargo et al. 2015). 13

The sample size of 95 genotyped animals used for our GWAS is relatively low when compared to similar studies performed in dairy cattle, which usually contain thousands of individuals. The primary factor that limited our sample size was the difficulty in obtaining extensive and accurate daily milk records, since they are not usually kept in buffalo farms. In addition, the vast majority of Egyptian buffalo are owned by smallholders without systematic recording of any economical traits, which makes it more difficult to collect accurate phenotypes from a large number of animals, as such the ability to identify QTL associated with the phenotype was expected to be low compared to GWAS in dairy cattle. However, the genome-wide analysis revealed potential signals of association between SNP markers and average daily milk yield on several chromosomes as mentioned above. Since pedigree information is not usually recorded in Egypt, the estimation of heritability using genomic information, even with a small sample size, is the best alternative to use as a starting point for improving economically important traits in Egyptian buffalo. Using the same array in Italian (Iamartino et al. 2013b) and Brazilian buffalo (de Camargo et al. 2015), several suggestive genomic regions associated with lactation milk yield have been identified, but none of these loci was detected in our study. Likewise, the GWAS performed in Brazilian buffalo using Illumina BovineHD BeadChip identified different genomic loci associated with 305 days milk yield (Venturini et al. 2014). Also, the GWAS implemented in Chinese buffalo breeds using Illumina Bovine50k BeadChip detected different QTL associated with milk yield either per lactation or per day (Wu et al. 2013). Different genomic loci that have been identified in different buffalo populations but not in this study might have too small effects to be detected in the Egyptian buffalo population, have different LD structure, or have false positive signals. In addition, different traits used for other 14

GWAS (e.g. breeding values and de-regressed breeding values for 270 and 305 days in milk in Italian and Brazilian buffalo, respectively) than that used in the current study (daily milk yield deviation) might also lead to different results. Furthermore, these differences could be due to different genetic composition of the breeds, selection pressure in the population, varying environmental conditions, inbreeding, effective population size, allelic frequencies differences or using SNP chips that are not species specific. On the other hand, compared to GWAS performed in cattle, the suggestive associated region on chromosome 27 coincides with previously reported QTL for milk yield using different SNP chips in American Holstein cattle (Cole et al. 2011), Ireland Holstein cattle (Meredith et al. 2012), and Blonde d'Aquitaine cattle (Michenet et al. 2016). Similarly, the other suggestive regions are overlapped with the previously recognized QTL for milk yield on chromosome 1 in Blonde d'Aquitaine and Limousin cattle (Michenet et al. 2016); on chromosome 5 in American Holstein cattle (Cole et al. 2011), Ireland Holstein cattle (Meredith et al. 2012), and Canadian Holstein cattle (Nayeri et al. 2016); and on chromosome 6 in American Holstein cattle (Cole et al. 2011; Cochran et al. 2013), and Ireland Holstein cattle (Meredith et al. 2012). Identification of the same genomic regions in different species, breeds and populations with different experimental designs and analyses increases the confidence that the linked neighborhood genomic region contains true causative mutations responsible for the trait variation. The convincing associations on chromosome 1, 5, 6, and 27 were located within or close to several candidate genes (Table S1). Identified SNPs on chromosome 5 are located within Glycosyltransferase-like protein (LARGE1) and ELKS/Rab6-interacting/CAST family member 1 (ERC1) genes. Likewise, the SNPs on chromosome 6 are overlapped with Adhesion G ProteinCoupled Receptor L3 (ADGRL3) and Prostate androgen-regulated mucin-like protein 1 (PARM1) 15

genes. The other SNPs on chromosomes 1 and 27 are closed to several genes. The nearest genes for those SNPs are located within a distance ranging from 0.19 to 0.31 Mb (S1 Table). The candidate genes for suggestive genomic regions on chromosomes 1, 5, 6 and 27 were subjected to enrichment analysis using the GeneCodis databases. The analysis ranked immune response at the top of all biological processes associated with those genes (8 genes with P-value ≤ 0.0006 adjusted by FDR; GO:0006955; S2 Table). Interestingly, it was reported that several immune related genes were identified as potential candidates for milk production traits in dairy cattle (Beecher et al. 2010; Verbeke et al. 2014; Yang et al. 2016).

Conclusions This is the first GWAS in Egyptian buffalo. Although a small sample size was used in this study, several suggestive genomic loci associated with daily milk production were detected. These loci coincided with the previously reported QTL in different cattle breeds and supports the importance of these genomic regions for the trait variation. The heritability estimated from genomic information can be used as a best alternative for improving milk production in Egyptian buffalo, since there is a lack of pedigree information. Future work is required on a larger sample size with fine mapping of identified QTL for detecting potential candidate regions.

Acknowledgments

16

This study was financially supported by the Science and Technology Development Fund (STDF), Egypt (Grant number: 2047). The funding agency had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors are declared that there is no competing interests exist.

References Amaral M.E., Grant J.R., Riggs P.K., Stafuzza N.B., Filho E.A., Goldammer T., Weikard R., Brunner R.M., Kochan K.J., Greco A.J., Jeong J., Cai Z., Lin G., Prasad A., Kumar S., Saradhi G.P., Mathew B., Kumar M.A., Miziara M.N., Mariani P., Caetano A.R., Galvao S.R., Tantia M.S., Vijh R.K., Mishra B., Kumar S.T., Pelai V.A., Santana A.M., Fornitano L.C., Jones B.C., Tonhati H., Moore S., Stothard P., Womack J.E., 2008. A first generation whole genome RH map of the river buffalo with comparison to domestic cattle. BMC Genomics 9, 631. Amaral M.E., Owens K.E., Elliott J.S., Fickey C., Schaffer A.A., Agarwala R., Womack J.E., 2007. Construction of a river buffalo (Bubalus bubalis. whole-genome radiation hybrid panel and preliminary RH mapping of chromosomes 3 and 10. Anim. Genet. 38, 311-4. Anderson C.A., Pettersson F.H., Clarke G.M., Cardon L.R., Morris A.P., Zondervan K.T., 2010. Data quality control in genetic case-control association studies. Nat. Protoc. 5, 1564-73.

17

Aspilcueta-Borquis R.R., Araujo Neto F.R., Baldi F., Bignardi A.B., Albuquerque L.G., Tonhati H., 2010.

Genetic parameters for buffalo milk yield and milk quality traits using

Bayesian inference. J. Dairy Sci. 93, 2195-201. Barrett J.C., Fry B., Maller J., Daly M.J., 2005. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263-5. Beecher C., Daly M., Childs S., Berry D.P., Magee D.A., McCarthy T.V., Giblin L., 2010. Polymorphisms in bovine immune genes and their associations with somatic cell count and milk production in dairy cattle. BMC Genet. 11, 99. Benjamini Y., Hochberg Y., 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289-300. Bolormaa S., Hayes B.J., Savin K., Hawken R., Barendse W., Arthur P.F., Herd R.M., Goddard M.E., 2011a. Genome-wide association studies for feedlot and growth traits in cattle. J. Anim. Sci. 89, 1684-97. Bolormaa S., Neto L.R., Zhang Y.D., Bunch R.J., Harrison B.E., Goddard M.E., Barendse W., 2011b. A genome-wide association study of meat and carcass traits in Australian cattle. J. Anim. Sci. 89, 2297-309. Bolormaa S., Pryce J.E., Hayes B.J., Goddard M.E., 2010. Multivariate analysis of a genomewide association study in dairy cattle. J. Dairy Sci. 93, 3818-33. Borquis R.R., Baldi F., de Camargo G.M., Cardoso D.F., Santos D.J., Lugo N.H., Sargolzaei M., Schenkel F.S., Albuquerque L.G., Tonhati H., 2014.

Water buffalo genome

characterization by the Illumina BovineHD BeadChip. Genet. Mol. Res. 13, 4202-15.

18

Bouwman A.C., Bovenhuis H., Visker M.H., van Arendonk J.A., 2011.

Genome-wide

association of milk fatty acids in Dutch dairy cattle. BMC Genet. 12, 43. Carmona-Saez P., Chagoyen M., Tirado F., Carazo J.M., Pascual-Montano A., 2007. GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome Biol. 8, R3. Cochran S.D., Cole J.B., Null D.J., Hansen P.J., 2013.

Discovery of single nucleotide

polymorphisms in candidate genes associated with fertility and production traits in Holstein cattle. BMC Genet. 14, 49. Cockrill W.R., 1981. The water buffalo: a review. Br. Vet. J. 137, 8-16. Cole J.B., Wiggans G.R., Ma L., Sonstegard T.S., Lawlor T.J., Jr., Crooker B.A., Van Tassell C.P., Yang J., Wang S., Matukumalli L.K., Da Y., 2011. Genome-wide association analysis of thirty one production, health, reproduction and body conformation traits in contemporary U.S. Holstein cows. BMC Genomics 12, 408. de Camargo G.M., Aspilcueta-Borquis R.R., Fortes M.R., Porto-Neto R., Cardoso D.F., Santos D.J., Lehnert S.A., Reverter A., Moore S.S., Tonhati H., 2015. Prospecting major genes in dairy buffaloes. BMC Genomics 16, 872. Deng T., Pang C., Lu X., Zhu P., Duan A., Tan Z., Huang J., Li H., Chen M., Liang X., 2016. De Novo Transcriptome Assembly of the Chinese Swamp Buffalo by RNA Sequencing and SSR Marker Discovery. PLoS ONE 11(1): e0147132. doi:10.1371/journal.pone.0147132 Di Meo G.P., Perucatti A., Floriot S., Hayes H., Schibler L., Incarnato D., Di Berardino D., Williams J., Cribiu E., Eggen A., Iannuzzi L., 2008. An extended river buffalo (Bubalus bubalis, 2n = 50. cytogenetic map: assignment of 68 autosomal loci by FISH-mapping

19

and R-banding and comparison with human chromosomes. Chromosome Res. 16, 82737. El-Bramony M.M., 2011. Genetic and phenotypic parameters of milk yield and reproductive performance in the first three lactations of Egyptian buffalo. Egyptian J. Anim. Prod. 48, 1-10. Feugang J.M., Kaya A., Page G.P., Chen L., Mehta T., Hirani K., Nazareth L., Topper E., Gibbs R., Memili E., 2009. Two-stage genome-wide association study identifies integrin beta 5 as having potential role in bull fertility. BMC Genomics 10, 176. Goldammer T., Weikard R., Miziara M.N., Brunner R.M., Agarwala R., Schaffer A.A., Womack J.E., Amaral M.E., 2007. A radiation hybrid map of river buffalo (Bubalus bubalis) chromosome 7 and comparative mapping to the cattle and human genomes. Cytogenet. Genome Res. 119, 235-41. Gordon I., 1996.

Controlled reproduction in cattle and buffaloes. CAB International,

Wallingford. Hayes B.J., Bowman P.J., Chamberlain A.J., Savin K., van Tassell C.P., Sonstegard T.S., Goddard M.E., 2009. A validated genome wide association study to breed cattle adapted to an environment altered by climate change. PLoS One 4, e6676. Hernandez Fernandez M., Vrba E.S., 2005.

A complete estimate of the phylogenetic

relationships in Ruminantia: a dated species-level supertree of the extant ruminants. Biol. Rev. Camb. Philos. Soc. 80, 269-302. Hirschhorn J.N., Daly M.J., 2005. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6, 95-108.

20

Hu Z.L., Park C.A., Reecy J.M., 2016. Developmental progress and current status of the Animal QTLdb. Nucleic Acids Res, 44, D827-33. Huang W., Kirkpatrick B.W., Rosa G.J., Khatib H., 2010. A genome-wide association study using selective DNA pooling identifies candidate markers for fertility in Holstein cattle. Anim. Genet. 41, 570-8. Iamartino D., Williams J.L., Sonstegard T., Reecy J., Van Tassell C., Nicolazzi E.L., Biffani S., Biscarini F., Schroeder S., de Oliveira D.A.A., Coletta A., Garcia J.F., Ali A., Ramunno L., Pasquariello R., Drummond M.G., Bastianetto E., Fritz E., Knoltes J., Consortium t.I.B. 2013. The buffalo genome and the application of genomics in animal management and improvement. Buffalo Bull. 32, 151-8. Ianella P., Venancio L.P., Stafuzza N.B., Miziara M.N., Agarwala R., Schaffer A.A., Riggs P.K., Womack J.E., Amaral M.E. 2008. First radiation hybrid map of the river buffalo X chromosome (BBUX. and comparison with BTAX. Anim. Genet. 39, 196-200. Iso-Touru T., Sahana G., Guldbrandtsen B., Lund M.S., Vilkki J. 2016.

Genome-wide

association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants. BMC Genet. 17, 55. Jiang L., Liu J., Sun D., Ma P., Ding X., Yu Y., Zhang Q. 2010. Genome wide association studies for milk production traits in Chinese Holstein population. PLoS One 5, e13661. Khatun M.R., Arifuzzaman M., Ashraf A. 2009. A Comparative Analysis on Factors Affecting Calf Mortality of Buffalo in a Breeding Farm. Pak J Biol Sci. 12, 1535-8. Laurie C.C., Doheny K.F., Mirel D.B., Pugh E.W., Bierut L.J., Bhangale T., Boehm F., Caporaso N.E., Cornelis M.C., Edenberg H.J., Gabriel S.B., Harris E.L., Hu F.B., Jacobs K.B., Kraft P., Landi M.T., Lumley T., Manolio T.A., McHugh C., Painter I., Paschall J., Rice

21

J.P., Rice K.M., Zheng X., Weir B.S., Investigators G. 2010. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet. Epidemiol. 34, 591-602. Lu D., Sargolzaei M., Kelly M., Vander Voort G., Wang Z., Mandell I., Moore S., Plastow G., Miller S.P. 2013. Genome-wide association analyses for carcass quality in crossbred beef cattle. BMC Genet. 14, 80. Mai M.D., Sahana G., Christiansen F.B., Guldbrandtsen B. 2010. A genome-wide association study for milk production traits in Danish Jersey cattle using a 50K single nucleotide polymorphism chip. J. Anim. Sci. 88, 3522-8. Meredith B.K., Kearney F.J., Finlay E.K., Bradley D.G., Fahey A.G., Berry D.P., Lynn D.J. 2012. Genome-wide associations for milk production and somatic cell score in HolsteinFriesian cattle in Ireland. BMC Genet. 13, 21. Michelizzi V.N., Dodson M.V., Pan Z., Amaral M.E., Michal J.J., McLean D.J., Womack J.E., Jiang Z. 2010a. Water buffalo genome science comes of age. Int. J. Biol. Sci. 6, 333-49. Michelizzi V.N., Wu X., Dodson M.V., Michal J.J., Zambrano-Varon J., McLean D.J., Jiang Z. 2010b.

A global view of 54,001 single nucleotide polymorphisms (SNPs.

on the

Illumina BovineSNP50 BeadChip and their transferability to water buffalo. Int. J. Biol. Sci. 7, 18-27. Michenet A., Barbat M., Saintilan R., Venot E., Phocas F. 2016. Detection of quantitative trait loci for maternal traits using high-density genotypes of Blonde d'Aquitaine beef cattle. BMC Genet. 17, 88.

22

Miziara M.N., Goldammer T., Stafuzza N.B., Ianella P., Agarwala R., Schaffer A.A., Elliott J.S., Riggs P.K., Womack J.E., Amaral M.E. 2007. A radiation hybrid map of river buffalo (Bubalus bubalis. chromosome 1 (BBU1) . Cytogenet Genome Res. 119, 100-4. Mudadu M.A., Porto-Neto L.R., Mokry F.B., Tizioto P.C., Oliveira P.S., Tullio R.R., Nassu R.T., Niciura S.C., Tholon P., Alencar M.M., Higa R.H., Rosa A.N., Feijo G.L., Ferraz A.L., Silva L.O., Medeiros S.R., Lanna D.P., Nascimento M.L., Chaves A.S., Souza A.R., Packer I.U., Torres R.A., Jr., Siqueira F., Mourao G.B., Coutinho L.L., Reverter A., Regitano L.C. 2016. Genomic structure and marker-derived gene networks for growth and meat quality traits of Brazilian Nelore beef cattle. BMC Genomics 17, 235. Nayeri S., Sargolzaei M., Abo-Ismail M.K., May N., Miller S.P., Schenkel F., Moore S.S., Stothard P. 2016. Genome-wide association for milk production and female fertility traits in Canadian dairy Holstein cattle. BMC Genet 17, 75. Nicolazzi E.L., Iamartino D., Williams J.L. 2014.

AffyPipe: an open-source pipeline for

Affymetrix Axiom genotyping workflow. Bioinformatics 30, 3118-9. Nogales-Cadenas R., Carmona-Saez P., Vazquez M., Vicente C., Yang X., Tirado F., Carazo J.M., Pascual-Montano A. 2009. GeneCodis: interpreting gene lists through enrichment analysis and integration of diverse biological information. Nucleic Acids Res 37, W31722. Pryce J.E., Bolormaa S., Chamberlain A.J., Bowman P.J., Savin K., Goddard M.E., Hayes B.J. 2010. A validated genome-wide association study in 2 dairy cattle breeds for milk production and fertility traits using variable length haplotypes. J. Dairy Sci. 93, 3331-45.

23

Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559-75. Rosati A., Van Vleck L.D. 2002. Estimation of genetic parameters for milk, fat, protein and mozzarella cheese production for the Italian river buffalo Bubalus bubalis population. Livest. Prod. Sci. 74, 185-90. SADS 2009. Sustainable Agricultural Development Strategy towards 2030. p. 194. Agricultural Research and Development Council, Ministry of Agriculture and Land Reclamation, Arab Republic of Egypt. Sahana G., Guldbrandtsen B., Bendixen C., Lund M.S. 2010.

Genome-wide association

mapping for female fertility traits in Danish and Swedish Holstein cattle. Anim. Genet. 41, 579-88. Sahana G., Guldbrandtsen B., Lund M.S. 2011. Genome-wide association study for calving traits in Danish and Swedish Holstein cattle. J. Dairy Sci. 94, 479-86. Sambrook J. 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press. Schopen G.C., Visker M.H., Koks P.D., Mullaart E., van Arendonk J.A., Bovenhuis H. 2011. Whole-genome association study for milk protein composition in dairy cattle. J. Dairy Sci. 94, 3148-58. Snelling W.M., Allan M.F., Keele J.W., Kuehn L.A., McDaneld T., Smith T.P., Sonstegard T.S., Thallman R.M., Bennett G.L. 2010.

Genome-wide association study of growth in

crossbred beef cattle. J. Anim. Sci. 88, 837-48.

24

Stafuzza N.B., Abbassi H., Grant J.R., Rodrigues-Filho E.A., Ianella P., Kadri S.M., Amarante M.V., Stohard P., Womack J.E., de Leon F.A., Amaral M.E. 2009. Comparative RH maps of the river buffalo and bovine Y chromosomes. Cytogenet. Genome Res. 126, 1328. Stafuzza N.B., Ianella P., Miziara M.N., Agarwala R., Schaffer A.A., Riggs P.K., Womack J.E., Amaral M.E. 2007. Preliminary radiation hybrid map for river buffalo chromosome 6 and comparison to bovine chromosome 3. Anim. Genet. 38, 406-9. Tabas-Madrid D., Nogales-Cadenas R., Pascual-Montano A. 2012.

GeneCodis3: a non-

redundant and modular enrichment analysis tool for functional genomics. Nucleic Acids Res. 40, W478-83. Tantia M.S., Vijh R.K., Bhasin V., Sikka P., Vij P.K., Kataria R.S., Mishrs B.P., Yadav S.P., K. P.A., Sethi R.K., Joshi B.K., Gupta S.C., Pathak K.M.L. 2011. Whole-genome sequence assembly of the water buffalo (Bubalus bubalis) . Indian J. Anim Sci. 81. VanRaden P.M., Wiggans G.R. 1991. Derivation, calculation, and use of national animal model information. J. Dairy Sci. 74, 2737-46. Venturini G.C., Cardoso D.F., Baldi F., Freitas A.C., Aspilcueta-Borquis R.R., Santos D.J., Camargo G.M., Stafuzza N.B., Albuquerque L.G., Tonhati H. 2014.

Association

between single-nucleotide polymorphisms and milk production traits in buffalo. Genet. Mol. Res. 13, 10256-68. Verbeke J., Van Poucke M., Peelman L., Piepers S., De Vliegher S. 2014. Associations between CXCR1 polymorphisms and pathogen-specific incidence rate of clinical mastitis, test-day somatic cell count, and test-day milk yield. J. Dairy Sci. 97, 7927-39.

25

Wilmink J.B.M. 1987. Efficiency of selection for different cumulative milk, fat and protein yields in first lactation. Livest. Prod. Sci. 17, 211-24. Wu J.J., Song L.J., Wu F.J., Liang X.W., Yang B.Z., Wathes D.C., Pollott G.E., Cheng Z., Shi de S., Liu Q.Y., Yang L.G., Zhang S.J. 2013.

Investigation of transferability of

BovineSNP50 BeadChip from cattle to water buffalo for genome wide association study. Mol. Biol. Rep. 40, 743-50. Yang J., Jiang J., Liu X., Wang H., Guo G., Zhang Q., Jiang L. 2016. Differential expression of genes in milk of dairy cattle during lactation. Anim. Genet. 47, 174-80. Yang J., Lee S.H., Goddard M.E., Visscher P.M. 2011. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76-82. Zimin A.M.G., Ferre F., Biagini T., Shroeder S. 2013. The Buffalo Reference Genome Assembly. San Diego: XXI Plant and Animal Genome.

26

Fig 1. Genome-wide association analysis for deviation of average daily milk production in Egyptian buffalo.

Supporting Information S1 Table. Candidate genes located in the window of 1 Mb around SNPs associated with average daily milk yield deviation in Egyptian buffalo. Source: Ensembl 85, Species: Bos_taurus, Assembly: UMD3.1, Convincing SNPs (several SNPs in a narrow chromosomal region) are highlighted in green. S2 Table. Major Pathways associated with identified candidate genes for average daily milk deviations in Egyptian buffalo. NGR: Number of annotated genes in the reference list, TNGR: Total number of genes in the reference list, NG: Number of annotated genes in the input list, TNG: Total number of genes in the input list, Hyp: Hypergeometric pValue, Hyp*: Corrected hypergeometric pValue after FDR.

27

Highlights 

Axiom Buffalo Genotyping Array 90K was applied to Egyptian buffalo for the first time.



Four significant regions were identified associated with milk yield.



The most significant associated region was on chromosome BTA27.



The associated SNPs were located within or close to several candidate genes.



Immune response was ranked at the top of candidate genes function.

28