Whole genome sequencing and bioinformatics analysis of two Egyptian genomes


Gene 668 (2018) 129–134

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Research paper




Mahmoud ElHefnawi a,⁎,1, Sungwon Jeon b,c,1, Youngjune Bhak b,c, Asmaa ElFiky a,d, Ahmed Horaiz a, JeHoon Jun e,f, Hyunho Kim f, Jong Bhak b,c,e,f,⁎⁎

a Biomedical Informatics and Chemo-Informatics Group, Centre of Excellence for Advanced Sciences (CEAS), and Informatics and Systems Department, National Research Centre, Cairo 12622, Egypt
b Korean Genomics Industrialization and Commercialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
c Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
d Environmental and Occupational Medicine Department, National Research Centre, Cairo 12622, Egypt
e Personal Genomics Institute, Genome Research Foundation, Cheongju 28160, Republic of Korea
f Geromics, Ulsan 44919, Republic of Korea

ARTICLE INFO

Keywords: Whole-genome sequencing; Egyptian; Variants; Human migration; Bioinformatics

ABSTRACT

We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~30× depth. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million. The mitochondrial haplogroups of the two individuals were identified as H7b1 and L2a1c, respectively. We also identified the Y haplogroups of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 carried a mutation in the mitochondrial NADH dehydrogenase gene ND4 (m.11778G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with increased levels of cholesterol and triglycerides, and are possibly related to obesity in Egyptians. Comparison of these genomes with African and Western-Asian genomes can provide insights into Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and the functional classification of variants, as well as human migration and evolution across Africa and Western Asia.

1. Introduction

A human genome holds an extensive amount of information on human evolution, diversity, health, physiology, and medicine (Lander et al., 2001). Whole genome sequencing (WGS) data allow the most comprehensive genetic analyses, such as association studies of common and rare disorders. Genomes and their variation can also be used to estimate risk factors for common diseases (Bick & Dimmock, 2011; Thompson et al., 2012). Currently, massively parallel next-generation sequencing (NGS) is the most widely used approach for analyzing whole human genomes. Programs that map short reads to a genome and call the resulting variants are being rapidly improved (Lupski et al., 2010). In addition, the cost of analyzing a genome has become very low, and WGS is increasingly used to detect rare, disease-causing variants by examining the genomes of affected people (Lupski et al., 2010; Sobreira et al., 2010; Roach et al., 2010). For example, it can be useful for screening women who carry BRCA1 and BRCA2 mutations to assess the risk of breast and ovarian cancers (Campeau et al., 2008).

The Egyptian population is diverse due to its position between Africa and Asia. Egypt lies along both banks of the Nile, the longest river in Africa, and has hosted various populations throughout history. Ancient Egyptian traditions, such as mummification, play an important role in preserving genomes and enabling subsequent analysis of DNA variants (Paabo, 1985). Egyptian DNA has been studied for a long time (Hawass et al., 2010). Research on the DNA of modern North Africans revealed that their gene frequencies are intermediate between those of southern Europe, the Near East, and sub-Saharan Africa (Cavalli-Sforza et al., 1994). However, the frequency distribution of the non-recombining portion of the Y chromosome in the modern Egyptian population is extremely similar to that of the middle northern African population, implying a more extensive portion of Eurasian genetic components (Cavalli-Sforza et al., 1994; Bosch et al., 1997; Manni et al., 2002; Arredi et al., 2004; Luis et al., 2004). Pagani et al. showed, after analyzing 225 Egyptians and Ethiopians, that the most likely route out of Africa was the northern one, through Sinai to Eurasia (Pagani et al., 2015). Khairat, Ball et al. found that one of the mummified Egyptian individuals belonged to the L2 mtDNA haplogroup (Li & Durbin, 2009), a maternal clade that is thought to have originated in Western Asia.

Here, we report the analyses of two male Egyptian whole genomes at high sequencing depth. We conducted a systematic genomic analysis, including the analysis of functional and deleterious mutations, and we provide the genomic structure and a phylogenetic tree in comparison with populations in Africa and the Middle East.

Abbreviations: EGP, Egyptian person; HQ, High quality; LD, Linkage disequilibrium; LHON, Leber's hereditary optic neuropathy; mtDNA, Mitochondrial DNA; NGS, Next generation sequencing; rCRS, revised Cambridge reference sequence; SNP, Single nucleotide polymorphism; WGS, Whole genome sequencing; Y-STR, short tandem repeat (STR) on the Y-chromosome
⁎ Corresponding author.
⁎⁎ Correspondence to: J. Bhak, Korean Genomics Industrialization and Commercialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea.
E-mail addresses: [email protected] (M. ElHefnawi), [email protected] (J. Bhak).
1 Equal contributors.
https://doi.org/10.1016/j.gene.2018.05.048
Received 28 March 2018; Accepted 13 May 2018
0378-1119/ © 2018 Published by Elsevier B.V.

2. Material and method

2.1. DNA extraction and ethical approval

Blood samples were collected from two healthy people whose parents originate from the Delta (North of Egypt) and Saied (South of Egypt), respectively. This study was approved by the Institutional Review Board at Genome Research Foundation (IRB-REC-2011-10-003), and written consent was signed by the participants.

2.2. Sample preparation and whole genome sequencing

Genomic DNA was extracted from blood with the GeneJET Blood genomic DNA purification kit (Thermo Scientific, USA), according to the manufacturer's protocol. A library of 400–500 bp insert size was created. The genomic DNA was sheared using a Covaris S series instrument (Covaris, MA, USA). The sheared DNA was end-repaired, A-tailed, and ligated to paired-end adapters, according to the manufacturer's protocol (TruSeq DNA Sample Prep Kit v2, Illumina, San Diego, CA, USA). Adapter-ligated fragments were then size-selected on a 2% agarose gel, with the 520–620 bp band being excised. Gel extraction and column purification were performed with the MinElute Gel Extraction Kit (Qiagen), following the manufacturer's protocol. The ligated DNA fragments that contained adapter sequences were enriched by PCR using adapter-specific primers. Library quality and concentration were determined using the Agilent 2100 Bioanalyzer. The libraries were quantified using a KAPA library quantification kit (Kapa Biosystems, MA, USA), as indicated by Illumina's library quantification protocol. Based on the qPCR quantification, the libraries were normalized to 2 nM and then denatured using 0.1 N NaOH. Cluster amplification of denatured templates was carried out in flow cells, according to the manufacturer's protocol (Illumina). Flow cells were paired-end sequenced (2 × 100 bp) on an Illumina HiSeq 2000 machine. The base-calling pipeline (Sequencing Control Software (SCS), Illumina) was used to process the raw fluorescent images and the called sequences.

The rest of our analysis started from the FASTQ files generated by Illumina's downstream analysis software suite, CASAVA. The raw data can be accessed at NCBI SRA under accession numbers SRR5738871 and SRR5738872.

2.3. Alignment of reads to reference and variants detection

The NGSQC toolkit v2.3.3 (Patel & Jain, 2012) was applied to filter low-quality reads with the ‘-l 70 -s 20’ options (cutoff read length for HQ = 70%, cutoff quality score = 20). Subsequently, we aligned the filtered reads onto the hg19 human reference using BWA-MEM 0.7.8 (Li & Durbin, 2009) with the default options. SAM files were then converted to BAM files using SAMtools 0.1.19 (Li et al., 2009a). To remove PCR-duplicated reads, the MarkDuplicates subroutine in Picard v1.9.2 (http://broadinstitute.github.io/picard/) was used. We also ran IndelRealigner and base quality recalibration using GATK v2.3.9 (McKenna et al., 2010a) in order to increase the accuracy of variant calling. The variants were called by GATK UnifiedGenotyper with the ‘--heterozygosity 0.0010 -dcov 200 -stand_call_conf 30.0 -stand_emit_conf 30.0’ options.

2.4. Annotation and functional analysis of variants

We annotated the type and genomic regions of variants using snpEff v4.3i (Cingolani et al., 2012). To predict mutations that possibly cause function-altering amino acid changes, PROVEAN was used (Choi & Chan, 2015). These protein-damaging mutations were further annotated with the OMIM (McKusick, 2007) and ClinVar (Landrum et al., 2016) databases.

2.5. Construction of phylogenetic tree and ADMIXTURE analysis

We first merged the Affymetrix human origin single nucleotide polymorphism (SNP) panel (HOSP) data (Lazaridis et al., 2014) with the two Egyptian genomes using PLINK 1.90 (Purcell et al., 2007), generating 591,356 autosomal SNPs. We pruned the panel by linkage disequilibrium (LD). The final dataset included 289,287 SNPs. We ran ADMIXTURE 1.3.0 (Alexander et al., 2009) with default cross-validation, with the number of ancestral populations (K) ranging from 2 to 10. To construct a phylogenetic tree across populations from Africa and the Middle East, we calculated pairwise nucleotide distances with the same SNP panel used for ADMIXTURE. We then constructed a neighbor-joining tree from the pairwise pi distances (Nei & Li, 1979).

2.6. Identification of mitochondrial DNA and Y chromosome haplotype

Before identifying the mitochondrial haplogroup, we extracted the short sequencing reads that were mapped to chrM of the hg19 reference using in-house scripts. We then mapped these reads to the rCRS mtDNA reference using BWA-MEM (Li & Durbin, 2009) and generated consensus mtDNA sequences for the samples using SAMtools (Li et al., 2009a). The mitochondrial haplogroup of each sample was identified by MitoTool (Fan & Yao, 2013). The Y-chromosome haplogroup was identified using the Nevgen predictor.

3. Results and discussion

3.1. The donors

EGP1's clinical history shows bilateral visual loss, worse on the left, consistent with Leber's hereditary optic neuropathy (LHON), a mitochondrial disorder that specifically affects the optic nerves. A visual field examination (Table 1), an MRI scan, and genetic testing all confirmed the diagnosis of LHON. Additionally, there was a family history of cardiovascular and pulmonary diseases in EGP2.

Table 1 Clinical data of ocular examination of EGP1.
                   Right eye       Left eye
Vision (cc)        20/400          2/300
Color              45/12           2.5/12
Refraction         −4.5 sphere     −4.5 sphere
Pupils             Sluggishly reactive; there is probably a left relative afferent pupillary defect
Ocular motility    Ductions and versions: full; there is no internuclear ophthalmoplegia


3.2. Genome sequencing and variants identification

Genomic DNA was sequenced on the HiSeq 2000 platform (Illumina, San Diego, CA, USA), generating 1,072,420,548 and 1,048,815,766 reads for EGP1 and EGP2, respectively. We filtered out low-quality reads using the NGSQC toolkit (Patel & Jain, 2012). A total of 94.62 Gb and 92.68 Gb of reads, respectively, remained and were aligned to the human reference hg19 using BWA-MEM (Li & Durbin, 2009). In total, 885,930,166 (99.96%) and 867,073,418 (99.95%) reads were mapped to the reference for EGP1 and EGP2, respectively, resulting in over 29× depth (Table 2). We identified a total of 4,754,625 variants in EGP1 and 4,864,310 variants in EGP2. Of these, 4,555,748 (95.82%, EGP1) and 4,655,201 (95.70%, EGP2) variants were already reported in the dbSNP database, which was used as the reference for known variants. We also identified 4,188,874 SNPs in EGP1 and 4,289,172 SNPs in EGP2. There were 1,486,562 homozygous SNPs and 2,702,312 heterozygous SNPs in EGP1, while EGP2 had 1,473,578 homozygous SNPs and 2,815,594 heterozygous SNPs, as shown in Table 3. We annotated the effects of variants using snpEff (Cingolani et al., 2012), identifying a total of 112,637 and 114,964 effects on the transcripts for EGP1 and EGP2, respectively (S1 Table).

Table 2 Summary of whole genome sequencing and mapping.
Sample ID                                        EGP1             EGP2
Number of generated reads                        1,072,420,548    1,048,815,766
Number of remained reads after filtering         946,144,204      926,701,466
Number of reads after removing the duplicates    886,322,035      867,470,926
Number of mapped reads                           885,930,166      867,073,418
Mapping rate                                     99.96%           99.95%
Average depth (×)                                30.4×            29.7×

Table 3 Statistics of variants from each sample.
Sample ID                         EGP1                  EGP2
Number of homozygous SNPs         1,486,562             1,473,578
Number of heterozygous SNPs       2,702,312             2,815,594
Total number of SNPs              4,188,874             4,289,172
Number of homozygous INDELs       223,160               219,551
Number of heterozygous INDELs     342,591               355,587
Total number of INDELs            565,751               575,138
Number of known variants          4,555,748 (95.82%)    4,655,201 (95.70%)
Number of total variants          4,754,625             4,864,310

3.3. Functional classification and clinical relevance of variants

To identify the possible functional effects of non-synonymous SNPs (nsSNPs) in the Egyptian genomes, we used a computational prediction method, the PROVEAN tool (Choi & Chan, 2015). We identified 2134 and 2093 non-mitochondrial nsSNPs in EGP1 and EGP2, respectively, that were classified as functionally damaging to proteins (S2 Table). We annotated these nsSNPs using ClinVar to assess their clinical relevance. This analysis revealed several clinically relevant nsSNPs in the two Egyptian subjects. In EGP1, Leu412Phe in the TLR3 gene (rs3775291) was detected, which is associated with susceptibility to herpes simplex encephalitis. Further, Pro187Ser in NQO1 (rs1800566) is associated with an increased risk of hematotoxicity after exposure to benzene and with susceptibility to different types of cancer (Smith, 1999; Traver et al., 1997). EGP2 had an SNP (rs1801394), known as A66G or Ile22Met, in the methionine synthase reductase (MTRR) gene. This substitution has been associated with an increased risk of the neural tube defect spina bifida, and low serum vitamin B12 amplified this effect (Wilson et al., 1999).

We found that 998 function-altering nsSNPs were shared by the two Egyptians and that 16 of these nsSNPs were associated with several phenotypes (S3 Table). These include His166Arg in FCGR2A (rs181274), associated with susceptibility to malaria and lupus nephritis, and Met580Thr in ATP6V0A4 (rs380715), associated with renal tubular acidosis (Smith et al., 2000). Arg197Gln in NAT2 (rs1799930), which is associated with slow acetylation (Vatsis et al., 1991), was also found. Variation in NAT2 is known to be linked to acetylation capacity and shows genetic diversity across populations (Magalon et al., 2008). About 80% of Egyptians are slow acetylators (Ma et al., 2002). We also found Ala222Val in MTHFR (rs1801133, C > T), which is associated with hyperhomocysteinemia. Africans commonly had the CC genotype, whereas the TT genotype was prevalent in southern Europe (Wilcken et al., 2003), and a Pathan genome carried the same mutation (Ilyas et al., 2015). In addition, this mutation is associated with the risk of colorectal cancer in Egyptians (El Awady et al., 2009).

It has been reported that Egyptians have an above-average BMI and that a major proportion are obese (Ng et al., 2014). Thr55Ala in FABP2 (rs1799883) and Asn985Tyr in RP1 (rs2293869) were identified; both have been linked to increased concentrations of cholesterol and triglycerides, and probably to obesity (Georgopoulos et al., 2000; Fujita et al., 2003; Subramanian & Chait, 2012). Ser192Tyr in TYR (rs1042602, C > A) is related to skin, hair, and eye color and to the absence of freckling (Sulem et al., 2007; Stokowski et al., 2007). The A allele has been subject to positive selection in European populations (Sulem et al., 2007). In addition, two SNPs were annotated as risk factors: Pro124Leu in PLAU (rs2227564), associated with susceptibility to Alzheimer disease, and Asn248Ser in TLR1 (rs4833095), associated with leprosy 5 (Schuring et al., 2009).

We also identified nine amino acid changes in mitochondrial proteins from the two Egyptian samples, as shown in Table 4. Two function-altering amino acid changes were predicted in EGP1 and one in EGP2. In particular, EGP1 had Arg340His in ND4 (m.11778G > A), which is linked to Leber's hereditary optic neuropathy (LHON) (Singh et al., 1989).

Table 4 Results of damaging prediction of amino acid change in mitochondrial gene.
Sample    PROTEIN_ID         POS    REF    ALT    Score    Prediction (cutoff = −2.5)    Gene symbol
EGP1      ENSP00000354554    194    T      A      0.31     Neutral                       MT-CYB
EGP1      ENSP00000354632    112    T      A      −3.97    Deleterious                   MT-ATP6
EGP1      ENSP00000354961    340    R      H      −4.74    Deleterious                   MT-ND4
EGP2      ENSP00000354499    254    I      V      −0.44    Neutral                       MT-CO1
EGP2      ENSP00000354554    7      T      I      −2.35    Neutral                       MT-CYB
EGP2      ENSP00000354554    194    T      A      0.31     Neutral                       MT-CYB
EGP2      ENSP00000354632    59     T      A      −0.94    Neutral                       MT-ATP6
EGP2      ENSP00000354632    112    T      A      −3.97    Deleterious                   MT-ATP6
EGP2      ENSP00000355206    114    T      A      −1.39    Neutral                       MT-ND3
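The figures reported in Tables 2 to 4 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: it recomputes the totals, the novel-variant counts, and the percentages from the table values, and re-applies the PROVEAN cutoff of −2.5 to the Table 4 scores. Three points are assumptions rather than statements from the text: that the mapping rate is the number of mapped reads divided by the number of reads after duplicate removal, that a score at or below the cutoff counts as deleterious, and that the first three Table 4 rows belong to EGP1 (inferred from the variant positions listed in Table 5).

```python
# Counts copied from Tables 2 and 3 of the paper.
COUNTS = {
    "EGP1": {"dedup_reads": 886_322_035, "mapped_reads": 885_930_166,
             "hom_snps": 1_486_562, "het_snps": 2_702_312,
             "hom_indels": 223_160, "het_indels": 342_591,
             "known": 4_555_748, "total": 4_754_625},
    "EGP2": {"dedup_reads": 867_470_926, "mapped_reads": 867_073_418,
             "hom_snps": 1_473_578, "het_snps": 2_815_594,
             "hom_indels": 219_551, "het_indels": 355_587,
             "known": 4_655_201, "total": 4_864_310},
}

# (sample, gene, amino acid change, PROVEAN score) copied from Table 4.
# Assumption: the row-to-sample split is inferred from Table 5.
TABLE4 = [
    ("EGP1", "MT-CYB", "T194A", 0.31),
    ("EGP1", "MT-ATP6", "T112A", -3.97),
    ("EGP1", "MT-ND4", "R340H", -4.74),
    ("EGP2", "MT-CO1", "I254V", -0.44),
    ("EGP2", "MT-CYB", "T7I", -2.35),
    ("EGP2", "MT-CYB", "T194A", 0.31),
    ("EGP2", "MT-ATP6", "T59A", -0.94),
    ("EGP2", "MT-ATP6", "T112A", -3.97),
    ("EGP2", "MT-ND3", "T114A", -1.39),
]

PROVEAN_CUTOFF = -2.5  # cutoff given in the Table 4 header


def summarize(c):
    """Recompute derived quantities from the raw counts of one sample."""
    snps = c["hom_snps"] + c["het_snps"]
    indels = c["hom_indels"] + c["het_indels"]
    return {
        "snps": snps,
        "indels": indels,
        "variants": snps + indels,
        "novel": c["total"] - c["known"],
        "known_pct": round(100 * c["known"] / c["total"], 2),
        # Assumed denominator: reads remaining after duplicate removal.
        "mapping_pct": round(100 * c["mapped_reads"] / c["dedup_reads"], 2),
    }


def classify(score, cutoff=PROVEAN_CUTOFF):
    """Label a score as deleterious when it is at or below the cutoff."""
    return "Deleterious" if score <= cutoff else "Neutral"


def deleterious_per_sample(rows):
    """Count deleterious predictions for each sample."""
    counts = {}
    for sample, _gene, _change, score in rows:
        counts.setdefault(sample, 0)
        if classify(score) == "Deleterious":
            counts[sample] += 1
    return counts


if __name__ == "__main__":
    for name, c in COUNTS.items():
        print(name, summarize(c))
    print(deleterious_per_sample(TABLE4))
```

Under these assumptions the recomputed novel-variant counts agree with the 198,877 and 209,109 given in the abstract, and the per-sample counts of deleterious changes agree with the text of Section 3.3.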


Fig. 1. ADMIXTURE results at K = 7 of Egyptian individuals together with several populations in Africa and Middle-East. (Figure not reproduced. The horizontal axis lists the populations from Hadza to Nogai, with EGP1 and EGP2 placed next to the HOSP Egyptian sample; the vertical axis runs from 0.0 to 1.0.)

3.4. ADMIXTURE and phylogenetic analysis

To assess how representative our two Egyptian genomes are, we performed an ADMIXTURE analysis together with African and Middle Eastern samples in the human origin SNP panel (HOSP) (Lazaridis et al., 2014), from K = 2 to K = 10 (S1 Figure). Our Egyptian genomes had ADMIXTURE configurations similar to those of the Egyptian sample in HOSP. We also compared the Egyptians with other populations in Africa and the Middle East. The two Egyptian genomes had a mixed configuration of Middle Eastern and African (North and West) components at K = 7 (Fig. 1). Interestingly, the Middle Eastern component was the major one in Egyptians at K = 2 to K = 4, suggesting an origin outside Africa. We also calculated pairwise nucleotide diversity based on the SNPs in HOSP and constructed a phylogenetic tree of the two Egyptians together with the populations from Africa and the Middle East that we used in the ADMIXTURE analysis (Fig. 2). Both EGP1 and EGP2 were placed as a sister group of the Middle Eastern populations. This supports the previous suggestion that Egyptians have a large Eurasian component (Cavalli-Sforza et al., 1994; Bosch et al., 1997; Manni et al., 2002; Arredi et al., 2004; Luis et al., 2004). Fig. 2 shows (A) the phylogenetic tree based on pairwise nucleotide distance and (B) the geographical location of the samples used in the ADMIXTURE analysis and phylogenetic tree construction, including the two Egyptians from this study.
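To make the idea of a pairwise distance over a shared SNP panel concrete, the following minimal sketch builds a symmetric distance matrix from genotype vectors. It is a simplified stand-in, not the authors' pi distance (Nei & Li, 1979): genotypes are assumed to be coded as 0, 1, or 2 copies of the alternative allele, missing calls as None, and the distance is taken as the mean per-site allele difference. The sample names and genotypes are hypothetical toy data.

```python
from itertools import combinations


def pairwise_distance(g1, g2):
    """Mean per-site allele difference between two diploid genotype vectors.

    Each genotype is 0, 1, or 2 (copies of the alternative allele) or None
    when the call is missing. Sites missing in either sample are skipped.
    """
    total = 0.0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        total += abs(a - b) / 2.0  # scale so one site contributes 0 to 1
        compared += 1
    if compared == 0:
        raise ValueError("no shared non-missing sites")
    return total / compared


def distance_matrix(genotypes):
    """Return a dict mapping (name_i, name_j) to distance, both orders."""
    names = sorted(genotypes)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = pairwise_distance(genotypes[x], genotypes[y])
        dist[(x, y)] = d
        dist[(y, x)] = d
    return dist


# Hypothetical toy genotypes over five SNPs.
toy = {
    "sample_A": [0, 1, 2, 2, None],
    "sample_B": [0, 1, 2, 1, 0],
    "sample_C": [2, 1, 0, 0, 2],
}
matrix = distance_matrix(toy)

if __name__ == "__main__":
    for pair in sorted(matrix):
        print(pair, round(matrix[pair], 3))
```

A matrix of this shape is the usual input to a neighbor-joining implementation; tree building itself is left out of the sketch.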

3.5. Mitochondria and Y-chromosome analysis Sequencing Mitochondrial DNA (mtDNA) is widely used to comprehend the maternal migration and heredity history of human populaces (Cann et al., 1987). We generated the consensus sequences of mtDNA of the two Egyptian genomes using SAMtools (Li et al., 2009a). We identified the mitochondrial haplogroup of each sample using MitoTool (Fan & Yao, 2013) acquiring the result that EGP1 had an H7b1 while EGP2 had an L2a1c (Table 5). H7 was mostly found in the near East, Caucasus, Iran, Central Asia and Balto-Slavic countries (Costa et al., 2013) and H2, H5, H7, H13, and H20 all of which have been found in early Neolithic populations from Europe and are additionally found all through the Middle East at present time (van Oven & Kayser, 2009). Haplogroup L2 is presented in around 33% of Africans and in their current progenies (Gonder et al., 2007). L2a1c regularly shares the mutation 16,189 with L2a1b, however, has its own particular markers at 3010 and 6663. Alteration in 16,192 is likewise basic in L2a1b and L2a1c; it shows up in Southeastern Africa and also East Africa. This recommends some broadening of this clade in situ. Positions T16209C C16301T C16354T over L2a1 characterize a small sub-clade, named

A

B AF

AF

F _A

AF

A_

Ba

ub

a_

AF



AF

_ an





AF

Me

AF

nd

H

ka_ AF za_ AF Kiku yu_A F Sand awe_ AF

A e_

F

a_ imb Dam

Had

Latitude

uS

Ts

AF

a_

ua

ka_

we _

AF wa n

F

nt

_A

F

Kh



Es

F



30

r Yo

F





Sh

Mbu

ti_A

F

●● ● ● ●

Bia

_AF

gadi_

Haiom

Kgala

Tshwa_AF

i_AF

Gana_AF

Khoman

F

F t_A

n_A

_AF

Hoa

Xuun

i_A

F _A

_A

Gu

r th

th

_A

o_

hy a_

ro

Din

or

AF

th

Lu Lu

No

_N

ou

Na

_S

t_

an

an

es _W

ho

a Ta

Ju _

ho

Eas

_ Taa

a_ Ta

Ju _

Nama_AF

60

AF

ara





0 ●

_AF

F bo_A Wam F nka_A Mande _AF Gambian

Masai_AF



BantuKenya_A



● ●● ● ●



−30

F

Oromo_AF Nor th_Ossetian

Somali_AF

Nogai_N

w_AF Ethiopian_Je i_AF Saharaw

Balka

F n_A

P2 EG

F

_A

P1

Druze_WA

_WA

_WA Lebanese

n_W A

ania

Syrian

_W A

_W A

inA

nia n

Pale sti

Jord

B_ W A

i_W A

dou

uin

ud Sa

ite en

Cypriot_WA

Turkish_Jew

Be

EG

W A

en

_J ew _

m Ye

A _W

SC n_ ia as A kh _W Ab SC ish n_ rk Tu nia A e m _W w Ar SC Je w_ qi_ _Je Ira ian F _A org ew Ge n_J isia Tun _AF Jew an_ Liby _AF _Jew ccan Moro _WA

y Eg

an pti

do

ia

Ye m

is Tun

F n_A

Be

A

ia lger

r_NC

Ady gei_ NC Kum yk_ NC Ch ec he n_ Le NC zg in_ Ge NC or gia n_ SC

F

ite_A

ab Moz

_NC 0

C ● Abkhasian_SC ● Adygei_NC



EGP1 ● EGP2

Algerian_AF

Egyptian_AF

Armenian_SC

Esan_AF

Balkar_NC

Ethiopian_Jew_AF

BantuKenya_AF

Gambian_AF

● BantuSA_AF

Samples ● BedouinA_WA

Gana_AF

90

Jordanian_WA

Syrian_WA

Mbuti_AF

● Ju_hoan_North_AF ● Mende_AF

Taa_East_AF

● Ju_hoan_South_AF

Moroccan_Jew_AF

Taa_North_AF

Kgalagadi_AF

Mozabite_AF

Taa_West_AF

Khomani_AF

Nama_AF

Tshwa_AF

● Khwe_AF

Kikuyu_AF

● Georgian_Jew_SC ● Kumyk_NC

● Naro_AF

Tswana_AF Tunisian_AF

Nogai_NC

North_Ossetian_NC ● Tunisian_Jew_AF ● Oromo_AF

BedouinB_WA

Georgian_SC

Lebanese_WA

Biaka_AF

Gui_AF

Lezgin_NC

Palestinian_WA

Chechen_NC

Hadza_AF

Libyan_Jew_AF

Saharawi_AF

Turkish_Jew_WA ● Turkish_WA

Wambo_AF

Cypriot_WA

Haiom_AF

Luhya_AF

● Sandawe_AF



● Damara_AF

Himba_AF

Luo_AF

● Saudi_WA

● Yemen_WA

Dinka_AF

Hoan_AF

Mandenka_AF

Druze_WA

Iraqi_Jew_WA

Masai_AF

Fig. 2. Phylogenetic relationship based on pairwise nucleotide distance. 132

45

Longitude

Shua_AF ● Somali_AF



Xuun_AF

Yemenite_Jew_WA Yoruba_AF

Gene 668 (2018) 129–134

M. ElHefnawi et al.

Methodology: AE, JJ. Project Administration: JB. Resources: ME, AE, AH. Software: SJ, YB, AE, JJ, HK. Supervision: JB. Validation: SJ, YB. Visualization: SJ. Writing – Original Draft Preparation: ME, SJ. Writing – Review & Editing: SJ, YB, AE, AH, JB.

Table 5 Mitochondrial haplotype and variants of two Egyptian genomes. Sample ID

mtDNA Haplotype

Variants

EGP1

H7b1

EGP2

L2a1c

263, 309 + C, 315 + C, 750, 1438, 4769, 4793, 5348, 8860, 11,778,12,351, 15,326, 16183C, 16,189, 16,193 + C, 16519 73, 143, 146, 151, 152, 195, 263, 309 + C, 315 + C, 748, 750, 769, 1018,1438, 2416, 2706, 2789, 3010, 3594, 4104, 4769, 6663, 7028, 7175, 7256, 7274, 7521, 7771, 8206, 8701, 8860, 9221, 9540, 9950, 10,115, 10,398, 10,873, 11,719, 11,914, 11,944, 12,693, 12,705, 13,590, 13,650, 13,803, 14,566, 14,766, 15,208, 15,301, 15,326, 15,784, 16183C, 16,189, 16,193 + C, 16,223, 16,278, 16,294, 16,309, 16,390, 16,519

Acknowledgments This work is done under the umbrella of the pan Asian population genomics initiative (PAPGI) consortium (http://papgi.org). It aims to study variations in pan Asian populations, to understand how these variations affect these populations and their evolution. This work was supported by the Genome Korea Project in Ulsan (800 genome sequencing) Research Fund (1.180017.01) of UNIST (Ulsan National Institute of Science & Technology) and the Genome Korea Project in Ulsan (200 genome sequencing) Research Fund (1.180024.01) of UNIST (Ulsan National Institute of Science & Technology). J. B., S. J. were supported by Human Resource Development (HRD) for Personal Genome Informatics research fund (NRF-2017M3C9A6047623).

L2a1c by (Salas et al., 2002; Kivisild et al., 2004), which for the most part show up in East Africa (e.g. Sudan, Nubia, Ethiopia) and West Africa (e.g. Turkana, Kanuri). In the Chad Basin, four diverse L2a1c composes, a couple mutational strides from the East and West African haplotypes, were distinguished (Kivisild et al., 2004; Cerny et al., 2007). For the Y-chromosome haplogroup, a short tandem repeat analysis was carried out using Y-STR toolkit (Li et al., 2009b; McKenna et al., 2010b) that identified STR markers according to FTDNA 111 Y-STR markers and the resulting markers were used in Y-haplogroup predictor (Y-DNA haplogroup Predictor – NEVGEN). EGP1 had an R1b haplotype that believed to be the most frequently occurring in Western Europe (Myres et al., 2011), as well as exposed at moderate frequencies all over Eastern Europe, Western Asia, and some regions of North Africa and Central Asia (Herrera et al., 2012). EGP2 had J1a2a1a2 > P58 > FGC11. Haplotype J1 is for the most frequently found in Caucasia, Mesopotamia, Levant and Arabian Peninsula. J1-FGC11 is the most widely recognized heredity among Bedouins and other Semitic individuals, for example, Jewish and Arabs. These may be correlated with the animal husbandry culture as opposed to cultivation since it is predominantly observed in infertile soils, for example, in Arabia, Yemen, Ethiopia and so on (Arredi et al., 2004; Abu-Amero et al., 2009).

References Abu-Amero, K.K., Hellani, A., Gonzalez, A.M., Larruga, J.M., Cabrera, V.M., et al., 2009. Saudi Arabian Y-chromosome diversity and its relationship with nearby regions. BMC Genet. 10, 59. Alexander, D.H., Novembre, J., Lange, K., 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664. Arredi, B., Poloni, E.S., Paracchini, S., Zerjal, T., Fathallah, D.M., et al., 2004. A predominantly neolithic origin for Y-chromosomal DNA variation in North Africa. Am. J. Hum. Genet. 75, 338–345. Bick, D., Dimmock, D., 2011. Whole exome and whole genome sequencing. Curr. Opin. Pediatr. 23, 594–600. Bosch, E., Calafell, F., Perez-Lezaun, A., Comas, D., Mateu, E., et al., 1997. Population history of North Africa: evidence from classical genetic markers. Hum. Biol. 69, 295–311. Campeau, P.M., Foulkes, W.D., Tischkowitz, M.D., 2008. Hereditary breast cancer: new genetic developments, new therapeutic avenues. Hum. Genet. 124, 31–42. Cann, R.L., Stoneking, M., Wilson, A.C., 1987. Mitochondrial DNA and human evolution. Nature 325, 31–36. Cavalli-Sforza, L.L., Menozzi, P., Piazza, A., 1994. The History and Geography of Human Genes. xi, 541 Princeton University Press, Princeton, N.J (518 pp). Cerny, V., Salas, A., Hajek, M., Zaloudkova, M., Brdicka, R., 2007. A bidirectional corridor in the Sahel-Sudan belt and the distinctive features of the Chad Basin populations: a history revealed by the mitochondrial DNA genome. Ann. Hum. Genet. 71, 433–452. Choi, Y., Chan, A.P., 2015. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747. Cingolani, P., Platts, A., Wang le, L., Coon, M., Nguyen, T., et al., 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (Austin). 
Costa, M.D., Pereira, J.B., Pala, M., Fernandes, V., Olivieri, A., et al., 2013. A substantial prehistoric European ancestry amongst Ashkenazi maternal lineages. Nat. Commun. 4, 2543. El Awady, M.K., Karim, A.M., Hanna, L.S., El Husseiny, L.A., El Sahar, M., et al., 2009. Methylenetetrahydrofolate reductase gene polymorphisms and the risk of colorectal carcinoma in a sample of Egyptian individuals. Cancer Biomark. 5, 233–240. Fan, L., Yao, Y.G., 2013. An update to MitoTool: using a new scoring system for faster mtDNA haplogroup determination. Mitochondrion 13, 360–363. Fujita, Y., Ezura, Y., Emi, M., Ono, S., Takada, D., et al., 2003. Hypertriglyceridemia associated with amino acid variation Asn985Tyr of the RP1 gene. J. Hum. Genet. 48, 305–308. Georgopoulos, A., Aras, O., Tsai, M.Y., 2000. Codon-54 polymorphism of the fatty acidbinding protein 2 gene is associated with elevation of fasting and postprandial triglyceride in type 2 diabetes. J. Clin. Endocrinol. Metab. 85, 3155–3160. Gonder, M.K., Mortensen, H.M., Reed, F.A., de Sousa, A., Tishkoff, S.A., 2007. WholemtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol 24, 757–768. Hawass, Z., Gad, Y.Z., Ismail, S., Khairat, R., Fathalla, D., et al., 2010. Ancestry and pathology in king Tutankhamun's family. JAMA 303, 638–647. Herrera, K.J., Lowery, R.K., Hadden, L., Calderon, S., Chiou, C., et al., 2012. Neolithic patrilineal signals indicate that the Armenian plateau was repopulated by agriculturalists. Eur. J. Hum. Genet. 20, 313–320. Ilyas, M., Kim, J.S., Cooper, J., Shin, Y.A., Kim, H.M., et al., 2015. Whole genome sequencing of an ethnic Pathan (Pakhtun) from the north-west of Pakistan. BMC

4. Conclusion We present the whole genomes of two Egyptian individuals from the Delta (North of Egypt, EGP1) and Saied (South of Egypt, EGP2), respectively. Our analysis provides resourceful data and information of the Egyptian genome heterogeneity and functional characterization of some variants. We also provide phylogenetic information of two Egyptians compared with several populations from Africa and MiddleEast, suggesting their genetic history is mixed between middle easterners and north and east Africans. This may shed some light on early human migration and current population diversity. A larger scale study of Egyptian genomes is needed in the future to map their complete genetic make-up in a much finer scale and frequencies. Supplementary data to this article can be found online at https:// doi.org/10.1016/j.gene.2018.05.048. Author contributions Conceptualization: ME, JB. Data Curation: JJ, HK. Formal analysis: SJ, YB. Funding Acquisition: JB. Investigation: SJ, YB, AE. 133

Gene 668 (2018) 129–134

M. ElHefnawi et al.

Paabo, S., 1985. Molecular cloning of ancient Egyptian mummy DNA. Nature 314, 644–645. Pagani, L., Schiffels, S., Gurdasani, D., Danecek, P., Scally, A., et al., 2015. Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am. J. Hum. Genet. 96, 986–991. Patel, R.K., Jain, M., 2012. NGS QC toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., et al., 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., et al., 2010. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639. Salas, A., Richards, M., De la Fe, T., Lareu, M.V., Sobrino, B., et al., 2002. The making of the African mtDNA landscape. Am. J. Hum. Genet. 71, 1082–1111. Schuring, R.P., Hamann, L., Faber, W.R., Pahan, D., Richardus, J.H., et al., 2009. Polymorphism N248S in the human toll-like receptor 1 gene is related to leprosy and leprosy reactions. J. Infect. Dis. 199, 1816–1819. Singh, G., Lott, M.T., Wallace, D.C., 1989. A mitochondrial DNA mutation as a cause of Leber's hereditary optic neuropathy. N. Engl. J. Med. 320, 1300–1305. Smith, M.T., 1999. Benzene, NQO1, and genetic susceptibility to cancer. Proc. Natl. Acad. Sci. 96, 7624–7626. Smith, A.N., Skaug, J., Choate, K.A., Nayir, A., Bakkaloglu, A., et al., 2000. Mutations in ATP6N1B, encoding a new kidney vacuolar proton pump 116-kD subunit, cause recessive distal renal tubular acidosis with preserved hearing. Nat. Genet. 26, 71–75. Sobreira, N.L., Cirulli, E.T., Avramopoulos, D., Wohler, E., Oswald, G.L., et al., 2010. Whole-genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genet. 6, e1000991. 
Stokowski, R.P., Pant, P.V., Dadd, T., Fereday, A., Hinds, D.A., et al., 2007. A genomewide association study of skin pigmentation in a South Asian population. Am. J. Hum. Genet. 81, 1119–1132. Subramanian, S., Chait, A., 2012. Hypertriglyceridemia secondary to obesity and diabetes. Biochim. Biophys. Acta 1821, 819–825. Sulem, P., Gudbjartsson, D.F., Stacey, S.N., Helgason, A., Rafnar, T., et al., 2007. Genetic determinants of hair, eye and skin pigmentation in Europeans. Nat. Genet. 39, 1443–1452. Thompson, R., Drew, C.J., Thomas, R.H., 2012. Next generation sequencing in the clinical domain: clinical advantages, practical, and ethical challenges. Adv. Protein Chem. Struct. Biol. 89, 27–63. Traver, R., Siegel, D., Beall, H., Phillips, R.M., Gibson, N., et al., 1997. Characterization of a polymorphism in NAD (P) H: quinone oxidoreductase (DT-diaphorase). Br. J. Cancer 75, 69–75. Vatsis, K.P., Martell, K.J., Weber, W.W., 1991. Diverse point mutations in the human gene for polymorphic N-acetyltransferase. Proc. Natl. Acad. Sci. U. S. A. 88, 6333–6337. Wilcken, B., Bamforth, F., Li, Z., Zhu, H., Ritvanen, A., et al., 2003. Geographical and ethnic variation of the 677C > T allele of 5,10 methylenetetrahydrofolate reductase (MTHFR): findings from over 7000 newborns from 16 areas world wide. J. Med. Genet. 40, 619–625. Wilson, A., Platt, R., Wu, Q., Leclerc, D., Christensen, B., et al., 1999. A common variant in methionine synthase reductase combined with low cobalamin (vitamin B 12) increases risk for spina bifida. Mol. Genet. Metab. 67, 317–323.

Genomics 16, 172. Kivisild, T., Reidla, M., Metspalu, E., Rosa, A., Brehm, A., et al., 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am. J. Hum. Genet. 75, 752–770. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921. Landrum, M.J., Lee, J.M., Benson, M., Brown, G., Chao, C., et al., 2016. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–868. Lazaridis, I., Patterson, N., Mittnik, A., Renaud, G., Mallick, S., et al., 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413. Li, H., Durbin, R., 2009. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics 25, 1754–1760. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., et al., 2009a. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., et al., 2009b. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079. Luis, J.R., Rowold, D.J., Regueiro, M., Caeiro, B., Cinnioglu, C., et al., 2004. The Levant versus the horn of Africa: evidence for bidirectional corridors of human migrations. Am. J. Hum. Genet. 74, 532–544. Lupski, J.R., Reid, J.G., Gonzaga-Jauregui, C., Rio Deiros, D., Chen, D.C., et al., 2010. Whole-genome sequencing in a patient with Charcot-Marie-tooth neuropathy. N. Engl. J. Med. 362, 1181–1191. Ma, M.K., Woo, M.H., McLeod, H.L., 2002. Genetic basis of drug metabolism. Am. J. Health Syst. Pharm. 59, 2061–2069. Magalon, H., Patin, E., Austerlitz, F., Hegay, T., Aldashev, A., et al., 2008. Population genetic diversity of the NAT2 gene supports a role of acetylation in human adaptation to farming in Central Asia. Eur. J. Hum. Genet. 16, 243–251. 
Manni, F., Leonardi, P., Barakat, A., Rouba, H., Heyer, E., et al., 2002. Y-chromosome analysis in Egypt suggests a genetic regional continuity in Northeastern Africa. Hum. Biol. 74, 645–658. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., et al., 2010a. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., et al., 2010b. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. McKusick, V.A., 2007. Mendelian inheritance in man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604. Myres, N.M., Rootsi, S., Lin, A.A., Jarve, M., King, R.J., et al., 2011. A major Y-chromosome haplogroup R1b Holocene era founder effect in central and Western Europe. Eur. J. Hum. Genet. 19, 95–101. Nei, M., Li, W.H., 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. U. S. A. 76, 5269–5273. Ng, M., Fleming, T., Robinson, M., Thomson, B., Graetz, N., et al., 2014. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980-2013: a systematic analysis for the global burden of disease study 2013. Lancet 384, 766–781. van Oven, M., Kayser, M., 2009. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–394.
