Gene 668 (2018) 129–134
Research paper
Whole genome sequencing and bioinformatics analysis of two Egyptian genomes
Mahmoud ElHefnawi a,⁎,1, Sungwon Jeon b,c,1, Youngjune Bhak b,c, Asmaa ElFiky a,d, Ahmed Horaiz a, JeHoon Jun e,f, Hyunho Kim f, Jong Bhak b,c,e,f,⁎⁎
a Biomedical Informatics and Chemo-Informatics Group, Centre of Excellence for Advanced Sciences (CEAS), and Informatics and Systems Department, National Research Centre, Cairo 12622, Egypt
b Korean Genomics Industrialization and Commercialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
c Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
d Environmental and Occupational Medicine Department, National Research Centre, Cairo 12622, Egypt
e Personal Genomics Institute, Genome Research Foundation, Cheongju 28160, Republic of Korea
f Geromics, Ulsan 44919, Republic of Korea
ARTICLE INFO
Keywords: Whole-genome sequencing; Egyptian; Variants; Human migration; Bioinformatics
ABSTRACT
We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~30× depth. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million. The mitochondrial haplogroups of the two individuals were identified as H7b1 and L2a1c, respectively. We also identified the Y haplogroups of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 carried a mutation in the mitochondrial NADH dehydrogenase subunit 4 gene, ND4 (m.11778G > A), that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with increased levels of cholesterol and triglycerides, and are possibly related to obesity in Egyptians. Comparison of these genomes with African and Western-Asian genomes can provide insights into Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and the functional classification of variants, as well as human migration and evolution across Africa and Western Asia.
1. Introduction
A human genome holds an extensive amount of data on human evolution, diversity, health, physiology, and medicine (Lander et al., 2001). Whole genome sequencing (WGS) data support the most comprehensive genetic analyses for purposes such as association studies of common and rare disorders. Genomes and their variation information can also be used effectively for estimating risk factors of common diseases (Bick & Dimmock, 2011; Thompson et al., 2012). Currently, massively parallel next-generation sequencing (NGS) is the most widely used method for analyzing whole human genomes. Programs that map short reads to a genome and call the resulting variants are being rapidly improved and upgraded (Lupski et al., 2010). In addition, the cost of analyzing a genome has become very low, and WGS is increasingly used to detect rare, disease-causing variants by scrutinizing the genomes of affected people (Lupski et al., 2010; Sobreira et al., 2010; Roach et al., 2010). For example, it can be useful for screening women who carry BRCA1 and BRCA2 mutations to assess the risk of breast and ovarian cancers (Campeau et al., 2008).
The Egyptian population is diverse due to its position between Africa and Asia. Egypt extends along both banks of the Nile, the longest river in Africa, and has hosted various populations throughout history. Ancient Egyptian traditions, such as mummification, play an important role in preserving genomes for subsequent analysis of DNA variants (Paabo, 1985). Egyptian DNA has been studied for a long time
Abbreviations: EGP, Egyptian person; HQ, High quality; LD, Linkage disequilibrium; LHON, Leber's hereditary optic neuropathy; mtDNA, Mitochondrial DNA; NGS, Next generation sequencing; rCRS, revised Cambridge reference sequence; SNP, Single nucleotide polymorphism; WGS, Whole genome sequencing; Y-STR, short tandem repeat (STR) on the Y-chromosome
⁎ Corresponding author.
⁎⁎ Correspondence to: J. Bhak, Korean Genomics Industrialization and Commercialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea.
E-mail addresses: [email protected] (M. ElHefnawi), [email protected] (J. Bhak).
1 Equal contributors.
https://doi.org/10.1016/j.gene.2018.05.048
Received 28 March 2018; Accepted 13 May 2018
0378-1119/ © 2018 Published by Elsevier B.V.
(Hawass et al., 2010). Research on the DNA of modern North Africans revealed that their gene frequencies are intermediate among those of southern Europe, the Near East, and Sub-Saharan Africa (Cavalli-Sforza et al., 1994). However, the frequency distribution of the non-recombining portion of the Y chromosome in the modern Egyptian population is extremely similar to those of the middle northern African population, implying a larger share of Eurasian genetic components (Cavalli-Sforza et al., 1994; Bosch et al., 1997; Manni et al., 2002; Arredi et al., 2004; Luis et al., 2004). After analyzing 225 Egyptians and Ethiopians, Pagani et al. showed that the likely route out of Africa is the northern one, through Sinai to Eurasia (Pagani et al., 2015). Khairat, Ball et al. found that one of the mummified Egyptian people belonged to the L2 mtDNA haplogroup (Li & Durbin, 2009), a maternal clade that is accepted to have originated in Western Asia.
Here, we report the analyses of two male Egyptian whole genomes at high sequencing depth. We conducted a systematic genomic analysis, including the analysis of functional and deleterious mutations, and we provide the genomic structure and a phylogenetic tree in comparison with populations in Africa and the Middle East.
2. Material and method

2.1. DNA extraction and ethical approval
Blood samples were taken from two healthy people whose parents originate from the Delta (North of Egypt) and Saied (South of Egypt), respectively. This study was approved by the Institutional Review Board at the Genome Research Foundation (IRB-REC-2011-10-003), and written consent was signed by the participants.

2.2. Sample preparation and whole genome sequencing
Genomic DNA was extracted from blood with the GeneJET Blood genomic DNA purification kit (Thermo Scientific, USA), according to the manufacturer's protocol. A library of 400–500 bp insert size was created. The genomic DNA was sheared using a Covaris S series instrument (Covaris, MA, USA). The sheared DNA was end-repaired, A-tailed, and ligated to paired-end adapters, according to the manufacturer's protocol (TruSeq DNA Sample Prep Kit v2, Illumina, San Diego, CA, USA). Adapter-ligated fragments were then size selected on a 2% agarose gel, with the 520–620 bp band being excised. Gel extraction and column purification were performed using the MinElute Gel Extraction Kit (Qiagen), following the manufacturer's protocol. Ligated DNA fragments containing adapter sequences were enriched by PCR using adapter-specific primers. Library quality and concentration were assessed using the Agilent 2100 Bioanalyzer. The libraries were quantified using a KAPA library quantification kit (Kapa Biosystems, MA, USA), as indicated by Illumina's library quantification protocol. Based on the qPCR quantification, the libraries were normalized to 2 nM and then denatured using 0.1 N NaOH. Cluster amplification of denatured templates was completed in flow cells, according to the manufacturer's protocol (Illumina). Flow cells were paired-end sequenced (2 × 100 bp) on an Illumina HiSeq 2000 machine. The base-calling pipeline (Sequencing Control Software (SCS), Illumina) was applied to process the raw fluorescent images and the called sequences.
The rest of our analysis started from the FASTQ files generated by Illumina's downstream analysis software suite, CASAVA. The raw data can be accessed at NCBI SRA under accession numbers SRR5738871 and SRR5738872.

2.3. Alignment of reads to reference and variants detection
The NGSQC toolkit v2.3.3 (Patel & Jain, 2012) was applied to filter low-quality reads with the '-l 70 -s 20' options (cutoff read length for HQ = 70%, cutoff quality score = 20). Subsequently, we aligned the filtered reads onto the hg19 human reference using BWA-MEM 0.7.8 (Li & Durbin, 2009) with default options. SAM files were then converted to BAM files using Samtools 0.1.19 (Li et al., 2009a). To remove PCR-duplicated reads, the MarkDuplicates subroutine in Picard v1.9.2 (http://broadinstitute.github.io/picard/) was used. We also ran IndelRealigner and base recalibration using GATK v2.3.9 (McKenna et al., 2010a) in order to increase the accuracy of variant calling. The variants were called by GATK UnifiedGenotyper with the '--heterozygosity 0.0010 -dcov 200 -stand_call_conf 30.0 -stand_emit_conf 30.0' options.

2.4. Annotation and functional analysis of variants
We annotated the type and genomic regions of variants using snpEff v4.3i (Cingolani et al., 2012). PROVEAN was used to predict mutations that possibly cause function-altering amino acid changes (Choi & Chan, 2015). These protein-damaging mutations were further annotated with the OMIM (McKusick, 2007) and ClinVar (Landrum et al., 2016) databases.

2.5. Construction of phylogenetic tree and ADMIXTURE analysis
We first merged the Affymetrix Human Origins single nucleotide polymorphism (SNP) panel (HOSP) data (Lazaridis et al., 2014) with the two Egyptian genomes using PLINK 1.90 (Purcell et al., 2007), generating 591,356 autosomal SNPs. We pruned the panel for linkage disequilibrium (LD). The final dataset included 289,287 SNPs. We ran ADMIXTURE 1.3.0 (Alexander et al., 2009) with default cross-validation, varying the number of ancestral populations (K) from 2 to 10. To construct a phylogenetic tree across populations from Africa and the Middle East, we calculated pairwise nucleotide distances with the same SNP panel used for ADMIXTURE. We then constructed a neighbor-joining tree with the pairwise pi distance (Nei & Li, 1979).

2.6. Identification of mitochondrial DNA and Y chromosome haplotype
Before identifying the mitochondrial haplogroup, we extracted the short sequencing reads that were mapped to chrM of the hg19 reference using in-house scripts. We then mapped the reads to the rCRS mtDNA reference using BWA-MEM (Li & Durbin, 2009) and generated consensus mtDNA sequences for the samples using samtools (Li et al., 2009a). The mitochondrial haplogroup of each sample was identified by MitoTool (Fan & Yao, 2013). The Y chromosome haplogroup was identified using the Nevgen predictor.

3. Results and discussion

3.1. The donors
EGP1's clinical history shows bilateral visual loss, worse on the left, consistent with Leber's hereditary optic neuropathy (LHON), a mitochondrial disorder that specifically affects the optic nerves. A visual field examination (Table 1), MRI scan, and genetic testing all confirmed the diagnosis of LHON. Additionally, there was a family history of cardiovascular and pulmonary diseases in EGP2.

Table 1
Clinical data of ocular examination of EGP1.

                  Right eye      Left eye
Vision (cc)       20/400         2/300
Color             45/12          2.5/12
Refraction        −4.5 sphere    −4.5 sphere
Pupils            Sluggishly reactive; there is probably a left relative afferent pupillary defect
Ocular motility   Ductions and versions: full; there is no internuclear ophthalmoplegia
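The read-filtering rule quoted in Section 2.3 (cutoff read length for HQ = 70%, cutoff quality score = 20) can be illustrated with a minimal sketch. This is not the NGSQC toolkit's own code: the function name, the example quality strings, and the Phred+33 encoding are assumptions made for illustration only.

```python
def is_high_quality(quality_string, min_hq_percent=70.0, min_phred=20, phred_offset=33):
    """Return True if at least min_hq_percent of bases have Phred >= min_phred.

    Assumes Phred+33 encoded quality strings (an assumption, not stated in the text).
    """
    if not quality_string:
        return False
    hq_bases = sum(1 for ch in quality_string if ord(ch) - phred_offset >= min_phred)
    return 100.0 * hq_bases / len(quality_string) >= min_hq_percent


# Hypothetical quality strings: 'I' encodes Phred 40 and '#' encodes Phred 2.
good_read = "I" * 8 + "#" * 2  # 80% of bases at or above Q20
poor_read = "I" * 6 + "#" * 4  # 60% of bases at or above Q20

print(is_high_quality(good_read), is_high_quality(poor_read))
```

Under these assumptions the first read passes the 70% threshold and the second does not; for paired-end data a real filter would also need a rule for handling the mate of a rejected read.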
3.2. Genome sequencing and variants identification
Genomic DNA was sequenced on the HiSeq 2000 platform (Illumina, San Diego, CA, USA), generating 1,072,420,548 and 1,048,815,766 reads for EGP1 and EGP2, respectively. We filtered out low-quality reads using the NGSQC toolkit (Patel & Jain, 2012). A total of 94.62 Gb and 92.68 Gb of reads remained for EGP1 and EGP2, respectively, and were aligned to the human reference hg19 using BWA-MEM (Li & Durbin, 2009). In total, 885,930,166 (99.96%) and 867,073,418 (99.95%) reads were mapped to the reference for EGP1 and EGP2, respectively, resulting in over 29× depth (Table 2). We identified a total of 4,754,625 variants in EGP1 and 4,864,310 variants in EGP2. Of these, 4,555,748 (95.82%, EGP1) and 4,655,201 (95.70%, EGP2) variants were already reported in the dbSNP database, which we used as the reference for known variants. We also identified 4,188,874 SNPs and 4,289,172 SNPs in EGP1 and EGP2, respectively. There were 1,486,562 homozygous SNPs and 2,702,312 heterozygous SNPs in EGP1, while EGP2 had 1,473,578 homozygous SNPs and 2,815,594 heterozygous SNPs, as shown in Table 3. We annotated the effects of variants using snpEff (Cingolani et al., 2012), identifying a total of 112,637 effects and 114,964 effects on the transcripts for EGP1 and EGP2, respectively (S1 Table).

Table 2
Summary of whole genome sequencing and mapping.

Sample ID                                       EGP1            EGP2
Number of generated reads                       1,072,420,548   1,048,815,766
Number of reads remaining after filtering       946,144,204     926,701,466
Number of reads after removing the duplicates   886,322,035     867,470,926
Number of mapped reads                          885,930,166     867,073,418
Mapping rate                                    99.96%          99.95%
Average depth (×)                               30.4×           29.7×

Table 3
Statistics of variants from each sample.

Sample ID                        EGP1                 EGP2
Number of homozygous SNPs        1,486,562            1,473,578
Number of heterozygous SNPs      2,702,312            2,815,594
Total number of SNPs             4,188,874            4,289,172
Number of homozygous INDELs      223,160              219,551
Number of heterozygous INDELs    342,591              355,587
Total number of INDELs           565,751              575,138
Number of known variants         4,555,748 (95.82%)   4,655,201 (95.70%)
Number of total variants         4,754,625            4,864,310

3.3. Functional classification and clinical relevance of variants
To identify possible functional effects of non-synonymous SNPs (nsSNPs) in the Egyptian genomes, we used the computational prediction tool PROVEAN (Choi & Chan, 2015). We identified 2134 and 2093 non-mitochondrial nsSNPs in EGP1 and EGP2, respectively, that were classified as functionally damaging to proteins (S2 Table). We annotated these nsSNPs using ClinVar to find their clinical relevance. This analysis revealed several clinically annotated nsSNPs in the two Egyptian subjects. In EGP1, we detected Leu412Phe in the TLR3 gene (rs3775291), which is associated with susceptibility to herpes simplex encephalitis. Further, Pro187Ser in NQO1 (rs1800566) has been associated with an increased risk of hematotoxicity after exposure to benzene and with susceptibility to different types of cancer (Smith, 1999; Traver et al., 1997). EGP2 had an SNP (rs1801394) known as A66G or Ile22Met in the methionine synthase reductase (MTRR) gene. This substitution has been associated with an increased risk of the neural tube defect spina bifida, and low serum vitamin B12 amplified this effect (Wilson et al., 1999).
We found that 998 function-altering nsSNPs were shared by the two Egyptians, and 16 of these nsSNPs were associated with several phenotypes (S3 Table). These include His166Arg in FCGR2A (rs181274), associated with susceptibility to malaria and lupus nephritis, and Met580Thr in ATP6V0A4 (rs380715), associated with renal tubular acidosis (Smith et al., 2000). Arg197Gln in NAT2 (rs1799930), which is associated with slow acetylation (Vatsis et al., 1991), was also found. Variation in NAT2 is known to be linked to acetylation capacity and shows genetic diversity across populations (Magalon et al., 2008). About 80% of Egyptians are slow acetylators (Ma et al., 2002). We also found Ala222Val in MTHFR (rs1801133, C > T), which is associated with hyperhomocysteinemia. Africans commonly had the CC genotype, while the TT genotype was more prevalent toward southern Europe (Wilcken et al., 2003); the same mutation was reported in a Pathan genome (Ilyas et al., 2015). In addition, this mutation is associated with the risk of colorectal cancer in Egyptians (El Awady et al., 2009).
It has been reported that Egyptians have an above-average BMI and that a major proportion are obese (Ng et al., 2014). Thr55Ala in FABP2 (rs1799883) and Asn985Tyr in RP1 (rs2293869) were identified, and both have been linked to increased concentrations of cholesterol and triglycerides, and probably to obesity (Georgopoulos et al., 2000; Fujita et al., 2003; Subramanian & Chait, 2012). Ser192Tyr in TYR (rs1042602, C > A) has been related to skin, hair, and eye color and to the absence of freckling (Sulem et al., 2007; Stokowski et al., 2007). The A allele has been subject to positive selection in European populations (Sulem et al., 2007). In addition, two SNPs were annotated as risk factors: Pro124Leu in PLAU (rs2227564), associated with susceptibility to Alzheimer disease, and Asn248Ser in TLR1 (rs4833095), associated with leprosy 5 (Schuring et al., 2009).
We also identified nine amino acid changes in mitochondrial peptides from the two Egyptian samples, as seen in Table 4. One and two function-altering amino acid changes were predicted in EGP2 and EGP1, respectively. Notably, EGP1 had Arg340His in ND4 (m.11778G > A), which is linked to Leber's hereditary optic neuropathy (LHON) (Singh et al., 1989).

Table 4
Results of damaging prediction of amino acid changes in mitochondrial genes.

Sample   PROTEIN_ID        POS   REF   ALT   Score   Prediction (cutoff = −2.5)   Gene symbol
EGP1     ENSP00000354554   194   T     A     0.31    Neutral                      MT-CYB
EGP1     ENSP00000354632   112   T     A     −3.97   Deleterious                  MT-ATP6
EGP1     ENSP00000354961   340   R     H     −4.74   Deleterious                  MT-ND4
EGP2     ENSP00000354499   254   I     V     −0.44   Neutral                      MT-CO1
EGP2     ENSP00000354554   7     T     I     −2.35   Neutral                      MT-CYB
EGP2     ENSP00000354554   194   T     A     0.31    Neutral                      MT-CYB
EGP2     ENSP00000354632   59    T     A     −0.94   Neutral                      MT-ATP6
EGP2     ENSP00000354632   112   T     A     −3.97   Deleterious                  MT-ATP6
EGP2     ENSP00000355206   114   T     A     −1.39   Neutral                      MT-ND3
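The prediction column of Table 4 follows from the stated cutoff of −2.5. The short sketch below applies that cutoff to the nine scores in the table. It is not the PROVEAN implementation; the function name is hypothetical, and treating a score exactly equal to the cutoff as deleterious is an assumption about the convention.

```python
PROVEAN_CUTOFF = -2.5  # cutoff stated in the Table 4 header


def classify(score, cutoff=PROVEAN_CUTOFF):
    """Label a score: at or below the cutoff is 'Deleterious' (assumed convention)."""
    return "Deleterious" if score <= cutoff else "Neutral"


# (gene, residue position, ref, alt, score, prediction) copied from Table 4.
TABLE4_ROWS = [
    ("MT-CYB", 194, "T", "A", 0.31, "Neutral"),
    ("MT-ATP6", 112, "T", "A", -3.97, "Deleterious"),
    ("MT-ND4", 340, "R", "H", -4.74, "Deleterious"),
    ("MT-CO1", 254, "I", "V", -0.44, "Neutral"),
    ("MT-CYB", 7, "T", "I", -2.35, "Neutral"),
    ("MT-CYB", 194, "T", "A", 0.31, "Neutral"),
    ("MT-ATP6", 59, "T", "A", -0.94, "Neutral"),
    ("MT-ATP6", 112, "T", "A", -3.97, "Deleterious"),
    ("MT-ND3", 114, "T", "A", -1.39, "Neutral"),
]

# Rows whose recomputed label disagrees with the label printed in the table.
mismatches = [row for row in TABLE4_ROWS if classify(row[4]) != row[5]]
print(f"{len(TABLE4_ROWS) - len(mismatches)} of {len(TABLE4_ROWS)} rows match the table")
```

The closest call in the table is Thr7Ile in MT-CYB (score −2.35), which sits just above the cutoff and is therefore labelled neutral.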
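The counts reported in Table 3 can be cross-checked with simple arithmetic: homozygous plus heterozygous calls should give the SNP and INDEL totals, SNPs plus INDELs should give the total variants, and total minus known variants should give the number of novel variants quoted in the abstract. The sketch below does this; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Counts copied from Table 3.
TABLE3 = {
    "EGP1": {"hom_snp": 1486562, "het_snp": 2702312, "snp": 4188874,
             "hom_indel": 223160, "het_indel": 342591, "indel": 565751,
             "known": 4555748, "total": 4754625},
    "EGP2": {"hom_snp": 1473578, "het_snp": 2815594, "snp": 4289172,
             "hom_indel": 219551, "het_indel": 355587, "indel": 575138,
             "known": 4655201, "total": 4864310},
}


def summarize(counts):
    """Check internal consistency, then return (novel variants, known percentage)."""
    if counts["hom_snp"] + counts["het_snp"] != counts["snp"]:
        raise ValueError("SNP counts do not add up")
    if counts["hom_indel"] + counts["het_indel"] != counts["indel"]:
        raise ValueError("INDEL counts do not add up")
    if counts["snp"] + counts["indel"] != counts["total"]:
        raise ValueError("SNPs plus INDELs do not equal total variants")
    novel = counts["total"] - counts["known"]
    known_percent = round(100.0 * counts["known"] / counts["total"], 2)
    return novel, known_percent


for sample, counts in TABLE3.items():
    novel, known_percent = summarize(counts)
    print(f"{sample}: {novel:,} novel variants, {known_percent:.2f}% known")
```

The derived novel counts are 198,877 for EGP1 and 209,109 for EGP2, matching the abstract, and the known fractions round to the 95.82% and 95.70% given in Table 3.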
Fig. 1. ADMIXTURE results at K = 7 of Egyptian individuals together with several populations in Africa and the Middle East. [Figure not reproduced: ancestry proportions from 0.0 to 1.0 for each population, ordered from Hadza to Nogai, with EGP1 and EGP2 placed next to Egyptian.]
3.4. ADMIXTURE and phylogenetic analysis
To investigate how representative our two Egyptian genomes are, we performed an ADMIXTURE analysis together with African and Middle Eastern samples in the Human Origins SNP panel (HOSP) (Lazaridis et al., 2014) (from K = 2 to K = 10, S1 Figure). Our Egyptian genomes had ADMIXTURE configurations similar to those of the Egyptian samples in HOSP. We also compared the Egyptians with other populations in Africa and the Middle East. The two Egyptian genomes had a mixed configuration of Middle Eastern and African (North and West) components at K = 7 (Fig. 1). Interestingly, the Middle Eastern component was the major one in Egyptians at K = 2 to K = 4, suggesting that much of their ancestry originates outside of Africa.
We also calculated pairwise nucleotide diversity based on SNPs in HOSP and constructed a phylogenetic tree of the two Egyptians together with the populations from Africa and the Middle East used in the ADMIXTURE analysis (Fig. 2). Both EGP1 and EGP2 were placed as a sister group to the Middle Eastern populations. This supports the earlier suggestion that Egyptians have a much larger Eurasian component (Cavalli-Sforza et al., 1994; Bosch et al., 1997; Manni et al., 2002; Arredi et al., 2004; Luis et al., 2004).
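The idea of a pairwise distance computed over a shared SNP panel can be sketched as follows. This is a simplified stand-in, not the Nei & Li (1979) estimator or the authors' implementation: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and all sample names and genotype values are made up for illustration.

```python
from itertools import combinations


def pairwise_distance(g1, g2):
    """Mean per-site allele difference (0 to 1) over sites called in both samples."""
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no sites are called in both samples")
    return sum(diffs) / len(diffs)


# Hypothetical genotypes at five SNPs (alternate-allele counts; None = missing).
genotypes = {
    "sample_A": [0, 1, 2, 0, None],
    "sample_B": [0, 1, 2, 1, 2],
    "sample_C": [2, 1, 0, 2, 0],
}

# Distance for every unordered pair of samples.
distances = {
    (a, b): pairwise_distance(genotypes[a], genotypes[b])
    for a, b in combinations(sorted(genotypes), 2)
}

for pair, value in distances.items():
    print(pair, round(value, 3))
```

A matrix of such distances is the usual input to a neighbor-joining routine; the tree-building step itself is omitted here.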
3.5. Mitochondria and Y-chromosome analysis
Mitochondrial DNA (mtDNA) sequencing is widely used to study the maternal migration and inheritance history of human populations (Cann et al., 1987). We generated the consensus mtDNA sequences of the two Egyptian genomes using SAMtools (Li et al., 2009a). We identified the mitochondrial haplogroup of each sample using MitoTool (Fan & Yao, 2013), finding that EGP1 carried H7b1 while EGP2 carried L2a1c (Table 5). H7 is mostly found in the Near East, the Caucasus, Iran, Central Asia, and Balto-Slavic countries (Costa et al., 2013). H2, H5, H7, H13, and H20 have all been found in early Neolithic populations from Europe and are also found throughout the Middle East at present (van Oven & Kayser, 2009). Haplogroup L2 is present in around 33% of Africans and their recent descendants (Gonder et al., 2007). L2a1c regularly shares the mutation at 16189 with L2a1b but has its own specific markers at 3010 and 6663. A change at 16192 is likewise common in L2a1b and L2a1c; it appears in Southeastern Africa and also in East Africa. This suggests some diversification of this clade in situ. Positions T16209C, C16301T, and C16354T on top of L2a1 characterize a small sub-clade, named
Fig. 2. Phylogenetic relationship based on pairwise nucleotide distance. (A) Phylogenetic tree based on pairwise nucleotide distance and (B) geographical location of samples used in ADMIXTURE and phylogenetic tree construction, including the two Egyptians from this study. [Figure not reproduced: panel B plots samples by longitude and latitude; population labels carry the suffixes _AF, _WA, _NC, and _SC.]
Methodology: AE, JJ. Project Administration: JB. Resources: ME, AE, AH. Software: SJ, YB, AE, JJ, HK. Supervision: JB. Validation: SJ, YB. Visualization: SJ. Writing – Original Draft Preparation: ME, SJ. Writing – Review & Editing: SJ, YB, AE, AH, JB.
Table 5
Mitochondrial haplotype and variants of two Egyptian genomes.

Sample ID   mtDNA haplotype   Variants
EGP1        H7b1              263, 309+C, 315+C, 750, 1438, 4769, 4793, 5348, 8860, 11778, 12351, 15326, 16183C, 16189, 16193+C, 16519
EGP2        L2a1c             73, 143, 146, 151, 152, 195, 263, 309+C, 315+C, 748, 750, 769, 1018, 1438, 2416, 2706, 2789, 3010, 3594, 4104, 4769, 6663, 7028, 7175, 7256, 7274, 7521, 7771, 8206, 8701, 8860, 9221, 9540, 9950, 10115, 10398, 10873, 11719, 11914, 11944, 12693, 12705, 13590, 13650, 13803, 14566, 14766, 15208, 15301, 15326, 15784, 16183C, 16189, 16193+C, 16223, 16278, 16294, 16309, 16390, 16519
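Several mtDNA positions in Table 5 correspond to the residue numbers reported for the mitochondrial amino acid changes (for example, m.11778 and residue 340 of ND4). The mapping is plain arithmetic once a gene's first coding position is known. In the sketch below the gene start coordinates are assumptions taken from the commonly used rCRS annotation, not from this paper, and the function name is hypothetical.

```python
# Assumed rCRS start coordinates of the coding sequences (not stated in the paper).
GENE_START = {"MT-ATP6": 8527, "MT-ND4": 10760, "MT-CYB": 14747}


def codon_position(mtdna_position, gene_start):
    """Return (1-based codon number, 1-based base within that codon)."""
    offset = mtdna_position - gene_start  # 0-based offset into the coding sequence
    if offset < 0:
        raise ValueError("position lies before the gene start")
    return offset // 3 + 1, offset % 3 + 1


# (mtDNA position from Table 5, gene, residue number reported for that gene)
CHECKS = [(11778, "MT-ND4", 340), (8860, "MT-ATP6", 112), (15326, "MT-CYB", 194)]

for position, gene, reported_residue in CHECKS:
    codon, base = codon_position(position, GENE_START[gene])
    print(f"m.{position} -> {gene} codon {codon}, base {base} (reported residue {reported_residue})")
```

Under these assumed coordinates, m.11778 falls on the second base of codon 340 of MT-ND4, in line with the Arg340His change associated with LHON.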
Acknowledgments
This work was done under the umbrella of the pan Asian population genomics initiative (PAPGI) consortium (http://papgi.org), which aims to study variations in pan Asian populations and to understand how these variations affect these populations and their evolution. This work was supported by the Genome Korea Project in Ulsan (800 genome sequencing) Research Fund (1.180017.01) of UNIST (Ulsan National Institute of Science & Technology) and the Genome Korea Project in Ulsan (200 genome sequencing) Research Fund (1.180024.01) of UNIST (Ulsan National Institute of Science & Technology). J.B. and S.J. were supported by the Human Resource Development (HRD) for Personal Genome Informatics research fund (NRF-2017M3C9A6047623).
L2a1c by Salas et al. (2002) and Kivisild et al. (2004), which mostly appears in East Africa (e.g. Sudan, Nubia, Ethiopia) and West Africa (e.g. Turkana, Kanuri). In the Chad Basin, four different L2a1c types, a few mutational steps from the East and West African haplotypes, were identified (Kivisild et al., 2004; Cerny et al., 2007).
For the Y-chromosome haplogroup, a short tandem repeat analysis was carried out using the Y-STR toolkit (Li et al., 2009b; McKenna et al., 2010b), which identified STR markers according to the FTDNA 111 Y-STR markers, and the resulting markers were used in a Y-haplogroup predictor (Y-DNA haplogroup Predictor – NEVGEN). EGP1 had an R1b haplotype, which is believed to be the most frequent haplogroup in Western Europe (Myres et al., 2011) and occurs at moderate frequencies across Eastern Europe, Western Asia, and some regions of North Africa and Central Asia (Herrera et al., 2012). EGP2 had J1a2a1a2 > P58 > FGC11. Haplogroup J1 is most frequently found in the Caucasus, Mesopotamia, the Levant, and the Arabian Peninsula. J1-FGC11 is the most common lineage among Bedouins and other Semitic peoples, such as Jews and Arabs. This lineage may be associated with animal husbandry rather than crop cultivation, since it is predominantly observed in regions with infertile soils, for example, Arabia, Yemen, and Ethiopia (Arredi et al., 2004; Abu-Amero et al., 2009).
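The two L2a1c-specific markers named in this section (3010 and 6663) can be checked against the variant lists in Table 5. The sketch below is only a toy containment test: real haplogroup assignment, as done by MitoTool, scores complete marker sets along the mtDNA phylogeny, and the function name here is hypothetical.

```python
# L2a1c-specific positions named in Section 3.5.
L2A1C_MARKERS = {"3010", "6663"}

# Variant lists copied from Table 5, kept as strings because of entries such as 309+C.
VARIANTS = {
    "EGP1": ("263 309+C 315+C 750 1438 4769 4793 5348 8860 11778 12351 15326 "
             "16183C 16189 16193+C 16519").split(),
    "EGP2": ("73 143 146 151 152 195 263 309+C 315+C 748 750 769 1018 1438 2416 "
             "2706 2789 3010 3594 4104 4769 6663 7028 7175 7256 7274 7521 7771 "
             "8206 8701 8860 9221 9540 9950 10115 10398 10873 11719 11914 11944 "
             "12693 12705 13590 13650 13803 14566 14766 15208 15301 15326 15784 "
             "16183C 16189 16193+C 16223 16278 16294 16309 16390 16519").split(),
}


def carries_all(markers, variants):
    """True if every marker position appears in the sample's variant list."""
    return set(markers) <= set(variants)


for sample, variants in VARIANTS.items():
    print(sample, carries_all(L2A1C_MARKERS, variants))
```

Only EGP2 carries both markers, which agrees with its L2a1c assignment in Table 5; both samples carry 16189, so that position alone does not separate them.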
References Abu-Amero, K.K., Hellani, A., Gonzalez, A.M., Larruga, J.M., Cabrera, V.M., et al., 2009. Saudi Arabian Y-chromosome diversity and its relationship with nearby regions. BMC Genet. 10, 59. Alexander, D.H., Novembre, J., Lange, K., 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664. Arredi, B., Poloni, E.S., Paracchini, S., Zerjal, T., Fathallah, D.M., et al., 2004. A predominantly neolithic origin for Y-chromosomal DNA variation in North Africa. Am. J. Hum. Genet. 75, 338–345. Bick, D., Dimmock, D., 2011. Whole exome and whole genome sequencing. Curr. Opin. Pediatr. 23, 594–600. Bosch, E., Calafell, F., Perez-Lezaun, A., Comas, D., Mateu, E., et al., 1997. Population history of North Africa: evidence from classical genetic markers. Hum. Biol. 69, 295–311. Campeau, P.M., Foulkes, W.D., Tischkowitz, M.D., 2008. Hereditary breast cancer: new genetic developments, new therapeutic avenues. Hum. Genet. 124, 31–42. Cann, R.L., Stoneking, M., Wilson, A.C., 1987. Mitochondrial DNA and human evolution. Nature 325, 31–36. Cavalli-Sforza, L.L., Menozzi, P., Piazza, A., 1994. The History and Geography of Human Genes. xi, 541 Princeton University Press, Princeton, N.J (518 pp). Cerny, V., Salas, A., Hajek, M., Zaloudkova, M., Brdicka, R., 2007. A bidirectional corridor in the Sahel-Sudan belt and the distinctive features of the Chad Basin populations: a history revealed by the mitochondrial DNA genome. Ann. Hum. Genet. 71, 433–452. Choi, Y., Chan, A.P., 2015. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747. Cingolani, P., Platts, A., Wang le, L., Coon, M., Nguyen, T., et al., 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (Austin). 
Costa, M.D., Pereira, J.B., Pala, M., Fernandes, V., Olivieri, A., et al., 2013. A substantial prehistoric European ancestry amongst Ashkenazi maternal lineages. Nat. Commun. 4, 2543. El Awady, M.K., Karim, A.M., Hanna, L.S., El Husseiny, L.A., El Sahar, M., et al., 2009. Methylenetetrahydrofolate reductase gene polymorphisms and the risk of colorectal carcinoma in a sample of Egyptian individuals. Cancer Biomark. 5, 233–240. Fan, L., Yao, Y.G., 2013. An update to MitoTool: using a new scoring system for faster mtDNA haplogroup determination. Mitochondrion 13, 360–363. Fujita, Y., Ezura, Y., Emi, M., Ono, S., Takada, D., et al., 2003. Hypertriglyceridemia associated with amino acid variation Asn985Tyr of the RP1 gene. J. Hum. Genet. 48, 305–308. Georgopoulos, A., Aras, O., Tsai, M.Y., 2000. Codon-54 polymorphism of the fatty acidbinding protein 2 gene is associated with elevation of fasting and postprandial triglyceride in type 2 diabetes. J. Clin. Endocrinol. Metab. 85, 3155–3160. Gonder, M.K., Mortensen, H.M., Reed, F.A., de Sousa, A., Tishkoff, S.A., 2007. WholemtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol 24, 757–768. Hawass, Z., Gad, Y.Z., Ismail, S., Khairat, R., Fathalla, D., et al., 2010. Ancestry and pathology in king Tutankhamun's family. JAMA 303, 638–647. Herrera, K.J., Lowery, R.K., Hadden, L., Calderon, S., Chiou, C., et al., 2012. Neolithic patrilineal signals indicate that the Armenian plateau was repopulated by agriculturalists. Eur. J. Hum. Genet. 20, 313–320. Ilyas, M., Kim, J.S., Cooper, J., Shin, Y.A., Kim, H.M., et al., 2015. Whole genome sequencing of an ethnic Pathan (Pakhtun) from the north-west of Pakistan. BMC
4. Conclusion We present the whole genomes of two Egyptian individuals from the Delta (North of Egypt, EGP1) and Saied (South of Egypt, EGP2), respectively. Our analysis provides resourceful data and information of the Egyptian genome heterogeneity and functional characterization of some variants. We also provide phylogenetic information of two Egyptians compared with several populations from Africa and MiddleEast, suggesting their genetic history is mixed between middle easterners and north and east Africans. This may shed some light on early human migration and current population diversity. A larger scale study of Egyptian genomes is needed in the future to map their complete genetic make-up in a much finer scale and frequencies. Supplementary data to this article can be found online at https:// doi.org/10.1016/j.gene.2018.05.048. Author contributions Conceptualization: ME, JB. Data Curation: JJ, HK. Formal analysis: SJ, YB. Funding Acquisition: JB. Investigation: SJ, YB, AE. 133
Gene 668 (2018) 129–134
M. ElHefnawi et al.
Paabo, S., 1985. Molecular cloning of ancient Egyptian mummy DNA. Nature 314, 644–645. Pagani, L., Schiffels, S., Gurdasani, D., Danecek, P., Scally, A., et al., 2015. Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am. J. Hum. Genet. 96, 986–991. Patel, R.K., Jain, M., 2012. NGS QC toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., et al., 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., et al., 2010. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639. Salas, A., Richards, M., De la Fe, T., Lareu, M.V., Sobrino, B., et al., 2002. The making of the African mtDNA landscape. Am. J. Hum. Genet. 71, 1082–1111. Schuring, R.P., Hamann, L., Faber, W.R., Pahan, D., Richardus, J.H., et al., 2009. Polymorphism N248S in the human toll-like receptor 1 gene is related to leprosy and leprosy reactions. J. Infect. Dis. 199, 1816–1819. Singh, G., Lott, M.T., Wallace, D.C., 1989. A mitochondrial DNA mutation as a cause of Leber's hereditary optic neuropathy. N. Engl. J. Med. 320, 1300–1305. Smith, M.T., 1999. Benzene, NQO1, and genetic susceptibility to cancer. Proc. Natl. Acad. Sci. 96, 7624–7626. Smith, A.N., Skaug, J., Choate, K.A., Nayir, A., Bakkaloglu, A., et al., 2000. Mutations in ATP6N1B, encoding a new kidney vacuolar proton pump 116-kD subunit, cause recessive distal renal tubular acidosis with preserved hearing. Nat. Genet. 26, 71–75. Sobreira, N.L., Cirulli, E.T., Avramopoulos, D., Wohler, E., Oswald, G.L., et al., 2010. Whole-genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genet. 6, e1000991. 
Stokowski, R.P., Pant, P.V., Dadd, T., Fereday, A., Hinds, D.A., et al., 2007. A genomewide association study of skin pigmentation in a South Asian population. Am. J. Hum. Genet. 81, 1119–1132. Subramanian, S., Chait, A., 2012. Hypertriglyceridemia secondary to obesity and diabetes. Biochim. Biophys. Acta 1821, 819–825. Sulem, P., Gudbjartsson, D.F., Stacey, S.N., Helgason, A., Rafnar, T., et al., 2007. Genetic determinants of hair, eye and skin pigmentation in Europeans. Nat. Genet. 39, 1443–1452. Thompson, R., Drew, C.J., Thomas, R.H., 2012. Next generation sequencing in the clinical domain: clinical advantages, practical, and ethical challenges. Adv. Protein Chem. Struct. Biol. 89, 27–63. Traver, R., Siegel, D., Beall, H., Phillips, R.M., Gibson, N., et al., 1997. Characterization of a polymorphism in NAD (P) H: quinone oxidoreductase (DT-diaphorase). Br. J. Cancer 75, 69–75. Vatsis, K.P., Martell, K.J., Weber, W.W., 1991. Diverse point mutations in the human gene for polymorphic N-acetyltransferase. Proc. Natl. Acad. Sci. U. S. A. 88, 6333–6337. Wilcken, B., Bamforth, F., Li, Z., Zhu, H., Ritvanen, A., et al., 2003. Geographical and ethnic variation of the 677C > T allele of 5,10 methylenetetrahydrofolate reductase (MTHFR): findings from over 7000 newborns from 16 areas world wide. J. Med. Genet. 40, 619–625. Wilson, A., Platt, R., Wu, Q., Leclerc, D., Christensen, B., et al., 1999. A common variant in methionine synthase reductase combined with low cobalamin (vitamin B 12) increases risk for spina bifida. Mol. Genet. Metab. 67, 317–323.
Genomics 16, 172. Kivisild, T., Reidla, M., Metspalu, E., Rosa, A., Brehm, A., et al., 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am. J. Hum. Genet. 75, 752–770. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921. Landrum, M.J., Lee, J.M., Benson, M., Brown, G., Chao, C., et al., 2016. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–868. Lazaridis, I., Patterson, N., Mittnik, A., Renaud, G., Mallick, S., et al., 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413. Li, H., Durbin, R., 2009. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics 25, 1754–1760. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., et al., 2009a. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., et al., 2009b. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079. Luis, J.R., Rowold, D.J., Regueiro, M., Caeiro, B., Cinnioglu, C., et al., 2004. The Levant versus the horn of Africa: evidence for bidirectional corridors of human migrations. Am. J. Hum. Genet. 74, 532–544. Lupski, J.R., Reid, J.G., Gonzaga-Jauregui, C., Rio Deiros, D., Chen, D.C., et al., 2010. Whole-genome sequencing in a patient with Charcot-Marie-tooth neuropathy. N. Engl. J. Med. 362, 1181–1191. Ma, M.K., Woo, M.H., McLeod, H.L., 2002. Genetic basis of drug metabolism. Am. J. Health Syst. Pharm. 59, 2061–2069. Magalon, H., Patin, E., Austerlitz, F., Hegay, T., Aldashev, A., et al., 2008. Population genetic diversity of the NAT2 gene supports a role of acetylation in human adaptation to farming in Central Asia. Eur. J. Hum. Genet. 16, 243–251. 
Manni, F., Leonardi, P., Barakat, A., Rouba, H., Heyer, E., et al., 2002. Y-chromosome analysis in Egypt suggests a genetic regional continuity in Northeastern Africa. Hum. Biol. 74, 645–658. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., et al., 2010a. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., et al., 2010b. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. McKusick, V.A., 2007. Mendelian inheritance in man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604. Myres, N.M., Rootsi, S., Lin, A.A., Jarve, M., King, R.J., et al., 2011. A major Y-chromosome haplogroup R1b Holocene era founder effect in central and Western Europe. Eur. J. Hum. Genet. 19, 95–101. Nei, M., Li, W.H., 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. U. S. A. 76, 5269–5273. Ng, M., Fleming, T., Robinson, M., Thomson, B., Graetz, N., et al., 2014. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980-2013: a systematic analysis for the global burden of disease study 2013. Lancet 384, 766–781. van Oven, M., Kayser, M., 2009. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–394.