Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome


YMPEV 4898

No. of Pages 13, Model 5G

14 May 2014 Molecular Phylogenetics and Evolution xxx (2014) xxx–xxx 1

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev


Sadaf Ambreen, Faiqa Khalil, Amir Ali Abbasi ⇑

National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan

Article info

Article history: Received 4 March 2014 Revised 17 April 2014 Accepted 1 May 2014 Available online xxxx


Keywords: Human paralogons; HOX-cluster; WGD/2R; Phylogenetic analysis; Vertebrates; Segmental duplication

Abstract

Background: The vertebrate genome often contains closely spaced sets of paralogous genes from distinct gene families on typically two, three or four different chromosomes (paralogons). This type of genome architecture is widely considered to be a remnant of whole-genome duplication events (WGD/2R).

Results: Taking advantage of the well-annotated, high-quality human genome sequence map and the ever-increasing availability of large-scale genomic sequence data from a diverse range of animal species, we investigated the evolutionary history of potential quadruplicated regions residing on the human HOX-cluster bearing chromosomes (chromosomes 2/7/12/17). For this purpose, a detailed phylogenetic analysis was performed for those multigene families with members on at least three of the four HOX-bearing chromosomes. A topology comparison approach categorized the members of 63 families into distinct co-duplicated groups. Gene families belonging to the same co-duplicated group exhibit a similar evolutionary history and hence duplicated concurrently, whereas genes of two different co-duplicated groups do not share their history and did not duplicate in concert with each other.

Conclusions: These results, based on a large-scale phylogenetic dataset, yielded no evidence in favor of polyploidization events; instead, it appears that the triplicated and quadruplicated genomic segments on the human HOX-bearing chromosomes arose by small-scale duplication events that occurred at widely different time points in animal evolution. © 2014 Published by Elsevier Inc.

⇑ Corresponding author. E-mail address: [email protected] (A.A. Abbasi).

1. Introduction

To explain the genetic underpinnings of major morphological and developmental transformations during early vertebrate history, Ohno postulated more than 40 years ago that the ancestral vertebrate genome underwent large-scale genomic changes such as two rounds of whole-genome duplication (2R/WGD) (Ohno, 1970, 1973). This hypothesis has been hotly debated over the past couple of decades. Among the evidence adduced in favor of ancient vertebrate polyploidy, the most popular is the existence of quadruplicate regions in vertebrate genomes: distinct chromosomal regions within a genome that contain sets of similar genes (paralogons). In particular, the human fourfold paralogy regions on HSA 1/6/9/19, HSA 4/5/8/10, HSA 1/2/8/10 and the HOX-bearing chromosomes HSA 2/7/12/17 are considered to be remnants of 2R/WGD events (Canestro et al., 2013). Several avenues have been explored in detail to investigate the evolutionary events that gave rise to the human paralogy blocks (Abbasi, 2008). For instance, a number of previous studies have employed the map self-comparison approach to predict the evolutionary events that generated the ancient human intra-genomic synteny blocks. However, recent investigations by our group have highlighted that the map distribution of a subset of human genes does not, by itself, constitute evidence for the evolutionary mechanisms that gave rise to such human/vertebrate paralogons (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013). Instead, we and others have proposed that such synteny patterns support the WGD hypothesis only if a few critical conditions are met: the evolutionary histories of the gene families constituting the paralogons should indicate that the majority of them duplicated early in the history of the vertebrate lineage (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013; Hughes, 1998; Zhang and Nei, 1996); gene families that duplicated within the time window between the invertebrate–vertebrate and bony fish–tetrapod splits should show consistent tree topologies (Hughes et al., 2001; Martin, 2001); and, ideally, the phylogenetic trees of quadruplicated families should (under the 2R assumption) exhibit a topology of the form (AB)(CD), i.e. two clusters of two genes each (Hughes, 1998; Martin, 2001). To test this hypothesis, we previously investigated the phylogenetic history of 43 multigene families with members residing on at least three of the human HOX-bearing chromosomes (HSA2, HSA7,

http://dx.doi.org/10.1016/j.ympev.2014.05.002 1055-7903/© 2014 Published by Elsevier Inc.

Please cite this article in press as: Ambreen, S., et al. Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome. Mol. Phylogenet. Evol. (2014), http://dx.doi.org/10.1016/j.ympev.2014.05.002


HSA12 and HSA17), but the results were contradictory (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013). In the present study, we extend our previous work (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013) and report the phylogenetic histories of a further 19 multigene families that are represented on at least three of the four human HOX-bearing chromosomes (Fig. 1 and Table 1). The topology comparison approach (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013; Hughes et al., 2001a; Martin, 2001) was then applied to the phylogenetic data of a total of 63 families (19 from the present study and 44 from previous studies) to elucidate the evolutionary mechanisms that gave rise to the triplicated and quadruplicated synteny blocks residing on the human HOX-cluster containing chromosomes. In support of our previous data (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013), the results of the present study suggest that the gene families with three or more paralogs linked to HOX-clusters did not arise simultaneously through two rounds of whole-chromosome or whole-genome duplication. Instead, our study, based on large-scale phylogenetic data, decisively concludes that the human HOX-cluster paralogons were shaped by independent gene duplications, segmental duplications (SDs) and genomic rearrangement events that occurred at widely different time points during animal history.


2. Materials and methods


2.1. Dataset


The human gene families that have members residing on at least three of the human HOX-bearing chromosomes (HSA2, HSA7, HSA12 and HSA17) were identified by scanning the human genome sequence maps available at the Ensembl and UCSC genome browsers (Hubbard et al., 2002). In total, 19 human HOX-linked gene families were used in the present study. Eight families had members on each of the four human HOX-bearing chromosomes, while the remaining eleven families were represented on three of these chromosomes (Table 1 and Fig. 1). The closest putative orthologous sequences of the human proteins in other species were obtained using BLASTP in the Ensembl genome browser (Clamp et al., 2003; Hubbard et al., 2002). To enrich these gene families with sequences from organisms for which sequence information was not available in Ensembl, a BLASTP (Altschul et al., 1990) search was carried out against the protein databases available at the National Center for Biotechnology Information (Hughes et al., 2001; Johnson et al., 2008) and the Joint Genome Institute (http://www.jgi.doe.gov/). Because the main objective of this study was to identify the duplication events that occurred during vertebrate evolution, BLAST hits with higher scores than the available invertebrate ancestral sequences were retained. The ancestral–descendant relationships among putative orthologs were further confirmed by clustering homologous proteins within phylogenetic trees. Sequences whose position within a tree sharply conflicted with the uncontested animal phylogeny were excluded from the analysis. The list of sequences used in the analysis is provided in the Supplementary material (Appendix 1). The species selected for the analysis are Homo sapiens (Human), Pan troglodytes (Chimpanzee), Gorilla gorilla (Gorilla), Macaca mulatta (Macaque), Mus musculus (Mouse), Rattus norvegicus (Rat), Bos taurus (Cow),

Fig. 1. Gene families with members on at least three of the human HOX-cluster bearing chromosomes 2, 7, 12 and 17. The restricted location of members of many of these gene families near the HOX-clusters suggests that the HOX-cluster paralogons might have been shaped by two rounds of block/whole-chromosome duplication. ABC, ATP binding cassette; BAZ, Bromodomain adjacent to zinc finger domain; CCT, Chaperonin containing t-complex polypeptide 1; CDK, Cyclin-dependent kinase; CRY, Crystallin family; FZD, Frizzled; ITGA, Integrin alpha family; KCNH, Potassium voltage-gated channel, subfamily H; KCNJ, Potassium inwardly-rectifying channel, subfamily J; METTL, Methyltransferase; MLX, MLX interacting protein; MYO1, Myosin 1; NACA, NAC alpha domain proteins; NEUROD, Neurogenic differentiation; RAPGEF, Rap guanine nucleotide exchange factor (GEF); RNF, Ring finger protein family; SLC5A, Solute carrier family 5; SLC38A, Solute carrier family 38A; SOCS, Suppressor of cytokine signaling. Genes analyzed in this study are enclosed within rectangles, whereas the histories of the other genes (not enclosed in rectangles) were presented in our previous work (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013). None of the features of this figure are drawn to scale.


Table 1
List of human gene families used in the phylogenetic analysis.

Gene family [number of included taxa, number of sequences included]: members (chr location, human protein accession No.)

ATP binding cassette (ABC) [18 taxa, 142 sequences]: ABCG1 (21q22.3, Q9UNQ0); ABCG2 (4q22, Q9H172); ABCG4 (11q23.3, Q9H222); ABCG5 (2p21, Q9H221); ABCG8 (2p21, O95477); ABCA1 (9q31.1, Q9BZC7); ABCA2 (9q34, Q99758); ABCA3 (16p13.3, P78363); ABCA4 (1p22, Q8WWZ7); ABCA5 (17q24.3, Q8N139); ABCA6 (17q24.3, Q8IZY2); ABCA7 (19p13.3, O94911); ABCA8 (17q24, Q8IUA7); ABCA9 (17q24.2, Q8WWZ4); ABCA10 (17q24, Q86UK0); ABCA12 (2q34, Q86UQ4); ABCA13 (7p12.3, Q9UNQ0)

Bromodomain adjacent to zinc finger domain (BAZ) [27 taxa, 136 sequences]: BAZ1A (14q13.2, Q9NRL2); BAZ2A (12q13.3, Q9UIF9); BAZ1B (7q11.23, Q9UIG0); BAZ2B (2q24.2, Q9UIF8); BPTF (17q24.3, Q12830); KAT2A (17q21, Q92830); KAT2B (3p24, Q92831); CECR2 (22q11.2, Q9BXF3)

Chaperonin containing T-complex polypeptide 1 (CCT) [27 taxa, 171 sequences]: CCT2 (12q15, P78371); CCT3 (1q23, P49368); CCT4 (2p15, P50991); CCT5 (5p15.2, P48643); CCT6A (7p11.2, P40227); CCT6B (17q12, Q92526); CCT7 (2p13.2, Q99832); CCT8 (21q22.11, P50990)

Cyclin-dependent kinase (CDK) [28 taxa, 183 sequences]: CDK1 (10q21.1, P06493); CDK2 (12q13, P24941); CDK3 (17q22-qter, Q00526); CDK4 (12q14, P11802); CDK5 (7q36, Q00535); CDK6 (7q21-q22, Q00534); CDK14 (7q21-q22, O94921); CDK15 (2q33.2, Q96Q40); CDK16 (Xp11, Q00536); CDK17 (12q23.1, Q00537); CDK18 (1q31-q32, Q07002)

Crystallin family (CRY) [19 taxa, 110 sequences]: CRYGA (2q33-q35, P11844); CRYGB (2q33-q35, P07316); CRYGC (2q33-q35, P07315); CRYGD (2q33-q35, P07320); CRYGS (3q25-qter, P22914); CRYGN (7q36.1, Q8WXF5); CRYBA1 (17q11.2, P05813); CRYBA2 (2q34-q36, P53672); CRYBA4 (22q11.2q13.1|22q12.1, P53673); CRYBB1 (22q11.2|22q12.1, P53674); CRYBB2 (22q11.2q12.1|22q11.3, P43320); CRYBB3 (22q11.2q12.1|22q11.3, P26998)

Frizzled (FZD) [21 taxa, 102 sequences]: FZD1 (7q21, Q9UP38); FZD2 (17q21.1, Q14332); FZD3 (8p21, Q9NPG1); FZD4 (11q14.2, Q9ULV1); FZD5 (2q33.3, Q13467); FZD6 (8q22.3-q23.1, O60353); FZD7 (2q33, O75084); FZD8 (10p11.21, Q9H461); FZD9 (7q11.23, O00144); FZD10 (12q24.33, Q9ULW2)


Table 1 (continued)

Gene family [number of included taxa, number of sequences included]: members (chr location, human protein accession No.)

Integrin alpha family (ITGA) [16 taxa, 146 sequences]: ITGA1 (5q11.2||C, P56199); ITGA2 (5q11.2, P17301); ITGA3 (17q21.33, P26006); ITGA4 (2q31.3, P13612); ITGA5 (12q11-q13, P08648); ITGA6 (2q31.1, P23229); ITGA7 (12q13, Q13683); ITGA8 (10p13, P53708); ITGA9 (3p21.3, Q13797); ITGA10 (1q21, O75578); ITGA11 (15q23, Q9UKX5); ITGA2B (17q21.32, P08514); ITGAD (16p11.2, Q13349); ITGAE (17p13, P38570); ITGAL (16p11.2, P20701); ITGAM (16p11.2, P11215); ITGAV (2q31-q32, P06756); ITGAX (16p11.2, P20702)

Potassium voltage-gated channel, subfamily H (KCNH) [15 taxa, 103 sequences]: KCNH1 (1q32.2, O95259); KCNH2 (7q36.1, Q12809); KCNH3 (12q13, Q9ULD8); KCNH4 (17q21.2, Q9UQ05); KCNH5 (14q23.1, Q8NCM2); KCNH6 (17q23.3, Q9H252); KCNH7 (2q24.2, Q9NS40); KCNH8 (3p24.3, Q96L42); CNGA1 (4p12, P29973); CNGA2 (Xq27, Q16280); CNGA3 (2q11.2, Q16281); CNGB3 (8q21.3, Q9NQW8); CNGA4 (11p15.4, Q8IV77); HCN1 (5p12, O60741); HCN2 (19p13.3, Q9UL51); HCN3 (1q22, Q9P1Z3); HCN4 (15q24.1, Q9Y3Q4)

Potassium inwardly-rectifying channel, subfamily J (KCNJ) [27 taxa, 239 sequences]: KCNJ1 (11q24, P48048); KCNJ2 (17q24.3, P63252); KCNJ3 (2q24.1, P48549); KCNJ4 (22q13.1, P48050); KCNJ5 (11q24, P48544); KCNJ6 (21q22.1|21q22.1, P48051); KCNJ8 (12p11.23, Q15842); KCNJ9 (1q23.2, Q92806); KCNJ10 (1q23.2, P78508); KCNJ11 (11p15.1, Q14654); KCNJ12 (17p11.2, Q14500); KCNJ14 (19q13, Q9UNX9); KCNJ15 (21q22.2, Q99712); KCNJ16 (17q24.3, Q9NPI9)

Methyltransferase (METTL) [27 taxa, 56 sequences]: METTL2A (17q23.2, Q96IZ6); METTL2B (7q32.1, Q6P1Q9); METTL6 (3p25.1, Q8TCB7); METTL8 (2q31.1, Q9H825)

MLX interacting protein (MLX) [15 taxa, 28 sequences]: MLX (17q21.1, Q9UH92); MLXIP (12q24.31||C, Q9HAP2); MLXIPL (7q11.23, Q9NP71)

Myosin 1 (MYO1) [28 taxa, 161 sequences]: MYO1A (12q13-q14, Q9UBC5); MYO1B (2q12-q34, O43795); MYO1C (17p13.3, O00159); MYO1D (17q11-q12, O94832); MYO1E (15q21-q22, Q12965); MYO1F (19p13.3-p13.2, O00160); MYO1G (7p13-p11.2, B0I1T2); MYO1H (12q24.11, Q8N1T3); MYO19 (17q12, Q96H55)


Table 1 (continued)

Gene family [number of included taxa, number of sequences included]: members (chr location, human protein accession No.)

NAC alpha domain proteins (NACA) [18 taxa, 29 sequences]: NACA (12q23-q24.1, Q13765); NACA2 (17q23.2, Q9H009); NACAD (7p13, O15069)

Neurogenic differentiation (NEUROD) [23 taxa, 89 sequences]: NEUROD1 (2 C3|2 46.0 cM, Q60867); NEUROD2 (17q12, Q15784); NEUROD4 (12q13.2, Q9HD90); NEUROD6 (7p14.3, Q96NK8); NEUROG1 (5q23-q31, Q92886); NEUROG2 (4q25, Q9H2A3); NEUROG3 (10q21.3, Q9Y4Z2)

Rap guanine nucleotide exchange factor (RAPGEF) [27 taxa, 95 sequences]: RAPGEF2 (4q32.1, Q9Y4G8); RAPGEF3 (12q13.1, O95398); RAPGEF4 (2 C3|2, Q9EQZ6); RAPGEF5 (7p15.3, Q92565); RAPGEF6 (5q31.1, Q8TEU7); RAPGEFL1 (17q21.1, Q9UHV5)

Ring finger protein family (RNF) [21 taxa, 114 sequences]: RNF13 (3q25.1, O43567); RNF43 (17q22, Q68DV7); RNF128 (Xq22.3, Q8TEB7); RNF130 (5q35.3, Q86XS8); RNF133 (7q31.32, Q8WVZ7); RNF148 (7q31.33, Q8N7C7); RNF149 (2q11.2, Q8NC42); RNF150 (4q31.21, Q9ULK6); RNF167 (17p13.2, Q9H6Y7); RNF215 (22q12.2, Q9Y6U7)

Solute carrier family 5 (SLC5A) [28 taxa, 126 sequences]: SLC5A1 (22q12.3, P13866); SLC5A2 (16p12-p11, P31639); SLC5A3 (21q22.12, P53794); SLC5A4 (22q12.2-q12.3, Q9NY91); SLC5A5 (19p13.2-p12, Q92911); SLC5A6 (2p23, Q9Y289); SLC5A8 (12q23.1, Q8N695); SLC5A9 (1p33, Q2M3M2); SLC5A10 (17p11.2, A0PJK1); SLC5A11 (16pter-p11, Q8WWX8); SLC5A12 (11p14.2, Q8IUS7)

Solute carrier family 38A (SLC38A) [28 taxa, 128 sequences]: SLC38A1 (12q13.11, Q9H2H9); SLC38A2 (12q, Q96QD8); SLC38A4 (12q13, Q969I6); SLC38A5 (Xp11.23, Q8WUX1); SLC38A6 (14q23.1, Q8IZM9); SLC38A10 (17q25.3, Q9HBR0); SLC38A11 (2q24.3, Q08AI6)

Suppressor of cytokine signaling (SOCS) [26 taxa, 150 sequences]: SOCS1 (16p13.13, O15524); SOCS2 (12q, O14508); SOCS3 (17q25.3, O14543); SOCS4 (14q22.1, Q8WXH5); SOCS5 (2p21, O75159); SOCS6 (18q22.2, O14544); SOCS7 (17q12, O14512); CISH (3p21.3, Q9NSE2)

Canis familiaris (Dog), Loxodonta africana (Elephant), Monodelphis domestica (Opossum), Gallus gallus (Chicken), Taeniopygia guttata (Zebra finch), Anolis carolinensis (Lizard), Xenopus tropicalis (Frog), Danio rerio (Zebrafish), Takifugu rubripes (Fugu), Tetraodon nigroviridis (Tetraodon), Gasterosteus aculeatus (Stickleback), Oryzias latipes (Medaka), Ciona intestinalis (Ascidian), Ciona savignyi (Ascidian), Branchiostoma floridae (Amphioxus), Strongylocentrotus purpuratus (Sea urchin), Drosophila melanogaster (Fruit fly), Apis mellifera (Honey bee), Anopheles gambiae (Mosquito), Caenorhabditis elegans (Nematode), Nematostella vectensis (Sea anemone) and Hydra magnipapillata (Hydra).
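The hit-retention rule stated in the Dataset section (BLAST hits scoring higher than the available invertebrate ancestral sequences were retained) can be illustrated with a minimal Python sketch. It assumes the hits have already been parsed into a hypothetical `Hit` record carrying a bit score and an invertebrate flag; the record layout, the use of bit scores as the score, and the example values are illustrative assumptions, not details reported by the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit for a human query protein (hypothetical record layout)."""
    subject_id: str
    species: str
    bit_score: float
    invertebrate: bool


def retain_vertebrate_hits(hits: list[Hit]) -> list[Hit]:
    """Keep vertebrate hits that score higher than every invertebrate hit.

    If no invertebrate hit is present there is no threshold to apply, so all
    vertebrate hits are kept. Results are sorted by decreasing bit score.
    """
    vertebrate = [h for h in hits if not h.invertebrate]
    invertebrate_scores = [h.bit_score for h in hits if h.invertebrate]
    if invertebrate_scores:
        threshold = max(invertebrate_scores)
        vertebrate = [h for h in vertebrate if h.bit_score > threshold]
    return sorted(vertebrate, key=lambda h: h.bit_score, reverse=True)


# Illustrative scores only; these are not values from the study.
hits = [
    Hit("vert1", "Mus musculus", 950.0, False),
    Hit("vert2", "Danio rerio", 610.0, False),
    Hit("vert3", "Danio rerio", 380.0, False),
    Hit("inv1", "Branchiostoma floridae", 420.0, True),
    Hit("inv2", "Drosophila melanogaster", 300.0, True),
]
print([h.subject_id for h in retain_vertebrate_hits(hits)])  # ['vert1', 'vert2']
```

In this toy example the zebrafish hit scoring 380 is discarded because the amphioxus hit (420) outscores it, while the two vertebrate hits above that threshold are kept.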


2.2. Alignment and phylogenetic analysis


Phylogenetic analyses of each gene family were performed using MEGA version 5 (Kumar et al., 2008). Amino acid sequences were aligned with the multiple sequence alignment tool CLUSTALW using default parameters (Thompson et al., 1994). Phylogenetic


trees for each gene family were reconstructed using the neighbor-joining (NJ) method (Russo et al., 1996; Saitou and Nei, 1987). The complete-deletion option was used to exclude any site at which any sequence contained a gap. The uncorrected proportion (p) of amino acid differences and the Poisson-corrected (PC) amino acid distance were used as amino acid substitution models. Since both methods produced similar results, only the NJ trees based on the uncorrected p-distance are presented here (Figs. 2 and 3 and Appendix 2). The reliability of the resulting tree topologies was assessed by the bootstrap method (1000 pseudoreplicates), which generated a bootstrap probability for each interior branch of the tree (Felsenstein, 1985). Sequences that were too divergent and disrupted the entire alignment were excluded. A maximum-likelihood (ML) tree was also constructed for each gene family using the Whelan and Goldman (WAG) model of amino acid replacement (Whelan and Goldman, 2001) (Appendix 3). For each gene family, the branching order within the phylogenetic tree was used to estimate the time window of gene duplication events relative to the divergence of major taxa. This method of relative dating does not rely on the assumption of a constant rate of evolution; therefore, it is insensitive to differences in evolutionary rate among the branches of the tree (Hughes, 1998). The tree topology of each gene family was compared with those of the other families and with the HOX-cluster phylogeny to test for consistency in duplication events (Zhang and Nei, 1996) (Fig. 4). Among the 19 gene families, the phylogenetic trees of thirteen families (ABC, CDK, CCT, FZD, ITGA, METTL, MYO1, MLX, RAPGEF, RNF, SLC5A, SOCS and SLC38A) were rooted with both invertebrate and vertebrate sequences. The phylogenies of the BAZ, KCNJ, KCNH and NEUROD families consisted of subfamilies, each of which served to root the other.
The phylogenetic trees of NACA and CRY were rooted with orthologous genes from invertebrates.

3. Results

To elucidate the evolutionary mechanisms that shaped the ancient quadruplicated regions of the human genome, 19 multigene families with representation on at least three of the four HOX-cluster bearing chromosomes (HSA2/7/12/17) were selected for this study. Neighbor-joining (NJ) and maximum-likelihood (ML) trees were constructed for each gene family from protein sequence data of a diverse set of vertebrate and invertebrate species (Appendices 2 and 3). Given the phylogenetic data, we next identified co-duplication events using the topology comparison approach (Abbasi, 2010; Abbasi and Grzeschik, 2007; Zhang and Nei, 1996) (Fig. 1). For this purpose, we shortlisted only those portions of the phylogenies that showed a strong statistical signal for at least two duplication events within the time window between the vertebrate–invertebrate and teleost–tetrapod splits (the proposed timing of the WGDs) (Fig. 5 and Table 2).

NEUROD and FZD gene family members are located on at least three of the human HOX-cluster bearing chromosomes (HSA2, HSA7 and HSA17) and diversified by at least two vertebrate-specific duplication events (Table 2). In both families, the genes on HSA17 and HSA2 clustered together and the gene on HSA7 formed an out-group to them (Fig. 4D). In addition to the harmony in their tree topologies, members of these families revealed physical linkage on HSA2 and HSA17 (Fig. 1).

Paralogs of the KCNH and METTL families are represented on at least three of the four HOX-cluster bearing chromosomes (HSA2, HSA7 and HSA17). Careful analysis of the tree topologies revealed that representatives of these families diversified by duplication events that occurred in the tetrapod lineage. For instance, the KCNH members on HSA17 and HSA7 experienced a duplication event at the root of the amniote lineage, whereas the METTL paralogs on HSA17 and HSA7 experienced a duplication event that appears to have occurred specifically in primate history. Furthermore, among the KCNH family members, KCNH3, KCNH4 and KCNH8 diversified by duplication events that occurred within the time window between the vertebrate–invertebrate and fish–tetrapod splits and exhibit a topology of the type ((HSA17, HSA3) HSA12) (Table 2).

The ITGA gene family is represented on at least three of the HOX-cluster bearing chromosomes (HSA2, HSA12 and HSA17) and shows close physical proximity to the human HOX-clusters (Fig. 1). Among the ITGA family members, ITGA6, ITGA7 and ITGA3 diversified by duplication events that occurred within the time window between the vertebrate–invertebrate and fish–tetrapod splits and exhibit a topology of the type ((HSA12, HSA2) HSA17) with highly significant (100%) bootstrap support (Fig. 4C, Table 2). The NACA gene family is represented on three HOX-cluster bearing chromosomes (HSA2, HSA12 and HSA17) (Fig. 1) and diversified by at least two vertebrate-specific duplication events (Table 2); the genes on HSA12 and HSA17 cluster together and the gene on HSA2 forms an out-group to them, i.e. ((HSA12, HSA17) HSA2) (Appendix 2). Assuming an independent translocation event from HSA7 to HSA22 (Fig. 4B), the CRY paralogs are represented on three of the human HOX-cluster bearing chromosomes and show a tree topology of the pattern ((HSA22, HSA17) HSA2) with strong bootstrap support (99%) (Table 2).

The phylogenetic trees of twelve gene families (ABC, BAZ, CCT, CDK, KCNJ, MYO1, MLX, RAPGEF, RNF, SLC5A, SOCS and SLC38A) involved very ancient duplication events that occurred at least prior to the invertebrate–vertebrate split. The phylogeny of the ABC gene family revealed that its members diversified by a total of sixteen duplication events, eight of which might have occurred very anciently, predating the Bilaterian–Nonbilaterian divergence (Fig. 3).
The BAZ gene family tree recovered a total of seven duplication events, five of which occurred anciently, at least prior to the divergence of the invertebrate chordate and vertebrate lineages (Appendix 2). Paralogs of the CCT gene family experienced a total of seven duplication events, six of which are ancient and may predate the divergence of the invertebrate chordate and vertebrate lineages (Appendix 2). The phylogeny of the CDK gene family revealed eight independent duplication events that occurred very anciently, prior to the Bilaterian–Nonbilaterian split (Appendix 2). The tree topology of the KCNJ family suggested that members of this family diversified by a total of thirteen duplication events, four of which occurred anciently, at least prior to the divergence of the invertebrate chordate and vertebrate lineages (Appendix 2). The phylogenetic tree pattern suggests that the MYO1 gene family paralogs originated by eight duplication events, four of which occurred anciently, at least prior to the divergence of the invertebrate chordate and vertebrate lineages. The tree topology of the MLX gene family suggested that members of this family diversified anciently, at least prior to the divergence of the invertebrate chordate and vertebrate lineages (Appendix 2). Members of the SOCS gene family arose by seven duplication events, five of which occurred very anciently, prior to the Bilaterian–Nonbilaterian split (Appendix 2). The phylogenetic tree of the RAPGEF gene family suggested that the members of this family diversified by a total of five duplication events, three of which occurred anciently, prior to the protostome–deuterostome split. Paralogs of the RNF gene family diversified by a total of nine duplication events, five of which are ancient and may predate the divergence of the Bilaterian and Nonbilaterian lineages (Appendix 2). The SLC5A gene family tree recovered ten duplication events, six of which occurred anciently, at least prior to the protostome–deuterostome split. For the SLC38A gene family, seven duplication events were recovered, three of which occurred very anciently, predating the Bilaterian–Nonbilaterian divergence (Appendix 2).


Fig. 2. Neighbor-joining tree of the FZD family. The uncorrected p-distance and the complete-deletion option were used. Numbers on branches represent the bootstrap values (based on 1000 replications) supporting that branch; only values ≥ 50% are shown. The scale bar shows amino acid substitutions per site.
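The distance options named in this caption and in Section 2.2 (complete deletion, uncorrected p-distance, Poisson-corrected distance and bootstrap resampling) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MEGA's implementation: gaps are assumed to be coded as '-', the Poisson correction is taken to be the standard d = -ln(1 - p), and all function names are hypothetical.

```python
import math
import random


def complete_deletion(alignment: list[str]) -> list[str]:
    """Remove every column in which any sequence has a gap ('-')."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i] != "-" for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of aligned amino acid sites that differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p); infinite when p >= 1."""
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def distance_matrix(alignment: list[str], poisson: bool = False) -> list[list[float]]:
    """Pairwise p-distances (or Poisson-corrected distances) for an alignment."""
    n = len(alignment)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_distance(alignment[i], alignment[j])
            d[i][j] = d[j][i] = poisson_distance(p) if poisson else p
    return d


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One pseudoreplicate: sample alignment columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


aln = complete_deletion(["MKV-LA", "MKIQLA", "MRV-LS"])
print(aln)                         # ['MKVLA', 'MKILA', 'MRVLS']
print(distance_matrix(aln)[0][1])  # 0.2
rng = random.Random(42)
replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
```

In a full analysis, each of the 1000 pseudoreplicate alignments would be converted into a distance matrix and a tree, and the bootstrap value of a branch is the percentage of replicate trees that contain the same clade.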


The tree topology comparison approach posits that harmony among the branching patterns of the phylogenetic trees of distinct gene families with conserved physical linkage on human paralogy groups may reflect their parallel origin, thus defining a single co-duplicated group (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013). Conversely, dissimilar tree topologies of distinct gene sets sharing a physical location on human paralogy groups indicate that the families concerned might not have duplicated in concert with each other (Abbasi, 2010; Abbasi and Grzeschik, 2007; Abbasi and Hanif, 2012; Asrar et al., 2013; Hughes and Friedman, 2003). Based on these assumptions, the previous data, comprising 44 multigene families with members residing on human HOX-cluster paralogons, were categorized into four distinct co-duplicated groups (Abbasi, 2010). The first co-duplicated group with the


Fig. 3. Neighbor-joining tree of the ABC family. The uncorrected p-distance and the complete-deletion option were used. Numbers on branches represent the bootstrap values (based on 1000 replications) supporting that branch; only values ≥ 50% are shown. The scale bar shows amino acid substitutions per site.
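The neighbor-joining method (Saitou and Nei, 1987) used for Figs. 2 and 3 can be summarized by the textbook Q-criterion algorithm sketched below in Python. It is an illustration rather than MEGA's code: it assumes a symmetric distance matrix, returns an unrooted tree as a Newick string, and does not perform the outgroup rooting described in Section 2.2. The five-taxon matrix is a small illustrative input, not data from this study.

```python
from itertools import combinations


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Unrooted neighbor-joining tree (Saitou and Nei, 1987) as a Newick string."""
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    # Distances are kept in a dict of dicts keyed by node label so that
    # rows and columns can be dropped as clusters are merged.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    while len(active) > 2:
        m = len(active)
        r = {a: sum(d[a][k] for k in active) for a in active}
        # Q-criterion: join the pair minimising (m - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(active, 2),
                   key=lambda pair: (m - 2) * d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (m - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                # Distance from the new internal node to each remaining node.
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    return f"({a},{b}:{d[a][b]:.4f});"


taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
# ((c:4.0000,(a:2.0000,b:3.0000):3.0000),(d:2.0000,e:1.0000):2.0000);
```

For this additive input the sketch pairs a with b and d with e, and the branch lengths reproduce the input distances (for example, 2 + 3 = 5 for a and b).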


Fig. 4. Consistencies in the phylogenies of families (analyzed in this and our previous work) with members on at least three of the human HOX-bearing chromosomes. (A) Schematic topology of the GLI, INHB, HH, SLC4A, OSBPL, PDE, SCRN, TNS, RAMP, NXPH and GRB families; (B) schematic topology of the ERBB, ZNFN1A, IGFBP, CBX, PDK and CRY family members; (C) schematic topology of the HOX-clusters and the SP, HNRNPA, FMNL and ITGA families; (D) schematic topology of the integrin beta chain, ATP5G, RND, ORMDL, PPP1R1, ZNF385, NEUROD and FZD gene families. In each case, the percentage bootstrap support for the internal branches is given in parentheses. The connecting bars on the left depict close physical linkage of the relevant genes. Genes analyzed in this study are enclosed within rectangles.
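The topology comparison summarized in Fig. 4 can be made concrete by treating each schematic tree as a set of clades (groups of chromosomes that descend from one ancestral node). The Python sketch below is a hypothetical illustration, not the authors' procedure: trees are written as nested tuples of chromosome labels, both trees are pruned to the chromosomes they share (which mimics the text's assumption of independent gene loss, as for ITGA on HSA7), and two topologies are called congruent when their non-trivial clades coincide.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf label or a tuple of subtrees


def leaves(tree: Tree) -> frozenset:
    """All leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> set:
    """Leaf sets of every internal node, including the root."""
    if isinstance(tree, str):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= clades(child)
    return result


def pruned_clades(tree: Tree, keep: frozenset) -> set:
    """Non-trivial clades of the tree restricted to the leaf set `keep`."""
    restricted = {c & keep for c in clades(tree)}
    # Singletons and the full leaf set carry no topological information.
    return {c for c in restricted if 1 < len(c) < len(keep)}


def congruent(t1: Tree, t2: Tree) -> bool:
    """True when both trees imply the same clades on their shared leaves."""
    shared = leaves(t1) & leaves(t2)
    if len(shared) < 3:
        return False  # fewer than three shared paralogs: nothing to compare
    return pruned_clades(t1, shared) == pruned_clades(t2, shared)


hox_group3 = ((("HSA2", "HSA12"), "HSA7"), "HSA17")
group2 = ((("HSA7", "HSA17"), "HSA2"), "HSA12")
itga = (("HSA12", "HSA2"), "HSA17")
print(congruent(itga, hox_group3))  # True
print(congruent(itga, group2))      # False
```

With these inputs, the ITGA topology ((HSA12, HSA2) HSA17) is congruent with the HOX-cluster topology of co-duplicated group-3 once HSA7 is pruned, but not with the group-2 topology.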


topology of the type ((HSA7 HSA2) HSA12/17) was the largest and suggested the simultaneous duplication of eleven gene families, i.e. the GLI, INHB, HH, SLC4A, OSBPL, PDE, SCRN, TNS, RAMP, NXPH and GRB families (Fig. 4A); co-duplicated group-2 presented a topology of the type (((HSA7 HSA17) HSA2) HSA12) and involved members of the ERBB, ZNFN1A, IGFBP, CBX and PDK families (Fig. 4B); co-duplicated group-3 showed a topology of the type (((HSA2 HSA12) HSA7) HSA17) and included the HOX-clusters and members of the SP, HNRNPA and FMNL gene families (Fig. 4C); and co-duplicated group-4 involved genes from the ITGB, MYL, ATP5G, RND, ORMDL, PPP1R1 and ZNF385 gene families, with a topology of the type ((HSA17/3 HSA2) HSA12/HSA7) (Fig. 4D). In the present study, we recovered a total of 83 vertebrate-specific duplication events for 19 human multigene families residing on HOX-cluster paralogons. Careful analysis of the current data revealed that the duplication histories of at least four families are reconciled with the four previously recovered vertebrate-specific co-duplicated groups (Fig. 4). The phylogenetic tree of the CRY gene family showed a topology in harmony with the previously recovered co-duplicated group-2. Thus, taking the previous and current data together, co-duplicated group-2 involves the simultaneous diversification of at least six HOX-linked gene families through three rounds of gene cluster/segmental duplication events (Table 2, Fig. 4B). The previously recovered co-duplicated group-3 comprises the prominent HOX gene clusters, whose evolutionarily conserved genomic architecture suggests an ancient origin (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013). The evolutionary histories of HOX-linked gene families provide important insight into the timing and pattern of these events in HOX evolution. Previously, the histories of the SP, HNRNPA and FMNL gene families, whose members reside in close proximity to the human HOX-clusters, suggested that these families might have diversified in concert with the vertebrate HOX-clusters through three rounds of successive SD events (Fig. 4C). Among the families analyzed in the present study, the ITGA paralogs are positioned in the close vicinity of the human HOX-clusters: ITGA3 maps 1.3 Mb centromeric to HOXB, ITGA7 1.6 Mb centromeric to HOXC, and ITGA6 3.5 Mb centromeric to HOXD (Fig. 1). These HOX-linked ITGA genes revealed a tree topology in which the paralogs on HSA2 and HSA12 are grouped together, while the paralog on HSA17 forms an out-group to them. Assuming an independent gene loss from HSA7, we can conclude that the vertebrate ITGA gene tree topology is in harmony with the closely linked SP, HNRNPA, FMNL

Please cite this article in press as: Ambreen, S., et al. Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome. Mol. Phylogenet. Evol. (2014), http://dx.doi.org/10.1016/j.ympev.2014.05.002

326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347

YMPEV 4898

No. of Pages 13, Model 5G

14 May 2014 10

S. Ambreen et al. / Molecular Phylogenetics and Evolution xxx (2014) xxx–xxx

Fig. 5. The relative timing of duplication events that diversified the multigene families residing on human HOX-cluster paralogons. The branching order within phylogenetic trees was used to estimate the time windows of gene duplication events relative to major cladogenetic events. For 62 multigene families (in total 364 genes) residing on HSA2/7/12/17, 114 duplication events were detected before vertebrate–invertebrate split. 174 duplications were detected after vertebrate–invertebrate and before tetrapod– bony fish divergence whereas only seventeen tetrapod specific duplication events were detected. The numbers enclosed within the parentheses in front of the gene family names represent number of duplications experienced by that gene family.

Please cite this article in press as: Ambreen, S., et al. Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome. Mol. Phylogenet. Evol. (2014), http://dx.doi.org/10.1016/j.ympev.2014.05.002

YMPEV 4898

No. of Pages 13, Model 5G

14 May 2014 11

S. Ambreen et al. / Molecular Phylogenetics and Evolution xxx (2014) xxx–xxx Table 2 Summary of the phylogenetic analysis of gene families whose three or more members are residing on HOX-cluster paralogons. Family name

Hsa2/1*

Hsa7/3*

Hsa12

Hsa17

Consistency with HOX phylogeny

Topology

Previous study Abbasi and Grzeschik (2007), Abbasi (2010) and Asrar et al. (2013) ERBB ERBB4 EGFR ERBB3

ERBB2



Collagen

(((17, 7) 2) 12) 97,98 ((((12,17)7)2)2) 93,92,83

COL1A2

COL2A1

COL1A1



IGFBP1 IGFBP3

IGFBP6

IGFBP4



INTB

COL3A1 COL5A2 IGFBP2 IGFBP5 ITGB6

ITGB5*

ITGB7

ITGB3



MYL

MYL1



MYL6

MYL4



SP

Sp3

Sp4

Sp1

Sp2

Yes

ZNFN1A

ZNFN1A2

ZNFN1A1

ZNFN1A4

ZNFN1A3



INHB

INHBB

INHBA

INHBC INHBE





SLC4A

SLC4A3

SLC4A2

––

SLC4A1



GLI

GLI2

GLI3

GLI1





HH

IHH

SHH

DHH

OSBPL

OSBPL6

OSBPL3



OSBPL7



PDE1

PDE1A

PDE1C

PDE1B





SCRN

SCRN3

SCRN1



SCRN2



CBX



CBX3

CBX5

CBX1



HNRNPA

HNRNPA3

HNRNPA2B1

HNRNPA1



Yes

ATP5G

ATP5G3



ATP5G2

ATP5G1



RND

RND3



RND1

RND2



PDK FMNL

PDK1 FMNL2

PDK4 –

– FMNL3

PDK2 FMNL1

– Yes

GRB

GRB14

GRB10



GRB7



NXPH

NXPH2

NXPH1

NXPH4

NXPH3



RAMP

RAMP1

RAMP3



RAMP2



TMEM106



TMEM106B

TMEM106C

TMEM106A



ORMDL

ORMDL1



ORMDL2

ORMDL3



PLEKHA

PLEKHA3

PLEKHA8

PLEKHA9





PPP1R1

PPP1R1C



PPP1R1A

PPP1R1B



TNS

TNS1

TNS3

TENC1





VAMP

VAMP3*



VAMP1

VAMP2

Yes

IGFBP

*



ZNF385

ZNF385B

ZNF385D

ZNF385A

ZNF385C



ACCN

ACCN4

ACCN3

ACCN2

ACCN1



CACNB

CACNB4



CACNB3

CACNB1



This study FZD

FZD7

FZD1



FZD2



NEUROD

NEUROD1

NEUROD6

NEUROD4

NEUROD2



KCNH



KCNH8*

KCNH3

KCNH4



CRY

CRYBA2

CRYBA4 



CRYBA1



((17, 7)2) ((7, 2)12) 99,91 (((3, 17)2)12) 98,99 ((17, 2)12) 87 (((12, 2) 7) 17) 98,89 (((7, 17) 2) 12) 94,90 ((7, 2)12) 93 ((7, 2)17) 85 ((7, 2)12) 99 ((7, 2)12) 97 ((7, 2)17) 90 ((7, 2)12) 99 ((7, 2)17) 68 ((17, 7) 12) 51 ((12, 2) 7) 98 ((17, 2)12) 92 ((17, 2)12) 99 ((17,7)2) ((2,12)17) 100 ((2,7)17) 92 (((2,7)12)17) 95,89 ((2,7)17) 76 ((7,12)17) 81 ((2,17)12) 60 ((7,12)2) 99 ((17,2)12) 98 ((2,7)12) 76,83 ((1,12)17) 67 (((3,2)12)17) 58,73 (((12,17)2)7) 96,69 ((2,12)(17,10))  99 ((17, 2)7) 93 (((17,2)7)12)) 84,100 ((17,3)12) 100,96 ((17,22)2) 99 (continued on next page)

Please cite this article in press as: Ambreen, S., et al. Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome. Mol. Phylogenet. Evol. (2014), http://dx.doi.org/10.1016/j.ympev.2014.05.002

YMPEV 4898

No. of Pages 13, Model 5G

14 May 2014 12

S. Ambreen et al. / Molecular Phylogenetics and Evolution xxx (2014) xxx–xxx

Table 2 (continued) Family name

Hsa2/1*

Hsa7/3*

Hsa12

Hsa17

Consistency with HOX phylogeny

Topology

ITGA

ITGA6



ITGA7

ITGA3

Yes

NACA



NACAD

NACA

NACA2



((12, 2)17) 97 ((12,17)2) 99

For each gene family the chromosomal location and topologies (in the Newick format) of those genes are given, which arose through duplications after the invertebrates vertebrates split and before the tetrapod-fish divergence. The percentage bootstrap support of the internal branches is given below each relevant topology. Represents the non-HOX bearing chromosomes.   indicates that the gene family member is positioned on a different chromosome, i.e. CACNB2 is on Hsa10 and CRYBA4 on Hsa22. *

348 349 350 351 352 353 354 355 356 357 358

and HOX genes (Fig 4C). Thus, among the families analyzed in the present study, the tree topology and chromosomal location of ITGA paralogs support to the notion that HOX-cluster were duplicated deep in vertebrate history through three rounds of SDs (Fig. 4C and Table 2). FZD and NEUROD gene families showed topologies that were in harmony with previously recovered co-duplicated group-4. Thus, taken together previous and current data co-duplicated group-4 involves simultaneous diversification of at least nine HOX linked gene families through two rounds of gene clusters or segmental duplication events (Fig. 4D and Table 2).
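The topology-comparison step used above, for example judging the ITGA tree ((12,2)17) consistent with the HOX tree after assuming loss of the HSA7 paralog, can be sketched by comparing rooted trees only on the chromosomes they share. This is a minimal, hypothetical illustration, not the authors' procedure: the function names are ours, bootstrap support is ignored, and it relies on the standard facts that two rooted trees on the same leaves are identical exactly when they have the same clusters (the leaf sets below internal nodes), and that pruning a tree to a leaf subset S leaves the clusters C ∩ S. Topology strings are copied from Table 2.

```python
import re


def parse_topology(text):
    """Parse Table 2 notation such as '(((12,2)7)17)' into nested tuples.

    Leaves are human chromosome numbers. Table 2 often omits the comma
    between a subtree and the following leaf, so commas are optional here.
    Anything after the first complete tree (e.g. a stray ')') is ignored.
    """
    tokens = re.findall(r"\(|\)|,|\d+", text)
    pos = 0

    def node():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return int(tok)  # leaf: chromosome number
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == ",":
                pos += 1  # optional separator
            else:
                children.append(node())
        pos += 1  # consume the closing ')'
        return tuple(children)

    return node()


def leaves(tree):
    """Set of chromosome labels below a node."""
    if isinstance(tree, int):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Cluster system: the leaf set below every internal node."""
    if isinstance(tree, int):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= clusters(child)
    return result


def restricted_clusters(tree, shared):
    """Non-trivial clusters of the tree pruned to the leaf set `shared`."""
    return {c & shared for c in clusters(tree) if 1 < len(c & shared) < len(shared)}


def consistent(family, reference):
    """True if both rooted topologies agree on the chromosomes they share.

    Chromosomes missing from one tree are treated as gene losses, so the
    comparison is made only on the shared leaf set.
    """
    f, r = parse_topology(family), parse_topology(reference)
    shared = leaves(f) & leaves(r)
    return restricted_clusters(f, shared) == restricted_clusters(r, shared)


if __name__ == "__main__":
    hox = "(((2,12)7)17)"  # co-duplicated group-3 (HOX) topology
    table2 = {
        "SP": "(((12,2)7)17)",
        "HNRNPA": "((12,2)7)",
        "FMNL": "((2,12)17)",
        "ITGA": "((12,2)17)",
        "GLI": "((7,2)12)",
        "ERBB": "(((17,7)2)12)",
    }
    for name, topo in table2.items():
        print(f"{name:7s} consistent with HOX: {consistent(topo, hox)}")
```

Run as a script, this reports SP, HNRNPA, FMNL and ITGA as consistent with the HOX topology and GLI and ERBB as inconsistent, in line with the "Yes" entries of Table 2. Families with members on Hsa1, Hsa3 or other non-HOX chromosomes (e.g. VAMP, CRY) would first need those chromosomes mapped onto the corresponding HOX-bearing position, which this sketch does not attempt.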


4. Discussion


Post-genomic approaches such as map self-comparison and genome-wide pairwise comparison can provide important insight into genome-shaping events that occurred in the recent history of a species, because such events have not yet been obscured by long-term evolutionary divergence, chromosomal breakage and rearrangement. For instance, comparing the physical organization of genes within the human genome, and against the genomes of multiple primate species, revealed an intricate pattern of recent duplications termed segmental duplications (SDs) (Bailey et al., 2002; Cheng et al., 2005). These recent duplications are present in at least two genomic locations, show high sequence identity (>90%) and range in size from 300 kb to 1 Mb (Samonte and Eichler, 2002). It has been estimated that these primate segmental duplications and rearrangements account for up to 5% of the human genome (Bailey and Eichler, 2006). Detailed comparative analysis of genomic data across species has attributed several roles to these segmental duplication events: creating novel primate genes, shaping primate genomes, expanding gene families, and initiating large-scale hominoid-specific chromosomal rearrangements (Marques-Bonet et al., 2009). In contrast, the nature of the evolutionary events that led to the creation of ancient (>450 million years old) paralogy regions in the vertebrate genome is extremely difficult to track through inter-genomic and intra-genomic map comparison, because these regions have since experienced multiple chromosomal breakages and rearrangements that altered the karyotype and disrupted gene order. A more convincing way to determine the mechanism of origin of ancient vertebrate paralogons is phylogenetic analysis of multigene families (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013; Hughes, 1998; Hughes et al., 2001).
This approach can resolve the nature of anciently duplicated genomic regions in two ways. First, it estimates the relative timing of duplication events, i.e. whether they occurred before or after a given speciation event. Such relative dating can provide a robust picture of the extent of duplication within a particular time window. For instance, if the phylogenies indicate that the majority of the paralogons originated after the invertebrate–vertebrate split and before the tetrapod–fish separation, this would indicate that large-scale duplications occurred between these speciation events (Van de Peer, 2004). Second, the evolutionary origin of paralogons can be examined by coupling the global physical organization of the gene families that make up the paralogons with their tree topologies (the branching order of their phylogenetic trees). Correspondence among the topologies of distinct multigene families comprising human paralogons would suggest that these families might have arisen simultaneously, through segmental or block duplication events. This approach has been described and applied in previous studies (Abbasi, 2010; Abbasi and Grzeschik, 2007; Abbasi and Hanif, 2012; Asrar et al., 2013; Hughes et al., 2001; Zhang and Nei, 1996). Together with our recent data (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013), we constructed the phylogenetic trees of 62 gene families with members on at least three of the four HOX-cluster-bearing chromosomes (Fig. 1). To test whether the fourfold paralogy seen on human HOX paralogons (Fig. 1) is the outcome of quadruplication of a single ancestral block, a topology comparison approach was used to check the consistency among these phylogenies, and between these phylogenies and the HOX phylogeny (Table 2). This analysis revealed four distinct co-duplication events that occurred within the time window between the vertebrate–invertebrate and tetrapod–bony fish divergences (Fig. 4). The recovery of these co-duplicated groups indicates that the HOX-cluster paralogons were shaped by segmental duplications and rearrangement events that occurred at the root of the vertebrate lineage (Fig. 4).
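The first approach, relative dating, can be illustrated with a simplified species-overlap rule: if genes from a lineage occur on both sides of a duplication node, that lineage inherited both copies, so the duplication predates its divergence. The sketch below is a hypothetical illustration rather than the authors' procedure. It assumes a rooted, binary gene tree in which the node has already been identified as a duplication, uses three coarse lineage labels (the `Leaf` class, the labels and the gene names are our assumptions), ignores tree-reconstruction uncertainty, and reports cases that can only be explained by gene loss as unresolved rather than guessing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    gene: str     # hypothetical gene name
    lineage: str  # simplified labels: "tetrapod", "fish" or "invertebrate"


def lineages(tree):
    """Lineages represented among the genes below a node."""
    if isinstance(tree, Leaf):
        return {tree.lineage}
    return set().union(*(lineages(child) for child in tree))


def date_duplication(node):
    """Assign a binary duplication node to a time window.

    A lineage found on both sides of the node inherited both copies,
    so the duplication predates that lineage's divergence.
    """
    left, right = (lineages(child) for child in node)
    both = left & right
    if "invertebrate" in both:
        return "before vertebrate-invertebrate split"
    if "invertebrate" in left | right:
        return "unresolved"  # invertebrate copy on one side only: loss or error
    if {"tetrapod", "fish"} <= both:
        return "after vertebrate-invertebrate split, before tetrapod-fish divergence"
    if left | right == {"tetrapod"}:
        return "tetrapod-specific"
    return "unresolved"  # e.g. gene loss or a fish-specific duplication


if __name__ == "__main__":
    vertebrate_dup = ((Leaf("A_human", "tetrapod"), Leaf("A_zebrafish", "fish")),
                      (Leaf("B_human", "tetrapod"), Leaf("B_zebrafish", "fish")))
    ancient_dup = ((Leaf("C_fly", "invertebrate"), Leaf("C_human", "tetrapod")),
                   (Leaf("D_fly", "invertebrate"), Leaf("D_human", "tetrapod")))
    recent_dup = (Leaf("E1_human", "tetrapod"),
                  (Leaf("E2_human", "tetrapod"), Leaf("E2_chicken", "tetrapod")))
    counts = Counter(date_duplication(n) for n in (vertebrate_dup, ancient_dup, recent_dup))
    for window, n in counts.items():
        print(n, window)
```

Tallying the returned windows over all duplication nodes, as `Counter` does here, yields per-window counts of the kind summarized in Fig. 5; a real analysis would additionally have to decide which nodes are duplications and how much bootstrap support to require.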
Gene families belonging to the same co-duplicated group share a similar evolutionary history and might have originated through a gene-cluster duplication event, whereas genes belonging to different co-duplicated groups do not share that history and probably did not duplicate simultaneously. The conservation of gene content across different chromosomal regions implies some functional significance. For instance, the co-expression of neighboring genes is mediated by the higher-order structural organization of chromosomes, which brings distant genomic regions into close proximity so that they can be expressed in a coordinated manner (Parveen et al., 2013). Similarly, gene regulatory elements spread across long genomic regions impose critical constraints on genomic architecture and are known to have maintained exceptionally long syntenic blocks both within and across species (Meaburn and Misteli, 2007).


5. Conclusion

The present study aimed to test the validity of the 2R hypothesis, which links organismal complexity to genome-wide duplications during early vertebrate evolution. Taken together with previous data, our results, based on a large-scale phylogenetic dataset (the duplication history of 364 human genes in 63 gene families), demonstrate that the vertebrate genome evolved through relatively small-scale, regional duplication events at widely different time points in animal history. We conclude that the duplication mechanisms at the root of vertebrate history did not differ from those that shaped our genome during its recent evolutionary history.

6. Authors’ contributions


A.A.A. conceived the project and designed the experiments. S.A. and F.K. performed the experiments. A.A.A., S.A. and F.K. analyzed the data. A.A.A. and S.A. wrote the paper.


Acknowledgments


This study was supported by the Higher Education Commission (HEC) of Pakistan and the National Center for Bioinformatics, Quaid-i-Azam University, Islamabad.


Appendix A. Supplementary material


Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.ympev.2014.05.002.


References

Abbasi, A.A., 2008. Are we degenerate tetraploids? More genomes, new facts. Biol. Direct 3, 50.
Abbasi, A.A., 2010. Unraveling ancient segmental duplication events in human genome by phylogenetic analysis of multigene families residing on HOX-cluster paralogons. Mol. Phylogenet. Evol. 57, 836–848.
Abbasi, A.A., Grzeschik, K.H., 2007. An insight into the phylogenetic history of HOX linked gene families in vertebrates. BMC Evol. Biol. 7, 239.
Abbasi, A.A., Hanif, H., 2012. Phylogenetic history of paralogous gene quartets on human chromosomes 1, 2, 8 and 20 provides no evidence in favor of the vertebrate octoploidy hypothesis. Mol. Phylogenet. Evol. 63, 922–927.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Asrar, Z., Haq, F., Abbasi, A.A., 2013. Fourfold paralogy regions on human HOX-bearing chromosomes: role of ancient segmental duplications in the evolution of vertebrate genome. Mol. Phylogenet. Evol. 66, 737–747.
Bailey, J.A., Eichler, E.E., 2006. Primate segmental duplications: crucibles of evolution, diversity and disease. Nat. Rev. Genet. 7, 552–564.
Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., Eichler, E.E., 2002. Recent segmental duplications in the human genome. Science 297, 1003–1007.
Canestro, C., Albalat, R., Irimia, M., Garcia-Fernandez, J., 2013. Impact of gene gains, losses and duplication modes on the origin and diversification of vertebrates. Semin. Cell Dev. Biol. 24, 83–94.
Cheng, Z., Ventura, M., She, X., Khaitovich, P., Graves, T., Osoegawa, K., Church, D., DeJong, P., Wilson, R.K., Paabo, S., Rocchi, M., Eichler, E.E., 2005. A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature 437, 88–93.
Clamp, M., Andrews, D., Barker, D., Bevan, P., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Hubbard, T., Kasprzyk, A., Keefe, D., Lehvaslaiho, H., Iyer, V., Melsopp, C., Mongin, E., Pettett, R., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I., Birney, E., 2003. Ensembl 2002: accommodating comparative genomics. Nucl. Acids Res. 31, 38–42.
Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.
Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehvaslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I., Clamp, M., 2002. The Ensembl genome database project. Nucl. Acids Res. 30, 38–41.
Hughes, A.L., 1998. Phylogenetic tests of the hypothesis of block duplication of homologous genes on human chromosomes 6, 9, and 1. Mol. Biol. Evol. 15, 854–870.
Hughes, A.L., Friedman, R., 2003. 2R or not 2R: testing hypotheses of genome duplication in early vertebrates. J. Struct. Funct. Genom. 3, 85–93.
Hughes, A.L., da Silva, J., Friedman, R., 2001. Ancient genome duplications did not structure the human Hox-bearing chromosomes. Genome Res. 11, 771–780.
Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., Madden, T.L., 2008. NCBI BLAST: a better web interface. Nucl. Acids Res. 36, W5–W9.
Kumar, S., Nei, M., Dudley, J., Tamura, K., 2008. MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief. Bioinform. 9, 299–306.
Marques-Bonet, T., Girirajan, S., Eichler, E.E., 2009. The origins and impact of primate segmental duplications. Trends Genet. 25, 443–454.
Martin, A., 2001. Is tetralogy true? Lack of support for the "one-to-four rule". Mol. Biol. Evol. 18, 89–93.
Meaburn, K.J., Misteli, T., 2007. Cell biology: chromosome territories. Nature 445, 379–381.
Ohno, S., 1970. Evolution by Gene Duplication. Springer-Verlag.
Ohno, S., 1973. Ancient linkage groups and frozen accidents. Nature 244, 259–262.
Parveen, N., Masood, A., Iftikhar, N., Minhas, B.F., Minhas, R., Nawaz, U., Abbasi, A.A., 2013. Comparative genomics using teleost fish helps to systematically identify target gene bodies of functionally defined human enhancers. BMC Genom. 14, 122.
Russo, C.A.M., Takezaki, N., Nei, M., 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol. 13, 525–536.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Samonte, R.V., Eichler, E.E., 2002. Segmental duplications and the evolution of the primate genome. Nat. Rev. Genet. 3, 65–72.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22, 4673–4680.
Van de Peer, Y., 2004. Computational approaches to unveiling ancient genome duplications. Nat. Rev. Genet. 5, 752–763.
Whelan, S., Goldman, N., 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699.
Zhang, J., Nei, M., 1996. Evolution of Antennapedia-class homeobox genes. Genetics 142, 295–303.
