Phylogenomic analysis reveals ancient segmental duplications in the human genome


Accepted Manuscript

Madiha Hafeez, Madiha Shabbir, Fouzia Altaf, Amir Ali Abbasi

PII: S1055-7903(15)00250-X
DOI: http://dx.doi.org/10.1016/j.ympev.2015.08.019
Reference: YMPEV 5281

To appear in: Molecular Phylogenetics and Evolution

Received Date: 21 April 2015
Revised Date: 4 August 2015
Accepted Date: 21 August 2015

Please cite this article as: Hafeez, M., Shabbir, M., Altaf, F., Abbasi, A.A., Phylogenomic analysis reveals ancient segmental duplications in the human genome, Molecular Phylogenetics and Evolution (2015), doi: http://dx.doi.org/10.1016/j.ympev.2015.08.019

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Madiha Hafeez 1, 2, Madiha Shabbir 1, 2, Fouzia Altaf 1, 2 & Amir Ali Abbasi 1, *

1 National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan.

2 These authors contributed equally to this work.

*Corresponding author E-mail: [email protected] Tel Office: +92-51-90644109

Abstract

The evolution of organismal complexity and the origin of novelties during vertebrate history have been widely explored in the context of both the regulation of gene expression and gene duplication events. Ohno (1970) first proposed two rounds of whole genome duplication as the most plausible explanation for the evolution of the vertebrate lineage (2R hypothesis). To test the validity of the 2R hypothesis, a robust phylogenomic analysis of multigene families with triplicated or quadruplicated representation on human FGFR-bearing chromosomes (4/5/8/10) was performed. A topology comparison approach categorized members of 80 families into five distinct co-duplicated groups. Genes belonging to one co-duplicated group duplicated concurrently, whereas genes of two different co-duplicated groups do not share their duplication history and did not duplicate in concert. Our findings contradict the 2R model and are indicative of small-scale duplications and rearrangements that span the entire history of animals.

Keywords: FGFR; Phylogeny; Gene duplications; Whole genome duplications; Vertebrates; Human chromosomes; Paralogon


1. Introduction

One of the major aims of evolutionary developmental biology is to unveil the genetic underpinnings of major changes in organismal design and the origin of novelties during the evolutionary history of animals. A large body of evolutionary developmental (evo-devo) studies has revealed that differences among closely and distantly related animal taxa are associated with differences in the spatial and temporal aspects of gene expression regulation during development (Villar et al., 2014). Others consider the increase in gene number by duplication to be a principal mechanism responsible for the increase in organismal complexity and diversity (Ohno, 1973; Van de Peer et al., 2009). The evolution of organismal complexity and the origin of novelties during vertebrate history have been widely explored in the context of both small- and large-scale gene duplication events (Abbasi, 2008; Rogers and Gibbs, 2014). In 1970, Susumu Ohno first put forward the idea of whole genome duplications (WGD) as the most plausible explanation for the evolution of form during the early history of vertebrates (Ohno, 1970). This notion, popularly known as the “2R hypothesis” (two rounds of WGD), has been widely debated (Abbasi, 2010a). Among the evidence adduced in favor of ancient vertebrate polyploidy (genome duplications), the most widely cited is the existence of paralogons, or paralogous genomic segments, in vertebrate genomes: homologous chromosomal segments within the genome that share similar sets of genes (Hwang et al., 2013; Putnam et al., 2008). Specifically, four potential quadruplicated regions, on Homo sapiens autosomes (Hsa) 1/6/9/19, Hsa 4/5/8/10, Hsa 1/2/8/10 and the HOX-cluster bearing chromosomes Hsa 2/7/12/17, are considered to have been structured by two rounds of polyploidy (Furlong and Holland, 2002).

However, it is alternatively hypothesized that the profusion of paralogy regions on human chromosomes is due to a high rate of local duplications, translocations and genomic rearrangement events that occurred at widely different time points during early vertebrate history, disputing the central tenet of Ohno’s idea (Hughes et al., 2001). Over the years, several human paralogons have been surveyed to determine the mechanisms by which these vertebrate-specific paralogy regions originated. In this regard, the HOX and FGFR clusters are the most intensively studied paralogons and are considered to have coevolved through 2R-WGD, collectively contributing towards higher complexity in vertebrates (Coulier et al., 1997). The earlier studies on these paralogons incorporated data from a sparse sampling of genes and were limited to a few species, leaving this longstanding question unresolved (Abbasi, 2008). To address this debate, our group has been assembling and dating the ancient duplication events that shaped the animal lineage (Abbasi, 2010b; Abbasi & Hanif, 2012; Ajmal et al., 2014; Ambreen et al., 2014; Asrar et al., 2013). Previously, we analyzed the evolutionary histories of 21 multigene families (93 genes) with triplicated or quadruplicated distribution on human FGFR-bearing chromosomes (4/5/8/10). For a robust phylogenetic analysis, 1494 sequences were obtained from a diverse range of vertebrate and invertebrate species. These data did not support en bloc duplications; instead, it appeared that the paralogy blocks residing on the human FGFR-bearing paralogon are the consequence of small-scale duplication events spread across the entire history of animals (Abbasi, 2010b; Abbasi and Grzeschik, 2007; Ajmal et al., 2014; Ambreen et al., 2014; Asrar et al., 2013; Martin, 2001).

In the present study, we extended our previous work (Ajmal et al., 2014) and investigated the evolutionary history of a further 59 human multigene families with threefold or fourfold representation on FGFR-bearing chromosomes (Hsa 4/5/8/10). A thorough phylogenetic analysis of these gene families was performed by employing the currently available well-annotated and high-quality finished genomic sequence data using neighbor joining (NJ) and maximum likelihood (ML) methods. In addition, congruency among the topologies of 80 phylogenies (59 from the present data and 21 from previous data) was scrutinized to identify genes that could have duplicated simultaneously at the root of vertebrate history (Abbasi, 2010b; Ajmal et al., 2014) (Fig. 1).

2. Materials and Methods

2.1 Dataset

Genes from 59 gene families (in total 447 genes and 4807 amino acid sequences) with threefold or fourfold representation on human FGFR-bearing chromosomes were included in the analysis. The chromosomal locations of the human gene families were obtained from the Ensembl genome browser (Hubbard et al., 2002; Paul et al., 2014). Eleven of these families have members on each of the four human FGFR-bearing chromosomes, while 48 have members on at least three of those chromosomes (Fig. 1a and Appendix 1). Information regarding the molecular functions (Appendix 1) of the selected gene families was retrieved from GeneReport, available at SOURCE (Diehn et al., 2003). BLASTP (Altschul et al., 1990), as provided by the Ensembl genome browser (Paul et al., 2014), was used with a bidirectional best hit strategy to obtain the closest putative orthologous sequences of the human proteins in other species (Clamp et al., 2003; Hubbard et al., 2002). For organisms whose sequences were not available at Ensembl (Paul et al., 2014), the BLASTP (Altschul et al., 1990) search was carried out against the protein databases available at the National Center for Biotechnology Information (Johnson et al., 2008) and the Joint Genome Institute (Nordberg et al., 2014).
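The bidirectional best hit strategy mentioned above can be illustrated with a minimal sketch. It assumes that BLASTP results have already been reduced to score tables keyed by (query, subject); the sequence names, the scores and the helper functions `best_hit` and `bidirectional_best_hits` are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Optional, Tuple

# Hypothetical score tables: (query, subject) -> bit score. All values are invented.
Hits = Dict[Tuple[str, str], float]


def best_hit(hits: Hits, query: str) -> Optional[str]:
    """Return the highest-scoring subject for `query`, or None if it has no hits."""
    candidates = {s: score for (q, s), score in hits.items() if q == query}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def bidirectional_best_hits(forward: Hits, reverse: Hits) -> Dict[str, str]:
    """Keep a query/subject pair only if each is the other's best hit."""
    pairs = {}
    for query in sorted({q for q, _ in forward}):
        subject = best_hit(forward, query)
        if subject is not None and best_hit(reverse, subject) == query:
            pairs[query] = subject
    return pairs


# Forward search: human proteins against another species (toy data).
forward = {
    ("FGFR1_human", "fgfr1a_fish"): 950.0,
    ("FGFR1_human", "fgfr2_fish"): 700.0,
    ("FGFR2_human", "fgfr2_fish"): 980.0,
    ("FGFR3_human", "fgfr2_fish"): 650.0,
}
# Reverse search: the other species' proteins against human (toy data).
reverse = {
    ("fgfr1a_fish", "FGFR1_human"): 940.0,
    ("fgfr2_fish", "FGFR2_human"): 975.0,
    ("fgfr2_fish", "FGFR3_human"): 640.0,
}

bbh = bidirectional_best_hits(forward, reverse)
print(bbh)
```

In this toy example FGFR3_human is dropped because its best forward hit, fgfr2_fish, has FGFR2_human as its own best hit in the reverse search.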


The list of sequences used in the analysis (from 41 species including 25 tetrapods, 5 teleost fish and 11 invertebrates) is provided in the supplementary material (Appendix 2). The species included in the analysis were Homo sapiens (Human), Mus musculus (Mouse), Pan troglodytes (Chimpanzee), Gorilla gorilla (Gorilla), Callithrix jacchus (Marmoset), Pongo abelii (Orangutan), Macaca mulatta (Macaque), Rattus norvegicus (Rat), Oryctolagus cuniculus (Rabbit), Gallus gallus (Chicken), Taeniopygia guttata (Zebra finch), Canis familiaris (Dog), Felis catus (Cat), Bos taurus (Cow), Equus caballus (Horse), Loxodonta africana (Elephant), Dasypus novemcinctus (Armadillo), Myotis lucifugus (Microbat), Pteropus vampyrus (Megabat), Monodelphis domestica (Opossum), Ornithorhynchus anatinus (Platypus), Anolis carolinensis (Lizard), Pelodiscus sinensis (Chinese softshell turtle), Xenopus tropicalis (Frog), Erinaceus europaeus (Hedgehog), Danio rerio (Zebrafish), Takifugu rubripes (Fugu), Tetraodon nigroviridis (Tetraodon), Gasterosteus aculeatus (Stickleback), Oryzias latipes (Medaka), Ciona intestinalis (Ascidian), Ciona savignyi (Ascidian), Branchiostoma floridae (Amphioxus), Strongylocentrotus purpuratus (Sea urchin), Drosophila melanogaster (Fruit fly), Apis mellifera (Honey bee), Anopheles gambiae (Mosquito), Caenorhabditis elegans (Nematode), Nematostella vectensis (Sea anemone), Hydra magnipapillata (Hydra) and Amphimedon queenslandica (Sponge).

2.2 Alignment and phylogenetic analysis

Amino acid sequences were aligned using CLUSTAL W (Thompson et al., 1994) under default parameters. Phylogenetic trees were constructed using the Neighbor-Joining (NJ) method (Russo et al., 1996; Saitou and Nei, 1987) with p-distance as the amino acid substitution model, as implemented in MEGA version 5 (Kumar et al., 2008; Tamura et al., 2011). The complete deletion option was selected to remove every alignment site containing a gap in any sequence. Sequences that were too divergent and disrupted the entire tree were excluded. Tree topologies were supported by bootstrap analysis (Felsenstein, 1985) with 1000 replicates. To validate the trees with a different reconstruction method, Maximum Likelihood with the Whelan and Goldman (WAG) model (Whelan and Goldman, 2001) was implemented in MEGA 5 (Tamura et al., 2011). Results from both methods are presented here (Appendix 3 and 4). Gene duplication events were estimated from the branching order of each gene family within the phylogenetic tree (Appendix 5). The tree topology of each gene family was compared with those of the other families to test for consistency in duplication events.

Among the topologies of the 59 gene families, the phylogenetic trees of five gene families (CCN, CPEB, HTRA, MARCH, NRG) were rooted with both invertebrate and vertebrate sequences. The phylogenies of 27 families (ABLIM, ACSL, ADAM, ADAMTS, ADRA, ANXA, ARAP, CAMK, COL, DNAJ, DLX, DUSP, EGR, GRK, HNRNP, MAPK, NIPAL, NKX, PDE, PKD, RHO, RUFY, SEC24, SEPT, SLC39A, TSPAN, UNC5) consisted of subfamilies, each of which served to root the other. The phylogenetic trees of DKK, DOCK, EBF, EIF4EBP, FGF, GPR, GPRIN, GRI, KCNIP, MYOZ, NDST, NPM, POU, PPP3C, PSD, SLIT, SNC, SPOCK, SYNPO, TCF7 and VDAC were rooted with orthologous sequences from invertebrates. Vertebrate DND1 sequences served to root the RBM phylogeny, the PALLD gene family was rooted with vertebrate MYOT sequences, vertebrate KLF17 sequences served to root the KLF phylogeny, the PPRGC gene family was rooted with vertebrate PPRGC1A sequences, the SLC26A gene family was rooted with vertebrate SLC26A8 sequences and the TACC phylogeny was rooted with vertebrate TACC3 sequences.
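As a small illustration of the distance step described above, the following sketch computes pairwise p-distances after complete deletion of gapped columns. It is not the MEGA implementation; the toy alignment and the function names `complete_deletion`, `p_distance` and `distance_matrix` are assumptions made for this example only.

```python
from itertools import combinations


def complete_deletion(alignment):
    """Drop every alignment column in which at least one sequence has a gap ('-')."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def p_distance(a, b):
    """Proportion of sites at which two equal-length aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """Pairwise p-distances computed on the gap-free columns only."""
    clean = complete_deletion(alignment)
    return {(m, n): p_distance(clean[m], clean[n]) for m, n in combinations(clean, 2)}


# Toy amino acid alignment (invented); column 4 contains a gap and is removed.
toy = {
    "seqA": "MKT-LLVA",
    "seqB": "MKTALLVG",
    "seqC": "MRTALIVG",
}

d = distance_matrix(toy)
for pair, value in d.items():
    print(pair, round(value, 3))
```

A matrix of this kind is the input that a neighbor-joining routine would cluster; the tree-building and bootstrap steps themselves are omitted from this sketch.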


3. Results and Discussion

To test Ohno’s prediction of two rounds of whole genome duplication, which were speculated to have had a major impact on ancestral vertebrate genome architecture, we conducted a robust phylogenetic analysis of 59 novel gene families (447 human genes and 4807 sequences in total) (Fig. 1a). These data, combined with our previous investigation (Ajmal et al., 2014), categorized the phylogenies of 80 families (with threefold or fourfold representation on human FGFR-bearing chromosomes Hsa 4/5/8/10) into five discrete co-duplicated groups (Fig. 1b and Appendix 5). Gene families belonging to a particular co-duplicated group share a similar evolutionary history and might have originated through a gene cluster duplication event at the root of the vertebrate lineage, whereas genes belonging to different co-duplicated groups may not share their evolutionary history and might not have duplicated simultaneously (Fig. 1b). Interestingly, congruent and symmetrical topologies of the type ((A,B)(C,D)) were recovered for PPP2R2, REEP, FGFR and TCF7 (Fig. 1b) (Ajmal et al., 2014), which therefore appear to be ohnologue families (families arising from WGD events). We suggest, however, that sub-chromosomal duplications and rearrangements might be a more realistic explanation for such congruent and symmetrical tree topology patterns (Abbasi and Hanif, 2012). Furthermore, dating the duplication events relative to the radiation of major animal taxa revealed approximately similar proportions of duplications before and after the vertebrate-invertebrate split (Fig. 2). Thus, relative dating discounts the contention that a burst of gene duplication activity took place in early vertebrate history after the divergence of the invertebrate lineage (McLysaght et al., 2002). These results are in accordance with the data confirming recent segmental duplications (rSDs) in primates (Rogers and Gibbs, 2014), bolstering the hypothesis of a continuous wave of small-scale duplications during animal history (Hughes et al., 2001). Therefore, it appears that the current hierarchy of the human proteome was created by small-scale events scattered at different times over the evolutionary history of animals.
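The notions of symmetrical and congruent topologies used above can be made concrete with a short sketch. It assumes that each family tree has been reduced to a rooted tree of four paralogs, written as nested tuples and labelled by chromosome; the example trees are invented and the functions `classify_quartet` and `congruent` are hypothetical helpers, not the comparison procedure used in this study.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf sets of all internal nodes; child order is ignored."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def classify_quartet(tree):
    """Classify a rooted four-leaf tree by the leaf counts on either side of its root."""
    if isinstance(tree, str) or len(tree) != 2 or len(leaves(tree)) != 4:
        raise ValueError("expected a tree with a bifurcating root and four distinct leaves")
    sizes = sorted(len(leaves(child)) for child in tree)
    return "symmetric" if sizes == [2, 2] else "asymmetric"


def congruent(tree_a, tree_b):
    """Two trees are treated as congruent when they contain exactly the same clades."""
    return clades(tree_a) == clades(tree_b)


# Invented example topologies, labelled by the chromosome carrying each paralog.
family_1 = (("Hsa4", "Hsa5"), ("Hsa8", "Hsa10"))   # of the type ((A,B)(C,D))
family_2 = (("Hsa10", "Hsa8"), ("Hsa5", "Hsa4"))   # same clades, different order
family_3 = ("Hsa4", ("Hsa5", ("Hsa8", "Hsa10")))   # of the type (A(B(C,D)))

print(classify_quartet(family_1), classify_quartet(family_3))
print(congruent(family_1, family_2), congruent(family_1, family_3))
```

Under this simplified definition, families whose trees share the same clades would fall into one co-duplicated group, whereas a family with a different branching order would not.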

4. Conclusion

To elucidate the nature of the evolutionary events that resulted in the origin of vertebrate quadruplicated or triplicated paralogy blocks, we investigated the evolutionary history of 540 human genes (in total 80 gene families) residing on human FGFR-bearing chromosomes (Hsa 4/5/8/10). Our results indicate that ancient segmental duplications (aSDs) and rearrangements helped fuel the key processes involved in shaping the contemporary architecture of the human genome. These data compel us to infer that the mechanisms underlying ancient vertebrate genome evolution are not different from those that operated during recent vertebrate history. Therefore, we conclude that tetraploidization is unlikely to have been a major shaping force of vertebrate biology and human genome architecture.

Authors’ contributions

A.A.A. conceived the project and designed the experiments. M.H., M.S. and F.A. performed the experiments. A.A.A., M.H., M.S. and F.A. analyzed the data. A.A.A., M.H., M.S. and F.A. wrote the paper.

Acknowledgments

This work was supported by the Higher Education Commission of Pakistan.


Competing Interest Statement

The authors declare that they have no competing financial interests.

References

Abbasi, A.A., 2008. Are we degenerate tetraploids? More genomes, new facts. Biol Direct 3, 50.
Abbasi, A.A., 2010a. Piecemeal or big bangs: correlating the vertebrate evolution with proposed models of gene expansion events. Nat Rev Genet 11, 166.
Abbasi, A.A., 2010b. Unraveling ancient segmental duplication events in human genome by phylogenetic analysis of multigene families residing on HOX-cluster paralogons. Mol Phylogenet Evol 57, 836-848.
Abbasi, A.A., Grzeschik, K.H., 2007. An insight into the phylogenetic history of HOX linked gene families in vertebrates. BMC Evol Biol 7, 239.
Abbasi, A.A., Hanif, H., 2012. Phylogenetic history of paralogous gene quartets on human chromosomes 1, 2, 8 and 20 provides no evidence in favor of the vertebrate octoploidy hypothesis. Mol Phylogenet Evol 63, 922-927.
Ajmal, W., Khan, H., Abbasi, A.A., 2014. Phylogenetic investigation of human FGFR-bearing paralogons favors piecemeal duplication theory of vertebrate genome evolution. Molecular phylogenetics and evolution 81, 49-60.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. Journal of molecular biology 215, 403-410.


Ambreen, S., Khalil, F., Abbasi, A.A., 2014. Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome. Mol Phylogenet Evol 78, 1-13.
Asrar, Z., Haq, F., Abbasi, A.A., 2013. Fourfold paralogy regions on human HOX-bearing chromosomes: role of ancient segmental duplications in the evolution of vertebrate genome. Mol Phylogenet Evol 66, 737-747.
Clamp, M., Andrews, D., Barker, D., Bevan, P., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Hubbard, T., Kasprzyk, A., Keefe, D., Lehvaslaiho, H., Iyer, V., Melsopp, C., Mongin, E., Pettett, R., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I., Birney, E., 2003. Ensembl 2002: accommodating comparative genomics. Nucleic Acids Research 31, 38-42.
Coulier, F., Pontarotti, P., Roubin, R., Hartung, H., Goldfarb, M., Birnbaum, D., 1997. Of worms and men: an evolutionary perspective on the fibroblast growth factor (FGF) and FGF receptor families. J Mol Evol 44, 43-56.
Diehn, M., Sherlock, G., Binkley, G., Jin, H., Matese, J.C., Hernandez-Boussard, T., Rees, C.A., Cherry, J.M., Botstein, D., Brown, P.O., Alizadeh, A.A., 2003. SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Research 31, 219-223.
Felsenstein, J., 1985. Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution 39, 783-791.


Furlong, R.F., Holland, P.W., 2002. Were vertebrates octoploid? Philos Trans R Soc Lond B Biol Sci 357, 531-544.
Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehvaslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I., Clamp, M., 2002. The Ensembl genome database project. Nucleic Acids Research 30, 38-41.
Hughes, A.L., da Silva, J., Friedman, R., 2001. Ancient genome duplications did not structure the human Hox-bearing chromosomes. Genome Res 11, 771-780.
Hwang, J.I., Moon, M.J., Park, S., Kim, D.K., Cho, E.B., Ha, N., Son, G.H., Kim, K., Vaudry, H., Seong, J.Y., 2013. Expansion of secretin-like G protein-coupled receptors and their peptide ligands via local duplications before and after two rounds of whole-genome duplication. Mol Biol Evol 30, 1119-1130.
Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., Madden, T.L., 2008. NCBI BLAST: a better web interface. Nucleic Acids Research 36, 9.
Kumar, S., Nei, M., Dudley, J., Tamura, K., 2008. MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Briefings in bioinformatics 9, 299-306.
Martin, A., 2001. Is tetralogy true? Lack of support for the “one-to-four rule”. Molecular biology and evolution 18, 89-93.


McLysaght, A., Hokamp, K., Wolfe, K.H., 2002. Extensive genomic duplication during early chordate evolution. Nat Genet 31, 200-204.
Nordberg, H., Cantor, M., Dusheyko, S., Hua, S., Poliakov, A., Shabalov, I., Smirnova, T., Grigoriev, I.V., Dubchak, I., 2014. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Research 42, 31.
Ohno, S., 1970. Evolution by gene duplication. Springer-Verlag.
Ohno, S., 1973. Ancient linkage groups and frozen accidents. Nature 244, 259-262.
Paul, F., Amode, M.R., Daniel, B., Kathryn, B., Konstantinos, B., Simon, B., Denise, C.-S., Peter, C., Guy, C., Stephen, F., Laurent, G., Carlos García, G., Leo, G., Thibaut, H., Sarah, H., Nathan, J., Thomas, J., Andreas, K.K., Stephen, K., Eugene, K., Fergal, J.M., Thomas, M., William, M.M., Daniel, N.M., Rishi, N., Bert, O., Miguel, P., Bethan, P., Emily, P., Harpreet, S.R., Magali, R., Daniel, S., Kieron, T., Anja, T., Stephen, J.T., Alessandro, V., Steven, P.W., Mark, W., Amonida, Z., Bronwen, L.A., Ewan, B., Fiona, C., Jennifer, H., Javier, H., Tim, J.P.H., Rhoda, K., Matthieu, M., Anne, P., Giulietta, S., Andy, Y., Daniel, R.Z., Stephen, M.J.S., 2014. Ensembl 2014. Nucleic Acids Research.
Putnam, N.H., Butts, T., Ferrier, D.E., Furlong, R.F., Hellsten, U., Kawashima, T., Robinson-Rechavi, M., Shoguchi, E., Terry, A., Yu, J.K., Benito-Gutierrez, E.L., Dubchak, I., Garcia-Fernandez, J., Gibson-Brown, J.J., Grigoriev, I.V., Horton, A.C., de Jong, P.J., Jurka, J., Kapitonov, V.V., Kohara, Y., Kuroki, Y., Lindquist, E., Lucas, S., Osoegawa, K., Pennacchio, L.A., Salamov, A.A., Satou, Y., Sauka-Spengler, T., Schmutz, J., Shin, I.T., Toyoda, A., Bronner-Fraser, M., Fujiyama, A., Holland, L.Z., Holland, P.W., Satoh, N., Rokhsar, D.S., 2008. The amphioxus genome and the evolution of the chordate karyotype. Nature 453, 1064-1071.
Rogers, J., Gibbs, R.A., 2014. Comparative primate genomics: emerging patterns of genome content and dynamics. Nat Rev Genet 15, 347-359.
Russo, C.A., Takezaki, N., Nei, M., 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Molecular biology and evolution 13, 525-536.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution 4, 406-425.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular biology and evolution 28, 2731-2739.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673-4680.
Van de Peer, Y., Maere, S., Meyer, A., 2009. The evolutionary significance of ancient genome duplications. Nat Rev Genet 10, 725-732.
Villar, D., Flicek, P., Odom, D.T., 2014. Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nat Rev Genet 15, 221-233.


Whelan, S., Goldman, N., 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Molecular biology and evolution 18, 691-699.

Figure Legends

Fig. 1. Phylogenomic analysis of human chromosomes 4, 5, 8 and 10 revealed ancient segmental duplications. (a) Gene families with at least threefold representation on human FGFR-bearing chromosomes 4, 5, 8 and 10 were subjected to rigorous phylogenetic investigation. Genes analyzed in this study are enclosed within rectangles, whereas the histories of other genes (not enclosed in rectangles) were presented in our previous study (Ajmal et al., 2014). (b) Human genes that duplicated concurrently lie in their respective co-duplicated groups. In each case the percentage bootstrap support is given in parentheses. The connecting bars on the left depict close physical linkage. * Represents a situation where a gene family member does not reside on FGFR-bearing chromosomes. None of the features of this figure are drawn to scale.

Fig. 2. The relative timing of duplication events that expanded the multigene families residing on the human FGFR paralogon. For 80 multigene families (in total 536 human genes) residing on Hsa 4/5/8/10, in total 201 duplication events were detected before the vertebrate–invertebrate split and 263 duplications were detected after the vertebrate–invertebrate split and before the tetrapod–bony fish divergence. Only 15 tetrapod-specific duplication events were detected.
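The counts given in the Fig. 2 legend can be turned into shares of all detected duplication events with simple arithmetic. The sketch below uses only the three counts stated in the legend; the interval labels used as dictionary keys are shorthand introduced for this example.

```python
# Duplication events per interval, as stated in the Fig. 2 legend.
counts = {
    "before vertebrate-invertebrate split": 201,
    "after split, before tetrapod-bony fish divergence": 263,
    "tetrapod specific": 15,
}

total = sum(counts.values())

# Percentage of all detected events falling in each interval, to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
for label, share in shares.items():
    print(f"{label}: {share}%")
```

With these counts the total is 479 events, of which about 42.0% predate the vertebrate-invertebrate split, 54.9% fall between that split and the tetrapod-bony fish divergence, and 3.1% are tetrapod specific.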


[Figure 1: panels (A) and (B)]

[Figure 2: gene family labels, each with a count in parentheses]

Supplementary Figure 1. Gene families mapped on FGFR-bearing chromosomes. Gene families with members on at least three of the human FGFR-bearing chromosomes 4, 5, 8 and 10.

[Supplementary Figure 1: chromosome maps for Hsa 4, Hsa 5, Hsa 8 and Hsa 10 with gene labels]

Highlights

• Evolutionary history of the human FGFR-bearing paralogon was investigated.
• Phylogenies of 80 gene families were constructed by employing sequences from diverse animal species.
• Results corroborate that small-scale duplications structured the human FGFR-bearing paralogon.
• These regional duplications occurred at different time points in animal history.
• These data contradict the assumption of WGDs at the root of the vertebrate lineage.
