Independent evolution of genomic characters during major metazoan transitions

Independent evolution of genomic characters during major metazoan transitions

Author’s Accepted Manuscript Independent evolution of genomic characters during major metazoan transitions Oleg Simakov, Takeshi Kawashima www.elsevi...

919KB Sizes 0 Downloads 15 Views

Author’s Accepted Manuscript Independent evolution of genomic characters during major metazoan transitions Oleg Simakov, Takeshi Kawashima

www.elsevier.com/locate/developmentalbiology

PII: DOI: Reference:

S0012-1606(16)30480-8 http://dx.doi.org/10.1016/j.ydbio.2016.11.012 YDBIO7297

To appear in: Developmental Biology Received date: 1 August 2016 Revised date: 8 November 2016 Accepted date: 14 November 2016 Cite this article as: Oleg Simakov and Takeshi Kawashima, Independent evolution of genomic characters during major metazoan transitions, Developmental Biology, http://dx.doi.org/10.1016/j.ydbio.2016.11.012 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Independent evolution of genomic characters during major metazoan transitions Authors: Oleg Simakov [1], Takeshi Kawashima [2] Affiliations: [1] Okinawa Institute of Science and Technology, Okinawa, Japan [2] National Institute of Genetics, Mishima, Japan Abstract Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies.

Introduction Ancient animal evolution was shaped by several distinct and complex transitions of bodyplan organization, most notably the origin of multicellularity (Ediacaran or Vendian) (Fedonkin, 2003; King, 2004; Narbonne, 2005; Richter and King, 2013; Stanley, 1973), followed by the emergence of bilaterally symmetric forms with a through-gut, and the centralized nervous system (Erwin et al., 2011; Knoll and Carroll, 1999). It is one major focus of evolutionary biology to understand those and other organismal transitions, at both the molecular-genetic and morphological levels. Over the past years, many studies have revealed a surprisingly high degree of shared gene complements (Degnan et al., 2009; Kortschak et al., 2003; Putnam et al., 2007; Wheeler et al., 2009) among the descendants of those ancient animals. Building on this shared genomic content, recent advances in evodevo have contributed to our understanding of the function of those cascades in development across metazoans, revealing striking similarities (Carroll et al., 2013; Raff, 2000) of developmental programmes and transcriptional regulatory cascades as well as signaling pathway usage responsible for the patterning of the basic metazoan body-plan (Degnan et al., 2009; Erwin, 2009). Every new metazoan genome that was decoded revealed underlying genetic novelties at both coding and non-coding levels (Figure 1). Such a tendency was found since the 1

publication of the first metazoan genome, the nematode C. elegans (C. elegans Sequencing Consortium, 1998), revealing a striking number of gene duplications of olfactory receptors, relating to lineage-specific adaptation. The initial focus on the genomes of model organisms relevant to developmental biology or medical research, such as Drosophila or mouse, helped identify several patterns of metazoan evolution, contributing to the ongoing debates of the relative importance of different mechanisms contributing to organismal complexity, e.g,. evolution through gene novelty and duplication (Kaessmann, 2010; Tautz and Domazet-Loso, 2011) or regulation (Davidson and Erwin, 2006; Prud'homme et al., 2007). Understanding the role of those processes during metazoan evolution requires the sampling of many phylogenetically informative species. With the help of the recent advances in sequencing technology, genomic studies were able to focus on species distributed over the whole metazoan tree of life (Figure 1). These publications set the stage for understanding the patterns of metazoan evolution, including both the origin of multicellularity and its diversification into the modern clades. In this review we discuss the insights from animal comparative genomics analysis, in particular focusing on metazoan-level evolution and the genomic processes accompanying those transitions.

Uncovering the ancient urmetazoan genome Our understanding of the metazoan genetic complement relies heavily on the comparative analysis of the genomes of several key species: human (Lander et al., 2001; Venter et al., 2015), the fly Drosophila melanogaster (Adams et al., 2000), the nematode Caenorhabditis elegans (C. elegans Sequencing Consortium, 1998), the cnidarian Nematostella vectensis (Putnam et al., 2007), and the sponge Amphimedon queenslandica (Srivastava et al., 2010), as well as the outgroup (non-metazoan) species such as the most closely related group of ophisthokonts Monosiga brevicollis (King et al., 2008) and Capsaspora owczarzaki (Suga et al., 2013), followed by the fungi Saccharomyces cerevisiae (Engel et al., 2014) and the amoebazoan Dictyostelium discoideum (Eichinger et al., 2005). With contributions from expressed sequence tags (EST) studies (Kortschak et al., 2003; Kusserow et al., 2005), the nature of ancestral metazoan complexity emerged, deduced from the high degree of conservation of the gene family complements in contemporary animals. This complexity is reflected by the presence of almost full gene family complement, such as the major toolkit genes Wnt (Kusserow et al., 2005), Hox (Schierwater and Kuhn, 1998), and T-box (Agulnik et al., 1995) families, in modern representatives of the earliest branching lineages, such as sponges, Trichoplax, and cnidarians. Initial studies on partial gene sets (Raible et al., 2005; Rogozin et al., 2003; Roy and Gilbert, 2005) combined with whole genome studies (Putnam et al., 2007; Raible et al., 2005; Srivastava et al., 2010) also revealed striking conservation in gene (exon-intron boundary) structure, domain architecture, and gene linkage conservation (synteny) (Table 1) in the metazoan ancestor. This hinted at a very high selective pressure to maintain gene structures such as splice sites, intron phase, and splicing isoforms (Ast, 2004) during metazoan evolution. The degree of conservation even allowed those characters to be used for phylogenetic inferences (Krauss et al., 2008; Zdobnov and Bork, 2007). This completely changed the picture of a gradual ‘step-wise’ increase in genomic complexity from the metazoan ancestor to vertebrates, and supported the proposal that ancient metazoans in the pre-Cambrian were already genetically complex, with most of the modern 2

gene families and genomic organization already present. This naturally led to the question how such a large transition could have been accomplished and what were the genomic prerequisites. While still debated, a major correlate of the metazoan evolution compared to their closest unicellular relatives is the significant increase in the nuclear genome size (Elliott and Gregory, 2015; Gregory, 2011): ‘average’ metazoan genomes (of basally branching metazoans and marine invertebrates) are considered to be in the 300-500 Mb range (Figure 1), while the closest unicellular organisms, such as the unikont Monosiga brevicollis (King et al., 2008) or Capsaspora owczarzaki (Suga et al., 2013), have average genome sizes well below 100 Mb. While genome sizes should not be taken as surrogates to animal complexity (Elliott and Gregory, 2015), this increase in the DNA material presumably was a major contributing factor behind the relatively recent emergence of multicellular animal life on Earth (Gregory, 2011), leading to many metazoan genomic innovations at both coding gene and non-coding levels (see below). Many subsequent alterations to the genome size such as the significant reductions in the ascidian Oikopleura dioica (Seo et al., 2001), the pufferfish Fugu rubripes (Aparicio et al., 2002), or Drosophila melanogaster (Adams et al., 2000; Petrov et al., 1996) are considered secondary loss, while the increase to the 1 Gb range in many invertebrates, such as the cnidarian Hydra magnipapillata (Chapman et al., 2010), the cephalopod Octopus bimaculoides (Albertin et al., 2015), and in the vertebrates (McLysaght et al., 2002), are considered secondary genome expansions or duplications.

Measurable levels of metazoan genome evolution Those inferences, however, are based on parsimony estimates from comparison of a few, phylogenetically sparsely scattered, metazoan genomes. Hence, it is pivotal to investigate the evolutionary rates of change of individual genomic features and their underlying mechanisms. Whole genome sequences provide rich insights, deserving still more study, not just into the current state of gene and regulatory complement of an animal but also its history. Such sequences bear signatures of past events and allow us to look into their dynamics across the past hundreds of millions of years. Current research of metazoan comparative genomics mainly focuses on the following areas (Figure 2). Novel gene families and the origin of metazoans Arguably, the most significant insight into the metazoan evolution resulting from the comparative genomic and transcriptomic analyses is the notion of gene novelty during animal evolution, as defined by shared gene orthology across two or more metazoan draft genomes. Many of those genes can be traced back for the first time in metazoan genomes, i.e., having no detectable sequence similarity at any level to any other known non-metazoan sequence, eukaryotic or prokaryotic (Srivastava et al., 2010). We call those genes ‘true’ or ‘type 1’ novelties (Table 2, Figure 3). (Here we adopt and extend the notation of Putnam et al (Putnam et al., 2007) and Simakov et al (Simakov et al., 2015) for different classes of gene novelty.) Using parsimony and several dozen sequenced metazoan genomes, comparative genomics enabled us to estimate the total complement of gene families at different nodes of the animal history. Those studies revealed a significant contribution (at least 300) of type 1 novel gene families leading up to the metazoan ancestor (Srivastava et al., 2010) (compared to just 3

around 50 type 1 novelties in the later bilaterian ancestor) (Simakov et al., 2013). Similar patterns have been also observed by others (Tautz and Domazet-Loso, 2011). Interestingly, most of the innovations belong to the developmental tool-kit genes involved in cell-cell signaling and patterning (e.g., Wnt, TGFbeta, Hedgehog, and Notch pathways) (Srivastava et al., 2010), as well as an extensive novel transcription factor complement (Hox and other homeobox-related genes, Forkhead, Sox and others) (Larroux et al., 2008). The role of type 1 novel genes in the development of organismal complexity has been vigorously studied over the past years across metazoans, revealing a significant contribution to novel developmental regulatory cascades and to morphological innovations, such as novel cell types or organs, as most recently shown in several studies (Babonis et al., 2016; Khalturin et al., 2008; Smith et al., 2014). Novel genes are most prominent contributors to distinct evolutionary structures, such as the cnidarian nematocytes (Balasubramanian et al., 2012; Hwang et al., 2008), and cephalopod photosensory tissues (skin and retina) (Albertin et al., 2015; Andouche et al., 2013). Despite such importance for evolutionary transitions, the vast majority of novel genes, which by definition are absent or unrecognizable in other model systems, has not yet been characterized. Genome analyses predict so far hundreds, if not thousands, of recently originated novel genes per species (Khalturin et al., 2009), but in most cases the only support that can be obtained for their functional validity is the presence of high-quality transcripts at the same locus. The ‘de-novo’ origin of so many genes at the metazoan stem is enigmatic. We know very little about the actual genomic dynamics of such innovation and can only suggest a few mechanisms. First, it is possible that type 1 novelties are derived from originally duplicated gene copies in the opisthokont ancestor, which then experienced such a high level of sequence evolution that their relatedness cannot be detected by any existing similaritybased method. If one can exclude such a possibility, the genes may be likely derived from noncoding regions of the genome. There are several possibilities for such derivation, including de-novo transcription of a locus that by chance or through promoter ‘hijacking’ (Kalitsis and Saffery, 2009) acquires transcriptional activity. A functional open reading frame (ORF) could be evolved in parallel or afterwards (Knowles and McLysaght, 2009; Tautz and Domazet-Loso, 2011). Pervasive transcription of non-coding regions, a prerequisite for such a scenario, has been shown for many metazoan genomes (Jensen et al., 2013). In any of the possibilities, the origin of transcribed sense mRNA is a multi step process. Thus, such intense acquisition of novel genes at the metazoan stem seems to require large quantities of genomic material for selection or drift to act upon, to be present. Does this imply a certain population structure of the ancestral metazoans, one that perhaps enhances the importance of drift and is permissive for large genomic expansions, as has been proposed for small population sizes (Lynch and Conery, 2003)? While we do observe a high degree of gene innovation at the metazoan stem node, we also observe many novel genes appearing very recently. To the extent that the origination of new genes in contemporary organisms models the birth of gene novelties in the pre-Cambrian, population genomic studies on Drosophila may provide insights (Chen et al., 2010; Zhao et al., 2014; Zhou et al., 2008). The majority of them will not survive the test of time (thus persisting in two or more branches) due to negative selection or drift, but their function may be important in the micro-evolutionary context (Zhao et al., 2014).

4

Gene structure and domain architecture evolution Gene function does not only evolve by nucleotide or amino-acid sequence change. Proteins are often modularly structured, with different ‘modules’ (domains) executing different functions (Chothia et al., 2003; Patthy, 1999). The addition of a novel domain (permitting a new function) to a protein is called type 2 and the change of the order of two or more domains (architecture) type 3 novelty, respectively. Those innovations are very common in major metazoan signaling pathways (Figure 3). The domain complement encoded by a gene can be easily assessed utilizing the power of well-curated protein functional domain databases, such as PFAM (Sonnhammer et al., 1997) or InterPro (Mitchell et al., 2015). Unlike type 1 innovation in developmental signaling cascades, proteins involved in cell adhesion and the extracellular matrix, such as fibronectin and laminin (Fahey and Degnan, 2012), show general traces of sequence similarity outside metazoans and primarily evolved by the addition of novel domains at the metazoan stem (type 2). Reordering or duplication of already existing protein domains (type 3) also has an impact on functional innovation. For example, zinc finger domain duplication results in different binding site kinetics and regulation of many transcription factors (Jantz and Berg, 2010). Proteins relevant to the innate immune systems are the prototypical example; they create the diversity of their structures by domain-shuffling. Some molecular studies of sponges, cnidarians, shrimp, mollusks, sea urchins, and protochordates have found surprisingly diversified immune molecules (Azumi et al., 2003; Loker et al., 2004; Rast et al., 2006). Some of the key genes relevant to cell differentiation in metazoans have also emerged by domain shuffling events. For example, the Pax, POU and LHX families are constructed by the combination of a homeobox domain (presumably derived from a homeobox gene) and Paired-domain for the former, or a POU-specific-domain for the middle or a LIM-domain for the latter. The family origination for the latter two is correlated with the emergence of the eyes and endocrine system, and of elaborate neuronal systems, respectively (Larroux et al., 2008), although the genes themselves may have arisen first in other contexts, and then later recruited for these roles. Some novel genes encoding structural proteins also originated by domain shuffling. For example, aggrecan, occluding, and tectorin-alpha, each of which is a major component of cartilage, tight junctions, and the mammalian inner-ear membrane, respectively, are constructed by domain shuffling events on the common ancestor of vertebrates (Kawashima et al., 2009). Variation on the ancestral metazoan theme: gene duplication and loss The gene complement remains dynamic throughout metazoan evolution. High copy number variation that is present in natural population (Redon et al., 2006; Rogers et al., 2009) is naturally controlled by selection or drift. Depending on the species and the selection regime as described in the seminal paper (Lynch and Conery, 2000), we can expect gene duplication rates of 0.2 to 2% of genes per million years. Genes that still remain (the other undergoing loss) can be involved in various evolutionary scenarios such as subfunctionalization (Cresko et al., 2003; Lynch and Force, 2000) or increase in complexity of specific systems through co-option and subsequent neofunctionalization (Conant and Wolfe, 2008), such as adaptive immunity in vertebrates (Kumanovics et al., 2003; Okada and Asai, 2008). Among the toolkit genes, especially progressing towards the origin of bilaterian animals, we find many examples of such duplications including the TGF-beta pathway members Lefty (a Nodal antagonist) and Univin/Vg1/GDF1 (Nodal agonist), involved in axial patterning and left-right symmetry formation (Range and Lepage, 2011). 5

The Wnt pathway has been shown to be very dynamic as well, exhibiting both loss and gain patterns in various lineages (Albalat and Canestro, 2016; Cho et al., 2010). Gene duplicates can form paralogous clusters, such as the zinc finger transcription factors that are particularly expanded in amphioxus and octopus lineages, convergently with vertebrates (Albertin et al., 2015; Emerson and Thomas, 2009). Using synonymous substitutions it is possible to date those expansions, suggesting highly species-specific patterns. Gene duplication may facilitate enhanced evolutionary rates, in which one copy retains the ancestral function, while the other experiences reduced selective pressure and evolves novelty (Crow and Wagner, 2006; Lynch and Conery, 2000; Ohno, 1970; Ohta, 1989). If such enhanced evolutionary rate is not accompanied by the evolution of a novel domain or domain combination, we identify them as type 4 novelty (Simakov et al., 2015) (e.g., Lefty). Morphological or developmental subfunctionalization is the main result of this process, as outlined in numerous studies at both the whole genome level (Force et al., 1999; Hellsten et al., 2007) and within the specific cascade (Braasch et al., 2007) level. Insights into the noncoding evolution across metazoan genomes One significant effect of the discovery of the conserved gene repertoire across metazoans is to suspect that the morphological evolution of animals, especially in bilaterians, is due not just to coding sequence but to non-coding regulatory change, which would affect the times, places, and combinations of expression of these conserved genes. So far those noncoding regions, while occupying the vast bulk of animal genomes, remain largely understudied at the metazoan level. Composed of regulatory elements (promoters, enhancers), repetitive elements (active and inactive transposons, tandem and simple repeats), and other nonrepetitive/non-regulatory DNA, this is the ‘dark matter’ of the genome. Its importance is evident as many human diseases are directly related to it (Khurana et al., 2013). Understanding its evolutionary history is thus of utmost importance as it will reveal conserved (and putatively) functional elements and general evolutionary trends. While largely known across vertebrates (Elgar, 2009), the conservation of any regulatory DNA at the metazoan level is very hard to study, with only a few known cases existing, e.g., (Maeso et al., 2013; Rebeiz et al., 2005; Royo et al., 2011). A distinctive feature of metazoan genomes is that a large proportion of the noncoding DNA is comprised of repetitive sequences occurring usually in more than ten copies. Repetitive DNA content ranges widely among metazoan genomes (Canapa et al., 2015), and is thus evolutionarily very dynamic. Unlike regulatory sequences, it is relatively easy to identify (due to its repetitiveness) and classify (most common element classes are shared across animals). Repeats are subdivided into simple repeats and transposable elements (TEs). While simple repeats do not encode for any self-replication ability, transposons are elements with such ability and thus impose certain threats to the genome by inserting themselves into regulatory or coding regions (Feschotte and Pritham, 2007). Metazoans thus have evolved elaborate mechanisms to control transposon activity and replication, including piwiassociated RNAs (piRNA) and long non-coding RNAs (lncRNA) (Morris and Mattick, 2014) that target transposon mRNA and force its degradation (Siomi et al., 2008). While (surviving) TE insertions are mostly neutral or slightly deleterious, they can contribute to a variety of innovations. For example, tandem repeats can facilitate gene duplications either through tandem duplication (Eichler et al., 1998) or retrotransposition (Ewing et al., 2013), insertion of transposons can contribute to the evolution of novel regulatory elements (de Souza et al., 6

2013; Del Rosario et al., 2014). Open reading frames of transposons can become incorporated into exons of other genes (Bejerano et al., 2006; Breitling and Gerber, 2000). Among all features, accumulation of repeats over millions of years in certain genomic regions (‘hotspots’) can result in recombination events, disrupting synteny, introducing new linkages (Rizzon et al., 2002), or lead to domain reshuffling (Feschotte and Pritham, 2007). Using consensus (Pace and Feschotte, 2007) or modeling approaches (Marchani et al., 2009), the high abundance of repeats enables us to put a date on specific genomic changes and to understand the time scale of changes. This reveals quite dynamic genomes (the halflife of transposon insertions is usually on the order of several, not hundreds, million of years (Chalopin et al., 2013; Chapman et al., 2010). The common method of using synonymous substitutions to date those events (both for genes and repeats) has a very limited time scale, usually not going beyond 150 million years (Liberles et al., 2001), therefore we cannot empirically infer the ancestral metazoan repeat complement. Variation in gene linkage (synteny) at both micro and macro levels While finding regulatory sequences shared across metazoans is currently facing methodological difficulties, we can use other genomic features, such as conservation of gene order (synteny) to infer co-regulated or functionally linked genes. We can distinguish between two types of conserved gene order that span different genomic scales. Microsynteny encompasses short-range (3-10 genes) linkages of genes (Figure 2), usually colinear across two or more genomes, sometimes consisting of paralogous gene clusters (Irimia et al., 2012; Simakov et al., 2013). The most famous examples include the Hox cluster (Duboule, 2007), ParaHox (Brooke et al., 1998), several Wnt ligand gene clusters (Nusse, 2001), and, most recently discovered, the pharyngeal gene cluster of deuterostomes (discussed below). While we infer many ‘new’ synteny blocks in the metazoan ancestor, this is heavily biased by the novel genes appearing at that node. When corrected for gene novelty, so far we find the greatest consolidation of micro-synteny at the metazoan to bilaterian transition (Simakov et al., 2013). Hundreds of these micro-syntenic linkages are found to be shared across modern day bilaterian animals. While there is some evidence for co-expression or co-regulation of such pairs, suggestive of shared regulatory sequences among genes of the cluster (Irimia et al., 2012; Simakov et al., 2013), we are still far from understanding the functional aspects of metazoan micro-synteny. Except for Hox and a few other cases, the functional aspects of the genomic clustering have been assumed but so far are not clearly demonstrated. Indeed, it may be that some linkages are neutrally preserved due to variations in the rate of genome rearrangement across the genome arising from differences in repetitive content, sensitivity to genome breakage, etc. Recent advances in chromatin capture assays such as HiC, 3C, and its derivatives (de Wit and de Laat, 2012) or other methods to assess transcriptionally active regions such as ATAC-seq (Buenrostro et al., 2013) have revealed the presence of distinct and conserved functional regions on the length scale of micro-syntenic blocks (100kb to megabase size). For example, for the Hox cluster, two contributing TADs (topologically associating domains) around HoxD were identified (Dixon et al., 2012). While TAD boundaries are usually conserved across some (mammalian) species and cell types (Pope et al., 2014), metazoan level analyses show some important regulatory transitions. For example, a recent study 7

showed the presence of a single TAD corresponding to the amphioxus Hox cluster (Acemel et al., 2016), suggesting a potential ancestral chordate state. The borders of TADs usually correlate with CCCTC binding sites (Dixon et al., 2012). CTCF (the CCCTC binding factor) has been shown to be conserved across bilaterians, with evidence so far lacking from nonbilaterian metazoans (Heger et al., 2012), thus presumably constituting a novel bilaterian gene family. While this may contribute to the increase in micro-synteny observed at the metazoan to bilaterian transition, similar zinc-finger type proteins may exist in non-bilaterian metazoans and could be simply awaiting their discovery. Compared to micro-synteny, macro-synteny describes much larger linkages, those on the scale of (partial) chromosomes (Figure 2) and including hundreds of genes and gene families. Those gene families are found co-clustered on the same contiguous piece of DNA (e.g., megabase-long scaffolds) across metazoans, through paralogous relationships they often link several chromosomal pieces within a single species. Metazoan and bilaterian ancestors are inferred to have had at least dozen ancestral linkage groups that are still retained in the modern animals (Nakatani et al., 2007; Putnam et al., 2008; Simakov et al., 2013; Smith and Keinath, 2015). This number significantly increased in vertebrates due to whole genome duplications (Nakatani et al., 2007; Putnam et al., 2008). Most, but not all, of the micro-syntenic linkages are located within those macro-syntenic linkage groups and while the majority of macro and micro linkages are linked and ‘co-inherited’ across metazoan transitions, there are some micro-linkages that change their macro-linkage group assignment (Simakov et al., 2013). As was the case for micro-synteny, the functional importance of macro-syntenic linkages remains to be investigated. Unlike micro-synteny, however, it relates to a much larger scale organization of metazoan genomes, concerning larger proportions of (if not whole) chromosome evolution.

Independence of genomic characters during metazoan evolution By assessing all of the different kinds of genome change, we can understand their dynamics not just within a single lineage but comparatively across all metazoans. Focusing on genomic characters that can be traced back to the metazoan or bilaterian ancestors, we observe their independent contribution to modern metazoan genomes, i.e., different lineages appear to evolve largely by modifying different genomic features (Figure 4). For example, considering the evolution of well defined and shared bilaterian characters after the bilaterian segregation it has been observed that the evolution of the model ecdysozoans Drosophila and C. elegans was accompanied by global gene loss (Petrov et al., 1996), intron loss in shared genes (Rogozin et al., 2003), and high rates of protein evolution (Britten, 1986). On the other hand, some lophotrochozoans, e.g., Helobdella (leech), have evolved by (micro and macro) syntenic reshuffling (losing major ancestral bilaterian micro- and macro-syntenic linkages) but are otherwise very stable in their gene content and rates of protein sequence evolution (Simakov et al., 2013). The genome of the limpet Lottia, on the other hand, is mostly dominated by repetitive element activity and has retained most of the mentioned ancestral bilaterian genomic features. We can also observe similar trends at the origin of metazoans. The transition to multicellularity was heavily influenced by type 1 gene novelties (Tautz and Domazet-Loso, 2011) and the origin of the main macro-syntenic linkage groups (Putnam et al., 2007), while the bilaterian transition appears to have been associated more with paralog formation and micro-syntenic evolution (Simakov et al., 2013). While this

8

observation can be affected by species sampling, it does correlate with some biological observations as outlined in the previous sections. There are several mechanistic explanations for such observations. Reduction in genome size and loss of gene complement may be associated with the evolution of fast development and specific ecological adaptation (Holland, 2016), thus it could be a result of positive selection for a specific genomic innovation. Another, more probabilistic explanation, can be derived from the proposed effect of the effective population size onto the prevalence of selection or drift regimes (Lynch, 2007; Lynch and Conery, 2003; Yi and Streelman, 2005) and their effect on genome sizes. Higher selective pressure regime in large population sizes (e.g., in insects) would favor smaller genome sizes and any, even slightly, disadvantageous mutations (e.g., transposon insertions) would be wiped out quickly in the next generation (Lynch and Conery, 2003). In the species with small population sizes, such mutations can accumulate resulting in, e.g., increased genome size, retention of duplicated genes, repeats, potentially contributing to syntenic reshuffling. For example, the genome of a cephalopod Octopus bimaculoides (Albertin et al., 2015) revealed very stable gene content with a few gene families duplicating, but a significant contribution from transposable elements to the genome size, correlated with syntenic rearrangements and the scattering of most of the ancestral bilaterian micro-syntenic clusters, including the Hox genes. However, the effective population size scenario does not account for phylogenetic history of the species (Whitney and Garland, 2010). For example, it is not clear how long term changes in the effective population sizes (e.g., in cephalopods (Kroger et al., 2011)) over millions of years would affect the outcome. Whether such genome dynamics actually help drive organismal evolution is not completely understood. So far, our genomic comparisons suggest that independence of genomic features in the different metazoan lineages could allow animals to utilize different aspects of the genomic information to evolve complexity. In particular it can contribute to the proposed recurrence in evolution and explain different gain and loss trajectories of the different features across the phylogenetic tree (Maeso et al., 2012). In some sense, the notion of independence is already methodologically utilized in phylogenetic inferences, due to the use of different genomic characters to obtain independent verification of tree topologies (Rokas and Holland, 2000). Here we touch only on a very coarse classification of such genomic features. More modes of genomic evolution exist within each of those categories, and certainly multiple evolutionary modalities can occur together in varying inter-dependent combinations. While we can trace losses of ancestral metazoan or bilaterian genomic characters, gains of novel genes or linkages is strongly affect by taxon sampling within the relevant clades. Further studies of evolutionary forces acting on genomic material and organization over millions of years will be valuable in finding all dependent and independent components across metazoans as well as putting those observations into a more clear biological context. Until then, these observations would also suggest caution when using general descriptors such as ‘slow’ or ‘fast’ when referring to species level evolution and evo-devo inferences.

Insights into specific ancient organismal transitions: the pharyngeal gene cluster and deuterostome origins 9

Regulatory evolution at the metazoan level has so far been largely an uncharted territory. We suggest that novel micro-syntenic linkages of both novel and known genes should provide a fruitful area for future functional and comparative evo-devo studies. Only through expression and, eventually, functional studies, will it be possible to determine their evolutionary role. One recent promising example is the so-called pharyngeal cluster - a micro-syntenic gene cluster preserved in one piece so far only in deuterostome animals. A core of the cluster is constructed from four transcription factor genes (nk2.1, nk2.2, pax1/9, foxA1/2) and two bystander genes (slc25A21 and mipol1). Additionally one transcription factor gene (msxlx) and one by-stander gene (cngal) are found on the cluster in a subset of deuterostome (amphioxus and hemichordates). While the linkage of some pairs of theses genes (nk2.1 nk2.2 - msxlx, mipol1 - foxA) are found in protostome species (Irimia et al., 2012), the clustering of all six core-genes can be so far only identified in deuterostomes (Simakov et al., 2015). The name for this syntenic cluster is derived from the prominent expression of its transcription factor genes in the deuterostome foregut, in particular the pharyngeal endoderm (Figure 5), suggesting a deuterostome synapomorphy (Fritzenwanker et al., 2014; Gillis et al., 2012; Ogasawara et al., 1999). The gene expression pattern is additionally diversified in vertebrate neural organs, due to gene duplication and subsequent subfunctionalization (Figure 5). Comparative genomics analyses revealed conserved non-coding elements in and around the genes in vertebrates and amphioxus (Wang et al., 2007), which could be confirmed for hemichordates (Simakov et al., 2015). Available data for human reveals the presence of at least two topologically associating domains encompassing nk2.1, nk2.2, pax1/9 as well as the mipol1 - foxA region (unpublished observation based on HiC and TAD boundary data available at www.genomegitar.org, Figure 5, colored bar). While interactions among genes belonging to several TADs are possible, similar to the Hox cluster (Lonfat and Duboule, 2015), they are less likely according to the current models (Smith et al., 2016), suggesting some degree of modularity. Evidence in vertebrates so far suggests that at least nk2.1 is activated by foxA (Boggaram, 2009). Thus, functional genomic studies on hemichordate chromosomal conformations and the development of this model system would provide invaluable insight into the possible ancestral state of the regulation of this genomic cluster. The cluster is also of special interest because the gill-pore bearing pharynx is thought to be a morphological and functional innovation (efficient suspension and filter feeding) in the deuterostome stem lineage (Frisdal and Trainor, 2014; Graham and Richardson, 2012). The pharyngeal endoderm in chordates is responsible for the formation of a muco-ciliary organ, the endostyle, involved in food particle trapping and uptake (Barrington, 1937). In addition to gene expression and morphology (Gee, 2007; Ogasawara, 2000; Romer and Parsons, 1977), functional studies have determined it to be the homolog of the vertebrate thyroid gland (Paris et al., 2008), also derived from the pharyngeal endoderm. While no clear homolog has yet been identified in Saccoglossus, the expression of the pharyngeal gene cluster in the developing foregut provides evidence for its existence in ancient deuterostomes. Mucus production is an integral functional part of the food particle trapping, and it is thus relevant to note the expansions of the families of mucin-related genes in hemichordate genomes, as well as their expression around the pharynx and proboscis epidermis involved in digging behavior and food suspension (Simakov et al., 2015). 10

The available expression data to support pharyngeal cluster’s role in gill-pore formation is still very patchy. For example, additional data from the starfish Acanthaster planci (that lacks gill pores) in comparison to hemichordates is required. The cluster may also hint at a crucial link between deuterostome and protostome foregut formation. E.g., in the invertebrate deuterostomes, namely amphioxus, hemichordates, and echinoderms, the cluster harbors the msxlx homeobox gene (Butts et al., 2010). While absent in vertebrates, the gene is found closely linked to nk2.1 and nk2.2 in the pharyngeal cluster. Its expression in the pharyngeal endoderm (left anterior gut diverticulum in amphioxus) (Butts et al., 2010) suggests a key role in early foregut formation. The presence of msxlx in numerous protostome genomes and its known linkage to nk2.2 in Lottia suggests a more ancestral function of this potential pharyngeal cluster precursor. So far, msxlx expression in protostomes is unclear. The study of this gene may thus provide a crucial link between protostome and deuterostome foregut developmental networks.

The value of continued metazoan genomic sampling As discussed in this review, our understanding of the ancestral metazoan genome has profited immensely from the increased species sampling (Figure 1) during recent years. Despite such exciting results, it is important to understand the limits of the current models to infer ancestral metazoan characters. Simple parsimonious presence of the ‘same’ (defined by some levels of sequence similarity) gene does not exclude the scenario that the observed locus has actually undergone several rounds of duplications, resulting in the loss of the original copy. The easiest way to correct this is to increase the phylogenetic sampling. From that perspective, we are still just beginning to understand metazoan evolution. In addition to increased within-clade sampling, we are looking forward to decoding the genomes of two groups of bilaterian metazoans for which there is still very limited genomic data: within ecdysozoans the groups of Nematomorpha, Loricifera, Kinorhyncha and Onychophora (Highlighted area A in Figure 1), and within lophotrochozoans the groups of Cycliophora, Phoronida, Bryozoa, Entoprocta, Acanthocephala, Gastrotricha, Gnathostomulida, Nemertea and Rhombozoa (Highlighted area B in Figure 1), as well as several enigmatic clades of still debated relatedness to protostomes and deuterostomes including Xenoturbella and Chaetognatha (latest, (Cannon et al., 2016)). This data will help to (1) examine any signal for saturation in the number of the discovered genomic features and (2) develop more reliable methods for empirical quantification of evolutionary dynamics. For example, for ancestral metazoan gene families, our current genomic sampling of a couple of dozen genomes provides results concerning the approximate number of inferred ancestral metazoan gene families. These results could be reproduced across several studies over the past years. However, for other genomic features that show higher turnover rates (such as the formation of gene duplicates or repeat element dynamics), we probably have not yet achieved a satisfactory understanding of their evolution. Thus, understanding within-clade variation, as has been accomplished for some species, e.g., Drosophila (Clark et al., 2007), would provide us with crucial quantitative information on the tempo of evolutionary change across all of the genomic features. In the future, it will allow us to create more sophisticated models of evolution that would allow us a glimpse into much older genomic configurations than currently accessible with the usual methods (i.e., relying on parsimony or synonymous substitutions). 11

Conclusions Increased genomic sampling across metazoans has helped reveal conserved and changing features of genomic sequence during animal evolution, uncovering differential utilization (independence) of several genomic features across animal clades. While so far investigated mostly at the gene level, other genomic features such as noncoding elements and gene linkage function are gaining in attention, aided by new genomic data and novel experimental approaches. Combined with other datasets (e.g., expression data) this information will be helpful in assessing the functional importance of those features during development and cell type or tissue patterning. Metazoan comparative genomics will profit significantly from the incorporation of functional genomic data, such as methylation profiling, gene pausing, RNA editing, and chromosomal conformational studies. The goal of comparative genomics at this stage is shifting from the discovery of evolving genomic features to providing models and inferring the evolutionary trajectories of those features. Such information will prove indispensable in the reconstruction of the ancient metazoan developmental, functional, and morphological traits.

Acknowledgements We thank John Gerhart, Daniel Rokhsar and two anonymous reviewers for their invaluable comments during the writing of this manuscript. We apologize to the authors whose work we had to omit due to space constraints of the review article.

12

Figure Legends Figure 1. History of published metazoan genomes categorized at the phylum level. The left two columns show the phylum level taxonomy of metazoan and the rough number of decoded genomes. The right four columns show the history of published metazoan genomes, with a citation of the first published paper for a genome for each phylum. The correspondence between the left two columns and right four columns is shown by the connected lines. The number of available genomes (at NCBI) is clearly biased toward Chordata (291), Nematoda (81), Arthropoda (239) and Platyhelminthes (29). On the other hand, the genomic data for two phylum groups symbolized here by “A” and “B” highlighted by dark gray are still very sparse. References for each paper in the figure as follows: C.elegans (C. elegans Sequencing Consortium, 1998), D. melanogaster (Adams et al., 2000), H. sapiens (Lander et al., 2001), S. purpuratus (Sodergren et al., 2006), N. vectensis (Putnam et al., 2007), T. adhaerens (Srivastava et al., 2008), S. mansoni (Berriman et al., 2009), A. queenslandica (Srivastava et al., 2010), L. gigantea, C. teleta and H. robusta (Simakov et al., 2013), A. vaga (Flot et al., 2013), P. bachei (Moroz et al., 2014), M. leidyi (Ryan et al., 2013), L. anatina (Luo et al., 2015), H. dujardini (Boothby et al., 2015), S. kowalevskii and P. flava (Simakov et al., 2015) and I. linei (Mikhailov et al., 2016). Figure 2. Evolving genomic features that are most studied with metazoan comparative genomics tools, and their physical scales. Genomic features subdivided into four categories: gene level, coding (large rectangles) and non-coding (yellow boxes) element copy number, micro-syntenic, and macro-syntenic innovation. Their chromosomal scale is depicted on the left. At gene level, colors correspond to different domains, at other levels, colors correspond to different orthologous groups; small ovals denote regulatory elements that are hard to identify at the metazoan scale with comparative genomics tools. Figure 3. Gene novelty distribution across the Wnt pathway at the metazoan level. See text and Table 2 for the description of the different novelty types. Canonical Wnt signaling pathway inhibits glycogen-synthase-kinase 3b and the degradation of beta-catenin. While intracellular pathway components can be traced back to unicellular ancestors, the Wnt ligands appear in metazoan lineages. After (Holstein, 2012). Figure 4. Independent evolution of genomic characters in metazoans. Qualitative interpretation of genome-wide trends of evolutionary gains (boxes) and losses (crossed through boxes) across the four feature categories from panel Figure 2. Absence of such genome-wide trends is shown as grey boxes. For modern-day species, the dynamics reflect trends in loss or gain relative to the last common bilaterian ancestor. See text for more details. Based on data and analyses from (Albertin et al., 2015; Chapman et al., 2010; Putnam et al., 2008; Simakov et al., 2015; Simakov et al., 2013). Figure 5. The deuterostome pharyngeal gene cluster and gene expression. The left side, vertically, shows the cladogram of the metazoan lineages, on which proposed events of pharyngeal cluster evolution have been mapped. The right side, horizontally, shows the gene order and micro-syntenic linkage of the pharyngeal cluster for each taxonomic group. Human locus additionally highlights the location of at least two TADs (red and yellow color), according to the datasets available in genomegitar.org . The schematic view of the spatial expression pattern of each gene is shown under the respective gene-arrow. Missing pieces 13

of expression data are shown as question mark (“?”). Five transcription factor-developmental genes (nkx2.1, nkx2.2, pax1/9, foxA1/2, and msxlx) in the cluster are shown as blue filled arrows. The msxlx homeobox gene which is lost from the cluster in vertebrates is shown as blue edged arrow. Three by-stander genes (cngal, slc25A21 and mipol1) are shown as red filled arrows distinguished by the letters C, S and M, respectively. Of those, six core-genes (nkx2.1, nkx2.2, pax1/9, foxA1/2, slc25A21, and mipol1) are filled by their color. The spatial embryonic region of each gene expression was filled by the following colors: blastopore and homologous region as red, pharyngeal or oral region as green, neuronal region as purple, others as blue. Gene expression data are obtained from the following papers: nkx2.1 (M. musculus) from (Camus et al., 2006), nkx2.2 (M. musculus) from (Shimamura et al., 1995), pax1 (M. musculus) from (Wallin et al., 1996), foxA2 (M. muscululs) from (Sasaki and Hogan, 1993), nkx2.1 (X. laevis) from (Pollet et al., 2000), nkx2.2 (X. laevis) from (Knecht and Harland, 1997), pax1 (X. laevis) from (Pohl and Knochel, 2005), nkx2.1 (B. floridae) from (Venkatesh et al., 1999), nk2.2 (B. floridae) from (Holland et al., 1998), msxlx (B. floridae) from (Butts et al., 2010), pax1/9 (B. floridae) from (Holland and Holland, 1996), foxA1/2 (B. floridae) from (Shimeld, 1997), nkx2.1, nkx2.2, pax1/9, and foxA1/2 (S. kowalevskii) from (Simakov et al., 2015), nkx2.1 (S. purpuratus) from (Takacs et al., 2004), nkx2.2 (S. purpuratus) from (Chen et al., 2011), from foxA1/2 (S. purpuratus) from (Oliveri et al., 2006), nk2.1 and nk2.2 (P. dumerilii) from (Denes et al., 2007; Tessmar-Raible et al., 2007), pax1/9 (T. transversa) from (Passamaneck et al., 2015), foxA (C. variopedatus) from (Boyle and Seaver, 2010), foxA (D. japonica) from (Koinuma et al., 2000), nkx2 and msx3 (A. millepora) from (de Jong et al., 2006), paxD (N. vectensis) from (Matus et al., 2007) and foxA (N. vectensis) from (Martindale et al., 2004).

14

References Acemel, R.D., Tena, J.J., Irastorza-Azcarate, I., Marletaz, F., Gomez-Marin, C., de la CalleMustienes, E., Bertrand, S., Diaz, S.G., Aldea, D., Aury, J.M., Mangenot, S., Holland, P.W., Devos, D.P., Maeso, I., Escriva, H., Gomez-Skarmeta, J.L., 2016. A single threedimensional chromatin compartment in amphioxus indicates a stepwise evolution of vertebrate Hox bimodal regulation. Nat Genet 48, 336-341. Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Scherer, S.E., Li, P.W., Hoskins, R.A., Galle, R.F., George, R.A., Lewis, S.E., Richards, S., Ashburner, M., Henderson, S.N., Sutton, G.G., Wortman, J.R., Yandell, M.D., Zhang, Q., Chen, L.X., Brandon, R.C., Rogers, Y.H., Blazej, R.G., Champe, M., Pfeiffer, B.D., Wan, K.H., Doyle, C., Baxter, E.G., Helt, G., Nelson, C.R., Gabor, G.L., Abril, J.F., Agbayani, A., An, H.J., Andrews-Pfannkoch, C., Baldwin, D., Ballew, R.M., Basu, A., Baxendale, J., Bayraktaroglu, L., Beasley, E.M., Beeson, K.Y., Benos, P.V., Berman, B.P., Bhandari, D., Bolshakov, S., Borkova, D., Botchan, M.R., Bouck, J., Brokstein, P., Brottier, P., Burtis, K.C., Busam, D.A., Butler, H., Cadieu, E., Center, A., Chandra, I., Cherry, J.M., Cawley, S., Dahlke, C., Davenport, L.B., Davies, P., de Pablos, B., Delcher, A., Deng, Z., Mays, A.D., Dew, I., Dietz, S.M., Dodson, K., Doup, L.E., Downes, M., Dugan-Rocha, S., Dunkov, B.C., Dunn, P., Durbin, K.J., Evangelista, C.C., Ferraz, C., Ferriera, S., Fleischmann, W., Fosler, C., Gabrielian, A.E., Garg, N.S., Gelbart, W.M., Glasser, K., Glodek, A., Gong, F., Gorrell, J.H., Gu, Z., Guan, P., Harris, M., Harris, N.L., Harvey, D., Heiman, T.J., Hernandez, J.R., Houck, J., Hostin, D., Houston, K.A., Howland, T.J., Wei, M.H., Ibegwam, C., Jalali, M., Kalush, F., Karpen, G.H., Ke, Z., Kennison, J.A., Ketchum, K.A., Kimmel, B.E., Kodira, C.D., Kraft, C., Kravitz, S., Kulp, D., Lai, Z., Lasko, P., Lei, Y., Levitsky, A.A., Li, J., Li, Z., Liang, Y., Lin, X., Liu, X., Mattei, B., McIntosh, T.C., McLeod, M.P., McPherson, D., Merkulov, G., Milshina, N.V., Mobarry, C., Morris, J., Moshrefi, A., Mount, S.M., Moy, M., Murphy, B., Murphy, L., Muzny, D.M., Nelson, D.L., Nelson, D.R., Nelson, K.A., Nixon, K., Nusskern, D.R., Pacleb, J.M., Palazzolo, M., Pittman, G.S., Pan, S., Pollard, J., Puri, V., Reese, M.G., Reinert, K., Remington, K., Saunders, R.D., Scheeler, F., Shen, H., Shue, B.C., SidenKiamos, I., Simpson, M., Skupski, M.P., Smith, T., Spier, E., Spradling, A.C., Stapleton, M., Strong, R., Sun, E., Svirskas, R., Tector, C., Turner, R., Venter, E., Wang, A.H., Wang, X., Wang, Z.Y., Wassarman, D.A., Weinstock, G.M., Weissenbach, J., Williams, S.M., WoodageT, Worley, K.C., Wu, D., Yang, S., Yao, Q.A., Ye, J., Yeh, R.F., Zaveri, J.S., Zhan, M., Zhang, G., Zhao, Q., Zheng, L., Zheng, X.H., Zhong, F.N., Zhong, W., Zhou, X., Zhu, S., Zhu, X., Smith, H.O., Gibbs, R.A., Myers, E.W., Rubin, G.M., Venter, J.C., 2000. The genome sequence of Drosophila melanogaster. Science 287, 2185-2195. Agulnik, S.I., Bollag, R.J., Silver, L.M., 1995. Conservation of the T-box gene family from Mus musculus to Caenorhabditis elegans. Genomics 25, 214-219. Albalat, R., Canestro, C., 2016. Evolution by gene loss. Nat Rev Genet 17, 379-391. Albertin, C.B., Simakov, O., Mitros, T., Wang, Z.Y., Pungor, J.R., Edsinger-Gonzales, E., Brenner, S., Ragsdale, C.W., Rokhsar, D.S., 2015. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature 524, 220-224. Andouche, A., Bassaglia, Y., Baratte, S., Bonnaud, L., 2013. Reflectin genes and development of iridophore patterns in Sepia officinalis embryos (Mollusca, Cephalopoda). Dev Dyn 242, 560-571. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., Gelpke, M.D., Roach, J., Oh, T., Ho, I.Y., Wong, M., Detter, C., Verhoef, F., Predki, P., Tay, A., Lucas, S., Richardson, P., Smith, S.F., Clark, M.S., Edwards, Y.J., Doggett, N., Zharkikh, A., Tavtigian, S.V., Pruss, D., Barnstead, M., Evans, C., Baden, H., Powell, J., Glusman, G., Rowen, L., Hood, L., Tan, Y.H., Elgar, G., Hawkins, T., Venkatesh, B., Rokhsar, D., Brenner, S., 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 1301-1310. Ast, G., 2004. How did alternative splicing evolve? Nature Reviews Genetics 5, 773-782.

15

Azumi, K., De Santis, R., De Tomaso, A., Rigoutsos, I., Yoshizaki, F., Pinto, M.R., Marino, R., Shida, K., Ikeda, M., Ikeda, M., Arai, M., Inoue, Y., Shimizu, T., Satoh, N., Rokhsar, D.S., Du Pasquier, L., Kasahara, M., Satake, M., Nonaka, M., 2003. Genomic analysis of immunity in a Urochordate and the emergence of the vertebrate immune system: "waiting for Godot". Immunogenetics 55, 570-581. Babonis, L.S., Martindale, M.Q., Ryan, J.F., 2016. Do novel genes drive morphological novelty? An investigation of the nematosomes in the sea anemone Nematostella vectensis. BMC evolutionary biology 16, 1. Balasubramanian, P.G., Beckmann, A., Warnken, U., Schnolzer, M., Schuler, A., BornbergBauer, E., Holstein, T.W., Ozbek, S., 2012. Proteome of Hydra nematocyst. J Biol Chem 287, 9672-9681. Barrington, E.J.W., 1937. The digestive system of Amphioxus (Branchiostoma) lanceolatus. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 228, 269-311. Bejerano, G., Lowe, C.B., Ahituv, N., King, B., Siepel, A., Salama, S.R., Rubin, E.M., Kent, W.J., Haussler, D., 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441, 87-90. Berriman, M., Haas, B.J., LoVerde, P.T., Wilson, R.A., Dillon, G.P., Cerqueira, G.C., Mashiyama, S.T., Al-Lazikani, B., Andrade, L.F., Ashton, P.D., Aslett, M.A., Bartholomeu, D.C., Blandin, G., Caffrey, C.R., Coghlan, A., Coulson, R., Day, T.A., Delcher, A., DeMarco, R., Djikeng, A., Eyre, T., Gamble, J.A., Ghedin, E., Gu, Y., Hertz-Fowler, C., Hirai, H., Hirai, Y., Houston, R., Ivens, A., Johnston, D.A., Lacerda, D., Macedo, C.D., McVeigh, P., Ning, Z., Oliveira, G., Overington, J.P., Parkhill, J., Pertea, M., Pierce, R.J., Protasio, A.V., Quail, M.A., Rajandream, M.A., Rogers, J., Sajid, M., Salzberg, S.L., Stanke, M., Tivey, A.R., White, O., Williams, D.L., Wortman, J., Wu, W., Zamanian, M., Zerlotini, A., Fraser-Liggett, C.M., Barrell, B.G., El-Sayed, N.M., 2009. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352-358. Boggaram, V., 2009. Thyroid transcription factor-1 (TTF-1/Nkx2.1/TITF1) gene regulation in the lung. Clin Sci (Lond) 116, 27-35. Boothby, T.C., Tenlen, J.R., Smith, F.W., Wang, J.R., Patanella, K.A., Nishimura, E.O., Tintori, S.C., Li, Q., Jones, C.D., Yandell, M., Messina, D.N., Glasscock, J., Goldstein, B., 2015. Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proc Natl Acad Sci U S A 112, 15976-15981. Boyle, M.J., Seaver, E.C., 2010. Expression of FoxA and GATA transcription factors correlates with regionalized gut development in two lophotrochozoan marine worms: Chaetopterus (Annelida) and Themiste lageniformis (Sipuncula). Evodevo 1, 2. Braasch, I., Schartl, M., Volff, J.N., 2007. Evolution of pigment synthesis pathways by gene and genome duplication in fish. BMC Evol Biol 7, 74. Breitling, R., Gerber, J.-K., 2000. Origin of the paired domain. Development Genes and Evolution 210, 644-650. Britten, R.J., 1986. Rates of DNA sequence evolution differ between taxonomic groups. Science 231, 1393-1398. Brooke, N.M., Garcia-Fernandez, J., Holland, P.W., 1998. The ParaHox gene cluster is an evolutionary sister of the Hox gene cluster. Nature 392, 920-922. Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y., Greenleaf, W.J., 2013. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10, 1213-1218. Butts, T., Holland, P.W., Ferrier, D.E., 2010. Ancient homeobox gene loss and the evolution of chordate brain and pharynx development: deductions from amphioxus gene expression. Proc Biol Sci 277, 3381-3389. C. elegans Sequencing Consortium, 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012-2018. Camus, A., Perea-Gomez, A., Moreau, A., Collignon, J., 2006. Absence of Nodal signaling promotes precocious neural differentiation in the mouse embryo. Dev Biol 295, 743-755.

16

Canapa, A., Barucca, M., Biscotti, M.A., Forconi, M., Olmo, E., 2015. Transposons, Genome Size, and Evolutionary Insights in Animals. Cytogenet Genome Res 147, 217-239. Cannon, J.T., Vellutini, B.C., Smith, J., 3rd, Ronquist, F., Jondelius, U., Hejnol, A., 2016. Xenacoelomorpha is the sister group to Nephrozoa. Nature 530, 89-93. Carroll, S.B., Grenier, J.K., Weatherbee, S.D., 2013. From DNA to diversity: molecular genetics and the evolution of animal design. John Wiley & Sons. Chalopin, D., Fan, S., Simakov, O., Meyer, A., Schartl, M., Volff, J.N., 2013. Evolutionary active transposable elements in the genome of the coelacanth. J Exp Zool B Mol Dev Evol. Chapman, J.A., Kirkness, E.F., Simakov, O., Hampson, S.E., Mitros, T., Weinmaier, T., Rattei, T., Balasubramanian, P.G., Borman, J., Busam, D., Disbennett, K., Pfannkoch, C., Sumin, N., Sutton, G.G., Viswanathan, L.D., Walenz, B., Goodstein, D.M., Hellsten, U., Kawashima, T., Prochnik, S.E., Putnam, N.H., Shu, S., Blumberg, B., Dana, C.E., Gee, L., Kibler, D.F., Law, L., Lindgens, D., Martinez, D.E., Peng, J., Wigge, P.A., Bertulat, B., Guder, C., Nakamura, Y., Ozbek, S., Watanabe, H., Khalturin, K., Hemmrich, G., Franke, A., Augustin, R., Fraune, S., Hayakawa, E., Hayakawa, S., Hirose, M., Hwang, J.S., Ikeo, K., Nishimiya-Fujisawa, C., Ogura, A., Takahashi, T., Steinmetz, P.R., Zhang, X., Aufschnaiter, R., Eder, M.K., Gorny, A.K., Salvenmoser, W., Heimberg, A.M., Wheeler, B.M., Peterson, K.J., Bottger, A., Tischler, P., Wolf, A., Gojobori, T., Remington, K.A., Strausberg, R.L., Venter, J.C., Technau, U., Hobmayer, B., Bosch, T.C., Holstein, T.W., Fujisawa, T., Bode, H.R., David, C.N., Rokhsar, D.S., Steele, R.E., 2010. The dynamic genome of Hydra. Nature 464, 592-596. Chen, J.H., Luo, Y.J., Su, Y.H., 2011. The dynamic gene expression patterns of transcription factors constituting the sea urchin aboral ectoderm gene regulatory network. Dev Dyn 240, 250-260. Chen, S., Zhang, Y.E., Long, M., 2010. New genes in Drosophila quickly become essential. Science 330, 1682-1685. Cho, S.J., Valles, Y., Giani, V.C., Jr., Seaver, E.C., Weisblat, D.A., 2010. Evolutionary dynamics of the wnt gene family: a lophotrochozoan perspective. Mol Biol Evol 27, 16451658. Chothia, C., Gough, J., Vogel, C., Teichmann, S.A., 2003. Evolution of the protein repertoire. Science 300, 1701-1703. Clark, A.G., Eisen, M.B., Smith, D.R., Bergman, C.M., Oliver, B., Markow, T.A., Kaufman, T.C., Kellis, M., Gelbart, W., Iyer, V.N., Pollard, D.A., Sackton, T.B., Larracuente, A.M., Singh, N.D., Abad, J.P., Abt, D.N., Adryan, B., Aguade, M., Akashi, H., Anderson, W.W., Aquadro, C.F., Ardell, D.H., Arguello, R., Artieri, C.G., Barbash, D.A., Barker, D., Barsanti, P., Batterham, P., Batzoglou, S., Begun, D., Bhutkar, A., Blanco, E., Bosak, S.A., Bradley, R.K., Brand, A.D., Brent, M.R., Brooks, A.N., Brown, R.H., Butlin, R.K., Caggese, C., Calvi, B.R., Bernardo de Carvalho, A., Caspi, A., Castrezana, S., Celniker, S.E., Chang, J.L., Chapple, C., Chatterji, S., Chinwalla, A., Civetta, A., Clifton, S.W., Comeron, J.M., Costello, J.C., Coyne, J.A., Daub, J., David, R.G., Delcher, A.L., Delehaunty, K., Do, C.B., Ebling, H., Edwards, K., Eickbush, T., Evans, J.D., Filipski, A., Findeiss, S., Freyhult, E., Fulton, L., Fulton, R., Garcia, A.C., Gardiner, A., Garfield, D.A., Garvin, B.E., Gibson, G., Gilbert, D., Gnerre, S., Godfrey, J., Good, R., Gotea, V., Gravely, B., Greenberg, A.J., Griffiths-Jones, S., Gross, S., Guigo, R., Gustafson, E.A., Haerty, W., Hahn, M.W., Halligan, D.L., Halpern, A.L., Halter, G.M., Han, M.V., Heger, A., Hillier, L., Hinrichs, A.S., Holmes, I., Hoskins, R.A., Hubisz, M.J., Hultmark, D., Huntley, M.A., Jaffe, D.B., Jagadeeshan, S., Jeck, W.R., Johnson, J., Jones, C.D., Jordan, W.C., Karpen, G.H., Kataoka, E., Keightley, P.D., Kheradpour, P., Kirkness, E.F., Koerich, L.B., Kristiansen, K., Kudrna, D., Kulathinal, R.J., Kumar, S., Kwok, R., Lander, E., Langley, C.H., Lapoint, R., Lazzaro, B.P., Lee, S.J., Levesque, L., Li, R., Lin, C.F., Lin, M.F., Lindblad-Toh, K., Llopart, A., Long, M., Low, L., Lozovsky, E., Lu, J., Luo, M., Machado, C.A., Makalowski, W., Marzo, M., Matsuda, M., Matzkin, L., McAllister, B., McBride, C.S., McKernan, B., McKernan, K., Mendez-Lago, M., Minx, P., Mollenhauer, M.U., Montooth, K., Mount, S.M., Mu, X., Myers, E., Negre, B., Newfeld, S., Nielsen, R., Noor, M.A., O'Grady, P., Pachter, L., Papaceit, M., Parisi, M.J., Parisi, M., Parts, L., Pedersen, J.S., Pesole, G., Phillippy, A.M., Ponting, C.P., Pop, M., 17

Porcelli, D., Powell, J.R., Prohaska, S., Pruitt, K., Puig, M., Quesneville, H., Ram, K.R., Rand, D., Rasmussen, M.D., Reed, L.K., Reenan, R., Reily, A., Remington, K.A., Rieger, T.T., Ritchie, M.G., Robin, C., Rogers, Y.H., Rohde, C., Rozas, J., Rubenfield, M.J., Ruiz, A., Russo, S., Salzberg, S.L., Sanchez-Gracia, A., Saranga, D.J., Sato, H., Schaeffer, S.W., Schatz, M.C., Schlenke, T., Schwartz, R., Segarra, C., Singh, R.S., Sirot, L., Sirota, M., Sisneros, N.B., Smith, C.D., Smith, T.F., Spieth, J., Stage, D.E., Stark, A., Stephan, W., Strausberg, R.L., Strempel, S., Sturgill, D., Sutton, G., Sutton, G.G., Tao, W., Teichmann, S., Tobari, Y.N., Tomimura, Y., Tsolas, J.M., Valente, V.L., Venter, E., Venter, J.C., Vicario, S., Vieira, F.G., Vilella, A.J., Villasante, A., Walenz, B., Wang, J., Wasserman, M., Watts, T., Wilson, D., Wilson, R.K., Wing, R.A., Wolfner, M.F., Wong, A., Wong, G.K., Wu, C.I., Wu, G., Yamamoto, D., Yang, H.P., Yang, S.P., Yorke, J.A., Yoshida, K., Zdobnov, E., Zhang, P., Zhang, Y., Zimin, A.V., Baldwin, J., Abdouelleil, A., Abdulkadir, J., Abebe, A., Abera, B., Abreu, J., Acer, S.C., Aftuck, L., Alexander, A., An, P., Anderson, E., Anderson, S., Arachi, H., Azer, M., Bachantsang, P., Barry, A., Bayul, T., Berlin, A., Bessette, D., Bloom, T., Blye, J., Boguslavskiy, L., Bonnet, C., Boukhgalter, B., Bourzgui, I., Brown, A., Cahill, P., Channer, S., Cheshatsang, Y., Chuda, L., Citroen, M., Collymore, A., Cooke, P., Costello, M., D'Aco, K., Daza, R., De Haan, G., DeGray, S., DeMaso, C., Dhargay, N., Dooley, K., Dooley, E., Doricent, M., Dorje, P., Dorjee, K., Dupes, A., Elong, R., Falk, J., Farina, A., Faro, S., Ferguson, D., Fisher, S., Foley, C.D., Franke, A., Friedrich, D., Gadbois, L., Gearin, G., Gearin, C.R., Giannoukos, G., Goode, T., Graham, J., Grandbois, E., Grewal, S., Gyaltsen, K., Hafez, N., Hagos, B., Hall, J., Henson, C., Hollinger, A., Honan, T., Huard, M.D., Hughes, L., Hurhula, B., Husby, M.E., Kamat, A., Kanga, B., Kashin, S., Khazanovich, D., Kisner, P., Lance, K., Lara, M., Lee, W., Lennon, N., Letendre, F., LeVine, R., Lipovsky, A., Liu, X., Liu, J., Liu, S., Lokyitsang, T., Lokyitsang, Y., Lubonja, R., Lui, A., MacDonald, P., Magnisalis, V., Maru, K., Matthews, C., McCusker, W., McDonough, S., Mehta, T., Meldrim, J., Meneus, L., Mihai, O., Mihalev, A., Mihova, T., Mittelman, R., Mlenga, V., Montmayeur, A., Mulrain, L., Navidi, A., Naylor, J., Negash, T., Nguyen, T., Nguyen, N., Nicol, R., Norbu, C., Norbu, N., Novod, N., O'Neill, B., Osman, S., Markiewicz, E., Oyono, O.L., Patti, C., Phunkhang, P., Pierre, F., Priest, M., Raghuraman, S., Rege, F., Reyes, R., Rise, C., Rogov, P., Ross, K., Ryan, E., Settipalli, S., Shea, T., Sherpa, N., Shi, L., Shih, D., Sparrow, T., Spaulding, J., Stalker, J., Stange-Thomann, N., Stavropoulos, S., Stone, C., Strader, C., Tesfaye, S., Thomson, T., Thoulutsang, Y., Thoulutsang, D., Topham, K., Topping, I., Tsamla, T., Vassiliev, H., Vo, A., Wangchuk, T., Wangdi, T., Weiand, M., Wilkinson, J., Wilson, A., Yadav, S., Young, G., Yu, Q., Zembek, L., Zhong, D., Zimmer, A., Zwirko, Z., Alvarez, P., Brockman, W., Butler, J., Chin, C., Grabherr, M., Kleber, M., Mauceli, E., MacCallum, I., 2007. Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203-218. Conant, G.C., Wolfe, K.H., 2008. Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet 9, 938-950. Cresko, W.A., Yan, Y.L., Baltrus, D.A., Amores, A., Singer, A., Rodriguez-Mari, A., Postlethwait, J.H., 2003. Genome duplication, subfunction partitioning, and lineage divergence: Sox9 in stickleback and zebrafish. Dev Dyn 228, 480-489. Crow, K.D., Wagner, G.P., 2006. What is the role of genome duplication in the evolution of complexity and diversity? Molecular Biology and Evolution 23, 887-892. Davidson, E.H., Erwin, D.H., 2006. Gene regulatory networks and the evolution of animal body plans. Science 311, 796-800. de Jong, D.M., Hislop, N.R., Hayward, D.C., Reece-Hoyes, J.S., Pontynen, P.C., Ball, E.E., Miller, D.J., 2006. Components of both major axial patterning systems of the Bilateria are differentially expressed along the primary axis of a 'radiate' animal, the anthozoan cnidarian Acropora millepora. Dev Biol 298, 632-643. de Souza, F.S.J., Franchini, L.F., Rubinstein, M., 2013. Exaptation of Transposable Elements into Novel Cis-Regulatory Elements: Is the Evidence Always Strong? Molecular Biology and Evolution 30, 1239-1251. de Wit, E., de Laat, W., 2012. A decade of 3C technologies: insights into nuclear organization. Genes & development 26, 11-24.

18

Degnan, B.M., Vervoort, M., Larroux, C., Richards, G.S., 2009. Early evolution of metazoan transcription factors. Curr Opin Genet Dev 19, 591-599. Del Rosario, R.C., Rayan, N.A., Prabhakar, S., 2014. Noncoding origins of anthropoid traits and a new null model of transposon functionalization. Genome Res 24, 1469-1484. Denes, A.S., Jekely, G., Steinmetz, P.R., Raible, F., Snyman, H., Prud'homme, B., Ferrier, D.E., Balavoine, G., Arendt, D., 2007. Molecular architecture of annelid nerve cord supports common origin of nervous system centralization in bilateria. Cell 129, 277-288. Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., Ren, B., 2012. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376-380. Duboule, D., 2007. The rise and fall of Hox gene clusters. Development 134, 2549-2560. Eichinger, L., Pachebat, J.A., Glockner, G., Rajandream, M.A., Sucgang, R., Berriman, M., Song, J., Olsen, R., Szafranski, K., Xu, Q., Tunggal, B., Kummerfeld, S., Madera, M., Konfortov, B.A., Rivero, F., Bankier, A.T., Lehmann, R., Hamlin, N., Davies, R., Gaudet, P., Fey, P., Pilcher, K., Chen, G., Saunders, D., Sodergren, E., Davis, P., Kerhornou, A., Nie, X., Hall, N., Anjard, C., Hemphill, L., Bason, N., Farbrother, P., Desany, B., Just, E., Morio, T., Rost, R., Churcher, C., Cooper, J., Haydock, S., van Driessche, N., Cronin, A., Goodhead, I., Muzny, D., Mourier, T., Pain, A., Lu, M., Harper, D., Lindsay, R., Hauser, H., James, K., Quiles, M., Madan Babu, M., Saito, T., Buchrieser, C., Wardroper, A., Felder, M., Thangavelu, M., Johnson, D., Knights, A., Loulseged, H., Mungall, K., Oliver, K., Price, C., Quail, M.A., Urushihara, H., Hernandez, J., Rabbinowitsch, E., Steffen, D., Sanders, M., Ma, J., Kohara, Y., Sharp, S., Simmonds, M., Spiegler, S., Tivey, A., Sugano, S., White, B., Walker, D., Woodward, J., Winckler, T., Tanaka, Y., Shaulsky, G., Schleicher, M., Weinstock, G., Rosenthal, A., Cox, E.C., Chisholm, R.L., Gibbs, R., Loomis, W.F., Platzer, M., Kay, R.R., Williams, J., Dear, P.H., Noegel, A.A., Barrell, B., Kuspa, A., 2005. The genome of the social amoeba Dictyostelium discoideum. Nature 435, 43-57. Eichler, E.E., Hoffman, S.M., Adamson, A.A., Gordon, L.A., McCready, P., Lamerdin, J.E., Mohrenweiser, H.W., 1998. Complex beta-satellite repeat structures and the expansion of the zinc finger gene cluster in 19p12. Genome Res 8, 791-808. Elgar, G., 2009. Pan-vertebrate conserved non-coding sequences associated with developmental regulation. Brief Funct Genomic Proteomic 8, 256-265. Elliott, T.A., Gregory, T.R., 2015. What's in a genome? The C-value enigma and the evolution of eukaryotic genome content. Philos Trans R Soc Lond B Biol Sci 370, 20140331. Emerson, R.O., Thomas, J.H., 2009. Adaptive evolution in zinc finger transcription factors. PLoS Genet 5, e1000325. Engel, S.R., Dietrich, F.S., Fisk, D.G., Binkley, G., Balakrishnan, R., Costanzo, M.C., Dwight, S.S., Hitz, B.C., Karra, K., Nash, R.S., Weng, S., Wong, E.D., Lloyd, P., Skrzypek, M.S., Miyasato, S.R., Simison, M., Cherry, J.M., 2014. The reference genome sequence of Saccharomyces cerevisiae: then and now. G3 (Bethesda) 4, 389-398. Erwin, D.H., 2009. Early origin of the bilaterian developmental toolkit. Philos Trans R Soc Lond B Biol Sci 364, 2253-2261. Erwin, D.H., Laflamme, M., Tweedt, S.M., Sperling, E.A., Pisani, D., Peterson, K.J., 2011. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science 334, 1091-1097. Ewing, A.D., Ballinger, T.J., Earl, D., Broad Institute Genome, S., Analysis, P., Platform, Harris, C.C., Ding, L., Wilson, R.K., Haussler, D., 2013. Retrotransposition of gene transcripts leads to structural variation in mammalian genomes. Genome Biol 14, R22. Fahey, B., Degnan, B.M., 2012. Origin and evolution of laminin gene family diversity. Mol Biol Evol 29, 1823-1836. Fedonkin, M.A., 2003. The origin of the Metazoa in the light of the Proterozoic fossil record. Paleontological Research 7, 9-41. Feschotte, C., Pritham, E.J., 2007. DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet 41, 331-368. Flot, J.F., Hespeels, B., Li, X., Noel, B., Arkhipova, I., Danchin, E.G., Hejnol, A., Henrissat, B., Koszul, R., Aury, J.M., Barbe, V., Barthelemy, R.M., Bast, J., Bazykin, G.A., Chabrol, O., 19

Couloux, A., Da Rocha, M., Da Silva, C., Gladyshev, E., Gouret, P., Hallatschek, O., HecoxLea, B., Labadie, K., Lejeune, B., Piskurek, O., Poulain, J., Rodriguez, F., Ryan, J.F., Vakhrusheva, O.A., Wajnberg, E., Wirth, B., Yushenova, I., Kellis, M., Kondrashov, A.S., Mark Welch, D.B., Pontarotti, P., Weissenbach, J., Wincker, P., Jaillon, O., Van Doninck, K., 2013. Genomic evidence for ameiotic evolution in the bdelloid rotifer Adineta vaga. Nature 500, 453-457. Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., Postlethwait, J., 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531-1545. Frisdal, A., Trainor, P.A., 2014. Development and evolution of the pharyngeal apparatus. Wiley interdisciplinary reviews. Developmental biology 3, 403-418. Fritzenwanker, J.H., Gerhart, J., Freeman, R.M., Jr., Lowe, C.J., 2014. The Fox/Forkhead transcription factor family of the hemichordate Saccoglossus kowalevskii. Evodevo 5, 17. Gee, H., 2007. Before the Backbone: Views on the Origin of the Vertebrates. Springer Science & Business Media. Gillis, J.A., Fritzenwanker, J.H., Lowe, C.J., 2012. A stem-deuterostome origin of the vertebrate pharyngeal transcriptional network. Proc Biol Sci 279, 237-246. Graham, A., Richardson, J., 2012. Developmental and evolutionary origins of the pharyngeal apparatus. Evodevo 3, 24. Gregory, T.R., 2011. The evolution of the genome. Academic Press. Heger, P., Marin, B., Bartkuhn, M., Schierenberg, E., Wiehe, T., 2012. The chromatin insulator CTCF and the emergence of metazoan diversity. Proc Natl Acad Sci U S A 109, 17507-17512. Hellsten, U., Khokha, M., Grammer, T., Harland, R., Richardson, P., Rokhsar, D., 2007. Accelerated gene evolution and subfunctionalization in the pseudotetraploid frog Xenopus laevis. BMC Biology 5, 31. Holland, L.Z., 2016. Tunicates. Curr Biol 26, R146-152. Holland, L.Z., Holland, N.D., 1996. Expression of AmphiHox-1 and AmphiPax-1 in amphioxus embryos treated with retinoic acid: insights into evolution and patterning of the chordate nerve cord and pharynx. Development 122, 1829-1838. Holland, L.Z., Venkatesh, T.V., Gorlin, A., Bodmer, R., Holland, N.D., 1998. Characterization and developmental expression of AmphiNk2-2, an NK2 class homeobox gene from Amphioxus. (Phylum Chordata; Subphylum Cephalochordata). Dev Genes Evol 208, 100105. Holstein, T.W., 2012. The evolution of the Wnt pathway. Cold Spring Harb Perspect Biol 4, a007922. Hwang, J.S., Takaku, Y., Chapman, J., Ikeo, K., David, C.N., Gojobori, T., 2008. Cilium evolution: identification of a novel protein, nematocilin, in the mechanosensory cilium of Hydra nematocytes. Mol Biol Evol 25, 2009-2017. Irimia, M., Tena, J.J., Alexis, M.S., Fernandez-Minan, A., Maeso, I., Bogdanovic, O., de la Calle-Mustienes, E., Roy, S.W., Gomez-Skarmeta, J.L., Fraser, H.B., 2012. Extensive conservation of ancient microsynteny across metazoans due to cis-regulatory constraints. Genome Res 22, 2356-2367. Jantz, D., Berg, J.M., 2010. Probing the DNA-binding affinity and specificity of designed zinc finger proteins. Biophys J 98, 852-860. Jensen, T.H., Jacquier, A., Libri, D., 2013. Dealing with pervasive transcription. Mol Cell 52, 473-484. Kaessmann, H., 2010. Origins, evolution, and phenotypic impact of new genes. Genome Res 20, 1313-1326. Kalitsis, P., Saffery, R., 2009. Inherent promoter bidirectionality facilitates maintenance of sequence integrity and transcription of parasitic DNA in mammalian genomes. BMC Genomics 10, 498. Kawashima, T., Kawashima, S., Tanaka, C., Murai, M., Yoneda, M., Putnam, N.H., Rokhsar, D.S., Kanehisa, M., Satoh, N., Wada, H., 2009. Domain shuffling and the evolution of vertebrates. Genome Res 19, 1393-1403. 20

Khalturin, K., Anton-Erxleben, F., Sassmann, S., Wittlieb, J., Hemmrich, G., Bosch, T.C., 2008. A novel gene family controls species-specific morphological traits in Hydra. PLoS Biol 6, e278. Khalturin, K., Hemmrich, G., Fraune, S., Augustin, R., Bosch, T.C., 2009. More than just orphans: are taxonomically-restricted genes important in evolution? Trends Genet 25, 404413. Khurana, E., Fu, Y., Colonna, V., Mu, X.J., Kang, H.M., Lappalainen, T., Sboner, A., Lochovsky, L., Chen, J., Harmanci, A., Das, J., Abyzov, A., Balasubramanian, S., Beal, K., Chakravarty, D., Challis, D., Chen, Y., Clarke, D., Clarke, L., Cunningham, F., Evani, U.S., Flicek, P., Fragoza, R., Garrison, E., Gibbs, R., Gumus, Z.H., Herrero, J., Kitabayashi, N., Kong, Y., Lage, K., Liluashvili, V., Lipkin, S.M., MacArthur, D.G., Marth, G., Muzny, D., Pers, T.H., Ritchie, G.R., Rosenfeld, J.A., Sisu, C., Wei, X., Wilson, M., Xue, Y., Yu, F., Genomes Project, C., Dermitzakis, E.T., Yu, H., Rubin, M.A., Tyler-Smith, C., Gerstein, M., 2013. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587. King, N., 2004. The unicellular ancestry of animal development. Dev Cell 7, 313-325. King, N., Westbrook, M.J., Young, S.L., Kuo, A., Abedin, M., Chapman, J., Fairclough, S., Hellsten, U., Isogai, Y., Letunic, I., Marr, M., Pincus, D., Putnam, N., Rokas, A., Wright, K.J., Zuzow, R., Dirks, W., Good, M., Goodstein, D., Lemons, D., Li, W., Lyons, J.B., Morris, A., Nichols, S., Richter, D.J., Salamov, A., Sequencing, J.G., Bork, P., Lim, W.A., Manning, G., Miller, W.T., McGinnis, W., Shapiro, H., Tjian, R., Grigoriev, I.V., Rokhsar, D., 2008. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788. Knecht, A.K., Harland, R.M., 1997. Mechanisms of dorsal-ventral patterning in noggininduced neural tissue. Development 124, 2477-2488. Knoll, A.H., Carroll, S.B., 1999. Early animal evolution: emerging views from comparative biology and geology. Science 284, 2129-2137. Knowles, D.G., McLysaght, A., 2009. Recent de novo origin of human protein-coding genes. Genome Res 19, 1752-1759. Koinuma, S., Umesono, Y., Watanabe, K., Agata, K., 2000. Planaria FoxA (HNF3) homologue is specifically expressed in the pharynx-forming cells. Gene 259, 171-176. Kortschak, R.D., Samuel, G., Saint, R., Miller, D.J., 2003. EST analysis of the cnidarian Acropora millepora reveals extensive gene loss and rapid sequence divergence in the model invertebrates. Curr Biol 13, 2190-2195. Krauss, V., Thummler, C., Georgi, F., Lehmann, J., Stadler, P.F., Eisenhardt, C., 2008. Near intron positions are reliable phylogenetic markers: an application to holometabolous insects. Mol Biol Evol 25, 821-830. Kroger, B., Vinther, J., Fuchs, D., 2011. Cephalopod origin and evolution: A congruent picture emerging from fossils, development and molecules: Extant cephalopods are younger than previously realised and were under major selection to become agile, shell-less predators. Bioessays 33, 602-613. Kumanovics, A., Takada, T., Lindahl, K.F., 2003. Genomic organization of the mammalian MHC. Annu Rev Immunol 21, 629-657. Kusserow, A., Pang, K., Sturm, C., Hrouda, M., Lentfer, J., Schmidt, H.A., Technau, U., von Haeseler, A., Hobmayer, B., Martindale, M.Q., Holstein, T.W., 2005. Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433, 156-160. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J.P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, N., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J.C., Mungall, A., Plumb, R., Ross, M., 21

Shownkeen, R., Sims, S., Waterston, R.H., Wilson, R.K., Hillier, L.W., McPherson, J.D., Marra, M.A., Mardis, E.R., Fulton, L.A., Chinwalla, A.T., Pepin, K.H., Gish, W.R., Chissoe, S.L., Wendl, M.C., Delehaunty, K.D., Miner, T.L., Delehaunty, A., Kramer, J.B., Cook, L.L., Fulton, R.S., Johnson, D.L., Minx, P.J., Clifton, S.W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R.A., Muzny, D.M., Scherer, S.E., Bouck, J.B., Sodergren, E.J., Worley, K.C., Rives, C.M., Gorrell, J.H., Metzker, M.L., Naylor, S.L., Kucherlapati, R.S., Nelson, D.L., Weinstock, G.M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Smith, D.R., Doucette-Stamm, L., Rubenfield, M., Weinstock, K., Lee, H.M., Dubois, J., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R.W., Federspiel, N.A., Abola, A.P., Proctor, M.J., Myers, R.M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D.R., Olson, M.V., Kaul, R., Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G.A., Athanasiou, M., Schultz, R., Roe, B.A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W.R., de la Bastide, M., Dedhia, N., Blocker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J.A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D.G., Burge, C.B., Cerutti, L., Chen, H.C., Church, D., Clamp, M., Copley, R.R., Doerks, T., Eddy, S.R., Eichler, E.E., Furey, T.S., Galagan, J., Gilbert, J.G., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L.S., Jones, T.A., Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W.J., Kitts, P., Koonin, E.V., Korf, I., Kulp, D., Lancet, D., Lowe, T.M., McLysaght, A., Mikkelsen, T., Moran, J.V., Mulder, N., Pollara, V.J., Ponting, C.P., Schuler, G., Schultz, J., Slater, G., Smit, A.F., Stupka, E., Szustakowski, J., Thierry-Mieg, D., Thierry-Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y.I., Wolfe, K.H., Yang, S.P., Yeh, R.F., Collins, F., Guyer, M.S., Peterson, J., Felsenfeld, A., Wetterstrand, K.A., Patrinos, A., Morgan, M.J., de Jong, P., Catanese, J.J., Osoegawa, K., Shizuya, H., Choi, S., Chen, Y.J., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860-921. Larroux, C., Luke, G.N., Koopman, P., Rokhsar, D.S., Shimeld, S.M., Degnan, B.M., 2008. Genesis and expansion of metazoan transcription factor gene classes. Mol Biol Evol 25, 980-996. Liberles, D.A., Schreiber, D.R., Govindarajan, S., Chamberlin, S.G., Benner, S.A., 2001. The adaptive evolution database (TAED). Genome Biol 2, RESEARCH0028. Loker, E.S., Adema, C.M., Zhang, S.M., Kepler, T.B., 2004. Invertebrate immune systems-not homogeneous, not simple, not well understood. Immunol Rev 198, 10-24. Lonfat, N., Duboule, D., 2015. Structure, function and evolution of topologically associating domains (TADs) at HOX loci. FEBS Lett 589, 2869-2876. Luo, Y.J., Takeuchi, T., Koyanagi, R., Yamada, L., Kanda, M., Khalturina, M., Fujie, M., Yamasaki, S., Endo, K., Satoh, N., 2015. The Lingula genome provides insights into brachiopod evolution and the origin of phosphate biomineralization. Nat Commun 6, 8301. Lynch, M., 2007. The evolution of genetic networks by non-adaptive processes. Nat Rev Genet 8, 803-813. Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151-1155. Lynch, M., Conery, J.S., 2003. The origins of genome complexity. Science 302, 1401-1404. Lynch, M., Force, A., 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics 154, 459-473. Maeso, I., Irimia, M., Tena, J.J., Casares, F., Gomez-Skarmeta, J.L., 2013. Deep conservation of cis-regulatory elements in metazoans. Philos Trans R Soc Lond B Biol Sci 368, 20130020. Maeso, I., Roy, S.W., Irimia, M., 2012. Widespread recurrent evolution of genomic features. Genome Biol Evol 4, 486-500. Marchani, E.E., Xing, J., Witherspoon, D.J., Jorde, L.B., Rogers, A.R., 2009. Estimating the age of retrotransposon subfamilies using maximum likelihood. Genomics 94, 78-82. 22

Martindale, M.Q., Pang, K., Finnerty, J.R., 2004. Investigating the origins of triploblasty: 'mesodermal' gene expression in a diploblastic animal, the sea anemone Nematostella vectensis (phylum, Cnidaria; class, Anthozoa). Development 131, 2463-2474. Matus, D.Q., Pang, K., Daly, M., Martindale, M.Q., 2007. Expression of Pax gene family members in the anthozoan cnidarian, Nematostella vectensis. Evol Dev 9, 25-38. McLysaght, A., Hokamp, K., Wolfe, K.H., 2002. Extensive genomic duplication during early chordate evolution. Nat Genet 31, 200-204. Mikhailov, K.V., Slyusarev, G.S., Nikitin, M.A., Logacheva, M.D., Penin, A.A., Aleoshin, V.V., Panchin, Y.V., 2016. The Genome of Intoshia linei Affirms Orthonectids as Highly Simplified Spiralians. Curr Biol 26, 1768-1774. Mitchell, A., Chang, H.Y., Daugherty, L., Fraser, M., Hunter, S., Lopez, R., McAnulla, C., McMenamin, C., Nuka, G., Pesseat, S., Sangrador-Vegas, A., Scheremetjew, M., Rato, C., Yong, S.Y., Bateman, A., Punta, M., Attwood, T.K., Sigrist, C.J., Redaschi, N., Rivoire, C., Xenarios, I., Kahn, D., Guyot, D., Bork, P., Letunic, I., Gough, J., Oates, M., Haft, D., Huang, H., Natale, D.A., Wu, C.H., Orengo, C., Sillitoe, I., Mi, H., Thomas, P.D., Finn, R.D., 2015. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res 43, D213-221. Moroz, L.L., Kocot, K.M., Citarella, M.R., Dosung, S., Norekian, T.P., Povolotskaya, I.S., Grigorenko, A.P., Dailey, C., Berezikov, E., Buckley, K.M., Ptitsyn, A., Reshetov, D., Mukherjee, K., Moroz, T.P., Bobkova, Y., Yu, F., Kapitonov, V.V., Jurka, J., Bobkov, Y.V., Swore, J.J., Girardo, D.O., Fodor, A., Gusev, F., Sanford, R., Bruders, R., Kittler, E., Mills, C.E., Rast, J.P., Derelle, R., Solovyev, V.V., Kondrashov, F.A., Swalla, B.J., Sweedler, J.V., Rogaev, E.I., Halanych, K.M., Kohn, A.B., 2014. The ctenophore genome and the evolutionary origins of neural systems. Nature. Morris, K.V., Mattick, J.S., 2014. The rise of regulatory RNA. Nat Rev Genet 15, 423-437. Nakatani, Y., Takeda, H., Kohara, Y., Morishita, S., 2007. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res 17, 1254-1265. Narbonne, G.M., 2005. The Ediacara biota: Neoproterozoic origin of animals and their ecosystems. Annu. Rev. Earth Planet. Sci. 33, 421-442. Nusse, R., 2001. An ancient cluster of Wnt paralogues. Trends Genet 17, 443. Ogasawara, M., 2000. Overlapping expression of amphioxus homologs of the thyroid transcription factor-1 gene and thyroid peroxidase gene in the endostyle: insight into evolution of the thyroid gland. Dev Genes Evol 210, 231-242. Ogasawara, M., Wada, H., Peters, H., Satoh, N., 1999. Developmental expression of Pax1/9 genes in urochordate and hemichordate gills: insight into function and evolution of the pharyngeal epithelium. Development 126, 2539-2550. Ohno, S., 1970. Evolution by gene duplication. Allen & Unwin; Springer-Verlag, London, New York. Ohta, T., 1989. Role of gene duplication in evolution. Genome 31, 304-310. Okada, K., Asai, K., 2008. Expansion of signaling genes for adaptive immune system evolution in early vertebrates. BMC Genomics 9, 218. Oliveri, P., Walton, K.D., Davidson, E.H., McClay, D.R., 2006. Repression of mesodermal fate by foxa, a key endoderm regulator of the sea urchin embryo. Development 133, 41734181. Pace, J.K., 2nd, Feschotte, C., 2007. The evolutionary history of human DNA transposons: evidence for intense activity in the primate lineage. Genome Res 17, 422-432. Paris, M., Escriva, H., Schubert, M., Brunet, F., Brtko, J., Ciesielski, F., Roecklin, D., VivatHannah, V., Jamin, E.L., Cravedi, J.P., Scanlan, T.S., Renaud, J.P., Holland, N.D., Laudet, V., 2008. Amphioxus postembryonic development reveals the homology of chordate metamorphosis. Curr Biol 18, 825-830. Passamaneck, Y.J., Hejnol, A., Martindale, M.Q., 2015. Mesodermal gene expression during the embryonic and larval development of the articulate brachiopod Terebratalia transversa. Evodevo 6, 10.

23

Patthy, L., 1999. Genome evolution and the evolution of exon-shuffling--a review. Gene 238, 103-114. Petrov, D.A., Lozovskaya, E.R., Hartl, D.L., 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384, 346-349. Pohl, B.S., Knochel, W., 2005. Of Fox and Frogs: Fox (fork head/winged helix) transcription factors in Xenopus development. Gene 344, 21-32. Pollet, N., Schmidt, H.A., Gawantka, V., Vingron, M., Niehrs, C., 2000. Axeldb: a Xenopus laevis database focusing on gene expression. Nucleic Acids Res 28, 139-140. Pope, B.D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D.L., Wang, Y., Hansen, R.S., Canfield, T.K., Thurman, R.E., Cheng, Y., Gulsoy, G., Dennis, J.H., Snyder, M.P., Stamatoyannopoulos, J.A., Taylor, J., Hardison, R.C., Kahveci, T., Ren, B., Gilbert, D.M., 2014. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402-405. Prud'homme, B., Gompel, N., Carroll, S.B., 2007. Emerging principles of regulatory evolution. Proc Natl Acad Sci U S A 104 Suppl 1, 8605-8612. Putnam, N.H., Butts, T., Ferrier, D.E., Furlong, R.F., Hellsten, U., Kawashima, T., RobinsonRechavi, M., Shoguchi, E., Terry, A., Yu, J.K., Benito-Gutierrez, E.L., Dubchak, I., GarciaFernandez, J., Gibson-Brown, J.J., Grigoriev, I.V., Horton, A.C., de Jong, P.J., Jurka, J., Kapitonov, V.V., Kohara, Y., Kuroki, Y., Lindquist, E., Lucas, S., Osoegawa, K., Pennacchio, L.A., Salamov, A.A., Satou, Y., Sauka-Spengler, T., Schmutz, J., Shin, I.T., Toyoda, A., Bronner-Fraser, M., Fujiyama, A., Holland, L.Z., Holland, P.W., Satoh, N., Rokhsar, D.S., 2008. The amphioxus genome and the evolution of the chordate karyotype. Nature 453, 1064-1071. Putnam, N.H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., Terry, A., Shapiro, H., Lindquist, E., Kapitonov, V.V., Jurka, J., Genikhovich, G., Grigoriev, I.V., Lucas, S.M., Steele, R.E., Finnerty, J.R., Technau, U., Martindale, M.Q., Rokhsar, D.S., 2007. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317, 86-94. Raff, R.A., 2000. Evo-devo: the evolution of a new discipline. Nat Rev Genet 1, 74-79. Raible, F., Tessmar-Raible, K., Osoegawa, K., Wincker, P., Jubin, C., Balavoine, G., Ferrier, D., Benes, V., de Jong, P., Weissenbach, J., Bork, P., Arendt, D., 2005. Vertebrate-type intron-rich genes in the marine annelid Platynereis dumerilii. Science 310, 1325-1326. Range, R., Lepage, T., 2011. Maternal Oct1/2 is required for Nodal and Vg1/Univin expression during dorsal-ventral axis specification in the sea urchin embryo. Dev Biol 357, 440-449. Rast, J.P., Smith, L.C., Loza-Coll, M., Hibino, T., Litman, G.W., 2006. Genomic insights into the immune system of the sea urchin. Science 314, 952-956. Rebeiz, M., Stone, T., Posakony, J.W., 2005. An ancient transcriptional regulatory linkage. Dev Biol 281, 299-308. Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W., Cho, E.K., Dallaire, S., Freeman, J.L., Gonzalez, J.R., Gratacos, M., Huang, J., Kalaitzopoulos, D., Komura, D., MacDonald, J.R., Marshall, C.R., Mei, R., Montgomery, L., Nishimura, K., Okamura, K., Shen, F., Somerville, M.J., Tchinda, J., Valsesia, A., Woodwark, C., Yang, F., Zhang, J., Zerjal, T., Zhang, J., Armengol, L., Conrad, D.F., Estivill, X., Tyler-Smith, C., Carter, N.P., Aburatani, H., Lee, C., Jones, K.W., Scherer, S.W., Hurles, M.E., 2006. Global variation in copy number in the human genome. Nature 444, 444-454. Richter, D.J., King, N., 2013. The genomic and cellular foundations of animal origins. Annu Rev Genet 47, 509-537. Rizzon, C., Marais, G., Gouy, M., Biemont, C., 2002. Recombination rate and the distribution of transposable elements in the Drosophila melanogaster genome. Genome Res 12, 400407. Rogers, R.L., Bedford, T., Hartl, D.L., 2009. Formation and longevity of chimeric and duplicate genes in Drosophila melanogaster. Genetics 181, 313-322.

24

Rogozin, I.B., Wolf, Y.I., Sorokin, A.V., Mirkin, B.G., Koonin, E.V., 2003. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol 13, 1512-1517. Rokas, A., Holland, P.W., 2000. Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol 15, 454-459. Romer, A.S., Parsons, T.S., 1977. The vertebrate body, 5th ed. Saunders, Philadelphia. Roy, S.W., Gilbert, W., 2005. Complex early genes. Proc Natl Acad Sci U S A 102, 19861991. Royo, J.L., Maeso, I., Irimia, M., Gao, F., Peter, I.S., Lopes, C.S., D'Aniello, S., Casares, F., Davidson, E.H., Garcia-Fernandez, J., Gomez-Skarmeta, J.L., 2011. Transphyletic conservation of developmental regulatory state in animal evolution. Proc Natl Acad Sci U S A 108, 14186-14191. Ryan, J.F., Pang, K., Schnitzler, C.E., Nguyen, A.D., Moreland, R.T., Simmons, D.K., Koch, B.J., Francis, W.R., Havlak, P., Program, N.C.S., Smith, S.A., Putnam, N.H., Haddock, S.H., Dunn, C.W., Wolfsberg, T.G., Mullikin, J.C., Martindale, M.Q., Baxevanis, A.D., 2013. The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342, 1242592. Sasaki, H., Hogan, B.L., 1993. Differential expression of multiple fork head related genes during gastrulation and axial pattern formation in the mouse embryo. Development 118, 4759. Schierwater, B., Kuhn, K., 1998. Homology of Hox genes and the zootype concept in early metazoan evolution. Mol Phylogenet Evol 9, 375-381. Seo, H.C., Kube, M., Edvardsen, R.B., Jensen, M.F., Beck, A., Spriet, E., Gorsky, G., Thompson, E.M., Lehrach, H., Reinhardt, R., Chourrout, D., 2001. Miniature genome in the marine chordate Oikopleura dioica. Science 294, 2506. Shimamura, K., Hartigan, D.J., Martinez, S., Puelles, L., Rubenstein, J.L., 1995. Longitudinal organization of the anterior neural plate and neural tube. Development 121, 3923-3933. Shimeld, S.M., 1997. Characterisation of amphioxus HNF-3 genes: conserved expression in the notochord and floor plate. Dev Biol 183, 74-85. Simakov, O., Kawashima, T., Marletaz, F., Jenkins, J., Koyanagi, R., Mitros, T., Hisata, K., Bredeson, J., Shoguchi, E., Gyoja, F., Yue, J.X., Chen, Y.C., Freeman, R.M., Jr., Sasaki, A., Hikosaka-Katayama, T., Sato, A., Fujie, M., Baughman, K.W., Levine, J., Gonzalez, P., Cameron, C., Fritzenwanker, J.H., Pani, A.M., Goto, H., Kanda, M., Arakaki, N., Yamasaki, S., Qu, J., Cree, A., Ding, Y., Dinh, H.H., Dugan, S., Holder, M., Jhangiani, S.N., Kovar, C.L., Lee, S.L., Lewis, L.R., Morton, D., Nazareth, L.V., Okwuonu, G., Santibanez, J., Chen, R., Richards, S., Muzny, D.M., Gillis, A., Peshkin, L., Wu, M., Humphreys, T., Su, Y.H., Putnam, N.H., Schmutz, J., Fujiyama, A., Yu, J.K., Tagawa, K., Worley, K.C., Gibbs, R.A., Kirschner, M.W., Lowe, C.J., Satoh, N., Rokhsar, D.S., Gerhart, J., 2015. Hemichordate genomes and deuterostome origins. Nature 527, 459-465. Simakov, O., Marletaz, F., Cho, S.J., Edsinger-Gonzales, E., Havlak, P., Hellsten, U., Kuo, D.H., Larsson, T., Lv, J., Arendt, D., Savage, R., Osoegawa, K., de Jong, P., Grimwood, J., Chapman, J.A., Shapiro, H., Aerts, A., Otillar, R.P., Terry, A.Y., Boore, J.L., Grigoriev, I.V., Lindberg, D.R., Seaver, E.C., Weisblat, D.A., Putnam, N.H., Rokhsar, D.S., 2013. Insights into bilaterian evolution from three spiralian genomes. Nature. Siomi, M.C., Saito, K., Siomi, H., 2008. How selfish retrotransposons are silenced in Drosophila germline and somatic cells. FEBS Lett 582, 2473-2478. Smith, C.L., Varoqueaux, F., Kittelmann, M., Azzam, R.N., Cooper, B., Winters, C.A., Eitel, M., Fasshauer, D., Reese, T.S., 2014. Novel cell types, neurosecretory cells, and body plan of the early-diverging metazoan Trichoplax adhaerens. Curr Biol 24, 1565-1572. Smith, E.M., Lajoie, B.R., Jain, G., Dekker, J., 2016. Invariant TAD Boundaries Constrain Cell-Type-Specific Looping Interactions between Promoters and Distal Elements around the CFTR Locus. Am J Hum Genet 98, 185-201. Smith, J.J., Keinath, M.C., 2015. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications. Genome Res 25, 1081-1090.

25

Sodergren, E., Weinstock, G.M., Davidson, E.H., Cameron, R.A., Gibbs, R.A., Angerer, R.C., Angerer, L.M., Arnone, M.I., Burgess, D.R., Burke, R.D., Coffman, J.A., Dean, M., Elphick, M.R., Ettensohn, C.A., Foltz, K.R., Hamdoun, A., Hynes, R.O., Klein, W.H., Marzluff, W., McClay, D.R., Morris, R.L., Mushegian, A., Rast, J.P., Smith, L.C., Thorndyke, M.C., Vacquier, V.D., Wessel, G.M., Wray, G., Zhang, L., Elsik, C.G., Ermolaeva, O., Hlavina, W., Hofmann, G., Kitts, P., Landrum, M.J., Mackey, A.J., Maglott, D., Panopoulou, G., Poustka, A.J., Pruitt, K., Sapojnikov, V., Song, X., Souvorov, A., Solovyev, V., Wei, Z., Whittaker, C.A., Worley, K., Durbin, K.J., Shen, Y., Fedrigo, O., Garfield, D., Haygood, R., Primus, A., Satija, R., Severson, T., Gonzalez-Garay, M.L., Jackson, A.R., Milosavljevic, A., Tong, M., Killian, C.E., Livingston, B.T., Wilt, F.H., Adams, N., Belle, R., Carbonneau, S., Cheung, R., Cormier, P., Cosson, B., Croce, J., Fernandez-Guerra, A., Geneviere, A.M., Goel, M., Kelkar, H., Morales, J., Mulner-Lorillon, O., Robertson, A.J., Goldstone, J.V., Cole, B., Epel, D., Gold, B., Hahn, M.E., Howard-Ashby, M., Scally, M., Stegeman, J.J., Allgood, E.L., Cool, J., Judkins, K.M., McCafferty, S.S., Musante, A.M., Obar, R.A., Rawson, A.P., Rossetti, B.J., Gibbons, I.R., Hoffman, M.P., Leone, A., Istrail, S., Materna, S.C., Samanta, M.P., Stolc, V., Tongprasit, W., Tu, Q., Bergeron, K.F., Brandhorst, B.P., Whittle, J., Berney, K., Bottjer, D.J., Calestani, C., Peterson, K., Chow, E., Yuan, Q.A., Elhaik, E., Graur, D., Reese, J.T., Bosdet, I., Heesun, S., Marra, M.A., Schein, J., Anderson, M.K., Brockton, V., Buckley, K.M., Cohen, A.H., Fugmann, S.D., Hibino, T., Loza-Coll, M., Majeske, A.J., Messier, C., Nair, S.V., Pancer, Z., Terwilliger, D.P., Agca, C., Arboleda, E., Chen, N., Churcher, A.M., Hallbook, F., Humphrey, G.W., Idris, M.M., Kiyama, T., Liang, S., Mellott, D., Mu, X., Murray, G., Olinski, R.P., Raible, F., Rowe, M., Taylor, J.S., Tessmar-Raible, K., Wang, D., Wilson, K.H., Yaguchi, S., Gaasterland, T., Galindo, B.E., Gunaratne, H.J., Juliano, C., Kinukawa, M., Moy, G.W., Neill, A.T., Nomura, M., Raisch, M., Reade, A., Roux, M.M., Song, J.L., Su, Y.H., Townley, I.K., Voronina, E., Wong, J.L., Amore, G., Branno, M., Brown, E.R., Cavalieri, V., Duboc, V., Duloquin, L., Flytzanis, C., Gache, C., Lapraz, F., Lepage, T., Locascio, A., Martinez, P., Matassi, G., Matranga, V., Range, R., Rizzo, F., Rottinger, E., Beane, W., Bradham, C., Byrum, C., Glenn, T., Hussain, S., Manning, G., Miranda, E., Thomason, R., Walton, K., Wikramanayke, A., Wu, S.Y., Xu, R., Brown, C.T., Chen, L., Gray, R.F., Lee, P.Y., Nam, J., Oliveri, P., Smith, J., Muzny, D., Bell, S., Chacko, J., Cree, A., Curry, S., Davis, C., Dinh, H., Dugan-Rocha, S., Fowler, J., Gill, R., Hamilton, C., Hernandez, J., Hines, S., Hume, J., Jackson, L., Jolivet, A., Kovar, C., Lee, S., Lewis, L., Miner, G., Morgan, M., Nazareth, L.V., Okwuonu, G., Parker, D., Pu, L.L., Thorn, R., Wright, R., 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science 314, 941952. Sonnhammer, E.L., Eddy, S.R., Durbin, R., 1997. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405-420. Srivastava, M., Begovic, E., Chapman, J., Putnam, N.H., Hellsten, U., Kawashima, T., Kuo, A., Mitros, T., Salamov, A., Carpenter, M.L., Signorovitch, A.Y., Moreno, M.A., Kamm, K., Grimwood, J., Schmutz, J., Shapiro, H., Grigoriev, I.V., Buss, L.W., Schierwater, B., Dellaporta, S.L., Rokhsar, D.S., 2008. The Trichoplax genome and the nature of placozoans. Nature 454, 955-960. Srivastava, M., Simakov, O., Chapman, J., Fahey, B., Gauthier, M.E.A., Mitros, T., Richards, G.S., Conaco, C., Dacre, M., Hellsten, U., Larroux, C., Putnam, N.H., Stanke, M., Adamska, M., Darling, A., Degnan, S.M., Oakley, T.H., Plachetzki, D.C., Zhai, Y., Adamski, M., Calcino, A., Cummins, S.F., Goodstein, D.M., Harris, C., Jackson, D.J., Leys, S.P., Shu, S., Woodcroft, B.J., Vervoort, M., Kosik, K.S., Manning, G., Degnan, B.M., Rokhsar, D.S., 2010. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466, 720-726. Stanley, S.M., 1973. An ecological theory for the sudden origin of multicellular life in the late precambrian. Proc Natl Acad Sci U S A 70, 1486-1489. Suga, H., Chen, Z., de Mendoza, A., Sebe-Pedros, A., Brown, M.W., Kramer, E., Carr, M., Kerner, P., Vervoort, M., Sanchez-Pons, N., Torruella, G., Derelle, R., Manning, G., Lang, B.F., Russ, C., Haas, B.J., Roger, A.J., Nusbaum, C., Ruiz-Trillo, I., 2013. The Capsaspora genome reveals a complex unicellular prehistory of animals. Nat Commun 4, 2325. 26

Takacs, C.M., Amore, G., Oliveri, P., Poustka, A.J., Wang, D., Burke, R.D., Peterson, K.J., 2004. Expression of an NK2 homeodomain gene in the apical ectoderm defines a new territory in the early sea urchin embryo. Dev Biol 269, 152-164. Tautz, D., Domazet-Loso, T., 2011. The evolutionary origin of orphan genes. Nat Rev Genet 12, 692-702. Tessmar-Raible, K., Raible, F., Christodoulou, F., Guy, K., Rembold, M., Hausen, H., Arendt, D., 2007. Conserved sensory-neurosecretory cell types in annelid and fish forebrain: insights into hypothalamus evolution. Cell 129, 1389-1400. Venkatesh, T.V., Holland, N.D., Holland, L.Z., Su, M.T., Bodmer, R., 1999. Sequence and developmental expression of amphioxus AmphiNk2-1: insights into the evolutionary origin of the vertebrate thyroid gland and forebrain. Dev Genes Evol 209, 254-259. Venter, J.C., Smith, H.O., Adams, M.D., 2015. The Sequence of the Human Genome. Clin Chem 61, 1207-1208. Wallin, J., Eibel, H., Neubuser, A., Wilting, J., Koseki, H., Balling, R., 1996. Pax1 is expressed during development of the thymus epithelium and is required for normal T-cell maturation. Development 122, 23-30. Wang, W., Zhong, J., Su, B., Zhou, Y., Wang, Y.Q., 2007. Comparison of Pax1/9 locus reveals 500-Myr-old syntenic block and evolutionary conserved noncoding regions. Mol Biol Evol 24, 784-791. Wheeler, B.M., Heimberg, A.M., Moy, V.N., Sperling, E.A., Holstein, T.W., Heber, S., Peterson, K.J., 2009. The deep evolution of metazoan microRNAs. Evol Dev 11, 50-68. Whitney, K.D., Garland, T., Jr., 2010. Did Genetic Drift Drive Increases in Genome Complexity? PLoS Genet 6, e1001080. Yi, S., Streelman, J.T., 2005. Genome size is negatively correlated with effective population size in ray-finned fish. Trends Genet 21, 643-646. Zdobnov, E.M., Bork, P., 2007. Quantification of insect genome divergence. Trends Genet 23, 16-20. Zhao, L., Saelao, P., Jones, C.D., Begun, D.J., 2014. Origin and spread of de novo genes in Drosophila melanogaster populations. Science 343, 769-772. Zhou, Q., Zhang, G., Zhang, Y., Xu, S., Zhao, R., Zhan, Z., Li, X., Ding, Y., Yang, S., Wang, W., 2008. On the origin of new genes in Drosophila. Genome Res 18, 1446-1455.

Table 1. The ancestral ‘metazoan’ genome Genomic feature

Inferred ancestral values

Presumptive genome size

~ 300 Mb

Gene family number

7,000 - 8,000

Total gene number

> 20,000

Average exon count

3-4

Repeat content

~30%

Macro-syntenic linkage groups

~10-17

Micro-syntenic linkage groups

~ 400

Table 2. Explanation of different novelty types

27

Novelty type

Definition

Example

Type 1

No detectable sequence similarity (by any, Wnt ligand family at including training-based methods) to outgroup the metazoan node data

Type 2

Detectable sequence similarity but novel domain Collagen at (e.g., defined by PFAM) metazoan node

Type 3

Same domain composition, but novel domain Innate immunity genes order (architecture) in invertebrates

Type 4

Same architecture, but duplicated/accelerated Lefty at the sequence evolution forming a subfamily; likely deuterostome node functional change

the

HIGHLIGHTS - Genomics provides insights into the origin and diversification of the metazoan ancestor - Recent studies quantify the rates and potential impact of several genomic features - Features such as gene content and synteny evolve at different rates across metazoans - Biological impact of novel synteny and noncoding regions remains largely understudied

28

I+:-/&:/ R5/,101/ I(#+18"1&/ R1&:6#&/

Non-Bilateria

10 1 2 1

Ecdysozoa ]#./(1-/ 81 R&:/8*5:-/ 1 ]#./(1.1&8"/ 0 X1&:,:6#&/ 0 ^:+1&"7+,"/ 0 4+7,"18"1&/ 0 T/&-:2&/-/$ 1 D&("&181-/ 239 Lophotrochozoa F155*',/ 10 D++#5:-/ 2 S&/,":181-/ 1 I7,5:18"1&/ 0 I"/#(12+/("/ 0 R"1&1+:-/ 0 S&7101/$[J,(18&1,(/\ 0 J+(18&1,(/ 0 D,/+("1,#8"/5/ 0
291 2 7

B

A

224

A. vaga

S-#551:-$&1(:6#& I1.C$Z#557 I1.C$Z#557

43

150 156 425 212 758 1229

348 324 228

L. gigantea C. teleta H. robusta

4@5$5:.8#( R157,"/#(/ U&#'"@/(#&$5##," M. leidyi P. bachei L. anatina X:+2*5/ H. dujardini ;/(#&$C#/& D,1&+@1&.$[D(5/+(:,\ S. kowalevskii D,1&+@1&.$[R/,:%,\ P!"#$%$ I. linei 4&("1+#,(:-

167

A. queenslandica

363

S. mansoni

=,":'(1'1./ E#.1'81+2#

98

T. adhaerens

R5/,101/

814

TBNML

180

97

Size (Mb)

357

S. purpuratus

H. sapiens

D. melanogaster

C. elegans

Species Name

=(/&5#($'#/$/+#.1+# N. vectensis

R*&85#$'#/$*&,":+

>*./+

U57

V1*+-$;1&.

Number of Common Name Decoded Spec.**

Deuterostomia I"1&-/(/ >#.:,"1&-/(/ J,":+1-#&./(/

Phylum Level Taxonomy *

Authors

D+,:#+($.#(/01/+$2#+1.#$,1.85#.#+($/+-$ $1&2/+:0/(:1+

<#+#'$/''1,:/(#-$@:("$A:':1+B$C/5/+,#B$/+-$ ,"#.1'#+'/(:1+$:+$*&,":+3$

T T1(/5$ "*./+$2#+#'$6#@#&$("/+$("1*2"(3 T1(/5$"*./+$2#+#'$6#@#&$("/+$("1*2"(3

!"#$%&'($.#(/01/+$;<=$2#+1.#3 >#(#&1?#*,"&1./(:+ $(&/+':(:1+$01+#3

!"#$%&'($'#)*#+,#-$.#(/01/+$2#+1.#3$ 456/,(1&7$&#,#8(1&$#98/+':1+3$

(Some) Remarkable Genomic Features

E#*(#&1'(1.#$+1A#5(:#'$/+-$("#$8"/&7+2#/5$,5*'(#&

J98/+':1+$16$,":(:+$'7+("/'#$,1&&#'81+-:+2$(1$("#$'"#55$ 61&./(:1+3$

F#'1-#&./5$2#+#'$/C'#+($6&1.$I(#+18"1&#$2#+1.# H+-#8#+-#+($#A15*(:1+$16$("#$+#&A1*'$'7'(#.

H+(&/2#+1.:,$'7+(#+7$/+-$/.#:1(:,$#A15*(:1+ 7

F*5(:85#$-*85:,/(:1+'$16$>19$,1.85#.#+('$:+$("#$5##,"3

NOKP$$F:G"/:51A$#($/53B$ J5#.#+(/5$2#+#'$16$("#$.#(/01/+$'#+'1&7$'7'(#.'

NOKY$$=:./G1A$#($/53B

NOKT$$V7/+$#($/53B NOKW$$F1&10$#($/53B 2015 X*1$#($/53B 2015 S11("C7$#($/53B

NOKT$$U51($#($/53B

NOKN$$=:./G1A$#($/53B

16$#A/+($2#+#'3

2010 =&:A/'(/A/$#($/53B =:9$"/55./&G'$16$/+:./5$.*5(:,#55*5/&:(7$/+-$("#$#A15*(:1+$

NOOL$$S#&&:./+$#($/53B F:,&1$#91+$/+-$*+*'*/5$:+(&1+$':0#$-:'(&:C*(:1+

$("/+$,#55$(78#'$16$(":'$1&2/+:'.

2008 =&:A/'(/A/$#($/53B E:A#&':(7$16$-#A#518.#+(/5$2#+#'$.1&#

2007 R*(+/.$#($/53B

NOOP$$=Q<=I$#($/53B

NOOK$$H><=I3B, V#+(#&$#($/53B

2000 D-/.'$#($/53B

KLLM$$I=I3B

Year

Gene structure novelty

Local gain and loss

Micro-synteny

Macro-synteny

Linkage group 1

Linkage group 2

species B

species A

species B

species A

species B

species A

species B

species A

!-catenin

GSK3!

Fz

Ancient Type I Type II Type III

non-canonical pathway

canonical pathway

Ax in

Axin

LRP5/6

Wnt

Gene structure novelty Local gain and loss Metazoan Ancestor

Micro-synteny Macro-synteny

Bilaterian Ancestor

Helobdella

Lottia

Octopus

Lophotrochozoa

Drosophila Ecdysozoa

Saccoglossus Human Deuterostomia

Hydra Cnidaria

nkx2 duplication

Loss of gill pores Gain of pentamerous symmetry

Emergence of the Pharyngeal Cluster

Hemichordate

Cepharochordate

Diversified the spatial gene expression

Echinoderm

Lost msxlx

Lophotrochozoa

Vertebrates

nkx2.1

nkx2.1

nkx2.1

nkx2.1

nkx2.1

nkx2.1

B. floridae

S. kowalevskii

TADs Locus

Cnidaria

S. purpuratus

P. dumerilii

M. musculus

X. tropicalis

nkx2 nkx2.2

nkx2.2

nkx2.2

nkx2.2

nkx2.2 X. tropicalis

C

B. floridae

nkx2.2

A.millepora

S. kowalevskii

S. purpuratus

P. dumerilii

M. musculus

msx3

?

msxlx

?

msxlx

?

msxlx

msxlx

A.millepora

pax1/9

C

S

pax1/9

C

paxD

pax1/9

?

pax1

pax1&pax9

S

S

S

S

S

B. florisae

X. tropicalis

S

T. transversa

S. kowalevskii

M. musculus

M

foxA

foxA

foxA1/2

foxA1/2

M

foxA1/2

foxA1/2

M

M

foxA1/2

M

M. musculus

S. kowalevskii

S. purpuratus

pax9

C

C B. floridae

N.vectensis

C. variopedatus

N.vectensis

B. florisae

D. japonica

X. tropicalis