Evolution of the genes mediating phototransduction in rod and cone photoreceptors



Journal Pre-proof
Trevor D. Lamb
PII: S1350-9462(19)30110-7
DOI: https://doi.org/10.1016/j.preteyeres.2019.100823
Reference: JPRR 100823
To appear in: Progress in Retinal and Eye Research
Received Date: 31 August 2019
Revised Date: 21 November 2019
Accepted Date: 21 November 2019

Please cite this article as: Lamb, T.D, Evolution of the genes mediating phototransduction in rod and cone photoreceptors, Progress in Retinal and Eye Research, https://doi.org/10.1016/j.preteyeres.2019.100823. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Elsevier Ltd. All rights reserved.

Contents
1 Introduction
2 Background to analysis of phototransduction cascade evolution
2.1 Species phylogeny
2.2 Molecular phylogeny
2.3 Individual gene duplication versus whole genome duplication (WGD)
2.4 Gene synteny
3 Evolution of G-proteins and origin of the proto-vertebrate phototransduction cascade
3.1 Overview of G-protein evolution
3.2 Origin of the proto-vertebrate phototransduction cascade
4 Evolution of the activation steps of vertebrate phototransduction
4.1 Transducin alpha subunits (GNAT1–3)
4.2 G-protein beta subunits (GNB1–4)
4.3 PDE catalytic subunits (PDE6A,B,C)
4.4 PDE inhibitory subunits (PDE6G,H,I)
4.5 Cyclic nucleotide gated channels (CNGA1–4, CNGB1,3)
5 Evolution of the recovery steps of vertebrate phototransduction
5.1 G-protein receptor kinases (GRK1A,1B,7)
5.2 Arrestins (SAG, ARR3, ARRB1, ARRB2)
5.3 Regulator of G-protein signalling (RGS9, Gβ5 and R9AP)
6 Evolution of Ca-feedback regulation of vertebrate phototransduction
6.1 Na+-K+/Ca2+ exchangers (NCKX1,2)
6.2 Guanylyl cyclases (GC-E, GC-F, GC-D)
6.3 Guanylyl cyclase activating proteins (GCAP1, 1L, 2, 2L, 3)
6.4 Recoverin and visinin
7 Evolution of vertebrate visual opsins
8 A synthesis of the co-evolution of the genes for the vertebrate phototransduction cascade
8.1 Pattern and timing of phototransduction gene duplications
8.2 Summary of the evolution of individual phototransduction components
8.3 Co-evolution of components: Stages in the evolution of vertebrate phototransduction
8.4 Origin of photopic/scotopic dichotomy in vertebrate phototransduction
8.5 Refinement of the distinct isoforms for rods and cones
8.6 Summary
9 Future directions
References
Tables and Figures


1 Introduction

The primary purpose of this article is to review our current understanding of the evolution of the genes that mediate vertebrate phototransduction, and thereby to provide a clearer description of how the cascade of phototransduction reactions evolved over hundreds of millions of years. In doing so, this paper fills a gap in the overall picture of the evolution of vertebrate photoreceptors and vertebrate retina that I presented in this journal six years ago (Lamb, 2013). The components of the vertebrate phototransduction cascade are represented schematically in Figure 1. The activation steps are shown in the foreground, with activation flowing from left to right. Upon absorption of a photon of light, the activated visual pigment (Rh) catalyses the activation of the G-protein transducin (G), which in turn activates the phosphodiesterase (PDE), causing it to hydrolyse cyclic GMP (cG); the resulting drop in cyclic GMP concentration closes the cyclic nucleotide-gated channels (CNGCs) and thereby generates the photoreceptor’s electrical response. Note that the depiction in Figure 1, and in particular the topology of the membrane, is a generic form representing both cone and rod phototransduction. For cones, which mediate daytime vision, all of these proteins are located in the plasma membrane, as sketched. In contrast, for rods, which mediate night-time vision, only the ion channels (CNGCs) and the exchanger (NCKX) are located in the plasma membrane; the other proteins are restricted almost exclusively to the membranes of the pinched-off, free-floating discs, and the disc and plasma membranes are separated from each other by the cytoplasmic medium.

Figure 1. Schematic representation of the phototransduction cascade

The boxes above and below the schematic in Figure 1 give the HGNC names of the genes encoding the respective proteins in human.
Remarkably, for 12 of the 17 illustrated classes of protein subunit, there are separate cone and rod isoforms, indicated in red and blue, respectively; only a handful of protein components are encoded by a common gene in both cones and rods, as indicated in black. (In various species, an isoform here or there has been lost, obscuring the general cone/rod duality. For example, most vertebrate lineages have lost either recoverin or visinin, with the result that both classes of photoreceptor then express a common isoform; see Section 6.4.) These protein families and the genes encoding them are described more comprehensively in Table 1.

Table 1. Phototransduction cascade proteins and genes

Because of their use of distinct protein isoforms, cones and rods represent a unique evolutionary system, in which the same process (the detection of light) uses a different set of genes in different classes of cell. This situation raises a number of fundamental questions, including the following. How did the cone/rod duality of isoforms arise? When did the various gene duplication events occur? To what extent were any of those duplications synchronous, in the form of duplication of the entire genome? In what manner has each pair of isoforms diverged since their formation? What factors provided a survival advantage to the organism? Can we trace the entire sequence of events that led to the evolution of the separate cone and rod phototransduction cascades? And, finally, can we use this knowledge of evolution to enhance our overall understanding of the process of phototransduction?

Over recent decades there have been numerous studies and reviews of the evolution of the huge family of opsin genes that encode the light-absorbing proteins, rhodopsin and its relatives, and this facet of phototransduction will be considered in Section 7. In contrast, there have been far fewer studies of the evolution of the genes that encode the proteins for all the other steps in phototransduction. For several of the individual proteins of the cascade, there have been studies of gene phylogeny, and these will be referred to in the relevant sections below. One of the earliest studies to analyse the evolution of multiple families of phototransduction components was by Hisatomi and Tokunaga (2002), who concluded that the isoforms they found were likely to have duplicated after the ‘prototype’ vertebrate phototransduction cascade had arisen. Then, Nordström et al. (2004) in Uppsala undertook a major study of the gene duplications required to explain the separate isoforms found for nine families of proteins involved in vertebrate phototransduction. They concluded that each of those duplication events appeared to have involved large blocks of genes, and possibly entire chromosomes. Subsequent work from the Uppsala group has greatly extended our understanding of those block duplications (Larhammar et al., 2009; Lagman et al., 2013), especially with respect to the transducins and PDEs (Lagman et al., 2012, 2016). Recently, my colleagues and I have examined the evolution of each of the steps in phototransduction, grouping those steps into activation (Lamb et al., 2016; Lamb and Hunt, 2017), recovery (Lamb et al., 2018b) and Ca-feedback (Lamb and Hunt, 2018). This review aims to draw together, and where possible to extend, all such analyses of the evolution of the overall cascade of vertebrate phototransduction.

2 Background to analysis of phototransduction cascade evolution

The following sub-sections are intended to provide background information for those who are not closely involved in studies of gene evolution, so as to make the subsequent presentations and analyses in Sections 3 to 7 more readily comprehensible to the non-specialist. The raw data for these analyses are the genes (and the entire genomes) of numerous living species, for which the number of adequately annotated assembled genomes is expanding rapidly. To study the evolution of one class of protein (for example, the cGMP phosphodiesterases), one can examine the similarity of the molecular sequences across a wide range of species, to obtain a molecular phylogeny that describes the apparent degree of relatedness of members of the family (Section 2.2). Such a molecular phylogeny will help elucidate the gene duplications that have occurred, and will also imply a pattern of branchings for the species under consideration. Therefore, as a minimum, one needs to be cognisant of the generally accepted pattern of species branchings, as determined from numerous studies of species phylogeny (Section 2.1). In an ideal world, one would wish to reconcile the branching pattern extracted from molecular phylogenetic analysis with the ‘true’ pattern of species branching, and in some cases it is feasible to apply constraints aimed at eliminating serious discrepancies. For now, though, the important point is that one needs to begin with a reliable species phylogeny. It is arguable that the most important factor in the evolution of vertebrates was the occurrence of two rounds of whole-genome duplication (2R WGD) in a chordate ancestor of vertebrates, as originally proposed by Susumu Ohno (1970); see Section 2.3.
This pair of duplication events generated a potential quadruplication of every original gene, and paved the way for enormous diversification because, for example, one of the encoded proteins might retain its ancestral function whereas another copy (or copies) might evolve new or altered functions. Accordingly, for each of the duplication events reported by molecular phylogenetic analysis, an important task is to determine whether it occurred before, during, or after 2R WGD. In many cases, this determination is greatly assisted by examining gene synteny (Section 2.4); that is, by analysis of the locations along the chromosomes of genes of interest relative to other genes. Thus, it is often straightforward to identify a family of paralogs that arose through 2R WGD (so-called ‘ohnologs’, see Section 2.3), in the form of a set of 2, 3 or 4 genes located on chromosomal regions that are also occupied by other neighbouring sets of identified ohnologs. Despite the extensive relocation of genes that has occurred over hundreds of millions of years, there remain distinctive ‘signature’ features of the ancestral arrangement of the four regions of the quadruplicated genome, and these provide important clues to the evolutionary history. Once a particular duplication has been identified as having resulted from 2R WGD, it is then straightforward to assign any other duplications in that gene family as having occurred either before or after 2R WGD.
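The before/after assignment just described is, in essence, an ancestor/descendant test on the gene tree. The following minimal Python sketch assumes that the duplication nodes of a gene tree are stored as a hypothetical child-to-parent mapping, and that one node has already been assigned (for example from synteny) to 2R WGD; all node names are invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical gene tree of duplication nodes, stored as child -> parent (None = root).
PARENT: Dict[str, Optional[str]] = {
    "dup_ancestral": None,           # an early local duplication
    "dup_2R": "dup_ancestral",       # node assigned (e.g. via synteny) to 2R WGD
    "dup_teleost": "dup_2R",         # a later lineage-specific duplication
    "dup_sister": "dup_ancestral",   # a duplication on the sister paralog lineage
}


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return all ancestors of `node`, nearest first."""
    chain = []
    current = parent[node]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def timing_relative_to_2r(dup: str, wgd_node: str,
                          parent: Dict[str, Optional[str]]) -> str:
    """Classify a duplication node relative to the node assigned to 2R WGD."""
    if dup == wgd_node:
        return "2R WGD"
    if dup in ancestors(wgd_node, parent):
        return "before 2R"      # the 2R node lies nested inside this duplication
    if wgd_node in ancestors(dup, parent):
        return "after 2R"       # this duplication lies nested inside the 2R node
    return "unresolved by topology alone"


for node in PARENT:
    print(node, "->", timing_relative_to_2r(node, "dup_2R", PARENT))
```

The last case shows the limit of this simple ancestor test: a duplication on the sister paralog lineage is only ordered once that lineage's own 2R node has also been identified.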

2.1 Species phylogeny

As alluded to above, it is useful and sometimes imperative, in the interpretation of a molecular phylogeny, to consider the species phylogeny of those species whose sequences were analysed. A one-dimensional view of the divergence of other lineages from our own lineage is shown in Figure 2A. The horizontal blue line represents our direct ancestors, plotted along an axis of estimated (and very approximate) time in millions of years ago (Mya). Each blue circle denotes an important divergence of another lineage from our own ancestors; for example, sauropsids (comprising reptiles and birds) diverged from a common ancestor we share with them around 320 Mya. Each horizontally oriented name just below the line names the clade that encompasses all of the lineages to the right of the adjacent dotted vertical line; thus, from at least the time at which sauropsids diverged, our lineage (and theirs) can be referred to as amniotes, a term that encompasses all sauropsids and all mammals.

Figure 2. Species phylogeny

The small yellow and cyan markers after the branching of tunicates show the approximate timing of the postulated two rounds of whole-genome duplication (2R WGD) that are understood to have led to the potential quadruplication of genes in a chordate ancestor of vertebrates (i.e. in a ‘proto-vertebrate’); see Section 2.3. The absolute timing of this pair of duplication events is uncertain, but it occurred after the divergence of tunicates and before the divergence of cartilaginous fish (Putnam et al., 2008). Here this pair of duplications is assumed to have occurred prior to the divergence of agnathan vertebrates, and is shown as having occurred at around 600 Mya; however, other estimates place the duplications at around 500 Mya (Larhammar et al., 2009). A third round of genome duplication (3R) occurred subsequently in teleost fish, with the result that teleosts frequently retain two copies of each gene found in most other vertebrates.
What is missing from the one-dimensional view in Figure 2A is a representation of the multitude of divergences that have occurred within lineages other than our own; instead, only two cases of interest (teleosts and birds) are indicated in this linear plot. For those species whose molecular sequences are used later in this paper for the analysis of sequence phylogeny, Figure 2B shows the currently accepted view of lineage evolution, which will be taken into consideration in interpreting the molecular trees. Note that this panel provides no indication of timing, and instead simply sketches the topology of the divergences of lineages. For more extensive information about species phylogeny and estimates of divergence times, the reader is referred to Erwin et al. (2011), Kumar et al. (2017) and web resources including the Tree of Life Web Project (http://tolweb.org/tree/phylogeny.html) and TimeTree (http://www.timetree.org/).
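Reading a clade name off Figure 2 amounts to finding the smallest named clade that contains a given set of lineages (as with ‘amniotes’ for mammals plus sauropsids). The following is a minimal Python sketch, assuming a simplified and hypothetical species tree with illustrative leaf names; it is not a reproduction of Figure 2B, and it ignores divergence times entirely.

```python
from typing import Dict, Iterable, List, Set

# Simplified, hypothetical vertebrate species tree (clade -> children).
# Leaf names are illustrative only; this is not a reproduction of Figure 2B.
TREE: Dict[str, List[str]] = {
    "vertebrates": ["cyclostomes", "gnathostomes"],
    "cyclostomes": ["hagfish", "lamprey"],
    "gnathostomes": ["chondrichthyans", "osteichthyans"],
    "chondrichthyans": ["elasmobranchs", "chimaera"],
    "elasmobranchs": ["shark", "ray"],
    "osteichthyans": ["actinopterygians", "sarcopterygians"],
    "actinopterygians": ["gar", "zebrafish"],
    "sarcopterygians": ["coelacanth", "tetrapods"],
    "tetrapods": ["frog", "amniotes"],
    "amniotes": ["mammals", "sauropsids"],
    "mammals": ["human", "opossum"],
    "sauropsids": ["chicken", "lizard"],
}


def leaves_of(clade: str) -> Set[str]:
    """All terminal taxa below `clade` (a leaf returns itself)."""
    if clade not in TREE:
        return {clade}
    result: Set[str] = set()
    for child in TREE[clade]:
        result |= leaves_of(child)
    return result


def smallest_clade(taxa: Iterable[str]) -> str:
    """Name of the smallest named clade that contains all of `taxa`."""
    wanted = set(taxa)
    candidates = [c for c in TREE if wanted <= leaves_of(c)]
    if not candidates:
        raise ValueError(f"no clade contains {sorted(wanted)}")
    return min(candidates, key=lambda c: len(leaves_of(c)))


print(smallest_clade(["human", "chicken"]))   # amniotes
print(smallest_clade(["shark", "human"]))     # gnathostomes
print(smallest_clade(["lamprey", "human"]))   # vertebrates
```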

2.2 Molecular phylogeny

Three separate processes are involved in the creation of a molecular phylogeny: (i) selecting (or obtaining) the molecular sequences to be analysed; (ii) aligning those multiple sequences; and (iii) inferring the evolutionary branching pattern that is most likely to have generated the sequences.

Obtaining sequences for early-diverging vertebrate species. Although published databases contain sequences from numerous species, the current coverage of lineages is very non-uniform. In order to improve the prospects for reliable reconstruction of the branching pattern in early vertebrate evolution, it is important to include species from agnathan vertebrates, from cartilaginous fish, and from non-teleost ray-finned fish, but to date these groups have been poorly represented in published databases. Therefore, to help fill the gaps, Lamb et al. (2016) used high-throughput sequencing followed by bioinformatics analysis to obtain the eye transcriptomes of a hagfish, two species of lamprey, three species of shark, and two species of ray, all from Australian waters; in addition, they included two species of non-teleost ray-finned fish from the northern hemisphere (bowfin and Florida gar). Moreover, new high-quality genomes are being added to public databases at an accelerating rate, so that the range of genes that can be examined is continually expanding, which should progressively improve the quality of the molecular phylogenetic analyses that can be achieved.

Selecting an appropriate range of species. To provide a reasonably well-balanced coverage of species across the whole range of vertebrates, each of the phylogenies presented in this review includes (as far as possible) the following jawed vertebrate taxa: three placental mammals; three marsupials; three birds; three other sauropsids (i.e. reptiles); two amphibians; coelacanth; two teleosts; two other ray-finned fish; three sharks, three rays and one chimaera. In many cases it is found that the phylogeny is both clear-cut and informative when the taxa examined are restricted to jawed vertebrates, but in several cases (e.g. the GNATs and PDE6s) it turns out to be more informative to additionally include lampreys (for which data are available from three or four species), and sometimes hagfish.
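The sampling scheme above is essentially a per-group quota, filled ‘as far as possible’. The following is a minimal Python sketch, assuming that the available species have already been sorted into these groups; the group keys, the take-the-first-N selection rule and the example species lists are all hypothetical, since the text does not specify how individual species were chosen.

```python
from typing import Dict, List, Tuple

# Target number of species per jawed-vertebrate group, as listed in the text.
QUOTAS: Dict[str, int] = {
    "placental mammals": 3, "marsupials": 3, "birds": 3,
    "other sauropsids": 3, "amphibians": 2, "coelacanth": 1,
    "teleosts": 2, "other ray-finned fish": 2,
    "sharks": 3, "rays": 3, "chimaeras": 1,
}


def sample_taxa(available: Dict[str, List[str]],
                quotas: Dict[str, int] = QUOTAS
                ) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Take up to the quota of species per group and report how many
    species are missing for each under-represented group."""
    chosen: Dict[str, List[str]] = {}
    shortfall: Dict[str, int] = {}
    for group, n in quotas.items():
        pool = available.get(group, [])
        chosen[group] = pool[:n]
        if len(pool) < n:
            shortfall[group] = n - len(pool)
    return chosen, shortfall


print(sum(QUOTAS.values()))  # 26 jawed-vertebrate taxa when every quota is met

# Usage with hypothetical availability: only two sharks on hand, four rays.
chosen, missing = sample_taxa({"sharks": ["shark_1", "shark_2"],
                               "rays": ["ray_1", "ray_2", "ray_3", "ray_4"]})
print(chosen["rays"], missing["sharks"])
```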
However, the hagfish sequences are frequently found to be highly divergent, and often only a single species is available; in such cases hagfish will be omitted.

Outgroup selection. It is often straightforward to identify a closely related but distinct family (or families) of vertebrate genes that can serve as an outgroup; for example, PDE5 and PDE11 in the case of PDE6. In such cases, a subset of half-a-dozen or so jawed vertebrate sequences can be chosen to form the outgroup. In other cases, as for example with the arrestins, a sufficiently closely related family of vertebrate genes cannot be identified, and in such cases the outgroup will need to comprise related sequences from invertebrate taxa. Wherever possible, the most closely related sequences from tunicates (e.g. Ciona) and lancelets (e.g. Branchiostoma) will be included.

Aligning the multiple sequences. It is important to obtain the best possible alignment, because an ‘incorrect’ alignment (or indeed any change in the alignment) can lead to a significantly different tree. Yet there is no fool-proof approach, nor is there a clear test of whether the alignment produced by one program is genuinely better than that produced by another. Therefore, it currently remains important to inspect the alignment visually and to look for obvious problems. For the phylogenies presented here, the entire alignment has been used, except in the case of the guanylyl cyclases (GCs), where the divergent terminal regions (both N- and C-termini) have been trimmed manually. The choice was made to analyse amino acid sequences rather than nucleotides. One practical reason for avoiding nucleotide sequences is the added complexity and uncertainty involved in aligning codon-based sequences.
But another, possibly more important, reason is that the rapid rate of nucleotide substitution, combined with the long time-scale across the vertebrate branches, means that the nucleotide changes become saturated (because of multiple substitutions at the same site); thus, amino acid sequences are preferable for deep branches. The alignment tool chosen was MAFFT v7.409 (Katoh and Standley, 2013) with its L-INS-i option.

Inferring the phylogenetic tree. The phylogenetic tree presented in Figure 3 is an example for illustrative purposes: firstly, it helps to provide a view of the tree inference process and, secondly, it is useful in interpreting the phylogeny that is obtained. This particular tree was obtained for vertebrate rod transducin alpha subunits (GNAT1s, Gαt1s) and has been extracted from the larger tree for GNATs and GNAIs in Figure 11. In essence, the tree inference process has placed each molecular sequence near its close relatives, in such a manner as to maximise the likelihood of the tree, that is, the probability of obtaining the observed sequences given that tree and a model of amino acid substitution. The process of inferring the maximum-likelihood (ML) tree is complicated, but well studied; vast numbers of alternative branching patterns are examined during the process, and for each such tentative tree the likelihood is calculated. This calculation of likelihood is made in accordance with established models for the probability that, at each of the sites, one particular amino acid might be replaced by another as a result of mutation in the nucleotide sequence. The process of searching for the tree that exhibits the maximum likelihood has a substantial stochastic element, and on repeated trials it does not always yield the same outcome. Because the magnitudes of the estimated likelihoods are extremely small, their values are universally specified logarithmically, as ‘log likelihoods’.

Figure 3. Example of molecular phylogeny (vertebrate rod transducins, GNAT1)

The numbers adjacent to each node in Figure 3 are ‘estimated bootstrap probabilities’, which provide an indication of the percentage chance that the sequences included within each clade have been correctly placed there. Historically, these values have been calculated by a method termed ‘bootstrapping’ (Felsenstein, 1985), whereby sites in the alignment are randomly resampled (with replacement) to generate pseudoreplicates, and then the entire ML tree is re-calculated for each pseudoreplicate; because of its need for repetition, this process can be exceptionally time-consuming.
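Two numerical points in the preceding paragraphs can be made concrete: why likelihoods are handled as logarithms, and what a single classical bootstrap pseudoreplicate is. The following minimal Python sketch uses hypothetical per-site likelihoods and a toy alignment; the costly step of re-inferring the ML tree for each pseudoreplicate is deliberately not shown.

```python
import math
import random
from typing import List

# Hypothetical per-site likelihoods for a 400-site alignment under some tree.
site_likelihoods = [0.01] * 400

product = 1.0
for p in site_likelihoods:
    product *= p                 # the running product underflows to 0.0 in double precision
log_likelihood = sum(math.log(p) for p in site_likelihoods)
print(product, round(log_likelihood, 2))   # 0.0 -1842.07


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """One nonparametric bootstrap pseudoreplicate: sample alignment columns
    (sites) with replacement, keeping the same number of sites."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Toy amino acid alignment (hypothetical sequences, one string per taxon).
toy = ["MKVLA", "MKILA", "MRVLG"]
replicate = bootstrap_replicate(toy, random.Random(1))
print(replicate)
```

Each pseudoreplicate contains only columns that already exist in the original alignment, some repeated and some absent. The RELL-based bootstrap test cited later avoids the expensive re-optimisation by resampling per-site log-likelihoods of fixed trees instead.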
Recently, though, an alternative approach, termed the ‘ultrafast bootstrap approximation’ (Hoang et al., 2018), has been developed, which is orders of magnitude faster yet provides bootstrap estimates that appear to be less biased than those from the classical method. In part because of this speed advantage, and in part because of its thorough tree-searching algorithm, the tree inference tool chosen in this study was IQ-Tree (Nguyen et al., 2015). The protein substitution model used throughout this paper was WAG (Whelan and Goldman, 2001), but closely similar results were obtained using the LG model (Le and Gascuel, 2008). In most cases, this gave a robust phylogeny with high levels of bootstrap support for the major clades and nodes; however, in a few cases where bootstrap support levels were not very high, the calculations were re-run with an allowance for rate heterogeneity across sites (using IQ-Tree’s model option ‘WAG+R4’); use of this option is noted in the legends to Figures 10, 13 and 14.

Interpreting the phylogenetic tree. The example phylogenetic tree in Figure 3 permits a number of interpretations. Firstly, as is also indicated by the collapsed tree in the inset, it shows that jawed vertebrate (i.e. gnathostome) GNAT1s form a clade with unanimous (100%) support, and that hagfish and lamprey GNAT1s likewise form (small) clades with unanimous support. Secondly, it shows that within the jawed vertebrate clade, there is a clear tendency for groupings into the main evolutionary lineages. On the other hand, the placement of some groups (e.g. amphibians relative to sauropsids in Figure 3) does not conform to the accepted position shown in Figure 2. However, the placement of those groups is associated with very low bootstrap support, of 44% and 53%, at the relevant nodes, suggesting that the amphibian and sauropsid sequences could be constrained to their expected positions with very little change in the log likelihood of the tree.
That specific prediction has not been tested here, but other similar constraints on tree topology are indeed tested quantitatively in later sections. A third interpretation stems from the lengths of the branches leading to the jawed vertebrate clade and to the hagfish and lamprey clades, each approaching 0.1 amino acid substitutions per site. This observation indicates that these three clades had each evolved by a moderate amount from their common GNAT1 ancestor (which resulted from 2R WGD) prior to speciation within each group.
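Branch lengths such as the 0.1 quoted above are expected substitutions per site, not observed differences, and the gap between the two is also what underlies the saturation argument given earlier for preferring amino acid to nucleotide sequences. The following minimal Python sketch uses the textbook equal-rate correction (Jukes-Cantor for 4 nucleotide states, and its 20-state analogue for amino acids). This is an illustrative assumption only: the trees here were inferred under the WAG model, whose unequal exchange rates this sketch does not capture.

```python
import math


def expected_p_distance(d: float, n_states: int) -> float:
    """Expected fraction of differing sites after d substitutions per site,
    under an equal-rate, equal-frequency model with n_states states."""
    b = (n_states - 1) / n_states       # ceiling approached at saturation
    return b * (1.0 - math.exp(-d / b))


def corrected_distance(p: float, n_states: int) -> float:
    """Invert expected_p_distance; diverges as p approaches the ceiling b."""
    b = (n_states - 1) / n_states
    if p >= b:
        return math.inf
    return -b * math.log(1.0 - p / b)


# A branch of 0.1 substitutions/site changes roughly 9.5% of amino acid sites.
print(round(expected_p_distance(0.1, 20), 4))   # 0.0949

# Deep divergence: observed nucleotide differences approach the 0.75 ceiling.
for d in (0.5, 1.0, 2.0, 5.0):
    print(d, round(expected_p_distance(d, 4), 3))
```

Once observed differences sit close to the ceiling, small changes in the observed fraction imply large and poorly determined changes in the inferred distance, which is the practical meaning of saturation.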

Constraints on tree topology. As a result of the stochastic nature of the mutation of bases in genes (and thence of residues in proteins), there is always a substantial component of ‘noise’ present in the molecular sequences being analysed, and this leads to uncertainty in the topology of the tree that is inferred. Typically, one finds that minor rearrangements of the tree (e.g. the swapping of neighbouring branches) cause very little change in the calculated log likelihood, and that there exists a ‘landscape’ of slightly different trees that exhibit very similar log likelihoods. In such cases, the estimated bootstrap support is typically quite low at the nodes where swapping has little effect. As a result, one needs to inspect the ML tree carefully, looking both for low support values and for topologies that appear implausible; e.g. for inconsistency with the known species phylogeny, or for inconsistency with the assumptions of 2R WGD. It is often possible to apply a constraint to the topology, in order to eliminate that inconsistency, and then to recalculate the ML tree subject to that constraint (or constraints). If the constrained tree fits better with one’s presumptions, it is then crucial to apply suitable tests of topology to ascertain whether the differences between the trees are due simply to chance. Specifically, one examines the null hypothesis that the constrained tree is just as likely as the unconstrained (ML) tree, and one applies tests of whether this hypothesis should be rejected at an appropriate criterion probability level. IQ-Tree provides three suitable tests via its ‘-z’ option: the Bootstrap Proportion test using the RELL method, giving bp-RELL (Kishino et al., 1990); the Expected Likelihood Weight test, giving c-ELW (Strimmer and Rambaut, 2002); and the Approximately Unbiased test, giving p-AU (Shimodaira, 2002). Only those trees that passed all three of these tests at the 95% confidence level (i.e. with p ≥ 0.05) were considered further.

Summary. Overall, the kind of information that one can obtain from a phylogenetic tree for some protein in which one is interested includes: the number of isoforms of that protein/gene that existed in the ancestral vertebrate organism; the pattern of duplications that formed those ancestral genes from a common precursor; the timing of such duplications, relative to 2R WGD; the extent of change in protein composition that has occurred in those different genes, prior to and also subsequent to the radiation of vertebrates; and any lineages from which the gene has subsequently been lost. In this paper, I re-calculate the molecular phylogenies for the majority of the proteins involved in vertebrate phototransduction, and then discuss the interpretation of the gene duplications likely to have generated each component, with particular attention to the origin of the rod versus cone dichotomy, where that exists.
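The workflow of this section (MAFFT L-INS-i alignment, IQ-Tree inference under WAG with ultrafast bootstrap, a constrained search, and the rule that a constrained tree is kept only if all three topology tests give p ≥ 0.05) can be assembled as command lines. The following Python sketch only builds and, optionally, runs those commands. The file names, output prefixes and replicate counts are hypothetical, and the flags are taken from the MAFFT manual and the IQ-TREE 1.6 documentation rather than from the author's own scripts; they should be checked against the installed versions.

```python
import subprocess
from typing import List

# Flags follow the MAFFT manual (L-INS-i) and IQ-TREE 1.6 documentation.
# IQ-TREE 2 names its binary `iqtree2` and uses `-B` for ultrafast bootstrap.


def mafft_linsi(fasta_in: str) -> List[str]:
    """L-INS-i alignment; MAFFT writes the alignment to stdout."""
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta_in]


def iqtree_ml(aln: str, model: str = "WAG", ufboot: int = 1000) -> List[str]:
    """ML tree with ultrafast bootstrap; pass model='WAG+R4' for rate heterogeneity."""
    return ["iqtree", "-s", aln, "-m", model, "-bb", str(ufboot)]


def iqtree_constrained(aln: str, constraint_tree: str, model: str = "WAG") -> List[str]:
    """ML search restricted to topologies compatible with a constraint tree."""
    return ["iqtree", "-s", aln, "-m", model, "-g", constraint_tree,
            "-pre", "constrained"]


def iqtree_topology_tests(aln: str, trees: str, model: str = "WAG") -> List[str]:
    """Evaluate candidate trees without a new search: RELL, ELW and AU tests."""
    return ["iqtree", "-s", aln, "-m", model, "-z", trees,
            "-n", "0", "-zb", "10000", "-au", "-pre", "tests"]


def passes_all_tests(bp_rell: float, c_elw: float, p_au: float,
                     alpha: float = 0.05) -> bool:
    """Keep a constrained tree only if none of the three tests rejects it."""
    return all(value >= alpha for value in (bp_rell, c_elw, p_au))


def align_and_infer(fasta_in: str, aln_out: str, model: str = "WAG") -> None:
    """Run alignment, then tree inference (requires both programs on PATH)."""
    with open(aln_out, "w") as handle:
        subprocess.run(mafft_linsi(fasta_in), stdout=handle, check=True)
    subprocess.run(iqtree_ml(aln_out, model), check=True)


print(" ".join(iqtree_topology_tests("gnat.aln", "candidates.treels")))
print(passes_all_tests(bp_rell=0.21, c_elw=0.18, p_au=0.07))   # True
print(passes_all_tests(bp_rell=0.21, c_elw=0.03, p_au=0.07))   # False
```

Keeping command construction separate from execution makes the exact flags easy to inspect and log before any long-running job is launched.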

2.3 Individual gene duplication versus whole genome duplication (WGD)

The event that may reasonably be regarded as the most far-reaching in the evolution of vertebrates was the quadruplication of the entire set of chordate genes, as a result of two successive rounds of whole genome duplication (2R WGD). This pair of events occurred after the divergence of tunicates, and most likely prior to the divergence of agnathan vertebrates from our own lineage (Figure 2). This quadruplication of the genome may have given early vertebrates a major advantage over other creatures in the Cambrian seas, and may well have permitted the subsequent radiation of vertebrate species and the great success that vertebrates have subsequently achieved. The first study aimed at reconstructing the chromosomal arrangement (karyotype) of the ancestral proto-vertebrate organism following 2R WGD was undertaken by Nakatani et al. (2007). It provided a view of the very extensive chromosomal reorganisations that have subsequently occurred in different vertebrate lineages, and outlined the considerable difficulties in this kind of analysis. Numerous subsequent studies have extended that work, and interestingly Putnam et al. (2008) have shown that the lancelet genome may provide a good model for the ancestral chordate genome prior to WGD.

The precise timing of the two genome duplications can only be guessed at, but in Figure 2 they are suggested to have occurred around 600 Mya. Given the difficulty of separating the first and second rounds of duplication in molecular phylogenies (see subsequent results), it seems likely that these two duplication events were separated by a relatively short interval (in evolutionary terms), perhaps not exceeding about 10 My. A schematic diagram is presented in Figure 4 with examples of the kinds of events that would have occurred through the combination of three processes: local gene duplication, the pair of genome duplications, and the loss of various genes. The top row indicates a hypothetical initial situation in our chordate ancestor, prior to any genome duplication; the middle pair of rows shows the situation after the first genome duplication, and the bottom four rows show the situation after the second round of genome duplication, and include gene losses that occurred prior to the radiation of vertebrate species. Genes that arose through this process have been referred to as ‘ohnologs’, in honour of Susumu Ohno who proposed the genome duplication mechanism (Ohno, 1970). The vertically arranged sets in the bottom four rows in Figure 4 can be termed ‘ohnolog families’.

Figure 4. Schematic with examples of the combined effects of local gene duplication, whole-genome duplication, and gene loss

In the example scenario in Figure 4, ohnolog families A and F are shown as having retained all four quadruplicate members; in the genomes of extant vertebrates, it is thought that around 200 such families remain (Singh et al., 2015), accounting for around 800 of the roughly 20,000 protein-coding genes. Ohnolog families B, C, D and E depict various ways in which only 3 or 2 members, or even just a single member, might have been retained.
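The bookkeeping behind Figure 4 (two rounds of duplication followed by gene loss) can be sketched in a few lines of Python. The gene names, the suffix convention and the losses below are hypothetical and only loosely echo the ohnolog families of the figure; local gene duplications are not modelled.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_genome(genome: List[str]) -> List[str]:
    """One round of whole-genome duplication: each gene gains a copy,
    distinguished by appending 'a' or 'b' to its name."""
    return [gene + suffix for gene in genome for suffix in ("a", "b")]


# Hypothetical ancestral chordate genes.
ancestral = ["A", "B", "C", "D"]
after_2r = duplicate_genome(duplicate_genome(ancestral))   # 16 genes

# Hypothetical losses before the vertebrate radiation.
lost = {"Bbb", "Cab", "Cbb", "Dab", "Dba", "Dbb"}
survivors = [g for g in after_2r if g not in lost]

# An ohnolog family = surviving descendants of one ancestral gene
# (strip the two one-letter suffixes added by the two rounds).
families: Dict[str, List[str]] = defaultdict(list)
for gene in survivors:
    families[gene[:-2]].append(gene)

print({name: len(members) for name, members in families.items()})
# {'A': 4, 'B': 3, 'C': 2, 'D': 1}

# Scale check using the figures quoted in the text.
quadruplet_fraction = 200 * 4 / 20_000   # 0.04, i.e. about 4% of protein-coding genes
```

Calling `duplicate_genome` once more on a lineage's gene list would mimic the teleost-specific third round (3R), doubling each surviving family again before any further losses.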
In the top row of Figure 4, the curved arrows indicate three example cases in which a local gene duplication had already occurred prior to attainment of the organisation of genes shown in that top row. The scenario for the pair of families (G, H) is likely to account for the observed arrangement of jawed vertebrate genes for the families GNAI and GNAT (see Section 4.1), where three copies of each are retained and where in each case the GNAT gene is adjacent to a GNAI gene. The final two examples, shown by the pairs of gene families (J, K) and (L, M), highlight a limitation in the analysis of phylogeny and synteny. After 2R WGD, the combined families each retain four genes; i.e. two J and two K in the first case, and one L and three M in the second case. As a result, it is very likely that each of these sets of four genes will appear to be ohnologs, especially if the local duplication in the top row occurred only shortly before 2R WGD; hence, on a strict interpretation these are not in fact ‘true’ ohnologs, though for certain purposes they may be regarded as such. One consequence of potential local gene duplications before, and of potential gene losses after, whole genome duplication is that it is entirely possible for a molecular phylogeny to be robust, and yet to show an unexpected relationship between proteins. It is equally possible for an incorrect pattern of branching to be deduced from a sound and convincing phylogeny. A specific case in point will be presented subsequently, where it will be suggested that the scenario in (J, K) may represent the situation for the four vertebrate arrestins (Section 5.2). Finally, it is worth pointing out that some lineages and some species have experienced additional genome duplications. In particular, teleost fish have undergone a third round (3R) of genome duplication, and therefore possess two copies of most of the genes found in other vertebrate species.
In a very different case, Xenopus laevis is allotetraploid, meaning that it possesses about twice as many chromosomes as related diploid species, apparently as a result of the hybridisation of two distinct but related ancestral Xenopus species. Other examples of multiple copies of chromosomes and/or genes abound. Such additional copies of genes can complicate the analysis of molecular phylogeny and gene synteny, and as a result it is often simplest to avoid teleost fish species and, similarly, to use X. tropicalis rather than X. laevis.

2.4

Gene synteny

Chromosomal arrangement of phototransduction genes. Figure 5 shows the chromosomal positions, across four jawed vertebrate species, of a selection of 37 ohnolog families comprising 123 extant genes. In addition to the four families of phototransduction genes (G-protein β subunits, arrestins, visual GCs, and visual GRKs), I have included every additional ohnolog family that I could locate that lay substantially on the same set of chromosomes as the illustrated phototransduction genes, subject to the restriction that each family should comprise at least three extant members. This involved laborious manual searching, using as a basis the sets of ohnolog families identified by Singh et al. (2015; ohnologs.curie.fr); this procedure led to the identification of 33 such ohnolog families. The species selected for examination are spotted gar, chicken, opossum and human. They were chosen firstly because each of these genomes has been assembled into reasonably complete sets of chromosomes (i.e. relatively few genes remain on unplaced scaffolds), and secondly because none has undergone a third round of duplication (thus, teleost fish have been excluded). Figure 5. Synteny of a subset of phototransduction genes across four species Inspection of Figure 5 provides evidence that these 37 gene families form a paralogon (a paralogous chromosomal region derived from a common ancestral region). As a first indication, in each column almost all of the genes reside either on a single chromosome or on just two chromosomes. To help illustrate this, coloured shading has been used to indicate those columns that include only a single chromosome. Furthermore, within each column, most of the genes lie in reasonably close proximity to each other. For example, in the opossum column under ‘Ancestral 1’, all 23 genes reside within a span of 3.3 Mb on opossum chromosome 2. 
Overall, what is important is not just the finding of proximity, but also the fact that all 37 families conform to a broadly similar pattern of gene locations across the four species examined. As a result, this set of 37 gene families shows the hallmark features of having arisen through 2R WGD quadruplication of a single set of ancestral genes, once some allowance is made for rearrangement of genes over the subsequent hundreds of millions of years. On the other hand, one cannot rule out the possibility that local gene duplications and deletions, of the kind indicated in the right-hand sections of Figure 4, might have contributed. Such a high degree of conformity across the four species can only be expected over relatively short stretches of each chromosome, because rearrangements of genes within and between chromosomes have occurred differently in different species (so-called lineage-specific genome rearrangements). An example of two ‘breaks’ in chromosomal coverage can be seen in Figure 5, marked by the horizontal line between the GNBs and GPCs. These particular breaks occur across all four species, and in both of what will subsequently be shown to be regions that arose at the second round of WGD. It is therefore possible that this rearrangement originated in one of the two duplicates that existed during the interval between 1R and 2R. In other cases, breaks can be restricted to a subset of the species examined. If such a break affected only opossum and human, then one might suspect a chromosomal rearrangement in a stem mammal. If it affected chicken, opossum and human, then one might suspect a rearrangement in a stem amniote, and so on. What cannot be determined simply from the arrangement of ohnologs in Figure 5 is which pairs of ancestral chromosomal regions diverged at the first round of genome duplication (1R). To establish this, one additionally needs information about gene phylogeny, as will be examined shortly.
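Two synteny criteria are used above. The genes of a column should be confined to one or two chromosomes, and they should cluster within a short span on each. Once gene coordinates have been tabulated, both criteria can be checked mechanically. The following Python sketch is purely illustrative. The gene names, chromosome labels and coordinates are invented, and the 10 Mb threshold in the hypothetical `looks_paralogous` helper is an arbitrary assumption, not a criterion used for Figure 5.

```python
from collections import defaultdict

# Invented (gene, chromosome, position in bp) records for one column of one species.
column = [
    ("geneA", "2", 100_000_000),
    ("geneB", "2", 101_200_000),
    ("geneC", "2", 103_300_000),
]


def spans_mb(records):
    """Span (in Mb) occupied by the genes on each chromosome of a column."""
    positions = defaultdict(list)
    for _gene, chrom, pos in records:
        positions[chrom].append(pos)
    return {chrom: (max(p) - min(p)) / 1e6 for chrom, p in positions.items()}


def looks_paralogous(records, max_chromosomes=2, max_span_mb=10.0):
    """True if genes sit on at most two chromosomes, each cluster being compact.

    Both thresholds are illustrative assumptions, not published criteria.
    """
    spans = spans_mb(records)
    return len(spans) <= max_chromosomes and all(s <= max_span_mb for s in spans.values())


print(spans_mb(column))          # {'2': 3.3}
print(looks_paralogous(column))  # True
```

The two tests capture only the "proximity" evidence. As the text stresses, the stronger argument is that the same pattern recurs across all four species.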

Gene synteny for multiple phototransduction genes. As a first step in analysing the syntenic arrangement of phototransduction genes, the four principal columns from Figure 5 (labelled ‘Ancestral 1’ to ‘Ancestral 4’) have been converted into the four rows of Figure 6B. For compactness, however, the non-phototransduction families have been restricted to those that retain all four members and that appear to be ‘genuine’ quartets (see below). Figure 6B includes four families of phototransduction genes together with seven families of nearby ohnolog ‘quartets’. The other three panels in Figure 6 likewise present additional regions that contain one or more families of phototransduction genes together with their nearby ohnolog quartets. This gives a total of 13 families of phototransduction genes (comprising 35 phototransduction genes), together with 26 non-phototransduction ohnolog quartets (comprising another 104 genes). Figure 6. Overview of syntenic arrangement of vertebrate phototransduction genes Although the tabulation of gene locations in Figure 5 is provided only for panel B of Figure 6, the genes in each of the other panels are likewise located in close proximity to each other (data not shown). Accordingly, each of the five groupings in Figure 6 is likely to represent a locally paralogous region (i.e. a paralogon); note that there are two such groupings in panel C. Furthermore, analysis of the gene locations in the four taxa provides suggestive evidence that the rows numbered 1 to 4 may be continuous across the panels. In other words, each of the four numbered rows may continue across all four panels. If this interpretation proves to be correct, then the arrangement in Figure 6 would portray a single large paralogon. That paralogon would include 35 phototransduction genes along with hundreds of non-phototransduction genes (of which only 104 are shown). Pairs of rows that diverged at 1R. 
What has not been determined up to this point is which pair of the four rows diverged from which other pair at the first round of genome duplication (1R). Potentially, this question can be resolved by phylogenetic analysis, as will now be addressed. In undertaking such analysis, it is important to concentrate on ‘genuine ohnolog quartets’. These are families that retain all four members and that show no signs of intrusion of invertebrate sequences, or other reasons for rejection. For this reason, only apparently ‘genuine’ quartet families are illustrated in Figure 6. Then, for every genuine quartet of ohnologs, one can calculate the molecular phylogeny for that quartet and, in principle, determine which pairs of rows are sisters in that local vicinity. In Figure 5 there are 10 families that each comprise four members, but three of those families were rejected for the purpose of obtaining genuine quartets. The STAGs had protostome sequences embedded (in the Ensembl Release 98 gene tree). The SOX19 family was found only in some species of bony fish, and it had an additional intron. The TSC22D family was problematic to analyse because it had huge differences in sequence length between clades, and because its TSC22D4 clade only had convincing members within mammals. Accordingly, only the seven remaining quartet families (indicated by grey shading in Figure 5) were transferred to Figure 6B and used in the analysis of phylogeny. The analysis of quartet phylogeny is illustrated in Figure 7 for eight examples of ohnolog quartets taken from Figure 6; two quartets have been taken from each of the four panels, A–D. In each of these eight unconstrained molecular phylogenies there is at least 98% support for the illustrated topology. 
For every one of the 26 ohnolog quartets shown in Figure 6, the phylogenetic analysis supported the pairings indicated by the grey links between genes. For 17 of these quartets the level of support was at least 99%, for another six it was at least 95%, and for the remaining three it was 94%, 92% and 92% (see Supplementary Figure 13). Accordingly, in light of the high support across multiple quartets near each phototransduction gene family, there is extremely strong support for the pairings of phototransduction genes depicted in Figure 6. Note that each phylogeny establishes only which pairs are sisters. The phylogeny cannot allocate the positions of the clades onto rows in Figure 6; that allocation needs to be accomplished by reference to gene synteny relationships of the kind shown in Figure 5. Figure 7. Molecular phylogenies for eight examples of ohnolog quartets Interestingly, the pairings depicted in panel B differ from the interpretation of Lamb and Hunt (2018). That study assumed that the sister relationship of the visual arrestins (Arr-C = ARR3 and Arr-S = SAG) in the molecular phylogeny implied that those two clades had separated at the second round of WGD. However, the overwhelming evidence from the pairings of the seven ohnolog quartets in the vicinity of phototransduction genes in Figure 6B requires a revision of that interpretation. The conclusion is that Arr-C and Arr-S diverged at 1R, with the implication that the visual arrestins and the β-arrestins diverged from each other prior to WGD. Likewise, GRK1A and GRK1B must have diverged at 1R, with the implication that GRK7 and the GRK1s also diverged from each other prior to WGD. The pairings of chromosomal rows illustrated in Figure 6 will provide an important baseline for interpreting each of the molecular phylogenies for phototransduction genes presented subsequently in this paper. It is noteworthy that the 26 quartets of reference ohnolog families shown in Figure 6 represent more than 10% of the entire complement of ohnolog quartets in the genome. Thus, it has been estimated that there are around 200 four-member ohnolog families (~800 genes) amongst the total of around 2000 ohnolog families contained in the genomes of vertebrates (Singh et al., 2015). 
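The logic of the quartet analysis can be made explicit. Four ohnologs on rows 1 to 4 can be split into two sister pairs in only three ways, and each well-supported quartet phylogeny votes for one of those splits. The Python sketch below tallies such votes. The input results and the 90% support cut-off are hypothetical, and the code does not reproduce the row pairings actually found in Figure 6. Row placement must still come from synteny, as noted above.

```python
from collections import Counter

ROWS = (1, 2, 3, 4)


def pairing(a: int, b: int, c: int, d: int) -> frozenset:
    """Represent 'a is sister to b, c is sister to d', independent of order."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def possible_pairings(rows=ROWS) -> list:
    """The three ways of splitting four rows into two sister pairs."""
    first, *others = rows
    result = []
    for partner in others:
        rest = [r for r in others if r != partner]
        result.append(pairing(first, partner, *rest))
    return result


def tally_pairings(quartet_results, min_support: float = 90.0):
    """Count how many sufficiently supported quartets favour each pairing.

    quartet_results: iterable of (pairing, bootstrap support in %).
    Returns (pairing, count) tuples, most frequent first.
    """
    votes = Counter(p for p, support in quartet_results if support >= min_support)
    return votes.most_common()


# Hypothetical results for three quartets (not the Figure 6 data).
results = [
    (pairing(1, 2, 3, 4), 99.0),
    (pairing(2, 1, 4, 3), 97.0),  # same split, listed in a different order
    (pairing(1, 3, 2, 4), 60.0),  # weakly supported, so ignored
]
print(len(possible_pairings()))     # 3
print(tally_pairings(results)[0][1])  # 2
```

The 1R split is then read off as the two pairs in the winning pairing. Each pair corresponds to two rows that separated only at 2R.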
Accordingly, suppose such quartets are distributed randomly in the genome, and Figure 6 has sampled all such quartets in the vicinity of these phototransduction genes (which it may not have done). This might then suggest that prior to 2R WGD the 13 families of phototransduction genes had been situated in a restricted region comprising just over 10% of the ancestral genome. Future extension of the analysis of phototransduction gene synteny. Finally, with regard to gene synteny, it is worth outlining the kinds of methodological approaches that will be required to conduct more comprehensive analyses of gene synteny in the vicinity of phototransduction genes. Ideally, one would wish to find every family of ohnologs that: (i) retains either three or four members; (ii) has all extant members in close proximity to phototransduction genes in the reference taxa; (iii) shows no sign of having invertebrate paralogs within the phylogeny for the family; and (iv) has members present across a wide range of vertebrate species. For analysis of this kind, one very useful resource is Ohnologs v2 (Singh et al., 2015; Singh and Isambert, 2019; ohnologs.curie.fr), which provides browsable and downloadable lists of probable ohnologs from a range of species. Nevertheless, the primary resource for analysis of synteny is Ensembl (Herrero et al., 2016; ensembl.org), where one can browse for genes by species, and then inspect Ensembl’s gene phylogeny. One can readily examine paralogs in the selected species using the viewing option ‘View paralogues of current gene’. It is then usually straightforward to see whether the chosen family is indeed promising as a candidate set of ohnologs. For any set of genes to be considered as ohnologs (i.e. as being ‘2R WGD paralogous’), one should require that the Ensembl gene tree does not show any invertebrate taxa (e.g. Ciona or any protostome species) embedded within the set. 
However, as the Ensembl gene tree is constructed from a limited number of taxa, and as estimates of node support levels are not provided, and also as the placement of invertebrate sequences has in some cases changed in subsequent releases of Ensembl, one needs to be careful in one’s interpretation of whether an apparently intervening branch (typically for Ciona) is genuine or spurious. Thus, detailed analysis is in practice essential in determining whether a candidate gene family can be regarded as a genuine set of ohnologs. The potential intrusion of invertebrate sequences may have only limited impact in a tabulation such as in Figure 5, but it is crucially important to avoid such intrusion in establishing ‘genuine quartet’ families for phylogenetic analysis of the kind shown in Figure 7.
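The test for invertebrate intrusion described above asks a single question. Does the smallest clade of a gene tree that contains all the candidate vertebrate paralogs also contain an invertebrate leaf? The Python sketch below assumes that trees are already represented as nested tuples of leaf names. A real analysis would first parse Newick output, for example from Ensembl. The leaf names are invented. Node support is deliberately ignored, although the text stresses that it is essential in practice.

```python
def leaves(tree):
    """All leaf names below a node; a leaf is a str, an internal node a tuple."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def smallest_clade(tree, targets):
    """Smallest subtree whose leaves include every name in targets, else None."""
    if not targets <= leaves(tree):
        return None
    if isinstance(tree, str):
        return tree
    for child in tree:
        found = smallest_clade(child, targets)
        if found is not None:
            return found
    return tree


def has_invertebrate_intrusion(tree, candidates, invertebrates):
    """True if an invertebrate leaf falls inside the clade of the candidates."""
    clade = smallest_clade(tree, set(candidates))
    if clade is None:
        raise ValueError("not every candidate occurs in the tree")
    return bool(leaves(clade) & set(invertebrates))


# Invented trees: Ciona as outgroup versus Ciona embedded among the paralogs.
clean = ("Ciona|X", (("human|X1", "gar|X1"), ("human|X2", "gar|X2")))
intruded = (("human|Y1", "gar|Y1"), ("Ciona|Y", ("human|Y2", "gar|Y2")))

print(has_invertebrate_intrusion(clean, ["human|X1", "gar|X1", "human|X2", "gar|X2"], ["Ciona|X"]))
print(has_invertebrate_intrusion(intruded, ["human|Y1", "gar|Y1", "human|Y2", "gar|Y2"], ["Ciona|Y"]))
```

With `clean` the check returns False, so the family remains a candidate. With `intruded` it returns True, which would disqualify the family as a set of ohnologs unless the Ciona placement proved spurious.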

At the time of writing, there is a significant issue with Ensembl, which was introduced in Release 94 and remains in the current Release 98. A major change was made to the approach for inferring orthologs and paralogs. As a result, large gene families have been split into multiple smaller ones, so that many paralogy relationships have been lost. Therefore, as of October 2019, there are advantages to the use of Release 93 (from July 2018). Analysis of synteny is currently restricted by the limited number of vertebrate taxa for which the genome assembly is nearly complete, i.e. where there are relatively few genes on unplaced scaffolds. Four species for which the assembly is already substantially complete are those listed in Figure 5: human, opossum, chicken and spotted gar. In addition, anole (Anolis carolinensis) and Xenopus tropicalis currently have quite good coverage, though substantial regions remain as scaffolds. A second non-teleost ray-finned fish for which a substantially complete assembly has recently become available is the reedfish (Erpetoichthys calabaricus), and this species is now included in Ensembl Release 98. An additional important resource is Genomicus (Muffato et al., 2010; genomicus.fr), to which a direct link is provided from Ensembl for each gene. There it is straightforward to view the paralogy of adjacent genes, so as to search manually for potential neighbouring ohnologs. Another useful resource is Synteny Database (syntenydb.uoregon.edu), though this has not been updated since 2015. For the future, it will be important to develop an automated system for locating potential ohnolog families that lie in the vicinity of the families of phototransduction genes shown in Figure 6. Such a system should also search those regions that could convincingly link the four panels together into a single unified paralogon. 
Ensembl’s BioMart facility can be used to download the gene locations and the orthology relationships for species of interest, so that in principle it should be straightforward to automate a search of those candidate ohnolog families identified in Ohnologs v2 (Singh et al., 2015; Singh and Isambert, 2019) to find all those sets that lie close to phototransduction genes, when viewed across multiple species. Such a system would also be useful in attempting to tie into that same paralogon the two families of phototransduction genes that have so far eluded integration: the CNGB1/3 genes, and the GNGT1/2 genes.
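A minimal Python sketch of such an automated search is given below. It assumes two inputs, each already reduced to tab-separated text. The first is a table of ohnolog-family members with columns named `family`, `species`, `gene`, `chromosome` and `start`. The second is a table of phototransduction ‘anchor’ genes with the same columns apart from `family`. These column names, the 5 Mb window and the two-species requirement are my own assumptions. They are not BioMart's export format or any published criterion, and all coordinates in the usage example are invented.

```python
import csv
import io
from collections import defaultdict

WINDOW_BP = 5_000_000  # assumed proximity threshold, not a published value


def load(tsv_text):
    """Parse a tab-separated table into dicts, converting 'start' to int."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row, start=int(row["start"])) for row in reader]


def near_anchor(gene, anchors, window=WINDOW_BP):
    """True if an anchor in the same species lies on the same chromosome within window."""
    return any(
        a["species"] == gene["species"]
        and a["chromosome"] == gene["chromosome"]
        and abs(a["start"] - gene["start"]) <= window
        for a in anchors
    )


def candidate_families(members, anchors, min_species=2, window=WINDOW_BP):
    """Families whose members all lie near anchors in at least min_species species."""
    grouped = defaultdict(list)
    for m in members:
        grouped[(m["family"], m["species"])].append(m)
    passing = defaultdict(set)
    for (family, species), genes in grouped.items():
        if all(near_anchor(g, anchors, window) for g in genes):
            passing[family].add(species)
    return sorted(f for f, species in passing.items() if len(species) >= min_species)


MEMBERS_TSV = "\n".join([
    "family\tspecies\tgene\tchromosome\tstart",
    "F1\thuman\tf1a\t1\t2000000",
    "F1\thuman\tf1b\t12\t6500000",
    "F1\tchicken\tf1a\t21\t900000",
    "F1\tchicken\tf1b\t1\t30000000",
    "F2\thuman\tf2a\t1\t90000000",
    "F2\tchicken\tf2a\t21\t1500000",
])
ANCHORS_TSV = "\n".join([
    "species\tgene\tchromosome\tstart",
    "human\tp1\t1\t1000000",
    "human\tp2\t12\t7000000",
    "chicken\tp1\t21\t1000000",
    "chicken\tp2\t1\t31000000",
])

print(candidate_families(load(MEMBERS_TSV), load(ANCHORS_TSV)))  # ['F1']
```

Family F2 is rejected because its human member lies far from every anchor, so it passes in only one species. Criteria (iii) and (iv) listed earlier would still need separate checks on each surviving family.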

3

Evolution of G-proteins and origin of the proto-vertebrate phototransduction cascade

Before investigating the evolution of each of the components of the vertebrate phototransduction cascade, it will be helpful firstly to examine the evolution of G-proteins, and then secondly to consider how it was that a phototransduction cascade utilising transducin and PDE6 came to be present in a chordate organism. Thereafter, it will be possible to examine the individual steps with this overview in mind.

3.1

Overview of G-protein evolution

G-protein alpha subunits are classified into five primary families: Gαs, Gαq, Gαi, Gα12 and Gαv. The Gαi family includes Gαo and Gαt (transducin), as well as Gαi itself, and different members of these three clades are utilised for phototransduction in ciliary photoreceptors across invertebrate as well as vertebrate taxa (reviewed in Terakita et al. (2012)). The Gαq family includes Gα11, Gα14 and Gα15; members of the Gαq family are utilised for phototransduction in protostome rhabdomeric photoreceptors as well as in melanopsin-expressing ipRGCs (intrinsically photosensitive retinal ganglion cells) of vertebrates.

The evolution of these Gα genes has been investigated by Lagman et al. (2012), Lamb et al. (2016) and Lokits et al. (2018), and the updated model proposed in the last of these studies is reproduced here as Figure 8. In this diagram, the prefix ‘pre’ denotes genes predating 2R WGD, and throughout the diagram ‘Gα’ has been omitted, so that (for example) ‘T2’ denotes Gαt2. Working from left to right, this scheme depicts a very ancient duplication generating the Gαs and Gαi/q families. This is followed by another pre-metazoan duplication generating the Gαq and Gαi families, and a third duplication to form the Gαo branch (preO). During metazoan (animal) evolution, a tandem duplication of the Gαi gene (preI) occurred, forming preI' and preI''. Both subsequently quadruplicated during 2R WGD to generate the Gαi and Gαt isoforms of the vertebrate lineage. Importantly, the three surviving pairs of quadruplicated genes in jawed vertebrates have remained adjacent to each other in numerous taxa, as GNAI1-GNAT3, GNAI2-GNAT1, and GNAI3-GNAT2 (see bottom right of Figure 8). Figure 8. Evolution of G-protein alpha subunits, as proposed by Lokits et al. (2018)

3.2

Origin of the proto-vertebrate phototransduction cascade

Lamb and Hunt (2017) proposed a scenario for the origin of the proto-vertebrate (i.e. pre-2R WGD) phototransduction cascade, based on successive modifications of a postulated ancestral deuterostome phototransduction cascade. A slightly revised form of that proposal is set out in Figure 9. Ancestral cascade. The postulated ancestral form of the phototransduction cascade is shown in Figure 9A. It is based on analogy to the cone/rod cascade, together with the limited information that exists in relation to extant invertebrate deuterostome ciliary photoreceptors. The following steps are proposed to have occurred. Upon absorption of light, the activated ciliary opsin (R*) activated an inhibitory G-protein (Gi). The activated alpha subunit of that G-protein (Gαi*) then inhibited adenylyl cyclase (AC), which in darkness had been synthesising cyclic AMP (cAMP). A cyclic nucleotide phosphodiesterase (PDE), possibly the common ancestor of vertebrate PDE5/6/11, hydrolysed cytoplasmic cAMP in a manner that was not light-dependent. In darkness, the high cytoplasmic concentration of cAMP caused cyclic nucleotide-gated channels (CNGC) to open, whereas in light the decreased AC activity lowered the cAMP concentration, leading to channel closure. In many respects this postulated ancestral cascade resembles an inhibitory version of the canonical transduction cascade of vertebrate olfactory receptor cells. Figure 9. Postulated origin of the proto-vertebrate phototransduction cascade The basis for this proposal, and especially for the idea that the ancestral cascade utilised cAMP rather than cGMP as the cytoplasmic messenger, included consideration of the following pieces of circumstantial evidence. (i) The G-protein alpha subunit employed in cones and rods, Gαt, arose as a result of an ancient duplication of the gene for an inhibitory subunit, Gαi. (ii) Ciliary photoreceptors of extant lancelets express a C-opsin together with a Gαi (Vopalensky et al., 2012). 
(iii) The catalytic subunits of the vertebrate photoreceptor phosphodiesterase, PDE6, are close relatives of PDE5 and PDE11, the latter of which is a dual cAMP/cGMP phosphodiesterase. (iv) Cyclic nucleotide-gated channels are typically responsive to both cAMP and cGMP, though with differing binding constants. (v) In measurements of electrical responses from tunicate photoreceptor cells, the membrane conductance was found to decrease (Gorman et al., 1971). This is just what occurs in vertebrate photoreceptors, and it is consistent with closure of ion channels. A potential limitation of this ancestral system may have been saturation at high light intensities (e.g. daylight). The cyclase activity would then have been extremely low, and the concentration of cAMP consequently too low to cause appreciable opening of channels. This saturation might have been alleviated to some extent if the PDE activity had been inhibited by another molecule, possibly on a diurnal cycle. It was suggested that this inhibitory molecule might have been the ancestral PDEγ, and its emergence in the scenario is indicated by [γ] in Figure 9A. Transition to proto-vertebrate cascade. Subsequently, the gene for the G-protein alpha subunit underwent a tandem duplication, to form Gαi' and Gαi'' (in notation similar to that of Lokits et al. (2018); see Figure 8). As indicated in Figure 9B, it is now proposed that both of these alpha subunits continued to be expressed in the photoreceptor cell. One of these (Gαi') is assumed to have retained its original function, whereas the other (Gαi'') may have evolved an interaction with the proto-PDEγ, as indicated by the red arrow. Specifically, if there was any degree of affinity between those two molecules, then binding of activated Gαi'' to the proto-PDEγ may have reduced the extent of inhibition of the PDE. Such a reduction in inhibition would have amounted to a light-induced activation of the PDE. It would have reinforced the light-induced reduction in AC activity in lowering cAMP concentration, thereby increasing the size of the light-induced effect and presumably providing an advantage to the organism. With subsequent mutations, this newer mechanism may have become more effective than the older mechanism. In that case the interaction between Gαi' and AC would have been of little use, and the older pathway may have declined in importance and eventually ceased to function, as indicated in Figure 9C. In parallel with these changes, there could have been a transition from cAMP to cGMP as the dominant cytoplasmic messenger. If a guanylyl cyclase were expressed in the cell (Figure 9B), then the cGMP that was synthesised may have functioned just as effectively as cAMP. The PDE quite probably hydrolysed both molecules, and the channels quite probably bound both. 
Then, if the GC happened to have some other advantage over the AC, for example through more effective Ca-feedback regulation, there would have been no reason for the AC to continue to contribute, and it may simply have ceased to be expressed. At that stage (Figure 9C), the cascade would effectively have completed its transition to the proto-vertebrate form; the G-protein and the PDE that then existed could therefore be designated as Gαt and PDE6, respectively. While the scenario described above is entirely hypothetical, it nevertheless provides a plausible framework on which to hang ideas, and it allows the formulation of tests of the hypothesis. Perhaps the first such test would be to examine whether the photoreceptors of any extant deuterostomes utilise adenylyl cyclase. Irrespective of the origin of the proto-vertebrate transduction cascade, however, it remains important to examine how the genes encoding those various proteins evolved in the early vertebrate lineage. The following sections delve into the evolution of each of the individual steps, and identify those changes that occurred before, during, and after 2R WGD.
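A toy steady-state model can illustrate the quantitative intuition behind this scenario. It is not taken from Lamb and Hunt (2017). Suppose the cyclic nucleotide is synthesised at rate α and hydrolysed with rate constant β. Its steady-state concentration is then α/β, and the fraction of open channels can be approximated by a Hill function. In the Python sketch below, light reduces α in the ancestral cascade (Gαi* inhibiting AC). In the transitional cascade, light also increases β (Gαi'' relieving PDEγ inhibition). All parameter values, the Hill coefficient of 2 and the saturating form of PDE activation are arbitrary assumptions chosen for illustration, not measurements.

```python
def steady_state(alpha: float, beta: float) -> float:
    """Steady state of d[cN]/dt = alpha - beta * [cN]."""
    return alpha / beta


def open_fraction(conc: float, k_half: float = 1.0, hill: float = 2.0) -> float:
    """Fraction of CNG channels open; the Hill form and constants are assumptions."""
    return conc**hill / (conc**hill + k_half**hill)


def relative_current(intensity: float, pde_activation: bool = False,
                     alpha_dark: float = 2.0, beta_dark: float = 1.0,
                     i_half: float = 1.0, max_pde_gain: float = 4.0) -> float:
    """Current in light relative to the dark current (1.0 = dark level)."""
    # Ancestral step: light-activated Gαi* inhibits adenylyl cyclase (lower alpha).
    alpha = alpha_dark / (1.0 + intensity / i_half)
    beta = beta_dark
    if pde_activation:
        # Later step: Gαi'' relieves PDEγ inhibition, raising hydrolysis (higher beta).
        beta *= 1.0 + max_pde_gain * intensity / (intensity + i_half)
    dark = open_fraction(steady_state(alpha_dark, beta_dark))
    return open_fraction(steady_state(alpha, beta)) / dark


for i in (0.0, 1.0, 10.0):
    print(i, round(relative_current(i), 3), round(relative_current(i, True), 3))
```

With these arbitrary settings, a unit intensity lowers the current to 0.625 of its dark level in the ancestral form, but to 0.125 when PDE activation is added. The adding of PDE activation thus enlarges the light-induced effect, as the scenario proposes. In both forms the current approaches zero at high intensities, which mirrors the saturation discussed above.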

4

Evolution of the activation steps of vertebrate phototransduction

The evolution of the protein families mediating the activation steps of vertebrate phototransduction will now be examined separately for: the transducins (Sections 4.1, 4.2); the cGMP phosphodiesterases (Sections 4.3, 4.4); and the cyclic nucleotide-gated channels (Section 4.5). Much of this analysis will relate to the manner in which the genes encoding those proteins expanded during 2R WGD.

4.1

Transducin alpha subunits (GNAT1–3)

Background. As mentioned above, jawed vertebrates possess three genes for transducin alpha subunits (Gαt), and analysis of phylogeny and gene synteny has shown that these arose during 2R WGD (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2012). Each gene has a distinct role. GNAT1, encoding rod transducin, Gαt1, is used in rod photoreceptors. GNAT2, encoding cone transducin, Gαt2, is used in cone photoreceptors. GNAT3, encoding gustducin, Gαt3, is used in parietal eye photoreceptors and in some taste receptor cells. Across most vertebrate species, each of these genes is located in close proximity to one of three GNAI genes, encoding the alpha subunit Gαi of an inhibitory G-protein. This is consistent with the concept that GNAI and GNAT arose by tandem duplication of an ancestral gene prior to 2R WGD, as shown schematically in Figure 8. Molecular phylogeny. The molecular phylogeny of vertebrate Gαt and Gαi subunits has previously been examined (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2012; Lamb et al., 2016; Lamb and Hunt, 2017; Lokits et al., 2018). Here, an updated analysis is presented in Figure 10A. The phylogeny is shown as a collapsed tree, calculated for a large family of vertebrate GNATs and GNAIs, using as outgroup a smaller selection of vertebrate GNAQs/GNA11s/GNA14s, and also including a selection of GNAOs. To put this tree into context, it is helpful to note that the uppermost three blue clades (GNAT1s) correspond exactly to the whole of the GNAT1 sub-tree in Figure 3. Figure 10. G-protein alpha subunits (Gαt, Gαi) For the GNATs, each of the six coloured clades has bootstrap support of at least 99%. In addition, the branching pattern for the GNATs generally has very high support, apart from the two nodes around GNAT3, which have only 63% and 87% support. Ordinarily, this would be an insufficient level of support to give one much confidence in the illustrated branching pattern. However, the three GNAI-GNAT pairings provide crucial additional information. Thus, GNAI1 and GNAI3 are supported unanimously as sister clades. As a result, we can have confidence that their syntenic neighbours, GNAT3 and GNAT2 respectively, are likewise sisters. 
Accordingly, the illustrated phylogeny provides strong support for the 1R and 2R duplications indicated by the yellow and cyan highlights, respectively, in Figure 10A. Additional powerful support comes from the pairings of the nearby quartet genes (PLXNAs, GRMs and TFs) as shown previously in Figure 6 and Figure 7. Deduced gene duplications and losses. The most parsimonious interpretation of the branchings in Figure 10A is indicated by the highlighted ‘1R’ and ‘2R’ annotations. The corresponding gene duplications and losses that are assumed to have given rise to vertebrate GNATs (and GNAIs) are indicated explicitly in Figure 10B. Prior to 2R WGD, an ancient Gαi gene underwent a tandem duplication, forming the adjacent genes GNAI and GNAT. During the course of 2R WGD, both these genes quadruplicated, and they remained adjacent. After the second round, however, jawed vertebrates lost both members on one chromosome (row 3), GNAI4 and GNAT4. In contrast, Lokits et al. (2018) report that agnathans lost only GNAT4 on that chromosome; in addition, agnathans lost GNAT2 on another chromosome (row 1). It must be emphasised, however, that gene synteny for agnathan species has not been analysed here, and so any conclusions about agnathan gene duplications and losses remain very preliminary. The arrangement of jawed vertebrate genes onto the four rows in Figure 10B conforms with the summary diagram of gene synteny in Figure 6D. Evolution of the proto-vertebrate GNAT. In Figure 10A, the limb labelled ‘GNAT (= preI'')’ is long, corresponding to ~0.3 amino acid substitutions per site. This indicates that the proto-vertebrate GNAT underwent very substantial evolution prior to genome duplication. As will be shown in Section 4.3, this change appears to have occurred contemporaneously both with a substantial evolution of the PDE catalytic subunit and with the appearance of the PDE inhibitory subunit.
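The duplication-and-loss scheme of Figure 10B amounts to simple bookkeeping. The Python sketch below copies the tandem GNAI-GNAT block onto each of the four post-2R rows. It then applies the lineage-specific losses stated in the text. Jawed vertebrates lose both genes on row 3, and agnathans lose the GNATs on rows 3 and 1 (which the text notes is preliminary). The function names are hypothetical, and no attempt is made to assign named isoforms to rows 2 and 4.

```python
from itertools import product

ROWS = (1, 2, 3, 4)


def quadruplicate(local_block):
    """Copy every gene of a local block onto each of the four post-2R rows.

    Genes keep their local order, so tandem neighbours remain adjacent.
    """
    return [(gene, row) for row, gene in product(ROWS, local_block)]


def retain(genes, losses):
    """Drop the (gene, row) copies lost in a lineage."""
    return [g for g in genes if g not in losses]


def adjacent_pair_rows(genes, pair=("GNAI", "GNAT")):
    """Rows on which both members of the tandem pair survived."""
    return [r for r in ROWS if all((name, r) in genes for name in pair)]


after_2r = quadruplicate(["GNAI", "GNAT"])
jawed = retain(after_2r, {("GNAI", 3), ("GNAT", 3)})
agnathan = retain(after_2r, {("GNAT", 3), ("GNAT", 1)})

print(adjacent_pair_rows(jawed))     # [1, 2, 4]
print(adjacent_pair_rows(agnathan))  # [2, 4]
```

For jawed vertebrates this bookkeeping yields three adjacent GNAI-GNAT pairs, matching the three pairs listed for Figure 8. Under the agnathan losses it yields four GNAIs but only two GNATs.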

4.2

G-protein beta subunits (GNB1–4)

The molecular phylogeny of the four G-protein beta subunits Gβ1–Gβ4, encoded by GNB1–GNB4, was examined by Lagman et al. (2016). The topology presented in their Fig. 2 is confirmed here in the updated phylogeny of Figure 11A, with higher levels of bootstrap support. The four vertebrate clades are each supported at a bootstrap level of at least 97%, and they clade together with unanimous support. On the other hand, there is only 89% support for the pairing of GNB2 with GNB4. Figure 11. G-protein beta subunits (GNB1–4) The syntenic arrangement of the four genes was also examined by Lagman et al. (2012), and was subsequently investigated by Lamb and Hunt (2018). The latter study proposed that GNB1 and GNB3 diverged at 1R on the basis of the presumed pairings of rows shown in their Figure 1. However, from the new evidence provided here in Figure 6 and Figure 7, it seems clear that GNB1 and GNB3 instead diverged at the second round, as is indicated in both panels of Figure 11. Accordingly, it now appears that the four vertebrate GNB genes arose via the simplest form of WGD quadruplication, without prior local duplication and without loss of any genes. The G-protein gamma subunits will not be considered here. As discussed by Lagman et al. (2012), their sequences are short (around 70 residues) and relatively similar to one another, so there is insufficient phylogenetic information to allow meaningful conclusions to be drawn. However, phylogenetic analysis of nearby ohnolog quartets (including those containing COL1A2, ITGA8, NFE2L3, CACNB2) shows that GNGT1 and GNGT2 diverged at the second round of WGD (not shown).

4.3

PDE catalytic subunits (PDE6A,B,C)

Background. Jawed vertebrates possess three PDE6 genes encoding catalytic PDE subunits, and analysis of synteny has again shown that these arose during 2R WGD (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2016). PDE6A and PDE6B encode α and β subunits, which together form a heterodimeric catalytic unit in rod photoreceptors. PDE6C encodes the α' subunit, which forms a homodimeric catalytic unit in cone photoreceptors. The ancestral vertebrate PDE6 gene originated by duplication from a common ancestral gene for PDE5, PDE6 and PDE11. No gene that clades with the PDE6s has been found outside of vertebrates. Molecular phylogeny. The molecular phylogeny of the PDE6s has previously been investigated (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2016; Lamb et al., 2016; Lamb and Hunt, 2017), and an updated analysis is presented in Figure 12A. This unconstrained tree was obtained using a set of 63 vertebrate PDE6 sequences. It also included the only two invertebrate sequences that could be found to exhibit moderate similarity (from Ciona), with 14 vertebrate PDE5A and PDE11 sequences making up the outgroup. The unconstrained tree inference placed PDE6A and PDE6B as sisters, with unanimous support. Jointly, they were placed as sisters to the other PDE6s, comprising PDE6C together with PDE6X. Figure 12. PDE catalytic and inhibitory subunits (PDE6s, PDE6γs) Among cyclic nucleotide PDEs, the only known case in which the catalytic domain comprises a heterodimer occurs in jawed vertebrate rods, with PDE6A + PDE6B. In all other PDEs the catalytic site is formed by a pair of identical subunits, i.e. a homodimer. PDE6C is the only PDE expressed in jawed vertebrate cones. In addition, it is the only PDE6 gene found in the lamprey Mordacia mordax, a species that has only a single class of cone-like photoreceptors (Collin et al., 2004), and that expresses LWS as its only opsin (Lamb et al., 2016). 
Furthermore, in hagfish (which have only a single class of rod-like photoreceptors), there is only a single isoform, PDE6X. Taken together, these observations led Lamb et al. (2016) to propose that agnathan rod-like photoreceptors employ PDE6X as a homodimer, and that lamprey cone-like photoreceptors employ PDE6C as a homodimer.

Gene duplications. It has previously been established that the PDE6A/B/C genes reside in a paralogon (Nordström et al., 2004), though it was not possible to link that region to other regions containing phototransduction genes. However, gene synteny analysis, with results similar to those in Figure 5, shows that PDE6A and PDE6B are located close to ohnolog quartets that diverged at the second round (e.g. the GABRAs, FGFRs and PSDs), and also close to the CNGA family (see Figure 6C). Hence, the gene synteny results and the phylogenetic results both indicate that the most parsimonious account of the observed molecular phylogeny is the one indicated by the highlighted ‘1R’ and ‘2R’ labels in Figure 12A, and shown explicitly in Figure 12B. Prior to 2R WGD, a PDE6 gene had already arisen, by duplication from a common ancestral PDE5/6/11 gene. As shown by the phylogeny in Figure 12A, the limb of the PDE6 branch is long, at ~0.4 residue substitutions per site, indicating that extensive evolution occurred after that earlier duplication but prior to 2R WGD. Then, as a result of 2R WGD, the gene quadruplicated. Agnathan vertebrates lost two isoforms, retaining only PDE6X and PDE6C, which both continued to function as homodimers. Jawed vertebrates lost PDE6X, while PDE6A and PDE6B evolved to function as a heterodimer in rods. Sauropsids (birds and reptiles) subsequently lost the PDE6A gene, and as a result their rods are presumed to utilise PDE6B as a homodimer.

4.4 PDE inhibitory subunits (PDE6G,H,I)

Background. PDE6 is unique amongst PDEs in being regulated by an inhibitory γ-subunit, PDEγ, a short peptide of 80–90 residues. These γ-subunits play a crucial role in phototransduction: in the resting state they inhibit the PDE’s catalytic activity, but they permit activated transducin to bind, thereby relieving the inhibition and activating the hydrolysis of cGMP. Additionally, PDEγ plays a role in accelerating the GTPase activity of activated transducin, and hence in speeding the shut-off reactions (see Section 5.3). Classically, two PDEγ isoforms have been described, encoded by PDE6G in rods and by PDE6H in cones. However, Lagman et al. (2016) clearly established that a third isoform, encoded by PDE6I, is expressed in a number of species, and they showed that the three genes arose through expansion during 2R WGD. To date, there have been no reports of homologous sequences outside of vertebrates; thus, it seems that a PDEγ somehow just “appeared” during proto-vertebrate evolution. On the other hand, the motif on the PDE catalytic subunits to which the γ-subunits bind appears to have existed in the ancestral PDE catalytic sequence long before the emergence of chordates (Zhang and Artemyev, 2010). Thus, reconstruction of the probable sequence of the ancestral PDE5/6/11 enzyme indicated that it contained the signature ‘Ile-Pro-Met’ (IPM) motif to which the C-terminus of modern PDEγs binds. It has been noted that the C-terminus of the long form of RGS9 (RGS9-L) plays a similar role to PDEγ in accelerating the GTPase activity of activated G-protein α-subunits. Taking this together with the finding that the genes for RGS9 and PDE6G are located on the same chromosome, in moderate proximity to each other, Martemyanov et al. (2008) proposed that PDEγ may have originated from RGS9, or vice versa. This possibility is explored below.

Molecular phylogeny. As has previously been reported (Lagman et al., 2016; Lamb et al., 2016), the sequences for PDEγ inhibitory subunits are so short (<90 residues), and so highly conserved, that it has not been possible to obtain particularly informative molecular phylogenies. Nevertheless, from sequence alignments, Wang et al. (2019) have recently identified several sites that differ characteristically between cone and rod isoforms, as well as several sites that are very tightly conserved in rod isoforms.

Gene synteny. Lagman et al. (2016) showed that the PDE6G/H/I genes reside within a paralogon that they had previously characterised as containing the somatostatin receptor genes SSTR2/3/5 and the urotensin receptor genes (Ocampo Daza et al., 2012; Tostivint et al., 2014). The genes in the vicinity of PDE6G/H/I were also recently examined by Lamb and Hunt (2018) in their Fig. 2, and the syntenic arrangement of the phototransduction genes in this region is summarised in Figure 6A. Importantly, that analysis shows the mutual proximity of the three ohnolog families encoding RGS9/11, the PDEγs, and visinin/recoverin. Of particular note, RGS11 and the newly discovered PDE6I gene are immediately adjacent on the same strand in four species in which both exist, namely spotted gar (L. oculatus), reedfish (Erpetoichthys calabaricus), xenopus (X. tropicalis) and nanorana (N. parkeri). For example, in spotted gar, RGS11 and PDE6I reside on the forward strand of chromosome LG13 at start positions of 12.108 and 12.132 Mb, respectively.

Origin of the inhibitory subunits. Previously (prior to the discovery of the third isoform, PDE6I), Martemyanov et al. (2008) had noted the relative proximity of the RGS9 and PDE6G genes; e.g. on human chromosome 17, where they are ~16 Mb apart. Combining this with the knowledge that the PDE inhibitory subunits play a role comparable to that of the C-terminus of the long isoform of RGS9 (termed RGS9-L or RGS9-2) in accelerating GTPase activity, they speculated that one of these genes may have evolved from the other. Given that RGS11 and PDE6I have now been found to be arranged head-to-tail in four taxa, it seems almost certain that the pre-2R arrangement likewise placed the ancestral RGS9/11 and PDE6G/H/I genes in the same configuration. Accordingly, as indicated by the curved arrow in Figure 12C, it seems highly plausible that the ancestral PDE inhibitory gene originated from a local duplication of the tail of the ancestral RGS9/11 gene. This duplication probably occurred after tunicates had diverged from our own lineage.

4.5 Cyclic nucleotide gated channels (CNGA1–4, CNGB1,3)

Background. The cyclic nucleotide-gated ion channel of jawed vertebrate photoreceptors and olfactory receptor neurons is a heterotetramer comprising α and β subunits encoded by CNGA and CNGB genes, both classes of which expanded during 2R WGD (Nordström et al., 2004). Rod channels comprise three α1 subunits and one β1 subunit, encoded respectively by the CNGA1 and CNGB1 genes, whereas cone channels comprise two α3 and two β3 subunits, encoded by CNGA3 and CNGB3. The channels of canonical olfactory receptor neurons comprise two α2, one α4, and one β1 subunit. The duplication that gave rise to the α and β branches was very ancient, and occurred before the protostome-deuterostome split. Furthermore, three lines of evidence make it clear that CNGA4 (which is expressed only in olfactory receptor neurons) also diverged from the other CNGA genes prior to the protostome-deuterostome split. The first was the observation by Kaupp and Seifert (2002) that CNGA4 has three additional introns in its C-terminal region; the second came from inspection of the syntenic arrangement of the CNGA genes reported by Nordström et al. (2004) in their Fig. 6b; and the third came from the basal branching position of CNGA4 in the molecular phylogeny (see below). Subsequently, during 2R WGD, the other α branch (denoted here as αQ) and the β branch both expanded, giving rise to three and two extant isoforms, respectively.

Molecular phylogeny. Two recent examinations of molecular phylogeny (Lamb et al., 2016; Lamb and Hunt, 2017) came to slightly different conclusions about the origin of the three αQ isoforms during 2R WGD, with two alternative positions for CNGA1: either as sister to the pair CNGA2 + CNGA3, or instead as sister to CNGA3. Upon further examination here, the former proposal appears to be the correct one.
Figure 13 presents the molecular phylogeny obtained for a set of 152 vertebrate CNGC sequences, with seven invertebrate sequences, and with the four human HCN sequences as outgroup.

α subunits. The unconstrained maximum likelihood (ML) topology is shown in Figure 13A; it places CNGA1 as sister to CNGA2 + CNGA3, and this topology is supported at a level of 95%. When CNGA1 was instead constrained to be sister to CNGA3, that topology could not be ruled out on the basis of phylogeny alone, as the change in log likelihood was only ΔLogL = 6.7, and the constrained tree passed all three tests of topology, with the approximately unbiased probability being p-AU = 0.3 (well above the rejection level). However, that topology was rejected by the analysis of synteny shown in Figure 6.

Figure 13. Cyclic nucleotide gated channel subunits (CNGCα, CNGCβ)

Gene synteny for α subunits. As indicated in Figure 6C, the phylogeny of the nearby ohnolog quartets (e.g. SPRYs and TyrKs) shows that CNGA2 and CNGA3 are sisters. Furthermore, the CNGAQ family resides close to the arrestins, visual GRKs and visual GCs (with CNGA2 just 2.3 Mb from Arr3, and CNGA3 less than 6 Mb from GRK1A, in spotted gar), even though those families have been placed in the previous panel, Figure 6B. Hence the combination of molecular phylogeny and gene synteny provides powerful evidence for the gene duplication topology shown in Figure 13B.

β subunits. Each of the four β-subunit clades (two agnathan and two jawed vertebrate) is supported unanimously, as is the split between the β1 and β3 divisions. For the CNGB3s, the lamprey and jawed vertebrate clades group together with 97% support, whereas for the CNGB1s support for the agnathan and jawed vertebrate clades grouping together is somewhat lower, at 85%, possibly because of the inclusion of a single hagfish sequence. It is clear that the β1 and β3 branches diverged during 2R WGD, and analysis of the phylogeny of nearby ohnolog quartets (e.g. NDRG1–4, RRAD; not shown) shows that they diverged at 1R, as indicated in Figure 13B.

Summary of CNGC evolution. The origin of CNG channel genes is summarised in Figure 13B. The α and β branches of the family arose anciently, probably long before protostomes diverged from our own lineage around 750 Mya.
In the α branch, another duplication prior to the divergence of protostomes generated the α4 and αQ genes. Subsequently, during 2R WGD in the proto-vertebrate lineage, the αQ gene quadruplicated, and three of the resulting genes have survived in jawed vertebrates, as α1, α2 and α3. It is now clear that α1 diverged from α2/α3 at the first round of WGD. In the β branch, the two extant genes arose from a duplication during 2R WGD, and it is now clear that their divergence occurred at the first round. It seems reasonable to speculate that the ancestral CNG channel was formed by two α and two β subunits. For olfactory receptor cells, the channel in early deuterostomes is likely to have been formed by replacing one β subunit with an α4 subunit, to give a channel with 2 αQ + 1 α4 + 1 β subunit. In photoreceptor cells, it seems likely instead that the channel in early deuterostomes comprised 2 αQ + 2 β subunits. This configuration was presumably maintained in cones after whole genome duplication, as 2 α3 + 2 β3 subunits, whereas rods came to utilise channels comprising 3 α1 + 1 β1 subunits. Finally, it is worth bearing in mind that photoreceptor CNG channels participate not only in the activation steps of phototransduction but, owing to their high permeability to Ca2+ ions, they also play a crucial role in the Ca2+-feedback regulation of the cascade (covered in Section 6) and hence in photoreceptor light adaptation.
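
The subunit compositions described in this section can be collected into a small lookup table, which makes it easy to confirm that every configuration, extant or proposed, is a tetramer, and to compare the proportion of β subunits in each. This is a minimal Python sketch: the dictionary names and the helper functions are hypothetical, and the two early-deuterostome entries encode only the speculative configurations proposed above, not measured stoichiometries.

```python
from collections import Counter

# Extant jawed-vertebrate CNG channel stoichiometries, as stated in the text.
EXTANT = {
    "rod": Counter({"CNGA1": 3, "CNGB1": 1}),
    "cone": Counter({"CNGA3": 2, "CNGB3": 2}),
    "olfactory": Counter({"CNGA2": 2, "CNGA4": 1, "CNGB1": 1}),
}

# Speculative early-deuterostome configurations (hypothetical labels:
# "alphaQ" = pre-2R ancestor of CNGA1/2/3, "beta" = pre-2R ancestor of CNGB1/3).
PROPOSED_ANCESTRAL = {
    "photoreceptor": Counter({"alphaQ": 2, "beta": 2}),
    "olfactory": Counter({"alphaQ": 2, "alpha4": 1, "beta": 1}),
}


def n_subunits(channel: Counter) -> int:
    """Total number of subunits in the channel."""
    return sum(channel.values())


def is_beta(name: str) -> bool:
    """CNGB genes, or the generic ancestral 'beta' label, encode beta subunits."""
    return name.startswith("CNGB") or name == "beta"


def beta_fraction(channel: Counter) -> float:
    """Fraction of the channel's subunits that are beta subunits."""
    beta = sum(n for name, n in channel.items() if is_beta(name))
    return beta / n_subunits(channel)


if __name__ == "__main__":
    for label, table in (("extant", EXTANT), ("proposed", PROPOSED_ANCESTRAL)):
        for cell, channel in table.items():
            assert n_subunits(channel) == 4, (label, cell)
            print(f"{label:8s} {cell:13s} beta fraction = {beta_fraction(channel):.2f}")
```

Run as a script, this reports a β fraction of 0.25 for the rod and olfactory channels and 0.50 for the cone channel, so the proposed shift from 2 αQ + 2 β to 3 α1 + 1 β1 in rods corresponds to halving the β content.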

5 Evolution of the recovery steps of vertebrate phototransduction

5.1 G-protein receptor kinases (GRK1A,1B,7)

Background. G-protein receptor kinases (GRKs) are members of the protein kinase A, G, and C (AGC) family. They phosphorylate specific residues of activated G-protein coupled receptors (GPCRs), typically in the carboxy-terminal region of the GPCR. Mammals possess seven GRKs that fall into three families: (1) the ‘visual’ GRKs (GRK1, GRK7) that are the focus of this section; (2) a set of three nearest relatives (GRK4, GRK5, GRK6); and (3) a pair of more distant ‘β adrenergic GRKs’ (GRK2, GRK3). An investigation of the origin of GRKs suggested that the divergence of the β adrenergic GRKs occurred prior to the emergence of metazoa, whereas the divergence of the visual and GRK4/5/6 families occurred around the time that vertebrates evolved (Mushegian et al., 2012). In photoreceptors, the function of the GRKs is to phosphorylate photoactivated visual pigment (cone or rod opsin) and thereby permit the binding of arrestin, which quenches the activity of the activated form. Although the existence of the two main classes of photoreceptor-specific GRK (GRK1 and GRK7) has long been known, it was only in 2006 that Wada et al. discovered the existence of two distinct isoforms of GRK1, named GRK1A and GRK1B. These isoforms were shown to have diverged at an early stage in the evolution of vertebrates (Wada et al., 2006), and subsequently it has become clear that both isoforms are present in most vertebrate taxa. One exception is that mammals have lost GRK1B, so that any reference to GRK1 in a mammal signifies the GRK1A group. For cones and rods, the pattern of expression of GRK isoforms in a number of species has been summarised by Osawa and Weiss (2012) in their Table 1, where many examples of coexpression of a GRK1 and GRK7 are found. Lamb et al. (2018b) suggested that the following three rules apply to jawed vertebrates: (1) if the GRK1A isoform exists in a species, then it is expressed in the rod photoreceptors; (2) if the GRK7 isoform exists, then it is expressed in the cone photoreceptors; and (3) if the GRK1B isoform exists, then it is normally expressed in cones.
Based on these ideas, GRK1A will be shown as blue in the Figures, and GRK7 and GRK1B will be shown as red. However, these ‘rules’ are simplifications, primarily because of the loss of isoforms in many species. For example, sauropsids (reptiles and birds) have lost GRK1A, and, at least in the case of chicken, their rods express GRK1B (Zhao et al., 1999). In a more extreme example, mice and rats have lost both GRK7 and GRK1B, so that their cones (and rods) can express only GRK1A.

Molecular phylogeny. The molecular phylogeny of vertebrate visual GRKs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. The unconstrained ML molecular phylogeny, obtained for 77 visual GRKs with 17 GRK4/5/5L/6s and nine GRK2/3s forming the outgroup, is presented in Figure 14A. The three jawed vertebrate clades (GRK7, GRK1A and GRK1B) correspond to the tallest triangles, because of the larger number of jawed vertebrate sequences analysed. In addition, there are three agnathan vertebrate clades (GRK7-1, GRK7-2 and GRK1B), which are represented by narrower triangles because of the smaller number of sequences available. The bootstrap support levels around the jawed vertebrate GRK1B clade are only moderate (86% and 90%), and this is almost certainly the result of ‘attraction’ between the bird and lamprey GRK1B sub-trees, both of which have long limbs (see the fully-expanded tree in Supplementary Figure S5). The long limb to the bird GRK1B sub-tree indicates substantial evolution in these sequences, presumably as a consequence of the loss in birds of the GRK1A gene. The position of the Ciona clade as sister to the vertebrate GRK7s is poorly defined, with bootstrap support of only 74%; in some calculations this clade was instead placed as sister to the GRK1s. This uncertainty results from the extensive divergence that has occurred in the tunicate sequences, together with the small number of sequences (two) that could be found.
Apart from this pair of tunicate sequences (from C. intestinalis and C. savignyi), no other invertebrate sequences were found to clade with either the GRK1s or the GRK7s; instead the nearest invertebrate sequences (from tunicates, lancelets, echinoderms and hemichordates, and indeed from protostomes) were found to clade with the GRK4/5/5L/6s.

Figure 14. G-protein receptor kinases (GRK1s, GRK7s)

Gene duplications and losses. The pattern of gene duplications presumed to have given rise to the visual GRKs is presented in Figure 14B. An ancestral GRK gene had duplicated anciently to form what would become GRK2/3, together with a second gene that duplicated again in bilaterian times to give rise to the ancestral visual GRK (GRK1/7) and the ancestor of GRK4/5/5L/6. The timings of the subsequent duplications within the GRK1/7 branch are not resolved by the phylogeny presented in Figure 14; thus, on the basis of phylogeny alone, the divergence of GRK1 and GRK7 could have occurred during WGD. However, from the gene synteny results in Figures 5–7 (see Section 2.4), it is clear that GRK1A and GRK1B diverged at 1R. Because that split within the GRK1 branch already occupies the first round, the earlier split between the GRK1 and GRK7 branches must have occurred prior to WGD. These two genes then expanded during 2R WGD, though only a single GRK7 has been retained in jawed vertebrates. The syntenic arrangement of the genes in extant jawed vertebrates (on rows 1, 2 and 4 in Figure 6) is consistent with the notion that the GRK7 and GRK1 genes remained close together on a chromosome in the chordate organism prior to genome quadruplication, as might be expected if they arose through a local tandem duplication, as indicated by the local pre-WGD duplication in Figure 14B; i.e. in a scenario similar to those shown in the right-hand section of Figure 4. In Figure 14B, the two agnathan clades are shown as having arisen during WGD, but it is instead possible that they arose via a lineage-specific duplication.
However, that possibility seems unlikely because (as may be appreciated from the fully-expanded phylogeny in Supplementary Figure S5), such a duplication would need to have occurred during the relatively short interval prior to the divergence of hagfish and lampreys.

5.2 Arrestins (SAG, ARR3, ARRB1, ARRB2)

Background. Arrestins mediate termination of the response, and desensitisation, in numerous G-protein signalling cascades. Jawed vertebrate genomes typically possess four arrestin genes, SAG (retinal S-antigen), ARR3, ARRB1 and ARRB2, which encode proteins denoted here as Arr-S (expressed in rods), Arr-C (in cones), Arr-B1 and Arr-B2. The last two are often referred to as β-arrestins, though they are by no means restricted to the β-adrenergic system, and are instead widely distributed. Analysis of the phylogeny of arrestins indicates a likely origin from distantly related sequences in archaea and bacteria (Alvarez, 2008; Gurevich and Gurevich, 2006). As discussed in Section 2.4, the syntenic arrangement of arrestin genes strongly suggests that the four members in jawed vertebrates arose during 2R WGD (Nordström et al., 2004; Larhammar et al., 2009). In jawed vertebrate photoreceptors, the two ‘visual arrestins’ bind to their respective photoactivated visual pigment after it has first been phosphorylated by a GRK, and thereby block access of the G-protein, transducin. The β-arrestins may have a similar blocking function for other activated GPCRs, but they also play a role in receptor internalisation, mediated at least in part by a clathrin-binding site located near the C-terminus (Krupnick et al., 1997; Dell’Angelica, 2001; Kang et al., 2009).

Molecular phylogeny. The molecular phylogeny of vertebrate arrestins has recently been examined by Lamb et al. (2018b), and here that analysis is updated. A collapsed ML molecular phylogeny for 101 arrestin sequences from both jawed and agnathan vertebrates is presented in Figure 15A. The outgroup comprised nine sequences from tunicates, lancelets, basal deuterostomes and protostomes, and included the two arrestins that have been characterised in scallop photoreceptors (Gomez et al., 2011). In the unconstrained phylogeny, the β-arrestin tree was fragmented, and in addition the two agnathan visual arrestin clades were positioned as sisters, possibly as a result of long-branch attraction; see Supplementary Figure S6. Therefore, I applied minor constraints to generate the constrained tree presented in Figure 15A. This caused a change in log likelihood of ΔLogL = 7.8, and the constrained tree passed all three tests of topology, with p-AU ≈ 0.31 (well above the 0.05 level), so there were no grounds for rejecting the illustrated tree.

Figure 15. Arrestins (Arr-S, Arr-C, Arr-β1, Arr-β2)

Gene duplications. The closest invertebrate arrestin sequences are those shown in the outgroup, and no invertebrate sequence was found to clade with either the visual arrestins or the β-arrestins. On this basis, one might anticipate that the four clades of jawed vertebrate arrestins (Arr-S, Arr-C, Arr-β1, Arr-β2) had expanded during WGD, as proposed by Lamb et al. (2018b). However, to deduce the pattern of gene duplications, it is crucial to consider the syntenic arrangement of genes shown in Figure 6B, where multiple families of ohnolog quartets all show the upper pair of rows as sisters, and hence the lower pair of rows also as sisters. Accordingly, there is overwhelming evidence that the two visual arrestins diverged from each other at 1R, and likewise that the two β-arrestins diverged from each other at 1R, with the ancestral visual and β-arrestins having arisen from a pre-WGD duplication. The deduced duplication pattern for arrestins is illustrated in Figure 15B, and involves the loss of four genes after 2R.
This scenario additionally provides a parsimonious explanation for the branching of the two clades of agnathan Arr-C genes mentioned above, which would be more complicated to explain if Arr-S and Arr-C had not diverged until the second round of WGD. Interestingly, there are two clades of rod arrestin in cartilaginous fish (labelled S1 and S2 in Figure 15A, for Arr-S1 and Arr-S2), and these two genes have been shown to be present in all the sharks and rays examined (Lamb et al., 2018b); see also the fully-expanded tree in Supplementary Figure S6. In the two species for which assembled genomes are available (whale shark, Rhincodon typus, and elephant shark, Callorhinchus milii), these two genes are arranged tail-to-tail on an unplaced scaffold (NW_018032674 and NW_006890054, respectively). These results indicate that a local duplication occurred in a stem cartilaginous fish, and that the two genes have been retained throughout sharks and rays. Inspection of Supplementary Table S1 in Lamb et al. (2018b) indicates that the transcript levels detected in the eye are 2–20× higher for Arr-S1 than for Arr-S2, but also suggests that both isoforms are used in shark and ray photoreceptors. While the significance of the two isoforms is not entirely clear, it might be related to the loss of Arr-B2 from cartilaginous fish.

5.3 Regulator of G-protein signalling (RGS9, Gβ5 and R9AP)

Shut-off of activated transducin, Gαt-GTP (and, in turn, of activated PDE6), is accelerated by the ‘regulator of G-protein signalling’ complex, comprising three protein subunits: RGS9, Gβ5, and R9AP. Exactly the same isoforms are utilised in both rods and cones, and it appears that the faster shut-off in cones is achieved through the expression of a much higher concentration of the complex (Cowan et al., 1998; Zhang et al., 2003). An examination of the evolution of these three components indicated that RGS9 and RGS11 originated through expansion during 2R WGD, whereas neither Gβ5 nor R9AP underwent expansion at that stage (Lamb et al., 2018b). An updated molecular phylogeny for RGS9/11 is presented in Figure 16A, calculated for 24 RGS9 and 25 RGS11 sequences from jawed vertebrates, plus 10 homologous sequences from agnathan vertebrates, and with a selection of seven RGS6 and seven RGS7 sequences as outgroup. The two jawed vertebrate clades, RGS9 and RGS11, each exhibit at least 99% bootstrap support, and it is clear that the agnathan sequences also form two clades. In the unconstrained ML tree (Supplementary Figure S7), the root for these four vertebrate clades was positioned as indicated by the dotted arrow, so that an agnathan clade was paired with each jawed vertebrate clade, though with only moderate support levels of 88% and 90%. That topology would conform with 2R WGD, as well as with divergence of the two jawed vertebrate isoforms at the first round, as is required by the gene synteny shown in Figure 6. However, it seemed unusual that the support for the positions of the agnathan clades would be as low as this, so I examined the effect of moving the root by one node and constraining it to the position plotted in Figure 16A. That constraint caused only a very small change in log likelihood, of ΔLogL = 2.3, and the constrained tree passed all the tests of topology, with p-AU = 0.42, so there were certainly no grounds for rejecting the constrained tree.

Figure 16. Regulator of G-protein signalling (RGS9/11)

The pattern of gene duplications and losses suggested by the constrained tree is shown in Figure 16B, with the RGS9 and RGS11 branches diverging at 1R, as required by the gene synteny analysis. Then, following the second round, jawed vertebrates retained only a single copy of each of these, on rows 1 and 4 in Figure 6. In contrast, agnathan vertebrates lost RGS11 but retained both copies from the RGS9 branch. Realistically, it will not be possible to choose between the two topologies suggested by the constrained and unconstrained phylogenies until suitably complete genomes for lamprey species are available. However, from the perspective of jawed vertebrate evolution, both scenarios are consistent with the available phylogenetic and syntenic evidence.
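
The synteny argument used repeatedly in this section (for the arrestins, the visual GRKs and RGS9/11) can be made explicit in a few lines. The sketch below assumes, as described for the ohnolog quartets in Figure 6B and 6C, that rows 1 and 2 are 2R sisters and rows 3 and 4 are 2R sisters; under that assumption, two ohnologs on rows within the same pair diverged at 2R, and any other pair of rows diverged at 1R. It also counts the copies lost after 2R, given that each pre-WGD gene yields four copies. The function names are hypothetical, the row convention is an assumption for illustration, and this is not the analysis pipeline used for the figures.

```python
# Hypothetical helpers illustrating the 1R/2R synteny argument.
# Assumption: in each ohnolog quartet, rows 1 and 2 are 2R sisters,
# and rows 3 and 4 are 2R sisters.
SISTER_PAIRS_2R = (frozenset({1, 2}), frozenset({3, 4}))
ROWS = range(1, 5)
COPIES_PER_PRE_WGD_GENE = 4  # one gene -> 2 after 1R -> 4 after 2R


def divergence_round(row_a: int, row_b: int) -> str:
    """Return '1R' or '2R': the WGD round at which ohnologs on two rows diverged."""
    if row_a not in ROWS or row_b not in ROWS:
        raise ValueError("rows must be numbered 1 to 4")
    if row_a == row_b:
        raise ValueError("two ohnologs cannot occupy the same row")
    if frozenset({row_a, row_b}) in SISTER_PAIRS_2R:
        return "2R"
    return "1R"


def genes_lost_after_2r(pre_wgd_genes: int, retained: int) -> int:
    """Number of post-2R copies that must have been lost to leave `retained` genes."""
    total = COPIES_PER_PRE_WGD_GENE * pre_wgd_genes
    if not 0 <= retained <= total:
        raise ValueError("retained count is out of range")
    return total - retained


if __name__ == "__main__":
    # RGS9 and RGS11 occupy rows 1 and 4 in jawed vertebrates.
    print("RGS9 vs RGS11:", divergence_round(1, 4))
    # Arrestins: two pre-WGD genes (visual and beta), four retained.
    print("arrestin losses:", genes_lost_after_2r(2, 4))
    # Visual NCKX: one pre-WGD gene, NCKX1 and NCKX2 retained.
    print("NCKX losses:", genes_lost_after_2r(1, 2))
```

The two counts reproduce those stated in the text: four arrestin genes lost after 2R, and two visual NCKX genes lost in jawed vertebrates; genes on rows 1 and 4, such as RGS9 and RGS11, are assigned a 1R divergence.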

6 Evolution of Ca-feedback regulation of vertebrate phototransduction

6.1 Na+-K+/Ca2+ exchangers (NCKX1,2)

Background. Ca2+ ions are extruded from rod and cone outer segments by a sodium/calcium-potassium exchanger, NCKX; reviewed in Schnetkamp (2013) and Schnetkamp et al. (2014). This exchanger is able to operate at very low cytoplasmic Ca2+ levels because it utilises both the inward concentration gradient of Na+ and the outward concentration gradient of K+. It operates electrogenically (Yau and Nakatani, 1984), with a net influx of one positive charge per Ca2+ extruded, because each cycle has a stoichiometry of 4 Na+ ions transported inward, in exchange for 1 Ca2+ ion plus 1 K+ ion (i.e. three positive charges) transported outward (Schnetkamp et al., 1989; Cervetto et al., 1989; Lagnado et al., 1992). Hence, the operation of this exchanger can be measured in intact cells by recording the electrogenic current. Under steady-state conditions, the fluxes of Ca2+ into and out of the cytoplasm must balance. In darkness, when the cyclic nucleotide-gated ion channels (CNGCs) are held open by a moderate free concentration of cGMP, there is an appreciable steady influx of Ca2+ ions through these relatively non-selective channels. As a result, the free Ca2+ concentration settles at a moderately high level, which is needed for the NCKX to generate an equal efflux of Ca2+ ions. Measurements have shown this dark level of free cytoplasmic Ca2+ to be 200–500 nM (Ratto et al., 1988; Woodruff et al., 2002; Lagnado et al., 1992). In bright light, all the CNGCs are closed, so that the influx of Ca2+ stops. Initially the efflux continues, resulting in a drop in cytoplasmic Ca2+ concentration, until the fluxes again balance. This drop is crucial in triggering rapid recovery of the electrical response and in mediating light adaptation (Matthews et al., 1988; Nakatani and Yau, 1988).
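
The charge bookkeeping behind the electrogenic current, together with the steady-state flux balance described above, can be written out explicitly. In this worked sketch, z denotes ionic valence, F the Faraday constant, J_Ca^in and J_Ca^out the molar Ca2+ influx (through the CNGCs) and efflux (via NCKX), and I_ex the exchange current; these symbols are introduced here for illustration only.

```latex
\begin{aligned}
q_{\mathrm{in}}  &= 4\,z_{\mathrm{Na}} = 4(+1) = +4,\\
q_{\mathrm{out}} &= z_{\mathrm{Ca}} + z_{\mathrm{K}} = (+2) + (+1) = +3,\\
\Delta q &= q_{\mathrm{in}} - q_{\mathrm{out}} = +1
  \quad \text{(net inward charge per Ca}^{2+}\text{ extruded)},\\
\lvert I_{\mathrm{ex}} \rvert &= \Delta q \, F \, J_{\mathrm{Ca}}^{\mathrm{out}} = F \, J_{\mathrm{Ca}}^{\mathrm{out}},\\
J_{\mathrm{Ca}}^{\mathrm{in}} &= J_{\mathrm{Ca}}^{\mathrm{out}} \quad \text{(steady state)}.
\end{aligned}
```

Because exactly one net positive charge enters per Ca2+ ion extruded, the magnitude of the exchange current is a direct readout of the Ca2+ efflux rate, which is why recording that current measures the operation of the exchanger.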

In the rod outer segment the NCKX protein forms a tight 2:1 association with CNGCs, with one NCKX binding to each of the two α-subunits of the CNGC (Bauer and Drechsler, 1992; Schwarzer et al., 1997). This protein complex in the plasma membrane additionally interacts with peripherin-2 in the rim of the disc membranes via the GARP (glutamic acid-rich protein) component of the CNGC β-subunit (Poetsch et al., 2001), thereby apparently contributing mechanical stability to the outer segment disc structure. Rods express NCKX1 (encoded by SLC24A1) whereas cones express NCKX2 (encoded by SLC24A2), and it is apparent that these two isoforms arose during 2R WGD (Lamb and Hunt, 2018). Recently, it has become clear that cones additionally express NCKX4 (encoded by SLC24A4), and that the presence of this isoform is important for the rapid extrusion of Ca2+ (Vinberg et al., 2017); as will be considered below, NCKX4 likewise arose during 2R WGD (Ocampo Daza et al., 2012). Interestingly, the NCKX4 isoform had previously been considered to be the ‘olfactory NCKX’ because of its expression and important function in olfactory receptor neurons (Stephan et al., 2011). The roles of NCKX1, NCKX2 and NCKX4 in cones and rods have recently been reviewed by Vinberg et al. (2018), and the origin of the whole family of Ca2+/cation antiporters has been reviewed by Emery et al. (2012).

Figure 17. Na+-K+/Ca2+ exchangers (NCKX)

Molecular phylogeny. The molecular phylogeny of vertebrate visual NCKXs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. Figure 17A presents a molecular phylogeny for visual NCKX sequences from both jawed vertebrates and agnathan vertebrates, subject to constraints on the positions of the agnathan sequences. These constraints were designed to render the tree consistent with 2R WGD followed by the divergence of jawed and agnathan vertebrates.
Imposition of the constraints caused a relatively small change in log likelihood, of ΔLogL = 6.1, and the constrained tree passed all three tests of topology, with p-AU = 0.39. This indicates that there are no grounds for rejecting the null hypothesis that the visual NCKX genes of jawed and agnathan vertebrates arose through 2R WGD, though the phylogeny alone does not indicate whether NCKX1 and NCKX2 diverged at the first or second round. The fully-expanded phylogeny, and the constraint tree used, are shown in Supplementary Figure S8.

Gene duplications and losses. In Figure 6C, the three families of ohnolog quartets (TNFAIP8s, LINGOs and HCNs) in the vicinity of the NCKXs each show strong support for the phylogenetic pairings of the upper rows and the lower rows, with the consequence that NCKX1 and NCKX2 must have diverged at 1R. The deduced pattern of gene duplications is shown in Figure 17B, and involved the loss in jawed vertebrates of two genes after 2R. It will be interesting to examine the positions of the agnathan genes once the genomes are sufficiently well documented. It is clear that NCKX1 and NCKX2 diverged from NCKX3, NCKX4 and NCKX5 prior to the split between protostomes and deuterostomes. Likewise, the subsequent split between NCKX5 and NCKX3/4 also appears to have pre-dated that protostome/deuterostome speciation event. Finally, it has been shown that the genes encoding NCKX3 and NCKX4, SLC24A3/4, reside in a region paralogous with a second family of somatostatin receptor genes, SSTR1/4, and that their expansion likewise occurred during 2R WGD (Ocampo Daza et al., 2012). Although only two isoforms (NCKX3 and NCKX4) have been retained in mammals, a third NCKX3/4-like isoform is retained in spotted gar, and underwent 3R duplication in teleosts (not shown).
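
Several constrained trees in Sections 4.5 to 6.1 were judged by the same rule: a constrained topology is retained unless the approximately unbiased (AU) test rejects it at the 0.05 level. The sketch below simply tabulates the ΔLogL and p-AU values reported in the text and applies that rule; it does not compute the AU test itself (which relies on multiscale bootstrap resampling of site log likelihoods, as implemented in tools such as CONSEL or IQ-TREE), and the record and function names are hypothetical.

```python
from typing import NamedTuple


class ConstraintTest(NamedTuple):
    """One constrained-tree comparison, with values as reported in the text."""
    constraint: str
    delta_log_l: float  # decrease in log likelihood caused by the constraint
    p_au: float         # approximately unbiased (AU) test p-value


REPORTED = (
    ConstraintTest("CNGA1 constrained as sister to CNGA3", 6.7, 0.30),
    ConstraintTest("arrestins: minor constraints", 7.8, 0.31),
    ConstraintTest("RGS9/11: root moved by one node", 2.3, 0.42),
    ConstraintTest("visual NCKX: agnathan positions", 6.1, 0.39),
)

ALPHA = 0.05  # conventional rejection level for the AU test


def rejected_by_au(test: ConstraintTest, alpha: float = ALPHA) -> bool:
    """A constrained topology is rejected only if p-AU falls below alpha."""
    return test.p_au < alpha


if __name__ == "__main__":
    for t in REPORTED:
        verdict = "rejected" if rejected_by_au(t) else "not rejected"
        print(f"{t.constraint}: dLogL = {t.delta_log_l}, p-AU = {t.p_au} -> {verdict}")
```

All four constraints come out as "not rejected". Passing the AU test only means the phylogeny cannot exclude a topology; in the CNGA1 case it was the gene synteny in Figure 6 that ruled the constrained topology out.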

6.2 Guanylyl cyclases (GC-E, GC-F, GC-D)

Background. The mammalian genome encodes seven membrane-spanning guanylyl cyclase proteins, which have been assigned the names GC-A to GC-G by IUPHAR/BPS (see www.guidetopharmacology.org/GRAC/FamilyDisplayForward?familyId=662), and the properties of these GCs have recently been reviewed by Kuhn (2016). The two isoforms in mammalian photoreceptors are GC-E (=Ret-GC1) and GC-F (=Ret-GC2). GC-F is encoded by GUCY2F, whereas the gene encoding GC-E is named GUCY2D in human and many other species, but Gucy2e in mouse and a number of other mammals. A third isoform, GC-D, often referred to as the ‘olfactory’ GC, is present in most vertebrate taxa, but has been lost in primates (other than lemuriforms); in mouse the encoding gene is named Gucy2d. In zebrafish, there are one-to-one orthologs of the three isoforms, which are named as follows: gc3 = GC-E, gc2 = GC-F, gucy2f = GC-D (Lamb et al., 2018b). To minimise the potential for confusion, the genes will here be referred to by their protein names. The only isoform expressed in cones is GC-E (Yang et al., 1999; Rätscho et al., 2009), whereas rods co-express both GC-E and GC-F (Dizhoor et al., 1994; Yang and Garbers, 1997). Mutations in the GC-E gene (GUCY2D) in human are a major cause of Leber congenital amaurosis type 1 (Perrault et al., 1996) and dominant cone-rod dystrophy (Kelsell et al., 1998). Over 140 disease-causing mutations in GUCY2D have been identified, and Sharon et al. (2018) have recently reviewed current knowledge of the genetics, biochemistry and phenotype related to GUCY2D mutations. To date, no human retinal diseases have been linked to mutations in the GC-F gene, GUCY2F. These photoreceptor GCs synthesise cGMP at a rate set by the cytoplasmic Ca2+ concentration, via the extent of their activation by GCAPs; however, the molecular mechanism of activation by GCAPs has not yet been elucidated. The cyclase is a long membrane-spanning molecule, in which seven functional domains have been identified (Bereta et al., 2010; Peshenko et al., 2014, 2015). It functions as a homodimer, with dimerisation mediated by binding of the α-helical coiled-coil dimerisation domain in each partner (Ramamurthy et al., 2001).
In the dimer, the paired CCDs (cyclase catalytic domains) form the catalytic centre where cGMP is synthesised (Tucker et al., 1999). Finally, it is interesting to note that during their synthesis and transport to the outer segment, GCs appear to be protected from activation by the binding of a Ca-insensitive protein, RD3 (Azadi et al., 2010; Peshenko et al., 2011).

Molecular phylogeny and gene duplications. The molecular phylogeny of vertebrate visual GCs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. Figure 18A presents an unconstrained molecular phylogeny for visual and ‘olfactory’ GCs from jawed vertebrates; the fully-expanded tree is given in Supplementary Figure S9. Bootstrap support in this unconstrained tree is remarkably high, being unanimous for each of the three jawed vertebrate clades and also unanimous at the two nodes linking them. Thus, there is unanimous support for GC-F being sister to GC-D, as well as for GC-E being sister to that pair. From the summary of gene synteny, and the multiple sister pairs of quartet ohnologs in Figure 6B, it is clear that GC-E diverged from GC-D/GC-F at 1R, and that GC-D and GC-F then diverged at 2R, as indicated by the gene duplication pattern in Figure 18B; note that this differs from the interpretation of Lamb and Hunt (2018).

Figure 18. Guanylyl cyclases (GC-D, GC-E, GC-F)
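The rule used above to date each split, reading the round of WGD at which two paralogs diverged from their positions within a paralogon quartet, can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that each retained ohnolog can be tagged with a two-part path (its 1R branch, then its 2R branch), and the function name and the path values assigned below are hypothetical encodings chosen to match the pattern stated in the text, not data taken from Figure 6B.

```python
from typing import Dict, Tuple

# Hypothetical encoding of a quartet ohnolog's position after 2R WGD:
# (1R branch, 2R branch within that 1R branch), each 0 or 1.
QuartetPath = Tuple[int, int]


def divergence_round(a: QuartetPath, b: QuartetPath) -> str:
    """Return the WGD round at which two members of one quartet diverged.

    Genes on different 1R branches split at 1R; genes sharing a 1R branch
    but on different 2R branches split at 2R.
    """
    if a == b:
        raise ValueError("identical paths do not denote distinct ohnologs")
    return "1R" if a[0] != b[0] else "2R"


# Illustrative paths chosen to match the text: GC-E sits alone on one 1R
# branch (its 2R sister was lost), while GC-D and GC-F share the other.
visual_gcs: Dict[str, QuartetPath] = {
    "GC-E": (0, 0),
    "GC-D": (1, 0),
    "GC-F": (1, 1),
}

for first, second in [("GC-E", "GC-D"), ("GC-E", "GC-F"), ("GC-D", "GC-F")]:
    print(first, second, divergence_round(visual_gcs[first], visual_gcs[second]))
```

Running it prints 1R for both pairings involving GC-E and 2R for GC-D versus GC-F. Encoding GCAP3 as (0, 0) and GCAP1 and GCAP1L as (1, 0) and (1, 1) would reproduce the GCAP pattern described in Section 6.3 in the same way.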

6.3

Guanylyl cyclase activating proteins (GCAP1, 1L, 2, 2L, 3)

Background. Within the extensive set of neuronal calcium sensor proteins, the vertebrate genome includes a family of guanylyl cyclase regulatory proteins (reviewed in Ames and Lim (2012); Lim et al. (2014); and Koch and Dell’Orco (2015)), comprising several ‘activating’ proteins (GCAPs) and a single so-called ‘inhibitory’ protein (GCIP). A recent analysis of synteny and phylogeny has divided GCAPs into six sub-families (Lamb and Hunt, 2018), with teleost fish possessing 3R duplicates of several of these (Imanishi et al., 2004; Rätscho et al., 2009; Scholten and Koch, 2011). The best-studied members are GCAP1 (encoded by GUCA1A) and GCAP2 (encoded by GUCA1B); these two genes are arranged tail-to-tail in virtually all tetrapods as well as in spotted gar, though not in teleosts. In mammalian cones, the predominant isoform is GCAP1 (Cuenca et al., 1998), with the level of GCAP2 always being much lower, or even absent, depending on species. In mammalian rods, GCAP1 and GCAP2 are co-expressed, with the level of GCAP2 being higher (Dizhoor et al., 1995). A third isoform, GCAP3 (encoded by GUCA1C), occurs in many species, and is expressed only in cones, at least in human and zebrafish (Imanishi et al., 2002). A fourth isoform, GCAP1L, closely similar to GCAP1 and GCAP3, is often overlooked, probably because it has been lost from mammals. Finally, another set of isoforms, closely similar to GCAP2 and here referred to as GCAP2L, occurs in a number of vertebrate taxa. However, very little is known about either the GCAP1L or the GCAP2L isoforms. GCAPs provide very powerful Ca-sensitive activation of guanylyl cyclases (GCs) (Koch and Stryer, 1988). The activation of GCAPs at lowered Ca2+ concentrations involves the binding of Mg2+ (Peshenko and Dizhoor, 2006) to two EF hands (EF-2 and EF-3), thereby inducing a conformational change. Recent evidence has shown that GCAP1 forms a functional homodimer (Lim et al., 2018), suggesting a 2:2 stoichiometry of interaction with the GC homodimer. In vitro experiments with mammalian proteins have shown that GCAP1 and GCAP2 are able to activate GC-E and GC-F with comparable efficacy. However, in vivo experiments on rods indicate that GCAP1 primarily regulates GC-E (Olshevskaya et al., 2012). Functionally, the Ca2+ sensitivity of a cell’s cyclase activity is determined by its GCAP(s). GCAP1 operates over a higher range of Ca2+ concentrations (i.e. at lower light intensities) than GCAP2 does; the Km values of the two isoforms for Ca2+ are ~140 nM and ~50 nM, respectively.

Figure 19. Guanylyl cyclase activating proteins (GCAP)

Molecular phylogeny.
The molecular phylogeny of GCAPs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. Figure 19A presents an unconstrained molecular phylogeny for GCAPs/GCIPs from jawed vertebrates; the fully-expanded tree is given in Supplementary Figure S10. Bootstrap support in this unconstrained tree is at least 99% for all but one clade and for all but one node. Within the unanimously supported sub-tree for GCAP1/1L/3, there is 95% support for GCAP1L and GCAP1 being sisters.

Gene duplications and losses. From the gene synteny data and the pairings of ohnolog quartet genes in Figure 6D, it is clear that GCAP1 and GCAP1L diverged at the second round of WGD, and hence that GCAP3 diverged from these two at 1R. Furthermore, the phylogeny in Figure 19A makes it very clear that the GCAP2/2L division diverged from the GCAP1/1L/3 division prior to WGD, as indicated by the mauve ‘Pre-’. Despite the antiquity of the duplication that generated these two branches, the GCAP1 and GCAP2 genes have remained arranged tail-to-tail in virtually all tetrapod taxa, as well as in spotted gar (note the gene locations for spotted gar on row 2 in Figure 19B). Finally, it is clear that GCIP diverged from all of the above isoforms at an even earlier time. Subsequent to their expansion during 2R WGD, various isoforms have been lost from different lineages, though GCAP2 has been retained in most vertebrate taxa. Notably, mammals have lost GCAP1L, though it is retained in each of the other major lineages, where it forms the most highly-conserved of all the GCAP clades (see Figure 19A and Supplementary Figure S10); hence, its loss from mammals may have been very significant. Sharks and rays have lost both GCAP1 and GCAP3, and retain only GCAP1L from the 1/1L/3 group; however, the elephant shark, a chimera, retains all three of these isoforms. Isoforms of GCAP2L are found in only a few jawed vertebrate taxa, and appear not to be present in agnathans.
GCIP, which appears not to have duplicates remaining from 2R, has been lost from cartilaginous fish and from amniotes.
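To make the Ca2+-sensitivity comparison between GCAP1 and GCAP2 concrete, the sketch below treats each GCAP as switching off cyclase stimulation with a simple Hill-type dependence on Ca2+, using the ~140 nM and ~50 nM values quoted above as the half-maximal Ca2+ concentrations. The Hill coefficient of 2, the function name and the chosen Ca2+ levels are assumptions for illustration only; real GCAP/GC kinetics are considerably more complex.

```python
def gc_stimulation(ca_nM: float, k_half_nM: float, hill_n: float = 2.0) -> float:
    """Fractional GC stimulation by a GCAP at a given free Ca2+ concentration.

    Hypothetical Hill-type model (hill_n is an assumed value): stimulation is
    maximal (1.0) at very low Ca2+ and falls to 0.5 when Ca2+ equals k_half_nM.
    """
    if ca_nM < 0 or k_half_nM <= 0:
        raise ValueError("Ca2+ must be non-negative and k_half_nM positive")
    return 1.0 / (1.0 + (ca_nM / k_half_nM) ** hill_n)


K_GCAP1 = 140.0  # nM, approximate value quoted in the text
K_GCAP2 = 50.0   # nM, approximate value quoted in the text

# Cytoplasmic Ca2+ falls as light intensity increases; compare both isoforms.
for ca in (300.0, 140.0, 100.0, 50.0, 20.0):
    print(f"{ca:6.0f} nM  GCAP1 {gc_stimulation(ca, K_GCAP1):.2f}  "
          f"GCAP2 {gc_stimulation(ca, K_GCAP2):.2f}")
```

At an intermediate level such as 100 nM, this toy model gives roughly two-thirds stimulation via GCAP1 but only one-fifth via GCAP2, which is the sense in which GCAP1 operates at higher Ca2+ concentrations (dimmer light) and GCAP2 engages only as Ca2+ falls further.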

6.4

Recoverin and visinin

Background. Recoverin and visinin play relatively minor roles in the regulation of vertebrate phototransduction, and they will be considered only briefly here. One possibility is that their main role is to increase the Ca2+-buffering power of the cytoplasm, which might be more important than any direct role they play in regulating the activity of GRKs, etc. It is clear that recoverin and visinin diverged from each other during 2R WGD, and it has been proposed that the proto-vertebrate organism expressed visinin in its cones and recoverin in its rods (Lamb and Hunt, 2018). However, because one or other of these isoforms has been lost in many lineages, extant organisms typically express only a single isoform in both rods and cones. On the other hand, some taxa (including amphibians and bony fish) retain the genes for both isoforms.

Figure 20. Recoverin and visinin

Molecular phylogeny. The molecular phylogeny of recoverins and visinins has recently been examined (Lamb and Hunt, 2018), and here that analysis is updated. Figure 20A presents a constrained molecular phylogeny for 19 recoverins and 18 visinins from jawed vertebrates, plus eight closely related sequences from lampreys, using the same set of outgroup sequences as for the GCAPs phylogeny in Figure 19. The mild constraint that has been applied moved the root of the vertebrate tree by one node, from the position indicated by the dotted arrow, and changed the log likelihood by the very small amount of ΔLogL = 2.1; this constrained tree passed all three tests of topology, with p-AU = 0.4 (well above the rejection level of 0.05).

Gene duplications and losses. From the gene synteny data and the pairings of ohnolog quartet genes in Figure 6A, it is clear that recoverin and visinin diverged at the second round of WGD. This leads to the gene duplication pattern shown in Figure 20B.
Interestingly, agnathan vertebrates retain only the other two isoforms, named RecVis-X and RecVis-Y (Lamb and Hunt, 2018), which diverged from recoverin and visinin at 1R. These two isoforms are shown as having diverged from each other at the second round of WGD; although they might instead have arisen via a lineage-specific duplication following a gene loss at the second round, it is more parsimonious to assign their divergence to the known duplication event.

7

Evolution of vertebrate visual opsins

Background. In the early 1990s, Okano et al. (1992) analysed the molecular phylogeny of the vertebrate visual opsins that were then available, and showed that rod opsin (Rh1) appeared to have evolved after the cone opsin families had already been established. The cone opsin families they reported to have predated Rh1 were (in branching order, and using today’s terminology): LWS, SWS1, SWS2 and Rh (=Rh1+Rh2). Subsequently, when knowledge of the chromosomal locations of these genes was taken into account (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2013), it was instead proposed that the four shorter-wavelength-sensitive opsins (SWS1, SWS2, Rh1 and Rh2) had arisen through 2R quadruplication of an ancestral SWS gene, and that the corresponding expansion of an ancestral LWS gene had presumably been followed by loss of all but one copy. However, more recent analysis supports the original proposal, and shows that, of these vertebrate visual opsins, only Rh1 and Rh2 diverged during 2R WGD. Specifically, when the phylogeny is constrained so as to place the SWS1 and SWS2 clades as sisters, all three statistical tests of topology reject that constrained topology at the 95% level (Lamb and Hunt, 2017).

Molecular phylogeny. An updated molecular phylogeny for vertebrate ciliary opsins is presented in Figure 21, in which 199 deuterostome C-opsin sequences have been analysed, using an outgroup comprising 16 OPN5 sequences from jawed vertebrates. Every clade in this Figure is supported at a level of at least 98%, and the five nodes separating the lowermost six clades also show support of at least 98%. Thus, the topology of those six lowermost clades is defined very reliably, with very high support for pinopsin being sister to the five vertebrate ‘visual opsins’. The significance of this finding will be considered in Section 8. The deduced pattern of gene duplications is shown in the upper left section of Figure 22. In achieving such a high level of support for the position of the pinopsin clade, I found it helpful to omit the divergent Ciona opsins from the analysis; when the four Ciona C-opsins were included, the topology of the lowermost six clades was unchanged, but support at the asterisked node (where LWS branches) was lower. This presumably occurred because in some of the bootstrap replicates the Ciona opsin clade was positioned within the sub-tree of six vertebrate opsins, and this effect may have been exacerbated by a poorer quality of the alignment when those divergent sequences were included. The alignment also appeared less reliable when the TMT opsins were included, and so they too have been omitted from this phylogeny; the approximate positions obtained when the Ciona opsins and the TMT opsins were included are marked by the two dotted arrows in Figure 21.

Figure 21. Vertebrate visual opsins
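The bootstrap values quoted throughout these phylogenies are, in essence, the percentage of trees built from resampled alignments that recover a given clade. The minimal sketch below shows that tally, assuming each replicate tree has already been reduced to a set of clades (each clade a frozenset of taxon names); it illustrates why a divergent sequence that lands inside a sub-tree in some replicates lowers support at neighbouring nodes. The function name and the four replicate trees are invented toy data, not results from Figure 21.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(replicates: List[Set[Clade]], clade: Clade) -> float:
    """Percentage of bootstrap replicate trees that contain the given clade."""
    if not replicates:
        raise ValueError("need at least one replicate tree")
    hits = sum(1 for tree in replicates if clade in tree)
    return 100.0 * hits / len(replicates)


visual = frozenset({"LWS", "SWS1", "SWS2", "Rh2", "Rh1"})
with_pinopsin = visual | {"pinopsin"}

# Toy replicates: in one of the four, a divergent "Ciona" sequence falls
# inside the vertebrate sub-tree, so neither clade is recovered exactly.
replicates = [
    {visual, with_pinopsin},
    {visual, with_pinopsin},
    {visual, with_pinopsin},
    {visual | {"Ciona"}, with_pinopsin | {"Ciona"}},
]

print(clade_support(replicates, visual))         # 75.0
print(clade_support(replicates, with_pinopsin))  # 75.0
```

Dropping the fourth replicate (the analogue of excluding the Ciona sequences) would raise both values to 100.0 in this toy example.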

8

A synthesis of the co-evolution of the genes for the vertebrate phototransduction cascade

Now that each component of the vertebrate visual phototransduction cascade has been analysed individually, the entire set of results will be drawn together, in an attempt to provide an integrated view of the evolution of the system as a whole, and to provide an understanding of the origin of the dichotomy of rod/cone isoforms.

8.1

Pattern and timing of phototransduction gene duplications

Pattern of duplications. Figure 22 summarises the most parsimonious set of gene duplications and losses, consistent with the observed molecular phylogenies and gene synteny, that could have given rise to the multiple isoforms of phototransduction proteins expressed in vertebrate rods and cones. Except for the opsin section (top left), where the colours indicate spectral sensitivity, the following colour-code applies: red denotes ‘cone isoforms’, blue denotes ‘rod isoforms’, black denotes common isoforms and those for which the distribution is unclear, green denotes an isoform used in phototransduction though in neither rods nor cones, and grey denotes isoforms that are not involved in phototransduction. The quotation marks in the previous sentence reflect the fact that the rod/cone distinction is not absolute, primarily because in some lineages the loss of a gene has necessitated the use of a single isoform in both classes of cell.

Figure 22. Scenario for gene duplications in the vertebrate phototransduction cascade

Timing of duplications. Massive uncertainties remain in estimating the timing of the various duplication events. In Figure 22 the dotted vertical lines indicate four notable events that occurred during the evolution of our ancestors. The first pair are speciation events, when protostomes and then tunicates diverged from our own lineage; after the first of these our bilaterian ancestors became deuterostomes, and after the second our ancestors can be considered to have been ‘proto-vertebrates’. The second pair of dotted vertical lines mark the two rounds of whole genome duplication, 2R WGD, that preceded the vertebrate radiation. Even now, the absolute timings of these four important events are known only very approximately. As previously indicated in Figure 2A, order-of-magnitude timings are probably around 750 Mya and 650 Mya for the first pair, and around 600 Mya for the second pair, which appear to have occurred quite close to each other (on a geological timescale). For consideration of the difficulties in estimating the absolute timing of speciation events, see for example Kumar et al. (2017). In addition to uncertainty in the absolute timing of these four reference points, there is in some cases even greater uncertainty regarding the relative timing of individual gene duplication events (each marked by a □ in Figure 22). The illustrated positions are very approximate estimates, based to some extent on guesswork. For example, placing an individual duplication event relative to the divergence of tunicates relies on the retention (and our identification) of sufficiently closely-related genes in that lineage, yet tunicates have undergone extensive loss of genes and extensive modification of sequences. Likewise, it is not always straightforward to place a duplication event relative to the divergence of protostomes, unless suitably closely-related genes can be identified in basal deuterostome taxa. In spite of these uncertainties, the schematic in Figure 22 is an attempt to put the overall sequence of evolutionary events into perspective.

8.2

Summary of the evolution of individual phototransduction components

As a prelude to considering the significance of the co-evolution of the various phototransduction components, this section summarises the main evolutionary events that took place within each individual component.

Opsins. The ciliary branch of animal opsins already existed by the time that bilateria (bilaterally-symmetric animals) appeared, and C-opsins are widely utilised throughout protostomes, though rarely for imaging vision. In the deuterostome lineage, multiple duplication events occurred prior to the divergence of tunicates, with OPN3, parietopsin, parapinopsin and VAL all having survived to the present day, though not with roles in imaging vision. Then, prior to 2R WGD, four further duplication events occurred, which gave rise to pinopsin plus four ‘cone opsin’ genes: LWS, SWS1, SWS2, and Rh1/Rh2. Rhodopsin (Rh1) did not emerge as a separate entity until the first round of WGD.

Transducins. Several classes of G-protein α-subunits arose very early in animal evolution, with the GNAI (inhibitory) division having emerged prior to the divergence of protostomes. After tunicates had diverged, a tandem duplication generated the GNAI and GNAT (transducin) divisions seen in extant vertebrates. During proto-vertebrate evolution (i.e. prior to 2R WGD), the ancestral GNAT gene underwent extensive modification, amounting to ~0.3 amino acid substitutions per residue; in contrast, the GNAI gene underwent very little modification during the same interval. Following 2R WGD, the daughter pairs of GNAI/GNAT genes have remained adjacent in most extant lineages. All four isoforms of GNAI have survived in lamprey; one GNAI isoform has been lost in jawed vertebrates, along with its associated GNAT isoform in both lineages. Vertebrates possess four isoforms of G-protein β-subunit (GNB1–4), and two of these (GNB1 and GNB2) exhibit highly-conserved sequences, perhaps reflecting their association with multiple different G-protein α-subunits.
The four GNB isoforms may be unique amongst the genes involved in phototransduction, as a ‘textbook example’, having retained all four gene copies that arose during 2R WGD, and having undergone no other (surviving) duplications since at least the divergence of protostomes. GNB3 (which is expressed in cones) underwent substantial evolution after genome duplication, both before and after the radiation of vertebrates.

Phosphodiesterase. Cyclic nucleotide phosphodiesterases arose early in evolution, and duplicated into multiple forms. The ancestral PDE5/6/11 was already present in bilateria, and duplicated to form PDE5/11 and PDE6 in deuterostomes, probably before tunicates diverged. Subsequently, that chordate/proto-vertebrate PDE6 underwent substantial modification, representing ~0.4 amino acid substitutions per site, prior to WGD. As will be discussed below, much of this modification is likely to have involved the ability to interact with the inhibitory PDEγ subunits, a role that is unique to vertebrate PDE6s. Following 2R WGD, all four daughter PDE6 catalytic genes have survived, though jawed vertebrates have dispensed with one isoform and agnathans have dispensed with two. Uniquely amongst all phosphodiesterases, the two rod isoforms evolved so as to cooperate as a heterodimer, whereas all other PDEs operate as homodimers. The PDEγ inhibitory subunits arose somewhat mysteriously, quite possibly as a duplicated section of the chordate/proto-vertebrate RGS9/11 gene. Following 2R WGD, three isoforms of PDEγ have been retained in vertebrates, though PDE6I has been lost from amniotes.

CNG channels. The ancestor of vertebrate cyclic nucleotide-gated channel genes arose through an ancient duplication that also generated the HCN channels. That CNG channel gene duplicated in a bilaterian ancestor to generate the α and β divisions. A further duplication, which also pre-dated the divergence of protostomes, generated CNGA4 and what is here referred to as CNGAQ, homologs of which are found in protostomes and tunicates. During WGD, CNGAQ and CNGB both expanded, with extant vertebrates retaining three CNGAQs and two CNGBs.

G-protein receptor kinases. An ancient GRK2/3-like gene duplicated, probably before the divergence of protostomes, to generate GRK4/5/6 and GRK1/7, though the latter is not found in protostomes. Then, in the deuterostome lineage and prior to the divergence of tunicates, GRK1/7 duplicated to generate GRK1 and GRK7. These two genes expanded during 2R WGD, though only three of the eight daughter genes are retained in jawed vertebrates, with a different combination of three retained in agnathan vertebrates.

Arrestins. An ancestral arrestin gene underwent a local duplication, possibly after the divergence of tunicates from our lineage, to generate genes for a β-arrestin (Arr-B) and a visual arrestin (Arr-V). Both of these expanded during the first round of WGD but, following the second round, only a single copy of each 1R duplicate was retained. Thus, the genes for Arr-C and Arr-S diverged at 1R.

Regulator of G-protein signalling. An RGS gene duplicated in bilaterian times to form RGS6/7 and RGS9/11. The RGS9/11 gene expanded during 2R WGD, with the two members (RGS9 and RGS11) that diverged at the first round being retained in jawed vertebrates, and with two members (RGS9 and RGS9-Like), which may have diverged at the second round, being retained in agnathan vertebrates. It is also plausible that a tandem duplication of part of the ancestral RGS9/11 gene occurred in a chordate or proto-vertebrate organism, resulting in the advent of the ancestral PDEγ.

Calcium exchangers. The sodium/calcium-potassium exchangers of animals arose anciently from duplication of an ancestral NCKX gene, giving rise to SLC24A1/2 and SLC24A3/4/5. The former gene (the ancestral visual NCKX) expanded during the first round of WGD, though only a single copy of each 1R duplicate, namely SLC24A1 and SLC24A2 (encoding NCKX1 and NCKX2), was retained after the second round.

Guanylyl cyclases. The nearest relative of the visual guanylyl cyclases (GC-D, GC-E and GC-F) is the so-called heat-stable enterotoxin receptor GC-C, encoded by GUCY2C, with the divergence between the two classes having occurred following a duplication in bilaterian times. The ancestral visual GC quadruplicated during WGD, with the subsequent loss of a single isoform (the original sister of the gene for GC-E). GC-D is interesting in that in fish it appears to be expressed in photoreceptors, whereas in tetrapods (air-breathing) it is expressed in olfactory receptor cells; in primates it is a pseudogene or absent.

GCAPs. From the Ensembl93 gene tree for GCAPs or recoverin, it is clear that the duplication in bilaterian times of a neuronal calcium sensor gene gave rise to the ancestral GCAP/GCIP gene and the ancestral recoverin/visinin gene. At some stage during chordate/proto-vertebrate evolution, probably after the divergence of tunicates, a further duplication generated the GCAP and GCIP genes. The GCAP gene then underwent a tandem duplication prior to WGD, giving rise to the GCAP1/1L/3 and GCAP2/2L genes. Each of these genes expanded during 2R WGD, with extant vertebrates retaining three copies of the former (GCAP1, GCAP1L, GCAP3) and two copies of the latter (GCAP2, GCAP2L), though a number of lineages have lost GCAP1L and GCAP2L. GCIP did not undergo expansion during WGD.

Recoverin/visinin. The recoverin/visinin gene, formed by the above-mentioned duplication of an NCS gene, underwent expansion during 2R WGD, with recoverin and visinin diverging at the second round.

Other components. Several additional genes, which are also involved in phototransduction or have other roles in photoreceptor function, are neither listed above nor shown in Figure 22, yet turn out to be intimately associated with the paralogon arrangement depicted in Figure 6. Amongst these are the following: the GNGTs, the RGS9BPs, the RARs (retinoic acid receptors), the THRs (thyroid hormone receptors) and PRPH (peripherin), which are all close to the CNGBs; the OPN4s (melanopsins), RRH (peropsin) and RPE65, which are close to the PDE6s; the ONECUTs, GNB5, RAX (Rx) and KCNV2 (Kv8.2), close to the NCKXs; the SYTs (synaptotagmins) and SLC17As (VGluTs), close to the GCAPs; and BSN (bassoon) and PCLO (piccolo), close to the opsins.
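The duplication-and-loss bookkeeping in the component summaries above follows simple arithmetic: two rounds of WGD turn each pre-WGD gene into four copies, so, if no other duplications intervened, the number of losses is four times the number of ancestral genes minus the number retained. The counts below are transcribed from the summaries above (for jawed vertebrates where the text distinguishes lineages); the dictionary layout and function name are illustrative only.

```python
COPIES_PER_GENE_AFTER_2R = 2 ** 2  # two rounds of whole genome duplication

# (pre-WGD ancestral genes, daughter genes retained), transcribed from the text
retention = {
    "GNB": (1, 4),
    "PDE6 catalytic, jawed vertebrates": (1, 3),
    "GRK1 + GRK7, jawed vertebrates": (2, 3),
    "CNGAQ": (1, 3),
    "CNGB": (1, 2),
    "visual NCKX": (1, 2),
    "visual GC": (1, 3),
    "GCAP1/1L/3 + GCAP2/2L": (2, 5),
}


def losses(ancestors: int, retained: int) -> int:
    """Number of 2R daughter genes lost, assuming no other duplications."""
    produced = ancestors * COPIES_PER_GENE_AFTER_2R
    if not 0 <= retained <= produced:
        raise ValueError("retained count exceeds the copies produced by 2R")
    return produced - retained


for family, (anc, kept) in retention.items():
    print(f"{family:36s} {anc * COPIES_PER_GENE_AFTER_2R} made, "
          f"{kept} kept, {losses(anc, kept)} lost")
```

Only the GNBs come out with zero losses, which is the sense in which they serve as the ‘textbook example’ of full 2R retention.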

8.3

Co-evolution of components: Stages in the evolution of vertebrate phototransduction

As was argued in Section 3.2, it seems probable that the ancestral signalling cascade in the ciliary photoreceptors of early deuterostomes utilised an inhibitory G-protein (Gαi) that triggered, possibly via inhibition of adenylyl cyclase, a reduction in cyclic nucleotide levels and closure of CNG channels. Here I will review the changes that such a cascade may have undergone, to become the canonical vertebrate cone/rod phototransduction cascade.

Opsin. First, it seems that during deuterostome evolution several improvements occurred in the C-opsin’s performance. Perhaps the most important of these changes was the migration of the Schiff-base counterion from site 181 to site 113, as has been reviewed by Terakita et al. (2012). This change had occurred by the time that parapinopsin evolved, and it permitted the release of all-trans retinal. Importantly, it meant that even in darkness (and hence in the absence of photoreversal) visual pigment could rapidly be regenerated by using a store of 11-cis retinal. In addition, it paved the way for a higher efficacy of G-protein activation, by enabling further intra-molecular rearrangements that led to a large tilt in helix 6 in the meta II state, as has been reviewed by Hofmann et al. (2009). This change appears to have occurred around the time that VAL opsin emerged. Crucially, it substantially increased the gain of phototransduction, as summarised from a range of studies in Table 2 of Lamb (2013). However, this change ‘locked’ the molecular configuration of the activated opsin, preventing photoreversal of meta II, and thereby made the release of all-trans retinal indispensable. Through these changes, higher gain was traded off against the need to employ a separate pathway for resynthesis of 11-cis retinal.

Phosphodiesterase and transducin. At around the same time that this higher-gain C-opsin emerged, the phosphodiesterase was undergoing a fundamental change.
Previously it was a passive player, to the extent that its activity was not directly modulated by activation of the opsin or G-protein. But once an inhibitory PDEγ-like molecule appeared, which could bind to the PDE and could also interact with the activated G-protein, the phosphodiesterase would have become an active participant in the cascade. Thereafter, fine-tuning of the three proteins would inevitably have improved the efficacy of their interactions, and could rapidly have led to the emergence of the canonical PDE6 and PDE6γ, and likewise could rapidly have converted the Gαi into Gαt.

Dimeric activation of rod PDE6. It has recently become clear that activation of the rod PDE6 by the binding of two molecules of activated transducin (Gαt·GTP) is a highly non-linear step: the binding of a single transducin molecule causes negligible activation, and full activation requires the binding of both (Qureshi et al., 2018; Lamb et al., 2018a). This property provides the rod PDE6 with considerable ‘noise immunity’, because a background of thermally activated transducin molecules will induce very little activation; only after photon absorption, when a high concentration of transducins is activated locally, will the PDE6 be substantially activated. A trade-off that occurs with this mechanism is the introduction of a small delay (of the order of 5 ms) in the activation process. At present it is unclear whether the cone PDE6 subunits act independently or whether their activation is similarly co-operative. However, it is tempting to speculate that cones, which are optimised for speed, would have opted to avoid the small additional delay and simply put up with the extra noise that occurs with independent activation. Hence, it may be that the cooperative activation mechanism of the rod PDE6 has evolved only in rods. If so, it would be natural to presume that it is the heterodimeric nature of rod PDE6A+PDE6B that has enabled this. Interestingly, then, this is a property that could not have emerged until at least the second round of WGD.
Furthermore, it is a property that would be unique to jawed vertebrate rods, and that would not occur in the rod-like photoreceptors of agnathan vertebrates, where a homodimeric PDE6 is utilised. In this regard, it may be relevant that the rod-like photoreceptors of the lamprey (Lampetra fluviatilis) have been shown to exhibit a markedly lower signal-to-noise ratio for their single-photon responses than is found in the true rods of jawed vertebrates (Asteriti et al., 2015).

Calcium feedback regulation of phototransduction. Currently, it is unclear at what stage the powerful calcium negative-feedback regulatory loop appeared. This system is crucially important to the ability of vertebrate photoreceptors, especially cones, to adapt rapidly to altered light intensity – so-called ‘light adaptation’. In mammalian photoreceptors, the feedback loop primarily involves the GCAPs acting on the visual GCs. The ancestral gene for a GCAP had diverged from that for recoverin/visinin long before WGD, and it had also duplicated into two isoforms prior to WGD, and so it seems likely that the calcium feedback loop was already in operation in an early chordate ancestor.
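The noise-immunity argument for dimeric activation of rod PDE6 can be illustrated with a deliberately simple occupancy model. Assume, hypothetically, that each of the two transducin-binding sites on a PDE6 is occupied independently with probability p, that a cooperative (rod-like) PDE6 is active only when both sites are occupied, and that in the independent case each occupied site contributes half of the holoenzyme's activity. The function name and the two values of p are illustrative assumptions, not measurements.

```python
def pde6_activity(p: float, cooperative: bool) -> float:
    """Mean fractional PDE6 activity for site-occupancy probability p.

    Hypothetical two-site model:
      cooperative: active only if both sites are occupied, giving p * p
      independent: each occupied site gives half activity, 2 * (p / 2) = p
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p * p if cooperative else p


background = 0.01  # assumed sparse, thermally activated transducin
flash = 0.9        # assumed high local transducin after photon absorption

for label, coop in (("cooperative", True), ("independent", False)):
    noise = pde6_activity(background, coop)
    signal = pde6_activity(flash, coop)
    print(f"{label:12s} background {noise:.4f}  flash {signal:.2f}  "
          f"ratio {signal / noise:.0f}")
```

In this toy model the cooperative scheme suppresses background activity about 100-fold while losing only about 10% of the flash response, which is the kind of trade-off described above; it says nothing about the ~5 ms delay, which would require a kinetic rather than an equilibrium model.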

8.4

Origin of photopic/scotopic dichotomy in vertebrate phototransduction

Figure 22 makes it clear that multiple instances of dichotomy between rod and cone protein isoforms (indicated by blue and red lettering) arose during 2R WGD. In addition, it shows two earlier gene duplications that might have contributed to an ancestral photopic versus scotopic dichotomy. Notably, GRK1 and GRK7 arose through a gene duplication that occurred prior to WGD, and that possibly pre-dated the divergence of tunicates from our own lineage. Likewise, prior to WGD a GCAP gene had duplicated to form both GCAP1/1L/3 and GCAP2/2L. Following each of these pre-WGD duplications, it is entirely plausible that the daughter products could have been differentially expressed between two classes of photoreceptor cell, which may have provided better performance at higher and lower intensities respectively. Further support for this contention comes from the recent findings of Sato et al. (2018). They showed, firstly, that pinopsin exhibits a rate of thermal activation >20-fold lower than that of cone opsins, and secondly, that pinopsin is present in the retina in a range of non-mammalian species, and that at least in spotted gar and Xenopus it is expressed in a small proportion of retinal rods and cones. Those findings led Sato et al. (2018) to conclude that pinopsin is likely to have been the ancestral scotopic opsin. Although that study and other earlier studies were not able to determine the phylogenetic position of pinopsin with high precision, two other analyses (Lamb and Hunt, 2017; Hart et al., in preparation) have recently reported support for the position of pinopsin as sister to the set of five conventional vertebrate visual opsins (LWS, SWS1, SWS2, Rh2 and Rh1). In the present analysis in Figure 21, this basal position of the pinopsin clade was supported at a bootstrap level of 98%. Such a high level of support was only obtained when C-opsin sequences from Ciona were excluded from the analysis; inclusion of those divergent tunicate sequences lowered the quality of the whole alignment, contributing to uncertainty in the position of the tunicate clade, and thereby lowering support levels at adjacent nodes.

Combining these observations, it would appear highly likely that even prior to 2R WGD there were already in existence two modes of vertebrate retinal phototransduction, presumably operating in separate classes of cell. The photopic cells would have employed a cone-type opsin (e.g. the ancestral SWS/LWS opsin), and their cascade would have achieved rapid shut-off by using GRK7, probably in conjunction with feedback via GCAP1/1L/3, along with high expression of RGS9. The scotopic cells would have employed pinopsin, and their cascade would have opted for slower shut-off by using GRK1, probably in conjunction with feedback via GCAP2/2L, and with a lower level of RGS9 expression. Crucially, this photopic/scotopic dichotomy would have existed well before the emergence of rhodopsin (Rh1) during WGD.
After rhodopsin emerged, with even greater thermal stability, it is likely that it would have taken over from pinopsin as the preferred scotopic opsin. Then, at some unknown later time, these scotopic photoreceptors would have become identifiable as vertebrate rods. A corollary to this postulated sequence of events is that such a duplex photopic/scotopic division may already have been in operation at a stage when only a single spectral class of cone opsin existed, and hence it could have preceded the emergence of photopic colour vision.

8.5 Refinement of the distinct isoforms for rods and cones

If one accepts the above proposition that separate classes of photopic and scotopic photoreceptors already existed, then the subsequent quadruplication of the entire genome would have provided exactly the opportunity needed to refine each component of the phototransduction cascade to the respective needs of daytime and night-time vision in those two classes of cell. Thus, if the daughter isoforms were differentially expressed in the photopic and scotopic photoreceptors, then mutations that benefitted photopic vision could have been selected for in the photopic class, and mutations that benefitted scotopic vision could likewise have been selected for in the scotopic class. A scenario of this kind seems easier to rationalise than the alternative. In that alternative, no distinct photoreceptor classes existed when WGD occurred, and the single pre-existing class of photoreceptor somehow managed to juggle slight differences in numerous protein isoforms into a coherent division, so as to generate separate cone and rod classes. By the time of the radiation of jawed vertebrates (i.e. at the latest by the time that cartilaginous fish diverged from our own ancestors, around 480 Mya; Figure 2A), all of the components of the respective cone and rod phototransduction pathways appear to have become firmly established. In other words, there is no obvious evidence of any change of fundamental importance in any jawed vertebrate lineage over the subsequent period of almost half a billion years. A possible exception is the local duplication of the rod arrestin gene in cartilaginous fish, which led to the emergence of two classes, Arr-S1 and Arr-S2 (Section 5.2), though the significance of having two such isoforms is unclear. And, of course, some lineages (e.g. teleost fish) have undergone a third round of genome duplication.

On the other hand, it is not at all clear that exactly the same set of changes occurred in the ancestors of extant agnathan vertebrates, which diverged from our lineage before jaws had evolved, and possibly quite soon after 2R WGD. The last common ancestor of extant agnathan and jawed vertebrates possessed a quadruplicated genome, and apparently already had multiple classes of retinal photoreceptor that expressed different isoforms of visual opsin. Nevertheless, the subsequent evolution of those photoreceptor classes occurred independently in the two lineages. Hence it is hardly surprising that the rod-like photoreceptors of lampreys and hagfish appear to be quite different from the rods of jawed vertebrates, both anatomically and physiologically, even though they express an orthologous rhodopsin (Rh1). Likewise, the four classes of agnathan cone-like photoreceptors might be expected to have quite different properties from their jawed vertebrate counterparts. Within jawed vertebrates, different lineages have had to cope with the loss of one or more isoforms, but on the whole this seems to have been handled by using a non-standard isoform and/or by molecular ‘tinkering’. As a first example, the stem therian mammal (the ancestor of marsupials and placentals) lost two of its cone opsin genes (Rh2 and SWS2), so that these mammals have been reduced to a dichromatic form of colour vision. Subsequently, catarrhine primates (Old World monkeys and apes) duplicated their LWS opsin gene, and the two variant isoforms gradually attained somewhat different spectral sensitivities, so providing rudimentary trichromacy. As a second example, sauropsids (birds and reptiles) lost the GRK1A gene. Their rods instead use GRK1B (which elsewhere is used in cones); interestingly, in birds that GRK1B isoform underwent considerable modification (see Supplementary Figure S5). The third example again involves sauropsids, which have also lost the PDE6A gene.
Because of this loss, it is presumed that sauropsid rods utilise PDE6β as a homodimer, though this has not been exhaustively examined. Perhaps surprisingly, sauropsid PDE6B sequences show no obvious signs of having evolved differently from other vertebrate PDE6Bs (see Supplementary Figure S3), though they do have a deletion of about 18 residues towards the N-terminus. A final, interesting example is that of the nocturnal Tokay gecko, Gekko gekko. This species is descended from a diurnal gecko that had completely lost its rods (Walls, 1942), along with many of the rod-specific isoforms of phototransduction proteins (Zhang et al., 2006). Both of its classes of photoreceptor display all the ultrastructural features of cones (Röll, 2000), except that the outer segments of the scotopic photoreceptors are large, similar to those of rods in many species. Importantly, the light responses of these scotopic photoreceptors are broadly rod-like (Kleinschmidt and Dowling, 1975; Rispoli et al., 1993), though the single-photon response amplitude may be smaller than in genuine rods. In the absence of rhodopsin (Rh1), these rod-like photoreceptors express Rh2. The other identified phototransduction genes in this species clearly cluster with their cone counterparts (GNAT2, PDE6C, PDE6H, CNGA3 and ARR3), though a few residues have been identified as rod-like (Zhang et al., 2006). On the other hand, RGS9 is expressed at a low level, as is normal in rods, rather than at the high level characteristic of cones. Furthermore, the basal activity of the PDE in darkness is low, as is typical of rods rather than cones (Zhang et al., 2006). Taken together, these results indicate that the proteins of G. gekko photoreceptors are predominantly cone-like, though modified in minor ways. However, the expression levels and/or activities of at least two of the proteins that are important in generating slow, sensitive responses are instead rod-like.
Combined with the altered, rod-like outer segment geometry, these features yield rod-like electrical responses. Hence, by redeploying modified cone proteins and adopting a rod-like geometry, evolution has produced, in one class of Tokay gecko photoreceptor, electrophysiological responses broadly comparable to those of the true rods of other vertebrates.


8.6 Summary

There is now sufficient evidence to propose the events shown in Figure 22 as a plausible account of the evolution of at least 40 isoforms of proteins utilised for vertebrate phototransduction (those shown as red, blue or black in Figure 22). This account can serve as a test bed for more extensive studies in the future. Three of the protein classes (visual opsin, GRK and GCAP) each appear to have possessed at least two isoforms prior to WGD, suggesting that photopic and scotopic specialisation could already have existed by that time. Quadruplication of the genome may then have provided the flexibility needed for specialised isoforms to evolve in both of these classes of photoreceptor. It appears that the phototransduction cascade in cone and rod photoreceptors had already reached a highly refined state around half a billion years ago, and that little has changed in any fundamental way since then. Remnants of the syntenic arrangement of genes along the chromosomes of the proto-vertebrate organism just after 2R WGD can still be glimpsed in extant vertebrates. Analysis of the locations of genes across multiple species suggests that almost all of the phototransduction genes originally resided on at most five paralogons, and conceivably fewer. Indeed, it seems possible that our entire genome may be the reorganised remnant of just one huge paralogous arrangement of genes (spread across all of the post-WGD chromosomes) that resulted from two successive duplications of an ancestral chordate genome.

9 Future directions

The picture of the evolution of vertebrate phototransduction presented here is ripe for further investigation, especially in the following directions:

1. To date, the analysis of phototransduction gene synteny has been restricted to examining other gene families in the immediate neighbourhood of phototransduction genes, in just a handful of species, and using laborious manual processes. What is needed for the future is an extension, via automated processing, to all of the ohnolog families across multiple vertebrate genomes. This would link together the various regions containing the phototransduction gene families, and thereby give a more comprehensive view of the paralogon structure and the continuity of ancestral chromosomal rows.

2. Likewise, the phylogenetic analysis needs to be extended to include the huge number of non-phototransduction ohnolog gene families (especially the 200 or so ‘quartet’ ohnolog families). This would extend to the entire genome the evidence for those pairs of chromosomal rows that diverged at the first round of WGD.

3. It will also be immensely valuable to undertake comparable phylogenetic and syntenic analysis of the genes involved in other processes related to phototransduction. Examples include the genes of the retinoid cycle, those involved in synaptic transmission to bipolar cells, and those involved in the homologous transduction cascade in ON-bipolar cells. The evolution of those processes occurred in parallel with the evolution of phototransduction, and there would undoubtedly have been links between innovations in each of the interacting systems. Hence, an understanding of the evolution of the genes involved in each of these processes will help advance our understanding of the others, and of the manner in which the systems cooperate.

4. Similarly, it will be extremely valuable to undertake a comparable analysis of the transcription factors that specify the development of the retina. This is especially important in relation to the division of labour (Arendt et al., 2009) and the gain and loss of cell types (Musser and Arendt, 2017) that occurred during the evolution of the rod/cone dichotomy. Interestingly, the genes for several such factors are ohnologs that reside close to phototransduction ohnolog families; these include the RAXs, ONECUTs, THRs and RARs.
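Direction 1 amounts to a graph problem. Chromosomal regions become nodes, and two regions are linked when each carries a member of the same ohnolog families. The minimal Python sketch below assumes that genes have already been assigned to regions (for example, fixed windows along each chromosome) and to ohnolog families (for instance from a resource such as OHNOLOGS; Singh and Isambert, 2019). Regions sharing members of at least `k` families are merged transitively with a union-find structure into candidate paralogons, and families retaining four members are flagged as ‘quartets’. The threshold `k`, the function names and the toy gene table (families F1 to F4, regions chr1:p and so on) are hypothetical. This is not the procedure used in the studies reviewed here, and a real pipeline would also need to handle lineage-specific rearrangements, gene losses and later duplications.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Gene = Tuple[str, str, str]  # (gene name, ohnolog family, chromosomal region); hypothetical schema


class UnionFind:
    """Disjoint-set forest used to merge linked regions transitively."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def candidate_paralogons(genes: List[Gene], k: int = 2) -> List[List[str]]:
    """Group regions that directly or transitively share members of at least k families."""
    families_by_region: Dict[str, Set[str]] = defaultdict(set)
    for _gene, family, region in genes:
        families_by_region[region].add(family)

    uf = UnionFind()
    regions = sorted(families_by_region)
    for region in regions:
        uf.find(region)  # register every region, including unlinked ones
    for r1, r2 in combinations(regions, 2):
        if len(families_by_region[r1] & families_by_region[r2]) >= k:
            uf.union(r1, r2)

    groups: Dict[str, List[str]] = defaultdict(list)
    for region in regions:
        groups[uf.find(region)].append(region)
    return sorted(groups.values())


def quartet_families(genes: List[Gene]) -> List[str]:
    """Families retaining four distinct members, the maximum expected from 2R WGD alone."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for gene, family, _region in genes:
        members[family].add(gene)
    return sorted(f for f, g in members.items() if len(g) == 4)


# Hypothetical toy table; not real genomic coordinates.
genes: List[Gene] = [
    ("F1a", "F1", "chr1:p"), ("F1b", "F1", "chr2:q"), ("F1c", "F1", "chr3:p"), ("F1d", "F1", "chr4:q"),
    ("F2a", "F2", "chr1:p"), ("F2b", "F2", "chr2:q"), ("F2c", "F2", "chr3:p"),
    ("F3a", "F3", "chr3:p"), ("F3b", "F3", "chr4:q"),
    ("F4a", "F4", "chr5:p"), ("F4b", "F4", "chr6:q"),
]

print(candidate_paralogons(genes, k=2))  # [['chr1:p', 'chr2:q', 'chr3:p', 'chr4:q'], ['chr5:p'], ['chr6:q']]
print(quartet_families(genes))  # ['F1']
```

Union-find is a natural fit because paralogy between regions is inferred transitively. In the toy table, chr1:p and chr4:q share only one family directly, yet they end up in the same candidate paralogon through chr3:p.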

Acknowledgements

I am indebted to Professor David M Hunt for his collaboration on the four original studies upon which much of this review is based. I am also most grateful to three anonymous reviewers whose comments substantially improved the paper.

38

References Alvarez, C.E., 2008. On the origins of arrestin and rhodopsin. BMC Evol. Biol. 8, 222. https://doi.org/10.1186/1471-2148-8-222 Ames, J.B., Lim, S., 2012. Molecular structure and target recognition of neuronal calcium sensor proteins. Biochim. Biophys. Acta 1820, 1205–1213. https://doi.org/10.1016/j.bbagen.2011.10.003 Arendt, D., Hausen, H., Purschke, G., 2009. The “division of labour” model of eye evolution. Philos. Trans. R. Soc. Lond., B 364, 2809–2817. https://doi.org/10.1098/rstb.2009.0104 Asteriti, S., Grillner, S., Cangiano, L., 2015. A Cambrian origin for vertebrate rods. eLife 4. https://doi.org/10.7554/eLife.07166 Azadi, S., Molday, L.L., Molday, R.S., 2010. RD3, the protein associated with Leber congenital amaurosis type 12, is required for guanylate cyclase trafficking in photoreceptor cells. Proc. Natl. Acad. Sci. U.S.A. 107, 21158–21163. https://doi.org/10.1073/pnas.1010460107 Bauer, P.J., Drechsler, M., 1992. Association of cyclic GMP-gated channels and Na+-Ca2+-K+ exchangers in bovine retinal rod outer segment plasma membranes. J. Physiol. 451, 109– 131. Bereta, G., Wang, B., Kiser, P.D., Baehr, W., Jang, G.-F., Palczewski, K., 2010. A functional kinase homology domain is essential for the activity of photoreceptor guanylate cyclase 1. J. Biol. Chem. 285, 1899–1908. https://doi.org/10.1074/jbc.M109.061713 Cervetto, L., Lagnado, L., Perry, R.J., Robinson, D.W., McNaughton, P.A., 1989. Extrusion of calcium from rod outer segments is driven by both sodium and potassium gradients. Nature 337, 740–743. https://doi.org/10.1038/337740a0 Collin, S.P., Hart, N.S., Wallace, K.M., Shand, J., Potter, I.C., 2004. Vision in the southern hemisphere lamprey Mordacia mordax: spatial distribution, spectral absorption characteristics, and optical sensitivity of a single class of retinal photoreceptor. Vis. Neurosci. 21, 765–773. https://doi.org/10.1017/S0952523804215103 Cowan, C.W., Fariss, R.N., Sokal, I., Palczewski, K., Wensel, T.G., 1998. 
High expression levels in cones of RGS9, the predominant GTPase accelerating protein of rods. Proc. Natl. Acad. Sci. U.S.A. 95, 5351–5356. https://doi.org/10.1073/pnas.95.9.5351 Cuenca, N., Lopez, S., Howes, K., Kolb, H., 1998. The localization of guanylyl cyclase-activating proteins in the mammalian retina. Invest. Ophthalmol. Vis. Sci. 39, 1243–1250. Delbridge, M.L., Patel, H.R., Waters, P.D., McMillan, D.A., Marshall Graves, J.A., 2009. Does the human X contain a third evolutionary block? Origin of genes on human Xp11 and Xq28. Genome Res. 19, 1350–1360. https://doi.org/10.1101/gr.088625.108 Dell’Angelica, E.C., 2001. Clathrin-binding proteins: got a motif? Join the network! Trends Cell Biol. 11, 315–318. Dizhoor, A.M., Lowe, D.G., Olshevskaya, E.V., Laura, R.P., Hurley, J.B., 1994. The human photoreceptor membrane guanylyl cyclase, RetGC, is present in outer segments and is regulated by calcium and a soluble activator. Neuron 12, 1345–1352. Dizhoor, A.M., Olshevskaya, E.V., Henzel, W.J., Wong, S.C., Stults, J.T., Ankoudinova, I., Hurley, J.B., 1995. Cloning, sequencing, and expression of a 24-kDa Ca2+-binding protein activating photoreceptor guanylyl cyclase. J. Biol. Chem. 270, 25200–25206. Emery, L., Whelan, S., Hirschi, K.D., Pittman, J.K., 2012. Protein phylogenetic analysis of Ca2+/cation antiporters and insights into their evolution in plants. Front. Plant Sci. 3, 1. https://doi.org/10.3389/fpls.2012.00001 Erwin, D.H., Laflamme, M., Tweedt, S.M., Sperling, E.A., Pisani, D., Peterson, K.J., 2011. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science 334, 1091–1097. https://doi.org/10.1126/science.1206375

39 Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791. https://doi.org/10.1111/j.1558-5646.1985.tb00420.x Gomez, M.D.P., Espinosa, L., Ramirez, N., Nasi, E., 2011. Arrestin in ciliary invertebrate photoreceptors: molecular identification and functional analysis in vivo. J. Neurosci. 31, 1811–1819. https://doi.org/10.1523/JNEUROSCI.3320-10.2011 Gorman, A.L., McReynolds, J.S., Barnes, S.N., 1971. Photoreceptors in primitive chordates: fine structure, hyperpolarizing receptor potentials, and evolution. Science 172, 1052–1054. Gurevich, E.V., Gurevich, V.V., 2006. Arrestins: ubiquitous regulators of cellular signaling pathways. Genome Biol. 7, 236. https://doi.org/10.1186/gb-2006-7-9-236 Hart, N.S., Lamb, T.D., Patel, H.R., Chuah, A., Natoli, R.C., Hudson, N.J., Cuttmore, S.C., Davies, W.I.L., Collin, S.P., Hunt, D.M., in preparation. Ciliary opsin diversity in elasmobranchs: functional implications and new perspectives on the evolution of vertebrate vision. Herrero, J., Muffato, M., Beal, K., Fitzgerald, S., Gordon, L., Pignatelli, M., Vilella, A.J., Searle, S.M.J., Amode, R., Brent, S., Spooner, W., Kulesha, E., Yates, A., Flicek, P., 2016. Ensembl comparative genomics resources. Database 2016, bav096. https://doi.org/10.1093/database/bav096 Hisatomi, O., Tokunaga, F., 2002. Molecular evolution of proteins involved in vertebrate phototransduction. Comp. Biochem. Physiol. B 133, 509–522. Hoang, D.T., Chernomor, O., von Haeseler, A., Minh, B.Q., Vinh, L.S., 2018. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522. https://doi.org/10.1093/molbev/msx281 Hofmann, K.P., Scheerer, P., Hildebrand, P.W., Choe, H.-W., Park, J.H., Heck, M., Ernst, O.P., 2009. A G protein-coupled receptor at work: the rhodopsin model. Trends Biochem. Sci. 34, 540–552. 
https://doi.org/10.1016/j.tibs.2009.07.005 Imanishi, Y., Li, N., Sokal, I., Sowa, M.E., Lichtarge, O., Wensel, T.G., Saperstein, D.A., Baehr, W., Palczewski, K., 2002. Characterization of retinal guanylate cyclase-activating protein 3 (GCAP3) from zebrafish to man. Eur. J. Neurosci. 15, 63–78. Imanishi, Y., Yang, L., Sokal, I., Filipek, S., Palczewski, K., Baehr, W., 2004. Diversity of guanylate cyclase-activating proteins (GCAPs) in teleost fish: characterization of three novel GCAPs (GCAP4, GCAP5, GCAP7) from zebrafish (Danio rerio) and prediction of eight GCAPs (GCAP1-8) in pufferfish (Fugu rubripes). J. Mol. Evol. 59, 204–217. https://doi.org/10.1007/s00239-004-2614-y Kang, D.S., Kern, R.C., Puthenveedu, M.A., von Zastrow, M., Williams, J.C., Benovic, J.L., 2009. Structure of an arrestin2-clathrin complex reveals a novel clathrin binding domain that modulates receptor trafficking. J. Biol. Chem. 284, 29860–29872. https://doi.org/10.1074/jbc.M109.023366 Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. https://doi.org/10.1093/molbev/mst010 Kaupp, U.B., Seifert, R., 2002. Cyclic nucleotide-gated ion channels. Physiol. Rev. 82, 769–824. https://doi.org/10.1152/physrev.00008.2002 Kelsell, R.E., Gregory-Evans, K., Payne, A.M., Perrault, I., Kaplan, J., Yang, R.B., Garbers, D.L., Bird, A.C., Moore, A.T., Hunt, D.M., 1998. Mutations in the retinal guanylate cyclase (RETGC-1) gene in dominant cone-rod dystrophy. Hum. Mol. Genet. 7, 1179–1184. Kishino, H., Miyata, T., Hasegawa, M., 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 31, 151–160. https://doi.org/10.1007/BF02109483

40 Kleinschmidt, J., Dowling, J.E., 1975. Intracellular recordings from gecko photoreceptors during light and dark adaptation. J. Gen. Physiol. 66, 617–648. https://doi.org/10.1085/jgp.66.5.617 Koch, K.-W., Dell’Orco, D., 2015. Protein and signaling networks in vertebrate photoreceptor cells. Front. Mol. Neurosci. 8, 67. https://doi.org/10.3389/fnmol.2015.00067 Koch, K.W., Stryer, L., 1988. Highly cooperative feedback control of retinal rod guanylate cyclase by calcium ions. Nature 334, 64–66. https://doi.org/10.1038/334064a0 Krupnick, J.G., Goodman, O.B., Keen, J.H., Benovic, J.L., 1997. Arrestin/clathrin interaction. Localization of the clathrin binding domain of nonvisual arrestins to the carboxy terminus. J. Biol. Chem. 272, 15011–15016. https://doi.org/10.1074/jbc.272.23.15011 Kuhn, M., 2016. Molecular physiology of membrane guanylyl cyclase receptors. Physiol. Rev. 96, 751–804. https://doi.org/10.1152/physrev.00022.2015 Kumar, S., Stecher, G., Suleski, M., Hedges, S.B., 2017. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819. https://doi.org/10.1093/molbev/msx116 Lagman, D., Franzén, I.E., Eggert, J., Larhammar, D., Abalo, X.M., 2016. Evolution and expression of the phosphodiesterase 6 genes unveils vertebrate novelty to control photosensitivity. BMC Evol. Biol. 16, 124. https://doi.org/10.1186/s12862-016-0695-z Lagman, D., Ocampo Daza, D., Widmark, J., Abalo, X.M., Sundström, G., Larhammar, D., 2013. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications. BMC Evol. Biol. 13, 238. https://doi.org/10.1186/1471-2148-13-238 Lagman, D., Sundström, G., Ocampo Daza, D., Abalo, X.M., Larhammar, D., 2012. Expansion of transducin subunit gene families in early vertebrate tetraploidizations. Genomics 100, 203– 211. 
https://doi.org/10.1016/j.ygeno.2012.07.005 Lagnado, L., Cervetto, L., McNaughton, P.A., 1992. Calcium homeostasis in the outer segments of retinal rods from the tiger salamander. J. Physiol. 455, 111–142. Lamb, T.D., 2013. Evolution of phototransduction, vertebrate photoreceptors and retina. Prog. Retin. Eye Res. 36, 52–119. https://doi.org/10.1016/j.preteyeres.2013.06.001 Lamb, T.D., Heck, M., Kraft, T.W., 2018a. Implications of dimeric activation of PDE6 for rod phototransduction. Open Biol. 8. https://doi.org/10.1098/rsob.180076 Lamb, T.D., Hunt, D.M., 2018. Evolution of the calcium feedback steps of vertebrate phototransduction. Open Biol. 8. https://doi.org/10.1098/rsob.180119 Lamb, T.D., Hunt, D.M., 2017. Evolution of the vertebrate phototransduction cascade activation steps. Devel. Biol. 431, 77–92. https://doi.org/10.1016/j.ydbio.2017.03.018 Lamb, T.D., Patel, H., Chuah, A., Natoli, R.C., Davies, W.I.L., Hart, N.S., Collin, S.P., Hunt, D.M., 2016. Evolution of vertebrate phototransduction: Cascade activation. Mol. Biol. Evol. 33, 2064–2087. https://doi.org/10.1093/molbev/msw095 Lamb, T.D., Patel, H.R., Chuah, A., Hunt, D.M., 2018b. Evolution of the shut-off steps of vertebrate phototransduction. Open Biol. 8, 170232. https://doi.org/10.1098/rsob.170232 Larhammar, D., Nordström, K., Larsson, T.A., 2009. Evolution of vertebrate rod and cone phototransduction genes. Philos. Trans. R. Soc. Lond., B 364, 2867–2880. https://doi.org/10.1098/rstb.2009.0077 Le, S.Q., Gascuel, O., 2008. An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320. https://doi.org/10.1093/molbev/msn067 Lim, S., Dizhoor, A.M., Ames, J.B., 2014. Structural diversity of neuronal calcium sensor proteins and insights for activation of retinal guanylyl cyclase by GCAP1. Front. Mol. Neurosci. 7, 19. https://doi.org/10.3389/fnmol.2014.00019

41 Lim, S., Roseman, G., Peshenko, I., Manchala, G., Cudia, D., Dizhoor, A.M., Millhauser, G., Ames, J.B., 2018. Retinal guanylyl cyclase activating protein 1 forms a functional dimer. PLoS One 13, e0193947. https://doi.org/10.1371/journal.pone.0193947 Lokits, A.D., Indrischek, H., Meiler, J., Hamm, H.E., Stadler, P.F., 2018. Tracing the evolution of the heterotrimeric G protein α subunit in Metazoa. BMC Evol. Biol. 18, 51. https://doi.org/10.1186/s12862-018-1147-8 Lovell, P.V., Wirthlin, M., Wilhelm, L., Minx, P., Lazar, N.H., Carbone, L., Warren, W.C., Mello, C.V., 2014. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 15, 565. https://doi.org/10.1186/s13059-014-0565-1 Martemyanov, K.A., Krispel, C.M., Lishko, P.V., Burns, M.E., Arshavsky, V.Y., 2008. Functional comparison of RGS9 splice isoforms in a living cell. Proc. Natl. Acad. Sci. U.S.A. 105, 20988–20993. https://doi.org/10.1073/pnas.0808941106 Matthews, H.R., Murphy, R.L., Fain, G.L., Lamb, T.D., 1988. Photoreceptor light adaptation is mediated by cytoplasmic calcium concentration. Nature 334, 67–69. https://doi.org/10.1038/334067a0 Muffato, M., Louis, A., Poisnel, C.-E., Roest Crollius, H., 2010. Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics 26, 1119– 1121. https://doi.org/10.1093/bioinformatics/btq079 Mushegian, A., Gurevich, V.V., Gurevich, E.V., 2012. The origin and evolution of G proteincoupled receptor kinases. PLoS One 7, e33806. https://doi.org/10.1371/journal.pone.0033806 Musser, J.M., Arendt, D., 2017. Loss and gain of cone types in vertebrate ciliary photoreceptor evolution. Dev. Biol. 431, 26–35. https://doi.org/10.1016/j.ydbio.2017.08.038 Nakatani, K., Yau, K.W., 1988. Calcium and light adaptation in retinal rods and cones. Nature 334, 69–71. https://doi.org/10.1038/334069a0 Nakatani, Y., Takeda, H., Kohara, Y., Morishita, S., 2007. 
Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res. 17, 1254–1265. https://doi.org/10.1101/gr.6316407 Nguyen, L.-T., Schmidt, H.A., von Haeseler, A., Minh, B.Q., 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274. https://doi.org/10.1093/molbev/msu300 Nordström, K., Larsson, T.A., Larhammar, D., 2004. Extensive duplications of phototransduction genes in early vertebrate evolution correlate with block (chromosome) duplications. Genomics 83, 852–872. https://doi.org/10.1016/j.ygeno.2003.11.008 Ocampo Daza, D., Sundström, G., Bergqvist, C.A., Larhammar, D., 2012. The evolution of vertebrate somatostatin receptors and their gene regions involves extensive chromosomal rearrangements. BMC Evol. Biol. 12, 231. https://doi.org/10.1186/1471-2148-12-231 Ohno, S., 1970. Evolution by Gene Duplication. Allen and Unwin, London. Okano, T., Kojima, D., Fukada, Y., Shichida, Y., Yoshizawa, T., 1992. Primary structures of chicken cone visual pigments: vertebrate rhodopsins have evolved out of cone visual pigments. Proc. Natl. Acad. Sci. U.S.A. 89, 5932–5936. Olshevskaya, E.V., Peshenko, I.V., Savchenko, A.B., Dizhoor, A.M., 2012. Retinal guanylyl cyclase isozyme 1 is the preferential in vivo target for constitutively active GCAP1 mutants causing congenital degeneration of photoreceptors. J. Neurosci. 32, 7208–7217. https://doi.org/10.1523/JNEUROSCI.0976-12.2012 Osawa, S., Weiss, E.R., 2012. A tale of two kinases in rods and cones. Adv. Exp. Med. Biol. 723, 821–827. https://doi.org/10.1007/978-1-4614-0631-0_105 Perrault, I., Rozet, J.M., Calvas, P., Gerber, S., Camuzat, A., Dollfus, H., Châtelin, S., Souied, E., Ghazi, I., Leowski, C., Bonnemaison, M., Le Paslier, D., Frézal, J., Dufier, J.L., Pittler, S., Munnich, A., Kaplan, J., 1996. Retinal-specific guanylate cyclase gene mutations in

42 Leber’s congenital amaurosis. Nat. Genet. 14, 461–464. https://doi.org/10.1038/ng1296461 Peshenko, I.V., Dizhoor, A.M., 2006. Ca2+ and Mg2+ binding properties of GCAP-1. Evidence that Mg2+-bound form is the physiological activator of photoreceptor guanylyl cyclase. J. Biol. Chem. 281, 23830–23841. https://doi.org/10.1074/jbc.M600257200 Peshenko, I.V., Olshevskaya, E.V., Azadi, S., Molday, L.L., Molday, R.S., Dizhoor, A.M., 2011. Retinal degeneration 3 (RD3) protein inhibits catalytic activity of retinal membrane guanylyl cyclase (RetGC) and its stimulation by activating proteins. Biochemistry 50, 9511–9519. https://doi.org/10.1021/bi201342b Peshenko, I.V., Olshevskaya, E.V., Dizhoor, A.M., 2015. Evaluating the role of retinal membrane guanylyl cyclase 1 (RetGC1) domains in binding guanylyl cyclase-activating proteins (GCAPs). J. Biol. Chem. 290, 6913–6924. https://doi.org/10.1074/jbc.M114.629642 Peshenko, I.V., Olshevskaya, E.V., Lim, S., Ames, J.B., Dizhoor, A.M., 2014. Identification of target binding site in photoreceptor guanylyl cyclase-activating protein 1 (GCAP1). J. Biol. Chem. 289, 10140–10154. https://doi.org/10.1074/jbc.M113.540716 Poetsch, A., Molday, L.L., Molday, R.S., 2001. The cGMP-gated channel and related glutamic acid-rich proteins interact with peripherin-2 at the rim region of rod photoreceptor disc membranes. J. Biol. Chem. 276, 48009–48016. https://doi.org/10.1074/jbc.M108941200 Putnam, N.H., Butts, T., Ferrier, D.E.K., Furlong, R.F., Hellsten, U., Kawashima, T., RobinsonRechavi, M., Shoguchi, E., Terry, A., Yu, J.-K., Benito-Gutiérrez, E.L., Dubchak, I., Garcia-Fernàndez, J., Gibson-Brown, J.J., Grigoriev, I.V., Horton, A.C., de Jong, P.J., Jurka, J., Kapitonov, V.V., Kohara, Y., Kuroki, Y., Lindquist, E., Lucas, S., Osoegawa, K., Pennacchio, L.A., Salamov, A.A., Satou, Y., Sauka-Spengler, T., Schmutz, J., Shin-I, T., Toyoda, A., Bronner-Fraser, M., Fujiyama, A., Holland, L.Z., Holland, P.W.H., Satoh, N., Rokhsar, D.S., 2008. 
The amphioxus genome and the evolution of the chordate karyotype. Nature 453, 1064–1071. https://doi.org/10.1038/nature06967 Qureshi, B.M., Behrmann, E., Schöneberg, J., Loerke, J., Bürger, J., Mielke, T., Giesebrecht, J., Noé, F., Lamb, T.D., Hofmann, K.P., Spahn, C.M.T., Heck, M., 2018. It takes two transducins to activate the cGMP-phosphodiesterase 6 in retinal rods. Open Biol. 8. https://doi.org/10.1098/rsob.180075 Ramamurthy, V., Tucker, C., Wilkie, S.E., Daggett, V., Hunt, D.M., Hurley, J.B., 2001. Interactions within the coiled-coil domain of RetGC-1 guanylyl cyclase are optimized for regulation rather than for high affinity. J. Biol. Chem. 276, 26218–26229. https://doi.org/10.1074/jbc.M010495200 Rätscho, N., Scholten, A., Koch, K.-W., 2009. Expression profiles of three novel sensory guanylate cyclases and guanylate cyclase-activating proteins in the zebrafish retina. Biochim. Biophys. Acta 1793, 1110–1114. https://doi.org/10.1016/j.bbamcr.2008.12.021 Ratto, G.M., Payne, R., Owen, W.G., Tsien, R.Y., 1988. The concentration of cytosolic free calcium in vertebrate rod outer segments measured with fura-2. J. Neurosci. 8, 3240–3246. Rispoli, G., Sather, W.A., Detwiler, P.B., 1993. Visual transduction in dialysed detached rod outer segments from lizard retina. J. Physiol. 465, 513–537. https://doi.org/10.1113/jphysiol.1993.sp019691 Röll, B., 2000. Gecko vision - visual cells, evolution, and ecological constraints. J. Neurocytol. 29, 471–484. Sato, K., Yamashita, T., Kojima, K., Sakai, K., Matsutani, Y., Yanagawa, M., Yamano, Y., Wada, A., Iwabe, N., Ohuchi, H., Shichida, Y., 2018. Pinopsin evolved as the ancestral dim-light visual opsin in vertebrates. Commun. Biol. 1, 156. https://doi.org/10.1038/s42003-0180164-x Satoh, N., Rokhsar, D., Nishikawa, T., 2014. Chordate evolution and the three-phylum system. Proc. Biol. Sci. 281, 20141729. https://doi.org/10.1098/rspb.2014.1729

43 Schnetkamp, P.P., Basu, D.K., Szerencsei, R.T., 1989. Na+-Ca2+ exchange in bovine rod outer segments requires and transports K+. Am. J. Physiol. 257, C153-157. https://doi.org/10.1152/ajpcell.1989.257.1.C153 Schnetkamp, P.P.M., 2013. The SLC24 gene family of Na+/Ca2+-K+ exchangers: from sight and smell to memory consolidation and skin pigmentation. Mol. Aspects Med. 34, 455–464. https://doi.org/10.1016/j.mam.2012.07.008 Schnetkamp, P.P.M., Jalloul, A.H., Liu, G., Szerencsei, R.T., 2014. The SLC24 family of K+dependent Na+-Ca2+ exchangers: structure-function relationships. Curr. Top. Membr. 73, 263–287. https://doi.org/10.1016/B978-0-12-800223-0.00007-4 Scholten, A., Koch, K.-W., 2011. Differential calcium signaling by cone specific guanylate cyclase-activating proteins from the zebrafish retina. PLoS One 6, e23117. https://doi.org/10.1371/journal.pone.0023117 Schwarzer, A., Kim, T.S., Hagen, V., Molday, R.S., Bauer, P.J., 1997. The Na/Ca-K exchanger of rod photoreceptor exists as dimer in the plasma membrane. Biochemistry 36, 13667– 13676. https://doi.org/10.1021/bi9710232 Sharon, D., Wimberg, H., Kinarty, Y., Koch, K.-W., 2018. Genotype-functional-phenotype correlations in photoreceptor guanylate cyclase (GC-E) encoded by GUCY2D. Prog. Retin. Eye Res. 63, 69–91. https://doi.org/10.1016/j.preteyeres.2017.10.003 Shimodaira, H., 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508. https://doi.org/10.1080/10635150290069913 Singh, P.P., Arora, J., Isambert, H., 2015. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes. PLoS Comput. Biol. 11, e1004394. https://doi.org/10.1371/journal.pcbi.1004394 Singh, P.P., Isambert, H., 2019. OHNOLOGS v2: A comprehensive resource for the genes retained from whole genome duplication in vertebrates. bioRxiv 717124. 
https://doi.org/10.1101/717124 Stephan, A.B., Tobochnik, S., Dibattista, M., Wall, C.M., Reisert, J., Zhao, H., 2011. The Na(+)/Ca(2+) exchanger NCKX4 governs termination and adaptation of the mammalian olfactory response. Nat. Neurosci. 15, 131–137. https://doi.org/10.1038/nn.2943 Strimmer, K., Rambaut, A., 2002. Inferring confidence sets of possibly misspecified gene trees. Proc. R. Soc. B 269, 137–142. https://doi.org/10.1098/rspb.2001.1862 Terakita, A., Kawano‐Yamashita, E., Koyanagi, M., 2012. Evolution and diversity of opsins. WIREs Membr. Transp. Signal. 1, 104–111. https://doi.org/10.1002/wmts.6 Tostivint, H., Ocampo Daza, D., Bergqvist, C.A., Quan, F.B., Bougerol, M., Lihrmann, I., Larhammar, D., 2014. Molecular evolution of GPCRs: Somatostatin/urotensin II receptors. J. Mol. Endocrinol. 52, T61-86. https://doi.org/10.1530/JME-13-0274 Tucker, C.L., Woodcock, S.C., Kelsell, R.E., Ramamurthy, V., Hunt, D.M., Hurley, J.B., 1999. Biochemical analysis of a dimerization domain mutation in RetGC-1 associated with dominant cone-rod dystrophy. Proc. Natl. Acad. Sci. U.S.A. 96, 9039–9044. Vinberg, F., Chen, J., Kefalov, V.J., 2018. Regulation of calcium homeostasis in the outer segments of rod and cone photoreceptors. Prog. Retin. Eye Res. 67, 87–101. https://doi.org/10.1016/j.preteyeres.2018.06.001 Vinberg, F., Wang, T., De Maria, A., Zhao, H., Bassnett, S., Chen, J., Kefalov, V.J., 2017. The Na+/Ca2+, K+ exchanger NCKX4 is required for efficient cone-mediated vision. Elife 6. https://doi.org/10.7554/eLife.24550 Vopalensky, P., Pergner, J., Liegertova, M., Benito-Gutierrez, E., Arendt, D., Kozmik, Z., 2012. Molecular analysis of the amphioxus frontal eye unravels the evolutionary origin of the retina and pigment cells of the vertebrate eye. Proc. Natl. Acad. Sci. U.S.A. 109, 15383– 15388. https://doi.org/10.1073/pnas.1207580109

Wada, Y., Sugiyama, J., Okano, T., Fukada, Y., 2006. GRK1 and GRK7: unique cellular distribution and widely different activities of opsin phosphorylation in the zebrafish rods and cones. J. Neurochem. 98, 824–837. https://doi.org/10.1111/j.1471-4159.2006.03920.x

Walls, G.L., 1942. The vertebrate eye and its adaptive radiation. Cranbrook Institute of Science, London, England.

Wang, X., Plachetzki, D.C., Cote, R.H., 2019. The N termini of the inhibitory γ-subunits of phosphodiesterase-6 (PDE6) from rod and cone photoreceptors differentially regulate transducin-mediated PDE6 activation. J. Biol. Chem. 294, 8351–8360. https://doi.org/10.1074/jbc.RA119.007520

Whelan, S., Goldman, N., 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699. https://doi.org/10.1093/oxfordjournals.molbev.a003851

Woodruff, M.L., Sampath, A.P., Matthews, H.R., Krasnoperova, N.V., Lem, J., Fain, G.L., 2002. Measurement of cytoplasmic calcium concentration in the rods of wild-type and transducin knock-out mice. J. Physiol. 542, 843–854. https://doi.org/10.1113/jphysiol.2001.013987

Yang, R.B., Garbers, D.L., 1997. Two eye guanylyl cyclases are expressed in the same photoreceptor cells and form homomers in preference to heteromers. J. Biol. Chem. 272, 13738–13742. https://doi.org/10.1074/jbc.272.21.13738

Yang, R.B., Robinson, S.W., Xiong, W.H., Yau, K.W., Birch, D.G., Garbers, D.L., 1999. Disruption of a retinal guanylyl cyclase gene leads to cone-specific dystrophy and paradoxical rod behavior. J. Neurosci. 19, 5889–5897.

Yau, K.W., Nakatani, K., 1984. Electrogenic Na-Ca exchange in retinal rod outer segment. Nature 311, 661–663.

Zhang, X., Wensel, T.G., Kraft, T.W., 2003. GTPase regulators and photoresponses in cones of the eastern chipmunk. J. Neurosci. 23, 1287–1297.

Zhang, X., Wensel, T.G., Yuan, C., 2006. Tokay gecko photoreceptors achieve rod-like physiology with cone-like proteins. Photochem. Photobiol. 82, 1452–1460. https://doi.org/10.1562/2006-01-05-RA-767

Zhang, Z., Artemyev, N.O., 2010. Determinants for phosphodiesterase 6 inhibition by its gamma-subunit. Biochemistry 49, 3862–3867. https://doi.org/10.1021/bi100354a

Zhao, X., Yokoyama, K., Whitten, M.E., Huang, J., Gelb, M.H., Palczewski, K., 1999. A novel form of rhodopsin kinase from chicken retina and pineal gland. FEBS Lett. 454, 115–121. https://doi.org/10.1016/s0014-5793(99)00764-4


Tables and Figures

Table 1. Phototransduction cascade proteins and genes

| Protein name | Gene name in human | Cell | Has role in | Protein description |
|---|---|---|---|---|
| Rh1 (Rh) | RHO | Rod | Activation | Rhodopsin |
| LWS | OPN1LW, OPN1MW | L-cone, M-cone | Activation | Long-wave-sensitive opsin |
| SWS1 | OPN1SW | S-cone | Activation | Short-wave-sensitive opsin |
| SWS2 | – | Cone (or rod) | Activation | Blue-sensitive opsin |
| Rh2 | – | Cone | Activation | Green-sensitive opsin |
| Gαt1, Gαt2 | GNAT1, GNAT2 | Rod, Cone | Activation | Transducin α subunit |
| Gβ1, Gβ3 | GNB1, GNB3 | Rod, Cone | Activation | G-protein β subunits 1 and 3 |
| Gγt1, Gγt2 | GNGT1, GNGT2 | Rod, Cone | Activation | Transducin γ subunit |
| PDEα, PDEβ, PDEα’ | PDE6A, PDE6B, PDE6C | Rod, Rod, Cone | Activation | PDE6 catalytic subunit (dimer) |
| PDEγ, PDEγ’, PDEγ’’ | PDE6G, PDE6H, PDE6I | Rod, Cone, ? | Activation | PDE6 inhibitory subunit |
| CNGCα1, CNGCα3 | CNGA1, CNGA3 | Rod, Cone | Activation & Ca-feedback | Cyclic nucleotide-gated channel α |
| CNGCβ1, CNGCβ3 | CNGB1, CNGB3 | Rod, Cone | Activation & Ca-feedback | Cyclic nucleotide-gated channel β |
| GRK1A, GRK1B, GRK7 | GRK1, –, GRK7 | Rod, Cone, Cone | Recovery | G-protein receptor kinase |
| Arr-S, Arr-C | SAG, ARR3 | Rod, Cone | Recovery | Arrestin |
| RGS9 | RGS9 | Both | Recovery | Regulator of G-protein signalling 9 |
| Gβ5 | GNB5 | Both | Recovery | G-protein β subunit 5 |
| R9AP | RGS9BP | Both | Recovery | RGS9 anchor protein |
| GC-E, GC-F, GC-D | GUCY2D, GUCY2F, – | Both, Both, ? | Recovery & Ca-feedback | Guanylyl cyclase |
| GCAP1, GCAP1L, GCAP2, GCAP2L, GCAP3, GCIP | GUCA1A, –, GUCA1B, –, GUCA1C, – | Both, ?, Both, ?, Cone, Cone | Ca-feedback | Guanylyl cyclase activating protein |
| Rec, Visinin | RCVRN, – | Rod, Cone | Ca-feedback | Recoverin, Visinin |
| NCKX1, NCKX2 | SLC24A1, SLC24A2 | Rod, Cone | Ca-feedback | Na+/Ca2+,K+ ion exchanger |

List of proteins with known functions in activation, recovery, and Ca-feedback regulation of the phototransduction cascade. Names of the human genes are given in column 2. As in Figure 1, where there is a clear distinction between expression in cones and rods, the genes are coloured red for cones and blue for rods, while black (and ‘Both’) indicates expression in both cones and rods. The question marks are for isoforms not expressed in mammalian rods and cones, and where the class of cell in which they are used is unclear. Numerous additional proteins have important functions in cones and rods, other than involvement in phototransduction, but are not listed here.


Figures

[Figure 1 image not reproduced: schematic of the rod, cone and common phototransduction components, with boxes listing the human gene names.]

Figure 1. Schematic representation of the phototransduction cascade in cones and rods The proteins involved in phototransduction in vertebrate cones and rods are depicted schematically. The cytoplasmic surface of the lipid membrane is shown uppermost, and the illustrated arrangement is for the sac/plasma membrane of cone photoreceptors. In rods, the ion channel and exchanger are located in the plasma membrane, whereas all the other proteins are located in the disc membrane, which has become pinched off from the plasma membrane. Rh: rhodopsin, or a cone opsin. G: heterotrimeric G protein, transducin. PDE: tetrameric cGMP phosphodiesterase, PDE6. CNGC: tetrameric cyclic nucleotide-gated ion channel. NCKX: sodium/calcium-potassium exchanger. Rec: recoverin or visinin. GRK: G-protein receptor kinase. Arr: arrestin. RGS9: regulator of G-protein signalling. GC: guanylyl cyclase. GCAP: guanylyl cyclase activating protein. cG: cytoplasmic messenger, cGMP. Ca2+: cytoplasmic calcium. Boxes list the HGNC gene names of the human isoforms. Blue denotes isoforms expressed primarily in rods, and red denotes isoforms expressed primarily in cones. The lower line in the Rh box lists the names of the five opsin isoforms found across non-mammalian vertebrates. © 2019 Chris Conway Lamb, with permission.

[Figure 2 image not reproduced: panel A, lineages diverging from the placental-mammal lineage against time (0–800 Mya); panel B, branching pattern of the lineages analysed, with the 1R, 2R and 3R genome duplications marked.]

Figure 2. Species phylogeny A. The horizontal blue line shows divergences from our own lineage (of placental mammals) as a function of estimated time of divergence, in millions of years ago (Mya). The diverging lineages and their names are shown at an angle; the unnamed short lineage diverging around 400 Mya indicates coelacanth (and lungfish). Horizontal text below the line gives the name of each clade to the right of the adjacent dotted vertical line. B. A broadly accepted view of the branching pattern for those lineages from which molecular sequences are analysed in the phylogenies of Sections 3–7. The asterisk against Reptiles is a caution that this conventional terminology must be used with care, because reptiles (excluding birds) do not form a monophyletic group; for example, alligators and turtles may be more closely related to birds than they are to snakes, lizards, etc., so that alligator and turtle sequences tend to clade with those for birds. Currently, chordates are classified as a phylum, with lancelets, tunicates and vertebrates each as sub-phyla. However, there has been a proposal to reclassify chordates as a super-phylum, with lancelets, tunicates and vertebrates each as phyla (Satoh et al., 2014).

[Figure 3 image not reproduced: ML phylogeny of GNAT1 sequences from 29 vertebrate species, with jawed-vertebrate, hagfish and lamprey clades and percentage bootstrap values at the nodes; scale bar 0.1 substitutions per site.]

Figure 3. Example of molecular phylogeny (vertebrate rod transducins, GNAT1) Maximum likelihood (ML) molecular phylogeny calculated for Gαt1 (GNAT1) amino acid sequences from 29 vertebrate species. Bold font denotes sequences obtained from the eye transcriptome analysis of hagfish, lampreys and basal ray-finned fish by Lamb et al. (2016). NCBI accession numbers are listed as part of each sequence name. Numbers at each node represent percentage bootstrap support. Note that this sub-tree has been extracted from the full tree for GNAT and GNAI sequences, which is shown collapsed in Figure 10, where further details are given. The horizontal scale is in units of amino acid substitutions per site. The inset here shows the collapsed version of this GNAT1 sub-tree, exactly as it is presented in Figure 10; the entire expanded tree is shown in Supplementary Figure S1.
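The bootstrap values in Figure 3 are percentages of replicate trees that recover a given clade. The counting step can be sketched as below. This is a minimal illustration, assuming each replicate tree has already been built (resampling alignment columns and re-running the ML search are not shown) and has been summarised as a set of clades; the function name and the toy trees are hypothetical and do not come from the analyses reported here.

```python
from collections.abc import Iterable


def bootstrap_support(clade: Iterable[str], replicate_trees: list[set]) -> float:
    """Return the percentage of replicate trees that contain `clade`.

    Each replicate tree is summarised (an assumption of this sketch) as the
    set of clades it contains, each clade being a frozenset of taxon names.
    """
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_trees if target in clades)
    return 100.0 * hits / len(replicate_trees)


# Toy example with four hypothetical replicate trees.
replicates = [
    {frozenset({"Hagfish", "Lamprey"}), frozenset({"Human", "Chicken"})},
    {frozenset({"Hagfish", "Lamprey"})},
    {frozenset({"Lamprey", "Human"})},
    {frozenset({"Hagfish", "Lamprey"}), frozenset({"Human", "Chicken"})},
]
print(bootstrap_support(["Lamprey", "Hagfish"], replicates))  # 75.0
print(bootstrap_support(["Human", "Chicken"], replicates))    # 50.0
```

Representing a clade as an unordered set means the order in which taxa are listed does not matter, which mirrors how a clade is defined on a tree.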

[Figure 4 image not reproduced: hypothetical genes A to M on an ancestral chromosome, their copies after 1R (e.g. A1, A2) and after 2R (e.g. A11, A12, A21, A22), with examples of gene loss.]

Figure 4. Schematic with examples of the combined effects of local gene duplication, whole-genome duplication, and gene loss Top row: A hypothetical region comprising 12 arbitrary genes along a chromosome of our chordate ancestor, prior to any genome duplications. The six genes at the right represent three separate pairs of genes where a local (tandem) duplication had occurred previously. Thus, gene pair G and H, gene pair J and K, and gene pair L and M are each taken to have arisen from local duplication, as indicated by the three curved arrows. Middle two rows: Examples of possible genes remaining after the first round of whole-genome duplication (denoted by the downward arrow labelled 1R); three genes are shown as having been lost prior to the second round. Bottom four rows: Examples of possible genes remaining after the second round of genome duplication (denoted by the dashed and dotted arrows labelled 2R), but prior to the radiation of vertebrate species. The shading is simply a visual aid to identifying the relationships between genes that existed following the two rounds of duplication. The significance of the various examples of gene loss is described in the text.
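The labelling used in Figure 4 (A becoming A1 and A2 at 1R, then A11, A12, A21 and A22 at 2R) can be mimicked with a toy model. The sketch below is purely illustrative: the gene families, the losses and the helper names are hypothetical, and the local (tandem) duplications shown in the figure are not modelled.

```python
from collections.abc import Iterable

# A gene is a (family, path) pair; path records which copy ("1" or "2") it
# inherited at each round of whole-genome duplication, e.g. ("A", "12").
Gene = tuple[str, str]


def label(gene: Gene) -> str:
    family, path = gene
    return family + path


def whole_genome_duplication(rows: list[list[Gene]],
                             lost: Iterable[str] = ()) -> list[list[Gene]]:
    """Duplicate every chromosomal row; each copy's genes gain suffix "1" or "2".

    Genes whose new label appears in `lost` are dropped. Copies of the same
    row stay adjacent, so after two rounds rows (0, 1) descend from one 1R copy
    and rows (2, 3) from the other.
    """
    lost = set(lost)
    new_rows = []
    for row in rows:
        for copy in ("1", "2"):
            new_rows.append(
                [(fam, path + copy) for fam, path in row if fam + path + copy not in lost]
            )
    return new_rows


def retained_counts(rows: list[list[Gene]]) -> dict[str, int]:
    """Number of surviving ohnologs per ancestral family."""
    counts: dict[str, int] = {}
    for row in rows:
        for family, _ in row:
            counts[family] = counts.get(family, 0) + 1
    return counts


# Hypothetical three-gene ancestral region, with arbitrarily chosen losses.
ancestral = [[("A", ""), ("B", ""), ("C", "")]]
after_1r = whole_genome_duplication(ancestral, lost=["C1"])
after_2r = whole_genome_duplication(after_1r, lost=["B11", "A22"])
print([[label(g) for g in row] for row in after_2r])
# [['A11'], ['A12', 'B12'], ['A21', 'B21', 'C21'], ['B22', 'C22']]
print(retained_counts(after_2r))  # {'A': 3, 'B': 3, 'C': 2}
```

A family that loses a copy before 2R (C here) can retain at most two ohnologs, whereas losses after 2R leave three; this is the kind of pattern the figure uses to illustrate how timing of loss shapes the surviving set.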

[Figure 5 table not reproduced: chromosome numbers and gene start positions (Mb) for ohnolog genes in spotted gar, chicken, opossum and human, arranged in four columns, 'Ancestral 1' to 'Ancestral 4'.]

Figure 5. Synteny of a subset of phototransduction genes across four species Locations of genes from 37 families of ohnologs, assigned to four presumed chromosomal rows in the proto-vertebrate ancestor following 2R WGD. Four species have been analysed, and the locations are taken from Ensembl 93, which used the following assemblies: spotted gar, LepOcu1; chicken, Gallus_gallus-5.0; opossum, monDom5; and human, GRCh38.p12. Each column pair gives the chromosome number and the start position of the gene in megabases (Mb). Eight genes involved in phototransduction are indicated in bold. In general, HGNC gene names are given, except where the gene is absent in human; however, to avoid confusion amongst the guanylyl cyclases, the IUPHAR/BPS names are given (see Section 6.2). The coloured shading identifies regions where all the genes lie on a common chromosome. Note that many of the genes in each column are in close proximity to one another; e.g. under ‘Ancestral 1’, 25 of the 26 spotted gar genes are within a span of 4.4 Mb, and all 23 opossum genes are within a span of 3.3 Mb. Grey shading denotes quartet isoforms examined in Figure 6 and Figure 7. Where a gene name is missing, that gene has been lost from all jawed vertebrates. In other cases where an entry for a gene position is missing, that gene is not found in the assembly for that species; a question mark indicates that the gene is on an unplaced scaffold. Note that the first column pair for chicken is empty, indicating that multiple genes are absent from the assembly. However, this is unlikely to reflect loss of these genes from the chicken genome, because there is evidence for their mRNA transcripts; instead, the empty column probably reflects problems with sequencing and assembly (Delbridge et al., 2009; Lovell et al., 2014).
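Statements such as "25 of the 26 spotted gar genes are within a span of 4.4 Mb" can be checked by finding the densest window of gene start positions on a single chromosome. A minimal two-pointer sketch follows; the function name and the positions are hypothetical (they are not the Figure 5 data), and, as in the figure, only gene start positions are considered.

```python
from collections import defaultdict


def densest_window(positions: list[tuple[str, float]], window_mb: float):
    """Find the largest number of genes on one chromosome whose start
    positions all lie within `window_mb` megabases of each other.

    Returns (count, chromosome, first_start_mb, last_start_mb); the tuple is
    (0, None, None, None) if no positions are given.
    """
    by_chrom: dict[str, list[float]] = defaultdict(list)
    for chrom, start in positions:
        by_chrom[chrom].append(start)

    best = (0, None, None, None)
    for chrom, starts in by_chrom.items():
        starts.sort()
        lo = 0
        for hi, end in enumerate(starts):
            # Advance the left edge until the window spans at most window_mb.
            while end - starts[lo] > window_mb:
                lo += 1
            count = hi - lo + 1
            if count > best[0]:
                best = (count, chrom, starts[lo], end)
    return best


# Hypothetical positions, for illustration only.
genes = [("1", 1.0), ("1", 2.5), ("1", 3.0), ("1", 6.0), ("2", 0.5)]
print(densest_window(genes, 4.4))  # (3, '1', 1.0, 3.0)
```

Sorting each chromosome once and sliding both edges forward keeps the scan linear after the sort, which is ample for a few dozen genes per region.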

[Figure 6 image not reproduced: panels A–D showing the presumed arrangement of phototransduction genes and reference ohnolog quartets on rows 1–4, with the 1R/2R branching pattern inset in panel A.]

Figure 6. Overview of syntenic arrangement of vertebrate phototransduction genes Presumed arrangement of phototransduction genes on four rows representing sections of the quadruplicated genome of an early vertebrate organism, after 2R WGD but before the vertebrate radiation. The panels depict locally paralogous regions, with panel B derived directly from Figure 5; each panel depicts a single such region, with the exception of panel C, which depicts two. Phototransduction genes are shown shaded red (cone isoforms), blue (rod isoforms), or grey (common isoforms, isoforms whose cell class has not been determined, or isoforms used in photoreceptors other than rods and cones); non-phototransduction genes in these same families are shaded green. White is used for reference sets of ohnolog quartets, and the grey linkages between these genes show those pairs that have been established to be sisters by phylogenetic analysis of the kind illustrated in Figure 7. These pairings define the pairs of rows that diverged from each other at 1R, as indicated by the branching pattern inset in A. Thus, each panel has been arranged so that the upper pair of rows are sisters and the lower pair of rows are sisters. However, it is not certain that the row numbering is continuous between the five sections, because it has not yet been possible to link them together definitively. Nevertheless, the proposal is made here that the entire set of genes (together with numerous other non-phototransduction genes) may form a single paralogon. GNAT3 (shown near the bottom right) is not used in cone or rod phototransduction, but is instead used in reptilian parietal photoreceptors, as well as in some taste receptor cells.
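The rule used to arrange each panel of Figure 6 (the phylogenetic sister pairs of a quartet occupy rows 1 and 2, and rows 3 and 4) can be expressed as a simple consistency check. This is a hypothetical helper for illustration, assuming that row assignments from synteny and sister pairs from a phylogeny such as those in Figure 7 are already known; the gene names are placeholders.

```python
def rows_match_figure6_layout(row_of: dict[str, int],
                              sister_pairs: list[tuple[str, str]]) -> bool:
    """Return True if the two sister pairs occupy rows {1, 2} and {3, 4}.

    `row_of` maps each quartet member to the row (1-4) assigned from synteny;
    `sister_pairs` are the two pairs that the phylogeny groups as sisters.
    """
    expected = {frozenset({1, 2}), frozenset({3, 4})}
    observed = {frozenset({row_of[a], row_of[b]}) for a, b in sister_pairs}
    return observed == expected


# Hypothetical quartet P, Q, R, S placed on rows 1-4.
rows = {"P": 1, "Q": 2, "R": 3, "S": 4}
print(rows_match_figure6_layout(rows, [("P", "Q"), ("R", "S")]))  # True
print(rows_match_figure6_layout(rows, [("P", "R"), ("Q", "S")]))  # False
```

Comparing unordered sets of row pairs means that neither the order of the pairs nor the order of genes within a pair affects the result.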

[Figure 7 image not reproduced: collapsed ML phylogenies for eight ohnolog quartets (panels A1, A2, B1, B2, C1, C2, D1, D2) with bootstrap values; scale bars 0.2 substitutions per site.]

Figure 7. Molecular phylogenies for eight examples of ohnolog quartets Unconstrained ML molecular phylogenies of jawed vertebrate sequences, for eight ‘quartet’ families of ohnologs that retain all four members, and that are located in the vicinity of phototransduction genes; these are sample quartets chosen from amongst the 26 shown in Figure 6. Here each major clade is shown collapsed. The fully-expanded trees for all of these ohnolog quartet families are presented in Supplementary Figure S13. Those 26 phylogenies provide the basis for arranging the row pairs in Figure 6. The phylogeny in A1 used outgroup sequences, but in all other cases no outgroup was used and so the trees are unrooted.


Figure 8. Evolution of G-protein alpha subunits, as proposed by Lokits et al. (2018) For the five primary families (Gαs, Gαq, Gαi, Gα12 and Gαv) the duplications are shown that gave rise to the families and that led to expansion of each family. Prefix ‘pre’ denotes genes predating 2R WGD, and throughout the diagram ‘Gα’ has been omitted. The pairs of GNAI-GNAT genes in extant vertebrates (bottom right) arose by 2R WGD of a pair of genes (preI’ and preI’’) that resulted from the tandem duplication of a single preI gene. See text for further explanation. Reproduced from Fig. 4b of Lokits et al. (2018).

[Figure 9 image not reproduced: panels A. Ancestral Gi; B. Transition; C. Proto-vertebrate.]

Figure 9. Postulated origin of the proto-vertebrate phototransduction cascade A. Postulated ancestral phototransduction cascade in a deuterostome organism. This cascade utilised inhibition of adenylyl cyclase (AC) by Gαi, and therefore would have resembled an inhibitory version of the canonical olfactory transduction cascade. The possible appearance of a molecule that could inhibit the PDE is indicated by [γ]. B. The transition is proposed to have occurred following tandem duplication of Gαi, creating a pair of isoforms that were both expressed in the cell. One of these isoforms mutated, permitting its light-activated form (Gαi'') to interact with γ and thereby lessen γ's inhibition of the PDE. A guanylyl cyclase (GC) may also have been expressed, which would have allowed both cAMP and cGMP to function as cytoplasmic messengers. C. Once the new mechanism became more potent than the old one, expression of the original set of genes may have ceased, leaving a proto-vertebrate organism (that existed before 2R WGD) with a single phototransduction cascade of the vertebrate style.

[Figure 10 image not reproduced: panel A, ML phylogeny of vertebrate GNATs and GNAIs with 1R/2R annotations (scale bar 0.2); panel B, deduced pattern of gene duplications with human gene locations.]

Figure 10. G-protein alpha subunits (Gαt, Gαi) A. ML molecular phylogeny for vertebrate G-protein α subunits (GNATs and GNAIs), with an outgroup comprising a set of vertebrate GNAOs and GNAQ/11/14s together with invertebrate GNAIs. Protein substitution model, WAG+R4. A minor constraint has been applied, to keep the three vertebrate GNAI clades together (i.e. to prevent other sequences from being placed within this set); that constraint caused only a small change in log likelihood, of ΔLogL ≈ 3.4, and the constrained tree passed all three tests of topology, with p-AU = 0.48. Note that two support values are marked with a ‘strike-through’, to indicate that they are artificially high. Blue shading is for isoforms expressed primarily in rods, and red for those primarily in cones. Yellow 1R and cyan 2R denote the first and second rounds of WGD, respectively. The fully-expanded tree is shown in Supplementary Figure S1. B. Deduced pattern of gene duplications and losses; the tandem GNAI-GNAT genes in a chordate organism were quadruplicated. Although GNAI4 has been lost from jawed vertebrates, it is reported to have been retained in lampreys (Lokits et al., 2018). Row numbers have been arranged to correspond to those in Figure 6. Gene locations are given for human (in part because GNAT3 has been lost in spotted gar).

[Figure 11 image not reproduced: panel A, ML phylogeny of GNB1–4 with 1R/2R annotations (scale bar 0.2); panel B, deduced pattern of gene duplications with spotted gar locations.]

Figure 11. G-protein beta subunits (GNB1–4) A. Unconstrained ML molecular phylogeny for GNB1–4 sequences, using a set of protostome and invertebrate deuterostome GNBs as outgroup. Yellow 1R denotes first round, and cyan 2R denotes second round, of WGD. The fully-expanded tree is presented in Supplementary Figure S2. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). This is the simplest possible pattern, with no gene losses. Row numbers as in Figure 6.

[Figure 12 image not reproduced: panel A, ML phylogeny of PDE6 catalytic subunits (scale bar 0.2); panel B, deduced pattern of gene duplications; panel C, postulated pattern for PDE6γ and RGS9/11.]

Figure 12. PDE catalytic and inhibitory subunits (PDE6s, PDE6γs) A. Unconstrained ML molecular phylogeny for PDE6 catalytic sequences, using a set of vertebrate PDE5s and PDE11s as outgroup. Purple shading is for a clade of agnathan isoforms (PDE6X) that are positioned separately from cone and rod isoforms. Yellow 1R and cyan 2R denote the first and second rounds of WGD, respectively. The fully-expanded tree is presented in Supplementary Figure S3. B. Deduced pattern of gene duplications and losses (with gene locations listed for reedfish). C. Postulated pattern of gene duplications and losses for PDE6 inhibitory subunits; the postulated pattern for RGS9/11 is also shown, for reasons set out in the text. Row numbers as in Figure 6.

[Figure 13 image not reproduced: phylogenies of CNGC α and β subunits with 1R/2R annotations (scale bar 0.5) and deduced pattern of gene duplications with spotted gar locations.]

Figure 13. Cyclic nucleotide gated channel subunits (CNGCα, CNGCβ) A, B. Unconstrained molecular phylogeny for CNGC alpha and beta subunits, using human HCNs as outgroup. Protein substitution model, WAG+R4. The fully expanded phylogeny is presented in Supplementary Figure S4. The highlighted 1R and 2R annotations indicate the duplications during 2R WGD. B. Deduced pattern of gene duplications and losses (with gene locations listed for spotted gar). Row numbers as in Figure 6.


[Figure 14 image not reproduced: panel A, ML phylogeny of visual GRKs (GRK1s, GRK7s) with 1R/2R annotations (scale bar 0.2); panel B, deduced pattern of gene duplications and losses.]

Figure 14. G-protein receptor kinases (GRK1s, GRK7s) A. Unconstrained ML molecular phylogeny for visual GRKs (GRK1 and GRK7), using a small set of sequences from GRK2/3 and GRK4/5/5L/6 as outgroup. Jawed vertebrates have three isoforms of visual GRK: GRK7, GRK1A and GRK1B; agnathan vertebrates have a different combination of three visual isoforms: GRK7-1, GRK7-2 and GRK1B. Red shading is for isoforms expressed primarily in cones (or cone-like lamprey photoreceptors), and blue shading is for isoforms expressed primarily in rods (or agnathan rod-like photoreceptors). Yellow 1R and cyan 2R denote the first and second rounds of WGD, respectively. Protein substitution model, WAG+R4. The fully-expanded tree is presented in Supplementary Figure S5. B. Pattern of gene duplications and losses deduced using a combination of phylogeny and gene synteny. Gene locations are listed for spotted gar; row numbers as in Figure 6.

[Figure 15 image not reproduced: panel A, ML phylogeny of vertebrate arrestins with 1R/2R annotations (scale bar 0.2); panel B, deduced pattern of gene duplications.]

Figure 15. Arrestins (Arr-S, Arr-C, Arr-β1, Arr-β2) A. ML molecular phylogeny for vertebrate arrestins, using an outgroup comprising nine invertebrate arrestins. A minor constraint has been applied to prevent the β-arrestin and Arr-C clades from fragmenting; this caused a change in log likelihood of ΔLogL = 7.8, and the constrained tree passed all tests of topology, with p-AU = 0.31. The fully-expanded unconstrained phylogeny is shown in Supplementary Figure S6. B. Pattern of gene duplications deduced using a combination of phylogeny and gene synteny. Gene locations are for spotted gar; row numbers as in Figure 6.

[Figure 16 image not reproduced: panel A, ML phylogeny of RGS9/11 sequences with 1R/2R annotations (scale bar 0.2); panel B, pattern of gene duplications and losses.]

Figure 16. Regulator of G-protein signalling (RGS9/11) A. ML molecular phylogeny for 49 RGS9/11 sequences from jawed vertebrates, plus 10 homologous sequences from agnathan vertebrates, together with five related sequences from invertebrates, and with an outgroup comprising 14 jawed vertebrate RGS6/7s. A minor constraint has been applied, to move the root one node (from the position of the dotted arrow); this caused a very small change in log likelihood, of ΔLogL = 2.3, and the constrained tree passed all three tests of topology, with p-AU = 0.42. The fully-expanded unconstrained tree is shown in Supplementary Figure S7. B. Corresponding pattern of gene duplications and losses (with extant jawed vertebrate gene locations given for human). Row numbers correspond to those in Figure 6.


Figure 17. Na+-K+/Ca2+ exchangers (NCKX) A. ML molecular phylogeny for 45 NCKX1/2 sequences from jawed vertebrates, plus six homologous sequences from agnathan vertebrates, together with five related sequences from invertebrates, and with an outgroup comprising 15 jawed vertebrate NCKX3/4/5s. Constraints on the positions of the agnathan sequences have been applied; this caused a relatively small change in log likelihood, of ΔLogL = 6.1, and the constrained tree passed all three tests of topology, with p-AU = 0.39. The fully-expanded unconstrained tree is shown in Supplementary Figure S8. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). Row numbers correspond to those in Figure 6.


Figure 18. Guanylyl cyclases (GC-D, GC-E, GC-F) A. Unconstrained ML molecular phylogeny for jawed vertebrate visual guanylyl cyclases, with the outgroup composed of invertebrate sequences. The fully-expanded phylogeny is shown in Supplementary Figure S9. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). From the chromosomal arrangement of genes and the pairings of nearby ohnolog quartet genes, it is clear that GC-E diverged from GC-D/GC-F at 1R. Row numbers correspond to those in Figure 6.


Figure 19. Guanylyl cyclase activating proteins (GCAP) A. Unconstrained ML molecular phylogeny for 115 GCAP/GCIP sequences from jawed vertebrates, with an outgroup comprising seven related invertebrate deuterostome sequences together with human HPCA, HPCAL1 and NCALD. The fully-expanded tree is shown in Supplementary Figure S10. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). Row numbers correspond to those in Figure 6.


Figure 20. Recoverin and visinin A. ML molecular phylogeny for 19 recoverins and 18 visinins from jawed vertebrates, plus 8 homologous sequences from lampreys, with the same outgroup as used for the GCAPs. A minor constraint has been applied, to move the root of the tree by one node from the position shown by the dotted arrow for the unconstrained tree; that constraint caused only a small change in log likelihood, of ΔLogL = 2.1, and the constrained tree passed all three tests of topology, with p-AU = 0.4. The fully-expanded tree is shown in Supplementary Figure S11. B. Deduced pattern of gene duplications and losses (with chromosomes listed for zebrafish). Row numbers correspond to those in Figure 6.


Figure 21. Vertebrate visual opsins. Unconstrained ML molecular phylogeny for 199 C-opsin sequences from deuterostomes, with an outgroup comprising 16 jawed vertebrate OPN5s. The fully-expanded tree is shown in Supplementary Figure S12. Dotted arrows indicate the approximate positions of the clades for TMT opsins and Ciona C-opsins that were obtained in other calculations; those sequences have been omitted from the illustrated phylogeny because their inclusion led to an alignment that appeared inferior, and that gave lower levels of support. The asterisk marks a bootstrap support value that is discussed in the text. The deduced pattern of gene duplications is shown in the first section of Figure 22.


Figure 22. Scenario for gene duplications in the vertebrate phototransduction cascade. Deduced pattern and approximate timing of gene duplications for the multiple components of vertebrate phototransduction. The two main columns are continuous with each other. The four dotted vertical lines mark the following events. ‘Protostome’: the speciation divergence of protostomes from the deuterostome lineage; ‘Tunicate’: the speciation divergence of tunicates from the proto-vertebrate lineage; ‘1R’ and ‘2R’: the first and second rounds of whole-genome duplication (WGD). The horizontal axis is not to scale; very approximate timings for the four dotted vertical lines are: ~750 Mya, ~650 Mya, and a pair of events ~600 Mya (see Figure 2A). For the opsins (top left), the colour coding provides an indication of spectral sensitivity. For all other components, the colour coding shown after 2R is as follows: red, cone isoforms; blue, rod isoforms; black, common isoforms, or those for which the distribution is uncertain; grey, not involved in phototransduction; green, used in phototransduction, but in neither cones nor rods. Squares (□) mark individual gene duplications; circles (○) mark whole-genome duplications. The upward and downward sloping arrows at 1R and 2R correspond to the branching pattern shown in the inset at the top left of Figure 6, and the numbers 1–4 correspond to the chromosome row numbers in that Figure. However, row numbers are not assigned for the CNGB genes, because the chromosomal regions in which they reside have not yet been linked to the arrangement shown in Figure 6.


Evolution of the genes mediating phototransduction in rod and cone photoreceptors

Trevor D Lamb
Eccles Institute of Neuroscience, John Curtin School of Medical Research, Australian National University, Canberra ACT 2601, Australia

E-mail: [email protected]
Tel: +612 6161 0350
Last saved: 26 November 2019
Submitted to: Progress in Retinal and Eye Research

Abstract

This paper reviews current knowledge of the evolution of the multiple genes encoding proteins that mediate the process of phototransduction in rod and cone photoreceptors of vertebrates. The approach primarily involves molecular phylogenetic analysis of phototransduction protein sequences, combined with analysis of the syntenic arrangement of the genes. At least 35 of these phototransduction genes appear to reside on no more than five paralogons – paralogous regions that each arose from a common ancestral region. Furthermore, it appears that such paralogs arose through quadruplication during the two rounds of genome duplication (2R WGD) that occurred in a chordate ancestor prior to the vertebrate radiation, probably around 600 million years ago. For several components of the phototransduction cascade, it is shown that distinct isoforms already existed prior to WGD, with the likely implication that separate classes of scotopic and photopic photoreceptor cells had already evolved by that stage. The subsequent quadruplication of the entire genome then permitted the refinement of multiple distinct protein isoforms in rods and cones. A unified picture of the likely pattern and approximate timing of all the important gene duplications is synthesised, and the implications for our understanding of the evolution of rod and cone phototransduction are presented.

Keywords:

Evolution; Photoreceptors; Phototransduction genes; Molecular phylogeny; Gene synteny

Declaration of interests: None.

Contents
1 Introduction
2 Background to analysis of phototransduction cascade evolution
2.1 Species phylogeny
2.2 Molecular phylogeny
2.3 Individual gene duplication versus whole genome duplication (WGD)
2.4 Gene synteny
3 Evolution of G-proteins and origin of the proto-vertebrate phototransduction cascade
3.1 Overview of G-protein evolution
3.2 Origin of the proto-vertebrate phototransduction cascade
4 Evolution of the activation steps of vertebrate phototransduction
4.1 Transducin alpha subunits (GNAT1–3)
4.2 G-protein beta subunits (GNB1–4)
4.3 PDE catalytic subunits (PDE6A,B,C)
4.4 PDE inhibitory subunits (PDE6G,H,I)
4.5 Cyclic nucleotide gated channels (CNGA1–4, CNGB1,3)
5 Evolution of the recovery steps of vertebrate phototransduction
5.1 G-protein receptor kinases (GRK1A,1B,7)
5.2 Arrestins (SAG, ARR3, ARRB1, ARRB2)
5.3 Regulator of G-protein signalling (RGS9, Gβ5 and R9AP)
6 Evolution of Ca-feedback regulation of vertebrate phototransduction
6.1 Na+-K+/Ca2+ exchangers (NCKX1,2)
6.2 Guanylyl cyclases (GC-E, GC-F, GC-D)
6.3 Guanylyl cyclase activating proteins (GCAP1, 1L, 2, 2L, 3)
6.4 Recoverin and visinin
7 Evolution of vertebrate visual opsins
8 A synthesis of the co-evolution of the genes for the vertebrate phototransduction cascade
8.1 Pattern and timing of phototransduction gene duplications
8.2 Summary of the evolution of individual phototransduction components
8.3 Co-evolution of components: Stages in the evolution of vertebrate phototransduction
8.4 Origin of photopic/scotopic dichotomy in vertebrate phototransduction
8.5 Refinement of the distinct isoforms for rods and cones
8.6 Summary
9 Future directions
References
Tables and Figures

1 Introduction

The primary purpose of this article is to review our current understanding of the evolution of the genes that mediate vertebrate phototransduction, and thereby to provide a clearer description of how the cascade of phototransduction reactions evolved over hundreds of millions of years. In doing so, this paper fills a gap in the overall picture of the evolution of vertebrate photoreceptors and vertebrate retina that I presented in this journal six years ago (Lamb, 2013). The components of the vertebrate phototransduction cascade are represented schematically in Figure 1. The activation steps are shown in the foreground, with activation flowing from left to right. Upon absorption of a photon of light, the activated visual pigment (Rh) catalyses the activation of the G-protein transducin (G), which in turn activates the phosphodiesterase (PDE), causing it to hydrolyse cyclic GMP (cG) so that the cyclic GMP concentration drops, thereby closing cyclic nucleotide-gated channels (CNGCs) and generating the photoreceptor’s electrical response. Note that the depiction in Figure 1, and in particular the topology of the membrane, is a generic form representing both cone and rod phototransduction. For cones, which mediate daytime vision, all of these proteins are located in the plasma membrane, as sketched. In contrast, for rods, which mediate night-time vision, only the ion channels (CNGCs) and the exchanger (NCKX) are located in the plasma membrane; the other proteins are restricted almost exclusively to the membranes of the pinched-off free-floating discs, with the disc and plasma membranes separated from each other by the cytoplasmic medium.

Figure 1. Schematic representation of the phototransduction cascade

The boxes above and below the schematic in Figure 1 give the HGNC names of the genes encoding the respective proteins in human.
Remarkably, for 12 of the 17 illustrated classes of protein subunit, there are separate cone and rod isoforms, indicated in red and blue, respectively; only a handful of protein components are encoded by a common gene in both cones and rods, as indicated in black. (In various species, an isoform here or there has been lost, obscuring the general cone/rod duality. For example, most vertebrate lineages have lost either recoverin or visinin, with the result that both classes of photoreceptor then express a common isoform; see Section 6.4.) These protein families and the genes encoding them are described more comprehensively in Table 1.

Table 1. Phototransduction cascade proteins and genes

Because of their use of distinct protein isoforms, cones and rods represent a unique evolutionary system, in which the same process (the detection of light) uses a different set of genes in different classes of cell. This situation raises a number of fundamental questions, including the following. How did the cone/rod duality of isoforms arise? When did the various gene duplication events occur? To what extent were any of those duplications synchronous, in the form of duplication of the entire genome? In what manner has each pair of isoforms diverged since their formation? What factors provided a survival advantage to the organism? Can we trace the entire sequence of events that led to the evolution of the separate cone and rod phototransduction cascades? And, finally, can we use this knowledge of evolution to enhance our overall understanding of the process of phototransduction?

Over recent decades there have been numerous studies and reviews of the evolution of the huge family of opsin genes that encode the light-absorbing proteins, rhodopsin and its cousins, and this facet of phototransduction will be considered in Section 7. In contrast, there have been far fewer studies of the evolution of the genes that encode the proteins for all the other steps in phototransduction. For several of the individual proteins of the cascade, there have been studies of gene phylogeny, and these will be referred to in the relevant sections below. One of the earliest studies to analyse the evolution of multiple families of phototransduction components was by Hisatomi and Tokunaga (2002), who concluded that the isoforms they found were likely to have duplicated after the ‘prototype’ vertebrate phototransduction cascade had arisen. Then, Nordström et al. (2004) in Uppsala undertook a major study of the gene duplications required to explain the separate isoforms found for nine families of proteins involved in vertebrate phototransduction. They concluded that each of those duplication events appeared to have involved large blocks of genes, and possibly entire chromosomes. Subsequent work from the Uppsala group has greatly extended our understanding of those block duplications (Larhammar et al., 2009; Lagman et al., 2013), especially with respect to the transducins and PDEs (Lagman et al., 2012, 2016). Recently, my colleagues and I have examined the evolution of each of the steps in phototransduction, grouping those steps as: activation (Lamb et al., 2016; Lamb and Hunt, 2017); recovery (Lamb et al., 2018b); and Ca-feedback (Lamb and Hunt, 2018). This review aims to draw together, and where possible to extend, all such analysis of the evolution of the overall cascade of vertebrate phototransduction.

2 Background to analysis of phototransduction cascade evolution

The following sub-sections are intended to provide background information for those who are not closely involved in studies of gene evolution, so as to make the subsequent presentations and analyses in Sections 3 to 7 more readily comprehensible to the non-specialist. The raw data for these analyses are the genes (and the entire genomes) of numerous living species, for which the number of adequately annotated genome assemblies is expanding rapidly. To study the evolution of one class of protein (for example, the cGMP phosphodiesterases), one can examine the similarity of the molecular sequences across a wide range of species, to obtain a molecular phylogeny that describes the apparent degree of relatedness of members of the family (Section 2.2). Such a molecular phylogeny will help elucidate the gene duplications that have occurred, and will provide a purported pattern of branchings for the species under consideration. Therefore, as a minimum, one needs to be cognisant of the generally accepted pattern of species branchings, as determined from numerous studies of species phylogeny (Section 2.1). In an ideal world, one would wish to reconcile the branching pattern extracted from molecular phylogenetic analysis with the ‘true’ pattern of species branching, and in some cases it is feasible to apply constraints aimed at eliminating serious discrepancies. For now, though, the important point is that one needs to begin with a reliable species phylogeny. It is arguable that the most important factor in the evolution of vertebrates was the occurrence of two rounds of whole-genome duplication (2R WGD) in a chordate ancestor of vertebrates, as originally proposed by Susumu Ohno (1970); see Section 2.3.
This pair of duplication events generated a potential quadruplication of every original gene, and paved the way for enormous diversification because, for example, one of the encoded proteins might retain its ancestral function whereas another copy (or copies) might evolve new or altered functions. Accordingly, for each of the duplication events reported by molecular phylogenetic analysis, an important task is to determine whether it occurred before, during, or after 2R WGD. In many cases, this determination is greatly assisted by examining gene synteny (Section 2.4); that is, by analysing the locations along the chromosomes of genes of interest relative to other genes. Thus, it is often straightforward to identify a family of paralogs that arose through 2R WGD (so-called ‘ohnologs’, see Section 2.3), in the form of a set of 2, 3 or 4 genes located on chromosomal regions that are also occupied by other neighbouring sets of identified ohnologs. Despite the extensive relocation of genes that has occurred over hundreds of millions of years, there remain distinctive ‘signature’ features of the ancestral arrangement of the four regions of the quadruplicated genome, and these provide important clues to the evolutionary history. Once a particular duplication has been identified as having resulted from 2R WGD, it is then straightforward to assign any other duplications in that gene family as having occurred either before or after 2R WGD.
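That final assignment is essentially a question of ancestry within a rooted gene-family tree. The Python sketch below illustrates the logic under stated assumptions: the tree is stored as hypothetical child-to-parent links, the nodes attributed to 1R and 2R (for example on synteny grounds) are supplied by hand, and all node names are invented for illustration. A duplication lying above (ancestral to) a WGD node must predate it; one lying below it must postdate it; a duplication on an unrelated side branch cannot be placed from topology alone.

```python
from typing import Dict, Iterable, List, Optional, Set

# Rooted gene-family tree stored as child -> parent links (the root maps to None).
ParentMap = Dict[str, Optional[str]]


def ancestors(node: str, parent: ParentMap) -> List[str]:
    """Return the ancestors of `node`, nearest first, by walking up to the root."""
    chain: List[str] = []
    current = parent.get(node)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def classify_duplications(
    dup_nodes: Iterable[str], wgd_nodes: Set[str], parent: ParentMap
) -> Dict[str, str]:
    """Label each duplication node as 'WGD', 'pre-WGD', 'post-WGD' or 'unresolved'."""
    # Any node ancestral to a WGD node must predate the genome duplications.
    above_wgd: Set[str] = set()
    for w in wgd_nodes:
        above_wgd.update(ancestors(w, parent))

    labels: Dict[str, str] = {}
    for d in dup_nodes:
        if d in wgd_nodes:
            labels[d] = "WGD"
        elif d in above_wgd:
            labels[d] = "pre-WGD"
        elif wgd_nodes.intersection(ancestors(d, parent)):
            labels[d] = "post-WGD"  # descends from a WGD node
        else:
            labels[d] = "unresolved"  # side branch: synteny or other evidence needed
    return labels


# Hypothetical family: D0 is an early duplication; W1 is the 1R node; W2a/W2b are 2R nodes.
example_parent: ParentMap = {
    "D0": None,
    "W1": "D0",
    "W2a": "W1",
    "W2b": "W1",
    "D5": "W2b",  # a later duplication within one ohnolog lineage
    "DY": "D0",   # a duplication on the sister branch that did not pass through W1
}
EXAMPLE_WGD = {"W1", "W2a", "W2b"}

print(classify_duplications(["D0", "D5", "DY"], EXAMPLE_WGD, example_parent))
# -> {'D0': 'pre-WGD', 'D5': 'post-WGD', 'DY': 'unresolved'}
```

The "unresolved" label is deliberate: in this toy example a duplication such as DY, on a branch that does not pass through a WGD node, can only be timed with additional evidence such as gene synteny.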

2.1 Species phylogeny

As alluded to above, it is useful and sometimes imperative, in the interpretation of a molecular phylogeny, to consider the species phylogeny of those species whose sequences were analysed. A one-dimensional view of the divergence of other lineages from our own lineage is shown in Figure 2A. The horizontal blue line represents our direct ancestors, plotted along an axis of estimated (and very approximate) time in millions of years ago (Mya). Each blue circle denotes an important divergence of another lineage from our own ancestors; for example, sauropsids (comprising reptiles and birds) diverged from a common ancestor we share with them around 320 Mya. Each horizontally-oriented name just below the line denotes the name of the clade that encompasses all of the lineages to the right of the adjacent dotted vertical line; thus, from at least the time at which sauropsids diverged, our lineage (and theirs) can be referred to as amniotes, a term that encompasses all sauropsids and all mammals.

Figure 2. Species phylogeny

The small yellow and cyan markers after the branching of tunicates show the approximate timing of the postulated two rounds of whole-genome duplication (2R WGD) that are understood to have led to the potential quadruplication of genes in a chordate ancestor of vertebrates (i.e. in a ‘proto-vertebrate’); see Section 2.3. The absolute timing of this pair of duplication events is uncertain, but it occurred after the divergence of tunicates and before the divergence of cartilaginous fish (Putnam et al., 2008). Here this pair of duplications is assumed to have occurred prior to the divergence of agnathan vertebrates, and is shown as having occurred at around 600 Mya; however, other estimates place the duplications at around 500 Mya (Larhammar et al., 2009). A third round of genome duplication (3R) occurred subsequently in teleost fish, with the result that teleosts frequently retain two copies of each gene found in most other vertebrates.
What is missing from the one-dimensional view in Figure 2A is a representation of the multitude of divergences that have occurred within lineages other than our own; instead, only two cases of interest (teleosts and birds) are indicated in this linear plot. For those species whose molecular sequences are used later in this paper for the analysis of sequence phylogeny, Figure 2B shows the currently accepted view of lineage evolution, which will be taken into consideration in interpreting the molecular trees. Note that this panel provides no indication of timing, and instead simply sketches the topology of the divergences of lineages. For more extensive information about species phylogeny and estimates of divergence times, the reader is referred to Erwin et al. (2011), Kumar et al. (2017) and web resources including the Tree of Life Web Project (http://tolweb.org/tree/phylogeny.html) and TimeTree (http://www.timetree.org/).

2.2 Molecular phylogeny

Three separate processes are involved in the creation of a molecular phylogeny: (i) selecting (or obtaining) the molecular sequences to be analysed; (ii) aligning those multiple sequences; and (iii) inferring the evolutionary branching pattern that is most likely to have generated the sequences.

Obtaining sequences for early-diverging vertebrate species. Although published databases contain sequences from numerous species, the current coverage of lineages is very non-uniform. In order to improve the prospects for reliable reconstruction of the branching pattern in early vertebrate evolution, it is important to include species from agnathan vertebrates, from cartilaginous fish, and from non-teleost ray-finned fish, but to date these groups have been poorly represented in published databases. Therefore, to help fill the gaps, Lamb et al. (2016) used high-throughput sequencing followed by bioinformatics analysis to obtain the eye transcriptomes of a hagfish, two species of lamprey, three species of shark, and two species of ray, all from Australian waters; in addition, they included two species of non-teleost ray-finned fish from the northern hemisphere (bowfin and Florida gar). Moreover, new high-quality genomes are being added to public databases at an accelerating rate, so that the range of genes that can be examined is continually expanding, which should progressively improve the quality of the molecular phylogenetic analysis that can be achieved.

Selecting an appropriate range of species. To provide a reasonably well-balanced coverage of species across the whole range of vertebrates, each of the phylogenies presented in this review includes (as far as possible) the following jawed vertebrate taxa: three placental mammals; three marsupials; three birds; three other sauropsids (i.e. reptiles); two amphibians; coelacanth; two teleosts; two other ray-finned fish; three sharks, three rays and one chimaera. In many cases it is found that the phylogeny is both clear-cut and informative when the taxa examined are restricted to jawed vertebrates, but in several cases (e.g. the GNATs and PDE6s) it turns out to be more informative to additionally include lampreys (for which data are available from three or four species), and sometimes hagfish.
However, the hagfish sequences are frequently highly divergent, and often only a single species is available; in such cases hagfish will be omitted.

Outgroup selection. It is often straightforward to identify a closely-related but distinct family (or families) of vertebrate genes that can serve as the outgroup; for example, PDE5 and PDE11 in the case of PDE6. In such cases, a subset of half-a-dozen or so jawed vertebrate sequences can be chosen to form the outgroup. In other cases, as for example with the arrestins, a sufficiently closely-related family of vertebrate genes cannot be identified, and in such cases the outgroup will need to comprise related sequences from invertebrate taxa. Wherever possible, the most closely related sequences from tunicates (e.g. Ciona) and lancelets (e.g. Branchiostoma) will be included.

Aligning the multiple sequences. It is important to obtain the best possible alignment, because an ‘incorrect’ alignment (or indeed any change in the alignment) will lead to a tree that may differ significantly. Yet there is no fool-proof approach, nor is there a clear test of whether the alignment produced by one program is genuinely better than that produced by another. Therefore, it currently remains important to visually inspect the alignment and to look for obvious problems. For the phylogenies presented here, the entire alignment has been used, except in the case of the guanylyl cyclases (GCs), where the divergent terminal regions (both N- and C-termini) have been trimmed manually. The choice was made to analyse amino acid sequences rather than nucleotide sequences. One practical reason for avoiding nucleotide sequences is the added complexity and uncertainty involved in aligning codon-based sequences.
But another, possibly more important, reason is that the rapid rate of nucleotide substitutions combined with the long time-scale across the vertebrate branches means that the nucleotide changes become saturated (because of multiple substitutions at the same site); thus, amino acid sequences are preferable for deep branches. The alignment tool chosen was MAFFT v7.409 (Katoh and Standley, 2013) with its L-ins-i option.

Inferring the phylogenetic tree. The phylogenetic tree presented in Figure 3 is an example for illustrative purposes: firstly, it helps to provide a view of the tree inference process and, secondly, it is useful in interpreting the phylogeny that is obtained. This particular tree was obtained for vertebrate rod transducin alpha subunits (GNAT1s, Gαt1s) and has been extracted from the larger tree for GNATs and GNAIs in Figure 11. In essence, the tree inference process has placed each molecular sequence near its close relatives, in such a manner as to maximise the likelihood of the observed sequences under the assumed model of evolution. The process of inferring the maximum-likelihood (ML) tree is complicated, but well-studied; vast numbers of alternative branching patterns are examined during the process, and for each such tentative tree the likelihood of obtaining the observed sequences is calculated. This calculation of likelihood is made in accordance with established models for the probability that, at each of the sites, one particular amino acid might be replaced by another as a result of mutation in the nucleotide sequence. The process of searching for the tree that exhibits the maximum likelihood has a substantial stochastic element, and on repeated trials it does not always yield the same outcome. Because the magnitudes of the estimated likelihoods are extremely small, their values are universally specified logarithmically, as ‘log likelihoods’.

Figure 3. Example of molecular phylogeny (vertebrate rod transducins, GNAT1)

The numbers adjacent to each node in Figure 3 are ‘estimated bootstrap probabilities’, which provide an indication of the percentage chance that the sequences included within each clade have been correctly placed there. Historically, these values have been calculated by a method termed ‘bootstrapping’ (Felsenstein, 1985) whereby sites in the alignment are randomly resampled (with replacement) to generate pseudoreplicates, and then the entire ML tree is re-calculated; because of its need for repetition, this process can be exceptionally time-consuming.
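To make the notions of a per-site likelihood and of the log likelihood more concrete, the Python sketch below scores one fixed tree for a toy alignment using Felsenstein's pruning algorithm. It is a simplification under stated assumptions: in place of the WAG matrix used in this paper it uses the equal-rates (Poisson-type) model for 20 amino acids, root-state frequencies are taken as equal, gaps are not handled, and the taxon names and sequences are hypothetical. It does not search tree space; it only performs the kind of scoring that, together with branch-length optimisation, is repeated for every candidate tree during ML inference.

```python
import math
from typing import Dict, List, Tuple, Union

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
K = len(AMINO_ACIDS)

# A rooted binary tree: either a leaf (taxon name) or a pair of (subtree, branch length).
Tree = Union[str, Tuple[Tuple["Tree", float], Tuple["Tree", float]]]


def p_change(same: bool, t: float, k: int = K) -> float:
    """Equal-rates model: P(end state | start state) after t expected substitutions per site."""
    decay = math.exp(-k * t / (k - 1))
    if same:
        return 1.0 / k + (k - 1) / k * decay
    return (1.0 - decay) / k


def partials(tree: Tree, seqs: Dict[str, str], site: int) -> List[float]:
    """Pruning step: entry s is P(residues observed below this node | state s at this node)."""
    if isinstance(tree, str):
        observed = seqs[tree][site]
        return [1.0 if aa == observed else 0.0 for aa in AMINO_ACIDS]
    result = [1.0] * K
    for child, t in tree:
        below = partials(child, seqs, site)
        for s in range(K):
            result[s] *= sum(p_change(s == j, t) * below[j] for j in range(K))
    return result


def log_likelihood(tree: Tree, seqs: Dict[str, str]) -> float:
    """Sum of per-site log likelihoods, assuming equal (1/K) root-state frequencies."""
    n_sites = len(next(iter(seqs.values())))
    total = 0.0
    for site in range(n_sites):
        total += math.log(sum(partials(tree, seqs, site)) / K)
    return total


# Hypothetical three-taxon tree ((taxonA, taxonB), taxonC) with branch lengths.
toy_tree = (
    ((("taxonA", 0.05), ("taxonB", 0.05)), 0.10),
    ("taxonC", 0.20),
)
toy_seqs = {"taxonA": "MKVLA", "taxonB": "MKVLS", "taxonC": "MRVIA"}
print(log_likelihood(toy_tree, toy_seqs))

# Why logarithms: the raw product of many small site likelihoods underflows.
print(math.prod([1 / K] * 1000))  # 0.0 in double precision
print(1000 * math.log(1 / K))     # about -2995.7, easily representable
```

The last two lines illustrate the point made above: multiplying even a thousand site likelihoods of 1/20 underflows to zero in double precision, whereas the sum of their logarithms remains an ordinary number.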
Recently, though, an alternative approach, termed the ‘ultrafast bootstrap approximation’ (Hoang et al., 2018), has been developed, which is orders of magnitude faster, yet provides bootstrap estimates that appear to be less biased than those from the classical method. In part because of this speed advantage, and in part because of its thorough tree-searching algorithm, the tree inference tool chosen in this study was IQ-Tree (Nguyen et al., 2015). The protein substitution model used throughout this paper was WAG (Whelan and Goldman, 2001), but closely similar results were obtained using the LG model (Le and Gascuel, 2008). In most cases, this gave a robust phylogeny with high levels of bootstrap support for the major clades and nodes; however, in a few cases where bootstrap support levels were not very high, the calculations were re-run with an allowance for rate heterogeneity across sites (using IQ-Tree’s option ‘WAG+R4’); use of this option is noted in the legends for Figure 10, Figure 13, and Figure 14.

Interpreting the phylogenetic tree. The example phylogenetic tree in Figure 3 permits a number of interpretations. Firstly, as is also indicated by the collapsed tree in the inset, it shows that jawed vertebrate (i.e. gnathostome) GNAT1s form a clade with unanimous (100%) support, and that hagfish and lamprey GNAT1s likewise form (small) clades with unanimous support. Secondly, it shows that within the jawed vertebrate clade, there is a clear tendency for groupings into the main evolutionary lineages. On the other hand, the placement of some groups (e.g. amphibians relative to sauropsids in Figure 3) does not conform to the accepted position shown in Figure 2. However, the placement of those groups is associated with very low bootstrap support, of 44% and 53%, at the relevant nodes, suggesting that the amphibian and sauropsid sequences could be constrained to their expected position with very little change in log likelihood of the tree.
That specific prediction has not been tested here, but other similar constraints on tree topology are indeed tested quantitatively in later sections. A third interpretation stems from the lengths of the branches leading to the jawed vertebrate clade and to the hagfish and lamprey clades, each approaching 0.1 amino acid substitutions per site. This observation indicates that these three clades had each evolved by a moderate amount from their common GNAT1 ancestor (which resulted from 2R WGD) prior to speciation within each group.
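Branch lengths of this kind are measured in expected substitutions per site, which is not the same as the fraction of sites that visibly differ, because later substitutions can overwrite earlier ones. The Python sketch below illustrates that relationship under the textbook equal-rates models (Jukes-Cantor for 4 nucleotide states, the Poisson correction for 20 amino acid states); both are simplifications and not the WAG model used in this paper. It also illustrates the saturation mentioned earlier: the expected fraction of differing sites can never exceed (k - 1)/k, so as it approaches that ceiling the underlying distance can no longer be recovered. (Real nucleotide sequences additionally accumulate substitutions faster than the proteins they encode, e.g. at synonymous sites, which this comparison at equal distance does not capture.)

```python
import math


def expected_p_distance(d: float, k: int) -> float:
    """Expected fraction of differing sites after d substitutions per site (equal-rates, k states)."""
    ceiling = (k - 1) / k
    return ceiling * (1.0 - math.exp(-d / ceiling))


def corrected_distance(p: float, k: int) -> float:
    """Inverse: substitutions per site implied by an observed fraction p of differing sites."""
    ceiling = (k - 1) / k
    if p >= ceiling:
        return math.inf  # saturated: no finite distance fits the data
    return -ceiling * math.log(1.0 - p / ceiling)


# Under this model, a branch of 0.1 amino acid substitutions per site changes about 9.5% of sites.
print(round(expected_p_distance(0.1, 20), 4))

# Saturation: observed differences level off as the true distance grows.
for d in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:3.1f}   nucleotides p = {expected_p_distance(d, 4):.3f}   "
          f"amino acids p = {expected_p_distance(d, 20):.3f}")
```

For nucleotides the fraction of differing sites can never exceed 0.75 (for amino acids, 0.95), and close to that ceiling small sampling errors in p translate into very large uncertainties in the corrected distance.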

Constraints on tree topology. Because the mutation of bases in genes (and thence of residues in proteins) is stochastic, there is always a substantial component of ‘noise’ in the molecular sequences being analysed, and this leads to uncertainty in the topology of the inferred tree. Typically, minor rearrangements of the tree (e.g. the swapping of neighbouring branches) cause very little change in the calculated log likelihood, so that there exists a ‘landscape’ of slightly different trees with very similar log likelihoods. In such cases, the estimated bootstrap support is typically quite low at the nodes where swapping has little effect. As a result, one needs to inspect the ML tree carefully, looking both for low support values and for topologies that appear implausible; e.g. topologies inconsistent with the known species phylogeny, or with the assumptions of 2R WGD. It is often possible to apply a constraint to the topology in order to eliminate that inconsistency, and then to recalculate the ML tree subject to that constraint (or constraints). If the constrained tree fits one’s expectations better, it is then absolutely crucial to apply suitable tests of topology to ascertain whether the differences between the trees could be due simply to chance. Specifically, one examines the null hypothesis that the constrained tree is just as likely as the unconstrained (ML) tree, and tests whether this hypothesis should be rejected at an appropriate criterion probability level. IQ-Tree provides three suitable tests via its ‘-z’ option: the Bootstrap Proportion test using the RELL method, giving bp-RELL (Kishino et al., 1990); the Expected Likelihood Weight test, giving c-ELW (Strimmer and Rambaut, 2002); and the Approximately Unbiased test, giving p-AU (Shimodaira, 2002). Only those trees that passed all three of these tests at the 95% confidence level (i.e.
with p ≥ 0.05) were considered further.

Summary. Overall, the information that one can obtain from a phylogenetic tree for a protein of interest includes: the number of isoforms of that protein/gene that existed in the ancestral vertebrate organism; the pattern of duplications that formed those ancestral genes from a common precursor; the timing of such duplications, relative to 2R WGD; the extent of change in protein composition that has occurred in those different genes, both prior to and subsequent to the radiation of vertebrates; and any lineages from which the gene has subsequently been lost. In this paper, I re-calculate the molecular phylogenies for the majority of the proteins involved in vertebrate phototransduction, and then discuss the interpretation of the gene duplications likely to have generated each component, with particular attention to the origin of the rod/cone dichotomy, where that exists.
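The RELL procedure behind the bp-RELL test mentioned under ‘Constraints on tree topology’ can be sketched in a few lines. Rather than re-optimising each tree for every bootstrap replicate, RELL resamples the per-site log likelihoods already computed for each candidate tree and records how often each tree has the highest total. This Python sketch assumes a matrix of per-site log likelihoods (one row per tree) exported from the tree-inference program; the numbers are hypothetical, and c-ELW and p-AU are treated as given inputs (the AU test in particular requires multiscale bootstrapping, which is not shown). The hypothetical helper `passes_all` applies the acceptance rule used here: a tree is retained only if all three values are at least 0.05.

```python
import random
from typing import List, Sequence


def bp_rell(site_lnl: Sequence[Sequence[float]], n_boot: int = 1000,
            seed: int = 1) -> List[float]:
    """RELL bootstrap proportions for a set of candidate trees.

    site_lnl[t][s] is the log likelihood of site s under tree t.
    For each replicate, sites are resampled with replacement, the per-site
    values are summed for every tree, and the best tree scores one 'win'.
    Ties go to the lower-indexed tree.
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in idx) for tree in site_lnl]
        best = max(range(n_trees), key=lambda t: totals[t])
        wins[best] += 1
    return [w / n_boot for w in wins]


def passes_all(bp: float, c_elw: float, p_au: float, alpha: float = 0.05) -> bool:
    """Keep a tree only if none of the three tests rejects it at level alpha."""
    return min(bp, c_elw, p_au) >= alpha


# Hypothetical per-site log likelihoods: tree 0 (ML) and tree 1 (constrained).
ml_tree = [-2.0, -3.1, -1.5, -2.8, -2.2, -3.0]
constrained = [-2.1, -3.0, -1.6, -2.8, -2.1, -3.2]
print(bp_rell([ml_tree, constrained]))
print(passes_all(bp=0.21, c_elw=0.18, p_au=0.09))  # True
```

Because only site sums are recomputed, each replicate is cheap, which is why RELL is practical for comparing several constrained topologies against the ML tree.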

2.3

Individual gene duplication versus whole genome duplication (WGD)

Arguably the most far-reaching event in the evolution of vertebrates was the quadruplication of the entire set of chordate genes, as a result of two successive rounds of whole genome duplication (2R WGD). This pair of events occurred after the divergence of tunicates, and most likely prior to the divergence of agnathan vertebrates from our own lineage (Figure 2). This quadruplication of the genome may have given early vertebrates a major advantage over other creatures in the Cambrian seas, and may well have permitted the subsequent radiation of vertebrate species and their great success. The first study aimed at reconstructing the chromosomal arrangement (karyotype) of the ancestral proto-vertebrate organism following 2R WGD was undertaken by Nakatani et al. (2007). It provided a view of the very extensive chromosomal reorganisations that have since occurred in different vertebrate lineages, and outlined the considerable difficulties in this kind of analysis. Numerous subsequent studies have extended that work; interestingly, Putnam et al. (2008) showed that the lancelet genome may provide a good model for the ancestral chordate genome prior to WGD.

The precise timing of the two genome duplications can only be guessed at, but in Figure 2 they are suggested to have occurred around 600 Mya. Given the difficulty of separating the first and second rounds of duplication in molecular phylogenies (see subsequent results), it seems likely that these two duplication events were separated by a relatively short interval (in evolutionary terms), perhaps not exceeding 10 My. A schematic diagram is presented in Figure 4, with examples of the kinds of events that would have occurred through the combination of three processes: local gene duplication, the pair of genome duplications, and the loss of various genes. The top row indicates a hypothetical initial situation in our chordate ancestor, prior to any genome duplication; the middle pair of rows shows the situation after the first genome duplication; and the bottom four rows show the situation after the second round of genome duplication, including gene losses that occurred prior to the radiation of vertebrate species. Genes that arose through this process have been referred to as ‘ohnologs’, in honour of Susumu Ohno, who proposed the genome duplication mechanism (Ohno, 1970). The vertically arranged sets in the bottom four rows in Figure 4 can be termed ‘ohnolog families’.

Figure 4. Schematic with examples of the combined effects of local gene duplication, whole-genome duplication, and gene loss

In the example scenario in Figure 4, ohnolog families A and F are shown as having retained all four quadruplicate members; in the genomes of extant vertebrates, it is thought that around 200 such families remain (Singh et al., 2015), accounting for around 800 of the roughly 20,000 protein-coding genes. Ohnolog families B, C, D and E depict various ways in which only three or two members, or even just a single member, might have been retained.
In the top row of Figure 4, the curved arrows indicate three example cases in which a local gene duplication had already occurred before the organisation of genes shown in that row was attained. The scenario for the pair of families (G, H) is likely to account for the observed arrangement of jawed vertebrate genes for the families GNAI and GNAT (see Section 4.1), where three copies of each are retained and where in each case the GNAT gene is adjacent to a GNAI gene. The final two examples, shown by the pairs of gene families (J, K) and (L, M), highlight a limitation in the analysis of phylogeny and synteny. After 2R WGD, the combined families each retain four genes; i.e. two J and two K in the first case, and one L and three M in the second case. As a result, it is very likely that each of these sets of four genes will appear to be ohnologs, especially if the local duplication in the top row occurred only shortly before 2R WGD; on a strict interpretation, however, these are not ‘true’ ohnologs, though for certain purposes they may be regarded as such. One consequence of potential local gene duplications before, and of potential gene losses after, whole genome duplication is that a molecular phylogeny can be robust and yet show an unexpected relationship between proteins. Equally, an incorrect history of gene duplication can be deduced from a sound and convincing phylogeny. A specific case in point will be presented subsequently, where it will be suggested that the scenario in (J, K) may represent the situation for the four vertebrate arrestins (Section 5.2). Finally, it is worth pointing out that some lineages and some species have experienced additional genome duplications. In particular, teleost fish have undergone a third round (3R) of genome duplication, and therefore possess two copies of many of the genes found in other vertebrate species.
In a very different case, Xenopus laevis is allotetraploid, meaning that it possesses around twice as many chromosomes as related diploid species, apparently as a result of the hybridisation of two distinct but related ancestral Xenopus species. Other examples of multiple copies of chromosomes and/or genes abound. Such additional copies of genes can complicate the analysis of molecular phylogeny and gene synteny, and as a result it is often simplest to avoid teleost fish species, and similarly to use X. tropicalis rather than X. laevis.
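The bookkeeping behind Figure 4 can be sketched as follows. Each ancestral gene is copied twice (1R, then 2R), giving four ohnologs per family, and then specified copies are lost. The family names, the copy labels (‘1a’, ‘1b’, ‘2a’, ‘2b’) and the chosen losses are hypothetical; J and K stand for two genes formed by a local duplication shortly before 2R WGD, whose four surviving copies between them would be easy to mistake for a single ohnolog quartet.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Gene = Tuple[str, str]  # (family, copy label), e.g. ("A", "1a")


def two_round_wgd(ancestral: List[str]) -> List[Gene]:
    """Quadruplicate every ancestral gene: 1R yields copies 1 and 2, 2R splits each into a and b."""
    return [(fam, r1 + r2) for fam in ancestral for r1 in "12" for r2 in "ab"]


def retained_sizes(genes: List[Gene], lost: Set[Gene]) -> Dict[str, int]:
    """Number of ohnologs retained per family after the specified losses."""
    return dict(Counter(fam for fam, copy in genes if (fam, copy) not in lost))


# Hypothetical pre-2R genome; J and K arose by a local (tandem) duplication.
ancestral = ["A", "B", "J", "K"]
genes = two_round_wgd(ancestral)
lost = {("B", "2b"), ("J", "1b"), ("J", "2b"), ("K", "1a"), ("K", "2a")}
sizes = retained_sizes(genes, lost)
print(sizes)                    # {'A': 4, 'B': 3, 'J': 2, 'K': 2}
print(sizes["J"] + sizes["K"])  # 4, which mimics a single quartet
```

Applied to the figures quoted above, around 200 fully retained quartets correspond to 200 × 4 = 800 genes, about 4% of the roughly 20,000 protein-coding genes.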

2.4

Gene synteny

Chromosomal arrangement of phototransduction genes. Figure 5 shows the chromosomal positions, across four jawed vertebrate species, of a selection of 37 ohnolog families comprising 123 extant genes. In addition to the four families of phototransduction genes (G-protein β subunits, arrestins, visual GCs, and visual GRKs), I have included every additional ohnolog family that I could locate that lay substantially on the same set of chromosomes as the illustrated phototransduction genes, subject to the restriction that each family should comprise at least three extant members. This involved laborious manual searching, using as a basis the sets of ohnolog families identified by Singh et al. (2015; ohnologs.curie.fr); this procedure led to the identification of 33 such ohnolog families. The species selected for examination are spotted gar, chicken, opossum and human, on the basis, firstly, that each of these genomes has been assembled into reasonably complete sets of chromosomes (i.e. relatively few genes remain on unplaced scaffolds), and secondly, that none has undergone a third round of duplication (thus teleost fish have been excluded).

Figure 5. Synteny of a subset of phototransduction genes across four species

Inspection of Figure 5 provides evidence that these 37 gene families form a paralogon (a paralogous chromosomal region derived from a common ancestral region). As a first indication, in each column almost all of the genes reside either on a single chromosome or on just two chromosomes. To help illustrate this, coloured shading has been used to indicate those columns that include only a single chromosome. Furthermore, within each column, most of the genes are in reasonably close proximity to each other; for example, in the opossum column under ‘Ancestral 1’, all 23 genes reside within a span of 3.3 Mb on opossum chromosome 2.
Overall, what is important is not just the finding of proximity, but additionally the fact that all 37 families conform to a broadly similar pattern of gene locations across the four species examined. As a result, this set of 37 gene families shows the hallmark features of having arisen through 2R WGD quadruplication of a single set of ancestral genes, once some allowance is made for rearrangement of genes over the subsequent hundreds of millions of years. On the other hand, one cannot rule out the possibility that local gene duplications and deletions, of the kind indicated in the right-hand sections of Figure 4, might also have contributed. Such a high degree of conformity across the four species can only be expected over relatively short stretches of each chromosome, because rearrangements of genes within and between chromosomes have occurred differently in different species (so-called lineage-specific genome rearrangements). An example of two ‘breaks’ in chromosomal coverage can be seen in Figure 5, marked by the horizontal line between the GNBs and GPCs. (Because these particular breaks occur across all four species, and in both of the regions that will subsequently be shown to have arisen at the second round of WGD, it is possible that this rearrangement originated in one of the two duplicates that existed during the interval between 1R and 2R.) In other cases, breaks can be restricted to a subset of the species examined. If such a break affected only opossum and human, then one might suspect a chromosomal rearrangement in a stem mammal; if it affected chicken, opossum and human, then one might suspect a rearrangement in a stem amniote; and so on. What cannot be determined simply from the arrangement of ohnologs in Figure 5 is which pairs of ancestral chromosomal regions diverged at the first round of genome duplication (1R). To establish this, one additionally needs information about gene phylogeny, as will be examined shortly.
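The two checks applied to each column of Figure 5 (whether its genes lie on a single chromosome, and how tightly they cluster) are simple to express. This Python sketch groups one column’s genes by chromosome and reports the gene count and span in megabases for each chromosome; the gene names and coordinates are hypothetical and are not taken from Figure 5.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Location = Tuple[str, str, int]  # (gene, chromosome, start position in bp)


def chromosome_spans(column: List[Location]) -> Dict[str, Tuple[int, float]]:
    """Map each chromosome to (number of genes, span in Mb) for one column."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for _gene, chrom, pos in column:
        positions[chrom].append(pos)
    return {chrom: (len(p), (max(p) - min(p)) / 1e6) for chrom, p in positions.items()}


def single_chromosome(column: List[Location]) -> bool:
    """True if every gene in the column lies on the same chromosome (a shaded column)."""
    return len({chrom for _gene, chrom, _pos in column}) == 1


# Hypothetical column: three genes on chromosome 2, one on chromosome 5.
column = [("gene1", "2", 10_000_000), ("gene2", "2", 11_500_000),
          ("gene3", "2", 13_300_000), ("gene4", "5", 40_000_000)]
print(chromosome_spans(column))   # {'2': (3, 3.3), '5': (1, 0.0)}
print(single_chromosome(column))  # False
```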

Gene synteny for multiple phototransduction genes. As a first step in analysing the syntenic arrangement of phototransduction genes, the four principal columns from Figure 5 (labelled ‘Ancestral 1’ to ‘Ancestral 4’) have been converted into the four rows of Figure 6B; however, for compactness, the non-phototransduction families have been restricted to those that retain all four members and that appear to be ‘genuine’ quartets (see below). Figure 6B includes four families of phototransduction genes together with seven families of nearby ohnolog ‘quartets’. The other three panels in Figure 6 likewise present additional regions that contain one or more families of phototransduction genes together with their nearby ohnolog quartets. This gives a total of 13 families of phototransduction genes (comprising 35 genes), together with 26 non-phototransduction ohnolog quartets (comprising another 104 genes).

Figure 6. Overview of syntenic arrangement of vertebrate phototransduction genes

Although the tabulation of gene locations in Figure 5 is provided only for panel B of Figure 6, the genes in each of the other panels are likewise located in close proximity to each other (data not shown). Accordingly, each of the five groupings in Figure 6 is likely to represent a locally paralogous region (i.e. a paralogon); note that there are two such groupings in panel C. Furthermore, analysis of the gene locations in the four taxa provides suggestive evidence that each of the four numbered rows may continue across all four panels. If this interpretation proves to be correct, then the arrangement in Figure 6 would portray a single large paralogon that would include 35 phototransduction genes along with hundreds of non-phototransduction genes (of which only 104 are shown).

Pairs of rows that diverged at 1R.
What has not been determined up to this point is which pair of the four rows diverged from which other pair at the first round of genome duplication (1R). Potentially, this question can be resolved by phylogenetic analysis, as will now be addressed. In undertaking such analysis, it is important to concentrate on ‘genuine ohnolog quartets’ that retain all four members and that show no signs of intrusion of invertebrate sequences, or other reasons for rejection. For this reason, only apparently ‘genuine’ quartet families are illustrated in Figure 6. Then, for every genuine quartet of ohnologs, one can calculate the molecular phylogeny for that quartet and, in principle, determine which pairs of rows are sisters in that local vicinity. In Figure 5 there are 10 families that each comprise four members, but for the purposes of obtaining genuine quartets, three of those families were rejected: the STAGs had protostome sequences embedded (in the Ensembl98 gene tree); the SOX19 family was found only in some species of bony fish, and it had an additional intron; and the TSC22D family was problematic to analyse because it had huge differences in sequence length between clades, and because its TSC22D4 clade only had convincing members within mammals. Accordingly, only the seven remaining quartet families (indicated by grey shading in Figure 5) were transferred to Figure 6B and used in the analysis of phylogeny. The analysis of quartet phylogeny is illustrated in Figure 7 for eight examples of ohnolog quartets taken from Figure 6; two quartets have been taken from each of the four panels, A–D. In each of these eight unconstrained molecular phylogenies there is at least 98% support for the illustrated topology.
For every one of the 26 ohnolog quartets shown in Figure 6, the phylogenetic analysis supported the pairings indicated by the grey links between genes; for 17 of these quartets the level of support was at least 99%, for another six it was at least 95%, and for the remaining three it was 94%, 92% and 92% (see Supplementary Figure 13). Accordingly, given the high support across multiple quartets near each phototransduction gene family, there is extremely strong support for the pairings of phototransduction genes depicted in Figure 6. Note that each phylogeny establishes only which pairs are sisters; it cannot allocate the clades to particular rows in Figure 6. That allocation needs to be accomplished by reference to gene synteny relationships of the kind shown in Figure 5.

Figure 7. Molecular phylogenies for eight examples of ohnolog quartets

Interestingly, the pairings depicted in panel B differ from the interpretation of Lamb and Hunt (2018), where the assumption was made that the sister relationship of the visual arrestins (Arr-C = ARR3 and Arr-S = SAG) in the molecular phylogeny implied that those two clades had separated at the second round of WGD. However, the overwhelming evidence from the pairings of the seven ohnolog quartets in the vicinity of phototransduction genes in Figure 6B requires a revision of that interpretation, with the conclusion that Arr-C and Arr-S diverged at 1R, and with the implication that the visual arrestins and the β-arrestins diverged from each other prior to WGD. Likewise, GRK1A and GRK1B must have diverged at 1R, with the implication that GRK7 and the GRK1s also diverged from each other prior to WGD. The pairings of chromosomal rows illustrated in Figure 6 will provide an important baseline for interpreting each of the molecular phylogenies for phototransduction genes presented subsequently in this paper.

It is noteworthy that the 26 quartets of reference ohnolog families shown in Figure 6 represent more than 10% of the entire complement of ohnolog quartets in the genome: it has been estimated that there are around 200 four-member ohnolog families (~800 genes) amongst the total of around 2000 ohnolog families contained in the genomes of vertebrates (Singh et al., 2015).
Accordingly, if such quartets are distributed randomly in the genome, and if Figure 6 has sampled all such quartets in the vicinity of these phototransduction genes (which it may not have done), then this might suggest that prior to 2R WGD the 13 families of phototransduction genes had been situated in a restricted region comprising just 10% of the ancestral genome.

Future extension of the analysis of phototransduction gene synteny. Finally, with regard to gene synteny, it is worth outlining the kinds of methodological approaches that will be required to conduct more comprehensive analyses of gene synteny in the vicinity of phototransduction genes. Ideally, one would wish to find every family of ohnologs that: (i) retains either three or four members; (ii) has all extant members in close proximity to phototransduction genes in the reference taxa; (iii) shows no sign of having invertebrate paralogs within the phylogeny for the family; and (iv) has members present across a wide range of vertebrate species. For analysis of this kind, one very useful resource is Ohnologs v2 (Singh et al., 2015; Singh and Isambert, 2019; ohnologs.curie.fr), which provides browsable and downloadable lists of probable ohnologs from a range of species. Nevertheless, the primary resource for analysis of synteny is Ensembl (Herrero et al., 2016; ensembl.org), where one can browse for genes by species and then inspect Ensembl’s gene phylogeny. One can readily examine paralogs in the selected species using the viewing option ‘View paralogues of current gene’, and it is then usually straightforward to see whether the chosen family is indeed promising as a candidate set of ohnologs. For any set of genes to be considered as ohnologs (i.e. as being ‘2R WGD paralogous’), one should require that the Ensembl gene tree does not show any invertebrate taxa (e.g. Ciona or any protostome species) embedded within the set.
However, because the Ensembl gene tree is constructed from a limited number of taxa, does not provide estimates of node support, and has in some cases changed its placement of invertebrate sequences between releases, one needs to be careful in judging whether an apparently intervening branch (typically for Ciona) is genuine or spurious. Thus, detailed analysis is in practice essential in determining whether a candidate gene family can be regarded as a genuine set of ohnologs. The potential intrusion of invertebrate sequences may have only limited impact in a tabulation such as in Figure 5, but it is crucially important to avoid such intrusion in establishing ‘genuine quartet’ families for phylogenetic analysis of the kind shown in Figure 7.
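The way the quartet phylogenies of Figure 7 are combined with synteny can be sketched as follows. Each quartet phylogeny contributes only a sister pairing of genes; synteny (here a hypothetical `row_of` mapping, standing in for the row assignments of Figure 6) converts that into a pairing of rows, and the pairings are then tallied across quartets. The gene names and rows are hypothetical; a real analysis would also weigh the bootstrap support for each quartet’s topology.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Pairing = FrozenSet[FrozenSet[int]]
Split = Tuple[Tuple[str, str], Tuple[str, str]]  # two sister pairs of gene names


def row_pairing(split: Split, row_of: Dict[str, int]) -> Pairing:
    """Translate a quartet's two sister pairs (gene names) into a pairing of synteny rows."""
    return frozenset(frozenset(row_of[g] for g in pair) for pair in split)


def tally(splits: List[Split], row_of: Dict[str, int]) -> Counter:
    """Count how many quartets support each of the three possible row pairings."""
    return Counter(row_pairing(s, row_of) for s in splits)


# Hypothetical quartets X, Y, Z, each with one member on each of rows 1-4.
row_of = {f"{fam}{r}": r for fam in "XYZ" for r in range(1, 5)}
splits = [(("X1", "X3"), ("X2", "X4")),
          (("Y3", "Y1"), ("Y4", "Y2")),
          (("Z1", "Z2"), ("Z3", "Z4"))]
counts = tally(splits, row_of)
best, n = counts.most_common(1)[0]
print(sorted(sorted(p) for p in best), n)  # [[1, 3], [2, 4]] 2
```

The phylogeny alone fixes only the sister pairs; it is the synteny-derived `row_of` mapping that places them on rows, mirroring the division of labour between Figure 7 and Figure 5.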

At the time of writing, there is a significant issue with Ensembl that was introduced in Release 94 and remains in the current Release 98. A major change was made to the approach for inferring orthologs and paralogs, with the result that large gene families have been split into multiple smaller ones, so that many paralogy relationships have been lost. Therefore, as of October 2019, there are advantages to using Release 93 (from July 2018). Analysis of synteny is currently restricted by the limited number of vertebrate taxa for which the genome assembly is nearly complete, i.e. where relatively few genes remain on unplaced scaffolds. Four species for which the assembly is already substantially complete are those listed in Figure 5: human, opossum, chicken and spotted gar. In addition, anole (Anolis carolinensis) and xenopus (Xenopus tropicalis) currently have quite good coverage, though substantial regions remain as scaffolds. A second non-teleost ray-finned fish for which a substantially complete assembly has recently become available is the reedfish (Erpetoichthys calabaricus), and this species is now included in Ensembl Release 98. An additional important resource is Genomicus (Muffato et al., 2010; genomicus.fr), to which a direct link is provided from Ensembl for each gene. There it is straightforward to view the paralogy of adjacent genes, so as to search manually for potential neighbouring ohnologs. Another useful resource is Synteny Database (syntenydb.uoregon.edu), though this has not been updated since 2015. For the future, it will be important to develop an automated system for locating potential ohnolog families that lie in the vicinity of the families of phototransduction genes shown in Figure 6, and additionally in those regions that could convincingly link the four panels together into a single unified paralogon.
Ensembl’s BioMart facility can be used to download the gene locations and the orthology relationships for species of interest, so that in principle it should be straightforward to automate a search of those candidate ohnolog families identified in Ohnologs v2 (Singh et al., 2015; Singh and Isambert, 2019) to find all those sets that lie close to phototransduction genes, when viewed across multiple species. Such a system would also be useful in attempting to tie into that same paralogon the two families of phototransduction genes that have so far eluded integration: the CNGB1/3 genes, and the GNGT1/2 genes.
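A minimal sketch of the automated search proposed above, assuming that gene locations have already been exported (for example via BioMart) into plain Python dictionaries, and that orthologous genes share a name across species. The 5 Mb window, the requirement of at least three retained members, the minimum number of species, and all gene names and coordinates are hypothetical choices for illustration; the sketch does not call Ensembl or BioMart.

```python
from typing import Dict, List, Tuple

Loc = Tuple[str, int]  # (chromosome, position in bp)


def near_any(loc: Loc, anchors: List[Loc], window: int) -> bool:
    """True if loc lies within `window` bp of any anchor on the same chromosome."""
    chrom, pos = loc
    return any(c == chrom and abs(p - pos) <= window for c, p in anchors)


def candidate_families(families: Dict[str, List[str]],
                       locations: Dict[str, Dict[str, Loc]],
                       anchors: Dict[str, List[Loc]],
                       window: int = 5_000_000,
                       min_species: int = 2) -> List[str]:
    """Families with >= 3 members, all near phototransduction genes, in >= min_species species."""
    hits = []
    for fam, members in families.items():
        n_ok = 0
        for species, genes in locations.items():
            present = [genes[m] for m in members if m in genes]
            if len(present) >= 3 and all(near_any(loc, anchors[species], window)
                                         for loc in present):
                n_ok += 1
        if n_ok >= min_species:
            hits.append(fam)
    return hits


# Hypothetical data: one phototransduction anchor gene per species.
anchors = {"sp1": [("2", 10_000_000)], "sp2": [("7", 50_000_000)]}
families = {"F1": ["F1a", "F1b", "F1c"], "F2": ["F2a", "F2b", "F2c"]}
locations = {
    "sp1": {"F1a": ("2", 9_000_000), "F1b": ("2", 12_000_000), "F1c": ("2", 14_000_000),
            "F2a": ("2", 11_000_000), "F2b": ("2", 11_500_000), "F2c": ("9", 1_000_000)},
    "sp2": {"F1a": ("7", 48_000_000), "F1b": ("7", 51_000_000), "F1c": ("7", 53_000_000),
            "F2a": ("7", 49_000_000), "F2b": ("7", 50_500_000), "F2c": ("7", 52_000_000)},
}
print(candidate_families(families, locations, anchors))  # ['F1']
```

Family F2 is rejected because one member lies on a different chromosome in one species; the remaining criteria listed earlier (no embedded invertebrate paralogs, broad taxonomic coverage) would still require phylogenetic checks.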

3

Evolution of G-proteins and origin of the proto-vertebrate phototransduction cascade

Before investigating the evolution of each of the components of the vertebrate phototransduction cascade, it will be helpful first to examine the evolution of G-proteins, and then to consider how a phototransduction cascade utilising transducin and PDE6 came to be present in a chordate organism. Thereafter, it will be possible to examine the individual steps with this overview in mind.

3.1

Overview of G-protein evolution

G-protein alpha subunits are classified into five primary families: Gαs, Gαq, Gαi, Gα12 and Gαv. The Gαi family includes Gαo and Gαt (transducin), as well as Gαi itself, and different members of these three clades are utilised for phototransduction in ciliary photoreceptors across invertebrate as well as vertebrate taxa (reviewed by Terakita et al., 2012). The Gαq family includes Gα11, Gα14 and Gα15; members of the Gαq family are utilised for phototransduction in protostome rhabdomeric photoreceptors as well as in melanopsin-expressing ipRGCs (intrinsically photosensitive retinal ganglion cells) of vertebrates.

The evolution of these Gα genes has been investigated by Lagman et al. (2012), Lamb et al. (2016) and Lokits et al. (2018), and the updated model proposed in the last of these studies is reproduced here as Figure 8. In this diagram, the prefix ‘pre’ denotes genes predating 2R WGD, and throughout the diagram ‘Gα’ has been omitted, so that (for example) ‘T2’ denotes Gαt2. Working from left to right, this scheme depicts a very ancient duplication generating the Gαs and Gαi/q families, followed by another pre-metazoan duplication generating the Gαq and Gαi families, and a third duplication forming the Gαo branch (preO). During metazoan (animal) evolution, a tandem duplication of the Gαi gene (preI) occurred, forming preI' and preI", both of which were subsequently quadruplicated during 2R WGD to generate the Gαi and Gαt isoforms of the vertebrate lineage. Importantly, the three surviving pairs of quadruplicated genes in jawed vertebrates have remained adjacent to each other in numerous taxa, as GNAI1-GNAT3, GNAI2-GNAT1, and GNAI3-GNAT2 (see bottom right of Figure 8).

Figure 8. Evolution of G-protein alpha subunits, as proposed by Lokits et al. (2018)

3.2

Origin of the proto-vertebrate phototransduction cascade

Lamb and Hunt (2017) proposed a scenario for the origin of the proto-vertebrate (i.e. pre-2R WGD) phototransduction cascade, based on successive modifications of a postulated ancestral deuterostome phototransduction cascade. A slightly revised form of that proposal is set out in Figure 9.

Ancestral cascade. The postulated ancestral form of the phototransduction cascade is shown in Figure 9A, based on a combination of analogy to the cone/rod cascade and the limited information that exists in relation to extant invertebrate deuterostome ciliary photoreceptors. The following steps are proposed to have occurred. Upon absorption of light, the activated ciliary opsin (R*) activated an inhibitory G-protein (Gi), the activated alpha subunit of which (Gαi*) then inhibited adenylyl cyclase (AC), which in darkness had been synthesising cyclic AMP (cAMP). A cyclic nucleotide phosphodiesterase (PDE), possibly the common ancestor of vertebrate PDE5/6/11, hydrolysed cytoplasmic cAMP in a manner that was not light-dependent. In darkness, the high cytoplasmic concentration of cAMP caused cyclic nucleotide-gated channels (CNGC) to open, while in light the decreased AC activity lowered the cAMP concentration, leading to channel closure. In many respects, this postulated ancestral cascade resembles an inhibitory version of the canonical transduction cascade of vertebrate olfactory receptor cells.

Figure 9. Postulated origin of the proto-vertebrate phototransduction cascade

The basis for this proposal, and especially for the idea that the ancestral cascade utilised cAMP rather than cGMP as the cytoplasmic messenger, included consideration of the following pieces of circumstantial evidence. (i) The G-protein alpha subunit employed in cones and rods, Gαt, arose as a result of an ancient duplication of the gene for an inhibitory subunit, Gαi. (ii) Ciliary photoreceptors of extant lancelets express a C-opsin together with a Gαi (Vopalensky et al., 2012).
(iii) The catalytic subunits of the vertebrate photoreceptor phosphodiesterase, PDE6, are close relatives of PDE5 and PDE11, the latter of which is a dual cAMP/cGMP phosphodiesterase. (iv) Cyclic nucleotide-gated channels are typically responsive to both cAMP and cGMP, though with differing binding constants. (v) In measurements of electrical responses from tunicate photoreceptor cells, the membrane conductance was found to decrease (Gorman et al., 1971), just as occurs in vertebrate photoreceptors, consistent with closure of ion channels. A potential limitation of this ancestral system may have been that at high light intensities (e.g. daylight), the operation of the cascade would have been saturated, with the cyclase activity extremely low, and the concentration of cAMP consequently too low to cause appreciable opening of channels. This saturation might have been alleviated to some extent if the PDE activity had been inhibited by another molecule, possibly on a diurnal cycle. It was suggested that this inhibitory molecule might have been the ancestral PDEγ, and its emergence in the scenario is indicated by [γ] in Figure 9A.

Transition to proto-vertebrate cascade. Subsequently, the gene for the G-protein alpha subunit underwent a tandem duplication, to form Gαi' and Gαi'' (in notation similar to that of Lokits et al., 2018; see Figure 8). As indicated in Figure 9B, it is now proposed that both of these alpha subunits continued to be expressed in the photoreceptor cell. One of these (Gαi') is assumed to have retained its original function, whereas the other (Gαi'') may have evolved an interaction with the proto-PDEγ, as indicated by the red arrow. Specifically, if there was any degree of affinity between those two molecules, then the extent of inhibition of the PDE may have declined. Such a reduction in inhibition would have amounted to a light-induced activation of the PDE, and would have reinforced the effect of the light-induced reduction in AC activity in lowering cAMP concentration, thereby increasing the size of the light-induced effect and presumably providing an advantage to the organism. With subsequent mutations, it is possible that this newer mechanism became more effective than the older mechanism, in which case the interaction between Gαi' and AC would have been of little use, and that older pathway may have declined in importance and eventually ceased to function, as indicated in Figure 9C. In parallel with these changes, there could have been a transition from cAMP to cGMP as the dominant form of cytoplasmic messenger. If a guanylyl cyclase were expressed in the cell (Figure 9B), then the cGMP that was synthesised may have functioned just as effectively as cAMP, because the PDE quite probably hydrolysed both molecules, and because the channels quite probably bound both.
Then, if the GC happened to have some other advantage over the AC (for example, more effective Ca-feedback regulation), there would have been no reason for the AC to continue to contribute, and it may simply have ceased to be expressed. At that stage (Figure 9C), the cascade would effectively have completed its transition to the proto-vertebrate form; the G-protein and the PDE that then existed could therefore be designated as Gαt and PDE6, respectively. While the scenario described above is entirely hypothetical, it nevertheless provides a plausible framework on which to hang ideas, and it allows the formulation of tests of the hypothesis. Perhaps the first such test would be to examine whether the photoreceptors of any extant deuterostomes utilise adenylyl cyclase. But, irrespective of the origin of the proto-vertebrate transduction cascade, it remains important to examine how the genes encoding those various proteins evolved in the early vertebrate lineage. The following sections delve into the evolution of each of the individual steps, and identify those changes that occurred before, during, and after 2R WGD.
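The qualitative argument for Figure 9B (that a light-activated PDE would reinforce the light-induced fall in cyclase activity) can be illustrated with a deliberately simple steady-state model that is not part of the original proposal. The sketch assumes that the cyclic nucleotide concentration c is set by synthesis at rate α and first-order hydrolysis with rate constant β, so that c = α/β, and that channel opening follows a Hill function with hypothetical K = 1 and n = 3. All parameter values are arbitrary and serve only to show the direction of the effect.

```python
def open_fraction(c: float, k: float = 1.0, n: float = 3.0) -> float:
    """Fraction of CNG channels open at cyclic nucleotide concentration c (Hill function)."""
    return c**n / (c**n + k**n)


def light_response(synthesis_drop: float, hydrolysis_gain: float,
                   alpha_dark: float = 2.0, beta_dark: float = 1.0) -> float:
    """Fractional reduction in open channels, light relative to dark.

    synthesis_drop: fraction by which light lowers cyclase activity (0 to 1).
    hydrolysis_gain: factor by which light raises PDE activity (1.0 = unchanged).
    Steady state is assumed: c = alpha / beta.
    """
    c_dark = alpha_dark / beta_dark
    c_light = alpha_dark * (1.0 - synthesis_drop) / (beta_dark * hydrolysis_gain)
    return 1.0 - open_fraction(c_light) / open_fraction(c_dark)


# Figure 9A-like case: light only lowers cyclase activity.
print(light_response(synthesis_drop=0.5, hydrolysis_gain=1.0))  # 0.4375
# Figure 9B-like case: light also activates the PDE.
print(light_response(synthesis_drop=0.5, hydrolysis_gain=2.0))  # 0.875
```

With the same fall in cyclase activity, doubling PDE activity in the light doubles the fractional response in this toy model. As `synthesis_drop` approaches 1 the response approaches 1 whatever the PDE does, which corresponds to the saturated regime described above.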

4 Evolution of the activation steps of vertebrate phototransduction

The evolution of those protein families mediating the activation steps of vertebrate phototransduction will now be examined separately for: the transducins (Sections 4.1, 4.2); the cGMP phosphodiesterases (Sections 4.3, 4.4); and the cyclic nucleotide gated channels (Section 4.5). Much of this analysis will relate to the manner in which the genes encoding those proteins expanded during 2R WGD.

4.1 Transducin alpha subunits (GNAT1–3)

Background. As mentioned above, jawed vertebrates possess three genes for transducin alpha subunits (Gαt), and analysis of phylogeny and gene synteny has shown that these arose during 2R WGD (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2012). GNAT1, encoding rod transducin, Gαt1, is used in rod photoreceptors; GNAT2, encoding cone transducin, Gαt2, is used in cone photoreceptors; and GNAT3, encoding gustducin, Gαt3, is used in parietal eye photoreceptors and in some taste receptor cells. Across most vertebrate species, each of these genes is located in close proximity to one of three GNAI genes, encoding the alpha subunit Gαi of an inhibitory G-protein, consistent with the concept that GNAI and GNAT arose by tandem duplication of an ancestral gene prior to 2R WGD, as shown schematically in Figure 8. Molecular phylogeny. The molecular phylogeny of vertebrate Gαt and Gαi subunits has previously been examined (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2012; Lamb et al., 2016; Lamb and Hunt, 2017; Lokits et al., 2018). Here, an updated analysis is presented in Figure 10A. The phylogeny is shown as a collapsed tree, calculated for a large family of vertebrate GNATs and GNAIs, using as outgroup a smaller selection of vertebrate GNAQs/GNA11s/GNA14s, and also including a selection of GNAOs. To put this tree into context, it is helpful to note that the uppermost three blue clades (GNAT1s) correspond exactly to the whole of the GNAT1 sub-tree in Figure 3.

Figure 10. G-protein alpha subunits (Gαt, Gαi)

For the GNATs, each of the six coloured clades has bootstrap support of at least 99%. In addition, the branching pattern for the GNATs generally has very high support, apart from the two nodes around GNAT3, which have only 63% and 87% support. Ordinarily, this level of support would be insufficient to give much confidence in the illustrated branching pattern. However, the three GNAI-GNAT pairings provide crucial additional information. Thus, GNAI1 and GNAI3 are supported unanimously as sister clades, and as a result we can have confidence that their syntenic neighbours, GNAT2 and GNAT3, are likewise sisters.
Accordingly, the illustrated phylogeny provides strong support for the 1R and 2R duplications indicated by the yellow and cyan highlights, respectively, in Figure 10A. Additional powerful support comes from the pairings of the nearby quartet genes (PLXNAs, GRMs and TFs) as shown previously in Figure 6 and Figure 7. Deduced gene duplications and losses. The most parsimonious interpretation of the branchings in Figure 10A is indicated by the highlighted ‘1R’ and ‘2R’ annotations; the corresponding gene duplications and losses that are assumed to have given rise to vertebrate GNATs (and GNAIs) are indicated explicitly in Figure 10B. Prior to 2R WGD, an ancient Gαi gene underwent a tandem duplication, forming the adjacent genes GNAI and GNAT. During the course of 2R WGD, both these genes quadruplicated, and they remained adjacent. However, after the second round, jawed vertebrates lost both members on one chromosome (row 3), GNAI4 and GNAT4, whereas Lokits et al. (2018) report that agnathans lost only GNAT4 on that chromosome; in addition, agnathans lost GNAT2 on another chromosome (row 1). It must be emphasised, though, that gene synteny for agnathan species has not been analysed here, and so any conclusions about agnathan gene duplications and losses remain very preliminary. The arrangement of jawed vertebrate genes onto the four rows in Figure 10B conforms with the summary diagram of gene synteny in Figure 6D. Evolution of the proto-vertebrate GNAT. In Figure 10A, the limb labelled ‘GNAT (= preI'')’ is long, and corresponds to ~0.3 amino acid residue substitutions per site. This indicates that the proto-vertebrate GNAT underwent very substantial evolution prior to genome duplication. As will be shown in Sections 4.3 and 4.4, this change appears to have occurred contemporaneously both with a substantial evolution of the PDE catalytic subunit and with the appearance of the PDE inhibitory subunit.
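The gene bookkeeping of Figure 10B can be written as a minimal sketch: a two-gene block produced by tandem duplication is copied onto four rows by 2R WGD, and each lineage then loses the genes listed in the text. Only rows 1 and 3 carry information from the text; rows 2 and 4 are used purely as labels, and the agnathan losses are the preliminary ones reported by Lokits et al. (2018).

```python
# Bookkeeping sketch for the GNAI/GNAT scenario of Figure 10B.
# A pre-2R tandem duplication gives an adjacent GNAI-GNAT pair; 2R WGD copies
# that whole block onto four chromosome "rows"; lineages then lose genes.
# Only the losses stated in the text are encoded (rows 1 and 3); the
# agnathan losses are preliminary, as the text emphasises.
from collections import Counter

ANCESTRAL_BLOCK = ("GNAI", "GNAT")  # adjacent pair produced by tandem duplication
ROWS = (1, 2, 3, 4)                 # four paralogous regions after 2R WGD


def after_2r(block, rows=ROWS):
    """Copy every gene of the block once per row, so adjacency is preserved."""
    return {(gene, row) for row in rows for gene in block}


LOSSES = {
    "jawed vertebrates": {("GNAI", 3), ("GNAT", 3)},
    "agnathans (preliminary)": {("GNAT", 3), ("GNAT", 1)},
}


def retained(lineage: str) -> Counter:
    """Count the surviving GNAI and GNAT copies in one lineage."""
    survivors = after_2r(ANCESTRAL_BLOCK) - LOSSES[lineage]
    return Counter(gene for gene, _row in survivors)


for lineage in LOSSES:
    print(lineage, dict(retained(lineage)))
```

Under these assumptions jawed vertebrates keep three GNAIs and three GNATs, while agnathans keep four GNAIs but only two GNATs.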

4.2 G-protein beta subunits (GNB1–4)

The molecular phylogeny of the four G-protein beta subunits Gβ1–Gβ4, which are encoded by GNB1–GNB4, was examined by Lagman et al. (2016), and the topology presented in their Fig. 2 is confirmed here, with higher levels of bootstrap support, in the updated phylogeny of Figure 11A. The four vertebrate clades are each supported at a bootstrap level of at least 97%, and they clade together with unanimous support. On the other hand, there is only 89% support for the pairing of GNB2 with GNB4.

Figure 11. G-protein beta subunits (GNB1–4)

The syntenic arrangement of the four genes was also examined by Lagman et al. (2012), and was subsequently investigated by Lamb and Hunt (2018). The latter study proposed that GNB1 and GNB3 diverged at 1R on the basis of the presumed pairings of rows shown in their Figure 1. However, from the new evidence provided here in Figure 6 and Figure 7, it seems clear that GNB1 and GNB3 instead diverged at the second round, as is indicated in both panels of Figure 11. Accordingly, it now appears that the four vertebrate GNB genes arose via the simplest form of WGD quadruplication, without prior local duplication and without loss of any genes. The G-protein gamma subunits will not be considered here because, as discussed by Lagman et al. (2012), the shortness of their sequences (around 70 residues), in conjunction with their relatively high degree of sequence identity, means that there is insufficient phylogenetic information to allow meaningful conclusions to be drawn. However, phylogenetic analysis of nearby ohnolog quartets (including those containing COL1A2, ITGA8, NFE2L3 and CACNB2) shows that GNGT1 and GNGT2 diverged at the second round of WGD (not shown).
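The conclusion that GNB1 and GNB3 diverged at the second round, with the GNB2/GNB4 pair forming the other 1R branch, can be encoded by giving each post-2R copy a 1R label and a 2R label. In this hypothetical sketch two copies diverged at 1R if their 1R labels differ, and at 2R otherwise; the choice of 2R label within each pair is arbitrary.

```python
# Sketch: label each post-2R copy by its branch at 1R and at 2R, then read off
# the round at which any two copies diverged. The GNB labels follow the text:
# GNB1 and GNB3 diverged at 2R, so they share a 1R branch; GNB2 and GNB4 form
# the other 1R branch. Which 2R label each gene gets is arbitrary.
from itertools import combinations
from typing import NamedTuple


class Copy(NamedTuple):
    r1: int  # branch taken at the first round (0 or 1)
    r2: int  # branch taken at the second round (0 or 1)


def divergence_round(a: Copy, b: Copy) -> str:
    """Round of WGD at which two copies of the same pre-2R gene separated."""
    if a == b:
        raise ValueError("identical copies have not diverged")
    return "1R" if a.r1 != b.r1 else "2R"


GNB = {"GNB1": Copy(0, 0), "GNB3": Copy(0, 1), "GNB2": Copy(1, 0), "GNB4": Copy(1, 1)}

for (x, cx), (y, cy) in combinations(GNB.items(), 2):
    print(f"{x}-{y}: {divergence_round(cx, cy)}")
```

Because no copy was lost, all four slots are occupied: two of the six pairs (GNB1/GNB3 and GNB2/GNB4) diverged at 2R and the other four at 1R.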

4.3 PDE catalytic subunits (PDE6A,B,C)

Background. Jawed vertebrates possess three PDE6 genes encoding catalytic PDE subunits, and analysis of synteny has again shown that these arose during 2R WGD (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2016). PDE6A and PDE6B encode the α and β subunits, which together form a heterodimeric catalytic unit in rod photoreceptors, whereas PDE6C encodes the α' subunit that forms a homodimeric catalytic unit in cone photoreceptors. The ancestral vertebrate PDE6 gene originated by duplication from an ancestral gene common to PDE5, PDE6 and PDE11. No gene that clades with the PDE6s has been found outside of vertebrates. Molecular phylogeny. The molecular phylogeny of the PDE6s has previously been investigated (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2016; Lamb et al., 2016; Lamb and Hunt, 2017), and an updated analysis is presented in Figure 12A. This unconstrained tree was obtained using a set of 63 vertebrate PDE6 sequences, together with the only two invertebrate sequences that could be found to exhibit moderate similarity (from Ciona), and with 14 vertebrate PDE5A and PDE11 sequences making up the outgroup. The unconstrained tree inference process placed PDE6A and PDE6B as sisters, with unanimous support, and jointly as sisters to the other PDE6s, comprising PDE6C together with PDE6X.

Figure 12. PDE catalytic and inhibitory subunits (PDE6s, PDE6γs)

For cyclic nucleotide PDEs, the only known case in which the catalytic domain comprises a heterodimer occurs in jawed vertebrate rods, with PDE6A + PDE6B; in all other PDEs the catalytic site is formed by a pair of identical subunits, i.e. a homodimer. PDE6C is the only PDE6 isoform expressed in jawed vertebrate cones. In addition, it is the only PDE6 gene found in the lamprey Mordacia mordax, a species that has only a single class of cone-like photoreceptors (Collin et al., 2004), and that expresses LWS as its only opsin (Lamb et al., 2016).
Furthermore, in hagfish (which have only a single class of rod-like photoreceptors), there is only a single isoform, PDE6X. Taken together, these observations led Lamb et al. (2016) to propose that agnathan rod-like photoreceptors employ PDE6X as a homodimer, and that lamprey cone-like photoreceptors employ PDE6C as a homodimer.

Gene duplications. It has previously been established that the PDE6A/B/C genes reside in a paralogon (Nordström et al., 2004), though it was not possible to link that region to other regions containing phototransduction genes. However, a gene synteny analysis similar to that in Figure 5 shows that PDE6A and PDE6B are located close to ohnolog quartets that diverged at the second round (e.g. the GABRAs, FGFRs and PSDs), and also close to the CNGA family (see Figure 6C). Hence, the gene synteny results and the phylogenetic results both indicate that the most parsimonious way of accounting for the observed molecular phylogeny is as indicated by the highlighted ‘1R’ and ‘2R’s in Figure 12A, and as shown explicitly in Figure 12B. Prior to 2R WGD, a PDE6 gene had already arisen, by duplication from a common ancestral PDE5/6/11 gene. As shown by the phylogeny in Figure 12A, the limb of the PDE6 branch is long, at ~0.4 residue substitutions per site, indicating that extensive evolution had occurred after that earlier duplication but prior to 2R WGD. Then, as a result of 2R WGD, the gene quadruplicated. Agnathan vertebrates lost two isoforms, retaining only PDE6X and PDE6C, which both continued to function as homodimers. Jawed vertebrates lost PDE6X, while PDE6A and PDE6B evolved to function as a heterodimer in rods. Sauropsids (birds and reptiles) subsequently lost the PDE6A gene, and as a result their rods are presumed to utilise PDE6B as a homodimer.
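The lineage-specific outcomes of the PDE6 quadruplication in Figure 12B can be summarised as set differences. In this sketch the function `rod_catalytic_unit` is a hypothetical restatement of the text's conclusions about which dimer the rod (or rod-like) cell uses, not a model of protein assembly.

```python
# Sketch of the PDE6 scenario in Figure 12B: one pre-2R PDE6 gene becomes four
# copies (A, B, C, X) and each lineage keeps a different subset. The rule in
# rod_catalytic_unit is a hypothetical summary of the text.
POST_2R = {"PDE6A", "PDE6B", "PDE6C", "PDE6X"}

LOST = {
    "agnathans": {"PDE6A", "PDE6B"},
    "jawed vertebrates": {"PDE6X"},
    "sauropsids": {"PDE6X", "PDE6A"},  # PDE6A lost later, within jawed vertebrates
}


def kept(lineage: str) -> set[str]:
    """Genes surviving in a lineage after its losses."""
    return POST_2R - LOST[lineage]


def rod_catalytic_unit(genes: set[str]) -> str:
    """Hypothetical summary of the catalytic unit in rod or rod-like cells."""
    if {"PDE6A", "PDE6B"} <= genes:
        return "PDE6A/PDE6B heterodimer"
    if "PDE6B" in genes:
        return "PDE6B homodimer (presumed)"
    if "PDE6X" in genes:
        return "PDE6X homodimer (proposed for hagfish rod-like cells)"
    return "unknown"


for lineage in LOST:
    genes = kept(lineage)
    print(lineage, sorted(genes), "->", rod_catalytic_unit(genes))
```

In every lineage PDE6C survives, consistent with its role as the cone (or cone-like) catalytic subunit.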

4.4 PDE inhibitory subunits (PDE6G,H,I)

Background. PDE6 is unique amongst PDEs in being regulated by an inhibitory γ-subunit, PDEγ, a short peptide of 80–90 residues. These γ-subunits play a crucial role in phototransduction, by inhibiting the PDE’s catalytic activity in the resting state, but by permitting activated transducin to bind, thereby relieving the inhibition and activating the hydrolysis of cGMP. Additionally, the PDEγ plays a role in accelerating the GTPase activity of activated transducin, and hence in speeding the shut-off reactions (see Section 5.3). Classically, two PDEγ isoforms have been described, encoded by PDE6G in rods, and by PDE6H in cones. However, Lagman et al. (2016) clearly established that a third isoform, encoded by PDE6I, is expressed in a number of species, and they showed that the three genes arose through expansion during 2R WGD. To date, there have been no reports of homologous sequences outside of vertebrates; thus, it seems that a PDEγ somehow just “appeared” during proto-vertebrate evolution. On the other hand, the motif on the PDE catalytic subunits to which the γ-subunits bind appears to have existed in the ancestral PDE catalytic sequence long before the emergence of chordates (Zhang and Artemyev, 2010). Thus, reconstruction of the probable sequence of the ancestral PDE5/6/11 enzyme indicated that it contained the signature ‘Ile-Pro-Met’ (IPM) motif to which the C-terminus of modern PDEγs binds. It has been noted that the C-terminus of the long form of RGS9 (RGS9-L) plays a similar role to PDEγ in accelerating the GTPase activity of activated G-protein α-subunits. Taken together with the finding that the genes for RGS9 and PDE6G are located on the same chromosome, and in moderate proximity to each other, Martemyanov et al. (2008) proposed that PDEγ may have originated from RGS9, or vice versa. This possibility will be explored below. Molecular phylogeny.
As has previously been reported (Lagman et al., 2016; Lamb et al., 2016), the sequences for PDEγ inhibitory subunits are so short (<90 residues), and so highly conserved, that it has not been possible to obtain particularly informative molecular phylogenies. Nevertheless, from sequence alignments, Wang et al. (2019) have recently identified several sites that differ characteristically between cone and rod isoforms, as well as several sites that are very tightly conserved in rod isoforms. Gene synteny. Lagman et al. (2016) showed that the PDE6G/H/I genes reside within a paralogon that they had previously characterised as containing the somatostatin receptor genes SSTR2/3/5 and the urotensin receptor genes (Ocampo Daza et al., 2012; Tostivint et al., 2014). The genes in the vicinity of PDE6G/H/I were also recently examined by Lamb and Hunt (2018) in their Fig. 2, and the syntenic arrangement of the phototransduction genes in this region is summarised in Figure 6A. Importantly, that analysis shows the mutual proximity of the three ohnolog families encoding RGS9/11, the PDEγs, and visinin/recoverin. Of particular note is the observation that RGS11 and the newly discovered PDE6I gene are immediately adjacent on the same strand in the four species in which they both exist; namely, in spotted gar (L. oculatus), reedfish (Erpetoichthys calabaricus), xenopus (X. tropicalis) and nanorana (N. parkeri). For example, in spotted gar, RGS11 and PDE6I reside on the forward strand of chromosome LG13 at start positions of 12.108 and 12.132 Mb, respectively. Origin of the inhibitory subunits. Previously (prior to the discovery of the third isoform, PDE6I), Martemyanov et al. (2008) had noted the relative proximity of the RGS9 and PDE6G genes; e.g. on human chromosome 17, where they are ~16 Mb apart. Combining this with the knowledge that the PDE inhibitory subunits play a comparable role to the C-terminus of the long isoform of RGS9 (termed RGS9-L or RGS9-2) in aiding the acceleration of GTPase activity, they speculated that one of these genes may have evolved from the other. Given that RGS11 and PDE6I have now been found to be arranged head-to-tail in four taxa, it would seem almost certain that the pre-2R arrangement likewise placed the ancestral RGS9/11 and PDE6G/H/I genes in the same configuration. Accordingly, as indicated by the curved arrow in Figure 12C, it seems highly plausible that the ancestral PDE inhibitory gene originated from a local duplication of the tail of the ancestral RGS9/11 gene. This duplication probably occurred after tunicates had diverged from our own lineage.
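The spotted gar coordinates quoted above can be checked for a head-to-tail arrangement with a small sketch. It assumes, as in most genome browsers, that a gene's start is its lower genomic coordinate; because gene lengths are not given in the text, only the start-to-start separation can be computed, and the `Locus` type and helper functions are hypothetical.

```python
# Sketch: check whether two annotated genes are arranged head-to-tail
# (same chromosome, same strand, consecutive in the direction of transcription)
# and report the separation of their start positions. Coordinates are the
# spotted gar values from the text, converted from Mb to kb.
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    chromosome: str
    strand: str    # "+" for forward, "-" for reverse
    start_kb: int  # lower genomic coordinate, in kb (assumption)


def head_to_tail(upstream: Locus, downstream: Locus) -> bool:
    """True if both genes share chromosome and strand, with `upstream` first
    in the direction of transcription."""
    if upstream.chromosome != downstream.chromosome or upstream.strand != downstream.strand:
        return False
    if upstream.strand == "+":
        return upstream.start_kb < downstream.start_kb
    return upstream.start_kb > downstream.start_kb


def start_separation_kb(a: Locus, b: Locus) -> int:
    return abs(a.start_kb - b.start_kb)


rgs11 = Locus("RGS11", "LG13", "+", 12_108)
pde6i = Locus("PDE6I", "LG13", "+", 12_132)
print(head_to_tail(rgs11, pde6i), start_separation_kb(rgs11, pde6i))
```

This prints `True 24`: RGS11 lies immediately upstream of PDE6I, with start positions about 24 kb apart, compared with the ~16 Mb separating RGS9 and PDE6G on human chromosome 17.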

4.5 Cyclic nucleotide gated channels (CNGA1–4, CNGB1,3)

Background. The cyclic nucleotide-gated ion channel of jawed vertebrate photoreceptors and olfactory receptor neurons is a heterotetramer comprising α and β subunits encoded by CNGA and CNGB genes, both classes of which expanded during 2R WGD (Nordström et al., 2004). Rod channels comprise three α1 subunits and one β1 subunit, encoded respectively by CNGA1 and CNGB1 genes, whereas cone channels comprise two α3 and two β3 subunits, encoded by CNGA3 and CNGB3. The channels of canonical olfactory receptor neurons comprise two α2, one α4, and one β1 subunit. The duplication that gave rise to the α and β branches was very ancient, and occurred before the protostome-deuterostome split. Furthermore, three lines of evidence make it clear that CNGA4 (which is expressed only in olfactory receptor neurons) also diverged from the other CNGA genes prior to the protostome-deuterostome split. The first such line was the observation by Kaupp and Seifert (2002) that CNGA4 has three additional introns in its C-terminal region; the second came from inspection of the syntenic arrangement for the CNGA genes reported by Nordström et al. (2004) in their Fig. 6b; and the third came from the basal branching position of CNGA4 in the molecular phylogeny (see below). Subsequently, during 2R WGD, the other α branch (denoted here as αQ) and the β branch both expanded, giving rise to three and two extant isoforms, respectively. Molecular phylogeny. Two recent examinations of molecular phylogeny (Lamb et al., 2016; Lamb and Hunt, 2017) came to slightly different conclusions about the origin of the three αQ isoforms during 2R WGD, with two alternative positions for CNGA1: either as sister to the pair CNGA2 + CNGA3 or instead as sister to CNGA3. Upon further examination here, the former proposal appears to be the correct one.
Figure 13 presents the molecular phylogeny obtained for a set of 152 vertebrate CNGC sequences, with seven invertebrate sequences, and with the four human HCN sequences as outgroup. α subunits. The unconstrained maximum likelihood (ML) topology is shown in Figure 13A, placing CNGA1 as sister to CNGA2 + CNGA3, and this topology is supported at a level of 95%. When CNGA1 was instead constrained to be sister to CNGA3, that topology could not be ruled out on the basis of phylogeny alone, as the change in log likelihood was only ∆LogL = 6.7, and the constrained tree passed all three tests of topology, with the approximately unbiased probability being p-AU = 0.3 (well above the 0.05 rejection level). However, that topology was rejected by the analysis of synteny shown in Figure 6.

Figure 13. Cyclic nucleotide gated channel subunits (CNGCα, CNGCβ)

Gene synteny for α subunits. As indicated in Figure 6C, the phylogeny of the nearby ohnolog quartets (e.g. SPRYs and TyrKs) shows that CNGA2 and CNGA3 are sisters. Furthermore, the CNGAQ family resides close to the arrestins, visual GRKs and visual GCs (with CNGA2 just 2.3 Mb from Arr3 and CNGA3 less than 6 Mb from GRK1A in spotted gar), even though those families have been placed in the previous panel, Figure 6B. Hence the combination of molecular phylogeny and gene synteny provides powerful evidence for the gene duplication topology shown in Figure 13B. β subunits. For the β subunits, each of the four clades (two agnathan and two jawed vertebrate) is supported unanimously, as is the split between the β1 and β3 divisions. For the CNGB3s, the lamprey and jawed vertebrate clades group together with 97% support, whereas for the CNGB1s the support for grouping the agnathan and jawed vertebrate clades together is somewhat lower, at 85%, possibly because of the inclusion of a single hagfish sequence. It is clear that the β1 and β3 branches diverged during 2R WGD, and analysis of the phylogeny of nearby ohnolog quartets (e.g. NDRG1–4, RRAD; not shown) shows that they diverged at 1R, as indicated in Figure 13B. Summary of CNGC evolution. The origin of CNG channel genes is summarised in Figure 13B. The α and β branches of the family arose anciently, probably long before protostomes diverged from our own lineage around 750 Mya.
In the α branch, another duplication prior to the divergence of protostomes generated the α4 and αQ genes. Subsequently, during 2R WGD in the proto-vertebrate lineage, the αQ gene quadruplicated, and three of those genes have survived in jawed vertebrates, as α1, α2 and α3. It is now clear that α1 diverged from α2/α3 at the first round of WGD. In the β branch, the two extant genes arose from a duplication during 2R WGD, and it is now clear that their divergence occurred at the first round. It seems reasonable to speculate that the ancestral CNG channel was formed by two α and two β subunits. For olfactory receptor cells, the channel in early deuterostomes is likely to have been formed by replacing one β subunit with an α4 subunit, to give a channel with 2 αQ + 1 α4 + 1 β subunit. In photoreceptor cells, by contrast, the channel in early deuterostomes seems likely to have comprised 2 αQ + 2 β subunits. This configuration was presumably maintained in cones after whole genome duplication, as 2 α3 + 2 β3 subunits, whereas rods came to utilise channels comprising 3 α1 + 1 β1 subunits. Finally, it is worth bearing in mind that photoreceptor CNG channels participate not only in the activation steps of phototransduction but also, owing to their high permeability to Ca2+ ions, play a crucial role in the Ca2+-feedback regulation of the cascade (covered in Section 6) and hence in photoreceptor light adaptation.
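The stoichiometry argument above can be checked mechanically by treating each channel as a multiset of subunits and mapping each extant subunit back to its pre-2R ancestor (α1, α2, α3 to αQ; β1, β3 to β). In this sketch the subunit counts are those stated in the text, Greek letters are written as `a` and `b` for ASCII identifiers, and the mapping dictionary is an illustrative assumption that follows the duplication history described above.

```python
# Sketch: represent each CNG channel as a multiset of subunits, check that it
# is a tetramer, and collapse extant subunits onto their pre-2R ancestors to
# compare with the ancestral configurations proposed in the text.
# "a" stands for alpha and "b" for beta.
from collections import Counter

EXTANT = {
    "rod": Counter({"a1": 3, "b1": 1}),
    "cone": Counter({"a3": 2, "b3": 2}),
    "olfactory": Counter({"a2": 2, "a4": 1, "b1": 1}),
}

PROPOSED_ANCESTRAL = {
    "photoreceptor": Counter({"aQ": 2, "b": 2}),
    "olfactory": Counter({"aQ": 2, "a4": 1, "b": 1}),
}

TO_ANCESTOR = {"a1": "aQ", "a2": "aQ", "a3": "aQ", "a4": "a4", "b1": "b", "b3": "b"}


def is_tetramer(channel: Counter) -> bool:
    return sum(channel.values()) == 4


def to_ancestral(channel: Counter) -> Counter:
    """Relabel each extant subunit with its pre-2R ancestor and re-count."""
    collapsed = Counter()
    for subunit, n in channel.items():
        collapsed[TO_ANCESTOR[subunit]] += n
    return collapsed


for cell, channel in EXTANT.items():
    print(cell, is_tetramer(channel), dict(to_ancestral(channel)))
```

After relabelling, the cone and olfactory channels match the proposed ancestral photoreceptor and olfactory configurations exactly, whereas the rod channel (3 αQ + 1 β) does not, which is the sense in which rods are the derived case.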

5 Evolution of the recovery steps of vertebrate phototransduction

5.1 G-protein receptor kinases (GRK1A,1B,7)

Background. G-protein receptor kinases (GRKs) are members of the protein kinase A, G, and C (AGC) family. They phosphorylate specific residues of activated G-protein coupled receptors (GPCRs), typically in the carboxy-terminal region of the GPCR. Mammals possess seven GRKs that fall into three families: (1) the ‘visual’ GRKs (GRK1, GRK7) that are the focus of this section; (2) a set of three nearest relatives (GRK4, GRK5, GRK6); and (3) a pair of more distant ‘β adrenergic GRKs’ (GRK2, GRK3). An investigation of the origin of GRKs suggested that the divergence of the β adrenergic GRKs occurred prior to the emergence of metazoa, whereas the divergence of the visual and GRK4/5/6 families occurred around the time that vertebrates evolved (Mushegian et al., 2012). In photoreceptors, the function of the GRKs is to phosphorylate photoactivated visual pigment (cone or rod opsin) and thereby permit the binding of arrestin, which quenches the activity of the activated form. Although the existence of the two main classes of photoreceptor-specific GRK (GRK1 and GRK7) has long been known, it was only relatively recently that Wada et al. (2006) discovered the existence of two distinct isoforms of GRK1, named GRK1A and GRK1B. These isoforms were shown to have diverged at an early stage in the evolution of vertebrates (Wada et al., 2006), and subsequently it has become clear that both isoforms are present in most vertebrate taxa. One exception is that mammals have lost GRK1B, so that any reference to GRK1 in a mammal signifies the GRK1A group. For cones and rods, the pattern of expression of GRK isoforms has been summarised for a number of species by Osawa and Weiss (2012) in their Table 1, where many examples of coexpression of a GRK1 and GRK7 are found. Lamb et al. (2018b) suggested that the following three rules apply to jawed vertebrates: (1) if the GRK1A isoform exists in a species, then it is expressed in the rod photoreceptors; (2) if the GRK7 isoform exists, then it is expressed in the cone photoreceptors; and (3) if the GRK1B isoform exists, then it is normally expressed in cones.
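The three rules of Lamb et al. (2018b) can be written as a function from the set of visual GRK isoforms present in a species to the isoforms predicted in rods and cones. The species sets in this sketch encode only the gene losses mentioned in the text (human is used as an example of a mammal that retains GRK7), and the function name is hypothetical. Applying the rules to a species that retains only GRK1A leaves its cones with no predicted GRK, which illustrates why the rules are simplifications.

```python
# Sketch: the three expression "rules" of Lamb et al. (2018b), written as a
# function from the visual GRK isoforms present in a species to the isoforms
# predicted in rods and cones. Species sets encode only losses named in the text.
def predicted_expression(isoforms: set[str]) -> dict[str, set[str]]:
    rods: set[str] = set()
    cones: set[str] = set()
    if "GRK1A" in isoforms:  # rule 1: GRK1A, if present, is in rods
        rods.add("GRK1A")
    if "GRK7" in isoforms:   # rule 2: GRK7, if present, is in cones
        cones.add("GRK7")
    if "GRK1B" in isoforms:  # rule 3: GRK1B, if present, is normally in cones
        cones.add("GRK1B")
    return {"rods": rods, "cones": cones}


SPECIES = {
    "typical jawed vertebrate": {"GRK1A", "GRK1B", "GRK7"},
    "mammal retaining GRK7 (e.g. human)": {"GRK1A", "GRK7"},
    "mouse/rat (GRK1B and GRK7 lost)": {"GRK1A"},
}

for name, isoforms in SPECIES.items():
    pred = predicted_expression(isoforms)
    gaps = [cell for cell, genes in pred.items() if not genes]
    print(f"{name}: {pred}; no GRK predicted for {gaps or 'none'}")
```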
Based on these ideas, GRK1A will be shown as blue in the Figures, and GRK7 and GRK1B will be shown as red. However, these ‘rules’ are simplifications, primarily because of the loss of isoforms in many species. For example, sauropsids (reptiles and birds) have lost GRK1A, and, at least in the case of chicken, their rods express GRK1B (Zhao et al., 1999). In a more extreme example, mice and rats have lost both GRK7 and GRK1B, so that their cones (and rods) can express only GRK1A. Molecular phylogeny. The molecular phylogeny of vertebrate visual GRKs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. The unconstrained ML molecular phylogeny, obtained for 77 visual GRKs and with 17 GRK4/5/5L/6s and nine GRK2/3s forming the outgroup, is presented in Figure 14A. The three jawed vertebrate clades (GRK7, GRK1A and GRK1B) correspond to the tallest triangles, because of the larger number of jawed vertebrate sequences analysed. In addition, there are three agnathan vertebrate clades (GRK7-1, GRK7-2 and GRK1B), which are represented by narrower triangles because of the smaller number of sequences available. The bootstrap support levels around the jawed vertebrate GRK1B clade are only moderate (86% and 90%), and this is almost certainly the result of ‘attraction’ between the bird and lamprey GRK1B sub-trees, both of which have long limbs (see the fully expanded tree in Supplementary Figure S5). The long limb leading to the bird GRK1B sub-tree indicates substantial evolution in these sequences, presumably as a consequence of the loss in birds of the GRK1A gene. The position of the Ciona clade as sister to the vertebrate GRK7s is poorly defined, with bootstrap support of only 74%; in some calculations this clade was instead placed as sister to the GRK1s. Such uncertainty is a result of the extensive divergence that has occurred in the tunicate sequences, together with the small number of sequences (two) that could be found.
Apart from this pair of tunicate sequences (from C. intestinalis and C. savignyi), no other invertebrate sequences were found to clade with either the GRK1s or the GRK7s; instead the nearest invertebrate sequences (from tunicates, lancelets, echinoderms and hemichordates, and indeed from protostomes) were found to clade with the GRK4/5/5L/6s.

Figure 14. G-protein receptor kinases (GRK1s, GRK7s)

Gene duplications and losses. The pattern of gene duplications presumed to have given rise to the visual GRKs is presented in Figure 14B. An ancestral GRK gene had duplicated anciently to form what would become GRK2/3, together with a second gene that duplicated again in bilaterian times to give rise to the ancestral visual GRK (GRK1/7) and the ancestor of GRK4/5/5L/6. The timings of the subsequent duplications within the GRK1/7 branch are not resolved by the phylogeny presented in Figure 14; thus, on the basis of phylogeny alone, it would be possible that the divergence of GRK1 and GRK7 occurred during WGD. However, from the gene synteny results in Figures 5–7 (see Section 2.4), it is clear that GRK1A and GRK1B diverged at 1R. Hence, because GRK1A and GRK1B group together to the exclusion of GRK7, the GRK1 and GRK7 branches must have diverged from each other prior to WGD. These two genes then expanded during 2R WGD, though only a single GRK7 has been retained in jawed vertebrates. The syntenic arrangement of the genes in extant jawed vertebrates (on rows 1, 2 and 4 in Figure 6) is consistent with the notion that the GRK7 and GRK1 genes remained close together on a chromosome in the chordate organism prior to genome quadruplication, as might be expected if they arose through a local tandem duplication, as indicated by the local pre-WGD duplication in Figure 14B; i.e. in a scenario similar to those shown in the right-hand section of Figure 4. In Figure 14B, the two agnathan clades are shown as having arisen during WGD, but it is instead possible that they arose via a lineage-specific duplication.
However, that possibility seems unlikely because (as may be appreciated from the fully expanded phylogeny in Supplementary Figure S5), such a duplication would need to have occurred during the relatively short interval prior to the divergence of hagfish and lampreys.
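The inference that the GRK1/GRK7 split predates WGD can be checked by exhaustive search under a simple model. The sketch tests the hypothesis that GRK1A, GRK1B and GRK7 all descend from a single pre-2R gene: each post-2R copy is labelled by its 1R and 2R branches, synteny requires GRK1A and GRK1B to differ at 1R, and the phylogeny requires GRK1A and GRK1B to group together to the exclusion of GRK7. The encoding and helper names are assumptions made for illustration.

```python
# Sketch of the reasoning that places the GRK1/GRK7 split before 2R WGD.
# Hypothesis under test: GRK1A, GRK1B and GRK7 all descend from ONE pre-2R gene.
# Each post-2R copy is labelled (r1, r2). Synteny puts GRK1A and GRK1B in
# different 1R branches; the phylogeny groups GRK1A with GRK1B to the exclusion
# of GRK7. We search for any placement of the three genes satisfying both.
from itertools import permutations, product

COPIES = list(product((0, 1), repeat=2))  # the four post-2R slots


def split_age(a, b) -> int:
    """2 for a split at 1R (older), 1 for a split at 2R (younger)."""
    return 2 if a[0] != b[0] else 1


def consistent_placements():
    hits = []
    for grk1a, grk1b, grk7 in permutations(COPIES, 3):
        diverged_at_1r = grk1a[0] != grk1b[0]  # synteny constraint
        # GRK1A + GRK1B form a clade excluding GRK7 only if their split is
        # younger than each of their splits from GRK7.
        clade = (split_age(grk1a, grk1b) < split_age(grk1a, grk7)
                 and split_age(grk1a, grk1b) < split_age(grk1b, grk7))
        if diverged_at_1r and clade:
            hits.append((grk1a, grk1b, grk7))
    return hits


print(consistent_placements())  # [] : the single-ancestor hypothesis fails
```

No placement satisfies both constraints, because any third copy shares its 1R branch with either GRK1A or GRK1B; under this model GRK7 must therefore derive from a separate pre-WGD gene, as in Figure 14B.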

5.2 Arrestins (SAG, ARR3, ARRB1, ARRB2)

Background. Arrestins mediate termination of the response and desensitisation in numerous G-protein signalling cascades. Jawed vertebrate genomes typically possess four arrestin genes, SAG (retinal S-antigen), ARR3, ARRB1 and ARRB2, which encode proteins denoted here as Arr-S (expressed in rods), Arr-C (in cones), Arr-B1 and Arr-B2. The last two are often referred to as β-arrestins, though they are by no means restricted to the β-adrenergic system, and instead are widely distributed. Analysis of the phylogeny of arrestins indicates a likely origin from distantly related sequences in archaea and bacteria (Alvarez, 2008; Gurevich and Gurevich, 2006). As discussed in Section 2.4, the syntenic arrangement of arrestin genes strongly suggests that the four members in jawed vertebrates arose during 2R WGD (Nordström et al., 2004; Larhammar et al., 2009). In jawed vertebrate photoreceptors, the two ‘visual arrestins’ bind to their respective photo-activated visual pigment after it has first been phosphorylated by a GRK, and thereby block access of the G-protein, transducin. The β-arrestins may have a similar blocking function for other activated GPCRs, but they also play a role in receptor internalisation, mediated at least in part by a clathrin-binding site located near the C-terminus (Krupnick et al., 1997; Dell’Angelica, 2001; Kang et al., 2009). Molecular phylogeny. The molecular phylogeny of vertebrate arrestins has recently been examined by Lamb et al. (2018b), and here that analysis is updated. A collapsed ML molecular phylogeny for 101 arrestin sequences from both jawed and agnathan vertebrates is presented in Figure 15A. The outgroup comprised nine sequences from tunicates, lancelets, basal deuterostomes and protostomes, and included the two arrestins that have been characterised in scallop photoreceptors (Gomez et al., 2011). In the unconstrained phylogeny, the β-arrestin tree was fragmented, and in addition the two agnathan visual arrestin clades were positioned as sisters, possibly as a result of long-branch attraction; see Supplementary Figure S6. Therefore, I applied minor constraints to generate the constrained tree presented in Figure 15A. This caused a change in log likelihood of ∆LogL = 7.8, and the constrained tree passed all three tests of topology, with p-AU ≈ 0.31 (well above the 0.05 level), so that there were no grounds for rejecting the illustrated tree.

Figure 15. Arrestins (Arr-S, Arr-C, Arr-β1, Arr-β2)

Gene duplications. The closest invertebrate arrestin sequences are those shown in the outgroup, and no invertebrate sequence was found to clade with either the visual arrestins or the β-arrestins. On this basis, one might anticipate that the four clades of jawed vertebrate arrestins (Arr-S, Arr-C, Arr-β1, Arr-β2) had expanded during WGD, as proposed by Lamb et al. (2018b). However, in order to deduce the pattern of gene duplications, it is crucial to consider the syntenic arrangement of genes shown in Figure 6B, where multiple families of ohnolog quartets all show the upper pair of rows as sisters, and hence the lower pair of rows also as sisters. Accordingly, there is overwhelming evidence that the two visual arrestins diverged from each other at 1R, and likewise that the two β-arrestins diverged from each other at 1R, with the ancestral visual and β-arrestins having arisen from a pre-WGD duplication. The deduced duplication pattern for arrestins is illustrated in Figure 15B, and involves the loss of four genes after 2R.
This scenario additionally provides a parsimonious explanation for the branching of the two clades of agnathan Arr-C genes mentioned above, which would be more complicated to explain if Arr-S and Arr-C had not diverged until the second round of WGD. Interestingly, it is clear that there are two clades of rod arrestin in cartilaginous fish (labelled S1 and S2 in Figure 15A, for Arr-S1 and Arr-S2), and it has been shown that these two genes are present in all the sharks and rays examined (Lamb et al., 2018b); see also the fully expanded tree in Supplementary Figure S6. In the two species for which assembled genomes are available (whale shark, Rhincodon typus, and elephant shark, Callorhinchus milii), these two genes are arranged tail-to-tail on an unplaced scaffold (on NW_018032674, and on NW_006890054, respectively). These results indicate that a local duplication occurred in a stem cartilaginous fish, and that the two genes have been retained throughout sharks and rays. Inspection of Supplementary Table S1 in Lamb et al. (2018b) indicates that the transcript levels detected in the eye are 2–20× higher for Arr-S1 than for Arr-S2, but also suggests that both isoforms are used in shark and ray photoreceptors. While the significance of the existence of two isoforms is not entirely clear, it might be related to the finding that Arr-B2 has been lost from cartilaginous fish.
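The arrestin scenario of Figure 15B, a pre-WGD duplication followed by quadruplication of both genes and the loss of four copies, can be expressed with the same kind of labelling used for the other families. In this sketch each post-2R copy is tagged (ancestor, 1R branch, 2R branch); the text does not say which 2R copy survived in each branch, so the choice of `r2 = 0` for every retained gene is arbitrary.

```python
# Sketch of the arrestin scenario in Figure 15B: a pre-WGD duplication gives a
# visual and a beta ancestor; 2R WGD turns each into four copies labelled
# (ancestor, r1, r2). One copy per 1R branch survives for each ancestor.
# Which 2R copy survives is not specified in the text; r2 = 0 is arbitrary.
from itertools import product

ANCESTORS = ("visual", "beta")
post_2r = {(anc, r1, r2) for anc, r1, r2 in product(ANCESTORS, (0, 1), (0, 1))}

retained = {
    "Arr-S": ("visual", 0, 0),
    "Arr-C": ("visual", 1, 0),
    "Arr-B1": ("beta", 0, 0),
    "Arr-B2": ("beta", 1, 0),
}


def divergence(a, b) -> str:
    """Event that separated two labelled copies."""
    if a[0] != b[0]:
        return "pre-WGD duplication"
    return "1R" if a[1] != b[1] else "2R"


lost = post_2r - set(retained.values())
print(len(post_2r), "copies after 2R;", len(lost), "lost")
print("Arr-S vs Arr-C:", divergence(retained["Arr-S"], retained["Arr-C"]))
print("Arr-B1 vs Arr-B2:", divergence(retained["Arr-B1"], retained["Arr-B2"]))
print("Arr-S vs Arr-B1:", divergence(retained["Arr-S"], retained["Arr-B1"]))
```

The output reproduces the counts in the text: eight copies after 2R, of which four are lost, with each pair of extant paralogues separated at 1R.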

5.3 Regulator of G-protein signalling (RGS9, Gβ5 and R9AP)

Shut-off of activated transducin, Gαt-GTP (and, in turn, of activated PDE6), is accelerated by the ‘regulator of G-protein signalling’ complex composed of three protein subunits: RGS9, Gβ5, and R9AP. Exactly the same isoforms are utilised in rods and cones, and the faster shut-off in cones appears to be achieved through expression of a much higher concentration of the complex (Cowan et al., 1998; Zhang et al., 2003). An examination of the evolution of the three components of the RGS9 complex indicated that RGS9 and RGS11 originated through expansion during 2R WGD, whereas neither Gβ5 nor R9AP underwent expansion at that stage (Lamb et al., 2018b). An updated molecular phylogeny for RGS9/11 is presented in Figure 16A, calculated for 24 RGS9 and 25 RGS11 sequences from jawed vertebrates, plus 10 homologous sequences from agnathan vertebrates, and with a selection of seven RGS6 and seven RGS7 sequences as outgroup. The two jawed vertebrate clades, RGS9 and RGS11, each exhibit at least 99% bootstrap support, and the agnathan sequences clearly also form two clades. In the unconstrained ML tree (Supplementary Figure S7), the root for these four vertebrate clades was positioned as indicated by the dotted arrow, so that an agnathan clade was paired with each jawed vertebrate clade, though with only moderate support levels of 88% and 90%. That topology would conform with 2R WGD, as well as with divergence of the two jawed vertebrate isoforms at the first round, as required by the gene synteny shown in Figure 6. However, support this low for the positions of the agnathan clades seemed unusual, and so I examined the effect of moving the root by one node and constraining it to the position plotted in Figure 16A. That constraint caused only a very small change in log likelihood, of ∆LogL = 2.3, and the constrained tree passed all the tests of topology, with p-AU = 0.42, so there were certainly no grounds for rejecting the constrained tree.

Figure 16. Regulator of G-protein signalling (RGS9/11)

The pattern of gene duplications and losses suggested by the constrained tree is shown in Figure 16B, with the RGS9 and RGS11 branches diverging at 1R, as required by the gene synteny analysis. Then, following the second round, jawed vertebrates retained only a single copy of each of these, on rows 1 and 4 in Figure 6. In contrast, agnathan vertebrates lost RGS11 but retained both copies from the RGS9 branch. Realistically, it will not be possible to choose between the two topologies suggested by the constrained and unconstrained phylogenies until suitably complete genomes for lamprey species are available. However, from the perspective of jawed vertebrate evolution, both scenarios are consistent with the available phylogenetic and syntenic evidence.
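The topology tests quoted in this section (for example p-AU = 0.42 for ∆LogL = 2.3) ask whether an alternative tree explains the same alignment significantly worse than the best tree. The AU test itself relies on a multiscale bootstrap and is not reproduced here; as a simpler illustration of the same idea, the following Python sketch implements a one-sided Kishino–Hasegawa-style test using RELL (resampling of estimated log-likelihoods). It assumes that per-site log-likelihoods for both topologies have already been exported from a phylogenetics package; the function name, its parameters and the toy data are hypothetical.

```python
import random
from typing import Sequence


def kh_rell_pvalue(site_ll_best: Sequence[float],
                   site_ll_alt: Sequence[float],
                   n_boot: int = 1000,
                   seed: int = 1) -> float:
    """One-sided KH-style test of an alternative topology using RELL.

    site_ll_best, site_ll_alt: per-site log-likelihoods of one alignment
    under the best-scoring tree and under the alternative (e.g.
    constrained) tree. Returns the fraction of centred bootstrap
    replicates whose summed difference is at least the observed Delta-logL.
    """
    if len(site_ll_best) != len(site_ll_alt) or not site_ll_best:
        raise ValueError("need two non-empty vectors of equal length")
    diffs = [a - b for a, b in zip(site_ll_best, site_ll_alt)]
    n = len(diffs)
    observed = sum(diffs)  # Delta-logL between the two trees
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        # RELL: resample alignment sites with replacement and reuse the
        # stored per-site values instead of re-optimising each tree.
        replicate = sum(diffs[rng.randrange(n)] for _ in range(n))
        # Centre on the observed value so the null distribution has mean zero.
        if replicate - observed >= observed:
            hits += 1
    return hits / n_boot


# Toy per-site log-likelihoods (hypothetical, not data from this study).
toy_rng = random.Random(0)
best = [toy_rng.gauss(-10.0, 2.0) for _ in range(200)]
alt = [b - toy_rng.gauss(0.01, 0.3) for b in best]
p = kh_rell_pvalue(best, alt)
print(f"Delta-logL = {sum(best) - sum(alt):.2f}, p = {p:.3f}")
print("reject alternative" if p < 0.05 else "cannot reject alternative")
```

A p-value above 0.05 means that the alternative topology cannot be rejected at the 5% level, which is the sense in which a constrained tree "passes" a topology test.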

6

Evolution of Ca-feedback regulation of vertebrate phototransduction

6.1

Na+-K+/Ca2+ exchangers (NCKX1,2)

Background. Ca2+ ions are extruded from rod and cone outer segments by a sodium/calcium-potassium exchanger, NCKX (reviewed in Schnetkamp, 2013; Schnetkamp et al., 2014). This exchanger is able to operate at very low cytoplasmic Ca2+ levels because it utilises both the inward concentration gradient of Na+ and the outward concentration gradient of K+. It operates electrogenically (Yau and Nakatani, 1984), with a net influx of one positive charge per Ca2+ extruded, because each cycle has a stoichiometry of 4 Na+ ions transported inward in exchange for 1 Ca2+ ion plus 1 K+ ion (i.e. three positive charges) transported outward (Schnetkamp et al., 1989; Cervetto et al., 1989; Lagnado et al., 1992). Hence, the operation of this exchanger can be measured in intact cells by recording the electrogenic current. Under steady conditions, the fluxes of Ca2+ ions into and out of the cytoplasm must balance. In darkness, when cyclic nucleotide-gated ion channels (CNGCs) are held open by a moderate free concentration of cGMP, there is an appreciable steady influx of Ca2+ ions through these relatively non-selective channels. As a result, the free Ca2+ concentration is moderately high, which is needed in order for the NCKX to generate an equal efflux of Ca2+ ions. Measurements have shown this dark level of free cytoplasmic Ca2+ to be 200–500 nM (Ratto et al., 1988; Woodruff et al., 2002; Lagnado et al., 1992). In bright light, all the CNGCs are closed, so the influx of Ca2+ stops. The efflux initially continues, resulting in a drop in cytoplasmic Ca2+ concentration until the fluxes again balance. This drop is crucial in triggering rapid recovery of the electrical response and in mediating light adaptation (Matthews et al., 1988; Nakatani and Yau, 1988).
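The stoichiometry and flux-balance argument above can be checked with a few lines of arithmetic. The Python sketch below counts the net charge moved per NCKX cycle (4 Na+ in; 1 Ca2+ plus 1 K+ out), converts a Ca2+ efflux into exchanger current using the Faraday constant, and models the fall in free Ca2+ after bright light closes all CNG channels using a deliberately crude first-order extrusion law. The rate constant, the 300 nM starting value (picked from the 200–500 nM range quoted above) and all function names are illustrative assumptions, not measured parameters.

```python
import math

FARADAY_C_PER_MOL = 96485.332  # Faraday constant

# Ions moved per NCKX transport cycle as (count, charge), from the text.
INWARD = {"Na+": (4, +1)}
OUTWARD = {"Ca2+": (1, +2), "K+": (1, +1)}


def net_inward_charge_per_cycle() -> int:
    """Net elementary charges carried into the cell per cycle."""
    charge_in = sum(n * z for n, z in INWARD.values())
    charge_out = sum(n * z for n, z in OUTWARD.values())
    return charge_in - charge_out


def exchanger_current_A(ca_efflux_mol_per_s: float) -> float:
    """Inward exchanger current (A) for a given Ca2+ efflux (mol/s).

    One Ca2+ leaves per cycle, so the cycle rate equals the Ca2+ efflux.
    """
    return FARADAY_C_PER_MOL * ca_efflux_mol_per_s * net_inward_charge_per_cycle()


def free_ca_after_light_nM(ca_dark_nM: float, k_per_s: float, t_s: float) -> float:
    """Toy model of free Ca2+ after bright light closes all CNG channels.

    Assumes efflux = k * [Ca] (linear exchanger) with no buffering. In
    darkness the influx J balances efflux, so [Ca]_dark = J / k; once J = 0
    the concentration decays as [Ca]_dark * exp(-k * t). Saturation,
    buffering and any residual influx are ignored.
    """
    return ca_dark_nM * math.exp(-k_per_s * t_s)


print("net inward charge per cycle:", net_inward_charge_per_cycle())
print(f"current for 1 amol/s Ca2+ efflux: {exchanger_current_A(1e-18) * 1e12:.3f} pA")
print(f"free Ca2+ 0.2 s after light (hypothetical k = 5/s): "
      f"{free_ca_after_light_nM(300.0, 5.0, 0.2):.0f} nM")
```

Because the net charge per cycle is +1, the exchanger current is simply F times the Ca2+ efflux, which is why recording this current gives a direct readout of Ca2+ extrusion.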

In the rod outer segment the NCKX protein forms a tight 2:1 association with CNGCs, with one NCKX binding to each of the two α-subunits of the CNGC (Bauer and Drechsler, 1992; Schwarzer et al., 1997). This protein complex in the plasma membrane additionally interacts with peripherin-2 in the rim of the disc membranes via the GARP (glutamic acid-rich protein) component of the CNGC β-subunit (Poetsch et al., 2001), thereby apparently contributing mechanical stability to the outer segment disc structure. Rods express NCKX1 (encoded by SLC24A1) whereas cones express NCKX2 (encoded by SLC24A2), and it is apparent that these two isoforms arose during 2R WGD (Lamb and Hunt, 2018). Recently, it has become clear that cones additionally express NCKX4 (encoded by SLC24A4), and that this isoform is important for the rapid extrusion of Ca2+ (Vinberg et al., 2017); as considered below, NCKX4 likewise arose during 2R WGD (Ocampo Daza et al., 2012). Interestingly, NCKX4 had previously been considered the ‘olfactory NCKX’ because of its expression and important function in olfactory receptor neurons (Stephan et al., 2011). The roles of NCKX1, NCKX2 and NCKX4 in cones and rods have recently been reviewed by Vinberg et al. (2018), and the origin of the whole family of Ca2+/cation antiporters has been reviewed by Emery et al. (2012).

Figure 17. Na+-K+/Ca2+ exchangers (NCKX)

Molecular phylogeny. The molecular phylogeny of vertebrate visual NCKXs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. Figure 17A presents a molecular phylogeny for visual NCKX sequences from both jawed vertebrates and agnathan vertebrates, subject to constraints on the positions of the agnathan sequences. These constraints were designed to render the tree consistent with 2R WGD followed by the divergence of jawed and agnathan vertebrates.
Imposition of the constraints caused a relatively small change in log likelihood, of ∆LogL = 6.1, and the constrained tree passed all three tests of topology, with p-AU = 0.39. There are therefore no grounds for rejecting the null hypothesis that the visual NCKX genes of jawed and agnathan vertebrates arose through 2R WGD, though the phylogeny alone does not indicate whether NCKX1 and NCKX2 diverged at the first or the second round. The fully-expanded phylogeny, and the constraint tree used, are shown in Supplementary Figure S8. Gene duplications and losses. In Figure 6C, the three families of ohnolog quartets (TNFAIP8s, LINGOs and HCNs) in the vicinity of the NCKXs each show strong support for the phylogenetic pairing of the upper two rows and of the lower two rows, with the consequence that NCKX1 and NCKX2 must have diverged at 1R. The deduced pattern of gene duplications is shown in Figure 17B, and involves the loss of two genes in jawed vertebrates after 2R. It will be interesting to examine the positions of the agnathan genes once their genomes are sufficiently well documented. It is clear that NCKX1 and NCKX2 diverged from NCKX3, NCKX4 and NCKX5 prior to the split between protostomes and deuterostomes. Likewise, the subsequent split between NCKX5 and NCKX3/4 also appears to have pre-dated that protostome/deuterostome speciation event. Finally, it has been shown that the genes encoding NCKX3 and NCKX4, SLC24A3/4, reside in a region paralogous with a second family of somatostatin receptor genes, SSTR1/4, and that their expansion likewise occurred during 2R WGD (Ocampo Daza et al., 2012). Although only two isoforms (NCKX3 and NCKX4) have been retained in mammals, a third NCKX3/4-like isoform is retained in spotted gar, and underwent 3R duplication in teleosts (not shown).
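The synteny argument used here (and for RGS9/RGS11 in Section 5.3) reduces to a simple rule: if the ohnolog quartets of a paralogon show that rows 1 and 2, and rows 3 and 4, are 2R sister pairs, then two genes on rows within one pair diverged at 2R, whereas genes on rows from different pairs diverged at 1R. The Python sketch below encodes that rule. The row pairing is an input that must be read off the relevant panel of Figure 6; the default ((1, 2), (3, 4)) is an assumption corresponding to the upper-rows/lower-rows pairing described for the NCKX region, and the function name is hypothetical.

```python
from typing import Tuple

Pairing = Tuple[Tuple[int, int], Tuple[int, int]]


def divergence_round(row_a: int, row_b: int,
                     pairing: Pairing = ((1, 2), (3, 4))) -> str:
    """Round of WGD at which genes on two paralogon rows diverged.

    `pairing` lists the two 2R sister pairs of rows, as inferred from
    ohnolog-quartet phylogenies. Rows in the same pair separated at the
    second round (2R); rows in different pairs separated at 1R.
    """
    rows = {r for pair in pairing for r in pair}
    if row_a == row_b or row_a not in rows or row_b not in rows:
        raise ValueError("need two distinct rows belonging to the paralogon")
    same_pair = any(row_a in pair and row_b in pair for pair in pairing)
    return "2R" if same_pair else "1R"


# RGS9 and RGS11 lie on rows 1 and 4 (Section 5.3): a 1R divergence.
print("RGS9 vs RGS11:", divergence_round(1, 4))
# Genes on rows 1 and 2 would instead be 2R sisters.
print("rows 1 vs 2:", divergence_round(1, 2))
```

Keeping the pairing as a parameter reflects the fact that each paralogon's sister-row relationships are an empirical result of its own ohnolog phylogenies, not something that can be assumed in advance.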

6.2

Guanylyl cyclases (GC-E, GC-F, GC-D)

Background. The seven membrane-bound guanylyl cyclase proteins encoded by the mammalian genome have been assigned the names GC-A to GC-G by IUPHAR/BPS (see www.guidetopharmacology.org/GRAC/FamilyDisplayForward?familyId=662), and the properties of these GCs have recently been reviewed by Kuhn (2016). The two isoforms in mammalian photoreceptors are GC-E (=Ret-GC1) and GC-F (=Ret-GC2). GC-F is encoded by GUCY2F, whereas the gene encoding GC-E is named GUCY2D in human and many other species, but Gucy2e in mouse and a number of other mammals. A third isoform, GC-D, often referred to as the ‘olfactory’ GC, is present in most vertebrate taxa, but has been lost in primates (other than lemuriforms); in mouse the encoding gene is named Gucy2d. In zebrafish, there are one-to-one orthologs of the three isoforms, which are named as follows: gc3 = GC-E, gc2 = GC-F, gucy2f = GC-D (Lamb et al., 2018b). To minimise the potential for confusion, the genes will here be referred to by their protein names. The only isoform expressed in cones is GC-E (Yang et al., 1999; Rätscho et al., 2009), whereas rods co-express both GC-E and GC-F (Dizhoor et al., 1994; Yang and Garbers, 1997). Mutations in the GC-E gene (GUCY2D) in human are a major cause of Leber congenital amaurosis type 1 (Perrault et al., 1996) and of dominant cone-rod dystrophy (Kelsell et al., 1998). Over 140 disease-causing mutations in GUCY2D have been identified, and Sharon et al. (2018) have recently reviewed current knowledge of the genetics, biochemistry and phenotype related to GUCY2D mutations. To date, no human retinal diseases have been linked to mutations in the GC-F gene, GUCY2F. These photoreceptor GCs synthesise cGMP at a rate set by the cytoplasmic Ca2+ concentration, via the extent of their activation by GCAPs; however, the molecular mechanism of activation by GCAPs has not yet been elucidated. The cyclase is a long membrane-spanning molecule, in which seven functional domains have been identified (Bereta et al., 2010; Peshenko et al., 2014, 2015). It functions as a homodimer, with dimerisation mediated by binding of the α-helical coiled-coil dimerisation domain in each partner (Ramamurthy et al., 2001).
In the dimer, the paired CCDs (cyclase catalytic domains) form the catalytic centre where cGMP is synthesised (Tucker et al., 1999). Finally, it is interesting to note that during their synthesis and transport to the outer segment, GCs appear to be protected from activation by the binding of a Ca-insensitive protein, RD3 (Azadi et al., 2010; Peshenko et al., 2011). Molecular phylogeny and gene duplications. The molecular phylogeny of vertebrate visual GCs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. Figure 18A presents an unconstrained molecular phylogeny for visual and ‘olfactory’ GCs from jawed vertebrates; the fully-expanded tree is given in Supplementary Figure S9. Bootstrap support in this unconstrained tree is remarkably high, being unanimous for each of the three jawed vertebrate clades and also at the two nodes linking them. Thus, there is unanimous support for GC-F being sister to GC-D, as well as for GC-E being sister to that pair. From the summary of gene synteny, and the multiple sister pairs of quartet ohnologs in Figure 6B, it is clear that GC-E diverged from GC-D/GC-F at 1R, and that GC-D and GC-F then diverged at 2R, as indicated by the gene duplication pattern in Figure 18B; note that this differs from the interpretation of Lamb and Hunt (2018).

Figure 18. Guanylyl cyclases (GC-D, GC-E, GC-F)
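Because the same gene symbol can denote different GC isoforms in different species (human GUCY2D encodes GC-E, whereas mouse Gucy2d encodes GC-D, and zebrafish gucy2f encodes GC-D rather than GC-F), a species-aware lookup is a safer way to map gene annotations onto the protein names used in this section. The Python sketch below hard-codes only the correspondences stated in the Background above; the table and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (species, gene symbol) -> protein name, as listed in the text.
# Symbols are case-sensitive: human GUCY2D and mouse Gucy2d differ.
GC_PROTEIN: Dict[Tuple[str, str], str] = {
    ("human", "GUCY2D"): "GC-E",
    ("human", "GUCY2F"): "GC-F",
    ("mouse", "Gucy2e"): "GC-E",
    ("mouse", "Gucy2d"): "GC-D",
    ("zebrafish", "gc3"): "GC-E",
    ("zebrafish", "gc2"): "GC-F",
    ("zebrafish", "gucy2f"): "GC-D",
}


def gc_protein_name(species: str, symbol: str) -> Optional[str]:
    """Protein name for a GC gene symbol, or None if it is not in the table.

    The species name is normalised to lower case; the symbol is not,
    because capitalisation distinguishes orthologs between species.
    """
    return GC_PROTEIN.get((species.lower(), symbol))


for species, symbol in [("human", "GUCY2D"), ("mouse", "Gucy2d"),
                        ("zebrafish", "gucy2f"), ("human", "GUCY2F")]:
    print(f"{species:9s} {symbol:7s} -> {gc_protein_name(species, symbol)}")
```

There is deliberately no human GC-D entry, since that gene has been lost in primates other than lemuriforms; a lookup that returns None is more honest than guessing from a similar-looking symbol.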

6.3

Guanylyl cyclase activating proteins (GCAP1, 1L, 2, 2L, 3)

Background. Within the extensive set of neuronal calcium sensor proteins, the vertebrate genome includes a family of guanylyl cyclase regulatory proteins (reviewed in Ames and Lim (2012); Lim et al. (2014); and Koch and Dell’Orco (2015)), comprising several ‘activating’ proteins (GCAPs) and a single so-called ‘inhibitory’ protein (GCIP). A recent analysis of synteny and phylogeny has divided GCAPs into six sub-families (Lamb and Hunt, 2018), with teleost fish possessing 3R duplicates of several of these (Imanishi et al., 2004; Rätscho et al., 2009; Scholten and Koch, 2011). The best-studied members are GCAP1 (encoded by GUCA1A) and GCAP2 (encoded by GUCA1B); these two genes are arranged tail-to-tail in virtually all tetrapods as well as in spotted gar, though not in teleosts. In mammalian cones, the predominant isoform is GCAP1 (Cuenca et al., 1998), with the level of GCAP2 always being much lower, or even absent, depending on species. In mammalian rods, GCAP1 and GCAP2 are co-expressed, with the level of GCAP2 being higher (Dizhoor et al., 1995). A third isoform, GCAP3 (encoded by GUCA1C), occurs in many species, and is expressed only in cones, at least in human and zebrafish (Imanishi et al., 2002). A fourth isoform, GCAP1L, closely similar to GCAP1 and GCAP3, is often overlooked, probably because it has been lost from mammals. Finally, another set of isoforms, closely similar to GCAP2 and here referred to as GCAP2L, occurs in a number of vertebrate taxa. However, very little is known about either the GCAP1L or the GCAP2L isoforms. GCAPs provide very powerful Ca-sensitive activation of guanylyl cyclases (GCs) (Koch and Stryer, 1988). The activation of GCAPs at lowered Ca2+ concentrations involves the binding of Mg2+ (Peshenko and Dizhoor, 2006) to two EF hands (EF-2 and EF-3), thereby inducing a conformational change. Recent evidence has shown that GCAP1 forms a functional homodimer (Lim et al., 2018), suggesting a 2:2 stoichiometry of interaction with the GC homodimer. In vitro experiments with mammalian proteins have shown that GCAP1 and GCAP2 are able to activate GC-E and GC-F with comparable efficacy. However, in vivo experiments on rods indicate that GCAP1 primarily regulates GC-E (Olshevskaya et al., 2012). Functionally, the Ca2+ sensitivity of a cell’s cyclase activity is determined by its GCAP(s). GCAP1 operates over a higher range of Ca2+ concentrations (i.e. at lower light intensities) than GCAP2 does; the half-maximal Ca2+ concentrations (K1/2) of the two isoforms are ~140 nM and ~50 nM, respectively.

Figure 19. Guanylyl cyclase activating proteins (GCAP)

Molecular phylogeny. The molecular phylogeny of GCAPs has recently been examined by Lamb et al. (2018b), and here that analysis is updated. Figure 19A presents an unconstrained molecular phylogeny for GCAPs/GCIPs from jawed vertebrates; the fully-expanded tree is given in Supplementary Figure S10. Bootstrap support in this unconstrained tree is at least 99% for all but one clade and for all but one node. Within the unanimously supported sub-tree for GCAP1/1L/3, there is 95% support for GCAP1L and GCAP1 being sisters. Gene duplications and losses. From the gene synteny data and the pairings of ohnolog quartet genes in Figure 6D, it is clear that GCAP1 and GCAP1L diverged at the second round of WGD, and hence that GCAP3 diverged from these two at 1R. Furthermore, the phylogeny in Figure 19A makes it very clear that the GCAP2/2L division diverged from the GCAP1/1L/3 division prior to WGD, as indicated by the mauve ‘Pre-’. Despite the antiquity of the duplication that generated these two branches, the GCAP1 and GCAP2 genes have remained arranged tail-to-tail in virtually all tetrapod taxa, as well as in spotted gar (note the gene locations for spotted gar on row 2 in Figure 19B). Finally, GCIP clearly diverged from all of the above isoforms at an even earlier time. Subsequent to their expansion during 2R WGD, various isoforms have been lost from different lineages, though GCAP2 has been retained in most vertebrate taxa. Notably, mammals have lost GCAP1L, though it is retained in each of the other major lineages, where it forms the most highly-conserved of all the GCAP clades (see Figure 19A and Supplementary Figure S10); hence, its loss from mammals may have been very significant. Sharks and rays have lost both GCAP1 and GCAP3, and retain only GCAP1L from the 1/1L/3 group; however, the elephant shark, a chimaera, retains all three of these isoforms. Isoforms of GCAP2L are found in only a few jawed vertebrate taxa, and appear not to be present in agnathans.
GCIP, which appears not to have duplicates remaining from 2R, has been lost from cartilaginous fish and from amniotes.
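The statement in the Background that GCAP1 operates over a higher range of Ca2+ concentrations than GCAP2 follows directly from their half-maximal Ca2+ values of ~140 nM and ~50 nM. The Python sketch below assumes a simple inverted Hill relation for the fraction of maximal GCAP-stimulated cyclase activity, with a Hill coefficient of 2 chosen purely for illustration (the text gives no cooperativity), no basal activity, and one GCAP species acting at a time; the constant and function names are hypothetical.

```python
from typing import Dict

# Approximate Ca2+ for half-maximal activity, from the text (nM).
K_HALF_NM: Dict[str, float] = {"GCAP1": 140.0, "GCAP2": 50.0}
HILL_N = 2.0  # assumption for illustration; not given in the text


def relative_activation(ca_nM: float, k_half_nM: float, n: float = HILL_N) -> float:
    """Fraction (0..1) of maximal GCAP-stimulated cyclase activity.

    GCAPs activate the cyclase as Ca2+ falls, so the Hill term is
    inverted: activity = 1 / (1 + (Ca / K_half) ** n).
    """
    if ca_nM < 0:
        raise ValueError("Ca2+ concentration must be non-negative")
    return 1.0 / (1.0 + (ca_nM / k_half_nM) ** n)


# Tabulate both isoforms as Ca2+ falls from a dark-like level.
for ca in (400.0, 250.0, 140.0, 100.0, 50.0, 20.0):
    cells = ", ".join(f"{name} {relative_activation(ca, k):.2f}"
                      for name, k in K_HALF_NM.items())
    print(f"[Ca2+] = {ca:3.0f} nM: {cells}")
```

As Ca2+ falls from its dark level, the GCAP1 curve rises first, while GCAP2 contributes strongly only once Ca2+ has dropped towards tens of nanomolar, i.e. in brighter light.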


6.4

Recoverin and visinin

Background. Recoverin and visinin play relatively minor roles in the regulation of vertebrate phototransduction, and they will be considered only briefly here. One possibility is that their main role is to increase the Ca2+-buffering power of the cytoplasm, which might be more important than any direct role they play in regulating the activity of GRKs, etc. Recoverin and visinin clearly diverged from each other during 2R WGD, and it has been proposed that the proto-vertebrate organism expressed visinin in its cones and recoverin in its rods (Lamb and Hunt, 2018). However, because one or other of these isoforms has been lost in many lineages, extant organisms typically express only a single isoform in both rods and cones. On the other hand, some taxa (including amphibians and bony fish) retain the genes for both isoforms.

Figure 20. Recoverin and visinin

Molecular phylogeny. The molecular phylogeny of recoverins and visinins has recently been examined (Lamb and Hunt, 2018), and here that analysis is updated. Figure 20A presents a constrained molecular phylogeny for 19 recoverins and 18 visinins from jawed vertebrates, plus eight closely related sequences from lampreys, using the same set of outgroup sequences as for the GCAPs phylogeny in Figure 19. The mild constraint that was applied moved the root of the vertebrate tree by one node, from the position indicated by the dotted arrow, and it changed the log likelihood by the very small amount of ∆LogL = 2.1; this constrained tree passed all three tests of topology, with p-AU = 0.4 (well above the rejection level of 0.05). Gene duplications and losses. From the gene synteny data and the pairings of ohnolog quartet genes in Figure 6A, it is clear that recoverin and visinin diverged at the second round of WGD. This leads to the gene duplication pattern shown in Figure 20B.
Interestingly, agnathan vertebrates retain only the other two isoforms, named RecVis-X and RecVis-Y (Lamb and Hunt, 2018), which diverged from recoverin and visinin at 1R. These two isoforms are shown as having arisen at the second round of WGD; although they might instead have arisen via a lineage-specific duplication following a gene loss at the second round, it is more parsimonious to assign their origin to the known duplication event.
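The suggestion in the Background that recoverin and visinin mainly add Ca2+-buffering power can be made concrete with the standard single-site buffer relation: the incremental buffering ratio is κ = B_T·K_d/(K_d + [Ca2+])², and a small Ca2+ load divides between the free and bound pools in the ratio 1 : κ. The Python sketch below uses hypothetical values for the buffer concentration B_T and dissociation constant K_d (neither is given in this section) and treats the protein as one independent site, which is a simplification for a protein with more than one Ca2+-binding EF hand.

```python
def buffering_ratio(ca_free_uM: float, total_buffer_uM: float, kd_uM: float) -> float:
    """Incremental buffering ratio kappa = d[bound]/d[free] for one site.

    Bound Ca2+ = B_T * Ca / (K_d + Ca), whose derivative with respect to
    free Ca2+ is B_T * K_d / (K_d + Ca) ** 2 (dimensionless).
    """
    return total_buffer_uM * kd_uM / (kd_uM + ca_free_uM) ** 2


def free_fraction_of_load(ca_free_uM: float, total_buffer_uM: float, kd_uM: float) -> float:
    """Share of a small added Ca2+ load that stays free: 1 / (1 + kappa)."""
    return 1.0 / (1.0 + buffering_ratio(ca_free_uM, total_buffer_uM, kd_uM))


# Hypothetical buffer parameters, chosen only to illustrate the effect.
B_T_UM = 20.0
KD_UM = 1.0
ca = 0.3  # uM, within the 200-500 nM dark range quoted in Section 6.1
print(f"kappa = {buffering_ratio(ca, B_T_UM, KD_UM):.1f}")
print(f"free share of a small Ca2+ load = {free_fraction_of_load(ca, B_T_UM, KD_UM):.3f}")
```

A larger κ slows the change in free Ca2+ for a given change in flux, which is the sense in which a buffering role could matter more than any direct regulatory interaction.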

7

Evolution of vertebrate visual opsins

Background. In the early 1990s, Okano et al. (1992) analysed the molecular phylogeny of the vertebrate visual opsins then available, and showed that rod opsin (Rh1) appeared to have evolved after the cone opsin families had already been established. The cone opsin families they reported to have predated Rh1 were (by branching order, and using today’s terminology): LWS, SWS1, SWS2 and Rh (=Rh1+Rh2). Subsequently, when knowledge of the chromosomal locations of these genes was taken into account (Nordström et al., 2004; Larhammar et al., 2009; Lagman et al., 2013), it was instead proposed that the four shorter-wavelength-sensitive opsins (SWS1, SWS2, Rh1 and Rh2) had arisen through 2R quadruplication of an ancestral SWS gene, and that the corresponding expansion of an ancestral LWS gene had presumably been followed by loss of all but one copy. However, more recent analysis supports the original proposal, and shows that of these vertebrate visual opsins only Rh1 and Rh2 diverged during 2R WGD. Specifically, when the phylogeny is constrained so as to place the SWS1 and SWS2 clades as sisters, all three statistical tests of topology reject that constrained topology at the 95% level (Lamb and Hunt, 2017). Molecular phylogeny. An updated molecular phylogeny for vertebrate ciliary opsins is presented in Figure 21, in which 199 deuterostome C-opsin sequences have been analysed, using an outgroup comprising 16 OPN5 sequences from jawed vertebrates. Every clade in this Figure is supported at a level of at least 98%, and the five nodes separating the lowermost six clades also show support of at least 98%. Thus, the topology of those six lowermost clades is defined very reliably, with very high support for pinopsin being sister to the five vertebrate ‘visual opsins’. The significance of this finding will be considered in Section 8. The deduced pattern of gene duplications is shown in the upper left section of Figure 22. In achieving such a high level of support for the position of the pinopsin clade, I found it helpful to omit the divergent Ciona opsins from the analysis; when the four Ciona C-opsins were included, the topology of the lowermost six clades was unchanged, but support at the asterisked node (where LWS branches) was lower. This presumably occurred because in some of the bootstrap replicates the Ciona opsin clade was positioned within the sub-tree of six vertebrate opsins, and this effect may have been exacerbated by poorer alignment quality when those divergent sequences were included. The alignment also appeared less reliable when the TMT opsins were included, and so they too have been omitted from this phylogeny; the approximate positions obtained when the Ciona opsins and the TMT opsins were included are marked by the two dotted arrows in Figure 21.

Figure 21. Vertebrate visual opsins
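Bootstrap support for a node is the percentage of bootstrap replicate trees that contain the corresponding clade, which is why a divergent "rogue" taxon such as the Ciona opsins can lower support at a node without changing the best topology: in any replicate where it lands inside a clade, that exact clade is absent. The Python sketch below computes support from replicate trees represented as sets of clades; the four replicates are invented purely to illustrate the effect and are not results from this analysis, and the function name is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def bootstrap_support(target: Iterable[str], replicates: List[Set[Clade]]) -> float:
    """Percentage of replicate trees that contain `target` as a clade."""
    if not replicates:
        raise ValueError("no replicate trees supplied")
    wanted = frozenset(target)
    hits = sum(1 for clades in replicates if wanted in clades)
    return 100.0 * hits / len(replicates)


# Hypothetical replicate trees, each reduced to its set of clades.
VISUAL = frozenset({"LWS", "SWS1", "SWS2", "Rh2", "Rh1"})
RH = frozenset({"Rh1", "Rh2"})
clean = {VISUAL, RH, VISUAL | {"pinopsin"}}
# In this replicate a rogue Ciona lineage falls inside the visual opsins,
# so the exact five-opsin clade is absent although Rh1+Rh2 is intact.
rogue = {VISUAL | {"Ciona"}, RH, VISUAL | {"Ciona", "pinopsin"}}
replicates = [clean, clean, clean, rogue]

print(f"five visual opsins: {bootstrap_support(VISUAL, replicates):.0f}%")
print(f"Rh1 + Rh2: {bootstrap_support(RH, replicates):.0f}%")
```

Removing the rogue lineage restores the affected clade in every replicate, which mirrors the reasoning given above for omitting the Ciona opsins.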

8

A synthesis of the co-evolution of the genes for the vertebrate phototransduction cascade

Now that each component of the vertebrate visual phototransduction cascade has been analysed individually, the entire set of results will be drawn together, in an attempt to provide an integrated view of the evolution of the system as a whole, and to provide an understanding of the origin of the dichotomy of rod/cone isoforms.

8.1

Pattern and timing of phototransduction gene duplications

Pattern of duplications. Figure 22 summarises the most parsimonious set of gene duplications and losses, consistent with the observed molecular phylogenies and gene synteny, that could have given rise to the multiple isoforms of phototransduction proteins expressed in vertebrate rods and cones. Except for the opsin section (top left), where the colours indicate spectral sensitivity, the following colour code applies: red denotes ‘cone isoforms’, blue denotes ‘rod isoforms’, black denotes common isoforms and those for which the distribution is unclear, green denotes an isoform used in phototransduction though in neither rods nor cones, and grey denotes isoforms that are not involved in phototransduction. The quotation marks in the previous sentence reflect the fact that the rod/cone distinction is not absolute, primarily because in some lineages the loss of a gene has necessitated the use of a single isoform in both classes of cell.

Figure 22. Scenario for gene duplications in the vertebrate phototransduction cascade

Timing of duplications. Large uncertainties remain in estimating the timing of the various duplication events. In Figure 22 the dotted vertical lines indicate four notable events that occurred during the evolution of our ancestors. The first pair are speciation events, when protostomes and then tunicates diverged from our own lineage; after the first of these our bilaterian ancestors became deuterostomes, and after the second our ancestors can be considered to have been ‘proto-vertebrates’. The second pair of dotted vertical lines are the two rounds of whole genome duplication, 2R WGD, that preceded the vertebrate radiation. Even now, the absolute timings of these four events are known only very approximately. As previously indicated in Figure 2A, order-of-magnitude timings are probably around 750 Mya and 650 Mya for the first pair, and then around 600 Mya for the second pair, which appear to have occurred quite close to each other (on a geological timescale). For consideration of the difficulties in estimating the absolute timing of speciation events, see for example Kumar et al. (2017). In addition to uncertainty in the absolute timing of these four reference points, there is in some cases even greater uncertainty regarding the relative timing of individual gene duplication events (each marked by a □ in Figure 22). The illustrated positions are very approximate estimates, based partly on guesswork. For example, placing an individual duplication event relative to the divergence of tunicates relies on the retention (and our identification) of sufficiently closely-related genes in that lineage, yet tunicates have undergone extensive gene loss and extensive sequence modification. Likewise, it is not always straightforward to place a duplication event relative to the divergence of protostomes, unless suitably closely-related genes can be identified in basal deuterostome taxa. In spite of these uncertainties, the schematic in Figure 22 is an attempt to put the overall sequence of evolutionary events into perspective.

8.2

Summary of the evolution of individual phototransduction components

As a prelude to considering the significance of the co-evolution of the various phototransduction components, this section summarises the main evolutionary events that took place within each individual component. Opsins. The ciliary branch of animal opsins already existed by the time that bilateria (bilaterally-symmetric animals) appeared, and C-opsins are widely utilised throughout protostomes, though rarely for imaging vision. In the deuterostome lineage, multiple duplication events occurred prior to the divergence of tunicates, with OPN3, parietopsin, parapinopsin and VAL all having survived to the present day, though not with roles in imaging vision. Then, prior to 2R WGD, four further duplication events occurred, which gave rise to pinopsin plus four ‘cone opsin’ genes: LWS, SWS1, SWS2, and Rh1/Rh2. Rhodopsin (Rh1) did not emerge as a separate entity until the first round of WGD. Transducins. Several classes of G-protein α-subunits arose very early in animal evolution, with the GNAI (inhibitory) division having emerged prior to the divergence of protostomes. After tunicates had diverged, a tandem duplication generated the GNAI and GNAT (transducin) divisions seen in extant vertebrates. During proto-vertebrate evolution (i.e. prior to 2R WGD), the ancestral GNAT gene underwent extensive modification, amounting to ~0.3 amino acid substitutions per residue; in contrast, the GNAI gene underwent very little modification during the same interval. Following 2R WGD, the daughter pairs of GNAI/GNAT genes have remained adjacent in most extant lineages. All four isoforms of GNAI have survived in lamprey; one GNAI isoform has been lost in jawed vertebrates, along with its associated GNAT isoform in both lineages. Vertebrates possess four isoforms of G-protein β-subunit (GNB1–4), and two of these (GNB1 and GNB2) exhibit highly-conserved sequences, perhaps reflecting their association with multiple different G-protein α-subunits.
The four isoforms may be unique amongst the genes involved in phototransduction as a ‘textbook example’, having retained all four gene copies that arose during 2R WGD, and having undergone no other (surviving) duplications since at least the divergence of protostomes. GNB3 (which is expressed in cones) underwent substantial evolution after genome duplication, both before and after the radiation of vertebrates. Phosphodiesterase. Cyclic nucleotide phosphodiesterases arose early in evolution, and duplicated into multiple forms. The ancestral PDE5/6/11 was already present in bilateria, and duplicated to form PDE5/11 and PDE6 in deuterostomes, probably before tunicates diverged. Subsequently, that chordate/proto-vertebrate PDE6 underwent substantial modification, amounting to ~0.4 amino acid substitutions per site, prior to WGD. As will be discussed below, much of this modification is likely to have involved the ability to interact with the inhibitory PDEγ subunits, a role that is unique to vertebrate PDE6s. Following 2R WGD, all four daughter PDE6 catalytic genes have survived, though jawed vertebrates have dispensed with one isoform and agnathans have dispensed with two. Uniquely amongst all phosphodiesterases, the two rod isoforms evolved to cooperate as a heterodimer, whereas all other PDEs operate as homodimers. The PDEγ inhibitory subunits arose somewhat mysteriously, quite possibly as a duplicated section of the chordate/proto-vertebrate RGS9/11 gene. Following 2R WGD, three isoforms of PDEγ have been retained in vertebrates, though PDE6I has been lost from amniotes. CNG channels. The ancestor of vertebrate cyclic nucleotide-gated channel genes arose through an ancient duplication that also generated the HCN channels. That CNG channel gene duplicated in a bilaterian ancestor to generate the α and β divisions. A further duplication, which also pre-dated the divergence of protostomes, generated CNGA4 and what is here referred to as CNGAQ, homologs of which are found in protostomes and tunicates. During WGD, CNGAQ and CNGB both expanded, with extant vertebrates retaining three CNGAQs and two CNGBs. G-protein receptor kinases. An ancient GRK2/3-like gene duplicated, probably before the divergence of protostomes, to generate GRK4/5/6 and GRK1/7, though the latter is not found in protostomes. Then, in the deuterostome lineage and prior to the divergence of tunicates, GRK1/7 duplicated to generate GRK1 and GRK7. These two genes expanded during 2R WGD, though only three of the eight daughter genes are retained in jawed vertebrates, with a different combination of three retained in agnathan vertebrates. Arrestins.
An ancestral arrestin gene underwent a local duplication, possibly after the divergence of tunicates from our lineage, to generate genes for a β-arrestin (Arr-B) and a visual arrestin (Arr-V). Both of these expanded during the first round of WGD but, following the second round, only a single copy of each 1R duplicate was retained. Thus, the genes for Arr-C and Arr-S diverged at 1R. Regulator of G-protein signalling. An RGS gene duplicated in bilaterian times to form RGS6/7 and RGS9/11. The RGS9/11 gene expanded during 2R WGD, with the two members (RGS9 and RGS11) that diverged at the first round being retained in jawed vertebrates, and with two members (RGS9 and RGS9-Like) that may have diverged at the second round being retained in agnathan vertebrates. It is also plausible that a tandem duplication of part of the ancestral RGS9/11 gene occurred in a chordate or proto-vertebrate organism, resulting in the advent of the ancestral PDEγ. Calcium exchangers. The sodium/calcium-potassium exchangers of animals arose anciently from duplication of an ancestral NCKX gene, giving rise to SLC24A1/2 and SLC24A3/4/5. The former gene (the ancestral visual NCKX) expanded during the first round of WGD, though only a single copy of each 1R duplicate, namely SLC24A1 and SLC24A2 (encoding NCKX1 and NCKX2), was retained after the second round. Guanylyl cyclases. The nearest relative of the visual guanylyl cyclases (GC-D, GC-E and GC-F) is the heat-stable enterotoxin receptor GC-C, encoded by GUCY2C, with the divergence between the two classes having followed a duplication in bilaterian times. The ancestral visual GC quadruplicated during WGD, with the subsequent loss of a single isoform (the original sister of the gene for GC-E). GC-D is interesting in that in fish it appears to be expressed in photoreceptors, whereas in (air-breathing) tetrapods it is expressed in olfactory receptor cells; in most primates it is a pseudogene or absent.

GCAPs. From the Ensembl release 93 gene tree for GCAPs or recoverin, it is clear that the duplication in bilaterian times of a neuronal calcium sensor gene gave rise to the ancestral GCAP/GCIP gene and the ancestral recoverin/visinin gene. At some stage during chordate/proto-vertebrate evolution, probably after the divergence of tunicates, a further duplication generated the GCAP and GCIP genes. The GCAP gene then underwent a tandem duplication prior to WGD, giving rise to the GCAP1/1L/3 and GCAP2/2L genes. Each of these genes expanded during 2R WGD, with extant vertebrates retaining three copies of the former (GCAP1, GCAP1L, GCAP3) and two copies of the latter (GCAP2, GCAP2L), though a number of lineages have lost GCAP1L and GCAP2L. GCIP did not undergo expansion during WGD. Recoverin/visinin. The recoverin/visinin gene, formed by the above-mentioned duplication of an NCS gene, underwent expansion during 2R WGD, with recoverin and visinin diverging at the second round. Other components. Several additional genes that are also involved in phototransduction, or that have other roles in photoreceptor function, are neither listed above nor shown in Figure 22, yet turn out to be intimately associated with the paralogon arrangement depicted in Figure 6. Amongst these are the following: the GNGTs, the RGS9BPs, the RARs (retinoic acid receptors), the THRs (thyroid hormone receptors) and PRPH (peripherin), which are all close to the CNGBs; the OPN4s (melanopsins), RRH (peropsin) and RPE65, which are close to the PDE6s; the ONECUTs, GNB5, RAX (Rx) and KCNV2 (Kv8.2), close to the NCKXs; the SYTs (synaptotagmins) and SLC17As (VGluTs), close to the GCAPs; and BSN (bassoon) and PCLO (piccolo), close to the opsins.
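Several of the component summaries in this section can be cross-checked by simple bookkeeping: if each pre-2R ancestral gene yielded four daughters, then the number of 2R daughters lost is 4 × (ancestral genes) minus (genes retained). The Python sketch below tabulates the jawed-vertebrate counts as stated in this section (for example, three of the eight GRK1/GRK7 daughters, and three GNATs after loss of one GNAI/GNAT pair). It assumes that all four daughters of each ancestor initially existed, and it ignores later lineage-specific losses such as PDE6I in amniotes or GCAP1L in mammals; the dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# (pre-2R ancestral genes, 2R daughters retained in jawed vertebrates),
# transcribed from the component summaries in this section.
RETENTION: Dict[str, Tuple[int, int]] = {
    "GNB (G-protein beta)": (1, 4),
    "GNAT (transducin alpha)": (1, 3),
    "PDE6 catalytic": (1, 3),
    "PDE6 inhibitory (PDEgamma)": (1, 3),
    "CNGAQ": (1, 3),
    "CNGB": (1, 2),
    "GRK1 + GRK7": (2, 3),
    "beta-arrestin (Arr-B)": (1, 2),
    "visual arrestin (Arr-V)": (1, 2),
    "RGS9/11": (1, 2),
    "visual NCKX": (1, 2),
    "visual GC": (1, 3),
    "GCAP1/1L/3": (1, 3),
    "GCAP2/2L": (1, 2),
    "recoverin/visinin": (1, 2),
}


def losses_since_2r(ancestral: int, retained: int, copies_per_ancestor: int = 4) -> int:
    """Daughters lost, assuming each ancestor produced `copies_per_ancestor` genes."""
    produced = ancestral * copies_per_ancestor
    if not 0 <= retained <= produced:
        raise ValueError("retained count is outside the possible range")
    return produced - retained


for family, (anc, kept) in RETENTION.items():
    print(f"{family:28s} produced {4 * anc}, retained {kept}, "
          f"lost {losses_since_2r(anc, kept)}")
```

The GNB row (no losses) is the "textbook example" noted above, while the visual NCKX row reproduces the two post-2R losses inferred for jawed vertebrates in Section 6.1.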

8.3

Co-evolution of components: Stages in the evolution of vertebrate phototransduction

As was argued in Section 3.2, it seems probable that the ancestral signalling cascade in the ciliary photoreceptors of early deuterostomes utilised an inhibitory G-protein (Gαi) that triggered, possibly via inhibition of adenylyl cyclase, a reduction in cyclic nucleotide levels and closure of CNG channels. Here I will review the changes that such a cascade may have undergone in becoming the canonical vertebrate cone/rod phototransduction cascade. Opsin. First, it seems that during deuterostome evolution several improvements occurred in the C-opsin’s performance. Perhaps the most important of these changes was the migration of the Schiff-base counterion location from site 181 to site 113, as has been reviewed by Terakita et al. (2012). This change had occurred by the time that parapinopsin evolved, and it permitted the release of all-trans retinal. Importantly, it meant that even in darkness (and hence in the absence of photoreversal) visual pigment could rapidly be regenerated by using a store of 11-cis retinal. In addition, it paved the way for a higher efficacy of G-protein activation, by enabling further intra-molecular rearrangements that led to a large tilt in helix 6 in the meta II state, as has been reviewed by Hofmann et al. (2009). This change appears to have occurred around the time that VAL opsin emerged. Crucially, it substantially increased the gain of phototransduction, as summarised from a range of studies in Table 2 of Lamb (2013). However, this change ‘locked’ the molecular configuration of the activated opsin, preventing photoreversal of meta II, and it thereby made the release of all-trans retinal indispensable. Through these changes, higher gain was traded off against the need to employ a separate pathway for resynthesis of 11-cis retinal. Phosphodiesterase and transducin. At around the same time that this higher-gain C-opsin emerged, the phosphodiesterase was undergoing a fundamental change.
Previously it had been a passive player, to the extent that its activity was not directly modulated by activation of the opsin or G-protein. But once an inhibitory PDEγ-like molecule appeared that could bind to the PDE and could also interact with the activated G-protein, the phosphodiesterase would have become an active participant in the cascade. Thereafter, fine-tuning of the three proteins would inevitably have improved the efficacy of their interactions and could rapidly have led to the emergence of the canonical PDE6 and PDE6γ, and likewise could rapidly have converted the Gαi into Gαt. Dimeric activation of rod PDE6. It has recently become clear that activation of the rod PDE6 by the binding of two molecules of activated transducin (Gαt·GTP) is a highly non-linear step: the binding of a single transducin molecule causes negligible activation, whereas full activation requires the binding of both (Qureshi et al., 2018; Lamb et al., 2018a). This property provides the rod PDE6 with considerable ‘noise immunity’, because a background of thermally activated transducin molecules will induce very little activation; only after photon absorption, when a high concentration of transducins is activated locally, will the PDE6 be substantially activated. A trade-off of this mechanism is the introduction of a small delay (of the order of 5 ms) in the activation process. At present it is unclear whether the cone PDE6 subunits act independently or whether their activation is similarly cooperative. However, it is tempting to speculate that cones, which are optimised for speed, would have opted to avoid the small additional delay and simply put up with the extra noise that occurs with independent activation. Hence, it may be the case that the cooperative activation mechanism of the rod PDE6 is a feature that has evolved only in rods. If so, it would be natural to presume that it is the heterodimeric nature of rod PDE6A+PDE6B that has enabled this. Interestingly, then, this is a property that could not have emerged before the second round of WGD.
Furthermore, it is a property that would be unique to jawed vertebrate rods, and that would not occur in the rod-like photoreceptors of agnathan vertebrates, where a homodimeric PDE6 is utilised. In this regard, it may be relevant that the rod-like photoreceptors of the lamprey (Lampetra fluviatilis) have been shown to exhibit a markedly lower signal-to-noise ratio for their single-photon responses than is found in the true rods of jawed vertebrates (Asteriti et al., 2015). Calcium feedback regulation of phototransduction. Currently, it is unclear at what stage the powerful calcium negative-feedback regulatory loop appeared. This system is crucially important to the ability of vertebrate photoreceptors, especially cones, to adapt rapidly to altered light intensity – so-called ‘light adaptation’. In mammalian photoreceptors, the feedback loop primarily involves the GCAPs acting on the visual GCs. The ancestral gene for a GCAP had diverged from that for recoverin/visinin long before WGD, and it had also duplicated into two isoforms prior to WGD, and so it seems likely that the calcium feedback loop was already in operation in an early chordate ancestor.
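Returning to the dimeric activation of rod PDE6 discussed above, the ‘noise immunity’ argument can be illustrated with a deliberately simplified calculation. The sketch assumes that each of the two transducin-binding sites on a PDE6 holoenzyme is occupied independently with probability p, and that a singly occupied holoenzyme contributes no activity (the limiting case of the negligible single-site activation described in the text). It ignores binding kinetics, and hence the ~5 ms delay, and the occupancy values are hypothetical rather than measured.

```python
"""Toy model: why requiring two bound transducins suppresses background activation."""


def independent_activity(p: float) -> float:
    """Mean fractional activity if each catalytic subunit is switched on by its own
    bound transducin (sites occupied independently with probability p)."""
    return p


def dimeric_activity(p: float) -> float:
    """Mean fractional activity if the holoenzyme is active only when both sites
    are occupied; a singly occupied enzyme is assumed to be inactive."""
    return p * p


def contrast(p_signal: float, p_background: float, dimeric: bool) -> float:
    """Ratio of activity near a photoisomerisation to activity in the dark background."""
    activity = dimeric_activity if dimeric else independent_activity
    return activity(p_signal) / activity(p_background)


if __name__ == "__main__":
    p_dark, p_light = 0.01, 0.5  # hypothetical site occupancies
    print(contrast(p_light, p_dark, dimeric=False))  # about 50
    print(contrast(p_light, p_dark, dimeric=True))   # about 2500
```

Because the dimeric contrast is the square of the independent one, a 50-fold local increase in occupancy becomes a roughly 2500-fold increase in activity, at the cost (not modelled here) of waiting for the second binding event.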

8.4

Origin of photopic/scotopic dichotomy in vertebrate phototransduction

Figure 22 makes it clear that multiple instances of dichotomy between rod and cone protein isoforms (indicated by blue and red lettering) arose during 2R WGD. In addition, it shows two earlier gene duplications that might have contributed to an ancestral photopic versus scotopic dichotomy. Notably, GRK1 and GRK7 arose through a gene duplication that occurred prior to WGD, and that possibly pre-dated the divergence of tunicates from our own lineage. Likewise, prior to WGD a GCAP gene had duplicated to form both GCAP1/1L/3 and GCAP2/2L. Following each of these pre-WGD duplications, it is entirely plausible that the daughter products could have been differentially expressed between two classes of photoreceptor cell, which may have provided better performance at higher and lower intensities, respectively. Further support for this contention comes from the recent findings of Sato et al. (2018). They showed, firstly, that pinopsin exhibits a rate of thermal activation more than 20-fold lower than that of cone opsins, and secondly, that pinopsin is present in the retina in a range of non-mammalian species, and that at least in spotted gar and Xenopus it is expressed in a small proportion of retinal rods and cones. Those findings led Sato et al. (2018) to conclude that pinopsin is likely to have been the ancestral scotopic opsin. Although that study and earlier studies were not able to determine the phylogenetic position of pinopsin with high precision, two other analyses (Lamb and Hunt, 2017; Hart et al., in preparation) have recently reported support for the position of pinopsin as sister to the set of five conventional vertebrate visual opsins (LWS, SWS1, SWS2, Rh2 and Rh1). In the present analysis in Figure 21, this basal position of the pinopsin clade was supported at a bootstrap level of 98%. Such a high level of support was only obtained when C-opsin sequences from Ciona were excluded from the analysis; inclusion of those divergent tunicate sequences lowered the quality of the whole alignment, contributing to uncertainty in the position of the tunicate clade and thereby lowering support levels at adjacent nodes. Combining these observations, it would appear highly likely that even prior to 2R WGD there were already in existence two modes of vertebrate retinal phototransduction, presumably operating in separate classes of cell. The photopic cells would have employed a cone-type opsin (e.g. the ancestral SWS/LWS opsin), and their cascade would have achieved rapid shut-off by using GRK7, probably in conjunction with feedback via GCAP1/1L/3, along with high expression of RGS9. The scotopic cells would have employed pinopsin, and their cascade would have opted for slower shut-off by using GRK1, probably in conjunction with feedback via GCAP2/2L, and with a lower level of RGS9 expression. Crucially, this photopic/scotopic dichotomy would have existed well before the emergence of rhodopsin (Rh1) during WGD.
After rhodopsin emerged, with even greater thermal stability, it is likely that it would have taken over from pinopsin as the preferred scotopic opsin. Then, at some unknown later time, these scotopic photoreceptors would have become identifiable as vertebrate rods. A corollary to this postulated sequence of events is that such a duplex photopic/scotopic division may already have been in operation at a stage when only a single spectral class of cone opsin existed, and hence it could have preceded the emergence of photopic colour vision.
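The bootstrap support cited above for the basal position of pinopsin reflects a general procedure: resample alignment columns with replacement, rebuild a tree from each pseudo-alignment, and report the percentage of replicates in which a clade reappears (Felsenstein, 1985). The sketch below shows only that counting logic. It substitutes UPGMA clustering on p-distances for the likelihood-based tree inference typically used, treats clades as rooted, and runs on a hypothetical five-sequence toy alignment; it is not the pipeline that produced the values reported in this review. Deleting a sequence from the dictionary and re-running is a simple way to see how a single taxon can shift support at neighbouring nodes.

```python
"""Toy nonparametric bootstrap of clade support (UPGMA stand-in for real tree inference)."""
import random
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_clades(seqs: Dict[str, str]) -> Set[Clade]:
    """Build a UPGMA tree and return its non-trivial clades (no leaves, no root)."""
    sizes = {frozenset([name]): 1 for name in seqs}
    dist = {
        frozenset([frozenset([a]), frozenset([b])]): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    clades: Set[Clade] = set()
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)  # ties broken by insertion order
        c1, c2 = tuple(closest)
        n1, n2 = sizes.pop(c1), sizes.pop(c2)
        del dist[closest]
        merged = c1 | c2
        for other in sizes:
            d1 = dist.pop(frozenset([c1, other]))
            d2 = dist.pop(frozenset([c2, other]))
            # UPGMA: size-weighted average of the merged clusters' distances.
            dist[frozenset([merged, other])] = (n1 * d1 + n2 * d2) / (n1 + n2)
        sizes[merged] = n1 + n2
        if len(merged) < len(seqs):
            clades.add(merged)
    return clades


def bootstrap_support(seqs: Dict[str, str], clade: Iterable[str],
                      replicates: int = 200, seed: int = 1) -> float:
    """Percentage of column-resampled replicates whose UPGMA tree contains `clade`."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    target = frozenset(clade)
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]  # sample sites with replacement
        pseudo = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        if target in upgma_clades(pseudo):
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment (not real opsin sequences).
TOY_ALIGNMENT = {
    "A1": "ACDEFGHIKLMNPQRS",
    "A2": "ACDEFGHIKLMNPQRT",
    "B1": "ACDWYGHIKLVNPERS",
    "B2": "ACDWYGHIKLVNPERT",
    "C":  "MCDWYGHRKLVNTERS",
}

if __name__ == "__main__":
    print(sorted(sorted(c) for c in upgma_clades(TOY_ALIGNMENT)))
    print(bootstrap_support(TOY_ALIGNMENT, {"A1", "A2"}))
```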

8.5

Refinement of the distinct isoforms for rods and cones

If one accepts the above proposition that separate classes of photopic and scotopic photoreceptors already existed, then the subsequent quadruplication of the entire genome would have provided exactly the opportunity required to refine each component of the phototransduction cascade to the respective needs of day-time and night-time vision in those two classes of cell. Thus, if the daughter isoforms were differentially expressed in the photopic and scotopic classes of photoreceptor, then any mutations that benefitted photopic vision in the photopic class could have been selected for, and any mutations that benefitted scotopic vision in the scotopic class could likewise have been selected for. A scenario of this kind seems easier to rationalise than the alternative, in which there had been no distinction of photoreceptor classes at the time that WGD occurred, and the single pre-existing class of photoreceptor somehow managed to juggle slight differences of numerous protein isoforms into a coherent division, so as to generate separate cone and rod classes. By the time of the radiation of jawed vertebrates (i.e. at the latest by the time that cartilaginous fish diverged from our own ancestors, around 480 Mya; Figure 2A), it appears that all of the components of the respective cone and rod phototransduction pathways had become firmly established. In other words, there is no obvious evidence of any changes of fundamental importance in any jawed vertebrate lineage over the subsequent period of almost half-a-billion years. A possible exception might be the local duplication of the rod arrestin gene in cartilaginous fish, which led to the emergence of two classes, Arr-S1 and Arr-S2 (Section 5.2), though the significance of the existence of two such isoforms is unclear. And, of course, some species (e.g. teleost fish) have undergone a third round of genome duplication.

On the other hand, it is not at all clear that exactly the same set of changes occurred in the ancestors of extant agnathan vertebrates, which diverged from our lineage before jaws had evolved, and possibly quite soon after 2R WGD. Although the last common ancestor of extant agnathan vertebrates and jawed vertebrates possessed a quadruplicated genome, and apparently already had multiple classes of retinal photoreceptor that expressed different isoforms of visual opsin, the subsequent evolution of those photoreceptor classes occurred independently in the two lineages. Hence it is hardly surprising that the rod-like photoreceptors of lampreys and hagfish appear to be quite different from the rods of jawed vertebrates, both anatomically and physiologically, even though they express an orthologous rhodopsin (Rh1). Likewise, it might be expected that the four classes of agnathan cone-like photoreceptors could have quite different properties from their jawed vertebrate counterparts. For jawed vertebrates, different lineages have had to cope with the loss of one or more isoforms, but on the whole this seems to have been handled by utilisation of a non-standard isoform and/or by molecular ‘tinkering’. As one example, the stem therian mammal (the ancestor of marsupials and placentals) lost two of its cone opsin genes (Rh2 and SWS2) and so these mammals have been reduced to a dichromatic version of colour vision. Subsequently, primates duplicated their LWS opsin, and the two variant isoforms gradually attained somewhat different spectral sensitivities, so providing rudimentary trichromacy. As a second example, sauropsids (birds and reptiles) lost the GRK1A gene. Their rods instead use GRK1B (which elsewhere is used in cones); interestingly, in birds that GRK1B isoform underwent considerable modification (see Supplementary Figure S5). The third example again involves sauropsids, which have also lost the PDE6A gene.
Because of this loss, it is presumed that sauropsid rods utilise PDE6β as a homodimer, though this has not been exhaustively examined. Perhaps surprisingly, sauropsid PDE6Bs show no obvious signs of having evolved differently from other vertebrate PDE6Bs (see Supplementary Figure S3), though they do have a deletion of about 18 residues towards the N-terminus. A final, interesting example is that of the nocturnal Tokay gecko, G. gekko, a species that is descended from a diurnal gecko that completely lost its rods (Walls, 1942), along with many of the rod-specific isoforms of phototransduction proteins (Zhang et al., 2006). Its two classes of photoreceptor both display all the ultrastructural features of cones (Röll, 2000), except that the outer segments of the scotopic photoreceptors are large, similar to those of rods in many species. Importantly, the light responses of these scotopic photoreceptors are broadly rod-like (Kleinschmidt and Dowling, 1975; Rispoli et al., 1993), though the single-photon response amplitude may be smaller than in genuine rods. In the absence of rhodopsin (Rh1), these rod-like photoreceptors express LWS. The other identified phototransduction genes in this species clearly cluster with their cone counterparts (GNAT2, PDE6C, PDE6H, CNGA3 and ARR3), though a few residues have been identified as being rod-like (Zhang et al., 2006). On the other hand, RGS9 is expressed at a low level, as normally seen in rods, rather than at the high level characteristic of cones; furthermore, the dark basal activity of the PDE is low, as is typical of rods rather than cones (Zhang et al., 2006). Taken together, these results indicate that the proteins of G. gekko photoreceptors are predominantly cone-like though modified in minor ways, but that the expression levels and/or activities of at least two of the proteins that are important in generating slow, sensitive responses are instead rod-like.
In conjunction with an altered outer segment geometry, this yields rod-like electrical responses. Hence, by redeploying modified cone proteins and by utilising a rod-like geometry, evolution has produced, in one class of Tokay gecko photoreceptor, electrophysiological responses broadly comparable to those of the true rods of other vertebrates.

8.6

Summary

There is now sufficient evidence to propose the events shown in Figure 22 as a plausible account of the evolution of at least 40 isoforms of proteins utilised for vertebrate phototransduction (those shown as red, blue or black in Figure 22), an account that can serve as a test bed for more extensive studies in the future. Three of the protein classes (visual opsin, GRK, and GCAP) each appear to have possessed at least two isoforms prior to WGD, suggesting that photopic and scotopic specialisation could already have existed by that time. Quadruplication of the genome may then have provided the flexibility needed for specialised isoforms to evolve in both such classes of photoreceptor. It appears that the phototransduction cascade in cone and rod photoreceptors had already reached a superb state around half-a-billion years ago, and that little has changed in any fundamental way since then. Remnants of the syntenic arrangement of genes along the chromosomes of the proto-vertebrate organism just after 2R WGD can still be glimpsed in extant vertebrates, and analysis of the locations of genes across multiple species suggests that almost all of the phototransduction genes originally resided on at most five paralogons, and conceivably fewer. Indeed, it seems possible that our entire genome is the reorganised remnant of just one huge paralogous arrangement of genes (spread across all of the post-WGD chromosomes) that resulted from two successive duplications of an ancestral chordate genome.

9

Future directions

The picture of evolution of vertebrate phototransduction presented here is ripe for further investigation, especially in the following directions:
1. To date, the analysis of phototransduction gene synteny has been restricted to examination of other gene families in the immediate neighbourhood of phototransduction genes, in just a handful of species, and using laborious manual processes. What is needed for the future is an extension, via automated processing, to all of the ohnolog families across multiple vertebrate genomes, so as to link together the various regions containing the phototransduction gene families and thereby obtain a more comprehensive view of the paralogon structure and the continuity of ancestral chromosomal rows.
2. Likewise, the phylogenetic analysis needs to be extended to include the huge number of non-phototransduction ohnolog gene families, especially the 200 or so ‘quartet’ ohnolog families, so as to extend to the entire genome the evidence for those pairs of chromosomal rows that diverged at the first round of WGD.
3. It will also be immensely valuable to undertake comparable phylogenetic and syntenic analysis of the genes involved in other processes that are related to phototransduction, e.g. the genes of the retinoid cycle, those involved in synaptic transmission to bipolar cells, and those involved in the homologous transduction cascade in ON-bipolar cells. The evolution of those processes occurred in parallel with the evolution of phototransduction, and there would undoubtedly have been links between innovations in each of the interacting systems. Hence, an understanding of the evolution of the genes involved in each of these processes will help advance our understanding of the others, and of the manner in which the systems cooperate.
4. Similarly, it will be extremely valuable to undertake a comparable analysis of the transcription factors that specify the development of the retina, especially in relation to the division of labour (Arendt et al., 2009) and the gain and loss of cell types (Musser and Arendt, 2017) that occurred during the evolution of the rod/cone dichotomy. Interestingly, the genes for several such factors are ohnologs that reside close to phototransduction ohnolog families; these include the RAXs, ONECUTs, THRs and RARs.
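As a minimal sketch of the kind of automated step envisaged in direction 1, the code below collects the gene families found within a fixed window around each query gene and reports the families shared between two query genes' neighbourhoods; shared neighbouring ohnolog families are the basic signal used to argue that two regions descend from the same ancestral paralogon. Everything here is an assumption for illustration: the toy genome, the placeholder gene and family names, and a window counted in genes rather than base pairs. Real input would come from a comparative genomics resource (such as Ensembl or Genomicus), and a real analysis would also need to test whether the overlap exceeds what is expected by chance.

```python
"""Toy synteny check: which gene families flank two query genes on their chromosomes?"""
from typing import Dict, List, Set, Tuple

# Hypothetical genome: chromosome -> ordered list of (gene, family). Placeholder names only.
Genome = Dict[str, List[Tuple[str, str]]]


def neighbour_families(genome: Genome, gene: str, window: int) -> Set[str]:
    """Families of genes within `window` positions of `gene`, excluding its own family."""
    for genes in genome.values():
        for idx, (name, family) in enumerate(genes):
            if name == gene:
                lo, hi = max(0, idx - window), idx + window + 1
                return {fam for _, fam in genes[lo:hi] if fam != family}
    raise KeyError(f"{gene!r} not found")


def shared_neighbourhood(genome: Genome, gene_a: str, gene_b: str, window: int = 2) -> Set[str]:
    """Families flanking both query genes: candidate evidence for a shared paralogon."""
    return neighbour_families(genome, gene_a, window) & neighbour_families(genome, gene_b, window)


TOY_GENOME: Genome = {
    "chr1": [("f1_a", "F1"), ("pt_a", "PT"), ("f2_a", "F2"), ("f3_a", "F3")],
    "chr2": [("f2_b", "F2"), ("pt_b", "PT"), ("f1_b", "F1"), ("f4_b", "F4")],
    "chr3": [("f5_c", "F5"), ("pt_c", "PT"), ("f3_c", "F3")],
}

if __name__ == "__main__":
    print(sorted(shared_neighbourhood(TOY_GENOME, "pt_a", "pt_b")))            # ['F1', 'F2']
    print(sorted(shared_neighbourhood(TOY_GENOME, "pt_a", "pt_c", window=1)))  # []
```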

Acknowledgements I am indebted to Professor David M Hunt for his collaboration on the four original studies upon which much of this review is based. I am also most grateful to three anonymous reviewers whose comments substantially improved the paper.

References Alvarez, C.E., 2008. On the origins of arrestin and rhodopsin. BMC Evol. Biol. 8, 222. https://doi.org/10.1186/1471-2148-8-222 Ames, J.B., Lim, S., 2012. Molecular structure and target recognition of neuronal calcium sensor proteins. Biochim. Biophys. Acta 1820, 1205–1213. https://doi.org/10.1016/j.bbagen.2011.10.003 Arendt, D., Hausen, H., Purschke, G., 2009. The “division of labour” model of eye evolution. Philos. Trans. R. Soc. Lond., B 364, 2809–2817. https://doi.org/10.1098/rstb.2009.0104 Asteriti, S., Grillner, S., Cangiano, L., 2015. A Cambrian origin for vertebrate rods. eLife 4. https://doi.org/10.7554/eLife.07166 Azadi, S., Molday, L.L., Molday, R.S., 2010. RD3, the protein associated with Leber congenital amaurosis type 12, is required for guanylate cyclase trafficking in photoreceptor cells. Proc. Natl. Acad. Sci. U.S.A. 107, 21158–21163. https://doi.org/10.1073/pnas.1010460107 Bauer, P.J., Drechsler, M., 1992. Association of cyclic GMP-gated channels and Na+-Ca2+-K+ exchangers in bovine retinal rod outer segment plasma membranes. J. Physiol. 451, 109–131. Bereta, G., Wang, B., Kiser, P.D., Baehr, W., Jang, G.-F., Palczewski, K., 2010. A functional kinase homology domain is essential for the activity of photoreceptor guanylate cyclase 1. J. Biol. Chem. 285, 1899–1908. https://doi.org/10.1074/jbc.M109.061713 Cervetto, L., Lagnado, L., Perry, R.J., Robinson, D.W., McNaughton, P.A., 1989. Extrusion of calcium from rod outer segments is driven by both sodium and potassium gradients. Nature 337, 740–743. https://doi.org/10.1038/337740a0 Collin, S.P., Hart, N.S., Wallace, K.M., Shand, J., Potter, I.C., 2004. Vision in the southern hemisphere lamprey Mordacia mordax: spatial distribution, spectral absorption characteristics, and optical sensitivity of a single class of retinal photoreceptor. Vis. Neurosci. 21, 765–773. https://doi.org/10.1017/S0952523804215103 Cowan, C.W., Fariss, R.N., Sokal, I., Palczewski, K., Wensel, T.G., 1998.
High expression levels in cones of RGS9, the predominant GTPase accelerating protein of rods. Proc. Natl. Acad. Sci. U.S.A. 95, 5351–5356. https://doi.org/10.1073/pnas.95.9.5351 Cuenca, N., Lopez, S., Howes, K., Kolb, H., 1998. The localization of guanylyl cyclase-activating proteins in the mammalian retina. Invest. Ophthalmol. Vis. Sci. 39, 1243–1250. Delbridge, M.L., Patel, H.R., Waters, P.D., McMillan, D.A., Marshall Graves, J.A., 2009. Does the human X contain a third evolutionary block? Origin of genes on human Xp11 and Xq28. Genome Res. 19, 1350–1360. https://doi.org/10.1101/gr.088625.108 Dell’Angelica, E.C., 2001. Clathrin-binding proteins: got a motif? Join the network! Trends Cell Biol. 11, 315–318. Dizhoor, A.M., Lowe, D.G., Olshevskaya, E.V., Laura, R.P., Hurley, J.B., 1994. The human photoreceptor membrane guanylyl cyclase, RetGC, is present in outer segments and is regulated by calcium and a soluble activator. Neuron 12, 1345–1352. Dizhoor, A.M., Olshevskaya, E.V., Henzel, W.J., Wong, S.C., Stults, J.T., Ankoudinova, I., Hurley, J.B., 1995. Cloning, sequencing, and expression of a 24-kDa Ca2+-binding protein activating photoreceptor guanylyl cyclase. J. Biol. Chem. 270, 25200–25206. Emery, L., Whelan, S., Hirschi, K.D., Pittman, J.K., 2012. Protein phylogenetic analysis of Ca2+/cation antiporters and insights into their evolution in plants. Front. Plant Sci. 3, 1. https://doi.org/10.3389/fpls.2012.00001 Erwin, D.H., Laflamme, M., Tweedt, S.M., Sperling, E.A., Pisani, D., Peterson, K.J., 2011. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science 334, 1091–1097. https://doi.org/10.1126/science.1206375

Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791. https://doi.org/10.1111/j.1558-5646.1985.tb00420.x Gomez, M.D.P., Espinosa, L., Ramirez, N., Nasi, E., 2011. Arrestin in ciliary invertebrate photoreceptors: molecular identification and functional analysis in vivo. J. Neurosci. 31, 1811–1819. https://doi.org/10.1523/JNEUROSCI.3320-10.2011 Gorman, A.L., McReynolds, J.S., Barnes, S.N., 1971. Photoreceptors in primitive chordates: fine structure, hyperpolarizing receptor potentials, and evolution. Science 172, 1052–1054. Gurevich, E.V., Gurevich, V.V., 2006. Arrestins: ubiquitous regulators of cellular signaling pathways. Genome Biol. 7, 236. https://doi.org/10.1186/gb-2006-7-9-236 Hart, N.S., Lamb, T.D., Patel, H.R., Chuah, A., Natoli, R.C., Hudson, N.J., Cuttmore, S.C., Davies, W.I.L., Collin, S.P., Hunt, D.M., in preparation. Ciliary opsin diversity in elasmobranchs: functional implications and new perspectives on the evolution of vertebrate vision. Herrero, J., Muffato, M., Beal, K., Fitzgerald, S., Gordon, L., Pignatelli, M., Vilella, A.J., Searle, S.M.J., Amode, R., Brent, S., Spooner, W., Kulesha, E., Yates, A., Flicek, P., 2016. Ensembl comparative genomics resources. Database 2016, bav096. https://doi.org/10.1093/database/bav096 Hisatomi, O., Tokunaga, F., 2002. Molecular evolution of proteins involved in vertebrate phototransduction. Comp. Biochem. Physiol. B 133, 509–522. Hoang, D.T., Chernomor, O., von Haeseler, A., Minh, B.Q., Vinh, L.S., 2018. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522. https://doi.org/10.1093/molbev/msx281 Hofmann, K.P., Scheerer, P., Hildebrand, P.W., Choe, H.-W., Park, J.H., Heck, M., Ernst, O.P., 2009. A G protein-coupled receptor at work: the rhodopsin model. Trends Biochem. Sci. 34, 540–552.
https://doi.org/10.1016/j.tibs.2009.07.005 Imanishi, Y., Li, N., Sokal, I., Sowa, M.E., Lichtarge, O., Wensel, T.G., Saperstein, D.A., Baehr, W., Palczewski, K., 2002. Characterization of retinal guanylate cyclase-activating protein 3 (GCAP3) from zebrafish to man. Eur. J. Neurosci. 15, 63–78. Imanishi, Y., Yang, L., Sokal, I., Filipek, S., Palczewski, K., Baehr, W., 2004. Diversity of guanylate cyclase-activating proteins (GCAPs) in teleost fish: characterization of three novel GCAPs (GCAP4, GCAP5, GCAP7) from zebrafish (Danio rerio) and prediction of eight GCAPs (GCAP1-8) in pufferfish (Fugu rubripes). J. Mol. Evol. 59, 204–217. https://doi.org/10.1007/s00239-004-2614-y Kang, D.S., Kern, R.C., Puthenveedu, M.A., von Zastrow, M., Williams, J.C., Benovic, J.L., 2009. Structure of an arrestin2-clathrin complex reveals a novel clathrin binding domain that modulates receptor trafficking. J. Biol. Chem. 284, 29860–29872. https://doi.org/10.1074/jbc.M109.023366 Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. https://doi.org/10.1093/molbev/mst010 Kaupp, U.B., Seifert, R., 2002. Cyclic nucleotide-gated ion channels. Physiol. Rev. 82, 769–824. https://doi.org/10.1152/physrev.00008.2002 Kelsell, R.E., Gregory-Evans, K., Payne, A.M., Perrault, I., Kaplan, J., Yang, R.B., Garbers, D.L., Bird, A.C., Moore, A.T., Hunt, D.M., 1998. Mutations in the retinal guanylate cyclase (RETGC-1) gene in dominant cone-rod dystrophy. Hum. Mol. Genet. 7, 1179–1184. Kishino, H., Miyata, T., Hasegawa, M., 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 31, 151–160. https://doi.org/10.1007/BF02109483

Kleinschmidt, J., Dowling, J.E., 1975. Intracellular recordings from gecko photoreceptors during light and dark adaptation. J. Gen. Physiol. 66, 617–648. https://doi.org/10.1085/jgp.66.5.617 Koch, K.-W., Dell’Orco, D., 2015. Protein and signaling networks in vertebrate photoreceptor cells. Front. Mol. Neurosci. 8, 67. https://doi.org/10.3389/fnmol.2015.00067 Koch, K.W., Stryer, L., 1988. Highly cooperative feedback control of retinal rod guanylate cyclase by calcium ions. Nature 334, 64–66. https://doi.org/10.1038/334064a0 Krupnick, J.G., Goodman, O.B., Keen, J.H., Benovic, J.L., 1997. Arrestin/clathrin interaction. Localization of the clathrin binding domain of nonvisual arrestins to the carboxy terminus. J. Biol. Chem. 272, 15011–15016. https://doi.org/10.1074/jbc.272.23.15011 Kuhn, M., 2016. Molecular physiology of membrane guanylyl cyclase receptors. Physiol. Rev. 96, 751–804. https://doi.org/10.1152/physrev.00022.2015 Kumar, S., Stecher, G., Suleski, M., Hedges, S.B., 2017. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819. https://doi.org/10.1093/molbev/msx116 Lagman, D., Franzén, I.E., Eggert, J., Larhammar, D., Abalo, X.M., 2016. Evolution and expression of the phosphodiesterase 6 genes unveils vertebrate novelty to control photosensitivity. BMC Evol. Biol. 16, 124. https://doi.org/10.1186/s12862-016-0695-z Lagman, D., Ocampo Daza, D., Widmark, J., Abalo, X.M., Sundström, G., Larhammar, D., 2013. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications. BMC Evol. Biol. 13, 238. https://doi.org/10.1186/1471-2148-13-238 Lagman, D., Sundström, G., Ocampo Daza, D., Abalo, X.M., Larhammar, D., 2012. Expansion of transducin subunit gene families in early vertebrate tetraploidizations. Genomics 100, 203–211.
https://doi.org/10.1016/j.ygeno.2012.07.005 Lagnado, L., Cervetto, L., McNaughton, P.A., 1992. Calcium homeostasis in the outer segments of retinal rods from the tiger salamander. J. Physiol. 455, 111–142. Lamb, T.D., 2013. Evolution of phototransduction, vertebrate photoreceptors and retina. Prog. Retin. Eye Res. 36, 52–119. https://doi.org/10.1016/j.preteyeres.2013.06.001 Lamb, T.D., Heck, M., Kraft, T.W., 2018a. Implications of dimeric activation of PDE6 for rod phototransduction. Open Biol. 8. https://doi.org/10.1098/rsob.180076 Lamb, T.D., Hunt, D.M., 2018. Evolution of the calcium feedback steps of vertebrate phototransduction. Open Biol. 8. https://doi.org/10.1098/rsob.180119 Lamb, T.D., Hunt, D.M., 2017. Evolution of the vertebrate phototransduction cascade activation steps. Devel. Biol. 431, 77–92. https://doi.org/10.1016/j.ydbio.2017.03.018 Lamb, T.D., Patel, H., Chuah, A., Natoli, R.C., Davies, W.I.L., Hart, N.S., Collin, S.P., Hunt, D.M., 2016. Evolution of vertebrate phototransduction: Cascade activation. Mol. Biol. Evol. 33, 2064–2087. https://doi.org/10.1093/molbev/msw095 Lamb, T.D., Patel, H.R., Chuah, A., Hunt, D.M., 2018b. Evolution of the shut-off steps of vertebrate phototransduction. Open Biol. 8, 170232. https://doi.org/10.1098/rsob.170232 Larhammar, D., Nordström, K., Larsson, T.A., 2009. Evolution of vertebrate rod and cone phototransduction genes. Philos. Trans. R. Soc. Lond., B 364, 2867–2880. https://doi.org/10.1098/rstb.2009.0077 Le, S.Q., Gascuel, O., 2008. An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320. https://doi.org/10.1093/molbev/msn067 Lim, S., Dizhoor, A.M., Ames, J.B., 2014. Structural diversity of neuronal calcium sensor proteins and insights for activation of retinal guanylyl cyclase by GCAP1. Front. Mol. Neurosci. 7, 19. https://doi.org/10.3389/fnmol.2014.00019

Lim, S., Roseman, G., Peshenko, I., Manchala, G., Cudia, D., Dizhoor, A.M., Millhauser, G., Ames, J.B., 2018. Retinal guanylyl cyclase activating protein 1 forms a functional dimer. PLoS One 13, e0193947. https://doi.org/10.1371/journal.pone.0193947 Lokits, A.D., Indrischek, H., Meiler, J., Hamm, H.E., Stadler, P.F., 2018. Tracing the evolution of the heterotrimeric G protein α subunit in Metazoa. BMC Evol. Biol. 18, 51. https://doi.org/10.1186/s12862-018-1147-8 Lovell, P.V., Wirthlin, M., Wilhelm, L., Minx, P., Lazar, N.H., Carbone, L., Warren, W.C., Mello, C.V., 2014. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 15, 565. https://doi.org/10.1186/s13059-014-0565-1 Martemyanov, K.A., Krispel, C.M., Lishko, P.V., Burns, M.E., Arshavsky, V.Y., 2008. Functional comparison of RGS9 splice isoforms in a living cell. Proc. Natl. Acad. Sci. U.S.A. 105, 20988–20993. https://doi.org/10.1073/pnas.0808941106 Matthews, H.R., Murphy, R.L., Fain, G.L., Lamb, T.D., 1988. Photoreceptor light adaptation is mediated by cytoplasmic calcium concentration. Nature 334, 67–69. https://doi.org/10.1038/334067a0 Muffato, M., Louis, A., Poisnel, C.-E., Roest Crollius, H., 2010. Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics 26, 1119–1121. https://doi.org/10.1093/bioinformatics/btq079 Mushegian, A., Gurevich, V.V., Gurevich, E.V., 2012. The origin and evolution of G protein-coupled receptor kinases. PLoS One 7, e33806. https://doi.org/10.1371/journal.pone.0033806 Musser, J.M., Arendt, D., 2017. Loss and gain of cone types in vertebrate ciliary photoreceptor evolution. Dev. Biol. 431, 26–35. https://doi.org/10.1016/j.ydbio.2017.08.038 Nakatani, K., Yau, K.W., 1988. Calcium and light adaptation in retinal rods and cones. Nature 334, 69–71. https://doi.org/10.1038/334069a0 Nakatani, Y., Takeda, H., Kohara, Y., Morishita, S., 2007.
Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res. 17, 1254–1265. https://doi.org/10.1101/gr.6316407 Nguyen, L.-T., Schmidt, H.A., von Haeseler, A., Minh, B.Q., 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274. https://doi.org/10.1093/molbev/msu300 Nordström, K., Larsson, T.A., Larhammar, D., 2004. Extensive duplications of phototransduction genes in early vertebrate evolution correlate with block (chromosome) duplications. Genomics 83, 852–872. https://doi.org/10.1016/j.ygeno.2003.11.008 Ocampo Daza, D., Sundström, G., Bergqvist, C.A., Larhammar, D., 2012. The evolution of vertebrate somatostatin receptors and their gene regions involves extensive chromosomal rearrangements. BMC Evol. Biol. 12, 231. https://doi.org/10.1186/1471-2148-12-231 Ohno, S., 1970. Evolution by Gene Duplication. Allen and Unwin, London. Okano, T., Kojima, D., Fukada, Y., Shichida, Y., Yoshizawa, T., 1992. Primary structures of chicken cone visual pigments: vertebrate rhodopsins have evolved out of cone visual pigments. Proc. Natl. Acad. Sci. U.S.A. 89, 5932–5936. Olshevskaya, E.V., Peshenko, I.V., Savchenko, A.B., Dizhoor, A.M., 2012. Retinal guanylyl cyclase isozyme 1 is the preferential in vivo target for constitutively active GCAP1 mutants causing congenital degeneration of photoreceptors. J. Neurosci. 32, 7208–7217. https://doi.org/10.1523/JNEUROSCI.0976-12.2012 Osawa, S., Weiss, E.R., 2012. A tale of two kinases in rods and cones. Adv. Exp. Med. Biol. 723, 821–827. https://doi.org/10.1007/978-1-4614-0631-0_105 Perrault, I., Rozet, J.M., Calvas, P., Gerber, S., Camuzat, A., Dollfus, H., Châtelin, S., Souied, E., Ghazi, I., Leowski, C., Bonnemaison, M., Le Paslier, D., Frézal, J., Dufier, J.L., Pittler, S., Munnich, A., Kaplan, J., 1996. Retinal-specific guanylate cyclase gene mutations in

42 Leber’s congenital amaurosis. Nat. Genet. 14, 461–464. https://doi.org/10.1038/ng1296461 Peshenko, I.V., Dizhoor, A.M., 2006. Ca2+ and Mg2+ binding properties of GCAP-1. Evidence that Mg2+-bound form is the physiological activator of photoreceptor guanylyl cyclase. J. Biol. Chem. 281, 23830–23841. https://doi.org/10.1074/jbc.M600257200 Peshenko, I.V., Olshevskaya, E.V., Azadi, S., Molday, L.L., Molday, R.S., Dizhoor, A.M., 2011. Retinal degeneration 3 (RD3) protein inhibits catalytic activity of retinal membrane guanylyl cyclase (RetGC) and its stimulation by activating proteins. Biochemistry 50, 9511–9519. https://doi.org/10.1021/bi201342b Peshenko, I.V., Olshevskaya, E.V., Dizhoor, A.M., 2015. Evaluating the role of retinal membrane guanylyl cyclase 1 (RetGC1) domains in binding guanylyl cyclase-activating proteins (GCAPs). J. Biol. Chem. 290, 6913–6924. https://doi.org/10.1074/jbc.M114.629642 Peshenko, I.V., Olshevskaya, E.V., Lim, S., Ames, J.B., Dizhoor, A.M., 2014. Identification of target binding site in photoreceptor guanylyl cyclase-activating protein 1 (GCAP1). J. Biol. Chem. 289, 10140–10154. https://doi.org/10.1074/jbc.M113.540716 Poetsch, A., Molday, L.L., Molday, R.S., 2001. The cGMP-gated channel and related glutamic acid-rich proteins interact with peripherin-2 at the rim region of rod photoreceptor disc membranes. J. Biol. Chem. 276, 48009–48016. https://doi.org/10.1074/jbc.M108941200 Putnam, N.H., Butts, T., Ferrier, D.E.K., Furlong, R.F., Hellsten, U., Kawashima, T., RobinsonRechavi, M., Shoguchi, E., Terry, A., Yu, J.-K., Benito-Gutiérrez, E.L., Dubchak, I., Garcia-Fernàndez, J., Gibson-Brown, J.J., Grigoriev, I.V., Horton, A.C., de Jong, P.J., Jurka, J., Kapitonov, V.V., Kohara, Y., Kuroki, Y., Lindquist, E., Lucas, S., Osoegawa, K., Pennacchio, L.A., Salamov, A.A., Satou, Y., Sauka-Spengler, T., Schmutz, J., Shin-I, T., Toyoda, A., Bronner-Fraser, M., Fujiyama, A., Holland, L.Z., Holland, P.W.H., Satoh, N., Rokhsar, D.S., 2008. 
The amphioxus genome and the evolution of the chordate karyotype. Nature 453, 1064–1071. https://doi.org/10.1038/nature06967 Qureshi, B.M., Behrmann, E., Schöneberg, J., Loerke, J., Bürger, J., Mielke, T., Giesebrecht, J., Noé, F., Lamb, T.D., Hofmann, K.P., Spahn, C.M.T., Heck, M., 2018. It takes two transducins to activate the cGMP-phosphodiesterase 6 in retinal rods. Open Biol. 8. https://doi.org/10.1098/rsob.180075 Ramamurthy, V., Tucker, C., Wilkie, S.E., Daggett, V., Hunt, D.M., Hurley, J.B., 2001. Interactions within the coiled-coil domain of RetGC-1 guanylyl cyclase are optimized for regulation rather than for high affinity. J. Biol. Chem. 276, 26218–26229. https://doi.org/10.1074/jbc.M010495200 Rätscho, N., Scholten, A., Koch, K.-W., 2009. Expression profiles of three novel sensory guanylate cyclases and guanylate cyclase-activating proteins in the zebrafish retina. Biochim. Biophys. Acta 1793, 1110–1114. https://doi.org/10.1016/j.bbamcr.2008.12.021 Ratto, G.M., Payne, R., Owen, W.G., Tsien, R.Y., 1988. The concentration of cytosolic free calcium in vertebrate rod outer segments measured with fura-2. J. Neurosci. 8, 3240–3246. Rispoli, G., Sather, W.A., Detwiler, P.B., 1993. Visual transduction in dialysed detached rod outer segments from lizard retina. J. Physiol. 465, 513–537. https://doi.org/10.1113/jphysiol.1993.sp019691 Röll, B., 2000. Gecko vision - visual cells, evolution, and ecological constraints. J. Neurocytol. 29, 471–484. Sato, K., Yamashita, T., Kojima, K., Sakai, K., Matsutani, Y., Yanagawa, M., Yamano, Y., Wada, A., Iwabe, N., Ohuchi, H., Shichida, Y., 2018. Pinopsin evolved as the ancestral dim-light visual opsin in vertebrates. Commun. Biol. 1, 156. https://doi.org/10.1038/s42003-0180164-x Satoh, N., Rokhsar, D., Nishikawa, T., 2014. Chordate evolution and the three-phylum system. Proc. Biol. Sci. 281, 20141729. https://doi.org/10.1098/rspb.2014.1729

Schnetkamp, P.P., Basu, D.K., Szerencsei, R.T., 1989. Na+-Ca2+ exchange in bovine rod outer segments requires and transports K+. Am. J. Physiol. 257, C153–C157. https://doi.org/10.1152/ajpcell.1989.257.1.C153
Schnetkamp, P.P.M., 2013. The SLC24 gene family of Na+/Ca2+-K+ exchangers: from sight and smell to memory consolidation and skin pigmentation. Mol. Aspects Med. 34, 455–464. https://doi.org/10.1016/j.mam.2012.07.008
Schnetkamp, P.P.M., Jalloul, A.H., Liu, G., Szerencsei, R.T., 2014. The SLC24 family of K+-dependent Na+-Ca2+ exchangers: structure-function relationships. Curr. Top. Membr. 73, 263–287. https://doi.org/10.1016/B978-0-12-800223-0.00007-4
Scholten, A., Koch, K.-W., 2011. Differential calcium signaling by cone specific guanylate cyclase-activating proteins from the zebrafish retina. PLoS One 6, e23117. https://doi.org/10.1371/journal.pone.0023117
Schwarzer, A., Kim, T.S., Hagen, V., Molday, R.S., Bauer, P.J., 1997. The Na/Ca-K exchanger of rod photoreceptor exists as dimer in the plasma membrane. Biochemistry 36, 13667–13676. https://doi.org/10.1021/bi9710232
Sharon, D., Wimberg, H., Kinarty, Y., Koch, K.-W., 2018. Genotype-functional-phenotype correlations in photoreceptor guanylate cyclase (GC-E) encoded by GUCY2D. Prog. Retin. Eye Res. 63, 69–91. https://doi.org/10.1016/j.preteyeres.2017.10.003
Shimodaira, H., 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508. https://doi.org/10.1080/10635150290069913
Singh, P.P., Arora, J., Isambert, H., 2015. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes. PLoS Comput. Biol. 11, e1004394. https://doi.org/10.1371/journal.pcbi.1004394
Singh, P.P., Isambert, H., 2019. OHNOLOGS v2: a comprehensive resource for the genes retained from whole genome duplication in vertebrates. Nucleic Acids Res. https://doi.org/10.1093/nar/gkz909
Stephan, A.B., Tobochnik, S., Dibattista, M., Wall, C.M., Reisert, J., Zhao, H., 2011. The Na(+)/Ca(2+) exchanger NCKX4 governs termination and adaptation of the mammalian olfactory response. Nat. Neurosci. 15, 131–137. https://doi.org/10.1038/nn.2943
Strimmer, K., Rambaut, A., 2002. Inferring confidence sets of possibly misspecified gene trees. Proc. R. Soc. B 269, 137–142. https://doi.org/10.1098/rspb.2001.1862
Terakita, A., Kawano-Yamashita, E., Koyanagi, M., 2012. Evolution and diversity of opsins. WIREs Membr. Transp. Signal. 1, 104–111. https://doi.org/10.1002/wmts.6
Tostivint, H., Ocampo Daza, D., Bergqvist, C.A., Quan, F.B., Bougerol, M., Lihrmann, I., Larhammar, D., 2014. Molecular evolution of GPCRs: Somatostatin/urotensin II receptors. J. Mol. Endocrinol. 52, T61–T86. https://doi.org/10.1530/JME-13-0274
Tucker, C.L., Woodcock, S.C., Kelsell, R.E., Ramamurthy, V., Hunt, D.M., Hurley, J.B., 1999. Biochemical analysis of a dimerization domain mutation in RetGC-1 associated with dominant cone-rod dystrophy. Proc. Natl. Acad. Sci. U.S.A. 96, 9039–9044.
Vinberg, F., Chen, J., Kefalov, V.J., 2018. Regulation of calcium homeostasis in the outer segments of rod and cone photoreceptors. Prog. Retin. Eye Res. 67, 87–101. https://doi.org/10.1016/j.preteyeres.2018.06.001
Vinberg, F., Wang, T., De Maria, A., Zhao, H., Bassnett, S., Chen, J., Kefalov, V.J., 2017. The Na+/Ca2+, K+ exchanger NCKX4 is required for efficient cone-mediated vision. eLife 6. https://doi.org/10.7554/eLife.24550
Vopalensky, P., Pergner, J., Liegertova, M., Benito-Gutierrez, E., Arendt, D., Kozmik, Z., 2012. Molecular analysis of the amphioxus frontal eye unravels the evolutionary origin of the retina and pigment cells of the vertebrate eye. Proc. Natl. Acad. Sci. U.S.A. 109, 15383–15388. https://doi.org/10.1073/pnas.1207580109

Wada, Y., Sugiyama, J., Okano, T., Fukada, Y., 2006. GRK1 and GRK7: unique cellular distribution and widely different activities of opsin phosphorylation in the zebrafish rods and cones. J. Neurochem. 98, 824–837. https://doi.org/10.1111/j.1471-4159.2006.03920.x
Walls, G.L., 1942. The vertebrate eye and its adaptive radiation. Cranbrook Institute of Science, London, England.
Wang, X., Plachetzki, D.C., Cote, R.H., 2019. The N termini of the inhibitory γ-subunits of phosphodiesterase-6 (PDE6) from rod and cone photoreceptors differentially regulate transducin-mediated PDE6 activation. J. Biol. Chem. 294, 8351–8360. https://doi.org/10.1074/jbc.RA119.007520
Whelan, S., Goldman, N., 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699. https://doi.org/10.1093/oxfordjournals.molbev.a003851
Woodruff, M.L., Sampath, A.P., Matthews, H.R., Krasnoperova, N.V., Lem, J., Fain, G.L., 2002. Measurement of cytoplasmic calcium concentration in the rods of wild-type and transducin knock-out mice. J. Physiol. 542, 843–854. https://doi.org/10.1113/jphysiol.2001.013987
Yang, R.B., Garbers, D.L., 1997. Two eye guanylyl cyclases are expressed in the same photoreceptor cells and form homomers in preference to heteromers. J. Biol. Chem. 272, 13738–13742. https://doi.org/10.1074/jbc.272.21.13738
Yang, R.B., Robinson, S.W., Xiong, W.H., Yau, K.W., Birch, D.G., Garbers, D.L., 1999. Disruption of a retinal guanylyl cyclase gene leads to cone-specific dystrophy and paradoxical rod behavior. J. Neurosci. 19, 5889–5897.
Yau, K.W., Nakatani, K., 1984. Electrogenic Na-Ca exchange in retinal rod outer segment. Nature 311, 661–663.
Zhang, X., Wensel, T.G., Kraft, T.W., 2003. GTPase regulators and photoresponses in cones of the eastern chipmunk. J. Neurosci. 23, 1287–1297.
Zhang, X., Wensel, T.G., Yuan, C., 2006. Tokay gecko photoreceptors achieve rod-like physiology with cone-like proteins. Photochem. Photobiol. 82, 1452–1460. https://doi.org/10.1562/2006-01-05-RA-767
Zhang, Z., Artemyev, N.O., 2010. Determinants for phosphodiesterase 6 inhibition by its gamma-subunit. Biochemistry 49, 3862–3867. https://doi.org/10.1021/bi100354a
Zhao, X., Yokoyama, K., Whitten, M.E., Huang, J., Gelb, M.H., Palczewski, K., 1999. A novel form of rhodopsin kinase from chicken retina and pineal gland. FEBS Lett. 454, 115–121. https://doi.org/10.1016/s0014-5793(99)00764-4


Tables and Figures

Table 1. Phototransduction cascade proteins and genes

Protein name | Gene name in human | Cell | Has role in | Protein description
Rh1 (Rh) | RHO | Rod | Activation | Rhodopsin
LWS | OPN1LW, OPN1MW | L-cone, M-cone | Activation | Long-wave-sensitive opsin
SWS1 | OPN1SW | S-cone | Activation | Short-wave-sensitive opsin
SWS2 | – | Cone (or rod) | Activation | Blue-sensitive opsin
Rh2 | – | Cone | Activation | Green-sensitive opsin
Gαt1, Gαt2 | GNAT1, GNAT2 | Rod, Cone | Activation | Transducin α subunit
Gβ1, Gβ3 | GNB1, GNB3 | Rod, Cone | Activation | G-protein β subunit 1, 3
Gγt1, Gγt2 | GNGT1, GNGT2 | Rod, Cone | Activation | Transducin γ subunit
PDEα, PDEβ, PDEα’ | PDE6A, PDE6B, PDE6C | Rod, Rod, Cone | Activation | PDE6 catalytic subunit (dimer)
PDEγ, PDEγ’, PDEγ’’ | PDE6G, PDE6H, PDE6I | Rod, Cone, ? | Activation | PDE6 inhibitory subunit
CNGCα1, CNGCα3 | CNGA1, CNGA3 | Rod, Cone | Activation & Ca-feedback | Cyclic nucleotide-gated channel α
CNGCβ1, CNGCβ3 | CNGB1, CNGB3 | Rod, Cone | Activation & Ca-feedback | Cyclic nucleotide-gated channel β
GRK1A, GRK1B, GRK7 | GRK1, –, GRK7 | Rod, Cone, Cone | Recovery | G-protein receptor kinase
Arr-S, Arr-C | SAG, ARR3 | Rod, Cone | Recovery | Arrestin
RGS9 | RGS9 | Both | Recovery | Regulator of G-protein signalling 9
Gβ5 | GNB5 | Both | Recovery | G-protein β subunit 5
R9AP | RGS9BP | Both | Recovery | RGS9 anchor protein
GC-E, GC-F, GC-D | GUCY2D, GUCY2F, – | Both, Both, ? | Recovery & Ca-feedback | Guanylyl cyclase
GCAP1, GCAP1L, GCAP2, GCAP2L, GCAP3, GCIP | GUCA1A, –, GUCA1B, –, GUCA1C, – | Both, ?, Both, ?, Cone, Cone | Ca-feedback | Guanylyl cyclase activating protein
Rec, Visinin | RCVRN, – | Rod, Cone | Ca-feedback | Recoverin, Visinin
NCKX1, NCKX2 | SLC24A1, SLC24A2 | Rod, Cone | Ca-feedback | Na+/Ca2+,K+ ion exchanger

List of proteins with known functions in activation, recovery, and Ca-feedback regulation of the phototransduction cascade. Names of the human genes are given in column 2. As in Figure 1, where there is a clear distinction between expression in cones and rods, the genes are coloured red for cones and blue for rods, while black (and ‘Both’) indicates expression in both cones and rods. Question marks denote isoforms that are not expressed in mammalian rods and cones, and for which the class of cell that uses them is unclear. Numerous additional proteins have important functions in cones and rods other than phototransduction, but these are not listed here.


Figures

[Figure 1 image: schematic of the cone and rod phototransduction cascades, with boxes listing the human gene names for each component and a legend distinguishing rod, cone and common isoforms.]

Figure 1. Schematic representation of the phototransduction cascade in cones and rods. The proteins involved in phototransduction in vertebrate cones and rods are depicted schematically. The cytoplasmic surface of the lipid membrane is shown uppermost, and the illustrated arrangement is for the sac/plasma membrane of cone photoreceptors. In rods, the ion channel and exchanger are located in the plasma membrane, whereas all the other proteins are located in the disc membrane, which has become pinched off from the plasma membrane. Rh: rhodopsin, or a cone opsin. G: heterotrimeric G protein, transducin. PDE: tetrameric cGMP phosphodiesterase, PDE6. CNGC: tetrameric cyclic nucleotide-gated ion channel. NCKX: sodium/calcium-potassium exchanger. Rec: recoverin or visinin. GRK: G-protein receptor kinase. Arr: arrestin. RGS9: regulator of G-protein signalling. GC: guanylyl cyclase. GCAP: guanylyl cyclase activating protein. cG: cytoplasmic messenger, cGMP. Ca2+: cytoplasmic calcium. Boxes list the HGNC gene names of the human isoforms. Blue denotes isoforms expressed primarily in rods, and red denotes isoforms expressed primarily in cones. The lower line in the Rh box lists the names of the five opsin isoforms found across vertebrates other than mammals. © 2019 Chris Conway Lamb, with permission.

[Figure 2 image. A: divergence timeline (0–800 Mya) along the placental-mammal lineage, with clades Bilateria, Deuterostomes, Chordates, Vertebrates, Jawed, Bony, Tetrapods, Amniotes, Mammals and Placentals. B: branching pattern for lancelets, tunicates, hagfish and lampreys (agnathans), sharks and rays (cartilaginous fish), gar and teleosts (ray-finned fish), coelacanth, amphibians, reptiles*, birds (sauropsids), and marsupials and placentals (mammals), with the 1R, 2R and 3R genome duplications marked.]

Figure 2. Species phylogeny. A. The horizontal blue line shows divergences from our own lineage (of placental mammals) as a function of estimated times of divergence in millions of years ago (Mya). Those diverging lineages and their names are shown at an angle; the unnamed short lineage diverging around 400 Mya indicates coelacanth (and lungfish). Horizontal text below the line gives the name of each clade to the right of the adjacent dotted vertical line. B. A broadly accepted view of the branching pattern for those lineages from which molecular sequences are analysed in the phylogenies of Sections 3–7. The asterisk against Reptiles is a caution that this conventional terminology must be used with care, because reptiles (as traditionally defined, excluding birds) do not form a monophyletic group; for example, alligators and turtles may be related more closely to birds than they are to snakes, lizards, etc., so that alligator and turtle sequences tend to clade with those for birds. Currently, chordates are classified as a phylum, with lancelets, tunicates and vertebrates each as sub-phyla. However, there has been a proposal to reclassify chordates as a super-phylum, with lancelets, tunicates and vertebrates each as phyla (Satoh et al., 2014).

[Figure 3 image: ML tree of GNAT1 sequences from 29 vertebrate species (jawed vertebrates, hagfish and lampreys), with bootstrap values at the nodes and a scale bar of 0.1 substitutions per site.]

Figure 3. Example of molecular phylogeny (vertebrate rod transducins, GNAT1). Maximum likelihood (ML) molecular phylogeny calculated for Gαt1 (GNAT1) amino acid sequences from 29 vertebrate species. Bold font denotes sequences obtained from the eye transcriptome analysis of hagfish, lampreys and basal ray-finned fish by Lamb et al. (2016). NCBI accession numbers are listed as part of each sequence name. Numbers at each node represent percentage bootstrap support. Note that this sub-tree has been extracted from the full tree for GNAT and GNAI sequences that is shown collapsed in Figure 10, where further details are given. The horizontal scale is in units of amino acid substitutions per site. The inset shows the collapsed version of this GNAT1 sub-tree, exactly as it is presented in Figure 10; the entire expanded tree is shown in Supplementary Figure S1.
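By the usual definition, each bootstrap value in Figure 3 is the percentage of trees, built from resampled alignments, that recover the clade defined by that node. The sketch below shows only that final counting step, assuming each replicate tree has already been reduced to the set of clades (taxon sets) it contains; the taxa and replicate trees are hypothetical, and the ML tree inference itself is not shown.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]  # a clade is represented by the set of taxa it contains


def bootstrap_support(replicates: Iterable[Set[Clade]], clade: Clade) -> float:
    """Percentage of bootstrap replicate trees in which `clade` is recovered.

    Each replicate tree is supplied as the set of clades it contains.
    """
    trees = list(replicates)
    if not trees:
        raise ValueError("at least one replicate tree is required")
    hits = sum(1 for tree in trees if clade in tree)
    return 100.0 * hits / len(trees)


# Hypothetical replicate trees for four taxa (illustrative only).
cyclostomes = frozenset({"Hagfish", "Lamprey"})
jawed = frozenset({"Shark", "Human"})
replicates = [
    {cyclostomes, jawed},
    {cyclostomes, jawed},
    {cyclostomes, jawed},
    {frozenset({"Lamprey", "Shark"}), frozenset({"Lamprey", "Shark", "Human"})},
]
print(bootstrap_support(replicates, cyclostomes))  # 75.0
```

Treating clades as frozensets makes the comparison independent of how each replicate tree happens to order its taxa.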

[Figure 4 image: a hypothetical chromosomal region of genes A–M shown before duplication, after 1R (copies suffixed 1 and 2), and after 2R (copies suffixed 11, 12, 21 and 22), with local duplicate pairs G/H, J/K and L/M.]

Figure 4. Schematic with examples of the combined effects of local gene duplication, whole-genome duplication, and gene loss. Top row: A hypothetical region comprising 12 arbitrary genes along a chromosome of our chordate ancestor, prior to any genome duplications. The six genes at the right represent three separate pairs of genes in which a local (tandem) duplication had occurred previously. Thus, gene pair G and H, gene pair J and K, and gene pair L and M are each taken to have arisen from local duplication, as indicated by the three curved arrows. Middle two rows: Examples of possible genes remaining after the first round of whole-genome duplication (denoted by the downward arrow labelled 1R); three genes are shown as having been lost prior to the second round. Bottom four rows: Examples of possible genes remaining after the second round of genome duplication (denoted by the dashed and dotted arrows labelled 2R), but prior to the radiation of vertebrate species. The shading is simply a visual aid to identifying the relationships between genes that existed following the two rounds of duplication. The significance of the various examples of gene loss is described in the text.
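The labelling logic of Figure 4 can be replayed in a few lines. This is a minimal sketch, assuming (as the figure's labels suggest) that each round of whole-genome duplication copies every row and appends '1' or '2' to each gene label; the row names and the particular genes lost are hypothetical choices, not those drawn in the figure.

```python
from typing import Dict, List, Set

Rows = Dict[str, List[str]]  # row label -> ordered gene labels on that row


def wgd(rows: Rows) -> Rows:
    """One round of whole-genome duplication.

    Every row r is copied to r+'1' and r+'2', and each gene X on it becomes
    X+'1' or X+'2' respectively, so gene order along each row is preserved.
    """
    return {
        row + s: [gene + s for gene in genes]
        for row, genes in rows.items()
        for s in ("1", "2")
    }


def lose(rows: Rows, lost: Set[str]) -> Rows:
    """Delete the named gene copies (gene loss after a duplication)."""
    return {row: [g for g in genes if g not in lost] for row, genes in rows.items()}


def copies_of(rows: Rows, ancestor: str) -> int:
    """Number of surviving descendants of an ancestral gene label."""
    return sum(1 for genes in rows.values() for g in genes
               if g.rstrip("0123456789") == ancestor)


ancestor: Rows = {"R": list("ABCDEFGHJKLM")}          # 12 genes, as in the top row
after_1r = lose(wgd(ancestor), {"D1", "E2", "L1"})    # hypothetical losses before 2R
after_2r = lose(wgd(after_1r), {"A12", "F21"})        # hypothetical losses after 2R
print(copies_of(after_2r, "A"))  # 3
print(copies_of(after_2r, "D"))  # 2
print(sorted(after_2r))          # ['R11', 'R12', 'R21', 'R22']
```

A gene that lost one copy before 2R, like D here, can leave at most two descendants, so the number of surviving ohnologs constrains when losses could have occurred.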

[Figure 5 table image: chromosome numbers and start positions (Mb) of genes from 37 ohnolog families in spotted gar, chicken, opossum and human, arranged in four presumed ancestral rows (Ancestral 1–4).]

Figure 5. Synteny of a subset of phototransduction genes across four species. Locations of genes from 37 families of ohnologs, assigned to four presumed chromosomal rows in the proto-vertebrate ancestor following 2R WGD. Four species have been analysed, and the locations are taken from Ensembl 93, which used the following assemblies: spotted gar, LepOcu1; chicken, Gallus_gallus-5.0; opossum, monDom5; and human, GRCh38.p12. Each column pair gives the chromosome number and the start position of the gene in megabases (Mb). Eight genes involved in phototransduction are indicated in bold. In general, HGNC gene names are given, except where the gene is absent in human; however, to avoid confusion amongst the guanylyl cyclases, the IUPHAR/BPS names are given (see Section 6.2). The coloured shading identifies regions where all the genes lie on a common chromosome. Note that many of the genes in each column are in close proximity to one another; e.g. under ‘Ancestral 1’, 25 of the 26 spotted gar genes are within a span of 4.4 Mb, and all 23 opossum genes are within a span of 3.3 Mb. Grey shading denotes quartet isoforms examined in Figure 6 and Figure 7. Where a gene name is missing, that gene has been lost from all jawed vertebrates. In other cases where there is a missing entry for a gene position, that gene is not found in the assembly for that species; a question mark indicates that the gene is on an unplaced scaffold. Note that the first column pair for chicken is empty, indicating the absence of multiple genes from the assembly. However, this is unlikely to indicate the loss of these genes from the chicken genome, because there is evidence for mRNA transcripts; instead, the empty column is likely to reflect problems with the sequencing and assembly (Delbridge et al., 2009; Lovell et al., 2014).
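Statements such as “25 of the 26 spotted gar genes are within a span of 4.4 Mb” amount to finding the largest group of gene start positions that fits inside a window of a given width. A minimal two-pointer sketch follows, assuming start positions in Mb for genes on a single chromosome; the positions are hypothetical and are not taken from Figure 5.

```python
from typing import List


def max_genes_within(starts_mb: List[float], window_mb: float) -> int:
    """Largest number of genes whose start positions fit in any window of width `window_mb`.

    Positions are sorted; for each right-hand gene the left pointer is advanced
    until the genes between the two pointers span no more than the window.
    """
    pos = sorted(starts_mb)
    best, left = 0, 0
    for right, p in enumerate(pos):
        while p - pos[left] > window_mb:
            left += 1
        best = max(best, right - left + 1)
    return best


# Hypothetical start positions (Mb) for genes on one chromosome.
starts = [57.2, 58.0, 59.4, 61.6, 103.2]
print(max_genes_within(starts, 5.0))  # 4
print(max_genes_within(starts, 1.0))  # 2
```

Sorting once and sliding the window keeps the scan linear after the sort, which matters little for 26 genes but scales to whole-chromosome gene lists.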

[Figure 6 image: panels A–D showing the presumed placement of ohnolog families (including GNATs, GNBs, PDE6s, PDEγs, CNGAs, GRKs, arrestins, GCs, GCAPs, recoverin/visinin, NCKXs and opsins, together with reference quartets) on four ancestral rows, with the 1R/2R branching pattern inset in A.]

Figure 6. Overview of syntenic arrangement of vertebrate phototransduction genes. Presumed arrangement of phototransduction genes on four rows representing sections of the quadruplicated genome of an early vertebrate organism, after 2R WGD but before the vertebrate radiation. The panels depict locally paralogous regions, with panel B derived directly from Figure 5; each panel depicts a single such region, except for panel C, which depicts two. Phototransduction genes are shaded red (cone isoforms), blue (rod isoforms), or grey (common isoforms, isoforms whose cell class is undetermined, or isoforms used in photoreceptors other than rods and cones); non-phototransduction genes in these same families are shaded green. White is used for reference sets of ohnolog quartets, and the grey linkages between these genes show those pairs that have been established to be sisters by phylogenetic analysis of the kind illustrated in Figure 7. These pairings define the pairs of rows that diverged from each other at 1R, as indicated by the branching pattern inset in A. Thus, each panel has been arranged so that the upper pair of rows are sisters and the lower pair of rows are sisters. However, it is not certain that the row numbering is continuous between the five sections, because it has not yet been possible to link them together definitively. Nevertheless, the proposal is made here that the entire set of genes (together with numerous other non-phototransduction genes) may form a single paralogon. GNAT3 (shown near the bottom right) is not used in cone or rod phototransduction, but is used instead in reptilian parietal photoreceptors, as well as in some taste receptors.

[Figure 7 image: eight collapsed ML trees (panels A1, A2, B1, B2, C1, C2, D1, D2) for the GRIN2, SLC2A, GSG1, DLG, HCN, SPRY, PLXNA and GRM ohnolog quartets, with bootstrap values, a Ciona GRIN2 outgroup in A1, and scale bars of 0.2 substitutions per site.]

Figure 7. Molecular phylogenies for eight examples of ohnolog quartets Unconstrained ML molecular phylogenies of jawed vertebrate sequences, for eight ‘quartet’ families of ohnologs that retain all four members, and that are located in the vicinity of phototransduction genes; these are sample quartets chosen from amongst the 26 shown in Figure 6. Here each major clade is shown collapsed. The fully-expanded trees for all of these ohnolog quartet families are presented in Supplementary Figure S13. Those 26 phylogenies provide the basis for arranging the row pairs in Figure 6. The phylogeny in A1 used outgroup sequences, but in all other cases no outgroup was used and so the trees are unrooted.
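For an unrooted four-member tree, the only pairing information is which of the three possible splits (12|34, 13|24 or 14|23) it shows, and identifying one sister pair fixes the other. Aggregating that evidence across quartet families, as is done when arranging the rows of Figure 6 into 1R sister pairs, can be sketched as a simple tally. This assumes rows are already numbered 1 to 4 and that each family contributes one sister pair; the data are hypothetical, and a majority count is an illustrative choice, not necessarily how the rows in Figure 6 were assigned.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

ROWS = frozenset({1, 2, 3, 4})
Split = FrozenSet[FrozenSet[int]]  # two disjoint row pairs, e.g. {{1, 2}, {3, 4}}


def split_from_sisters(a: int, b: int) -> Split:
    """If rows a and b carry sister ohnologs, the other two rows are sisters too."""
    pair = frozenset({a, b})
    if len(pair) != 2 or not pair <= ROWS:
        raise ValueError("expected two distinct rows numbered 1-4")
    return frozenset({pair, ROWS - pair})


def label(split: Split) -> str:
    """Readable form of a split, such as '12|34'."""
    return "|".join(sorted("".join(str(r) for r in sorted(p)) for p in split))


def tally_1r_splits(sister_pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Count how many quartet families support each of the three possible splits."""
    return Counter(label(split_from_sisters(a, b)) for a, b in sister_pairs)


# Hypothetical sister pairs, one per quartet family (illustrative only).
observed = [(1, 2), (3, 4), (2, 1), (1, 2), (1, 3)]
counts = tally_1r_splits(observed)
print(counts.most_common())  # [('12|34', 4), ('13|24', 1)]
```

Reducing each family to a split, rather than to an ordered pair, means that observing rows 3 and 4 as sisters counts as the same evidence as observing rows 1 and 2.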

Figure 8. Evolution of G-protein alpha subunits, as proposed by Lokits et al. (2018). For the five primary families (Gαs, Gαq, Gαi, Gα12 and Gαv), the diagram shows the duplications that gave rise to the families and those that led to the expansion of each family. The prefix ‘pre’ denotes genes predating 2R WGD, and throughout the diagram ‘Gα’ has been omitted. The pairs of GNAI-GNAT genes in extant vertebrates (bottom right) arose by 2R WGD of a pair of genes (preI’ and preI’’) that resulted from the tandem duplication of a single preI gene. See text for further explanation. Reproduced from Fig. 4b of Lokits et al. (2018).

Figure 9. Postulated origin of the proto-vertebrate phototransduction cascade. A. Postulated ancestral phototransduction cascade in a deuterostome organism. This cascade utilised inhibition of adenylyl cyclase by Gαi, and therefore would have resembled an inhibitory version of the canonical cascade of olfactory transduction. The possible appearance of a molecule that could inhibit the PDE is indicated by [γ]. B. The transition is proposed to have occurred following tandem duplication of Gαi, creating a pair of isoforms that were both expressed in the cell. One of these isoforms mutated so as to permit its light-activated form (Gαi'') to interact with γ and thereby lessen γ's inhibition of the PDE. A guanylyl cyclase (GC) may also have been expressed, which would have allowed both cAMP and cGMP to function as cytoplasmic messengers. C. Once the new mechanism became more potent than the old one, expression of the original set of genes may have ceased. This would have left a proto-vertebrate organism (one that existed before 2R WGD) with a single phototransduction cascade of the vertebrate style.

Figure 10. G-protein alpha subunits (Gαt, Gαi). A. ML molecular phylogeny for vertebrate G-protein α subunits (GNATs and GNAIs), with an outgroup comprising a set of vertebrate GNAOs and GNAQ/11/14s together with invertebrate GNAIs. Protein substitution model, WAG+R4. A minor constraint has been applied to keep the three vertebrate GNAI clades together (i.e. to prevent other sequences from being placed within this set); that constraint caused only a small change in log likelihood, of ∆LogL ≈ 3.4, and the constrained tree passed all three tests of topology, with p-AU = 0.48. Note that two support values are marked with a ‘strike-through’, to indicate that they are artificially high. Blue shading is for isoforms expressed primarily in rods, and red for those expressed primarily in cones. Yellow 1R and cyan 2R denote the first and second rounds of WGD, respectively. The fully-expanded tree is shown in Supplementary Figure S1. B. Deduced pattern of gene duplications and losses; the tandem GNAI-GNAT genes in a chordate organism were quadruplicated. Although GNAI4 has been lost from jawed vertebrates, it is reported to have been retained in lampreys (Lokits et al., 2018). Row numbers have been arranged to correspond to those in Figure 6. Gene locations are given for human (in part because GNAT3 has been lost in spotted gar).
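The duplicate-and-lose bookkeeping described in panel B can be sketched as follows. The sketch assumes that each round of WGD copies every gene in a row, and that the copies are numbered so that rows {1, 2} and {3, 4} are the 2R sisters. It also assumes that both members of one pair were lost, and the row chosen for that loss is arbitrary; it does not reproduce the row numbering of the figure. Under these assumptions, two rounds turn a tandem GNAI-GNAT pair into eight genes, and losing one pair leaves the three GNAIs and three GNATs present in jawed vertebrates. The same function applied to a single gene with no losses gives the four-member pattern of Figure 11 (GNB1–4). The function names are hypothetical.

```python
"""Sketch of 1R/2R bookkeeping: duplicate a gene set twice, then remove losses.

Assumptions (illustrative, not from the source): every round copies every
gene; row r gives rise to rows 2r-1 and 2r, so rows {1, 2} and {3, 4} are
2R sisters; the lost row chosen in the example is hypothetical.
"""


def duplicate_rounds(genes, rounds=2):
    """Return {row: [genes]} after `rounds` whole-genome duplications."""
    rows = {1: list(genes)}
    for _ in range(rounds):
        doubled = {}
        for row, content in rows.items():
            # Each row splits into two daughter rows carrying the same genes.
            doubled[2 * row - 1] = list(content)
            doubled[2 * row] = list(content)
        rows = doubled
    return rows


def surviving(rows, lost):
    """Return sorted (row, gene) copies that are not in the `lost` set."""
    return sorted(
        (row, gene)
        for row, content in rows.items()
        for gene in content
        if (row, gene) not in lost
    )


if __name__ == "__main__":
    # Tandem GNAI-GNAT pair, quadruplicated by 1R and 2R.
    rows = duplicate_rounds(["GNAI", "GNAT"])
    # Hypothetical: assume both members of row 4 were lost.
    kept = surviving(rows, lost={(4, "GNAI"), (4, "GNAT")})
    print(len(kept))  # 6
    # A single GNB gene with no losses gives four paralogues.
    print(len(surviving(duplicate_rounds(["GNB"]), lost=set())))  # 4
```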

Figure 11. G-protein beta subunits (GNB1–4). A. Unconstrained ML molecular phylogeny for GNB1–4 sequences, using a set of protostome and invertebrate deuterostome GNBs as outgroup. Yellow 1R denotes the first round, and cyan 2R denotes the second round, of WGD. The fully-expanded tree is presented in Supplementary Figure S2. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). This is the simplest possible pattern, with no gene losses. Row numbers as in Figure 6.

Figure 12. PDE catalytic and inhibitory subunits (PDE6s, PDE6γs). A. Unconstrained ML molecular phylogeny for PDE6 catalytic sequences, using a set of vertebrate PDE5s and PDE11s as outgroup. Purple shading is for a clade of agnathan isoforms (PDE6X) that are positioned separately from cone and rod isoforms. Yellow 1R and cyan 2R denote the first and second rounds of WGD, respectively. The fully-expanded tree is presented in Supplementary Figure S3. B. Deduced pattern of gene duplications and losses (with gene locations listed for reedfish). C. Postulated pattern of gene duplications and losses for PDE6 inhibitory subunits; the postulated pattern for RGS9/11 is also shown, for reasons set out in the text. Row numbers as in Figure 6.

Figure 13. Cyclic nucleotide gated channel subunits (CNGCα, CNGCβ). A, B. Unconstrained molecular phylogeny for CNGC alpha and beta subunits, using human HCNs as outgroup. Protein substitution model, WAG+R4. The fully-expanded phylogeny is presented in Supplementary Figure S4. The highlighted 1R and 2R annotations indicate the duplications during 2R WGD. B. Deduced pattern of gene duplications and losses (with gene locations listed for spotted gar). Row numbers as in Figure 6.

Figure 14. G-protein receptor kinases (GRK1s, GRK7s). A. Unconstrained ML molecular phylogeny for visual GRKs (GRK1 and GRK7), using a small set of sequences from GRK2/3 and GRK4/5/5L/6 as outgroup. Jawed vertebrates have three isoforms of visual GRK: GRK7, GRK1A and GRK1B; agnathan vertebrates have a different combination of three visual isoforms: GRK7-1, GRK7-2 and GRK1B. Red shading is for isoforms expressed primarily in cones (or cone-like lamprey photoreceptors), and blue shading is for isoforms expressed primarily in rods (or agnathan rod-like photoreceptors). Yellow 1R and cyan 2R denote the first and second rounds of WGD, respectively. Protein substitution model, WAG+R4. The fully-expanded tree is presented in Supplementary Figure S5. B. Pattern of gene duplications and losses deduced using a combination of phylogeny and gene synteny. Gene locations are listed for spotted gar; row numbers as in Figure 6.

Figure 15. Arrestins (Arr-S, Arr-C, Arr-β1, Arr-β2). A. ML molecular phylogeny for vertebrate arrestins, using an outgroup comprising nine invertebrate arrestins. A minor constraint has been applied to prevent the β-arrestin and Arr-C clades from fragmenting; this caused a change in log likelihood of ∆LogL = 7.8, and the constrained tree passed all tests of topology, with p-AU = 0.31. The fully-expanded unconstrained phylogeny is shown in Supplementary Figure S6. B. Pattern of gene duplications deduced using a combination of phylogeny and gene synteny. Gene locations are for spotted gar; row numbers as in Figure 6.

Figure 16. Regulator of G-protein signalling (RGS9/11). A. ML molecular phylogeny for 49 RGS9/11 sequences from jawed vertebrates, plus 10 homologous sequences from agnathan vertebrates, together with five related sequences from invertebrates, and with an outgroup comprising 14 jawed vertebrate RGS6/7s. A minor constraint has been applied to move the root by one node (from the position of the dotted arrow); this caused a very small change in log likelihood, of ∆LogL = 2.3, and the constrained tree passed all three tests of topology, with p-AU = 0.42. The fully-expanded unconstrained tree is shown in Supplementary Figure S7. B. Corresponding pattern of gene duplications and losses (with extant jawed vertebrate gene locations given for human). Row numbers correspond to those in Figure 6.

Figure 17. Na+-K+/Ca2+ exchangers (NCKX). A. ML molecular phylogeny for 45 NCKX1/2 sequences from jawed vertebrates, plus six homologous sequences from agnathan vertebrates, together with five related sequences from invertebrates, and with an outgroup comprising 15 jawed vertebrate NCKX3/4/5s. Constraints on the positions of the agnathan sequences have been applied; this caused a relatively small change in log likelihood, of ∆LogL = 6.1, and the constrained tree passed all three tests of topology, with p-AU = 0.39. The fully-expanded unconstrained tree is shown in Supplementary Figure S8. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). Row numbers correspond to those in Figure 6.

Figure 18. Guanylyl cyclases (GC-D, GC-E, GC-F). A. Unconstrained ML molecular phylogeny for jawed vertebrate visual guanylyl cyclases, with the outgroup composed of invertebrate sequences. The fully-expanded phylogeny is shown in Supplementary Figure S9. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). From the chromosomal arrangement of genes and the pairings of nearby ohnolog quartet genes, it is clear that GC-E diverged from GC-D/GC-F at 1R. Row numbers correspond to those in Figure 6.

Figure 19. Guanylyl cyclase activating proteins (GCAP). A. Unconstrained ML molecular phylogeny for 115 GCAP/GCIP sequences from jawed vertebrates, with an outgroup comprising seven related invertebrate deuterostome sequences together with human HPCA, HPCAL1 and NCALD. The fully-expanded tree is shown in Supplementary Figure S10. B. Deduced pattern of gene duplications (with gene locations listed for spotted gar). Row numbers correspond to those in Figure 6.

Figure 20. Recoverin and visinin. A. ML molecular phylogeny for 19 recoverins and 18 visinins from jawed vertebrates, plus 8 homologous sequences from lampreys, with the same outgroup as used for the GCAPs. A minor constraint has been applied to move the root of the tree by one node from the position shown by the dotted arrow for the unconstrained tree; that constraint caused only a small change in log likelihood, of ∆LogL = 2.1, and the constrained tree passed all three tests of topology, with p-AU = 0.4. The fully-expanded tree is shown in Supplementary Figure S11. B. Deduced pattern of gene duplications and losses (with chromosomes listed for zebrafish). Row numbers correspond to those in Figure 6.

Figure 21. Vertebrate visual opsins. Unconstrained ML molecular phylogeny for 199 C-opsin sequences from deuterostomes, with an outgroup comprising 16 jawed vertebrate OPN5s. The fully-expanded tree is shown in Supplementary Figure S12. Dotted arrows indicate the approximate positions of the clades for TMT opsins and Ciona C-opsins that were obtained in other calculations; those sequences have been omitted from the illustrated phylogeny because their inclusion led to an alignment that appeared inferior and that gave lower levels of support. The asterisk marks a bootstrap support value that is discussed in the text. The deduced pattern of gene duplications is shown in the first section of Figure 22.

Figure 22. Scenario for gene duplications in the vertebrate phototransduction cascade. Deduced pattern and approximate timing of gene duplications for the multiple components of vertebrate phototransduction. The two main columns are continuous with each other. The four dotted vertical lines mark the following events. ‘Protostome’: the speciation divergence of protostomes from the deuterostome lineage; ‘Tunicate’: the speciation divergence of tunicates from the proto-vertebrate lineage; ‘1R’ and ‘2R’: the first and second rounds of whole-genome duplication (WGD). The horizontal axis is not to scale; very approximate timings for the four dotted vertical lines are ~750 Mya, ~650 Mya, and ~600 Mya for the pair of WGD events (see Figure 2A). For the opsins (top left), the colour coding provides an indication of spectral sensitivity. For all other components, the colour coding shown after 2R is as follows: red, cone isoforms; blue, rod isoforms; black, common isoforms, or those for which the distribution is uncertain; grey, not involved in phototransduction; green, used in phototransduction, but neither in cones nor rods. Squares (□) mark individual gene duplications; circles (○) mark whole-genome duplications. The upward and downward sloping arrows at 1R and 2R correspond to the branching pattern shown in the inset at the top left of Figure 6, and the numbers 1–4 correspond to the chromosome row numbers in that Figure. However, row numbers are not assigned for the CNGB genes, because the chromosomal regions in which they reside have not yet been linked to the arrangement shown in Figure 6.

Declaration of interests: None.