Neuropeptide evolution: Chelicerate neurohormone and neuropeptide genes may reflect one or more whole genome duplications


General and Comparative Endocrinology 229 (2016) 41–55


Jan A. Veenstra ⇑
INCIA UMR 5287 CNRS, Université de Bordeaux, Pessac, France

Article info

Article history: Received 12 October 2015; Revised 20 November 2015; Accepted 29 November 2015; Available online 27 February 2016

Keywords: Arthropods; CCRFamide; GPCR; Myriapod; House dust mite; Scorpion; Spider

Abstract

Four genomes and two transcriptomes from six Chelicerate species were analyzed for the presence of neuropeptide and neurohormone precursors and their GPCRs. The genome of the spider Stegodyphus mimosarum yielded 87 neuropeptide precursors and 120 neuropeptide GPCRs. Many neuropeptide transcripts were also found in the transcriptomes of three other spiders, Latrodectus hesperus, Parasteatoda tepidariorum and Acanthoscurria geniculata. For the scorpion Mesobuthus martensii the numbers are 79 and 93, respectively. The very small genome of the house dust mite, Dermatophagoides farinae, on the other hand, contains a much smaller number of such genes. A few new putative Arthropod neuropeptide genes were discovered. Both spiders and the scorpion have an achatin gene; spiders have two different genes encoding myosuppressin-like peptides, as well as two genes encoding novel GLamides. Another finding is the presence of trissin in spiders and scorpions, while neuropeptide genes that seem to be orthologs of Lottia LFRYamide and Platynereis CCRFamide were also found. Such genes were also found in various insect species, but seem to be lacking from the Holometabola. The Chelicerate neuropeptide and neuropeptide GPCR genes often have paralogs. As the large majority of these are probably not due to local gene duplications, it is plausible that they reflect the effects of one or more ancient whole genome duplications.

© 2016 Elsevier Inc. All rights reserved.

⇑ Address: INCIA UMR 5287 CNRS, Université de Bordeaux, allée Geoffroy St Hillaire, CS 50023, 33 615 Pessac Cedex, France. E-mail address: [email protected]
http://dx.doi.org/10.1016/j.ygcen.2015.11.019
0016-6480/© 2016 Elsevier Inc. All rights reserved.

1. Introduction

Neuropeptides and neurohormones regulate and/or modulate many biological processes, from basal physiological functions such as carbohydrate metabolism and water balance to cognitive functions. They may have been among the first chemical messengers used by the nervous and endocrine systems to establish communication between different cells of a single organism. In the last two decades a large number of genomes have been sequenced, which in theory allows all the neuropeptide genes they contain to be identified. In practice this is often more difficult, as neuropeptide sequences are short and relatively variable; they are usually recognized only when they encode either homologs of previously identified neuropeptides or a number of very similar peptide sequences separated by putative convertase cleavage sites. Nevertheless, it seems likely that most neuropeptide genes can be identified in a given sequenced Arthropod genome. The large majority of neuropeptides act through G-protein coupled receptors (GPCRs), which are readily identified from genome sequences by their seven well-conserved transmembrane regions. Thus, by simultaneously analyzing the neuropeptides and their putative receptors encoded in a genome, it is possible to get a fairly complete view of the neuropeptidome of a species. Using such methods it was previously shown that the neuropeptidomes of insects, annelids and mollusks are remarkably similar and share a large number of neuropeptide genes (Veenstra, 2010, 2011; Stewart et al., 2014). As expected, the neuropeptides and GPCRs identified from the spider mite Tetranychus urticae, the first Chelicerate for which a complete genome was sequenced (Grbić et al., 2011), are most similar to those of insects (Veenstra et al., 2012). However, some of its neuropeptide genes were previously known only from mollusks. One of these, elevenin, has subsequently also been found in insects (Tanaka et al., 2014; Veenstra, 2014). Since then, draft genomes have been published for four other Chelicerates: the African social velvet spider, Stegodyphus mimosarum, and the Brazilian white-knee tarantula, Acanthoscurria geniculata (Sanggaard et al., 2014), the scorpion Mesobuthus martensii (Cao et al., 2013), and the house dust mite Dermatophagoides farinae (Chan et al., 2015). At the same time, spider transcriptomes have become available for Latrodectus hesperus and Parasteatoda tepidariorum (Clarke et al., 2014; Posnien et al., 2014). It thus seemed interesting to take
another look at Chelicerate neuropeptides to complete the picture of Arthropod neuropeptides and their evolution. Christie (2015) has already published a number of spider neuropeptide precursors based on the Latrodectus transcriptome, but, as shown here, the actual number of neuropeptide precursors in this transcriptome is considerably larger, and analyzing whole genomes reveals additional neuropeptide genes that are not represented in the assembled transcriptomes. While this work was in progress, the genome of the Myriapod Strigamia maritima was also published (Chipman et al., 2014). I included this genome in the analysis, as the phylogenetic position of the Myriapods falls between insects and Chelicerates (Fig. 1). I also wanted Protostomian outgroups in the phylogenetic analysis of the GPCRs, expecting that more recently shared ancestry might lead to better resolved phylogenetic trees. I therefore also predicted the neuropeptide GPCRs from the genome of the mollusk Lottia gigantea (Simakov et al., 2013), a species from which a large number of neuropeptide genes have previously been described (Veenstra, 2010; Roch et al., 2011; Mirabeau and Joly, 2013). More recently, a large number of GPCRs from the annelid Platynereis dumerilii were published, some of which were deorphanized (Bauknecht and Jékely, 2015). Given the interest of this data set, together with the previously published list of putative neuropeptide precursors from this species (Conzelmann et al., 2013), those GPCR sequences were also included in the analysis. Apart from the discovery of a few novel putative Arthropod neuropeptide genes, the more interesting finding is that many spider and scorpion genes encoding neuropeptides and their receptors have been duplicated, perhaps as the result of one or more whole genome duplications.

2. Materials and methods

Local BLAST (Altschul et al., 1997; Camacho et al., 2009) was used to analyze the published Latrodectus transcriptome, the Stegodyphus, Mesobuthus, Strigamia and Dermatophagoides genomes (all obtained from NCBI), and the Parasteatoda transcriptome (downloaded from http://asgard.rc.fas.harvard.edu/download.html). The Acanthoscurria genome was analyzed directly at NCBI using the web interface, and contigs that might contain neuropeptide genes or parts thereof were downloaded for further analysis. The peptide sequences of known arthropod and molluscan neuropeptides and their G-protein coupled receptors were used as queries in the BLAST searches. These neuropeptide and GPCR sequences came from the following species: T. urticae (Veenstra et al., 2012), Locusta migratoria, Zootermopsis nevadensis (Veenstra, 2014) and L. gigantea (Veenstra, 2010). It should be noted that such homology searches are often of limited use for determining the correct C- and N-terminal parts of protein sequences when there are no, or only very incomplete, transcriptome data available. Gene models were constructed using Artemis (Rutherford et al., 2000) as described previously (Veenstra, 2014). The assembled Mesobuthus transcriptome was downloaded from http://lifecenter.sgst.cn/main/en/scorpion.jsp. Unfortunately, the original RNAseq data from Mesobuthus are not in the public databases and could thus not be used. Sequence Read Archive (SRA) files containing RNAseq data from Stegodyphus, Strigamia and Dermatophagoides were decompressed with the SRA toolkit (http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software) and used to make BLAST databases. These were then searched for reads matching the predicted transcripts, and the reads retrieved were used as input for the Trinity program (Haas et al., 2013) in order to check, correct and/or improve the various gene models. This procedure was very effective for Strigamia, not only for improving the predicted neuropeptide GPCR gene models, but also for completing and correcting several previously predicted neuropeptide precursors (Chipman et al., 2014). It furthermore identified the Strigamia allatostatin C receptor, the gene for which is lacking from the genome assembly (Chipman et al., 2014). In Stegodyphus and Dermatophagoides, on the other hand, it led to just a few corrections, as the RNAseq data are less extensive. However, the various transcriptomes from three Stegodyphus species (Mattila et al., 2012) were helpful in completing predicted neuropeptide precursors from the Stegodyphus genome. Differences in the quality of the genome assemblies analyzed here are explained by variables such as the presence or absence of homozygosity, sufficient coverage, the existence of good transcriptome data and the length of sequence reads (Richards and Murali, 2015).
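The local BLAST step described above ends with a list of contigs that may carry neuropeptide or GPCR exons. The following is a minimal Python sketch of that bookkeeping, not the author's pipeline. It assumes the searches (for example tblastn with protein queries against a nucleotide database built with makeblastdb) were run with BLAST+ tabular output (`-outfmt 6`), whose twelve default columns are listed in `COLUMNS`. The function name `candidate_contigs` and the e-value cutoff are illustrative assumptions.

```python
import csv
from collections import defaultdict
from io import StringIO

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def candidate_contigs(tabular_text, max_evalue=1e-5):
    """Map each subject contig to the set of queries hitting it at or below max_evalue.

    max_evalue is an illustrative threshold, not a value taken from the paper.
    """
    hits = defaultdict(set)
    for row in csv.reader(StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["evalue"]) <= max_evalue:
            hits[rec["sseqid"]].add(rec["qseqid"])
    return dict(hits)

if __name__ == "__main__":
    # Hypothetical hits of two query proteins against genome contigs.
    demo = (
        "AstA_Tu\tcontig_12\t45.2\t60\t30\t1\t5\t64\t1001\t1180\t2e-08\t55.1\n"
        "AstA_Tu\tcontig_99\t30.0\t40\t28\t0\t10\t49\t500\t619\t0.3\t28.0\n"
        "CCAP_Zn\tcontig_12\t80.0\t9\t2\t0\t1\t9\t3000\t3026\t1e-06\t40.0\n"
    )
    print(candidate_contigs(demo))
```

Grouping hits per contig rather than per query reflects the situation described in the text: several queries (or several exons of one GPCR) often land on the same short contig, and it is the contig that is then examined in Artemis.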
When novel neuropeptide genes were found, their presence in other species was studied by using BLAST on the various protostomian genomes, transcriptome assemblies and EST collections in the NCBI databases. In the four Chelicerate genome assemblies, scaffolds are either absent or relatively small, and thus the positions of the various genes relative to one another are known in only a few cases. This also makes it impossible to predict complete protein sequences when different exons of the same gene are located on different contigs. This is particularly a problem for the prediction of GPCRs, where introns of more than 100,000 base pairs are not rare. In those cases where different exons were sufficiently similar to orthologs from other Arthropods, and when the genome appeared to encode a single copy of a particular receptor, I have taken the liberty of joining those contigs or scaffolds into a single pseudo-scaffold for reconstructing the GPCR. The same was done when there were two copies of a gene, one present in a single continuous DNA sequence and the other in various pieces. However, numerous Chelicerate GPCRs exist as three or four copies, and in some cases all the copies are spread over multiple contigs or scaffolds, making it impossible to deduce the structure of the complete GPCRs, although it is evident that several copies of the receptor are encoded by the genome. Phylogeny was used to assign putative identities to the receptors based on similarities across different species. Phylogenetic trees were made of the deduced neuropeptide GPCR protein sequences, to which I added those of some better known insect species, i.e. Drosophila melanogaster, Apis mellifera, Tribolium castaneum (Hauser et al., 2006a,b, 2008), Bombyx mori (Yamanaka et al., 2008), Nilaparvata lugens (Tanaka et al., 2014) and Z. nevadensis (Veenstra, 2014), as well as those of the spider mite (Veenstra et al., 2012). At a later stage, predicted GPCRs from the crayfish Procambarus clarkii (Veenstra, 2015) and the annelid P. dumerilii (Bauknecht and Jékely, 2015) were also added. I only used the concatenated transmembrane regions of the GPCRs for making these trees. The predicted transmembrane topology of putative neuropeptide GPCRs was checked with TMHMM (Krogh et al., 2001) at http://www.cbs.dtu.dk/services/TMHMM/. Only GPCRs for which at least four transmembrane regions were identified were included in the phylogenetic trees. Other deorphanized protostomian GPCRs, as well as a few deuterostomian ones, were also included in the analysis. Initial sequence alignments were obtained using Clustal Omega (Sievers et al., 2011) and were then inspected and manually corrected using Seaview (Gouy et al., 2010). The latter program was also used to select the conserved protein regions used as input for PhyML (Guindon and Gascuel, 2003) and FastTree 2 (Price et al., 2010) to construct the trees. For the large trees I used only the transmembrane regions, but for smaller trees consisting of a group of orthologous GPCRs, conserved regions outside the transmembrane regions were also included.
As noted previously (Price et al., 2010; Veenstra et al., 2012), FastTree is very fast and was used exclusively for the large trees (i.e. Suppl. Figs. 1 and 2), while PhyML was used for the eclosion hormone tree (Suppl. Fig. 4). Prediction of the proteolytic processing of putative neuropeptide precursors at convertase cleavage sites was guided by the rules formulated for insects (Veenstra, 2000). Signal peptides were predicted with SignalP 4.0 (Petersen et al., 2011); in those cases where a likely neuropeptide precursor seemed to lack a signal peptide according to SignalP 4.0, I also used SignalP 3.0 (Bendtsen et al., 2004), which was also used to predict the signal anchor in the second CCRFamide precursor from P. dumerilii.

Fig. 1. Simplified phylogenetic tree of the species analyzed in this paper as well as a few others. Note that the Chelicerate phylogeny has not been unambiguously resolved (Sharma et al., 2014). MYA, million years ago.
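The inclusion criterion used in the methods for the phylogenetic trees (at least four predicted transmembrane regions) amounts to a simple filter over the TMHMM results. The sketch below assumes TMHMM's one-line-per-protein "short" output, in which a `PredHel=` field gives the number of predicted helices. The function name and sequence IDs are hypothetical, and this is an illustration rather than the procedure actually used in the paper.

```python
import re

# Matches the "PredHel=<n>" field of TMHMM short-format output (assumed format).
PREDHEL = re.compile(r"PredHel=(\d+)")

def gpcrs_with_min_helices(tmhmm_short_output, min_helices=4):
    """Return IDs of sequences with at least `min_helices` predicted TM helices."""
    kept = []
    for line in tmhmm_short_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        seq_id = line.split()[0]  # first whitespace-separated field is the sequence ID
        match = PREDHEL.search(line)
        if match and int(match.group(1)) >= min_helices:
            kept.append(seq_id)
    return kept

if __name__ == "__main__":
    demo = ("Sm_GPCR_01\tlen=410\tExpAA=160.3\tFirst60=21.0\tPredHel=7\tTopology=o35-57i\n"
            "Sm_GPCR_02\tlen=150\tExpAA=44.1\tFirst60=0.2\tPredHel=2\tTopology=i60-82o\n")
    print(gpcrs_with_min_helices(demo))
```

A threshold of four rather than seven keeps fragmentary receptors (for example, those reconstructed from only some exons) while still excluding sequences that are unlikely to be GPCRs at all.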

3. Results and discussion

The predicted neuropeptide and neurohormone precursors and their putative processing into active peptides are listed in Supplementary Tables 1–7. A comparison of neuropeptide genes found in various Arthropod genomes is presented in Table 1. Of the two spider genomes analyzed, the one from the tarantula A. geniculata is considerably larger, has a 30 times higher heterozygosity and a lower coverage than that of the velvet spider, S. mimosarum. It is, therefore, not surprising that the quality of the latter is far superior. The tarantula genome is very fragmented, and complete precursors were identified for only a limited number of neuropeptides. However, the tarantula belongs to the Mygalomorphae, while the other spider species treated here all belong to the Araneomorphae, and it thus seemed of interest to include these data. Apart from these two genomes, I also analyzed the transcriptomes of two other spider species, i.e. L. hesperus and P. tepidariorum. A total of 87 neuropeptide precursors were found encoded by the Stegodyphus genome (Suppl. Table 1). Using transcriptome data, Christie (2015) has already published 38 of the Latrodectus neuropeptide precursors; using the same data I found an additional 29 Latrodectus neuropeptide precursors (Supplementary Table 2).

The predicted neuropeptidome (79 neuropeptide precursors) of the scorpion Mesobuthus (Supplementary Table 5) is quite similar to those of the spiders, although genes coding corazonin and sulfakinin and their receptors are absent from the latter genome. On the other hand, genes coding vasopressin and its receptor, lacking in spiders, are present in the scorpion. The house dust mite genome contains many small gaps, some of which appear artificial and are likely due to allelic polymorphisms in the genome sequence, as indicated by several Trinity-predicted transcript sequences that overlap such gaps. Compared to the spider mite there are some interesting differences. Thus, whereas the spider mite does not have a sulfakinin gene, this gene and its receptor are present in the house dust mite. It is difficult to ascertain to what extent the absence of some neuropeptide genes and GPCRs is due to the fragmentary nature of the assembly. Another interesting difference from the spider mite concerns two GPCRs (FMRFamide #2 and orphan 9 in Suppl. Fig. 1) that show very large expansions in Dermatophagoides but are present in a single copy in, or even absent from, the Tetranychus genome. For the Myriapod Strigamia, I added another 9 neuropeptide precursors to those previously identified in this species (Chipman et al., 2014) and used the RNAseq data to correct several more (Supplementary Table 7). The predicted GPCRs from the various species are listed in a supplementary fasta file, and phylogenetic trees are presented in the supplementary data (Suppl. Figs. 1 and 2). In the spider and scorpion genomes and transcriptomes, many genes and/or transcripts coding neuropeptides, neurohormones and their receptors are present in duplicate, and several receptors are represented by four paralogs in the spider or scorpion genomes.
Both the nucleotide and predicted protein sequences are sufficiently divergent to make it highly unlikely that these differences reflect allelic variation (Suppl. Data). However, the combination of multiple paralogs, the fragmented nature of the genomes and the often large intron sizes of GPCR genes often made it impossible to reconstruct complete GPCRs. For example, from the data in Suppl. Fig. 1 it might appear that there are no tachykinin or natalisin GPCRs in the Mesobuthus genome, whereas such receptors are in fact present; because their exons are scattered over multiple contigs, however, it is impossible to predict the encoded proteins. For a large number of deduced neuropeptide precursors and GPCRs the sequence data do not provide much new information, and the following description will thus focus on those that show interesting differences from other arthropods.

3.1. Achatin

I found spider and scorpion orthologs of the molluscan achatin gene; their precursors encode 6–12 tetrapeptides, mainly GFGE, but also some GFGD and, in Acanthoscurria, SFGD. Although I did not find this gene in the Myriapod Strigamia genome, it is present in the transcriptome of another Myriapod, Symphylella vulgaris, and a similar gene is present in ticks (Fig. 2). It may well be that the Phe residue in the Arthropod achatins is also a D-amino acid, as is the case in the mollusk Achatina fulica, from which this peptide was first identified (Kamatani et al., 1989), and in Platynereis, from which an achatin GPCR was recently deorphanized (Bauknecht and Jékely, 2015). Two orthologous receptors that are most likely achatin receptors are present in the Lottia genome, and one such receptor was found in each of the Mesobuthus and Stegodyphus genomes (Suppl. Fig. 1).

3.2. Allatostatin A and other N-terminally extended GLamides

The Stegodyphus genome contains a gene coding peptides similar to insect allatostatin A that encodes at least 15 paracopies
that have the C-terminal consensus sequence RFAFGLamide. In addition, there are two other genes that encode N-terminally extended GLamides (Fig. 3). One encodes 18 neuropeptides with the C-terminal consensus sequence YDPALKYMGLamide, as well as a single copy of DRDNRPKYNPGWIFIGLamide. The third gene encodes two different GLamides that seem similar to both DRDNRPKYNPGWIFIGLamide and YDPALKYMGLamide. I named this gene PGW3XGLamide after the amino acid residues that seem to be best conserved in the peptides it encodes. The same three genes are also represented in the transcriptomes of Parasteatoda and Latrodectus. The apparent absence of the YDPALKYMGLamide gene from Acanthoscurria could be due to the fragmentary nature of this genome. In the scorpion Mesobuthus the YDPALKYMGLamide gene could not be found either, whereas two genomic fragments from this species that encode peptides similar to DRDNRPKYNPGWIFIGLamide might represent either one or two genes. The three different genes encoding GLamide neuropeptides could have their origin in duplications of the allatostatin A gene or, alternatively, they may share a C-terminal GLamide sequence by chance, not unlike the deuterostomian neuropeptides containing a C-terminal RFamide sequence (Elphick and Mirabeau, 2014). If these neuropeptide genes are close evolutionary relatives, then their GPCRs might similarly be closely related, and the peptides encoded by the GLamide genes might act through the two allatostatin A GPCR orthologs that are present in the Stegodyphus and Mesobuthus genomes (Suppl. Fig. 1).

Table 1
Comparison of the presence of neuropeptide and neurohormone genes and their GPCRs in representative Arthropod genomes.

| Gene | Stegodyphus | Mesobuthus | Tetranychus | Strigamia | Procambarus | Zootermopsis | Drosophila |
|---|---|---|---|---|---|---|---|
| Bursicon-A/B | 1/1 (2) | 2/2 (2) | 1/2 (1) | 1/1 (1) | 1/1 (1) | 1/1 (1) | 1/1 (1) |
| CHH/ITP/MIH* | 8 (1) | 9 (1) | 5 (2) | 1 (1) | 12+3 (2) | 12 (1) | 12 (1) |
| EH | 4 (nd) | 3 (nd) | 5 (nd) | 3 (nd) | 2 (nd) | 2 (nd) | 1 (nd) |
| GPA2/GPB5 | 2/2 (2) | 2/2 (1) | 1/1 (1) | abs | 1/1 (1) | 1/1 (2) | 1/1 (1) |
| Insulin | 12 (nd) | 9 (nd) | 2 (nd) | 3 (nd) | 1 (1) | 5 (nd) | 6 (nd) |
| Neuroparsin | nf (nd) | nf (nd) | nf (nd) | nf (nd) | 3 (2) | 1 (nd) | abs |
| Dilp7-like | 1 (2) | 2 (2) | 1 (1) | abs | 1 (1) | 1 (1) | 1 (1) |
| Dilp8-like | nf (0) | nf (2) | nf (1) | nf (1) | nf (1) | nf (1) | 1 (1) |
| Achatin | 1 (1) | 1 (1) | nf | nf | nf | abs | abs |
| ACP/GnRH-like | 1 (3) | 1 (1) | 3 (5) | 1 (1) | 1 (1) | 1 (1) | abs |
| AKH/RCPH | abs | abs | abs | 1 (1) | 1 (1) | 1 (1) | 1 (1) |
| Allatostatin-A | 1 (2) | 1 (2) | 1 (1) | 1 (1) | 1 (1) | 1 (2) | 1 (2) |
| Allatostatin-B (=mip) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) |
| Allatostatin-C/CC | 4/2 (2) | 2/1 (2) | 1/2 (1) | 1/1 (1) | 1/2 (2) | 1/1 (1) | 1/1 (2) |
| Allatotropin | 1 (3) | 1 (2) | 1 (1) | 1 (1) | abs | 1 (1) | abs |
| Calcitonin | 3 (5) | 2 (8) | 1 (1+1) | 2 (1) | 1 (nf) | 12 (2) | abs |
| CCHamide | 2 (2) | 1 (3) | 2 (1+2) | 2 (1) | 2 (3) | 2 (2) | 2 (2) |
| CCAP | 1 (1) | 2 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) |
| CCRFamide | 2 (?) | 1 (?) | 1 (?) | 1 (?) | 1 (?) | 1 (?) | nf (?) |
| CNMamide | abs | abs | abs | 1 (1) | 1 (1) | 1 (2) | 1 (1) |
| Corazonin | 1 (4) | abs | abs | 1 (2) | 1 (1) | 1 (1) | 1 (1) |
| CRF-like DH | 2 (6) | 1 (2) | 1 (2) | 1 (2) | 1 (1) | 1 (1) | 1 (2) |
| DENamide | nf (?) | nf (?) | nf (?) | nf (?) | nf (?) | nf (?) | nf (?) |
| DH31 | 2 (1) | 2 (5) | 1 (1) | 1 (2) | 1 (1) | 1 (2) | 1 (2) |
| EFL(G)amide | 12+1 (2) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | nf (?) | nf (?) |
| Elevenin | 2 (1) | 1 (1) | 2 (1) | 1 (1) | 1 (2) | 1 (1) | abs |
| ETH | 2 (1) | 2 (4) | 1 (1) | 1 (2) | nf (1) | 1 (12) | 1 (12) |
| FMRFamide | 1 (2) | 1 (1) | 1 (1) | 2 (1) | 1 (1) | 1 (1) | 1 (1) |
| KYMGLamide | 1 (?) | nf (?) | nf (?) | nf (?) | nf (?) | nf (?) | nf (?) |
| Leucokinin | 1 (3) | 1 (1) | 1 (1) | abs | 1 (1) | 1 (1) | 1 (1) |
| Myosuppressin | 2 (4) | 1 (1) | 1 (2) | 1 (2) | 1 (2) | 1 (2) | 1 (2+1) |
| Natalisin | 1 (3**) | 1 (3**) | 1 (1+2) | abs | 1 (2) | 1 (1) | 1 (1) |
| NPF | 2 (11) | 3 (4) | 4 (2+3) | 1 (1) | 2 (2) | 12+1 (2) | 1 (1) |
| NPLP1 | 1 (?) | 1 (?) | nf | 1 (?) | 1 (?) | 1 (?) | 1 (?) |
| Orcokinin | 2 (?) | 2 (?) | 1 (?) | 12 (?) | 1 (?) | 12 (?) | 12 (?) |
| PDF | 1 (6) | 1 (2) | nf (0***) | nf (1) | 3 (3) | 1 (1) | 1 (1) |
| Periviscerokinin (capa) | 1 (3) | 1 (2) | 1 (1) | nf (1) | 1 (1) | 1 (2) | 1 (1) |
| PGW3XGLamide1 | 2 (?) | 2 (?) | nf (?) | nf (?) | nf (?) | nf (?) | nf (?) |
| Proctolin | 2 (2) | 1 (1) | 1 (1) | 1 (1) | 1 (2) | 1 (1) | 1 (1) |
| Pyrokinin/Pyrokinin-like | 2 (4) | 1 (4) | nf (2) | 1 (2) | 1 (1) | 2 (3) | 1 (3) |
| RYamide | 1 (1+12) | 2 (1+12) | 2 (1) | 1 (12) | 1 (1) | 1 (1) | 1 (1) |
| SIFamide/SMYamide | 1 (2) | 2 (2) | 1 (1) | 1 (2) | 1 (2) | 2 (2) | 1 (1) |
| sNPF | 2 (4) | 2 (3) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) |
| Sulfakinin | 2 (3) | 1 (1) | abs | 1 (3) | 1 (1) | 1 (1) | 1 (2) |
| Tachykinin | 1 (2) | 1 (nf) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | 1 (1) |
| Trissin | 1 (1) | 2 (1) | nf (1) | abs | 1 (1) | 1 (1) | 1 (1) |
| Vasopressin | abs | 1 (2) | 1 (1) | 1 (2) | 1 (2) | 1 (1) | abs |

Numbers give the number of paralogs of each neuropeptide gene found in the various Arthropod genomes; the number of their putative GPCRs is given in parentheses. When two proteins are needed to produce a single hormone, as is the case for bursicon and GPA2/GPB5, a slash separates the numbers of the two ligand-producing genes. The same was done for allatostatins C and CC, which are known to act on the same receptor (Audsley et al., 2013). Note that the receptors for calcitonin and the dilp7-like hormones were identified solely by comparing their presence in various Arthropod genomes and the similarity of their ligands to those of the vertebrate orthologs of these GPCRs; the identification of these receptors is, therefore, speculative. Eclosion hormone activates a membrane-bound guanylate cyclase, the insulin-related peptides act on tyrosine kinase receptors and the neuroparsins on Venus kinase receptors (Chang et al., 2009; Nässel et al., 2015; Vogel et al., 2015). None of those receptors were investigated here. Abbreviations: abs, likely absent (neither the neuropeptide nor its specific GPCR genes were found in the genome); nd, not determined; nf, not found; ?, no receptor has so far been identified for this type of ligand; 12, a single gene generating two different splice variants, each expected to yield a functional receptor.
* Three different types of ITP receptors have been identified in Bombyx mori (Nagai et al., 2014). One of these has no obvious orthologs in other Arthropods; the second type is an ortholog of the Drosophila tachykinin GPCR, and orthologous GPCRs have been listed under Tachykinin; in this row only the third type is indicated.
** Some of these GPCRs are absent from Suppl. Fig. 1 and the fasta file.
*** In a previous publication a putative PDF receptor was identified (Veenstra et al., 2012), but a more recent analysis (Veenstra, 2015) suggests that this is perhaps not a PDF receptor.

Fig. 2. Predicted prepro-achatin from the spider S. mimosarum, the scorpion M. martensii, the garden centipede S. vulgaris, the tick I. scapularis and the owl limpet L. gigantea. Note that the structures of the four Arthropod neuropeptide precursors are very similar to the molluscan one. Yellow indicates the signal peptide; red, convertase cleavage sites and amino acid residues that are predicted to be removed by carboxypeptidase. Cysteine residues are in orange, and light blue indicates the backbones of the mature peptides.

Fig. 3. Predicted precursors of three Stegodyphus GLamides: allatostatin A, YDPALKYMGLamide and PGW3XGLamide. Purple indicates Gly residues predicted to be transformed into C-terminal amides of the mature peptides, and highlighting in black indicates the signature subsequences of the three different types of peptides; other colors as in Fig. 2. Note that the signature subsequences of YDPALKYMGLamide and PGW3XGLamide show significant overlap.
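To illustrate how GLamide paracopies such as those in Fig. 3 can be read off a precursor, here is a deliberately simplified Python sketch. It does not implement the insect processing rules of Veenstra (2000) that were used in the paper. It only cuts at the dibasic pairs KR, RR, RK and KK, dropping the basic pair itself (standing in for convertase cleavage followed by carboxypeptidase trimming), treats a C-terminal Gly as the amidation signal, and reports amidated peptides ending in GL. The function names and the example precursor are hypothetical.

```python
import re

# Simplified convertase sites: dibasic pairs only (an assumption; the real
# cleavage rules are more nuanced, see Veenstra, 2000).
DIBASIC = re.compile(r"KR|RR|RK|KK")

def predict_peptides(precursor):
    """Split a precursor (signal peptide already removed) at dibasic sites.

    Returns (sequence, amidated) tuples. A trailing Gly is treated as the
    amidation signal and removed, so amidated=True means the mature peptide
    ends in -NH2.
    """
    peptides = []
    for piece in DIBASIC.split(precursor):
        if len(piece) < 2:
            continue  # ignore empty or single-residue fragments
        if piece.endswith("G"):
            peptides.append((piece[:-1], True))
        else:
            peptides.append((piece, False))
    return peptides

def glamides(precursor):
    """Return the predicted amidated peptides whose C-terminus is ...GL-NH2."""
    return [seq for seq, amidated in predict_peptides(precursor)
            if amidated and seq.endswith("GL")]

if __name__ == "__main__":
    # Hypothetical precursor fragment with two RFAFGLamide-like paracopies.
    demo = "EPSDKRAGRFAFGLGKRSPERFAFGLGRRDEQWAT"
    for pep in glamides(demo):
        print(pep + "-NH2")
```

Monobasic cleavage sites, which are also used in real precursors, are deliberately ignored here, so this sketch will under-split some precursors.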

3.3. Allatostatin C and CC

In most insect genomes the allatostatin C and CC genes are next to one another on the same DNA strand, although in some species the two genes are on opposite strands. The Mesobuthus allatostatin C1 and CC genes and the Stegodyphus allatostatin C1, C2 and CC1 genes are similarly clustered. These local paralogs suggest an ancient gene duplication. As discussed elsewhere, it seems likely that the allatostatin C and CC genes are expressed in different cell types and bind the same GPCRs (Veenstra, 2009a). It has indeed been shown for the beetle T. castaneum that these two peptides activate the same receptor (Audsley et al., 2013). The spider Stegodyphus and the scorpion Mesobuthus each have two paralogs of the insect allatostatin C receptor, while two and one such receptors were found in the transcriptomes of Latrodectus and Parasteatoda, respectively.
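Whether two genes are "next to one another on the same DNA strand", as described for the allatostatin C and CC genes, can be checked from gene coordinates on a contig or scaffold. The sketch below is a minimal illustration under stated assumptions: the `Gene` record, the gene coordinates and the 50,000 bp cutoff are all hypothetical and are not taken from the Stegodyphus or Mesobuthus assemblies.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    contig: str
    start: int   # 1-based, start <= end
    end: int
    strand: str  # "+" or "-"

def adjacent_pairs(genes, max_gap=50_000):
    """Yield (gene_a, gene_b, same_strand) for consecutive genes on a contig
    separated by at most max_gap base pairs (an illustrative cutoff)."""
    by_contig = {}
    for g in genes:
        by_contig.setdefault(g.contig, []).append(g)
    for contig_genes in by_contig.values():
        contig_genes.sort(key=lambda g: g.start)
        for a, b in zip(contig_genes, contig_genes[1:]):
            gap = b.start - a.end - 1  # negative if the genes overlap
            if gap <= max_gap:
                yield a.name, b.name, a.strand == b.strand

if __name__ == "__main__":
    demo = [Gene("AstC1", "ctg1", 1_000, 3_000, "+"),
            Gene("AstCC1", "ctg1", 10_000, 12_000, "+"),
            Gene("AstC2", "ctg1", 200_000, 201_000, "-")]
    for pair in adjacent_pairs(demo):
        print(pair)
```

As the methods note, this only works when the genes lie on the same contig; in fragmented assemblies such as that of Mesobuthus, the absence of a reported pair says nothing about whether the genes are clustered.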

3.4. CHH or ITP

Crustacean hyperglycemic hormone (CHH), moult-inhibiting hormone (MIH) and insect ion transport peptide (ITP) are all members of the same peptide family. However, whereas insect genomes generally contain only a single ITP gene, which through alternative splicing is responsible for two ITP isoforms, crustaceans generally have several genes encoding CHH-like peptides that may be clustered in a single chromosomal region (Gu and Chan, 1998; Dircksen et al., 2001, 2011). Chelicerates, including the spider mite (Veenstra et al., 2012) and the house dust mite, also have multiple genes coding such peptides. In the Stegodyphus genome these genes are clustered in two locations: the first contains four genes in a stretch of about 70,000 bp, and the second contains three genes in a stretch of about 120,000 bp (data not shown). The contigs containing the Mesobuthus CHH genes are much smaller, and hence it is not known whether the Mesobuthus CHH genes are similarly clustered. It is of interest to note that the Latrodectus CHHs were called latrodectins when they were first identified as contaminants in the purification of α-latrotoxin (Pescatori et al., 1995). Whereas in Crustaceans and insects the primary transcripts of these genes undergo alternative splicing, such alternative splicing does not occur in the spider mite (Veenstra et al., 2012). Evidence for such alternative splicing was found neither for the spiders nor for the scorpion, while the extensive RNAseq data from Strigamia similarly reveal a single mRNA. The Chelicerate peptides are quite variable in structure and do not seem very similar to either the Crustacean MIHs or CHHs or to the insect ITPs. Three GPCRs that are activated by ITP were recently identified from Bombyx mori (Nagai et al., 2014). One of them belongs to the tachykinin GPCRs, and it seems likely that both ITP and tachykinins can activate this receptor. The second GPCR, which I have called ITP-like, also has orthologs in Chelicerates as well as in Strigamia, while the third one, although most similar to pyrokinin and periviscerokinin receptors, does not seem to have clear orthologs (Suppl. Fig. 1).

3.5. CCRFamide (previously called LFRYamide)

The LFRYamide neuropeptide gene was first identified from the Lottia genome (Veenstra, 2010). It was called LFRYamide because the structures of the neuropeptides coded by this gene are similar to those encoded by the LFRFamide gene. When I analyzed the Lottia genome I looked for small neuropeptides and ignored the remainder of their precursors. In hindsight this may have been a mistake, as the first part of the Lottia LFRYamide precursor has two predicted disulfide bridges and is well conserved (Conzelmann et al., 2013; Stewart et al., 2014). It is this part in particular that allows one to find homologous precursors in other species, including spiders and some insects (Fig. 4 and Suppl. Fig. 3). When comparing those precursors, it becomes clear that the Lottia precursor is unusual in two respects. First, it contains four paracopies of the putative neuropeptide, while most other species contain only one or two. The second, and most striking, difference concerns the first predicted neuropeptide in each of the other precursors. In Lottia the first neuropeptide is separated by a convertase cleavage site from the conserved N-terminal part of the precursor containing the two disulfide bridges. Consequently, in this species this part of the precursor should not be part of the neuropeptide itself. In all the other species, however, such a convertase cleavage site is lacking, and the two disulfide bridges are predicted to form the N-terminus of the neuropeptide. Curiously, in those precursors that contain a second neuropeptide, the latter lacks this N-terminal extension. The C-terminus of the predicted Chelicerate neuropeptides is R(V/I)P(M/L)RFamide. Such precursors are found not only in mollusks and Chelicerates, but also in Myriapods and various hemimetabolous insect orders (Fig. 4 and Suppl. Fig. 3). The Stegodyphus genome contains two genes that appear to be homologs of this Lottia gene, and both the Parasteatoda and Latrodectus transcriptomes similarly reveal two precursors. A similar neuropeptide has previously been described from P. dumerilii and was called CCRFamide (Conzelmann et al., 2013), a name I suggest also be used for the Arthropod and molluscan peptides. Some of the putative insect CCRFamide precursors are stranger still, in that they contain five rather than four cysteine residues (Fig. 4 and Suppl. Fig. 3). Cysteine residues are typically oxidized into disulfide bridges in the endoplasmic reticulum,
Fig. 4. Predicted CCRFamide precursors from various protostomian species. Note that in the limpet owl L. gigantea the conserved N-terminal of the precursor is predicted to be cleaved by convertase, while in all other species it can be expected to form part of the first and, in Arthropods the only, predicted peptide. Remarkably, in P. dumerilii the second CCRFamide precursor (GBZT01003772.1) is predicted to have a signal anchor rather than a signal peptide. Color coding as in Figs. 2 and 3.

J.A. Veenstra / General and Comparative Endocrinology 229 (2016) 41–55

and when the number of cysteine residues is impair a dimere is expected to be formed, at least in the case of relatively small sequences as is the case here. If the insect CCRFamide were indeed to form a dimere, it is not impossible that the other protostomian CCRFamides might also form dimeres. It is tempting to speculate that this unusu al N-terminal extension might protect the peptide against rapid inactivation by proteases and thus increases its half life. Three of the Crassostrea ESTs for this putative neuropeptide gene come from two independent hemocyte specific libraries (FP006082.1, EW777524, EW778728). Allatostatin CC is another protostomian neuropeptide that is expressed in hemocytes (Veenstra, 2014). In some, but not all, insect species the precursor of this peptide does not have a signal peptide but a signal anchor (Veenstra, 2009a, 2014). It is interesting in this respect that the second CCRFamide precursor from P. dumerilii (GBZT01003772.1) similarly has a signal anchor rather than a signal peptide (Fig. 4).
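The two structural arguments above (whether a convertase site separates the cysteine-rich N-terminal region from the first peptide, and whether an odd cysteine count implies a dimer) can be illustrated with a short sketch. It rests on stated simplifications that are not the procedure used in this study: any pair of basic residues (KR, RR, KK or RK) is treated as a convertase site, single-basic sites are ignored, and the function names `cleave` and `cysteine_report` and the toy precursors are hypothetical.

```python
from __future__ import annotations

import re

# Simplifying assumption: every dibasic pair is a prohormone convertase
# site, and both basic residues are removed when the precursor is cleaved.
DIBASIC = re.compile(r"[KR][KR]")


def cleave(precursor: str) -> list[str]:
    """Split a precursor (signal peptide already removed) at dibasic sites."""
    fragments, start = [], 0
    for site in DIBASIC.finditer(precursor):
        fragments.append(precursor[start:site.start()])
        start = site.end()
    fragments.append(precursor[start:])
    return [f for f in fragments if f]


def cysteine_report(peptide: str) -> tuple[int, int, bool]:
    """Return (cysteines, possible intramolecular bridges, dimer expected).

    An odd cysteine count leaves one residue unpaired, which could link
    two peptide molecules into a dimer.
    """
    n = peptide.count("C")
    return n, n // 2, n % 2 == 1


if __name__ == "__main__":
    # Hypothetical precursors: the first has a dibasic site after the
    # cysteine-rich region (Lottia-like), the second lacks one.
    for precursor in ["SCKECLTCGCDAKRSIRVPMRF", "SCKECLTCGCDASIRVPMRF"]:
        for peptide in cleave(precursor):
            print(peptide, cysteine_report(peptide))
```

With the site present, the four cysteines end up on a separate fragment; without it, they stay on the RVPMRF-terminated peptide, which is the situation predicted for the non-Lottia precursors. A five-cysteine variant would be flagged as a likely dimer.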


3.6. CNMamide

CNMamide was recently discovered as a novel insect neuropeptide (Jung et al., 2014). No genes coding for this peptide or its receptor were found in any of the Chelicerates, but both the peptide and its receptor are present in the Strigamia genome (Suppl. Table 7 and Suppl. Fig. 1). Interestingly, a GPCR related to the CNMamide receptor was found in the Lottia genome (Suppl. Fig. 1), but I did not find a gene encoding the putative ligand in this species.

3.7. Eclosion hormone

The three eclosion hormones identified from the spider mite genome turned out to be quite different from the insect hormones (Veenstra et al., 2012). I used the various eclosion hormone sequences to construct a phylogenetic tree (Suppl. Fig. 4). Although the sequences are too short for such a tree to support reliable conclusions about the exact phylogenetic relationships among the species involved, the method does group similar sequences together and thus allows major evolutionary events to be discerned. The results show three eclosion hormone clusters (A, B and C in Suppl. Fig. 4). Members of the A-cluster have been conserved in spiders, the spider mite and Strigamia, those of the B-cluster are present in insects and spiders, whereas the C-cluster is represented by Crustaceans, Mesobuthus, Strigamia and some insects, notably Zootermopsis and Tribolium, but also Nilaparvata and Diaphorina (Suppl. Fig. 4). It is impossible to predict whether the different paralogs have acquired different functions, but I am inclined to believe that this gene was amplified because the hormone is needed in large quantities but made by only a few neurons (cf. Veenstra, 2014). Indeed, the somewhat haphazard absence of the A and B clusters in Mesobuthus, and the presence of the A cluster in the spider mite while in the tick it is the C cluster that is preserved, all seem to suggest that the various hormones may be interchangeable.

3.8. EFLamide

This neuropeptide gene and its alternative splicing were first identified in the genome of Tetranychus urticae. It is also found in several other Arthropods, but has so far not been found in insects (Veenstra et al., 2012). In Chelicerates the first transcript yields a precursor that contains 7–10 paracopies of EFLamide, and the second a somewhat smaller number of EFLGGPamides. In the Stegodyphus genome a second EFLamide gene was found that encodes a single copy of EFLGGPamide. The Latrodectus and Parasteatoda transcriptomes reveal the same genes, and while Mesobuthus has the typical Chelicerate EFLamide gene, a gene encoding a single copy of EFLGGPamide was not found in its genome (Suppl. Tables 1–7). Although Strigamia also has an EFLamide gene (Chipman et al., 2014), the transcriptome data only revealed a transcript encoding the EFLamides. Thus the Strigamia genome does not seem to code for EFLGGPamides. It is plausible that the Lottia PXFVamide and the Platynereis EFLGamide precursors (Veenstra, 2010; Conzelmann et al., 2013) constitute the lophotrochozoan orthologs of the arthropod EFLamide genes. If so, the orthologs of the Platynereis EFLGamide GPCR (Bauknecht and Jékely, 2015) might be the EFLamide receptors. Since this Platynereis receptor is a clear ortholog of the TRH receptor, this suggests that these protostomian neuropeptides may be the TRH orthologs. Although similar neuropeptide precursors have so far not been reported from insects, the planthopper Nilaparvata does have a gene encoding an orthologous GPCR, and orthologs of this GPCR are also present in the genomes of Rhodnius prolixus and Diaphorina citri, but attempts to find EFLamide genes in these genomes were inconclusive.

3.9. Elevenin

Elevenin was identified from the L11 neuron in Aplysia (Taussig et al., 1984) and was recently shown to be present in several arthropods, including insects, but absent from the fruit fly (Veenstra, 2014). The identification of two elevenin GPCRs from Platynereis (Bauknecht and Jékely, 2015) allows for the in silico deorphanization of other protostomian elevenin receptors. As one would expect, the distribution of the elevenin GPCR corresponds well with that of the elevenin precursor (Veenstra, 2014). Interestingly, there is a second protostomian GPCR that is closely related to the elevenin receptor, but it is of course quite possible that its ligand is very different from elevenin (Suppl. Fig. 1).

3.10. FMRFamide and myosuppressin

Insects have a FMRFamide gene and a myosuppressin gene. The first encodes multiple FMRFamide copies and the second usually a single myosuppressin, which has the C-terminal sequence HVFLRFamide. Although specific GPCRs have been identified for these peptides (Cazzamali and Grimmelikhuijzen, 2002; Meeusen et al., 2002; Egerod et al., 2003), some promiscuity between these peptides and their receptors occurs under physiological conditions (Yamanaka et al., 2006). The genes that are most likely the Chelicerate FMRFamide orthologs code for a smaller number of FIRFamides, while there are two spider genes encoding myosuppressin-related peptides. The first encodes a number of neuropeptides with the consensus sequence (G/A)HSMIHFamide. The second gene codes for a larger number of peptides with the consensus sequence SDPWENHNTMHFamide (Fig. 5 and Suppl. Tables 1–4). The latter gene may be specific to spiders, as it was not found in the Mesobuthus genome (Suppl. Table 5). Interestingly, a second FMRFamide GPCR, quite distinct from the better known myosuppressin and FMRFamide receptors, was recently described from Platynereis (Bauknecht and Jékely, 2015).
It will be interesting to see whether its Arthropod orthologs are similarly activated by FMRFamide.

Fig. 5. The two predicted Stegodyphus myosuppressin precursors. Note that in the second precursor the large majority of the myosuppressin paracopies have the Pro-Trp signature in their sequence (highlighted in black). Color coding as in Figs. 2 and 3.

3.11. GnRH-related peptides

In insects, adipokinetic hormone (AKH) is probably the best known neuropeptide. Although AKH and its receptor are also found in mollusks (Roch et al., 2011; Johnson et al., 2014; Hauser and Grimmelikhuijzen, 2014), no genes encoding similar peptides or their GPCRs were found in Chelicerates. Insects have two other GnRH-related peptides, corazonin (Veenstra, 2009b) and ACP (AKH-corazonin-related peptide, Hansen et al., 2010). Corazonin is absent from the spider mite (Veenstra et al., 2012) as well as from the house dust mite and Mesobuthus, but it is present in spiders, in which the structure of the peptide is [Tyr]3-corazonin, virtually identical to the peptide initially identified from the American cockroach (Veenstra, 1989). Strigamia has both an AKH and an ACP gene, as well as one AKH and one ACP GPCR that are very similar to their insect orthologs. However, the Strigamia corazonin gene is very unusual in that it codes for two structurally different peptides (Fig. 6a). Although the first peptide retains significant sequence similarity to corazonin, the second peptide seems more similar to ACP as it contains the RWD subsequence; otherwise it would be hard to recognize it as a GnRH-related neuropeptide (Fig. 6b and c). The Strigamia genome has two corazonin receptors. While some insect neuropeptides have more than one receptor, this has never been found for corazonin. As the two peptides encoded by the Strigamia corazonin gene are structurally very different, this suggests that each has its own specific receptor. Although the argument could be made that the spider Stegodyphus also has two corazonin receptors, this is less informative, as neuropeptide GPCR genes are commonly amplified in Chelicerates. I hypothesize that the structures of these peptides have diverged from one another so that each interacts specifically with only one of the two corazonin GPCRs. Such structural divergence is also observed after duplication of the ancestor of the vasopressin gene in annelids and mollusks (Fig. 7 in Veenstra, 2011). The substitution of a charged amino acid residue for a neutral one (or the other way around) is a very effective way to change receptor binding. Such substitutions are often found when two initially identical peptide sequences acquire different functions; compare, e.g., oxytocin and vasopressin or their protostomian orthologs (Veenstra, 2011). This may explain how the two peptides on the Strigamia corazonin precursor diverged.
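The charge argument above can be made concrete by comparing two aligned paralogous peptides position by position and flagging substitutions that change side-chain charge. This is a minimal sketch under stated assumptions: the peptides are already aligned to the same length, charges are a textbook simplification at about neutral pH (Asp and Glu negative, Lys and Arg positive, His treated as uncharged), and the function name `substitutions` and the toy sequences are hypothetical, not Strigamia or spider data.

```python
from __future__ import annotations

# Simplified side-chain charges at about neutral pH (assumption: His is
# treated as uncharged; termini and modifications are ignored).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def substitutions(a: str, b: str) -> list[tuple[int, str, str, bool]]:
    """List the differences between two pre-aligned, equal-length peptides.

    Each entry is (1-based position, residue in a, residue in b,
    charge_changed), where charge_changed is True when the simplified
    side-chain charge differs between the two residues.
    """
    if len(a) != len(b):
        raise ValueError("peptides must be aligned to the same length")
    diffs = []
    for pos, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            diffs.append((pos, x, y, CHARGE.get(x, 0) != CHARGE.get(y, 0)))
    return diffs


if __name__ == "__main__":
    # Hypothetical paralog pairs.
    print(substitutions("AGSDWF", "AGNDWF"))  # Ser/Asn: neutral for neutral
    print(substitutions("AGSDWF", "AGSNWF"))  # Asp/Asn: charged for neutral
```

On this crude view, a Ser–Asn difference such as the one between the spider DH31 paralogs would not be flagged, whereas a charged-for-neutral exchange of the kind discussed for corazonin would be.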
The third GnRH-related insect peptide is ACP. This peptide is relatively well conserved in insects (Hansen et al., 2010), but much less so in Chelicerates. However, as the putative Chelicerate GnRH-related peptides have some structural similarity to ACP and these species all have at least one ACP-GPCR ortholog, it seems likely that these are ACP orthologs (Fig. 6c).

Fig. 6. Unusual Arthropod GnRH-related peptides. The Strigamia corazonin precursor likely encodes two biologically active peptides. Shown are (a) the complete predicted precursor, (b) an alignment of several Arthropod corazonin sequences with the two predicted peptides from the Strigamia corazonin precursor, and (c) Arthropod ACPs aligned with the putative GnRH-related peptides from spiders as well as the second peptide from the Strigamia corazonin precursor. Color coding in (a) as in Figs. 1 and 2.

3.12. Insulin and relaxin related hormones

The largest numbers of paralog genes were found for the insulin-related peptides: 12 genes in the Stegodyphus genome and 7 and 6 in the Parasteatoda and Latrodectus transcriptomes, respectively. The Stegodyphus insulin genes are located on four different contigs, which contain 1, 3, 4 and 4 genes and are 123, 1,014, 378 and 201 kbp long, respectively (data not shown). In the Mesobuthus genome there is only one contig (of 51 kbp) that contains two insulin genes; the other seven insulin genes are all located on different contigs ranging in size from 6 to 134 kbp. It is plausible that the much smaller size of the Mesobuthus contigs has prevented me from observing a similar clustering of the insulin genes in this species. The peptides homologous to Drosophila insulin-like peptide 7 (dilp7) have a very characteristic A chain as well as signature amino acid residues in the B chain (see Fig. 3 in Veenstra et al., 2012). They are only present in those species that also have an ortholog of the GPCR encoded by the Drosophila gene CG34411, which, for that reason, is presumed to be its specific receptor (Veenstra et al., 2012; Veenstra, 2014). It is plausible, and based on work in Drosophila (Linneweber et al., 2014) perhaps even likely, that this relaxin-related peptide also stimulates the classical insulin tyrosine-kinase receptor. In spiders there seems to be a single copy of this gene in the genome, but in Mesobuthus there are two such genes. Both this gene and its putative receptor are lacking from the Strigamia genome. Dilp8 is another relaxin-like hormone, and it has been suggested that the orthologs of the Drosophila GPCR encoded by CG31096 might be its receptor (Veenstra, 2014), a hypothesis for which there is now significant experimental evidence (Garelli et al., 2015).

3.13. NPF

In the Stegodyphus genome the two NPF genes are located on the same contig, and there are about 47,000 base pairs between the start codons of the two genes. Clustering of NPF genes also occurs in the genomes of the spider mite and the annelid Capitella teleta (Veenstra, 2011; Veenstra et al., 2012).
The amplification of the NPF receptor is particularly pronounced in the Stegodyphus genome, as ten GPCRs on the phylogenetic tree fall in the same group as the deorphanized protostomian NPF receptors; eight of those have very similar amino acid sequences (Suppl. Fig. 1). Each of these GPCRs is located on a separate contig, and these contigs are on average more than 253,000 base pairs in size.

3.14. Orcokinin

Orcokinin genes code for three different types of neuropeptides: orcokinin A, orcokinin B and the orcomyotropins (Stangier et al., 1992; Dircksen et al., 2000, 2011; Pascual et al., 2004; Hofer et al., 2005; Sterkel et al., 2012; Veenstra, 2014; Jiang et al., 2015). Stegodyphus has two orcokinin genes, and this may well be the case in other spiders too, as transcripts from two different orcokinin genes are found in the Latrodectus transcriptome; two orcokinin genes are also present in the Mesobuthus genome. In Stegodyphus the first orcokinin gene codes for one orcokinin B and 13 orcokinin A paracopies, while the second orcokinin gene codes for two paracopies each of orcomyotropin, orcokinin A and orcokinin B (Supp. Tables 1–5). Contrary to an earlier report (Chipman et al., 2014), Strigamia also has an orcokinin gene, which produces three different transcripts. The first two differ only slightly and encode two orcomyotropin paracopies and what is possibly the remainder of a degraded third copy, while the third transcript encodes ten orcokinin B paracopies (Supp. Table 7).

3.15. Pyrokinin and periviscerokinin

In insects three pharmacologically different pyrokinin/periviscerokinin-like peptides can be distinguished based on their interaction with different GPCRs. Although the data are most complete for Drosophila and the beetle T. castaneum (Iversen et al., 2002; Rosenkilde et al., 2003; Cazzamali et al., 2005; Jiang et al., 2014), work on the kissing bug Rhodnius prolixus and the silkworm Bombyx mori seems to confirm this classification (Homma et al., 2006; Paluzzi et al., 2010; Paluzzi and O’Donnell, 2012). These are the pyrokinins, tryptopyrokinins and periviscerokinins, which have the C-terminal consensus sequences FXPR(I/L/V)-amide, (M/L)WFGPR(L/V)-amide and (F/Y)PR(I/V)-amide, respectively. Of these, the tryptopyrokinins seem to be limited to insects. Although most of these receptors are specific for their signature peptides, it is noteworthy that in Tribolium the tryptopyrokinin GPCR is also strongly stimulated by the pyrokinins (Jiang et al., 2014). I used these signature sequences to putatively identify the predicted Chelicerate peptides. In the Stegodyphus genome there is one gene that encodes a large precursor containing primarily periviscerokinins. A similar gene was also found in the Latrodectus transcriptome. The Stegodyphus genome also contains two smaller genes that encode pyrokinins, the first exclusively so, and the second periviscerokinins as well (Fig. 7). In Strigamia there is a single gene that appears to code for both types of peptides.

3.16. Sulfakinin

Insects generally have a single sulfakinin gene that encodes two paracopies. This is also the case in the Myriapod and the scorpion, but the spiders all have two sulfakinin genes, one that codes for two paracopies and a second that codes for a single sulfakinin (Supp. Tables 1–5). Strigamia has three different sulfakinin GPCRs (Supp. Fig. 1). While the spider mite has neither a sulfakinin gene nor a sulfakinin GPCR, both these genes are present in the house dust mite (Supp. Table 6, Supp. Fig. 1).

3.17. Tachykinins and natalisin

Both the insect tachykinins and natalisins have C-terminal Arg-amide sequences.
The spider genes that encode primarily C-terminal Arg-amides with a Pro residue in the same position as in the Drosophila natalisin (Jiang et al., 2013) have been named natalisin, and the others tachykinin. However, not all natalisins have this particular residue, and it is absent from most of the peptides encoded by the putative Mesobuthus natalisin gene (Supp. Tables 1–7). It is not clear how valid this identification really is, as the tachykinin and natalisin receptors were not as clearly separated in the phylogenetic trees as one would have liked (Supp. Fig. 1).

3.18. Trissin

Trissin is another insect neuropeptide that was only recently discovered (Ida et al., 2011). Previously we were able to identify a spider mite GPCR that appeared to be an ortholog of the insect trissin receptor, but we were unable to find a gene encoding its ligand in this genome (Veenstra et al., 2012). In the house dust mite neither a trissin gene nor a trissin GPCR gene was found, and this is also true for the Strigamia genome. However, all the spiders have such a gene and the scorpion even has two, while a single trissin GPCR was found in the Stegodyphus and Mesobuthus genomes.

Fig. 7. Comparison of the three Stegodyphus precursors encoding periviscerokinins and pyrokinins. The conserved signature sequence of the putative Stegodyphus periviscerokinin (PYRP) has been highlighted in black. Note the large number of such peptides encoded by the periviscerokinin gene. The two pyrokinin genes encode either only pyrokinins or both types of peptides.

3.19. Neuropeptide GPCRs

Like the genes coding for neuropeptides, those coding for neuropeptide GPCRs are also duplicated in the Stegodyphus and Mesobuthus genomes. The degree of amplification of GPCRs seems even higher than that of the neuropeptide genes. The relatively complete transcriptomes of Parasteatoda and Latrodectus confirm that a number of these paralogs are expressed. Not all GPCR genes are duplicated: several exist in a single copy, while others have up to eight paralogs. Whereas in insects the primary RNA transcript encoding the GPCR for ecdysis triggering hormone is alternatively spliced into two different mRNAs, each encoding a distinct but functional GPCR (Park et al., 1999), this does not appear to be the case in Stegodyphus, Mesobuthus or Strigamia. This is not surprising, as such splicing was similarly found to be absent from the spider mite (Veenstra et al., 2012). On the other hand, RNAseq data show that the primary transcript derived from the Strigamia ortholog of the RYamide GPCR is spliced into two different mRNAs, each of which seems to encode a complete and likely functional receptor (supplementary fasta file). The RNAseq data from Strigamia reveal the existence of four exons, of which the first is untranslated. The combination of exons 1–3 leads to the first transcript, while exons 1, 2 and 4 are combined to produce the second. Both Mesobuthus and Stegodyphus have two paralogs of this gene. In each case one of them is predicted to undergo similar alternative splicing and to produce two different RYamide GPCRs (Fig. 8). In the Stegodyphus genome the second RYamide GPCR gene lacks the alternative exon; however, this may well be an assembly error, as there is a small unplaced genomic contig coding for what appears to be this missing exon. In Mesobuthus, another genome assembly problem could explain the apparent absence of the alternative exon in the second RYamide GPCR gene.
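The signature-based assignment of pyrokinins, tryptopyrokinins and periviscerokinins described in Section 3.15 (and illustrated for Stegodyphus in Fig. 7) can be sketched with regular expressions built from the three C-terminal consensus sequences. This is an illustrative sketch, not the procedure actually used: it assumes peptides are written without the C-terminal amide (the string ends with the residue that would be amidated), treats X as any residue, and the names `SIGNATURES` and `classify` and the test peptides are hypothetical.

```python
from __future__ import annotations

import re

# C-terminal consensus motifs from the text; "X" becomes "." (any residue).
# Assumption: peptides omit the amide group, so "$" marks the amidated end.
SIGNATURES = {
    "tryptopyrokinin": re.compile(r"[ML]WFGPR[LV]$"),
    "pyrokinin": re.compile(r"F.PR[ILV]$"),
    "periviscerokinin": re.compile(r"[FY]PR[IV]$"),
}


def classify(peptide: str) -> list[str]:
    """Return every class whose C-terminal motif matches the peptide.

    All matches are reported rather than only the first, because the
    motifs overlap: any tryptopyrokinin also satisfies the pyrokinin motif.
    """
    return [name for name, motif in SIGNATURES.items() if motif.search(peptide)]


if __name__ == "__main__":
    # Hypothetical peptides chosen only to exercise each motif.
    for pep in ["ADLWFGPRL", "KGFSPRL", "GLIPFPRV", "AAAAA"]:
        print(pep, classify(pep) or ["unassigned"])
```

Returning every matching class, instead of forcing a single label, keeps overlapping cases visible; that seems appropriate given that receptor specificity does not follow the motifs perfectly (as with the Tribolium tryptopyrokinin GPCR).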

3.20. In silico deorphanization of GPCRs

I added the deorphanized lophotrochozoan GPCRs and all the identified Lottia and Platynereis GPCRs to the phylogenetic analysis, as this might facilitate the delimitation of orthologous neuropeptide receptors. As is obvious from the phylogenetic trees (Suppl. Figs. 1 and 2), a putative ligand can be identified for a number of Chelicerate neuropeptide receptors. Nevertheless, there are limitations to this method. First, it is not always possible to predict whether structurally similar peptides encoded by paralogous neuropeptide genes are able to activate the same GPCRs. In the case of the two spider proctolin genes, the precursors are predicted to produce exactly the same proctolin neuropeptide (Supp. Tables 1–7). So in this case there is no evidence for sub- or neofunctionalization of the proctolin peptide (it remains possible, of course, that the temporal and/or spatial expression of the genes differs). In the case of the two spider genes encoding DH31, the difference in structure is limited to one amino acid in Stegodyphus, Latrodectus and Parasteatoda (a Ser–Asn substitution), which may be expected to have only a very minor effect on receptor activation. On the other hand, the interactions of CRF and urocortins 1–3 with the two CRF receptors in vertebrates show that it is very difficult, if not impossible, to make reliable predictions as to how structurally similar peptides may interact with paralogous GPCRs (e.g. Lovejoy et al., 2014).

Fig. 8. Graphical illustration of the putative RYamide GPCRs from Strigamia (A31a,b), Stegodyphus (A41a,b) and Mesobuthus (A28a,b). Although the amino acid sequences are very similar and the intron–exon splice site locations identical, the intron sizes vary markedly between these three species. The numbers in the lower part of the figure indicate the sizes in nucleotides of the introns and exons. Colored bars indicate translated sequences and open white bars untranslated 5′ and 3′ sequences. Sizes of the latter are unknown for Stegodyphus and Mesobuthus, and are minimum estimates for Strigamia. The Strigamia GPCR mRNAs were elucidated from the RNAseq data for this species; for the other species, amino acid sequence homology and likely intron–exon splice sites were used to predict the precursors.


The second limitation resides in our inclination to simplify how neuropeptides act through receptors. By naming a particular GPCR after a specific neuropeptide, one often ignores all other neuropeptides that might act through the same receptor. The previously mentioned case of FMRFamide activating the myosuppressin receptor in the prothoracic gland of Bombyx mori (Yamanaka et al., 2006) is a good example of that. Another example is the insect NPF receptor. A honeybee NPF GPCR is missing from the neuropeptide GPCR tree, both here and in previous publications (e.g. Hauser et al., 2008), despite the existence of a honeybee NPF gene and the identification of the predicted peptide by mass spectrometry in brain extracts (Hummon et al., 2006). Such an absence could be due to the receptor gene's fortuitous absence from the genome assembly, as is e.g. the case for the allatostatin C receptor lacking from the Strigamia genome. This is, however, unlikely to be the correct explanation, as the NPF receptor is also lacking from all other Hymenoptera for which there is a sequenced genome, including the as yet unpublished turnip sawfly genome (data not shown). The Hymenoptera with a sequenced genome belong to very distinct groups: a sawfly, ants, parasitic wasps and various bees. This strongly suggests that Hymenoptera have lost the NPF-specific GPCR and that Hymenopteran NPF interacts with a different receptor, perhaps the one for sNPF. This and other work (e.g. Jiang et al., 2014) illustrate that neuropeptides are more promiscuous with their GPCRs than the names given to the receptors seem to imply. Such promiscuity perhaps occurs more easily when peptides are released directly at the target site, as is the case when a peptide is released locally, e.g. within the central nervous system or by a peptidergic neuron directly onto its target tissue. The recently published results with Platynereis GPCRs suggest that the promiscuity could be even more widespread.
Although many of the deorphanized receptors appear to use the expected ligands, other neuropeptide-GPCR combinations seem surprising. Thus the interaction of DH31 with a type A GPCR and the activation of what looks like a FMRFamide receptor by myomodulin were not anticipated (Bauknecht and Jékely, 2015). In some cases sequence similarities between peptides, such as those between FMRFamide and myosuppressin, likely explain the observed promiscuity. However, in the case of ITP and tachykinin or the Drosophila sex peptide and allatostatin B (Kim et al., 2010), such sequence similarities are not obvious.

3.21. Paralog genes

The most striking result of this study is the ubiquitous presence of paralog neuropeptide and neuropeptide GPCR genes in the spider and scorpion genomes. This phenomenon is probably common to many Chelicerates, as it is also present in the transcriptomes of Parasteatoda and Latrodectus. Furthermore, a quick look at the as yet unfinished genome of the horseshoe crab Limulus polyphemus similarly shows large numbers of paralog genes. At first sight it may look as if the Acari are different, as such a ubiquitous presence of paralog genes is absent from the genomes of both the spider mite and the house dust mite. However, it is likely that the small size of these mites (0.4 and 0.3 mm for an adult female spider mite and house dust mite, respectively), and the small cell size that ensues from it, impose stringent limits on their genome sizes, which are the smallest known in Arthropods (90.82 Mb and 53.5 Mb, respectively). Such small genome sizes probably do not allow for large numbers of paralog genes, although two Dermatophagoides GPCRs are significantly expanded. Several tick transcriptome studies did not yield evidence for paralog neuropeptide genes (Bissinger et al., 2011; Donohue et al., 2010; Egekwu et al., 2014), but the still unfinished genome of the tick Ixodes scapularis may well have a number of paralog genes, as the four paralogs each for the receptors of allatostatin A and ACP suggest (Šimo and Park, 2014; Hauser and Grimmelikhuijzen, 2014). The ubiquitous presence of paralog genes (in some cases there are three or four genes that encode very similar neuropeptides or GPCRs) raises two questions: how were the original genes duplicated, and why have they been maintained during evolution?

3.22. Why are different paralog neuropeptide genes maintained?

The two consecutive whole genome duplications in the vertebrate ancestor are often credited with allowing rapid evolution in this animal group by virtue of providing multiple copies of the same genes, which could then be sub- or neo-functionalized (e.g. Hoffmann et al., 2012). Thus one reason duplicate genes are maintained in a genome is that they acquire different functions. As horseshoe crabs are often considered living fossils, such rapid evolution does not seem to have followed the genome duplication in this group (Nossa et al., 2014). Whereas in vertebrates relatively few of these neuropeptide gene duplicates have been retained, in spiders and perhaps other Chelicerate groups many of them were maintained. Currently we do not know whether the paralog neuropeptide genes are sub- or neo-functionalized. There is another reason why neuropeptide gene duplicates may be maintained. I have previously suggested that if a neuropeptide is needed in large quantities, two copies of a gene in a genome may not be enough to ensure sufficient production of that neuropeptide (Veenstra, 2014). An example of this is the Locusta vasopressin gene. In most insect species the two neurons that produce this neuropeptide release it within the central nervous system as a neuromodulator (Tyrer et al., 1993), but in the migratory locust those neurons have evolved into neuroendocrine cells that release the peptide into the hemolymph (Rémy and Girardie, 1980).
Obviously, in the latter species much larger quantities of peptide need to be produced, and this likely explains why this gene has about seven paralogs in Locusta but is present in a single copy in other insect genomes (Veenstra, 2014). Another example may be the insulin genes in the Bombyx genome (Mizoguchi and Okamoto, 2013). Thus, rather than being sub- or neo-functionalized, multiple copies of neuropeptide genes may serve to ensure that peptides can be made in sufficient quantity. The question is whether it is possible to distinguish between these two possibilities, i.e. sub- or neo-functionalization versus increased peptide production. As alluded to earlier, there is no evidence for sub- or neo-functionalization of the neuropeptide products of the two spider proctolin genes, as the proctolin sequences predicted from these genes are identical, while in the case of the different DH31 paralogs the sequence differences seem so small that one is inclined to believe that those neuropeptides act on the same receptors. In the as yet unpublished genome of Limulus there are five contigs coding for a DH31 gene, one of which contains five such genes in a continuous DNA stretch of 86,844 nucleotides, while the other four contigs contain a single DH31 gene each. This suggests that, at least in Limulus, four copies of the DH31 gene are insufficient to ensure adequate production of this particular neuropeptide; otherwise it would be hard to explain the amplification of the Limulus DH31 gene. In the case of larger neurohormones like GPA2/GPB5, bursicon, insulin and CHHs, the observed sequence variation does not appear to be larger than that found between different species, and hence there are arguments neither for nor against sub- or neo-functionalization. Therefore, the little data available suggest that, at least for some neuropeptides, it may be a matter of being able to produce sufficient neuropeptide rather than of sub- or neo-functionalization.
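The gene-dosage argument can be put into rough numbers. As a simplifying assumption (not a result from the data), consider a diploid genome in which every gene copy is expressed equally, with $n$ paralogous genes each encoding $p$ paracopies of the peptide on its precursor. The number of peptide-coding sequences available for transcription is then

$$
N = 2 \times n \times p .
$$

For the Limulus DH31 contigs described above, $n = 5 + 4 \times 1 = 9$. If each precursor carries a single DH31 copy ($p = 1$, an assumption) and the unfinished assembly is complete for this gene family, $N = 2 \times 9 \times 1 = 18$, compared with $N = 2$ for a single-copy gene. On this crude view, raising $n$ (amplifying the gene) and raising $p$ (adding paracopies to one precursor) are interchangeable ways of increasing output.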
There are at least three other mechanisms that could increase the amount of a neuropeptide that is produced, i.e. coding more

52

J.A. Veenstra / General and Comparative Endocrinology 229 (2016) 41–55

than one copy on the same precursor. The first one concerns increasing the number of cells expressing those neuropeptides. Increasing the number of specific neurons that make a particular neuropeptide may be more difficult in animals that are relatively small and that, perhaps because of their small size, have evolved developmental programs fine-tuned to produce exactly one member of many neuron classes, the so-called identifiable neurons (Bullock, 2000). The chance that a spontaneous mutation will change such a developmental program so as to increase the number of cells of one specific neuron type seems small, and this may well explain why the number of specific neuroendocrine cells within the central nervous system of many arthropods is so constant. Perhaps it is the difficulty of changing cell numbers that has led to the combination of larger cells and polytene chromosomes as a second mechanism to increase the amount of peptide that can be made. The third alternative to amplifying a particular neuropeptide gene in order to increase neuropeptide production is to encode multiple copies of the peptide on the same precursor. It is interesting in this respect that many of the neuropeptide genes that are duplicated produce a single copy of each neuropeptide, whereas many of the genes that encode precursors with multiple paracopies have no paralogs. In the case of small neuropeptides, sub- or neo-functionalization is often associated with salient changes in amino acids important for receptor interaction, as described here for the Strigamia corazonin gene (Fig. 6). It is plausible that the two Chelicerate myosuppressin genes evolved by sub- or neo-functionalization after duplication of the original myosuppressin gene, as the peptides produced from these two genes have clearly different molecular signatures. This is likely also the case for the YDPALKYMGLamide and PGW3XGLamide genes, and those two may or may not share a common ancestry with the allatostatin A gene.

3.23. Possible origin of the various paralogs

Genes are duplicated either as the result of a whole genome duplication or by the duplication of a single gene or a larger chromosome fragment. Such duplicates are often incorporated locally, as is the case, e.g., for the vasopressin and oxytocin genes in vertebrates, but duplicated genes can also be translocated elsewhere. Alternatively, genes may be duplicated by retrotranscription, in which case the duplicated genes generally lack introns. Although I cannot exclude the possibility that some of the observed gene duplications originated by reverse transcription, the number and positions of introns in the gene duplicates are very similar. This makes it unlikely that retrotranscription was a major factor in the creation of the gene duplicates. In the spider and scorpion genomes many genes are duplicated and very few duplicates are found on the same contig, suggesting that this might be due to one or more whole genome duplications (Clarke et al., 2015). In vertebrates the hox genes appear to be well conserved after whole genome duplication: after the 2R whole genome duplication, mammals now have four almost complete sets of these genes. Interestingly, the genome of L. polyphemus, another Chelicerate, has two complete as well as two incomplete sets of hox genes. Together with other data this constitutes clear evidence for one, and more likely two, genome duplications, which have been estimated to have occurred 230–310 MYA and about 450–600 MYA (Nossa et al., 2014). Thus the older of the two could have occurred in the common ancestor of spiders, horseshoe crabs and scorpions. A recent publication similarly shows two sets of hox genes in the Mesobuthus genome (Di et al., 2015). Spiders have unusual sex chromosome systems (Araujo et al., 2012). In most species these are of the X₁X₂0 type, but several species have X₁X₂X₃X₄0 and X₁X₂X₃0 systems; among the mygalomorphs there is even a species with an X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁X₁₂X₁₃0 system (Král et al., 2013).
The presence of two X chromosomes may suggest a whole genome duplication, while four X chromosomes could be consistent with two successive genome duplications. The existence, within the same genus, of species with either a large number of chromosomes, including multiple X chromosomes, or a much smaller number with "only" two X chromosomes suggests that genome amplification is perhaps relatively common in spiders. Regulatory genes are known to be preferentially retained after whole genome duplications (e.g. Maere et al., 2005). Genes coding neuropeptides and their receptors are clearly regulatory genes, and the X chromosome has obvious regulatory functions. It is perhaps for this reason that the duplicated X chromosome is maintained after a whole genome duplication. In vertebrates, neuropeptide GPCRs and ligands that originated in whole genome duplications are often present in paralogon groups (e.g. Braasch et al., 2009; Cardoso et al., 2014; Dreborg et al., 2008; Sundström et al., 2010). Evidence for a genome duplication can thus be provided by showing co-linearity of paralogs of different genes in distinct chromosome locations. However, the scaffolds of the Stegodyphus and Mesobuthus genome assemblies are too small to reveal such co-linearity, so the possibility that one or more whole genome duplications explain the ubiquitous presence of these paralog genes cannot be tested in this way. Nevertheless, the only local gene duplications that have been found concern those neuropeptide genes that have been multiplied to a much higher degree in these and other protostomes, i.e. allatostatin C, CHH, NPF and insulin. If local gene duplication were the major mechanism behind the occurrence of so many paralog neuropeptide genes, there should have been at least a few examples of it among the genes that were duplicated only once, at least in the Stegodyphus genome, where the scaffold size is sufficiently large.
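The scaffold argument above amounts to a simple count. The sketch below is illustrative only and is not the procedure used in this study: it assumes a hypothetical table mapping each gene to the scaffold it was found on and a hypothetical list of paralog pairs, and reports which pairs, and what fraction of the pairs with both members placed, lie on the same scaffold (the signature expected of local, tandem duplication).

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def colocated_pairs(pairs: List[Pair], scaffold_of: Dict[str, str]) -> List[Pair]:
    """Return the paralog pairs whose two members sit on the same scaffold."""
    hits = []
    for a, b in pairs:
        sa, sb = scaffold_of.get(a), scaffold_of.get(b)
        # A pair can only count as local if both genes are placed in the assembly.
        if sa is not None and sa == sb:
            hits.append((a, b))
    return hits


def colocated_fraction(pairs: List[Pair], scaffold_of: Dict[str, str]) -> float:
    """Fraction of scorable pairs (both genes placed) that share a scaffold."""
    scorable = [(a, b) for a, b in pairs if a in scaffold_of and b in scaffold_of]
    if not scorable:
        return 0.0
    return len(colocated_pairs(scorable, scaffold_of)) / len(scorable)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Stegodyphus or Mesobuthus assemblies.
    scaffold_of = {
        "geneA1": "scf1", "geneA2": "scf7",
        "geneB1": "scf2", "geneB2": "scf9",
        "geneC1": "scf3", "geneC2": "scf3",  # a putative tandem duplicate
    }
    pairs = [("geneA1", "geneA2"), ("geneB1", "geneB2"), ("geneC1", "geneC2")]
    print(colocated_pairs(pairs, scaffold_of))  # [('geneC1', 'geneC2')]
    print(round(colocated_fraction(pairs, scaffold_of), 3))  # 0.333
```

A low fraction argues against local duplication only to the extent that the scaffolds are long enough for tandem duplicates to have been captured on a single scaffold, which is why the argument is restricted to the Stegodyphus assembly.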
For these reasons I think that the most likely explanation for the origin of the numerous paralog genes is one or more whole genome duplications, which may also explain why about 45% of the ligand genes and 65% of the receptor genes have a duplicate in at least one Chelicerate genome.

3.24. Possible implications for the evolution of the insect neuropeptidome

Some insect genes coding neuropeptides or their receptors have been duplicated relatively recently, such as the allatostatin C receptors in Drosophila or the SIFamide gene paralogs in Lepidoptera (Roller et al., 2008). Others, such as the eclosion hormone genes (see Section 3.7) or the different insect genes coding NPF (e.g. Veenstra, 2014), must be the evolutionary remnants of gene duplications in an early ancestor. Such a gene duplication probably also explains the existence of the distinct periviscerokinin and pyrokinin genes in insects. These two genes not only produce very similar peptides but are also expressed in serially homologous neuroendocrine cells (e.g. Kean et al., 2002). The insect paralog receptors for SIFamide and NPF also appear to be very old, and such an ancient origin has previously been postulated for the CNMa GPCRs in insects (Jung et al., 2014). If a whole genome duplication occurred in an ancient arthropod ancestor, it could be responsible for some of these gene duplications.

3.25. Bilaterian neuropeptide evolution

An intriguing aspect of neuropeptide evolution is that a very large number of neuropeptides and receptors must have evolved within the first 300 to 500 million years after the appearance of the first metazoans, as their homologs are now present in both deuterostomes and protostomes (Mirabeau and Joly, 2013), which are believed to have split more than 550 MYA.
Sub- and/or neo-functionalization of duplicated genes is the only reasonable explanation for the wide variety of GPCRs in existence, and there is little doubt that this is also true for a significant proportion of neuropeptides, even though it cannot be excluded that some neuropeptides may have evolved de novo (see e.g. Veenstra, 2014). Ancient genome duplications are hard to prove, as demonstrated by the history of the vertebrate 2R duplication hypothesis. It seems reasonable to assume that genome duplications occur more easily in hermaphrodites that do not have sex chromosomes. Indeed, tetraploid mollusks appear to be fairly easy to produce experimentally (Beaumont and Fairbrother, 1991). If whole genome duplications happened at least twice in the Limulus lineage as well as in the early vertebrate lineage, it is possible that a succession of genome duplications in early metazoans facilitated the evolution of the large variety of metazoan neuropeptides and their receptors.

Acknowledgments

I thank all those who generated the data that I used here, as well as those who wrote the programs that I needed to do so, nam nihil proprium est ("for nothing is one's own"). Constructive comments from two reviewers are also gratefully acknowledged, as is institutional funding from the CNRS.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ygcen.2015.11.019.

References

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Araujo, D., Schneider, M.C., Paula-Neto, E., Cella, D.M., 2012. Sex chromosomes and meiosis in spiders: a review. In: Swan, A. (Ed.), Meiosis – Molecular Mechanisms and Cytogenetic Diversity. Intech Europe, Rijeka, Croatia, pp. 87–108.
Audsley, N., Vandersmissen, H.P., Weaver, R., Dani, P., Matthews, J., Down, R., Vuerinckx, K., Kim, Y.J., Vanden Broeck, J., 2013. Characterisation and tissue distribution of the PISCF allatostatin receptor in the red flour beetle Tribolium castaneum. Insect Biochem. Mol. Biol. 43, 65–74.
Bauknecht, P., Jékely, G., 2015. Large-scale combinatorial deorphanization of Platynereis neuropeptide GPCRs. Cell Rep. 12, 684–693. http://dx.doi.org/10.1016/j.celrep.2015.06.052.
Beaumont, A.R., Fairbrother, J.E., 1991. Ploidy manipulation in molluscan shellfish: a review. J. Shellfish Res. 10, 1–18.
Bendtsen, J.D., Nielsen, H., von Heijne, G., Brunak, S., 2004. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340, 783–795.
Bissinger, B.W., Donohue, K.V., Khalil, S.M., Grozinger, C.M., Sonenshine, D.E., Zhu, J., Roe, R.M., 2011. Synganglion transcriptome and developmental global gene expression in adult females of the American dog tick, Dermacentor variabilis (Acari: Ixodidae). Insect Mol. Biol. 20, 465–491.
Braasch, I., Volff, J.N., Schartl, M., 2009. The endothelin system: evolution of vertebrate-specific ligand–receptor interactions by three rounds of genome duplication. Mol. Biol. Evol. 26, 783–799.
Bullock, T.H., 2000. Revisiting the concept of identifiable neurons. Brain Behav. Evol. 55, 236–240.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T.L., 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
Cao, Z., Yu, Y., Wu, Y., Hao, P., Di, Z., He, Y., Chen, Z., Yang, W., Shen, Z., He, X., Sheng, J., Xu, X., Pan, B., Feng, J., Yang, X., Hong, W., Zhao, W., Li, Z., Huang, K., Li, T., Kong, Y., Liu, H., Jiang, D., Zhang, B., Hu, J., Hu, Y., Wang, B., Dai, J., Yuan, B., Feng, Y., Huang, W., Xing, X., Zhao, G., Li, X., Li, Y., Li, W., 2013. The genome of Mesobuthus martensii reveals a unique adaptation model of Arthropods. Nat. Commun. 4, 2602.
Cardoso, J.C., Félix, R.C., Bergqvist, C.A., Larhammar, D., 2014. New insights into the evolution of vertebrate CRH (corticotropin-releasing hormone) and invertebrate DH44 (diuretic hormone 44) receptors in metazoans. Gen. Comp. Endocrinol. 209, 162–170.
Cazzamali, G., Grimmelikhuijzen, C.J.P., 2002. Molecular cloning and functional expression of the first insect FMRFamide receptor. Proc. Natl. Acad. Sci. USA 99, 12073–12078.
Cazzamali, G., Torp, M., Hauser, F., Williamson, M., Grimmelikhuijzen, C.J.P., 2005. The Drosophila gene CG9918 codes for a pyrokinin-1 receptor. Biochem. Biophys. Res. Commun. 335, 14–19.
Chan, T.F., Ji, K.M., Yim, A.K., Liu, X.Y., Zhou, J.W., Li, R.Q., Yang, K.Y., Li, J., Li, M., Law, P.T., Wu, Y.L., Cai, Z.L., Qin, H., Bao, Y., Leung, R.K., Ng, P.K., Zou, J., Zhong, X.J., Ran, P.X., Zhong, N.S., Liu, Z.G., Tsui, S.K., 2015. The draft genome, transcriptome, and microbiome of Dermatophagoides farinae reveal a broad spectrum of dust mite allergens. J. Allergy Clin. Immunol. 135, 539–548.
Chang, J.C., Yang, R.B., Adams, M.E., Lu, K.H., 2009. Receptor guanylyl cyclases in Inka cells targeted by eclosion hormone. Proc. Natl. Acad. Sci. USA 106, 13371–13376.
Chipman, A.D., Ferrier, D.E., Brena, C., Qu, J., Hughes, D.S., Schröder, R., Torres-Oliva, M., Znassi, N., Jiang, H., Almeida, F.C., Alonso, C.R., Apostolou, Z., Aqrawi, P., Arthur, W., Barna, J.C., Blankenburg, K.P., Brites, D., Capella-Gutiérrez, S., Coyle, M., Dearden, P.K., Du Pasquier, L., Duncan, E.J., Ebert, D., Eibner, C., Erikson, G., Evans, P.D., Extavour, C.G., Francisco, L., Gabaldón, T., Gillis, W.J., Goodwin-Horn, E.A., Green, J.E., Griffiths-Jones, S., Grimmelikhuijzen, C.J.P., Gubbala, S., Guigó, R., Han, Y., Hauser, F., Havlak, P., Hayden, L., Helbing, S., Holder, M., Hui, J.H., Hunn, J.P., Hunnekuhl, V.S., Jackson, L., Javaid, M., Jhangiani, S.N., Jiggins, F.M., Jones, T.E., Kaiser, T.S., Kalra, D., Kenny, N.J., Korchina, V., Kovar, C.L., Kraus, F.B., Lapraz, F., Lee, S.L., Lv, J., Mandapat, C., Manning, G., Mariotti, M., Mata, R., Mathew, T., Neumann, T., Newsham, I., Ngo, D.N., Ninova, M., Okwuonu, G., Ongeri, F., Palmer, W.J., Patil, S., Patraquim, P., Pham, C., Pu, L.L., Putnam, N.H., Rabouille, C., Ramos, O.M., Rhodes, A.C., Robertson, H.E., Robertson, H.M., Ronshaugen, M., Rozas, J., Saada, N., Sánchez-Gracia, A., Scherer, S.E., Schurko, A.M., Siggens, K.W., Simmons, D., Stief, A., Stolle, E., Telford, M.J., Tessmar-Raible, K., Thornton, R., van der Zee, M., von Haeseler, A., Williams, J.M., Willis, J.H., Wu, Y., Zou, X., Lawson, D., Muzny, D.M., Worley, K.C., Gibbs, R.A., Akam, M., Richards, S., 2014. The first Myriapod genome sequence reveals conservative Arthropod gene content and genome organisation in the centipede Strigamia maritima. PLoS Biol. 12, e1002005.
Christie, A.E., 2015. In silico characterization of the neuropeptidome of the Western black widow spider Latrodectus hesperus. Gen. Comp. Endocrinol. 210, 63–80.
Clarke, T.H., Garb, J.E., Hayashi, C.Y., Haney, R.A., Lancaster, A.K., Corbett, S., Ayoub, N.A., 2014. Multi-tissue transcriptomics of the black widow spider reveals expansions, co-options, and functional processes of the silk gland gene toolkit. BMC Genomics 15, 365.
Clarke, T.H., Garb, J.E., Hayashi, C.Y., Arensburger, P., Ayoub, N.A., 2015. Spider transcriptomes identify ancient large-scale gene duplication event potentially important in silk gland evolution. Genome Biol. Evol. 7, 1856–1870.
Conzelmann, M., Williams, E.A., Krug, K., Franz-Wachtel, M., Macek, B., Jékely, G., 2013. The neuropeptide complement of the marine annelid Platynereis dumerilii. BMC Genomics 14, 906.
Di, Z., Yu, Y., Wu, Y., Hao, P., He, Y., Zhao, H., Li, Y., Zhao, G., Li, X., Li, W., Cao, Z., 2015. Genome-wide analysis of homeobox genes from Mesobuthus martensii reveals Hox gene duplication in scorpions. Insect Biochem. Mol. Biol. 61, 25–33.
Dircksen, H., Burdzik, S., Sauter, A., Keller, R., 2000. Two orcokinins and the novel octapeptide orcomyotropin in the hindgut of the crayfish Orconectes limosus: identified myostimulatory neuropeptides originating together in neurones of the terminal abdominal ganglion. J. Exp. Biol. 203, 2807–2818.
Dircksen, H., Böcking, D., Heyn, U., Mandel, C., Chung, J.S., Baggerman, G., Verhaert, P., Daufeldt, S., Plösch, T., Jaros, P.P., Waelkens, E., Keller, R., Webster, S.G., 2001. Crustacean hyperglycaemic hormone (CHH)-like peptides and CHH-precursor-related peptides from pericardial organ neurosecretory cells in the shore crab, Carcinus maenas, are putatively spliced and modified products of multiple genes. Biochem. J. 356, 159–170.
Dircksen, H., Neupert, S., Predel, R., Verleyen, P., Huybrechts, J., Strauss, J., Hauser, F., Stafflinger, E., Schneider, M., Pauwels, K., Schoofs, L., Grimmelikhuijzen, C.J.P., 2011. Genomics, transcriptomics, and peptidomics of Daphnia pulex neuropeptides and protein hormones. J. Proteome Res. 10, 4478–4504.
Donohue, K.V., Khalil, S.M., Ross, E., Grozinger, C.M., Sonenshine, D.E., Roe, R.M., 2010. Neuropeptide signaling sequences identified by pyrosequencing of the American dog tick synganglion transcriptome during blood feeding and reproduction. Insect Biochem. Mol. Biol. 40, 79–90.
Dreborg, S., Sundström, G., Larsson, T.A., Larhammar, D., 2008. Evolution of vertebrate opioid receptors. Proc. Natl. Acad. Sci. USA 105, 15487–15492.
Egekwu, N., Sonenshine, D.E., Bissinger, B.W., Roe, R.M., 2014. Transcriptome of the female synganglion of the black-legged tick Ixodes scapularis (Acari: Ixodidae) with comparison between Illumina and 454 systems. PLoS ONE 9, e102667.
Egerod, K., Reynisson, E., Hauser, F., Cazzamali, G., Williamson, M., Grimmelikhuijzen, C.J.P., 2003. Molecular cloning and functional expression of the first two specific insect myosuppressin receptors. Proc. Natl. Acad. Sci. USA 100, 9808–9813.
Elphick, M.R., Mirabeau, O., 2014. The evolution and variety of RFamide-type neuropeptides: insights from deuterostomian invertebrates. Front. Endocrinol. 5, 93.
Garelli, A., Heredia, F., Casimiro, A.P., Macedo, A., Nunes, C., Koyama, T., Gontijo, A.M., 2015. Dilp8 requires the neuronal relaxin receptor Lgr3 to couple growth to developmental timing. BioRxiv. http://dx.doi.org/10.1101/017053.
Gouy, M., Guindon, S., Gascuel, O., 2010. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224.
Grbić, M., Van Leeuwen, T., Clark, R.M., Rombauts, S., Rouzé, P., Grbić, V., Osborne, E.J., Dermauw, W., Cao, P., Ngoc, T., Ortego, T., Hernandez-Crespo, P., Diaz, I., Martinez, M., Navajas, M., Sucena, E., Magalhães, S., Nagy, L., Pace, R., Djuranović, S., Smagghe, G., Iga, M., Christiaens, O., Veenstra, J.A., Ewer, J., Mancilla Villalobos, R., Hutter, J.L., Hudson, S.D., Velez, M., Yi, S., Zeng, J., PiresdaSilva, A., Roch, F., Cazaux, M., Navarro, M., Zhurov, V., Acevedo, G., Bjelica, A., Fawcett, J.A., Bonnet, E., Martens, C., Baele, G., Wissler, L., Sanchez-Rodriguez, A., Tirry, L., Blais, C., Demeestere, K., Henz, S., Gregory, R., Mathieu, J., Verdon, L., Farinelli, L., Schmutz, J., Lindquist, E., Feyereisen, R., Van de Peer, Y., 2011. The genome of Tetranychus urticae reveals herbivorous pest adaptations. Nature 479, 487–492.



Gu, P.L., Chan, S.M., 1998. The shrimp hyperglycemic hormone-like neuropeptide is encoded by multiple copies of genes arranged in a cluster. FEBS Lett. 441, 397–403.
Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., Macmanes, M.D., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C.N., Henschel, R., Leduc, R.D., Friedman, N., Regev, A., 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512.
Hansen, K.K., Stafflinger, E., Schneider, M., Hauser, F., Cazzamali, G., Williamson, M., Kollmann, M., Schachtner, J., Grimmelikhuijzen, C.J.P., 2010. Discovery of a novel insect neuropeptide signaling system closely related to the insect adipokinetic hormone and corazonin hormonal systems. J. Biol. Chem. 285, 10736–10747.
Hauser, F., Grimmelikhuijzen, C.J.P., 2014. Evolution of the AKH/corazonin/ACP/GnRH receptor superfamily and their ligands in the Protostomia. Gen. Comp. Endocrinol. 209, 35–49.
Hauser, F., Williamson, M., Cazzamali, G., Grimmelikhuijzen, C.J.P., 2006a. Identifying neuropeptide and protein hormone receptors in Drosophila melanogaster by exploiting genomic data. Brief. Funct. Genomic Proteomic 4, 321–330.
Hauser, F., Cazzamali, G., Williamson, M., Blenau, W., Grimmelikhuijzen, C.J.P., 2006b. A review of neurohormone GPCRs present in the fruitfly Drosophila melanogaster and the honey bee Apis mellifera. Prog. Neurobiol. 80, 1–19.
Hauser, F., Cazzamali, G., Williamson, M., Park, Y., Li, B., Tanaka, Y., Predel, R., Neupert, S., Schachtner, J., Verleyen, P., Grimmelikhuijzen, C.J.P., 2008. A genome-wide inventory of neurohormone GPCRs in the red flour beetle Tribolium castaneum. Front. Neuroendocrinol. 29, 142–165.
Hofer, S., Dircksen, H., Tollbäck, P., Homberg, U., 2005. Novel insect orcokinins: characterization and neuronal distribution in the brains of selected dicondylian insects. J. Comp. Neurol. 490, 57–71.
Hoffmann, F.G., Opazo, J.C., Storz, J.F., 2012. Whole-genome duplications spurred the functional diversification of the globin gene superfamily in vertebrates. Mol. Biol. Evol. 29, 303–312.
Homma, T., Watanabe, K., Tsurumaru, S., Kataoka, H., Imai, K., Kamba, M., Niimi, T., Yamashita, O., Yaginuma, T., 2006. G protein-coupled receptor for diapause hormone, an inducer of Bombyx embryonic diapause. Biochem. Biophys. Res. Commun. 344, 386–393.
Hummon, A.B., Richmond, T.A., Verleyen, P., Baggerman, G., Huybrechts, J., Ewing, M.A., et al., 2006. From the genome to the proteome: uncovering peptides in the Apis brain. Science 314, 647–649.
Ida, T., Takahashi, T., Tominaga, H., Sato, T., Kume, K., Yoshizawa-Kumagaye, K., Nishio, H., Kato, J., Murakami, N., Miyazato, M., Kangawa, K., Kojima, M., 2011. Identification of the endogenous cysteine-rich peptide trissin, a ligand for an orphan G protein-coupled receptor in Drosophila. Biochem. Biophys. Res. Commun. 414, 44–48.
Iversen, A., Cazzamali, G., Williamson, M., Hauser, F., Grimmelikhuijzen, C.J.P., 2002. Molecular cloning and functional expression of a Drosophila receptor for the neuropeptides capa-1 and -2. Biochem. Biophys. Res. Commun. 299, 628–633.
Jiang, H., Lkhagva, A., Daubnerová, I., Chae, H.S., Šimo, L., Jung, S.H., Yoon, Y.K., Lee, N.R., Seong, J.Y., Žitňan, D., Park, Y., Kim, Y.J., 2013. Natalisin, a tachykinin-like signaling system, regulates sexual activity and fecundity in insects. Proc. Natl. Acad. Sci. USA 110, E3526–E3534.
Jiang, H., Wei, Z., Nachman, R.J., Adams, M.E., Park, Y., 2014. Functional phylogenetics reveals contributions of pleiotropic peptide action to ligand–receptor coevolution. Sci. Rep. 4, 6800.
Jiang, H., Kim, H.G., Park, Y., 2015. Alternatively spliced orcokinin isoforms and their functions in Tribolium castaneum. Insect Biochem. Mol. Biol. 65, 1–9.
Johnson, J.I., Kavanaugh, S.I., Nguyen, C., Tsai, P.S., 2014. Localization and functional characterization of a novel adipokinetic hormone in the mollusk Aplysia californica. PLoS ONE 9, e106014.
Jung, S.H., Lee, J.H., Chae, H.S., Seong, J.Y., Park, Y., Park, Z.Y., Kim, Y.J., 2014. Identification of a novel insect neuropeptide, CNMa and its receptor. FEBS Lett. 588, 2037–2041.
Kamatani, Y., Minakata, H., Kenny, P.T.M., Iwashita, T., Watanabe, K., Funase, K., Sun, X.P., Yongsiri, A., Kim, K.H., Novales-Li, P., Novales, E.T., Kanapi, C.G., Takeuchi, H., Nomoto, K., 1989. Achatin-I, an endogenous neuroexcitatory tetrapeptide from Achatina fulica Férussac containing a D-amino acid residue. Biochem. Biophys. Res. Commun. 160, 1015–1020.
Kean, L., Cazenave, W., Costes, L., Broderick, K.E., Graham, S., Pollock, V.P., Davies, S.A., Veenstra, J.A., Dow, J.A., 2002. Two nitridergic peptides are encoded by the gene capability in Drosophila melanogaster. Am. J. Physiol. Regul. Integr. Comp. Physiol. 282, R1297–R1307.
Kim, Y.J., Bartalska, K., Audsley, N., Yamanaka, N., Yapici, N., Lee, J.Y., Kim, Y.C., Markovic, M., Isaac, E., Tanaka, Y., Dickson, B.J., 2010. MIPs are ancestral ligands for the sex peptide receptor. Proc. Natl. Acad. Sci. USA 107, 6520–6525.
Král, J., Kořínková, T., Krkavcová, L., Musilová, J., Forman, M., Ávila Herrera, I.M., Haddad, C.R., Vítková, M., Henriques, S., Palacios Vargas, J.G., Hedin, M., 2013. Evolution of karyotype, sex chromosomes, and meiosis in mygalomorph spiders (Araneae: Mygalomorphae). Biol. J. Linn. Soc. 109, 377–408.
Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L., 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.

Linneweber, G.A., Jacobson, J., Busch, K.E., Hudry, B., Christov, C.P., Dormann, D., Yuan, M., Otani, T., Knust, E., de Bono, M., Miguel-Aliaga, I., 2014. Neuronal control of metabolism through nutrient-dependent modulation of tracheal branching. Cell 156, 69–83.
Lovejoy, D.A., Chang, B.S., Lovejoy, N.R., del Castillo, J., 2014. Molecular evolution of GPCRs: CRH/CRH receptors. J. Mol. Endocrinol. 52, T43–T60.
Maere, S., De Bodt, S., Raes, J., Casneuf, T., Van Montagu, M., Kuiper, M., Van de Peer, Y., 2005. Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. USA 102, 5454–5459.
Mattila, T.M., Bechsgaard, J.S., Hansen, T.T., Schierup, M.H., Bilde, T., 2012. Orthologous genes identified by transcriptome sequencing in the spider genus Stegodyphus. BMC Genomics 13, 70.
Meeusen, T., Mertens, I., Clynen, E., Baggerman, G., Nichols, R., Nachman, R.J., Huybrechts, R., De Loof, A., Schoofs, L., 2002. Identification in Drosophila melanogaster of the invertebrate G protein-coupled FMRFamide receptor. Proc. Natl. Acad. Sci. USA 99, 15363–15368.
Mirabeau, O., Joly, J.S., 2013. Molecular evolution of peptidergic signaling systems in bilaterians. Proc. Natl. Acad. Sci. USA 110, E2028–E2037.
Mizoguchi, A., Okamoto, N., 2013. Insulin-like and IGF-like peptides in the silkmoth Bombyx mori: discovery, structure, secretion, and function. Front. Physiol. 4, 217.
Nagai, C., Mabashi-Asazuma, H., Nagasawa, H., Nagata, S., 2014. Identification and characterization of receptors for ion transport peptide (ITP) and ITP-like (ITPL) in the silkworm Bombyx mori. J. Biol. Chem. 289, 32166–32177.
Nässel, D.R., Liu, Y., Luo, J., 2015. Insulin/IGF signaling and its regulation in Drosophila. Gen. Comp. Endocrinol. 221, 255–266.
Nossa, C.W., Havlak, P., Yue, J.X., Lv, J., Vincent, K.Y., Brockmann, H.J., Putnam, N.H., 2014. Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication. Gigascience 3, 9.
Paluzzi, J.P., O’Donnell, M.J., 2012. Identification, spatial expression analysis and functional characterization of a pyrokinin-1 receptor in the Chagas’ disease vector, Rhodnius prolixus. Mol. Cell. Endocrinol. 363, 36–45.
Paluzzi, J.P., Park, Y., Nachman, R.J., Orchard, I., 2010. Isolation, expression analysis, and functional characterization of the first antidiuretic hormone receptor in insects. Proc. Natl. Acad. Sci. USA 107, 10290–10295.
Park, Y., Žitňan, D., Gill, S.S., Adams, M.E., 1999. Molecular cloning and biological activity of ecdysis-triggering hormones in Drosophila melanogaster. FEBS Lett. 463, 133–138.
Pascual, N., Castresana, J., Valero, M.L., Andreu, D., Belles, X., 2004. Orcokinins in insects and other invertebrates. Insect Biochem. Mol. Biol. 34, 1141–1146.
Pescatori, M., Bradbury, A., Bouet, F., Gargano, N., Mastrogiacomo, A., Grasso, A., 1995. The cloning of a cDNA encoding a protein (latrodectin) which co-purifies with the alpha-latrotoxin from the black widow spider Latrodectus tredecimguttatus (Theridiidae). Eur. J. Biochem. 230, 322–328.
Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786.
Posnien, N., Zeng, V., Schwager, E.E., Pechmann, M., Hilbrant, M., Keefe, J.D., Damen, W.G., Prpic, N.M., McGregor, A.P., Extavour, C.G., 2014. A comprehensive reference transcriptome resource for the common house spider Parasteatoda tepidariorum. PLoS ONE 9, e104885.
Price, M.N., Dehal, P.S., Arkin, A.P., 2010. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490.
Rémy, C., Girardie, J., 1980. Anatomical organization of two vasopressin–neurophysin-like neurosecretory cells throughout the central nervous system of the migratory locust. Gen. Comp. Endocrinol. 40, 27–35.
Richards, S., Murali, S.C., 2015. Best practices in insect genome sequencing: what works and what doesn’t. Curr. Opin. Insect Sci. 7, 1–7.
Roch, G.J., Busby, E.R., Sherwood, N.M., 2011. Evolution of GnRH: diving deeper. Gen. Comp. Endocrinol. 171, 1–16.
Roller, L., Yamanaka, N., Watanabe, K., Daubnerová, I., Žitňan, D., Kataoka, H., Tanaka, Y., 2008. The unique evolution of neuropeptide genes in the silkworm Bombyx mori. Insect Biochem. Mol. Biol. 38, 1147–1157.
Rosenkilde, C., Cazzamali, G., Williamson, M., Hauser, F., Søndergaard, L., DeLotto, R., Grimmelikhuijzen, C.J.P., 2003. Molecular cloning, functional expression, and gene silencing of two Drosophila receptors for the Drosophila neuropeptide pyrokinin-2. Biochem. Biophys. Res. Commun. 309, 485–494.
Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A., Barrell, B., 2000. Artemis: sequence visualization and annotation. Bioinformatics 16, 944–945.
Sanggaard, K.W., Bechsgaard, J.S., Fang, X., Duan, J., Dyrlund, T.F., Gupta, V., Jiang, X., Cheng, L., Fan, D., Feng, Y., Han, L., Huang, Z., Wu, Z., Liao, L., Settepani, V., Thøgersen, I.B., Vanthournout, B., Wang, T., Zhu, Y., Funch, P., Enghild, J.J., Schauser, L., Andersen, S.U., Villesen, P., Schierup, M.H., Bilde, T., Wang, J., 2014. Spider genomes provide insight into composition and evolution of venom and silk. Nat. Commun. 5, 3765.
Sharma, P.P., Kaluziak, S.T., Pérez-Porro, A.R., González, V.L., Hormiga, G., Wheeler, W.C., Giribet, G., 2014. Phylogenomic interrogation of arachnida reveals systemic conflicts in phylogenetic signal. Mol. Biol. Evol. 31, 2963–2984.
Sievers, F., Wilm, A., Dineen, D., Gibson, T.J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J.D., Higgins, D.G., 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539.
Simakov, O., Marletaz, F., Cho, S.J., Edsinger-Gonzales, E., Havlak, P., Hellsten, U., Kuo, D.H., Larsson, T., Lv, J., Arendt, D., Savage, R., Osoegawa, K., de Jong, P., Grimwood, J., Chapman, J.A., Shapiro, H., Aerts, A., Otillar, R.P., Terry, A.Y., Boore, J.L., Grigoriev, I.V., Lindberg, D.R., Seaver, E.C., Weisblat, D.A., Putnam, N.H., Rokhsar, D.S., 2008. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531.
Šimo, L., Park, Y., 2014. Neuropeptidergic control of the hindgut in the black-legged tick Ixodes scapularis. Int. J. Parasitol. 44, 819–826.
Stangier, J., Hilbich, C., Burdzik, S., Keller, R., 1992. Orcokinin: a novel myotropic peptide from the nervous system of the crayfish, Orconectes limosus. Peptides 13, 859–864.
Sterkel, M., Oliveira, P.L., Urlaub, H., Hernandez-Martinez, S., Rivera-Pomar, R., Ons, S., 2012. OKB, a novel family of brain-gut neuropeptides from insects. Insect Biochem. Mol. Biol. 42, 466–473.
Stewart, M.J., Favrel, P., Rotgans, B.A., Wang, T., Zhao, M., Sohail, M., O’Connor, W.A., Elizur, A., Henry, J., Cummins, S.F., 2014. Neuropeptides encoded by the genomes of the Akoya pearl oyster Pinctada fucata and Pacific oyster Crassostrea gigas: a bioinformatic and peptidomic survey. BMC Genomics 15, 840.
Sundström, G., Dreborg, S., Larhammar, D., 2010. Concomitant duplications of opioid peptide and receptor genes before the origin of jawed vertebrates. PLoS ONE 5, e10512.
Tanaka, Y., Suetsugu, Y., Yamamoto, K., Noda, H., Shinoda, T., 2014. Transcriptome analysis of neuropeptides and G-protein coupled receptors (GPCRs) for neuropeptides in the brown planthopper Nilaparvata lugens. Peptides 53, 125–133.
Taussig, R., Kaldany, R.R., Scheller, R.H., 1984. A cDNA clone encoding neuropeptides isolated from Aplysia neuron L11. Proc. Natl. Acad. Sci. USA 81, 4988–4992.
Tyrer, N.M., Davis, N.T., Arbas, E.A., Thompson, K.S., Bacon, J.P., 1993. Morphology of the vasopressin-like immunoreactive (VPLI) neurons in many species of grasshopper. J. Comp. Neurol. 329, 385–401.
Veenstra, J.A., 1989. Isolation and structure of corazonin, a cardioactive peptide from the American cockroach. FEBS Lett. 250, 231–234.
Veenstra, J.A., 2000. Mono- and dibasic proteolytic cleavage sites in insect neuroendocrine peptide precursors. Arch. Insect Biochem. Physiol. 43, 49–63.


Veenstra, J.A., 2009a. Allatostatin C and its paralog allatostatin double C: the Arthropod somatostatins. Insect Biochem. Mol. Biol. 39, 161–170.
Veenstra, J.A., 2009b. Does corazonin signal nutritional stress in insects? Insect Biochem. Mol. Biol. 39, 755–762.
Veenstra, J.A., 2010. Neurohormones and neuropeptides encoded by the genome of Lottia gigantea, with reference to other mollusks and insects. Gen. Comp. Endocrinol. 167, 86–103.
Veenstra, J.A., 2011. Neuropeptide evolution: neurohormones and neuropeptides predicted from the genomes of Capitella teleta and Helobdella robusta. Gen. Comp. Endocrinol. 171, 160–175.
Veenstra, J.A., 2014. The contribution of the genomes of a termite and a locust to our understanding of insect neuropeptides and neurohormones. Front. Physiol. 5, 454.
Veenstra, J.A., 2015. The power of next-generation sequencing as illustrated by the neuropeptidome of the crayfish Procambarus clarkii. Gen. Comp. Endocrinol. 224, 84–95.
Veenstra, J.A., Rombauts, S., Grbić, M., 2012. In silico cloning of genes encoding neuropeptides, neurohormones and their putative G-protein coupled receptors in a spider mite. Insect Biochem. Mol. Biol. 42, 277–295.
Vogel, K.J., Brown, M.R., Strand, M.R., 2015. Ovary ecdysteroidogenic hormone requires a receptor tyrosine kinase to activate egg formation in the mosquito Aedes aegypti. Proc. Natl. Acad. Sci. USA 112, 5057–5062.
Yamanaka, N., Žitňan, D., Kim, Y.J., Adams, M.E., Hua, Y.J., Suzuki, Y., Suzuki, M., Suzuki, A., Satake, H., Mizoguchi, A., Asaoka, K., Tanaka, Y., Kataoka, H., 2006. Regulation of insect steroid hormone biosynthesis by innervating peptidergic neurons. Proc. Natl. Acad. Sci. USA 103, 8622–8627.
Yamanaka, N., Yamamoto, S., Žitňan, D., Watanabe, K., Kawada, T., Satake, H., Kaneko, Y., Hiruma, K., Tanaka, Y., Shinoda, T., Kataoka, H., 2008. Neuropeptide receptor transcriptome reveals unidentified neuroendocrine pathways. PLoS ONE 3, e3048.