Neuropeptide evolution: Neurohormones and neuropeptides predicted from the genomes of Capitella teleta and Helobdella robusta


General and Comparative Endocrinology 171 (2011) 160–175




General and Comparative Endocrinology journal homepage: www.elsevier.com/locate/ygcen

Jan A. Veenstra ⇑
Université de Bordeaux, INCIA UMR 5287 CNRS, 33400 Talence, France

Article info

Article history: Received 12 October 2010; Revised 4 January 2011; Accepted 10 January 2011; Available online 15 January 2011

Keywords: Glycoprotein hormone; Insulin; Conopressin; NPY; Allatostatin; Achatin; Allatotropin; RGWamide; FMRFamide; Luqin; Tachykinin; Alvinella; Hirudo; Pomatoceros

Abstract

Genes encoding neurohormones and neuropeptide precursors were identified in the genomes of two annelids, the leech Helobdella robusta and the polychaete worm Capitella teleta. Although no neuropeptides have been identified from these two species and relatively few neuropeptides from annelids in general, 43 and 35 such genes were found in Capitella and Helobdella, respectively. The predicted peptidomes of these two species are similar to one another and also similar to those of mollusks, particularly in the case of Capitella. Helobdella seems to have fewer neuropeptide genes than Capitella and it lacks the glycoprotein hormones bursicon and GPA2/GPB5; in both cases the genes coding the two subunits as well as the genes coding their receptors are absent from its genome. In Helobdella several neuropeptide genes are duplicated; thus it has five NPY genes, including one pseudogene, as well as four genes coding WWamides (allatostatin B). Genes coding achatin, allatotropin, allatostatin C, conopressin, FFamide, FLamide, FMRFamide, GGRFamide, GnRH, myomodulin, NPY, pedal peptides, RGWamide (a likely APGWamide homolog), RXDLamide, VR(F/I)amide, WWamide were found in both species, while genes coding cerebrin, elevenin, GGNG, LFRWamide, LRFYamide, luqin, lymnokinin and tachykinin were only found in Capitella. © 2011 Elsevier Inc. All rights reserved.

1. Introduction

Neuropeptides and neurohormones are often the master regulators of physiological processes. It appears that they evolved very early, since Cnidarians already have a variety of neuropeptides, while they lack a complex nervous system [15]. It is possible that the most primitive nervous systems arose by the physical association and physiological interaction of different cell types producing regulatory peptides. This might explain why some relatively simple nervous systems, such as those in mollusks and nematodes, are particularly rich in peptidergic neurons. It might also explain why the same genes that govern the early differentiation of the central nervous system also do so in gut endocrine cells in both vertebrates and insects [18]. Thus the study of the physiological significance of neuropeptides both within and outside the nervous system in relatively simple animals may be helpful in understanding how central nervous systems evolved into more complex ones, like our own.

⇑ Address: INCIA UMR 5287 CNRS, Université Bordeaux I, Avenue des Facultés, 33405 Talence Cedex, France. Fax: +33 540 008 743. E-mail address: [email protected]
0016-6480/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ygcen.2011.01.005

As a first approach to understanding neuropeptide evolution, it is useful to identify the neuropeptides themselves. Complete genome sequences are useful in this respect, as they allow a first glimpse of which neuropeptides may be produced. In a previous paper I analyzed the genes coding neuropeptides and neurohormones in the mollusk Lottia gigantea [57]. The results included some interesting findings, notably the presence of three types of insulin-related hormones, as well as the presence of bursicon, a hormone typically associated with arthropods, and for which the function in mollusks remains unclear. Comparison of the Lottia genome with the various sequenced insect genomes showed that mollusks and insects share a significant number of neuropeptide genes. This similarity between the predicted molluscan and arthropod peptidomes suggested that it might be possible to analyze the neuropeptide genes of other protostomians such as annelids by using the sequences of molluscan and arthropod neuropeptides as search motifs, even though few neuropeptides have actually been identified from this group. The availability of the assembled genomes of Capitella teleta and Helobdella robusta makes it possible to test this hypothesis as well as to compare neuropeptide evolution in lophotrochozoans with that in arthropods, where several, mostly


insect, genomes have been sequenced and analyzed for neuropeptide genes [19,22,25,29,44]. The polychaete Capitella are bristle worms that live in sand or mud near the shore and, as opportunistic species, are often used as indicators of environmental pollution. It is perhaps not surprising that a species thriving under environmentally challenging conditions can be easily kept in the laboratory and has been adopted as a new model in developmental biology. It was only after sequencing of its genome was finished that its taxonomic position was clarified and it was described as a new species, C. teleta [3]. The leech H. robusta is similarly an emerging model in developmental biology; it is used as such because it is a relatively tractable representative of the spirally cleaving taxa. H. robusta is a non-blood-sucking leech that feeds on snails and as such it is not closely related to the better known medicinal leech, Hirudo medicinalis; these two species belong to different orders, the Rhynchobdellida and the Arhynchobdellida, respectively. Like those of L. gigantea and C. teleta, its genome is relatively small, and this was a major factor in choosing these three species for genome sequencing.

2. Materials and methods

The BLAST program [1] was downloaded from (www.ncbi.nlm.nih.gov/blast/Blast.cgi) to analyze the scaffolds of the C. teleta and H. robusta genomes, which were downloaded from http://genome.jgi-psf.org/Capca1/Capca1.download.ftp.html and http://genome.jgi-psf.org/Helro1/Helro1.download.ftp.html. On occasion searches were also done on the Helobdella and/or Capitella genome or gene models using the BLAST interface on the web site of the Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/). Known molluscan and arthropod neuropeptides and neurohormones were used as bait in the BLAST programs. These sequences are the various molluscan neuropeptides reported in [57], the various insect neuropeptides reported in [19,22,25,29,44] and also the leech neuropeptides described by Salzet [46]. For the smaller peptides, small conserved C-terminal sequences, together with Gly for C-terminal amidation where appropriate and likely dibasic cleavage sites, were used with tblastn with e values varying from 100 to 1,000,000. For larger hormones such as GPA2, GPB5 and insulin either complete sequences (GPA2, GPB5) or both complete and partial sequences (insulin) were used with lower e values. Once possible neuropeptide sequences were identified, attempts were made to find a plausible mRNA sequence containing a signal peptide. In order to be included in the list of putative neuropeptide precursors, a sequence needed to have not only a possible neuropeptide sequence but also a signal peptide and either significant homology to known arthropod or lophotrochozoan neuropeptides or multiple paracopies on the precursor. The extensive use of all permutations of insect neuropeptides almost automatically also included the large majority of crustacean and chelicerate neuropeptides. In the case of crustacean hyperglycemic hormone and its various homologs a number of sequences were used, but like the insect neuroparsin sequences, they did not yield annelid homologs.
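The way short search baits are assembled from a conserved C-terminal motif can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used: the function name `make_bait`, the default dibasic site `KR` and the two example motifs are choices made here for the sketch.

```python
def make_bait(c_terminal_motif: str, amidated: bool = True, dibasic: str = "KR") -> str:
    """Build a short protein query from a conserved C-terminal motif.

    Hypothetical helper: the motif is followed by Gly (the precursor residue
    that donates the C-terminal amide) and by an assumed dibasic cleavage site.
    """
    motif = c_terminal_motif.upper()
    if not motif.isalpha():
        raise ValueError("motif must contain only amino acid letters")
    amide_donor = "G" if amidated else ""
    return motif + amide_donor + dibasic


# Example motifs chosen for illustration only.
baits = {name: make_bait(motif)
         for name, motif in {"FMRFamide": "FMRF", "RGWamide": "RGW"}.items()}

for name, bait in baits.items():
    print(f">{name}_bait\n{bait}")
```

Because such baits are only a handful of residues long, they produce hits only at very permissive e values, which is consistent with the large e values quoted above.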
Finally, every single neuropeptide gene found in Helobdella was used as a search motif to search the Capitella genome, and vice versa. All the putative annelid neuropeptide gene sequences found were also used to probe annelid EST sequences, in order to see whether evidence for homologous genes in other annelids exists. Two collections of EST sequences were particularly useful in this respect: those generated from Pomatoceros lamarckii, a polychaete worm [53], and those published recently from the leech H. medicinalis [32]; a number of unpublished ESTs from the polychaete Alvinella pompejana were also found in the databases. Much use was made of the Artemis program [45] in analyzing these genomes; this program is available at http://www.sanger.ac.uk/resources/software/artemis/. Clustal W [27] was used


for sequence alignments at http://www.ebi.ac.uk/Tools/clustalw2/index.html. The presence of signal peptides in predicted neuropeptide precursors was analyzed by two different algorithms, predominantly SignalP [2], used at http://www.cbs.dtu.dk/services/SignalP/, and occasionally also Signal-3L [49], used at http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/. Yellow highlighting has been used to indicate predicted signal peptides in Figs. 1, 3, 10, 12, 17 and 19. The rules described for convertase cleavage sites in vertebrate [9,43] and insect neuropeptide precursors [55] were used to predict proteolytic cleavage of the putative precursors. These rules were applied in the same fashion as for the Lottia genome, and more detail can be found in that publication [57]. All predicted neuropeptides from the Helobdella and Capitella genomes are listed in a table in the Supplementary data. The Capitella and Helobdella genome assemblies are still in their first versions and both genomes show evidence for allelic variation. Consequently, interpretations have been conservative. For example, there are two scaffolds showing a Capitella GPB5 gene with very few sequence differences; these have been assumed to represent the same gene. The same happens with the insulin genes 4 and 5 of the same species, of which partial copies are present on a different scaffold. Prediction of how genomic sequences will be processed into mRNA and subsequently translated into proteins is always hazardous. In this case this is even more so, as the number of ESTs and/or cDNA sequences to indicate the processing of the introns is limited, particularly in Helobdella. It is often difficult to decide which potential splice site is more likely to be used. Genes that have ESTs allow for less ambiguous identification of intron–exon splice sites. Putative splice sites were chosen in such a way that predicted precursors would contain a hydrophobic signal peptide and a precursor sequence yielding neuropeptides.
It is likely that in some cases the predictions will turn out to be wrong, but as far as the final neuropeptides and hormones produced are concerned, this should be of little consequence.

3. Results

All the neuropeptide and neurohormone genes identified here were found by their similarity to known molluscan and/or arthropodan neuropeptides. Most of the predicted peptides are clearly homologous to known protostomian neuropeptides, although in a few cases the similarity is too limited to be sure that it reflects genuine homology. A few of the predicted peptides or their homologs have also been identified from annelids, e.g. GGNG [33,36,42], conopressins [41,47] and the allatotropin homologs [17,28], while the sequence of a cerebrin homolog was deduced previously from an EST [53]; however, for several of the neuropeptide genes described here this is the first record in annelids.

3.1. Glycoprotein hormones

In Capitella the genes encoding the bursicon subunits (Fig. 1) are present on the same scaffold at a distance of about 8000 bp. Although there are no ESTs for bursicon A, such sequences do exist for bursicon B (EY633285, EY633286, EY578920, EY578921). The Capitella genome also contains genes encoding a GPA2/GPB5 dimer (Fig. 1). Interestingly, this genome contains a third gene encoding a glycoprotein hormone B, which has been called GPB6. There are two interesting aspects of this gene. First, it is located about 30,000 bp from the GPA2 gene. As in many species the GPA2 and GPB5 genes are close to one another on the same chromosome [10], this suggests that once the GPB5 gene got duplicated elsewhere in the genome, it was the original GPB5 gene that evolved the most. The second curiosity of the GPB6 gene is the ESTs that show


Fig. 1. Predicted preprohormones for Capitella genes bursicon A, bursicon B, GPA2, GPB5, GPB6 and insulins 1–6 as well as the four Helobdella insulins. Yellow is used to indicate the predicted signal peptides. Dark red has been used for likely convertase cleavage sites and basic amino acid residues that will subsequently be removed by a carboxypeptidase; basic amino acid residues which are at the C-terminus of a preprohormone are therefore also in dark red. Cleavage at a single Arg or Lys residue in neuropeptide precursors is facilitated by a basic amino acid in positions -4, -6 and -8; such residues have been colored light red to emphasize that the following single Arg residues are likely to be cleaved. Sometimes cleavage at putative convertase cleavage sites is hard to predict, and when this is the case such residues are made light red. Orange has been used to localize the cysteines and green for glycine residues likely to be converted into a C-terminal amide in the mature peptide. The parts of the precursors predicted to be transformed into biologically active neuropeptides are highlighted in light blue. The asterisks indicate a stop codon in the corresponding genomic sequence and the triangles the intron locations. Two black circles in front of a sequence indicate the existence of ESTs consistent with the proposed precursor structure. Two black diamonds indicate precursors that could be produced, but where the genes encoding them might well be pseudogenes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
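The color coding in Fig. 1 follows a small set of rules that can be approximated in code. The sketch below is a simplified illustration and not the published rule set of [9,43,55]: it assumes that the single basic residue counts as position -1, so that positions -4, -6 and -8 lie 3, 5 and 7 residues upstream, and the toy precursor is invented for the example.

```python
import re


def is_basic(seq: str, i: int) -> bool:
    """True if index i lies inside seq and holds Lys or Arg."""
    return 0 <= i < len(seq) and seq[i] in "KR"


def dibasic_sites(precursor: str) -> list[int]:
    """Start indices (0-based) of runs of two or more basic residues."""
    return [m.start() for m in re.finditer(r"[KR]{2,}", precursor)]


def single_basic_sites(precursor: str) -> list[int]:
    """Isolated Lys/Arg with a basic residue at position -4, -6 or -8.

    Assumption: the candidate residue itself is position -1.
    """
    hits = []
    for i in range(len(precursor)):
        if not is_basic(precursor, i):
            continue
        if is_basic(precursor, i - 1) or is_basic(precursor, i + 1):
            continue  # belongs to a dibasic run, handled separately
        if any(is_basic(precursor, i - k) for k in (3, 5, 7)):
            hits.append(i)
    return hits


def predicted_peptides(precursor: str) -> list[str]:
    """Split at dibasic runs only; a C-terminal Gly is reported as an amide."""
    peptides = []
    for fragment in re.split(r"[KR]{2,}", precursor):
        if not fragment:
            continue
        if fragment.endswith("G"):
            peptides.append(fragment[:-1] + "amide")
        else:
            peptides.append(fragment)
    return peptides


toy = "AAFLGKRSSRAADYFLGRRDD"  # invented sequence, no signal peptide
print(dibasic_sites(toy), single_basic_sites(toy), predicted_peptides(toy))
```

The sketch deliberately ignores signal peptide removal and does not cut at the single basic sites it reports, since the caption itself notes that such cleavages are harder to predict.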

it to be expressed. There are four ESTs (EY647135, EY647134, EY647191, EY646357), of which at least two are independent, that all show that the acceptor splice site is not the one expected and

predicted based on homology of GPB6 with other glycoprotein hormones, but another one that leads to a truncated GPB6; both sequences are presented in Fig. 1. Having at least two independent


ESTs showing this “wrong” splice site, and none showing the “right” splice site, suggests that the GPB6 gene is expressed and that at the very least a significant portion of the mRNAs from this gene may be spliced this way. It would be very interesting to know what the physiological significance of this is. ESTs coding GPA2 (FP515416) and GPB5 (e.g. FP529656, GO148390, FP504380) also exist from the polychaete worm A. pompejana. Genes encoding glycoprotein hormones are absent from the genome of Helobdella. Absence of a particular gene in an assembled genome does not necessarily mean that the gene is really absent; it might reflect an artifact of the DNA isolation or an error in the genome assembly. When a hormone or neuropeptide is absent, there is no reason to maintain the integrity of the gene(s) coding its receptor(s), and hence one would expect the receptors of these two hormones to be similarly absent if the hormones themselves are indeed lacking. Both bursicon and GPA2/GPB5 activate leucine-rich repeat G-protein coupled receptors (GPCRs) which have been identified in Drosophila [31,34,52]. When those sequences are used to probe the Helobdella genome with BLAST, no homologous genes can be found, even though genes encoding other GPCRs and leucine-rich repeat containing proteins are found and the putative homologs of the bursicon and GPA2/GPB5 receptors are easily identified in the Capitella genome this way.

3.2. Insulins

A total of six putative insulin genes were found in Capitella and four in Helobdella (Fig. 1). With the exception of the insulin 6 gene of Capitella, all the other genes code for an insulin-like peptide having four rather than three disulfide bridges, like several molluscan insulins [11,50,51].
In Capitella three of these genes, numbers 1, 4 and 5, encode a precursor that is predicted to yield, in addition to the connecting peptide and the A and B chains, a fourth peptide from its C-terminal part, the D-peptide, as in some molluscan insulins [11]. A single EST for Capitella insulin 1, sequenced in both directions (EY616772, EY616773), confirms the expression of this gene and, as it does not reveal the presence of an unrecognized splice site, it also supports the existence of the D-peptide. Such a D-peptide in the insulin precursor is also found in an EST (FP538718) from Alvinella and may thus be common in annelids. No ESTs were found for insulins 2 and 3. The sequences of these two genes show structural features which


suggest that they are probably not, or no longer, physiologically important and may be pseudogenes. Thus, there is an allelic variant of the Capitella insulin 2 gene which has a one-nucleotide deletion in its coding sequence (Fig. 2) which would lead to a severely truncated form of insulin unlikely to have any biological activity; this is the allele that got incorporated into the genome assembly. The predicted cysteine residue in the C-peptide of insulin 3 (Fig. 1) can be expected to interfere with efficient folding of the insulin precursor in the endoplasmic reticulum, and this insulin may, therefore, have little biological activity. The Capitella insulin 4 gene has two ESTs (EY594143, EY572387), but none were found for the Capitella insulin 5 or 6 genes. The Helobdella insulin 1 gene has four ESTs (EY328301, EY336397, EY336398, EY328300) corresponding to two independent clones sequenced in both directions. These ESTs are all derived from a different allele than the one from which the genome sequence was derived, as there are consistently 14 nucleotide substitutions leading to seven amino acid substitutions in the predicted insulin 1 precursor. The Capitella insulin 6 gene (Fig. 1) is very different from the other nine annelid insulin genes described here. First of all, the predicted insulin has only three disulfide bridges and thus more closely resembles the vertebrate and arthropod insulins. However, unlike vertebrate insulin, the predicted preproinsulin from this gene does not have a connecting peptide and as such it is similar to Drosophila insulin-like peptide 6, Tribolium insulin-like peptide 3 and the honey bee insulin encoded by GB17332. Others have recognized these three insulins as forming a separate subgroup of insect insulins [29], and it looks like Capitella insulin 6 is similar to these molecules. The predicted Capitella insulin 6 precursor has two putative convertase cleavage sites separated by a single amino acid residue (Fig. 1), which together also conform to the furin consensus cleavage site (RXRR), suggesting that this precursor might get cleaved into A and B chains either by neuroendocrine convertases or by the ubiquitously expressed general convertase furin.
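Scanning a precursor for the furin consensus mentioned above is a simple pattern match. The following minimal sketch assumes the consensus is taken literally as R-X-R-R with X any residue; the function name and the test sequence are invented for the example and are not the Capitella insulin 6 sequence.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
FURIN_CONSENSUS = re.compile(r"(?=(R.RR))")


def furin_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched tetrapeptide) for every RXRR occurrence."""
    return [(m.start(), m.group(1)) for m in FURIN_CONSENSUS.finditer(sequence.upper())]


print(furin_sites("GSRARRAL"))  # invented sequence
```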

3.3. Achatin

In mollusks several neuropeptides are known to have a D-amino acid in the active peptide [23,37,40]. Achatin [23] is one of those and the presence in both Capitella and Helobdella of genes encoding

Fig. 2. Partial sequencing traces, obtained from the trace archives at NCBI, of the two alleles of Capitella insulin gene 2. The top trace (gnl|ti|1085064814) depicts the allele incorporated into the genome assembly, from which a T has been deleted and which would yield a truncated insulin. The bottom trace (gnl|ti|1349786684) shows a sequencing trace from the allele which could yield the insulin 2 preprohormone depicted in Fig. 1. Note that the sequence shown starts with the last part of the intron (nucleotides in lower case) and continues with the coding sequence for the second exon (nucleotides in capitals); their conceptual translation into amino acid residues has also been indicated.
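Why the loss of a single nucleotide, as in the top trace of Fig. 2, truncates the predicted protein can be shown with a toy example. The sketch below assumes the standard genetic code and uses an invented 15-nucleotide sequence, not the actual Capitella insulin 2 sequence; the function names are likewise chosen here for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in frame 1 and stop at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(dna: str, index: int) -> str:
    """Return dna with the nucleotide at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]


intact = "TGCTTGAAATGTGGC"        # invented: TGC TTG AAA TGT GGC
deleted = delete_base(intact, 3)  # drops the first T of TTG

print(translate(intact))   # full-length toy peptide
print(translate(deleted))  # frame shifts, TGA stop follows the first codon
```

In the toy case the deletion turns the second codon into TGA, so translation ends after one residue; in a real gene the shifted frame usually runs into a stop codon within a short distance in the same way.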


an achatin precursor (Fig. 3) suggests that, like mollusks, annelids perhaps also produce D-amino-acid-containing neuropeptides.

3.4. Allatostatin C

The somatostatin homolog allatostatin C is a neuropeptide initially isolated from the moth Manduca sexta [26] and is generally present in arthropods [56] as well as mollusks [57]. A gene coding such a peptide was found in the genome of both annelids (Fig. 3); the predicted peptides are more similar to the molluscan than the arthropodan allatostatin Cs (Fig. 4). There are some Capitella allatostatin C ESTs incorrectly labeled as being Helobdella ESTs, e.g. EY334263. Allatostatin C ESTs also exist for H. medicinalis (e.g. FP643552, FP602644, FP646710).

3.5. Allatotropin

The first allatotropin was identified from M. sexta, where it stimulates the production of juvenile hormone [24]; similar peptides have been found in mollusks [17,28,57]. Helobdella and Capitella each have one gene coding such a peptide, which is structurally more similar to the molluscan than to the arthropodan homologs (Fig. 5). Allatotropin ESTs were also found for Alvinella (FP538439, GO162758) and Pomatoceros (GR308811).

3.6. Cerebrin or PDF

It has been suggested previously [57] that the molluscan neuropeptide cerebrin might be a homolog of the arthropod

Fig. 3. Predicted preprohormones from Capitella and Helobdella genes coding achatin, allatostatin C, allatotropin, cerebrin, conopressin, elevenin, FFamide and FLamide. Colors and symbols as in Fig. 1.


neuropeptide PDF (pigment dispersing factor). A gene encoding a cerebrin homolog was found in Capitella (Fig. 3) but not in Helobdella; the Capitella peptide is more similar to molluscan cerebrin than to its putative arthropod homologs (Fig. 6); cerebrin-encoding ESTs were also found for Pomatoceros (GR311045) and Alvinella (e.g. FP555990, GO159770, GO159771).

3.7. Conopressin

In both Capitella and Helobdella a single gene was found encoding [Ile8]-conopressin, a vasopressin-like peptide [8], and its associated neurophysin (Fig. 3). The sequence is very similar to the identified vasopressin-like peptide from the pharyngobdellid leech Erpobdella octoculata, [Lys8]-conopressin [47], as well as [Val3, Thr8]-conopressin which was identified from Eisenia fetida [41]. Other annelid conopressins can be predicted from ESTs; thus in Alvinella we find [Met8]-conopressin (e.g. GO219782, GO191729, GO172063), in Hirudo [Leu8]-conopressin (FP601759, FP628609, FP660176), and in Lumbricus terrestris both CIVHNCPVGamide (DR008870) and CFMADCPRGamide (DR009555, DR009404) are present (Fig. 7). The presence of a second convertase cleavage site in the Capitella conopressin–neurophysin precursor suggests the production of

Fig. 4. Sequences of allatostatin C from Hirudo, Helobdella and Capitella aligned with homologous peptides from the mollusks Lottia gigantea, Argopecten irradians, Mytilus galloprovincialis and Crassostrea gigas and the insects Drosophila melanogaster and Apis mellifera. Non-annelid sequences from [57]. Cysteine residues are colored orange, other conserved amino acid residues are in black and conservative replacements in gray. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Sequence alignment of the predicted annelid allatotropin sequences with homologous peptides from the mollusks Lottia gigantea, Lymnaea stagnalis, Aplysia californica, Haliotis asinina, Idiosepius paradoxus and Euprymna scolopes and the insects Locusta migratoria and Manduca sexta. Non-annelid sequences from [57]. Colors as in Fig. 4.

Fig. 6. Sequence alignment of annelid cerebrins with putative homologous peptides from the mollusks Lottia gigantea, Lymnaea stagnalis, Aplysia californica and Crassostrea gigas and the arthropods Cancer productus, Schistocerca gregaria and Drosophila melanogaster. Non-annelid sequences from [57]. Colors as in Fig. 4.

Fig. 7. Annelid conopressin sequences aligned with sequences of homologous peptides from mollusks, insects and vertebrates. Non-annelid sequences from [19,29,57]. Colors as in Fig. 4.

Fig. 8. Sequence alignment of Capitella and Alvinella elevenin and its putative homologs from the mollusks Lottia gigantea, Aplysia californica, Idiosepius paradoxus and Crassostrea gigas [57] and the nematode Caenorhabditis elegans. Non-annelid sequences from [57]. Colors as in Fig. 4.

Fig. 9. Sequence alignment of predicted Capitella and Helobdella FFamides with FFamide and LFamide from the mollusk Lottia gigantea and Drosophila SIFamide. Although only one arthropod SIFamide is shown, the peptide is very well conserved within arthropods [59]. Non-annelid sequences from [57]. Colors as in Fig. 4.


an additional peptide. This peptide is predicted to have both its C- and N-termini protected from rapid exopeptidase degradation by a C-terminal amide and N-terminal pyroglutamate, respectively, which indicates that it might have biological activity.

3.8. Elevenin

The neuropeptide elevenin has been found in various molluscan species and a homolog is present in Caenorhabditis elegans [57]. The Capitella genome encodes an elevenin neuropeptide (Fig. 3) which is very similar to its molluscan homologs (Fig. 8), but such a gene

was not found in the Helobdella genome. An elevenin EST was found for Alvinella (FP533670) but not for Capitella.

3.9. FFamide

FFamide is likely the lophotrochozoan homolog of arthropod SIFamide [57], a peptide which in Drosophila modulates sexual behavior [54]. As has been noted previously [59], the C-terminal part of the precursor contains a well conserved sequence encompassing a disulfide bridge, which is also conserved in both Lottia [57] and Capitella (Fig. 3), but is lacking in Helobdella, where this

Fig. 10. Predicted preprohormones from Capitella and Helobdella genes coding FMRFamide and FMRFamide related peptides, FVRIamide, FVRFamide, and FVamide. Colors and symbols as in Fig. 1.


part of the precursor has been replaced by another paracopy of FFamide (Fig. 3). The annelid FFamides are more similar to the molluscan FFamides than to arthropod SIFamide (Fig. 9). In Helobdella there is potentially also an LFamide peptide made by this precursor, but although it has some sequence similarity to the other three FFamides on the same precursor, it is significantly different from them and, therefore, may be without significant biological activity (Fig. 3).

3.10. FLamide

The Helobdella genome contains two genes coding for FLamide-containing neuropeptide precursors, while the Capitella genome contains one such gene. Whereas the consensus sequence of the FLamides in Capitella is AKYFLamide, that in the homologous gene from Alvinella is ARMFIamide, as shown by an EST (FP517610). The signal peptide of the Helobdella FLamide 1 precursor (Fig. 3) is highly unusual, but SignalP [2] still predicts it to have a 0.437 probability of having a signal peptide and exactly the same probability of having a signal anchor, whereas Signal-3L [49] concludes it does not have a signal peptide. There are four ESTs, of which two are independent, containing the cDNA of this putative signal peptide. Comparison with the genomic sequence gives a few sequence differences, but inspection of the individual traces of the genomic sequence shows the latter to be reliable and no alternative signal peptides are apparent. The signal peptide from the homologous neuropeptide precursor from H. medicinalis can be obtained from ESTs (e.g. FP667124, FP643328, FP593916, FP599500) and is normal; there is also a Hirudo EST for the homologous FLamide 1 gene (FP612284). The two Helobdella FLamide precursors (Fig. 3) distinguish themselves from the Capitella (and Alvinella) FLamide precursor by the presence of six cysteines that likely form three disulfide bridges.
The similar, but not identical, spacing of the cysteine residues in these parts of the two Helobdella FLamide precursors, as well as other conserved residues between them, suggest that these two genes are derived from the same ancestral gene; the homologous Hirudo precursors also have those six cysteine residues.

3.11. FMRFamide and FMRFamide-like

There are two genes encoding FMRFamide and/or related peptides in Capitella and three such genes in Helobdella (Fig. 10). The Capitella FMRFamide gene codes for a precursor containing 19 paracopies with the consensus FMRFamide sequence; a Pomatoceros homolog of this gene is represented by an EST (GR310632). The Capitella FMKFamide gene codes for 23 paracopies of FMKFamide. A database search reveals four ESTs of this gene which are attributed to Helobdella, but no doubt represent Capitella sequences, as the corresponding nucleotide sequences are not present in the Helobdella genome but match the Capitella FMKFamide gene 100%. Helobdella has three genes coding FMRFamide-like neuropeptides; however, these genes code for only two or three paracopies. Various Hirudo ESTs (e.g. FP622225, FP636408, FP598883) code for an FMRFamide precursor quite similar to the Helobdella one, while other ESTs (FP601409, FP610894, FP646821) code for a likely homolog of the Helobdella FMRFamide-like 2 precursor.


3.12. FVRIamide and FVRFamide

Both the Capitella and the Helobdella genomes contain a gene encoding N-terminally extended FVRIamides (Fig. 10) and the expression of homologous genes from Pomatoceros and Hirudo is attested by ESTs (GR309733, GR308449, GR308867 and FP655083, FP600169, FP637785, respectively). Obviously, the structures of these peptides are reminiscent of FMRFamide, particularly in the Helobdella FVRFamide gene where two of the four paracopies have a FVRFamide C-terminal sequence (Fig. 10).

3.13. FVamide

It was suggested previously [57] that the Lottia PXFVamides are homologs of the Mytilus inhibitory peptides [13,14], as the residues that are identical between these peptides are those that are essential for biological activity of the latter. Six paracopies predicted from the Capitella FVamide gene (Fig. 10) have the same amino acid residues conserved, and hence the Capitella FVamide gene is considered homologous to the molluscan PXFVamide genes. However, it appears that in Capitella this gene codes for two different sets of peptides on different precursors which share the same signal peptide. The Capitella FVamide gene contains two different exons, which are separated by more than 9000 bp. The second exon is likely to use alternative 3′-splice junctions, which are separated by 810 nucleotides. When the first acceptor site is used, a peptide precursor is predicted producing PXFVamides; when the second 3′-splice site is used, the precursor contains no FVamides, but EFLGamides (Figs. 10 and 11). All ESTs found used the first splice site. Although these mRNAs also contain the sequence coding for the EFLGamides, this sequence should not be translated into peptides, as it is out of frame and contains stop codons. A more careful look at the PXFVamide gene in L. gigantea shows that alternative splicing also happens in this species, while ESTs suggest this is likely also the case in Crassostrea. However, alternative splicing of the FVamide gene appears not to occur in Helobdella.

3.14. FX1DFLX2amide

In some putative neuropeptide genes the consensus sequence is not always clearly discernible. This is the case for the Capitella FX1DFLX2amide gene (Fig. 12). The first F is often a Y, and L is an aliphatic amino acid. This gene has no clear homologs in Helobdella, mollusks or arthropods; however, it is likely an authentic neuropeptide gene, as it has a signal peptide, various similar sequences separated by convertase cleavage sites, and deduced neuropeptides with C-terminal Gly residues predicted to be transformed into amides. A likely Lumbricus homolog is encoded by an EST (CF810336).

3.15. GGNG

The GGNG peptides have been found in both mollusks and annelids [33,36,42,57] and have been suggested to be homologs

Fig. 11. Schematic representation of the alternative splice sites used for the second exon of the Capitella FVamide gene. The first exon codes for the signal peptide; for the second exon there are two alternative 3′-splice sites. The first one produces an mRNA species encoding PXFVamides, the second one a neuropeptide precursor encoding EFLGamides. Due to a stop codon (asterisk in the figure) just before the second 3′-splice site, mRNA 1 should not lead to the production of EFLGamides.


J.A. Veenstra / General and Comparative Endocrinology 171 (2011) 160–175

Fig. 12. Predicted preprohormones from Capitella and Helobdella genes coding FX1DFLX2amide, GGNG, GGRFamide, GnRH-like peptides, LFRWamide, LFRYamide, luqin, lymnokinin and myomodulin. Colors and symbols as in Fig. 1.

of the insect CCHamides [57]. A Capitella gene encoding a GGNG was found (Fig. 12), but such a gene was not detected in the Helobdella genome, although various Hirudo GGNG ESTs are present in the databases (e.g. FP660814, FP627085, FP639824). If the predicted Hirudo GGNG precursor is used to probe the Helobdella genome, two pieces of scaffold 2 of the genome assembly are found; however, they are separated by more than 500,000 bp.

This suggests that a chromosome breakage occurred in the middle of the GGNG gene of Helobdella and that a piece of genomic DNA was inserted inside this gene (Fig. 13). Two ESTs encoding the Alvinella GGNG (GO200823, GO165291) show a one amino acid insertion in the peptide sequence. The annelid GGNGs are clearly much more similar to the molluscan than to the arthropodan neuropeptides (Fig. 14).


Fig. 13. Schematic representation of part of scaffold 2 of the Helobdella genome. The bar on top indicates the relative localization of two pieces of DNA, indicated in red, which are separated by about 508,000 bp. The conceptual translations of these two pieces are indicated below and are identified by the BLAST program (tblastn) as having significant homology with Hirudo ESTs for GGNG (e.g. FP660814, FP627085, FP639824). The conceptual translation of these ESTs is indicated in yellow and aligned with the conceptually translated Helobdella genomic sequences 1 and 2. The corresponding genomic sequences are very well covered, solid and leave no possibility for sequencing errors which might mask splice sites. Numbers surrounding the Helobdella genomic sequences 1 and 2 refer to the nucleotides of scaffold 2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 14. Sequence alignment of GGNG peptides from two Hirudo species (H. japonica [36] and H. medicinalis), Capitella and Alvinella compared to predicted molluscan GGNG peptides from Lottia gigantea and Thais clavigera as well as to the CCHamides from the insects Apis mellifera and Drosophila melanogaster [57]. Non-annelid sequences from [57]. Colors as in Fig. 4.

3.16. GGRFamide

Both the Helobdella and Capitella genomes contain a gene encoding one and two copies, respectively, of N-terminally extended GGRFamides (Fig. 12). ESTs from Alvinella (e.g. GO150735, GO150736, GO224610) encode a GGRFamide precursor similar to the one from Capitella. These peptides do not seem to have obvious homologs in either mollusks or arthropods and may be specific annelid neuropeptides.

3.17. GnRH-like peptides

Capitella has three different genes encoding GnRH-like peptides (Fig. 12). The first one is most similar to other lophotrochozoan GnRH genes as far as the predicted neuropeptide goes; in particular, it is quite similar to the molluscan GnRH-like peptides (Fig. 15). ESTs encoding homologs of this peptide in Alvinella (GO221362, GO221361, GO150662) and Hirudo (FP663930, FP671062, FP590501) show this peptide to be reasonably well conserved within annelids; however, a homologous gene was not found in the Helobdella genome.

Fig. 15. GnRHs predicted from annelid genome and EST sequences. The top panel compares the various annelid GnRH-like peptides with one another; the bottom panel shows the strong similarity of Capitella GnRH 1 with the molluscan GnRHs from Lottia gigantea [57], Aplysia californica [60], and Mizuhopecten yessoensis (BAH47639). Colors as in Fig. 4.

The second Capitella GnRH-like gene codes for two peptides, a relatively long GnRH-like peptide of 12 amino acid residues, and a second peptide which lacks the typical N-terminal pyroglutamate. The third gene encodes a single GnRH-like peptide which looks like a combination of the two peptides of the second GnRH gene. Both the similarity of the GnRH-like peptides and that of the signal peptides of these precursors show that these two genes have a common evolutionary origin. Although both the second and third Capitella GnRH-like genes are represented by ESTs in the databases, no ESTs were found for homologous genes from other annelid species, but Helobdella has a gene which looks homologous to the third Capitella GnRH-like gene (Fig. 15).

Fig. 16. Sequence alignment of annelid and molluscan luqin sequences. Non-annelid sequences from [57]. Colors as in Fig. 4.


3.18. LFRWamide and LFRYamide

In Capitella a gene was detected coding for LFRWamides while another gene codes for the closely related LFRYamides (Fig. 12); these genes resemble the LFRFamide and LFRYamide genes from the mollusk L. gigantea [57]. An EST from Alvinella (GO236254, GO236255) codes for two VFRYamides as well as two VRFWamides. Such genes were not found in the Helobdella genome.

3.19. Luqin

Luqin is a molluscan neuropeptide and a gene encoding such a peptide is present in the genome of Capitella (Fig. 12) but was not found in that of Helobdella. ESTs encoding luqin precursors are present in the databases for Alvinella (GO162596, GO112383, GO160483) and Hirudo (FP584925). As reported earlier, the molluscan luqins have a well conserved peptide structure of which

Fig. 17. Predicted preprohormones from Capitella and Helobdella genes coding NPY, pedal peptide, RGWamide and RXDLamide. Colors and symbols as in Fig. 1.


only the first three amino acids differ between the species [57] and the annelid luqins have very similar structures (Fig. 16).

3.20. Lymnokinin

Lymnokinin is the molluscan homolog of the insect leucokinins, neuropeptides which stimulate fluid secretion by the Malpighian tubules as well as hindgut contractions [20]. A lymnokinin gene was found in the genome of Capitella (Fig. 12), but not in that of Helobdella.

3.21. Myomodulin

Molluscan myomodulins are peptides having (L/M)R(L/M)amide C-termini, sequences which are somewhat similar to the insect pyrokinins and periviscerokinins. A single myomodulin gene was found in the Capitella genome, while three such genes were detected in the Helobdella genome (Fig. 12). Homologs of the Capitella myomodulin precursor are encoded by ESTs from Alvinella (GO126848, GO126849, FP506662, FP529367) and Pomatoceros (GR308449, GR309733). In Hirudo there are also at least two myomodulin genes as demonstrated by the various ESTs (e.g. FP624913, FP623612, FP603424 and e.g. FP651886, FP641673, FP53829, respectively); like the Helobdella myomodulin genes, the myomodulin genes in Hirudo code for a small number of myomodulin paracopies.

3.22. NPY

The Capitella genome contains three different NPY genes (Fig. 17), which are likely derived from the same ancestral gene by duplication, as the introns are present at exactly the same nucleotides in each gene. The predicted NPYs from all three genes have a C-terminal Phe-amide, like most arthropod NPFs, but as the molluscan homologs have been called NPY this name was retained rather than NPF. In Helobdella there are five such genes (Fig. 17), four next to one another on the same scaffold in a stretch of 8000 bp. One of these genes is probably a pseudogene, as both splice sites have degenerated and in particular the 5′-splice site is likely to be no longer functional (Fig. 18). ESTs coding NPYs most similar to Capitella NPY1 exist for Alvinella (FP542806, GO145290) and the earthworm Lumbricus rubellus (CO048390); Alvinella has a second NPY which is most similar to Capitella NPY2. For Hirudo there are ESTs representing two different NPY2 genes (e.g. FP616363, FP645854, FP602328 and e.g. FP627449, FP646132, FP632744, respectively).

3.23. Pedal peptides

Mollusks have several genes coding pedal peptides, three in Lottia and four in Aplysia [57]; two such genes were found in the Capitella and one in the Helobdella genome (Fig. 17). Alvinella homologs of the two Capitella pedal peptide genes have ESTs (GO197477, GO197478 and FP490158, respectively) and there are a number of Hirudo ESTs for the pedal peptide 2 gene, which seems to have two alternatively spliced mRNAs (e.g. FP602952, FP602250, FP584519), while a single EST (FP617109) seems to encode a different Hirudo pedal peptide precursor.

3.24. RGWamide

Both the Capitella and Helobdella genomes contain a gene encoding 8 and 2 paracopies of RGWamide, respectively (Fig. 17). These genes likely represent homologs of the molluscan APGWamide genes [57].

3.25. RXDLamide

It is likely that the one Capitella and two Helobdella genes (Fig. 17) coding N-terminally extended RXDLamides are evolutionarily related. Hirudo ESTs show the existence of at least three such genes in this species (FP656686, FP646581, FP630708, FP658729, FP656937, FP659375, FP664126, FP586241). It is not clear whether or not they might be related to the molluscan buccalins and thus perhaps to allatostatin A, as perhaps suggested by their Lamide C-terminal sequences.

3.26. Tachykinins

Tachykinins are commonly found in both arthropods [38] and mollusks [57]. A single tachykinin gene was identified in the Capitella genome (Fig. 19) and ESTs from homologous genes were found for Alvinella (GO121454, GO121455) as well as for Hirudo (e.g. FP656524, FP632123, FP632922), but no tachykinin gene was found in the leech Helobdella. The Hirudo tachykinin gene ESTs suggest that it encodes a single tachykinin (GPPMGFHFVRamide) and a possible exon of a putative Helobdella tachykinin gene was found to code the sequence ELAKRNPPRYFHFVRGKK; however,

Fig. 18. Schematic drawing of about 8000 bp of scaffold 53 of the Helobdella genome showing the localization of four NPY genes relative to one another. The genes are head to tail, the coding sequences are indicated by black boxes and the intron splice sites are indicated below. Note that the intron is found at the typical site of NPY genes, i.e. between the second and third nucleotide of the codon for the Arg residue in the Arg-(Phe/Tyr)-amide sequence. Nucleotides corresponding to the exons are in capitals and those corresponding to the introns are in lower case. The conceptual translations of the exon sequences in amino acids are indicated below the nucleotide sequences. Also note that the 5′-splice site of the NPY2 gene does not start with gt and thus does not conform to the consensus sequence, suggesting that the NPY2 gene may be a pseudogene.
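The splice-site argument in this legend can be illustrated with a minimal sketch. It assumes only the general GT...AG rule for intron boundaries (a donor site starting with gt, an acceptor site ending with ag); the function name and the two intron strings are hypothetical and are not taken from the Helobdella scaffold.

```python
def splice_sites_canonical(intron: str) -> dict:
    """Report whether an intron starts with 'gt' and ends with 'ag'.

    Only the two boundary dinucleotides are tested; real splice-site
    prediction also weighs the surrounding consensus positions.
    """
    seq = intron.lower()
    return {
        "donor_gt": seq.startswith("gt"),    # 5'-splice site
        "acceptor_ag": seq.endswith("ag"),   # 3'-splice site
    }


# Hypothetical intron sequences, invented for illustration only.
canonical_intron = "gtaagtttcattttcag"
degenerate_intron = "ataagtttcattttcag"   # donor no longer starts with gt

print(splice_sites_canonical(canonical_intron))
print(splice_sites_canonical(degenerate_intron))
```

A donor site that fails this test, as described for the NPY2 gene, is a hint rather than proof that the gene is a pseudogene.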


Fig. 19. Predicted preprohormones from Capitella and Helobdella genes coding tachykinin and WWamide. Colors and symbols as in Fig. 1.

whether this sequence does indeed represent the last exon of a Helobdella tachykinin gene encoding NPPFRYFHFVRamide cannot be ascertained due to a gap in the genome assembly. One might be tempted to believe that Alvinella has two tachykinin genes, but the EST in question (FP499727) is most likely a mislabeled Hirudo EST.

3.27. WWamide or allatostatin B

The first allatostatin B was identified as a myoinhibitory peptide from the migratory locust [48] and called MIP; structurally related peptides identified from mollusks have been called WWamides [35]. As in some insect species these peptides inhibit the biosynthesis of juvenile hormone [30], they are now commonly called allatostatin B. In the Capitella genome a single gene was found to encode such peptides, but four such genes were found in the Helobdella genome (Fig. 19). ESTs from Lumbricus rubellus may represent either one or two different genes (CF799282, CF799282 and BF422451, BF422446), those from Hirudo represent five different genes (EY492354, EY492353; e.g. FP616143, FP593191, FP605963; e.g. EY493905, EY493906, EY482384; EY495343, EY495344 and FP618166, FP594836). It is interesting to note that the spacing between the characteristic Trp residues is not completely constant: in the predicted Capitella peptides there are either 7 or 8 amino acid residues separating them, while in the Helobdella peptides it varies from 6 to 7. A similar variability was also noted in the pea aphid [22].

4. Discussion

Due to large structural variability it is impossible to know whether some annelid neuropeptides are more similar to molluscan or arthropod neuropeptides, but in those cases where this is clear, predicted annelid neuropeptides that have homologs in both mollusks and arthropods are systematically more similar to their molluscan than to their arthropod homologs (e.g. Figs. 4–7, 9 and 14). Achatin, luqin, elevenin, PXFVamide and myomodulin have

homologs in mollusks, but have no homologs in arthropods, while on the other hand, there are no annelid peptides that have homologs in arthropods but not in mollusks. Thus, the predicted annelid neuropeptidomes are much more similar to molluscan than to arthropod neuropeptidomes, as was expected based on phylogeny. Interestingly, we found two genes, GGRFamide and FX1DFLX2amide, that seem specific for annelids. The first was found because of the RFamide sequence, the second for the presence of multiple convertase cleavage sites. Nevertheless, other specific annelid neuropeptide genes without homologs in mollusks or arthropods may have been missed, particularly if they code for a small number of paracopies. As the Helobdella genes seem to code in general fewer paracopies on their precursors than the homologous Capitella genes, this could be a problem, as illustrated by the tachykinin gene for which we may or may not have identified an exon. Although the predicted neuropeptide genes of these two species are similar, there are two interesting differences. First, there is a large number of Capitella neuropeptide genes for which no homologs were found in Helobdella, while we did not find a single Helobdella neuropeptide gene without a Capitella homolog, even when using the Capitella sequences as search motif on the Helobdella genome. Thus, neuropeptide variety seems more limited in Helobdella than in Capitella. It has been suggested previously that evolution is relatively slow in polychaetes [53] and hence it can perhaps not be excluded that some of this is due to faster evolution of neuropeptide structures in Helobdella, which would make it more difficult to recognize such homologs. On the other hand, the simultaneous absence in Helobdella of the three genes for each of the two glycoprotein hormones bursicon and GPA2/GPB5 (receptor, subunit A and subunit B) leads to the conclusion that these hormones are almost certainly genuinely absent from the genome of H. robusta.
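The two search criteria mentioned above, a recurring C-terminal motif and multiple convertase cleavage sites, can be illustrated with a minimal sketch. It is not the procedure used for this study. It assumes that cleavage occurs at every pair of basic residues (KR, RR, KK, RK), that a C-terminal Gly is converted into an amide, and that the signal peptide has already been removed. The function names and the toy propeptide are hypothetical.

```python
import re
from typing import List

# Simplifying assumption: every pair of basic residues is a cleavage site.
DIBASIC = re.compile(r"[KR]{2}")


def predict_peptides(propeptide: str) -> List[str]:
    """Split a propeptide at dibasic sites and amidate C-terminal Gly."""
    peptides = []
    for fragment in DIBASIC.split(propeptide):
        if not fragment:
            continue  # adjacent cleavage sites leave empty fragments
        if len(fragment) > 1 and fragment.endswith("G"):
            # The C-terminal Gly serves as the amide donor.
            peptides.append(fragment[:-1] + "amide")
        else:
            peptides.append(fragment)
    return peptides


def count_paracopies(peptides: List[str], motif: str) -> int:
    """Count predicted peptides ending in the given motif plus an amide."""
    return sum(p.endswith(motif + "amide") for p in peptides)


# Hypothetical propeptide, invented for illustration only.
toy_propeptide = "SPEAKRAGGRFGKRDLSNKRSGGRFGRRELA"

peptides = predict_peptides(toy_propeptide)
print(peptides)
print(count_paracopies(peptides, "RF"))
```

With only one or two paracopies per precursor, such a count gives little signal above background, which is the difficulty described above for the Helobdella genes.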
For some of the neuropeptides for which we did not find genes in the Helobdella genome, ESTs encoding homologous peptides were found from another leech species, H. medicinalis. One such peptide is GGNG, which had previously been identified from another Hirudo species [36]. For this particular gene there is very suggestive evidence that it got destroyed by a chromosome rearrangement (Fig. 13). It will be interesting to see whether the


Hirudo genome, which is currently being assembled (M. Salzet, personal communication), shows sufficient synteny with the Helobdella genome to confirm or refute this hypothesis. The second difference between the Helobdella and Capitella neuropeptidomes concerns the phenomenon of gene duplication. Whereas the number of neuropeptide genes, at least those that were found, is smaller in Helobdella than in Capitella, it looks like the former species has a stronger tendency to duplicate existing neuropeptide genes. Thus Helobdella has two FLamide genes, two RXDLamide genes, two myomodulin genes and four WWamide genes, whereas only one copy for each of these genes was found in Capitella. Helobdella also has five NPY and three FMRFamide-like genes, versus three and two, respectively, in Capitella. It is interesting to see that EST data suggest that at least the myomodulin and WWamide genes are also amplified in Hirudo, a leech species which is not closely related to Helobdella. This apparent simultaneous loss of neuropeptide genes and amplification of others is reminiscent of neuropeptidomes of the nematode C. elegans or the planarian Schmidtea mediterranea, which also seem to have an even larger number of recently duplicated neuropeptide genes and a more limited variety of neuropeptides [7,21]. A systematic analysis of all the neuropeptide GPCRs in these genomes is likely to provide additional information as to neuropeptide evolution in these species. There are some interesting details on neuropeptide evolution emerging from the data. Genome sequencers ideally like to sequence a single 100% homozygous individual, since the presence of extensive allelic variation means that two similar, but different, genotypes need to be sequenced. Such allelic variation makes it more difficult to assemble the genome when the overall coverage is "only" 8-fold; when coverage reaches 100-fold, as in the case of the recently sequenced ant genomes (e.g. [4]), this is no longer a problem.
Nevertheless, none of the Lottia, Helobdella or Capitella genome sequences was obtained from such an individual. The disadvantage is that the genome assembly is not perfect, with different scaffolds containing what might well be the same gene. The advantage is that sometimes these allelic variants are in the coding regions and provide details on neuropeptide evolution. For example, in Capitella there are two alleles for the insulin 2 gene, one of which can no longer produce biologically active insulin. Although a single animal does not show how common the two alleles are in the population at large, it suggests that this gene may be a pseudogene or evolve into one. Insulin pseudogenes are known from other species (e.g. [22]), but usually the only way to recognize them is when coding sequences show more complete signs of degeneration. The general picture of neuropeptide evolution that is arising from the various arthropod genomes that have been sequenced so far is one of gene loss and gene duplication (e.g. [19]). The same is seen here: the genes encoding the glycoprotein hormones are clearly lost from Helobdella, while gene duplication is obvious from the genes encoding insulins, NPY, WWamide, myomodulin, FLamide, FMRFamide-like or GnRH-like peptides. What is perhaps interesting is that they are in part the same and in part different genes that are lost or duplicated. The quadruplication of the WWamide gene in Helobdella appears to be the first report of a duplication of this gene in any species. Genes encoding insulins and GnRH-like peptides are commonly amplified in insects (see e.g. [16,22,29]), while the NPY gene is triplicated in vertebrates (PP, NPY and PYY) and has at least ten paralogs in the flatworm S. mediterranea [7]. It is likely that there are reasons why the genes of some neurohormones or neuropeptides are more often duplicated or, more precisely, why the accidental mutations creating such duplications are not selected against.
Among the genes most commonly duplicated in both protostomes and deuterostomes are those coding insulin-related peptides.


The best known invertebrate insulins, such as the Lymnaea stagnalis insulins and the silkworm bombyxins, are expressed in neuroendocrine cells. Such cells express a neuroendocrine convertase able to cleave insulin precursors into the A and B chains and the C and D peptides. Other insulin genes are not expressed in neuroendocrine cells, e.g. Drosophila insulin-like peptide 6 is highly expressed in the larval fat body [6]. In Bombyx mori one of the insulin genes is similarly highly expressed in this tissue [39]. As the fat body does not express a convertase able to cleave the Lys–Arg dibasic cleavage sites of the insulin precursor, the secreted peptide consists of a single chain with the connecting peptide still attached to the A- and B-chains [39]. However, the fat body does express furin, and Capitella insulin 6 and Drosophila insulin-like peptide 6 are, therefore, likely cleaved between the A and B chains at the furin consensus cleavage site, RSRR in Capitella insulin 6 (Fig. 1). The presence of furin consensus cleavage sites in the precursors of insulin-like peptides may be an indication that they are at least not exclusively made by classical (neuro-)endocrine cells. Although the evidence for mollusks is not as complete as for insects, insulin-like hormones promote growth in both insects and mollusks [5,50]. It is obvious that an improved regulation of growth rapidly leads to increased survival of the species, and consequently selection pressure on the endocrine regulation of growth must be strong. It seems likely that this explains at least in part the rapid evolution of protostomian insulin genes as well as that of the deuterostomian genes coding growth hormones and prolactins [12], although it does not explain why growth hormones are also expressed by the fat body. 
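The distinction drawn above between dibasic convertase sites and furin consensus sites can be sketched as a simple motif scan. The sketch assumes the commonly cited furin consensus R-X-(K/R)-R; the function name and the toy sequences are hypothetical, and the first toy sequence merely embeds the RSRR motif mentioned in the text rather than reproducing Capitella insulin 6.

```python
import re
from typing import List, Tuple

# Assumed furin consensus: Arg, any residue, Lys or Arg, Arg.
# The lookahead lets overlapping candidate sites be reported as well.
FURIN_SITE = re.compile(r"(?=(R.[KR]R))")


def furin_sites(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position, motif) for each candidate furin site."""
    return [(m.start() + 1, m.group(1)) for m in FURIN_SITE.finditer(protein)]


# Hypothetical sequences, invented for illustration only.
with_furin_site = "AALCGSHLVEARSRRGIVEECC"
dibasic_only = "AALCGSHLVEAKRGIVEECC"

print(furin_sites(with_furin_site))
print(furin_sites(dibasic_only))
```

A hit from such a scan only flags a candidate site; whether cleavage occurs depends on the tissue expressing the precursor, as discussed above for the fat body.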
I have previously made the case that the switching of the preferential use of one energy substrate to another might be responsible for the rapid evolution of the peptides of the AKH/RPCH family in insects and explain why in crustaceans RPCH, which does not mobilize energy substrates, shows very little structural variability [58]. One can imagine that e.g. mobilizing trehalose separately from lipid was initially attained by a duplicated AKH acting on a duplicated AKH receptor. To increase specificity of each hormone, their structures and those of their receptors subsequently diverge. In a situation of one peptide with one receptor, a change in the structure of either will likely lead to a decrease in affinity and hence be rapidly selected against. On the other hand, in the situation of two peptides with two receptors, a structural change in any of the four elements is still likely to lead to a loss of affinity in the peptide-receptor combination, but it may simultaneously lead to an increase of specificity, i.e. the loss in affinity in one pair may be much larger than for the other. Such a scenario can be expected to be actually favored by selection and should lead to divergent structures. It is interesting to note in this context that the lophotrochozoan species which are known to have two conopressin-like neuropeptides also have the structurally most divergent conopressins (Fig. 7). One would expect that in insects which use one AKH to mobilize trehalose to sustain flight and a second AKH to mobilize lipid, the two AKHs have different structures. Once predominantly trehalose (or lipid) is used as an energy source during flight, one pair of genes coding for AKH and its receptor is lost, but the remaining AKH has acquired a different structure. Such hypotheses may explain some cases of duplicated neuropeptide genes and why such neuropeptides evolve more rapidly than others.
It is clear that such hypotheses are more easily formulated once we have some idea as to the physiological function of the neuropeptides involved, which for most of them is unfortunately not yet the case.

Acknowledgments

This paper benefited from the constructive criticism from two anonymous reviewers. I thank them and the associate editor, Prof. E.S. Chang, for their suggestions for improving the manuscript. It is


a pleasure and a privilege to look in detail at well sequenced and assembled genomes such as these. This would not have been possible without the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/ which produced the Capitella teleta and Helobdella robusta genome sequence data in collaboration with the user community of which I thank in particular Drs. David Weisblat, Robert Savage and Dan Rokhsar for their time and energy spent on these projects. The exploration of this genome was much facilitated by programs such as BLAST, Artemis, etc. I thank all those who made these programs as well as Michel Salzet for discussing leech neuropeptides.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ygcen.2011.01.005.

References

[1] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.
[2] J.D. Bendtsen, H. Nielsen, G. von Heijne, S. Brunak, Improved prediction of signal peptides: SignalP 3.0, J. Mol. Biol. 340 (2004) 783–795.
[3] J.A. Blake, J.P. Grassle, K.J. Eckelbarger, Capitella teleta, a new species designation for the opportunistic and experimental capitellid, Capitella sp. I, with a review of the literature for confirmed records, Zoosymposia 2 (2009) 25–53.
[4] R. Bonasio, G. Zhang, C. Ye, N.S. Mutti, X. Fang, N. Qin, G. Donahue, P. Yang, Q. Li, C. Li, P. Zhang, Z. Huang, S.L. Berger, D. Reinberg, J. Wang, J. Liebig, Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator, Science 329 (2010) 1068–1071.
[5] W. Brogiolo, H. Stocker, T. Ikeya, F. Rintelen, R. Fernandez, E. Hafen, An evolutionarily conserved function of the Drosophila insulin receptor and insulin-like peptides in growth control, Curr. Biol. 11 (2001) 213–221.
[6] V.R. Chintapalli, J. Wang, J.A.T. Dow, Using FlyAtlas to identify better Drosophila melanogaster models of human disease, Nat. Genet. 39 (2007) 715–720.
[7] J.J. Collins III, X. Hou, E.V. Romanova, B.G. Lambrus, C.M. Miller, A. Saberi, J.V. Sweedler, P.A. Newmark, Genome-wide analyses reveal a role for peptide hormones in planarian germline development, PLoS Biol. 8 (2010) e1000509.
[8] L.J. Cruz, V. de Santos, G.C. Zafaralla, C.A. Ramilo, R. Zeikus, W.R. Gray, B.M. Olivera, Invertebrate vasopressin/oxytocin homologs. Characterization of peptides from Conus geographus and Conus straitus venoms, J. Biol. Chem. 262 (1987) 15821–15824.
[9] L. Devi, Consensus sequence for processing of peptide precursors at monobasic sites, FEBS Lett. 280 (1991) 189–194.
[10] S. Dos Santos, C. Bardet, S. Bertrand, H. Escriva, D. Habert, B. Querat, Distinct expression patterns of glycoprotein hormone-alpha2 and -beta5 in a basal chordate suggest independent developmental functions, Endocrinology 150 (2009) 3815–3822.
[11] P.D. Floyd, L. Li, S.S. Rubakhin, J.V. Sweedler, C.C. Horn, I. Kupfermann, V.Y. Alexeeva, T.A. Ellis, N.C. Dembrow, K.R. Weiss, F.S. Vilim, Insulin prohormone processing, distribution, and relation to metabolism in Aplysia californica, J. Neurosci. 19 (1999) 7732–7741.
[12] I.A. Forsyth, M. Wallis, Growth hormone and prolactin–molecular and functional evolution, J. Mammary Gland Biol. Neoplasia 7 (2002) 291–312.
[13] Y. Fujisawa, I. Kubota, T. Ikeda, H. Minakata, Y. Muneoka, A variety of Mytilus inhibitory peptides in the ABRM of Mytilus edulis: isolation and characterization, Comp. Biochem. Physiol. C 100 (1991) 525–531.
[14] Y. Furukawa, K. Nakamaru, K. Sasaki, Y. Fujisawa, H. Minakata, S. Ohta, F. Morishita, O. Matsushima, L. Li, V. Alexeeva, T.A. Ellis, N.C. Dembrow, J. Jing, J.V. Sweedler, K.R. Weiss, F.S. Vilim, PRQFVamide, a novel pentapeptide identified from the CNS and gut of Aplysia, J. Neurophysiol. 89 (2003) 3114–3127.
[15] C.J.P. Grimmelikhuijzen, I. Leviev, K. Carstensen, Peptides in the nervous systems of cnidarians: structure, function, and biosynthesis, Int. Rev. Cytol. 167 (1996) 37–89.
[16] K.K. Hansen, E. Stafflinger, M. Schneider, F. Hauser, G. Cazzamali, M. Williamson, M. Kollmann, J. Schachtner, C.J.P. Grimmelikhuijzen, Discovery of a novel insect neuropeptide/GPCR signaling system closely related to the insect adipokinetic hormone and corazonin hormonal systems, J. Biol. Chem. 285 (2010) 10736–10747.
[17] A. Harada, M. Yoshida, H. Minakata, K. Nomoto, Y. Muneoka, M. Kobayashi, Structure and function of the molluscan myoactive tetradecapeptides, Zool. Sci. 10 (1993) 257–265.
[18] V. Hartenstein, S. Takashima, K.L. Adams, Conserved genetic pathways controlling the development of the diffuse endocrine system in vertebrates and Drosophila, Gen. Comp. Endocrinol. 166 (2010) 462–469.
[19] F. Hauser, S. Neupert, M. Williamson, R. Predel, Y. Tanaka, C.J.P. Grimmelikhuijzen, Genomics and peptidomics of neuropeptides and protein hormones present in the parasitic wasp Nasonia vitripennis, J. Proteome Res. 9 (2010) 5296–5310.
[20] G.M. Holman, R.J. Nachman, G.M. Coast, Isolation, characterization and biological activity of a diuretic myokinin neuropeptide from the housefly, Musca domestica, Peptides 20 (1999) 1–10.
[21] S.J. Husson, M. Lindemans, T. Janssen, L. Schoofs, Comparison of Caenorhabditis elegans NLP peptides with arthropod neuropeptides, Trends Parasitol. 25 (2009) 171–181.
[22] J. Huybrechts, J. Bonhomme, S. Minoli, N. Prunier-Leterme, A. Dombrowsky, M. Abdel-Latief, A. Robichon, J.A. Veenstra, D. Tagu, Neuropeptide and neurohormone precursors in the pea aphid Acyrthosiphon pisum, Insect Mol. Biol. 19 (Suppl. 2) (2010) 87–95.
[23] Y. Kamatani, H. Minakata, P.T.M. Kenny, T. Iwashita, K. Watanabe, K. Funase, X.P. Sun, A. Yongsiri, K.H. Kim, P. Novales-Li, E.T. Novales, C.G. Kanapi, H. Takeuchi, K. Nomoto, Achatin-I, an endogenous neuroexcitatory tetrapeptide from Achatina fulica Férussac containing a D-amino acid residue, Biochem. Biophys. Res. Commun. 160 (1989) 1015–1020.
[24] K. Kataoka, A. Toschi, J.P. Li, R.L. Carney, D.A. Schooley, S.J. Kramer, Identification of an allatotropin from adult Manduca sexta, Science 243 (1989) 1481–1483.
[25] E.F. Kirkness, B.J. Haas, W. Sun, H.R. Braig, M.A. Perotti, J.M. Clark, S.H. Lee, H.M. Robertson, R.C. Kennedy, E. Elhaik, D. Gerlach, E.V. Kriventseva, C.G. Elsik, D. Graur, C.A. Hill, J.A. Veenstra, B. Walenz, J.M. Tubío, J.M.C. Ribeiro, J. Rozas, J.S. Johnston, J.T. Reese, A. Popadic, M. Tojo, D. Raoult, D.L. Reed, Y. Tomoyasu, E. Krause, O. Mittapalli, V.M. Margam, H.M. Li, J.M. Meyer, R.M. Johnson, J. Romero-Severson, J.P. Vanzee, D. Alvarez-Ponce, F.G. Vieira, M. Aguadé, S. Guirao-Rico, J.M. Anzola, K.S. Yoon, J.P. Strycharz, M.F. Unger, S. Christley, N.F. Lobo, M.J. Seufferheld, N. Wang, G.A. Dasch, C.J. Struchiner, G. Madey, L.I. Hannick, S. Bidwell, V. Joardar, E. Caler, R. Shao, S.C. Barker, S. Cameron, R.V. Bruggner, A. Regier, J. Johnson, L. Viswanathan, T.R. Utterback, G.C. Sutton, D. Lawson, R.M. Waterhouse, J.C. Venter, R.L. Strausberg, M.R. Berenbaum, F.H. Collins, E.M. Zdobnov, B.R. Pittendrigh, Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle, Proc. Natl. Acad. Sci. USA 107 (2010) 12168–12173.
[26] S.J. Kramer, A. Toschi, C.A. Miller, H. Kataoka, G.B. Quistad, J.P. Li, R.L. Carney, D.A. Schooley, Identification of an allatostatin from the tobacco hornworm Manduca sexta, Proc. Natl. Acad. Sci. USA 88 (1991) 9458–9462.
[27] M.A. Larkin, G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F. Valentin, I.M. Wallace, A. Wilm, R. Lopez, J.D. Thompson, T.J. Gibson, D.G. Higgins, Clustal W and Clustal X version 2.0, Bioinformatics 23 (2007) 2947–2948.
[28] K.W. Li, T. Holling, N.D. de With, W.P.M. Geraerts, Purification and characterization of a novel tetradecapeptide that modulates oesophagus motility in Lymnaea stagnalis, Biochem. Biophys. Res. Commun. 197 (1993) 1056–1061.
[29] B. Li, R. Predel, S. Neupert, F. Hauser, Y. Tanaka, C. Cazzamali, M. Williamson, Y. Arakane, P. Verleyen, L. Schoofs, J. Schachtner, C.J.P. Grimmelikhuijzen, Y. Park, Genomics, transcriptomics, and peptidomics of neuropeptides and protein hormones in the red flour beetle Tribolium castaneum, Genome Res. 18 (2008) 113–122.
[30] M.W. Lorenz, R. Kellner, K.H. Hoffmann, A family of neuropeptides that inhibit juvenile hormone biosynthesis in the cricket, Gryllus bimaculatus, J. Biol. Chem. 270 (1995) 21103–21108.
[31] C.W. Luo, E.M. Dewey, S. Sudo, J. Ewer, S.Y. Hsu, H.W. Honegger, A.J. Hsueh, Bursicon, the insect cuticle-hardening hormone, is a heterodimeric cystine knot protein that activates G protein-coupled receptor LGR2, Proc. Natl. Acad. Sci. USA 102 (2005) 2820–2825.
[32] E.R. Macagno, T. Gaasterland, L. Edsall, V. Bafna, M.B. Soares, T. Scheetz, T. Casavant, C. Da Silva, P. Wincker, A. Tasiemski, M. Salzet, Construction of a medicinal leech transcriptome database and its application to the identification of leech homologs of neural and innate immune genes, BMC Genomics 11 (2010) 407.
[33] O. Matsushima, H. Takahama, Y. Ono, T. Nagahama, F. Morishita, Y. Furukawa, E. Iwakoshi-Ukena, M. Hisada, K. Takuwa-Kuroda, H. Minakata, A novel GGNG-related neuropeptide from the polychaete Perinereis vancaurica, Peptides 23 (2002) 1379–1390.
[34] F.M. Mendive, T. Van Loy, S. Claeysen, J. Poels, M. Williamson, F. Hauser, C.J.P. Grimmelikhuijzen, G. Vassart, J. Vanden Broeck, Drosophila molting neurohormone bursicon is a heterodimer and the natural agonist of the orphan receptor DLGR2, FEBS Lett. 579 (2005) 2171–2176.
[35] H. Minakata, T. Ikeda, Y. Muneoka, M. Kobayashi, K. Nomoto, WWamide-1, -2 and -3: novel neuromodulatory peptides isolated from ganglia of the African giant snail, Achatina fulica, FEBS Lett. 323 (1993) 104–108.
[36] H. Minakata, T. Fujita, T. Kawano, T. Nagahama, T. Oumi, K. Ukena, O. Matsushima, Y. Muneoka, K. Nomoto, The leech excitatory peptide, a member of the GGNG peptide family: isolation and comparison with the earthworm GGNG peptides, FEBS Lett. 410 (1997) 437–442.
[37] F. Morishita, Y. Nakanishi, S. Kaku, Y. Furukawa, S. Ohta, T. Hirata, M. Ohtani, Y. Fujisawa, Y. Muneoka, O. Matsushima, A novel D-amino-acid-containing peptide isolated from Aplysia heart, Biochem. Biophys. Res. Commun. 240 (1997) 354–358.
[38] D.R. Nässel, Tachykinin-related peptides in invertebrates: a review, Peptides 20 (1999) 141–158.
[39] N. Okamoto, N. Yamanaka, H. Satake, H. Saegusa, H. Kataoka, A. Mizoguchi, An ecdysteroid-inducible insulin-like growth factor-like peptide regulates adult development of the silkmoth Bombyx mori, FEBS J. 276 (2009) 1221–1232.

J.A. Veenstra / General and Comparative Endocrinology 171 (2011) 160–175 [40] N. Ohta, I. Kubota, T. Takao, Y. Shimonishi, Y. Yasuda-Kamatani, H. Minakata, K. Nomoto, Y. Muneoka, M. Kobayashi, Fulicin, a novel neuropeptide containing a D-amino acid residue isolated from the ganglia of Achatina fulica, Biochem. Biophys. Res. Commun. 178 (1991) 486–493. [41] T. Oumi, K. Ukena, O. Matsushima, T. Ikeda, T. Fujita, H. Minakata, K. Nomoto, Annetocin: an oxytocin-related peptide isolated from the earthworm, Eisenia foetida, Biochem. Biophys. Res. Commun. 198 (1994) 393–399. [42] T. Oumi, K. Ukena, O. Matsushima, T. Ikeda, T. Fujita, H. Minakata, K. Nomoto, The GGNG peptides: novel myoactive peptides isolated from the gut and the whole body of the earthworms, Biochem. Biophys. Res. Commun. 216 (1995) 1072–1078. [43] M. Rholam, N. Brakch, D. Germain, D.Y. Thomas, C. Fahy, H. Boussetta, G. Boileau, P. Cohen, Role of amino acid sequences flanking dibasic cleavage sites in precursor proteolytic processing. The importance of the first residue Cterminal of the cleavage site, Eur. J. Biochem. 1227 (1995) 707–714. [44] L. Roller, N. Yamanaka, K. Watanabe, I. Daubnerová, D. Zitnan, H. Kataoka, Y. Tanaka, The unique evolution of neuropeptide genes in the silkworm Bombyx mori, Insect Biochem. Mol. Biol. 38 (2008) 1147–1157. [45] K. Rutherford, J. Parkhill, J. Crook, T. Horsnell, P. Rice, M.A. Rajandream, B. Barrell, Artemis: sequence visualization and annotation, Bioinformatics 16 (2000) 944–945. [46] M. Salzet, Molecular aspect of annelid neuroendocrine system, in: H. Sataka (Ed.), Invertebrate Neuropeptides and Hormones: Basic Knowledge and Recent Advances, Transworld Reserach Network, Kerala, India, 2006, pp. 1–19. [47] M. Salzet, P. Bulet, A. Van Dorsselaer, J. Malecha, Isolation, structural characterization and biological function of a lysine–conopressin in the central nervous system of the pharyngobdellid leech Erpobdella octoculata, Eur. J. Biochem. 217 (1993) 897–903. [48] L. 
Schoofs, G.M. Holman, T.K. Hayes, R.J. Nachman, A. De Loof, Isolation, identification and synthesis of locustamyoinhibiting peptide (LOM-MIP), a novel biologically active neuropeptide from Locusta migratoria, Regul. Pept. 36 (1991) 111–119. [49] H.B. Shen, K.C. Chou, Signal-3L: a 3-layer approach for predicting signal peptides, Biochem. Biophys. Res. Comm. 363 (2007) 297–303.

175

[50] A.B. Smit, E. Vreugdenhil, R.H. Ebberink, W.P. Geraerts, J. Klootwijk, J. Joosse, Growth-controlling molluscan neurons produce the precursor of an insulinrelated peptide, Nature 331 (1988) 535–538. [51] A.B. Smit, S. Spijker, J. Van Minnen, J.F. Burke, F. De Winter, R. Van Elk, W.P.M. Geraerts, Expression and characterization of molluscan insulin-related peptide VII from the mollusc Lymnaea stagnalis, Neuroscience 70 (1996) 589–596. [52] S. Sudo, Y. Kuwabara, J.I. Park, S.Y. Hsu, A.J. Hsueh, Heterodimeric fly glycoprotein hormone-alpha2 (GPA2) and glycoprotein hormone-beta5 (GPB5) activate fly leucine-rich repeat-containing G protein-coupled receptor-1 (DLGR1) and stimulation of human thyrotropin receptors by chimeric fly GPA2 and human GPB5, Endocrinology 146 (2005) 3596–3604. [53] T. Takahashi, C. McDougall, J. Troscianko, W.C. Chen, A. Jayaraman-Nagarajan, S.M. Shimeld, D.E.K. Ferrier, An EST screen from the annelid Pomatoceros lamarckii reveals patterns of gene loss and gain in animals, BMC Evol. Biol. 9 (2009) 240. [54] S. Terhzaz, P. Rosay, S.F. Goodwin, J.A. Veenstra, The neuropeptide SIFamide modulates sexual behavior in Drosophila, Biochem. Biophys. Res. Commun. 352 (2007) 305–310. [55] J.A. Veenstra, Mono- and dibasic proteolytic cleavage sites in insect neuroendocrine peptide precursors, Arch. Insect Biochem. Physiol. 43 (2000) 49–63. [56] J.A. Veenstra, Allatostatin C and its paralog allatostatin double C: the arthropod somatostatins, Insect Biochem. Mol. Biol. 39 (2009) 161–170. [57] J.A. Veenstra, Neurohormones and neuropeptides encoded by the genome of Lottia gigantea, with reference to other mollusks and insects, Gen. Comp. Endocrinol. 167 (2010) 86–103. [58] J.A. Veenstra, F. Camps, Structure of the hypertrehalosemic neuropeptide of the German cockroach, Blattella germanica, Neuropeptides 15 (1990) 107–109. [59] P. Verleyen, J. Huybrechts, L. Schoofs, SIFamide illustrates the rapid evolution in Arthropod neuropeptide research, Gen. Comp. 
Endocrinol. 162 (2009) 27–35. [60] L. Zhang, J.A. Tello, W. Zhang, P.S. Tsai, Molecular cloning, expression pattern, and immunocytochemical localization of a gonadotropin-releasing hormonelike molecule in the gastropod mollusk, Aplysia californica, Gen. Comp. Endocrinol. 156 (2008) 201–209.