Gene 327 (2004) 117 – 129 www.elsevier.com/locate/gene
“Plus-C” odorant-binding protein genes in two Drosophila species and the malaria mosquito Anopheles gambiae
Jing-Jiang Zhou a, Wensheng Huang b, Guo-An Zhang c, John A. Pickett a, Linda M. Field a,*
a Biological Chemistry Division, Rothamsted Research, Harpenden, Herts. AL5 2JQ, UK
b GMOs Detection Laboratory, Animal and Plant Quarantine Institute, Beijing 100029, PR China
c College of Sciences and Biotechnology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
Received 14 August 2003; received in revised form 19 October 2003; accepted 7 November 2003 Received by D. Finnegan
Abstract
Olfaction plays a crucial role in many aspects of insect behaviour, including host selection by agricultural pests and vectors of human disease. Insect odorant-binding proteins (OBPs) are thought to function as the first step in molecular recognition and the transport of semiochemicals. The whole genome sequence of the fruit fly Drosophila melanogaster has been completed and a large number of genes have been annotated as OBPs, based on the presence of six conserved cysteine residues and a conserved spacing between the cysteines. These proteins can be divided into three distinct subgroups: those with only one six-cysteine motif, those with two such motifs and those with one motif, three extra conserved cysteines and a conserved proline immediately after the sixth cysteine. This study concentrates on the last two subgroups, referred to as ‘dimer’ OBPs and ‘Plus-C’ OBPs, respectively. We determined the tissue-specific transcript levels of all of these OBP genes of D. melanogaster using semiquantitative RT-PCR. The results showed that the expression patterns can vary within a subgroup of genes and that this technique is valuable for assessing which of the putative OBP genes are likely to be involved in Drosophila olfaction. The publicly available genomes of another fruit fly Drosophila pseudoobscura, the malaria mosquito Anopheles gambiae and the yellow fever mosquito Aedes aegypti were searched by Blast against each Plus-C OBP and dimer OBP of D. melanogaster. Related genes were found in all of the other species and the relationships of these with the D. melanogaster genes and their possible biological functions are discussed.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Odorant-binding protein; Olfaction; Fruit fly; Drosophila; RT-PCR; Expression
1. Introduction Insect odorant-binding proteins (OBPs) are present at millimolar concentration in the olfactory sensillum lymph, where they receive and transport semiochemicals to the olfactory neurons and hence play an important role in the signal transduction of insect olfaction (Pelosi and Maida, 1995). It has been suggested that OBPs are not merely passive carriers of chemical signals but also show molecular
Abbreviations: OBP, odorant-binding protein; CSP, chemosensory protein; PBP, pheromone-binding protein; BMPBP1, pheromone-binding protein 1 of Bombyx mori; CDS, coding sequence.
* Corresponding author. Tel.: +44-1582-763133; fax: +44-1582-762595. E-mail address: [email protected] (L.M. Field).
0378-1119/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2003.11.007
recognition and are necessary for the binding of odorants to their receptors (ORs) (Pelosi and Maida, 1995). This view is clearly supported by more recent studies which have identified a diversity of OBPs within a species, for example in the tobacco hornworm Manduca sexta (Robertson et al., 1999) and the fruit fly Drosophila melanogaster (Galindo and Smith, 2001; Hekmat-Scafe et al., 2002). In addition, some species, including D. melanogaster, have different OBPs associated with functionally distinct classes of olfactory sensilla (Hekmat-Scafe et al., 1997; Park et al., 2000; Shanbhag et al., 2001) and each class of sensilla responds to a specific class of odours (Stensmyr et al., 2003) again suggesting a role in signal filtering. Since the first OBPs were reported in the Lepidoptera, many papers have been published describing the proteins and associated genes in a wide range of insect species in the orders Coleoptera
J.-J. Zhou et al. / Gene 327 (2004) 117–129
(Nikonov et al., 2002), Hymenoptera (Briand et al., 2001), Diptera (Kim et al., 1998), Orthoptera (Picone et al., 2001), Dictyoptera (Riviere et al., 2003) and Heteroptera (Vogt et al., 1999). Although OBPs show little sequence similarity between species there is always conservation of six cysteines (Pelosi and Maida, 1995; Pikielny et al., 1994; Field et al., 2000) and indeed this has become the hallmark by which genes are ascribed an OBP function even in the absence of functional data (Vogt et al., 1999). The best studied OBP is the Bombyx mori pheromone-binding protein 1 (BMPBP1) where the precise bridging between the cysteines has been established for the native protein (Scaloni et al., 1999) and recombinant protein has been used to study the PBP/pheromone complex by electrospray ionization mass spectrometry (Oldham et al., 2000) and X-ray crystallography (Sandler et al., 2000; Lee et al., 2002). In D. melanogaster, seven genes encoding putative OBPs have been cloned and sequenced. These are the OBP LUSH (Kim et al., 1998), the olfactory-specific proteins OS-E and OS-F (McKenna et al., 1994) and the pheromone-binding proteins PBPRP1, PBPRP2, PBPRP4 and PBPRP5 (Pikielny et al., 1994). Deletion of the LUSH gene results in abnormal attraction to food sources with high concentrations of ethanol, suggesting that LUSH plays a direct role in D. melanogaster olfaction (Kim et al., 1998; Kim and Smith, 2001). With the publication of the D. melanogaster genome in 2000, it became possible for the first time to look at how many OBP genes an insect might have. Galindo and Smith (2001) used the tBLASTn algorithm, along with comparisons with the accepted characteristics of OBPs, and estimated that the D. melanogaster genome had 35 putative OBP genes. This was later extended to 38 by Graham and Davies (2002), and Hekmat-Scafe et al. (2002) annotated 51 OBP genes of which, 32 encoded proteins with the typical six-cysteine motif. 
The other 19 were in two distinct subgroups, which Hekmat-Scafe et al. (2002) named the “minus-C” subgroup (seven proteins) with some cysteine residues missing, and the “Plus-C” subgroup (12 proteins), all with three extra cysteines and a conserved proline. The genome organisation of 23 D. melanogaster OBPs has also been analysed showing that they group at 12 loci with three groups with different exon boundaries (Vogt et al., 2002). The tissue-specific expression of many six-cysteine OBPs has also been determined indirectly, using OBP promoter-driven expression of the LacZ reporter gene (Galindo and Smith, 2001) and RT-PCR (Koganezawa and Shimada, 2002); however, in situ hybridisation has failed to demonstrate the expression of the Plus-C OBPs (Hekmat-Scafe et al., 2002), and indeed to date there is no evidence of an olfactory function for these putative OBPs. In this study, we have used semiquantitative RT-PCR to examine the tissue distribution of transcripts of each member of the putative Plus-C OBP genes and two dimer OBPs in D. melanogaster. We have also looked at the presence of such genes in other insects where the genome sequences are publicly available.
2. Materials and methods

2.1. Insect material

The D. melanogaster, strain Canton S, were a gift from Dr. John Brookfield (University of Nottingham). Antennae, heads, legs and winged bodies of mixed sexes were separated on ice under a microscope and frozen immediately in liquid nitrogen.

2.2. RT-PCR

Each tissue was ground in liquid nitrogen, and total RNA extracted using the TRIzol® reagent (Invitrogen). Messenger RNA was purified with the Dynalbeads purification kit (Dynal) and cDNA was synthesized using the SuperScript™ first-strand synthesis system (Invitrogen). Hotstart Taq DNA polymerase from Qiagen was used to amplify individual transcripts. The PCR primers were designed with Primer3 (http://www-genome.wi.mit.edu/cgi-bin/primer) and were as follows:
LUSH: 5′-TGACGATGGAGCAGTTCTTG-3′ and 5′-TGTTTATGCGTATCCCGACA-3′
CG11218: 5′-GAGGGAATCACCAAGGATCA-3′ and 5′-GGCGCGATTCTTGTAGTAGC-3′
CG12905: 5′-GACGTACATGTTTTGCACCG-3′ and 5′-GCGAAAGACGTTTGGATAGC-3′
CG13208: 5′-ACGATTGATTGCCAAAGACC-3′ and 5′-CGTTGGAGAACTTAGCGGAG-3′
CG30052: 5′-CGTTTGTCAATCCCAAAACC-3′ and 5′-GCTTTGCCTCATCCAGTTTC-3′
CG30072: 5′-GACCAACACAGTTGTGGACG-3′ and 5′-CAGCTGGATGATGGACAATG-3′
CG30067: 5′-GCCGTTTAGATTGCGATTTC-3′ and 5′-GCTGTAGAAGAGGGCGTGTC-3′
CG30074: 5′-AACCATTTTGGGCTCTGATG-3′ and 5′-ACTCTTGCTCCACACGCTTT-3′
CG30073: 5′-ATAGGCACTGGGACCTCCTT-3′ and 5′-GGACAGTTGGCGGTTAGGTA-3′
CG13939: 5′-GCCATGGTCAATGGAAAGAT-3′ and 5′-GGGTTTCTCGTTTTTCCACA-3′
CG13518: 5′-GTACGCGACTGAAATGCTGA-3′ and 5′-TCAGCACATCCTTGCACTTC-3′
CG13524: 5′-ATCCCGATGGACACAATGAT-3′ and 5′-AGGCGTTGATCATTTCCTTG-3′
CG11732: 5′-GGAAATTCAACTTTGCCGAA-3′ and 5′-GATACGCATCCACCAGGAAT-3′
CG17284: 5′-AACGACAAGGCCATCAATTC-3′ and 5′-GCTTCCTAGCGTAGTCCGTG-3′
CG31557: 5′-CGATCTTCGAAAGCTGGAAC-3′ and 5′-TCATCAGAGGAGTGTGGCTG-3′
CG15582: 5′-TCAAGGAATGGTCGGATAGC-3′ and 5′-CTGTAGCAGTGTGCTCAGGC-3′
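As a quick check on a primer list like the one above, the following minimal Python sketch reports length and GC content for three of the listed pairs and gives the reverse complement of each reverse primer. The sequences are copied from the text; the helper names and the 18–25 nt length window are illustrative assumptions and are not part of the authors' Primer3 settings.

```python
# Three primer pairs copied from the list above (forward, reverse).
PRIMERS = {
    "LUSH": ("TGACGATGGAGCAGTTCTTG", "TGTTTATGCGTATCCCGACA"),
    "CG11218": ("GAGGGAATCACCAAGGATCA", "GGCGCGATTCTTGTAGTAGC"),
    "CG12905": ("GACGTACATGTTTTGCACCG", "GCGAAAGACGTTTGGATAGC"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the template strand read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def looks_valid(seq: str, min_len: int = 18, max_len: int = 25) -> bool:
    """Only A/C/G/T and a length inside a hypothetical 18-25 nt window."""
    return set(seq.upper()) <= set("ACGT") and min_len <= len(seq) <= max_len


if __name__ == "__main__":
    for gene, (fwd, rev) in PRIMERS.items():
        print(gene, len(fwd), f"{gc_fraction(fwd):.2f}",
              len(rev), f"{gc_fraction(rev):.2f}", reverse_complement(rev))
```

A check of this kind is mainly useful for catching transcription errors (stray characters, truncated sequences) before primers are ordered.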
The primers of a Drosophila actin gene 79b 5′-TCGCCATCTAACCGACTACC-3′ and 5′-AGTGCGGTGATTTCCTTTTG-3′ were used as the internal control. The RT-PCR products for the OBP genes were limited to 200–300 bp so that they could be separated from the actin product (414 bp) on a 1.5% agarose gel. The relative amount of cDNA from each tissue was determined by RT-PCRs using dilutions of cDNA template and the actin primers. Then equal amounts
of cDNA were used in the RT-PCR reactions containing OBP primers and actin primers as the internal control. Three 10-fold dilutions were used and the reactions were hot started for 15 min at 95 °C, followed by 30–35 cycles of 94 °C for 30 s, 55 °C for 1 min and 72 °C for 1 min, and finally 72 °C for 5 min. The PCR products were loaded onto a 1.5% agarose gel. The gel was run for 2–2.5 h at 80 V and stained with ethidium bromide (0.5 µg/ml) for 15 min and the images were stored as JPEG files. The intensity of the PCR bands was measured using Kodak 1D image analysis software.
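The normalisation step implied here, expressing each OBP band relative to the actin band amplified in the same reaction, can be sketched as follows. This is only an illustration: the function names, the averaging over dilutions and the example readings are assumptions, not the Kodak 1D procedure or the authors' data.

```python
from statistics import mean


def relative_intensity(obp_band: float, actin_band: float) -> float:
    """OBP band intensity as a percentage of the actin band in the same lane."""
    if actin_band <= 0:
        raise ValueError("actin band intensity must be positive")
    return 100.0 * obp_band / actin_band


def summarise(lanes):
    """Mean relative intensity over several (obp, actin) lane readings."""
    return mean(relative_intensity(obp, actin) for obp, actin in lanes)


# Hypothetical net intensities for three cDNA dilutions of one tissue.
example_lanes = [(50.0, 100.0), (20.0, 40.0), (5.0, 10.0)]

if __name__ == "__main__":
    print(summarise(example_lanes))
```

Dividing by the actin band within each lane means that lane-to-lane differences in loading or template amount largely cancel, which is the point of co-amplifying an internal control.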
2.3. Genome searches

DNA and protein sequences of the 12 Plus-C putative OBPs of D. melanogaster were retrieved from Release 3 of FlyBase (http://www.FlyBase.bio.indiana.edu/genes/fbgquery.hform). The latest major update of FlyBase was made on 30 May 2003 when this study was conducted. If there was no amino acid sequence available in the database, the coding regions were translated into predicted protein sequences using a translator programme (http://www.ca.expasy.org/tools/dna.html). Sequences were stored and maintained using Vector NTI (InforMax, USA). The similarities between protein sequences were estimated from multiple alignments of all OBPs (ClustalX 8.1) using the GeneDoc program (Nicholas et al., 1997). Signal peptide sequences were assigned using the SignalP V2.0 program (http://www.cbs.dtu.dk/services/SignalP-2.0) (Nielsen et al., 1997). Map locations of the OBP genes of D. melanogaster were obtained from the FlyBase GadFly Genome Annotation Database (http://www.fruitfly.org/cgi-bin/annot/fban?) by inputting the gene ID as the query term. The Plus-C OBP-related genes and dimer OBP genes of other insect species were obtained from each individual genome database by tBlastn searching against each sequence from D. melanogaster. The databases used were: Anopheles gambiae; assembly 2 of the International Anopheles Genome Project as of 1st April 2003 (http://www.ensembl.org/Anopheles_gambiae/), Aedes aegypti; BAC end and cDNA sequence of TIGR’s genome project (http://www.tigr.org/tdb/e2k1/aabe/) and Drosophila pseudoobscura; the Human Genome Sequencing Centre (HGSC) (http://www.hgsc.bcm.tmc.edu/projects/). The genomic sequences of the most significant subject matches were retrieved from the individual genome databases and the ranges used as reference to predict the open reading frame (ORF) of each gene using the GenScan software (http://genes.mit.edu/GENSCAN.html).

The protein sequences of all annotated Plus-C OBPs and dimer OBPs were aligned using Clustal X (8.1) (Thompson et al., 1997) with default gap-penalty parameters of gap opening 10.0 and extension 0.20. The phylogenetic trees were then constructed from these multiple alignments using MEGA2 software (Kumar et al., 2001). The final unrooted consensus tree was generated with 1000 bootstrap trials using the neighbour-joining method (Saitou and Nei, 1987) and presented with a cut off value of 70. The positions of introns and the total number of nucleic acids within introns, of a given gene were obtained from alignments between the coding sequences (CDS) and the genomic sequences identified in the databases. The predicted molecular weight, isoelectric point and hydrophobic amino acids were determined for each protein sequence using Vector NTI software (InforMax).

3. Results and discussion
3.1. Expression of Drosophila Plus-C OBP and dimer OBP genes

Semiquantitative RT-PCR, using gene-specific primers along with primers for a control actin gene, was used to examine relative transcript levels in D. melanogaster tissues for all putative Plus-C OBP genes, the two dimer OBPs (OBP83CD and OBP83EF, see later) and two regular six-cysteine OBPs (LUSH and CG11218). The results for six of the genes are shown in Fig. 1 and a summary of the data for all genes is given in Fig. 2. One of the six-cysteine genes, LUSH, shows expression in antennae with lower levels in the head (the latter quite possibly results from contamination of head tissue with the base of the antennae) and no expression in leg or body tissue. This result agrees with previous reports (Galindo and Smith, 2001) of LUSH being olfactory specific and is consistent with the genetic evidence for its involvement in olfactory-mediated behaviour towards ethanol (Kim et al., 1998). The other six-cysteine putative OBP gene (CG11218 or OBP56D) is expressed in all tissues, again consistent with previous studies (Hekmat-Scafe et al., 2002; Galindo and Smith, 2001; Koganezawa and Shimada, 2002). One of the Plus-C OBP genes (CG13524) shows expression only in the antennae, and four (CG13518, CG13208, CG11732 and CG30074) are expressed in both antennae and head, but not legs or bodies, suggesting that they have an olfactory role (see comment above). The remaining nine putative Plus-C OBP genes are expressed in olfactory tissues and also in either legs or bodies or both, suggesting that they are not specifically involved in olfaction. It is interesting that one of the Plus-C OBP genes (CG13518) which is expressed only in antennae and head has two transcripts in the head (Fig. 1). It is unlikely that the bigger transcript results from contamination by genomic DNA because the same cDNAs were used in all of the other PCR reactions where genomic DNA contamination was not detected (Fig. 1).
When we used different cDNA preparations, the two transcripts were always detected and a second transcript of CG13524 could also sometimes be detected in the head. The smaller band is of the expected size, and has the same sequence, as those of the annotated genes in FlyBase (data not shown). Direct
Fig. 1. RT-PCR products from the putative Plus-C OBP transcripts of D. melanogaster. Arrow indicates the actin PCR product as the control. Three reactions using different amounts of cDNA for each tissue type were done (see Materials and methods for details).
sequencing of the bigger band shows that it has an extra 64 bp, predicted as an intron in the FlyBase annotation. It is possible that the bigger PCR product is from an alternatively spliced mRNA, or that there is an annotation error, or perhaps that the splicing process differs between the head and the antennae.
[Fig. 2 bar charts, image not reproduced. Panels: A, antennae; B, head; C, leg; D, body. y-axis: Net Intensity (%), 0–200. x-axis in each panel: LUSH, CG11218, OBP83CD, OBP83EF, CG12905, CG13208, CG30052, CG30072, CG30067, CG30074, CG30073, CG13939, CG13518, CG13524, CG11732, CG17284.]
Fig. 2. Relative intensity of RT-PCR products of D. melanogaster putative Plus-C OBPs and dimer OBPs in different tissues, determined relative to the intensity of the actin RT-PCR band (as shown in Fig. 1).
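The tissue patterns described in Section 3.1 can be tabulated and sorted into candidate olfactory genes with a short sketch like the one below. Only presence/absence calls stated in the text are encoded (genes whose individual patterns are not spelled out are omitted); the category labels and function names are illustrative assumptions.

```python
OLFACTORY_TISSUES = {"antennae", "head"}
ALL_TISSUES = {"antennae", "head", "leg", "body"}

# Presence/absence calls as stated in Section 3.1.
EXPRESSION = {
    "LUSH": {"antennae", "head"},
    "CG11218": set(ALL_TISSUES),
    "CG13524": {"antennae"},
    "CG13518": {"antennae", "head"},
    "CG13208": {"antennae", "head"},
    "CG11732": {"antennae", "head"},
    "CG30074": {"antennae", "head"},
}


def classify(tissues: set) -> str:
    """Assign a coarse expression category from the set of positive tissues."""
    if not tissues:
        return "not detected"
    if tissues == {"antennae"}:
        return "antennae only"
    if tissues <= OLFACTORY_TISSUES:
        return "antennae/head only"
    return "also in leg or body"


def olfactory_candidates(expression: dict) -> list:
    """Genes detected in antennae and in neither leg nor body."""
    return sorted(
        gene for gene, tissues in expression.items()
        if "antennae" in tissues and tissues <= OLFACTORY_TISSUES
    )


if __name__ == "__main__":
    for gene, tissues in EXPRESSION.items():
        print(gene, classify(tissues))
    print(olfactory_candidates(EXPRESSION))
```

With these calls the candidate list contains LUSH plus the five Plus-C genes named in the text as restricted to antennae or to antennae and head.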
Graham and Davies (2002) reported that OBP83C and OBP83D as well as OBP83E and OBP83F form single genes with two six-cysteine OBP motifs joined together. Their annotations have been updated in FlyBase as two OBP genes, CG15582 and CG31557. RT-PCR of these two OBP genes with primers that cross the boundary of the two six-cysteine motifs produced only a single band (data not shown), confirming that the genes encode single transcripts. This was also supported by direct sequencing of the PCR products (data not shown) and we name these two dimer OBPs as OBP83CD and OBP83EF, respectively, to reflect their hybrid nature. Galindo and Smith (2001) who treated them as four separate genes and studied their expression pattern by indirectly examining LacZ gene expression driven by the promoter region of each OBP gene found that OBP83C was expressed only in the labellum of the gustatory organ and failed to detect any meaningful expression of OBP83D or OBP83F. Galindo and Smith (2001) were unable to examine the expression of OBP83E because no appropriate initiation methionine could be identified. Using our RT-PCR technique, we now demonstrate that the transcripts of these two genes (OBP83CD and OBP83EF) are present in all tissues (Fig. 1) with OBP83CD being expressed at much higher levels (Fig. 2). These two OBP genes seem to be the result of a gene duplication that duplicated the coding regions but not the regulatory region (Table 1, Fig. 4 and more discussion in Section 3.4).
Clearly, this semiquantitative RT-PCR is a sensitive and effective way of establishing the relative expression of putative OBP genes, and hence identifying those likely to be involved in olfaction. This study has shown that the two dimer OBP genes are expressed in all tissue types, and 5 of 12 Plus-C putative OBPs of D. melanogaster are antennal specific. We have selected these for further studies. 3.2. Plus-C OBP and dimer OBP genes from other insects Searching the genome database of D. pseudoobscura revealed 11 gene sequences with similarity to the D. melanogaster Plus-C OBP genes and two sequences similar to the dimer OBP genes. There is a good correspondence between the genes in the two Drosophila species (Figs. 3– 5), suggesting that the role of these proteins is conserved in Drosophilids. The conservation of Plus-C and dimer OBP genes in two Drosophila species also suggests that they have important functions in Drosophila. The genes of D. pseudoobscura are named according to the D. melanogaster counterparts that were used in the tBlastn search, using ‘Dpse’ instead of ‘CG’ as the ‘identifier’ (Table 1 and Figs. 3 –5). The Plus-C OBP-related genes of other insect species are given unique names based on the naming systems used by Biessmann et al. (2002) and Vogt et al. (2002), but their closest sequences in GenBank are given in the second column of Table 1.
Table 1
Genomic organisation of putative Plus-C odorant binding proteins

OBP name | Gene ID (a) | gDNA ID (b) | Intron (bp) | SignalP | Boundaries (c)

(A) Drosophila melanogaster
OBP83CD | CG15582 | FBan0015582 | 2 (702 bp) | 1–22 | 3R: 1933854..1935455 [+]
OBP83EF | CG31557 | FBan0031557 | 1 (489 bp) | 1–20 | 3R: 1935499..1936728 [-]
OBP46A | CG12905 | FBan0012905 | 3 (256 bp) | 1–19 | 2R: 5371805..5372656 [-]
OBP47B | CG13208 | FBan0013208 | 2 (130 bp) | 1–22 | 2R: 6366979..6367705 [-]
OBP49A | CG30052 | FBan0030052 | 2 (118 bp) | 1–22 | 2R: 7754772..7755644 [-]
OBP50A | CG30067 | FBan0030067 | 2 (117 bp) | 1–21 | 2R: 9434090..9434750 [-]
OBP50B | CG30073 | FBan0030073 | 1 (55 bp) | 1–19 | 2R: 9434975..9435745 [+]
OBP50C | CG30072 | FBan0030072 | 2 (130 bp) | 1–18 | 2R: 9436024..9436700 [+]
OBP50D | CG30074 | FBan0030074 | 2 (121 bp) | 1–17 | 2R: 9436850..9437485 [+]
OBP50E | CG13939 | FBan0013939 | 3 (172 bp) | 1–20 | 2R: 9438298..9439057 [-]
OBP58B | CG13518 | FBan0013518 | 2 (243 bp) | 1–21 | 2R: 17748260..17749118 [+]
OBP58C | CG13524 | FBan0013524 | 2 (116 bp) | 1–18 | 2R: 17749495..17750210 [-]
OBP85A | CG11732 | FBan0011732 | 1 (4340 bp) | no | 3R: 4245559..4250522 [-]
OBP93A | CG17284 | FBan0017284 | 3 (180 bp) | 1–19 | 3R: 16854689..17055456 [-]

(B) Drosophila pseudoobscura
Dpse83CD | n.a. | Contig5467 | 6 (6303 bp) | no | 7253..14883 [-]
Dpse83EF | n.a. | Contig5476 | 1 (427 bp) | 1–22 | 5559..7169 [+]
Dpse12905 | n.a. | Contig1082 | 3 (402 bp) | 1–22 | 105760..106761 [+]
Dpse13208 | n.a. | Contig5936 | 2 (98 bp) | 1–24 | 76..771 [-]
Dpse30052 | n.a. | Contig5538 | 2 (160 bp) | 1–22 | 2125..2924 [+]
Dpse13939 | n.a. | Contig5732 | 2 (124 bp) | no | 27527..28104 [+]
Dpse30074 | n.a. | Contig5732 | 3 (870 bp) | 1–23 | 28942..30456 [-]
Dpse30073 | n.a. | Contig5732 | 1 (72 bp) | 1–20 | 30860..31590 [-]
Dpse30067 | n.a. | Contig5732 | 1 (67 bp) | no | 32007..32495 [+]
Dpse13518 | n.a. | Contig1310 | 2 (110 bp) | 1–21 | 130889..131611 [+]
Dpse13524 | n.a. | Contig1310 | 2 (113 bp) | 1–18 | 132024..132734 [-]
Dpse11732 | n.a. | Contig1933 | 2 (10026 bp) | n.a. (d) | 11540..22018 [-] (a)
Dpse17284 | n.a. | Contig5660 | 6 (5964 bp) | no | 80467..87301 [+]

(C) Anopheles gambiae
AgamOBPjj1 | EAA04131 | AAAB01008807 | 1 (70 bp) | 1–20 | 2L: 4349239..4349907 [+]
AgamOBPjj2 | EAA45425 | AAAB01008807 | 2 (134 bp) | no | 2L: 4352303..4353123 [-]
AgamOBP3788 | AAM97673 | AAAB01008807 | 2 (139 bp) | 1–28 | 2L: 4354543..4355282 [+]
AgamOBPjj4 | n.a. | AAAB01008807 | 3 (214 bp) | 1–22 | 2L: 4364051..4364876 [+]
AgamOBPjj5C | EAA04713 | AAAB01008807 | 3 (224 bp) | 1–17 | 2L: 11440495..11441182 [+]
AgamOBPjj6B | EAA44157 | AAAB01008960 | 4 (833 bp) | 1–25 | 2L: 12158793..12160509 [+]
AgamOBPjj5A | EAA44155 | AAAB01008960 | 4 (1204 bp) | 1–24 | 2L: 12184985..12187003 [+]
AgamOBP5479 | EAA11608 | AAAB01008960 | 2 (959 bp) | 1–24 | 2L: 12185609..12187003 [+]
AgamOBP5470 | EAA44157 | AAAB01008960 | 2 (167 bp) | 1–23 | 2L: 12188898..12189498 [+]
AgamOBP114426 | EAA12996 | AAAB01008966 | 1 (78 bp) | 1–19 | 3L: 3706060..3706751 [-]
AgamOBPjj7A | EAA43809 | AAAB01008966 | 2 (141 bp) | 1–16 | 3L: 3708598..3709444 [+]
AgamOBPjj83B | AF393485 | AAAB01008849 | 1 (125 bp) | 1–21 | 3L: 1602396..1602954 [+]
AgamOBPjj83C | EAA45009 | AAAB01008986 | 1 (1834 bp) | 1–21 | 3L: 11208926..11211644 [+]
AgamOBPjj83A | EAA45532 | AAAB01008799 | 0 | 1–24 | 2R: 1673383..1674240 [-]
AgamOBPjj83D | EAA43372 | AAAB01008984 | 0 | 1–22 | 3R: 6522111..6522509 [+]

(D) Aedes aegypti
AaegOBP83A | n.a. | AABTK76TV | n.a. (e) | n.a. (f) | 1..663 [+]
AaegOBP83B | n.a. | AABHK06TV | n.a. (e) | n.a. (f) | 1..608 [+]

(a) n.a. indicates the gene is not annotated as OBP in GenBank.
(b) No chromosome information available on the genome website.
(c) The databases used are (A) Drosophila melanogaster Genome Assembly, BDGP Release 3 July 2002; (B) HGSC Drosophila pseudoobscura, Feb 2003; (C) Mosquito Assembly 2, 1st April 2003; and (D) TIGR’s Genome Projects, Aedes aegypti BAC End and cDNA Sequencing.
(d) n.a. indicates the contig sequences are too short.
(e) They are cDNA sequences.
(f) There is no appropriate initiation methionine to predict signal peptide.
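The Boundaries column of Table 1 is enough to recompute the intergenic distances and cluster sizes quoted in Section 3.4. The sketch below does this for a subset of the D. melanogaster genes, identified by their Gene ID; the coordinates are copied from the table, while the `Gene` record and function names are illustrative assumptions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


# Coordinates copied from the Boundaries column of Table 1 (section A).
GENES = [
    Gene("CG15582", "3R", 1933854, 1935455),   # OBP83CD
    Gene("CG31557", "3R", 1935499, 1936728),   # OBP83EF
    Gene("CG30067", "2R", 9434090, 9434750),
    Gene("CG30073", "2R", 9434975, 9435745),
    Gene("CG30072", "2R", 9436024, 9436700),
    Gene("CG30074", "2R", 9436850, 9437485),
    Gene("CG13939", "2R", 9438298, 9439057),
    Gene("CG13518", "2R", 17748260, 17749118),
    Gene("CG13524", "2R", 17749495, 17750210),
]


def neighbour_gaps(genes):
    """Map (left, right) neighbouring gene names to the distance between them."""
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    return {
        (left.name, right.name): right.start - left.end
        for left, right in zip(ordered, ordered[1:])
        if left.chrom == right.chrom
    }


def span(genes):
    """Length of the region from the first gene start to the last gene end."""
    return max(g.end for g in genes) - min(g.start for g in genes)


if __name__ == "__main__":
    for pair, gap in neighbour_gaps(GENES).items():
        print(pair, gap)
    cluster = [g for g in GENES if g.chrom == "2R" and g.end < 9500000]
    print("2R cluster span:", span(cluster))
```

The gap is taken here as the start of the right-hand gene minus the end of the left-hand gene, which reproduces the 44 bp, 150 bp and 377 bp separations and the 4967 bp cluster size given in the text.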
For A. gambiae, the search of the complete genome (using the Drosophila Plus-C OBPs) identified 11 Plus-C OBP-related gene sequences and four regular six-cysteine OBP-related gene sequences (Table 1 and Figs. 3–5). No gene sequences like the Drosophila dimer OBPs were found (Fig. 4B; see also Section 3.3). One gene sequence (AgamOBPjj4) has not been identified as an OBP before. Two OBPs, AgamOBP3788 and AgamOBPjj83B, have been identified previously and submitted to GenBank with accession numbers AAM97673 and AF393485, respectively (Robertson et al., 1999; Justice et al., 2002 (direct submissions)). Most of the genes have also been annotated by the mosquito genome project, but only the biggest open reading frame of each gene is deposited in GenBank (Holt, personal communication), and nearly all of them have no appropriate initiation methionine (Holt et al., 2003). We used gene identification programmes such as GenScan to define the coding region of each putative Plus-C OBP and dimer OBP
(Table 1). Two genes similar to the Drosophila OBPs were also identified in the A. aegypti BAC end and cDNA database but this covers only 6.4% of this mosquito’s genome. They were also identified as regular six-cysteine OBPs, and their sequences together with four six-cysteine OBPs of A. gambiae and two previously annotated Drosophila six-cysteine OBPs are presented in Fig. 4A. Overall, there is little sequence similarity within the Plus-C OBP family and the average similarity of Plus-C OBPs at the amino acid level is 20% (data not shown). This diversity and the large numbers of Plus-C OBPs in Drosophila and Anopheles species support the idea that different OBPs interact with different chemical ligands.

3.3. The motifs of Plus-C OBPs and dimer OBPs

All putative Plus-C OBPs from D. melanogaster and their counterparts from D. pseudoobscura, A. gambiae and A. aegypti were aligned, and the complete alignment is deposited in GenBank with an accession number of ALIGN_000581. For simplification, the alignment of the most conserved region is presented in Fig. 3, which highlights the conserved cysteines. The D. melanogaster Plus-C OBPs were originally identified by Hekmat-Scafe et al. (2002) as having an extra three cysteines (C6a, C6b and C6-cysteine) downstream of C6 and a conserved proline next to C6. However, they are better described as having an extra conserved cysteine (C4a) between C4 and C5 and two additional cysteines (C6a and C6b) downstream of C6 as indicated in Fig. 3. This maintains the eight-residue spacing between C5 and C6 in six-cysteine OBPs, and puts the highly conserved proline next to C6. It also results in a conserved spacing of nine amino acid residues between C4a and C5 (see Fig. 3). As shown in Fig. 3, only the cysteine residues C2, C3, C4a, C5, C6, C6a and a proline are conserved in all Plus-C OBPs.
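The conserved core described above (C4a, nine residues, C5, eight residues, C6, then proline) lends itself to a simple pattern search. The following is a minimal sketch using a regular expression; the function name and the synthetic test sequence are illustrative assumptions, and this is not the search procedure used in the study, which relied on tBlastn and alignment.

```python
import re

# C4a-X9-C5-X8-C6-P written as a lookahead so overlapping hits are reported.
PLUS_C_CORE = re.compile(r"(?=C[A-Z]{9}C[A-Z]{8}CP)")


def find_plus_c_core(protein: str) -> list:
    """Return 0-based start positions of every C-X9-C-X8-C-P match."""
    return [m.start() for m in PLUS_C_CORE.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic sequence built only to exercise the pattern.
    synthetic = "MK" + "C" + "A" * 9 + "C" + "G" * 8 + "CP" + "LL"
    print(find_plus_c_core(synthetic))
```

A pattern this short will also match unrelated proteins by chance, so a hit is best treated as a prompt for alignment against known Plus-C OBPs rather than as an annotation in itself.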
Unlike typical six-cysteine OBPs, most of the Plus-C OBPs (26 out of 36) also have two conserved cysteine residues in the region before C1, designated as C1a and C1b (Fig. 3). Thus, the Plus-C OBPs have the motif C1-X20–41-C2-X3-C3-X41–46-C4-X19–29-C4a-X9-C5-X8-C6-P-X9–10-C6a-X9–10. Comparison of this with the motif of six-cysteine OBPs (C1-X20–66-C2-X3-C3-X21–43-C4-X8–14-C5-X8-C6) shows that the number of amino acids between C4 and C5 increases from 8–14 to 29–39, with an extreme case of 125 for AgamOBPjj5A (Fig. 3). These extra amino acids are predicted to form a large loop between C4 and C5 projecting out from the surface of the compact OBP molecule (data not shown), a structure not predicted for the regular six-cysteine OBPs (Sandler et al., 2000; Lee et al., 2002). It is not known whether any of the additional cysteines are involved in the formation of intermolecular dimers, or whether there is a free cysteine. Although AgamOBP83A-D genes and AaegOBP83A-B genes were identified by the tBlastn search against Drosophila dimer OBP genes OBP83CD and OBP83EF, they have only the normal six-cysteine motif (Fig. 4A, also see
the alignment deposited in GenBank with an accession no. ALIGN_000581). AgamOBPjj83B is in fact annotated as Anopheles OBP-1 (Robertson et al., 1999; AF393485, direct submission to GenBank).

3.4. Genomic organisation and the evolutionary relationships of Plus-C OBPs and dimer OBPs

The presence of introns and signal peptides, and the chromosomal location of all genes encoding putative Plus-C OBPs and dimer OBPs are shown in Table 1. The alignment of all of these genes (ALIGN_000581 as deposited in GenBank) was used to construct a phylogenetic tree showing the relationship of the genes within and between species (Fig. 5). These data taken together can be used to determine possible evolutionary relationships between the Plus-C OBPs and the dimer OBPs. Most of the Drosophila Plus-C and dimer OBP genes have 1–3 introns, as do most of the A. gambiae genes, although two Plus-C OBPs (AgamOBPjj5A and AgamOBPjj6B) have four introns and two regular six-cysteine OBPs (AgamOBPjj83A and AgamOBPjj83D) have no intron (Table 1). However, the total intron length is very variable. Most of the D. melanogaster Plus-C OBPs have a signal peptide of between 17 and 20 amino acids although one (CG11732) has no signal peptide predicted so far (Robertson et al., 1999, FlyBase 03 Aug. 2003) (Table 1). However, unlike their D. melanogaster counterparts, four Plus-C OBPs of D. pseudoobscura (Dpse83CD, Dpse13939, Dpse30067 and Dpse17284) and one A. gambiae OBP (AgamOBPjj2) have no signal peptide in our study. It is possible that the 5′ exons of these genes were not detected by the methods we used. Searching of the protein fold library (http://www.sbg.bio.ic.ac.uk/servers/3dpssm/) shows that two of these OBPs, Dpse17284 and Dpse83CD, have similar folding (with more than 50% prediction certainty) to the B. mori pheromone-binding protein (1DQE).
However, the lack of a signal peptide suggests that they could not be secreted into the sensillum lymph of the olfactory system, suggesting that these OBPs may be involved in other biological processes rather than olfaction (Vogt et al., 2002). The gene expression of D. melanogaster Plus-C OBPs in non-olfactory tissues supports such a role (Fig. 2), and could also be true in D. pseudoobscura and A. gambiae. Ten out of 12 D. melanogaster Plus-C OBP genes are located on chromosome 2R and the remaining two Plus-C OBP genes, and two dimer OBPs genes are on chromosome 3R. Nine of 11 A. gambiae Plus-C OBP genes are located on chromosome 2L and two on chromosome 3L (Table 1). Several gene clusters are clearly defined by the gene boundaries for the Drosophila genes with the biggest cluster being five D. melanogaster genes (CG30067, CG30072, CG30074, CG30073 and CG13939) between 2R:9434090 and 2R:9439057. The equivalent cluster in D. pseudoobscura contains four genes (Dpse30067, Dpse30074,
Dpse30073 and Dpse13939) between contig5732:27527 and contig5732:32495. Interestingly, the size of the chromosomal region that contains these clustered genes in both Drosophila species is of the same length, 4967 bp for D. melanogaster genes and 4968 bp for D. pseudoobscura. In this cluster, CG30072 and CG30074 of D. melanogaster are only 150 bp apart on chromosome 2R, while in D. pseudoobscura only one homolog Dpse30074 was found. The cluster that shares the highest identity at the amino acid level
of all Plus-C OBPs contains CG13518 and CG13524, and Dpse13518 and Dpse13524, with identities of 49% for CG13518 and CG13524, 47% for Dpse13518 and Dpse13524, and more than 70% between the genes of the two Drosophila species. These genes may represent heterofunctional homologs of OBPs which have alternative splicing sites and their gene products from different transcripts may have different biological functions. A similar pair of Anopheles OBP genes AgamOBPjj5A and AgamOBP5479 are
Fig. 3. Alignment of predicted peptide sequences of putative Plus-C OBPs. The full alignment of all Plus-C OBPs studied in the current work can be found in GenBank with the accession number ALIGN_000581. The conserved cysteine residues used to construct the motif are labelled above the alignment.
found in A. gambiae; both have the same 5′ sequence and a 24-amino acid signal peptide, but the coding sequence of AgamOBPjj5A has a long insertion between cysteines C4 and C4a (Fig. 3). This insertion was predicted as part of the coding region of the A. gambiae OBP AgamOBP52 in GenBank (Xu and Smith, 2003, direct submission). Unlike the Drosophila Plus-C OBP genes, the Anopheles Plus-C OBP genes form no clearly defined gene clusters (Table 1 and Fig. 5). In fact, the smallest distance between two Anopheles OBP genes is 1847 bp between
AgamOBP114426 and AgamOBPjj7A (Table 1). Each gene cluster is specific to a given insect species and may have arisen independently in the fruit flies and the mosquitoes. However, gene duplications are apparent in the two Drosophila species and in A. gambiae, as shown by the phylogenetic analysis in Fig. 5. Fig. 5 also suggests that the Drosophila dimer OBP genes with two six-cysteine motifs (OBP83CD, OBP83EF, Dpse83CD and Dpse83EF) arose from the same ancestral genes. It seems likely that two six-cysteine OBPs formed a
Fig. 4. Alignment of predicted amino acid sequences of (A) putative OBPs from A. gambiae (Agam) and A. aegypti (Aaeg) and (B) dimer OBPs of D. melanogaster (CG) and D. pseudoobscura (Dpse). Two previously annotated D. melanogaster OBPs (LUSH and CG11218) are also included. The bars above the alignment in A indicate the regions that form alpha-helices in the crystal structure of LUSH (Kruse et al., 2003).
Fig. 5. Unrooted neighbor-joining tree from the full alignment (GenBank: ALIGN_000581) of all putative Plus-C OBPs and dimer OBPs. The dimer OBP group was used as the out-group. The unrooted consensus tree was generated with 1000 bootstrap trials using the neighbour-joining method (Saitou and Nei, 1987) and presented with a cutoff value of 70.
dimer OBP by gene duplication. However, such duplication was imperfect, with one of them losing its regulatory elements (Fig. 4B). The data in Table 1 show that the dimer OBP genes of D. melanogaster are located on chromosome 3R in a head-to-head orientation with only 44 bp between them. Therefore, expression must be driven by different promoters, one upstream of the genes and the other downstream. The activation of these promoters appears to vary under different conditions, as shown in Fig. 2, where OBP83CD is more highly expressed than OBP83EF, particularly in the antennae. However, similar expression profiles were observed for two olfactory-specific Plus-C OBP genes, CG13518 and CG13524 (Fig. 2), which are located on the same chromosome in a head-to-head orientation separated by 377 bp and are phylogenetically clustered together with their homologs Dpse13518 and Dpse13524 of D. pseudoobscura (Table 1 and Fig. 5). The phylogenetic analysis separates CG30073/CG13939 and Dpse30073/Dpse13939 from the cluster of CG30067 (Dpse30067), CG30072 and CG30074 (Dpse30074) (Fig. 5), although they are closely located on the same chromosome (Table 1). CG30067 (Dpse30067), CG30072 and CG30074 (Dpse30074) could have evolved from the same ancestral gene, whilst CG30073 (Dpse30073) and CG13939 (Dpse13939) could have evolved before the formation of these phylogenetically clustered genes. The expression of these Plus-C OBP genes is variable and they are separated from each other by at least 220 bp, suggesting that each may be driven by its own regulatory elements. The CG13939 transcript was highly expressed in antennae, whilst CG30073 was highly expressed in body. The transcript levels of the phylogenetic cluster were very low, apart from CG30072, which was highly expressed in antennae, head and body (Fig. 2).
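The cluster lengths quoted for the two Drosophila species follow directly from the boundary coordinates given in the text. The short Python sketch below reproduces that arithmetic; the `Region` type and its `span` method are hypothetical names introduced only for illustration, and the span is taken as the plain difference of the two boundaries, which is an assumption about how the published figures were derived (it does match them).

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval described by its two boundary coordinates."""
    label: str
    start: int
    end: int

    def span(self) -> int:
        # Length taken as the simple difference of the boundaries
        # (assumed convention; it reproduces the values in the text).
        return self.end - self.start


# Cluster boundaries as quoted in the text.
dmel_cluster = Region("D. melanogaster 2R", 9434090, 9439057)
dpse_cluster = Region("D. pseudoobscura contig5732", 27527, 32495)

for region in (dmel_cluster, dpse_cluster):
    print(f"{region.label}: {region.span()} bp")
```

The same subtraction applied to the end of one gene and the start of the next would give intergenic distances such as the 44 bp and 377 bp separations discussed above, provided the gene boundaries from Table 1 are available.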
Our data suggest that the Plus-C OBPs could have evolved from a common ancestor early in the insect lineage and then diverged independently and rapidly in each insect species. Our data also strongly suggest that the Plus-C OBP genes arose by a gene duplication mechanism. However, the low sequence identity among the members suggests rapid evolution of these sequences following duplication.
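Statements about sequence identity, such as the percentages quoted for the CG13524 pair, depend on how identity is counted over an alignment. The following is a minimal sketch of one common convention, identical residues divided by the number of columns in which neither sequence has a gap. The function name, the gap character and the two toy sequences are hypothetical; the convention actually used for the published values is not stated here, so this should not be read as the authors' method.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not taken from the paper.
seq_a = "MKC-LLACDE"
seq_b = "MRCALL-CDQ"
print(percent_identity(seq_a, seq_b))  # 6 matches over 8 gap-free columns
```

Other denominators (full alignment length, or the length of the shorter sequence) give different percentages for the same alignment, which is worth bearing in mind when comparing identity values across studies.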
4. Conclusions
The challenge of annotating eukaryotic genomes has been comprehensively set out by Lewis et al. (2000). The use of algorithms and sequence similarity searches to identify putative gene functions can clearly play a role as applied by the present and previous studies for OBPs (Galindo and Smith, 2001; Graham and Davies, 2002) and olfactory receptors (ORs) (Kim and Carlson, 2002; Hill et al., 2002). Of course, the genes identified in this way are only “putative” OBPs and ORs. The value of such in silico data is that they identify genes which can
then be cloned and expressed so that their properties can be elucidated. The semiquantitative RT-PCR reported here is a sensitive and effective way of establishing the relative expression of putative OBP genes and hence identifying those likely to be directly involved in olfaction, and those that may be involved in other biological processes. The challenge now is to identify the ligands for the OBPs and determine the role they play in the chemical ecology of the insects.
Acknowledgements
We thank the Drosophila genome project consortium, especially Prof. Michael Ashburner for making this work possible. WH and GZ were sponsored by the Chinese Government for overseas study. Rothamsted Research receives grant-aided support from the Biotechnology and Biological Sciences Research Council of the UK.
References
Biessmann, H., Walter, M.F., Dimitratos, S., Woods, D., 2002. Isolation of cDNA clones encoding putative odourant binding proteins from the antennae of the malaria-transmitting mosquito, Anopheles gambiae. Insect Mol. Biol. 11, 123–132.
Briand, L., Nespoulous, C., Huet, J.C., Takahashi, M., Pernollet, J.C., 2001. Ligand binding and physico-chemical properties of ASP2, a recombinant odorant-binding protein from honeybee (Apis mellifera L.). Eur. J. Biochem. 268 (3), 753–760.
Field, L.M., Pickett, J.A., Wadhams, L.J., 2000. Molecular studies in insect olfaction. Insect Mol. Biol. 9, 545–551.
Galindo, K., Smith, D.P., 2001. A large family of divergent Drosophila odorant-binding proteins expressed in gustatory and olfactory sensilla. Genetics 159 (3), 1059–1072.
Graham, L.A., Davies, P.L., 2002. The odorant-binding proteins of Drosophila melanogaster: annotation and characterization of a divergent gene family. Gene 292 (1–2), 43–55.
Hekmat-Scafe, D.S., Steinbrecht, R.A., Carlson, J.R., 1997. Coexpression of two odorant-binding protein homologs in Drosophila: implications for olfactory coding. J. Neurosci. 17, 1616–1624.
Hekmat-Scafe, D.S., Scafe, C.R., McKinney, A.J., Tanouye, M.A., 2002. Genome-wide analysis of the odorant-binding protein gene family in Drosophila melanogaster. Genome Res. 12 (9), 1357–1369.
Hill, C.A., Fox, A.N., Pitts, R.J., Kent, L.B., Tan, P.L., Chrystal, M.A., Cravchik, A., Collins, F.H., Robertson, H.M., Zwiebel, L.J., 2002. G protein-coupled receptors in Anopheles gambiae. Science 298 (5591), 176–178.
Holt, R.A., et al., 2003. The genome sequence of the malaria mosquito Anopheles gambiae. Science 298 (5591), 129–141.
Kim, J., Carlson, J.R., 2002. Gene discovery by e-genetics: Drosophila odor and taste receptors. J. Cell. Sci. 115 (Pt. 6), 1107–1112.
Kim, M.S., Smith, D.P., 2001. The invertebrate odorant-binding protein LUSH is required for normal olfactory behavior in Drosophila. Chem. Senses 26 (2), 195–199.
Kim, M.S., Repp, A., Smith, D.P., 1998. LUSH odorant-binding protein mediates chemosensory responses to alcohols in Drosophila melanogaster. Genetics 150 (2), 711–721.
Koganezawa, M., Shimada, I., 2002. Novel odorant-binding proteins expressed in the taste tissue of the fly. Chem. Senses 27 (4), 319–332.
Kruse, S.W., Zhao, R., Smith, D.P., Jones, D.N., 2003. Structure of a specific alcohol-binding site defined by the odorant binding protein LUSH from Drosophila melanogaster. Nat. Struct. Biol. 10 (9), 694–700 (Sep.).
Kumar, S., Tamura, K., Jakobsen, I.B., Nei, M., 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17 (12), 1244–1245.
Lee, D., Damberger, F.F., Peng, G., Horst, R., Guntert, P., Nikonova, L., Leal, W.S., Wuthrich, K., 2002. NMR structure of the unliganded Bombyx mori pheromone-binding protein at physiological pH. FEBS Lett. 531 (2), 314–318 (Nov. 6).
Lewis, S., Ashburner, M., Reese, M.G., 2000. Annotating eukaryote genomes. Curr. Opin. Struct. Biol. 10 (3), 349–354.
McKenna, M.P., Hekmat-Scafe, D.S., Gaines, P., Carlson, J.R., 1994. Putative Drosophila pheromone-binding proteins expressed in a subregion of the olfactory system. J. Biol. Chem. 269 (23), 16340–16347.
Nicholas, K.B., Nicholas Jr., H.B., Deerfield II, D.W., 1997. GeneDoc: analysis and visualization of genetic variation. Embnet.news 4, 14.
Nielsen, H., Engelbrecht, J., Brunak, S., von Heijne, G., 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6.
Nikonov, A.A., Peng, G., Tsurupa, G., Leal, W.S., 2002. Unisex pheromone detectors and pheromone-binding proteins in scarab beetles. Chem. Senses 27 (6), 495–504.
Oldham, N.J., Krieger, J., Breer, H., Fischedick, A., Hoskovec, M., Svatoš, A., 2000. Analysis of the silkworm moth pheromone binding protein–pheromone complex by electrospray ionization-mass spectrometry. Angew. Chem., Int. Ed. Engl. 39, 4341–4343.
Park, S.K., Shanbhag, S.R., Wang, Q., Hasan, G., Steinbrecht, R.A., Pikielny, C.W., 2000. Expression patterns of two putative odorant-binding proteins in the olfactory organs of Drosophila melanogaster have different implications for their functions. Cell Tissue Res. 300 (1), 181–192.
Pelosi, P., Maida, R., 1995. Odorant-binding proteins in insects. Comp. Biochem. Physiol. 111, 503–514.
Picone, D., Crescenzi, O., Angeli, S., Marchese, S., Brandazza, A., Ferrara, L., Pelosi, P., Scaloni, A., 2001. Bacterial expression and conformational analysis of a chemosensory protein from Schistocerca gregaria. Eur. J. Biochem. 268 (17), 4794–4801.
Pikielny, C.W., Hasan, G., Rouyer, F., Rosbash, M., 1994. Members of a family of Drosophila putative odorant-binding proteins are expressed in different subsets of olfactory hairs. Neuron 12 (1), 35–49.
Riviere, S., Lartigue, A., Quennedey, B., Campanacci, V., Farine, J.P., Tegoni, M., Cambillau, C., Brossut, R., 2003. A pheromone-binding protein from the cockroach Leucophaea maderae: cloning, expression and pheromone binding. Biochem. J. 371 (2), 573–579.
Robertson, H.M., Martos, R., Sears, C.R., Todres, E.Z., Walden, K.K., Nardi, J.B., 1999. Diversity of odourant binding proteins revealed by an expressed sequence tag project on male Manduca sexta moth antennae. Insect Mol. Biol. 8 (4), 501–518.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4 (4), 406–425.
Sandler, B.H., Nikonova, L., Leal, W.S., Clardy, J., 2000. Sexual attraction in the silkworm moth: structure of the pheromone-binding-protein–bombykol complex. Chem. Biol. 7 (2), 143–151.
Scaloni, A., Monti, M., Angeli, S., Pelosi, P., 1999. Structural analysis and disulfide-bridge pairing of two odorant-binding proteins from Bombyx mori. Biochem. Biophys. Res. Commun. 266 (2), 386–391.
Shanbhag, S.R., Hekmat-Scafe, D., Kim, M.S., Park, S.K., Carlson, J.R., Pikielny, C., Smith, D.P., Steinbrecht, R.A., 2001. Expression mosaic of odorant-binding proteins in Drosophila olfactory organs. Microsc. Res. Tech. 55 (5), 297–306.
Stensmyr, M.C., Giordano, E., Balloi, A., Angioy, A.M., Hansson, B.S., 2003. Novel natural ligands for Drosophila olfactory receptor neurones. J. Exp. Biol. 206 (Pt. 4), 715–724.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Vogt, R.G., Callahan, F.E., Rogers, M.E., Dickens, J.C., 1999. Odorant binding protein diversity and distribution among the insect orders, as indicated by LAP, an OBP-related protein of the true bug Lygus lineolaris (Hemiptera, Heteroptera). Chem. Senses 24, 481–495.
Vogt, R.G., Rogers, M.E., Franco, M.D., Sun, M., 2002. A comparative study of odorant binding protein genes: differential expression of the PBP1-GOBP2 gene cluster in Manduca sexta (Lepidoptera) and the organization of OBP genes in Drosophila melanogaster (Diptera). J. Exp. Biol. 205 (Pt. 6), 719–744.