Gene 270 (2001) 41–52
www.elsevier.com/locate/gene
The chorion genes of the medfly. II. DNA sequence evolution of the autosomal chorion genes s18, s15, s19 and s16 in Diptera

Dina Vlachou, Katia Komitopoulou*

Department of Genetics and Biotechnology, School of Biological Sciences, University of Athens, Panepistimiopolis, Athens 15701, Greece

Received 23 November 2000; received in revised form 16 March 2001; accepted 12 April 2001
Received by D. Finnegan
Abstract

We present a total of approximately 15 kb of DNA sequences, encompassing four chorion genes, Ccs18, Ccs15, Ccs19 and Ccs16, and their flanking DNA in the medfly C. capitata. Comparison of coding regions, introns and intergenic sequences in five Dipteran species, D. melanogaster, D. subobscura, D. virilis, D. grimshawi and C. capitata, documented extensive divergence in introns and coding regions, but a few well-conserved elements in the proximal 5′ flanking regions in all species. These elements are related to conserved regulatory features of three of the genes, including tissue- and temporal regulation. In the fourth, gene s15, significant alterations in the 5′ flanking region may be responsible for its changed temporal regulation in C. capitata. One long intergenic sequence, located in the distal 5′ flanking region of gene s18, is homologous to ACE3, a major amplification control element, and contains an 80-bp A/T-rich sequence known to stimulate strong binding of the origin recognition complex (ORC) in D. melanogaster. Analysis of the nucleotide composition of all chorion genes in C. capitata and D. melanogaster showed that C. capitata exhibits a less biased representation of synonymous codons than does D. melanogaster. © 2001 Published by Elsevier Science B.V. All rights reserved.

Keywords: Conserved regulatory elements; Divergence; Intergenic sequences; Gene amplification; Codon bias
1. Introduction

We are interested in the organization, developmental regulation and evolution of a clustered chorion gene family encoding chorion proteins in Diptera. In Drosophilidae the major genes of this family are organized in two clusters: the X-linked cluster consisting of two genes, s36 and s38, and the autosomal cluster consisting of four genes, s16, s19, s15 and s18. The regulatory properties of the two clusters have remained remarkably constant in the studied species: all genes are differentially amplified in the follicle cells, shortly before the start of their transcription, and each one of them is expressed with similar temporal specificity (Spradling et al., 1980; Griffin-Shea et al., 1982; Wong et al., 1985; Levine and Spradling, 1985; Martinez-Cruzado et al., 1988; Fenerjian et al., 1989; Swimmer et al., 1990; Mariani et al., 1996).

The structural and regulatory evolution of the autosomal chorion genes in Drosophila species has been studied extensively. Some gross features of the cluster are constant in all species: all genes have a single short intron and they have the same order, tandem orientation and spacing. In contrast to this organizational constancy, the cluster shows extensive diversification and almost complete randomization of the intronic sequences. The coding regions are also divergent, especially in the more distantly related species. However, in the 5′ end of each gene and in the proximal 5′ flanking DNA, islands of sequence conservation have been found (Martinez-Cruzado et al., 1988; Fenerjian et al., 1989; Swimmer et al., 1990; Martinez-Cruzado, 1990). Some of these islands may correspond to cis-regulatory elements, responsible for the tissue- and temporal-specific gene expression patterns, which are highly conserved in evolution (Swimmer et al., 1992; Fenerjian et al., 1989; Mariani et al., 1996). Moreover, in all species, an 80-bp A/T-rich sequence that is >90% identical among Drosophila species is included in a region with great similarity to the ACE3 element that is essential for chorion gene amplification. This sequence is sufficient for the binding of the origin recognition complex (ORC), a factor necessary for the initiation of DNA replication (Orr-Weaver et al., 1989; Delidakis and Kafatos, 1987; Austin et al., 1999).

Abbreviations: kb, kilo base pairs; bp, base pair; ACE, amplification control element; AEE, amplification enhancing element; ORC, origin recognition complex; Ccmar1, Ceratitis capitata mariner 1 transposable element
* Corresponding author. Tel.: +301-7274607; fax: +301-7274318. E-mail address: [email protected] (K. Komitopoulou).
0378-1119/01/$ - see front matter © 2001 Published by Elsevier Science B.V. All rights reserved. PII: S0378-1119(01)00482-6
D. Vlachou, K. Komitopoulou / Gene 270 (2001) 41–52
We have extended the study of chorion genes to a more distantly related species, the Mediterranean fruit fly Ceratitis capitata, a member of a different Dipteran family. A gene family encompassing six genes homologous to Drosophilidae chorion genes was isolated and described in detail in the medfly. Two of them, Ccs36 and Ccs38, are homologous to the X-linked cluster and have been mapped on the 5th chromosome (Konsolaki et al., 1990; Tolias et al., 1990; Zacharopoulou et al., 1992). The other four genes, Ccs18, Ccs15, Ccs19 and Ccs16, are homologous to the autosomal cluster, have been mapped on the 6th chromosome and will be referred to as the 6th chromosome locus (Zacharopoulou et al., 1992; Vlachou et al., 1997). All chorion genes are specifically amplified in female ovaries and show strong conservation in their tissue and temporal specificity, except for a notable temporal alteration in gene Ccs15 (Konsolaki et al., 1990; Tolias et al., 1990; Vlachou et al., 1997). In this report we present DNA sequences encompassing the four chorion genes Ccs18, Ccs15, Ccs19 and Ccs16 of the medfly. We also compare in detail coding regions, introns and intergenic sequences of the locus in five Dipteran species, D. melanogaster, D. subobscura, D. virilis, D. grimshawi and C. capitata. This study can lead us to an understanding of how chorion genes, and especially their regulatory elements, have changed during evolution. The most exciting future prospect of this work is the functional analysis of the putative cis-regulatory elements identified by sequence comparisons. The evolution of developmental regulatory mechanisms in the species analyzed is an interesting biological issue that is now amenable to experimental studies due to the recently developed transformation techniques in the medfly (Loukeris et al., 1995; Handler et al., 1998). In gene s15, significant alterations in the 5′ flanking region may be responsible for its changed temporal regulation in C. capitata.
We are currently investigating this possibility by transformation analysis of the Ccs15 promoter in C. capitata and D. melanogaster.
2. Materials and methods

2.1. Sequence analysis and alignments

Genomic clones were isolated from a medfly genomic library, as previously described (Vlachou et al., 1997). They were mapped with various restriction enzymes and subcloned into M13mp18 and M13mp19 or pBluescript II KS (Stratagene). Overlapping restriction fragments were sequenced in both directions using the chain termination procedure. Samples were resolved in 4 or 5% polyacrylamide, 7.5 M urea gels, and autoradiography was performed using Kodak X-Omat film. Each strand was sequenced at least three times. A few uncertainties were resolved by using synthetic oligomers to prime the reaction from internal points. Nucleotide sequences were identified by comparing data from the EMBL databank using the BLAST program, the CLUSTAL W program, the DNA analysis program developed by Pustell and Kafatos (1984) and the GCG secondary structure prediction programs (version 8, Genetics Computer Group, Inc).

2.2. EMBL nucleotide sequence database accession numbers

The accession numbers of the DNA sequences used for the alignments are: D. melanogaster chorion genes s18, s15 and s19, X02497 (Wong et al., 1985) and X06257 (Levine and Spradling, 1985); D. melanogaster chorion gene s16, X16715 (Fenerjian et al., 1989); D. subobscura chorion genes s18, s15, s19 and s16, X53423 (Swimmer et al., 1990); D. virilis chorion genes s18, s15, s19 and s16, X53421 (Swimmer et al., 1990; Martinez-Cruzado et al., 1988; Fenerjian et al., 1989); D. grimshawi chorion genes s18, s15, s19 and s16, X53422 (Swimmer et al., 1990; Martinez-Cruzado et al., 1988; Fenerjian et al., 1989).

3. Results

3.1. Organization of the 6th chromosome chorion gene cluster in C. capitata

Fig. 1 shows a diagram of the 6th chromosome chorion gene cluster in C. capitata. It also indicates the DNA encompassing the four chorion genes and their surrounding sequences (14.875 kb). The precise borders of the chorion coding regions were determined by reference to cDNA clones isolated previously (Vlachou et al., 1997). The surrounding regions were compared with the complete sequence of the chorion locus in Drosophila species (Martinez-Cruzado et al., 1988; Fenerjian et al., 1989; Swimmer et al., 1990; Vlachou et al., 1997) and with other known sequences through the EMBL data bank. One defective copy of the transposable element Ccmar1 and sequences homologous
Fig. 1. DNA sequences of the 6th chromosome chorion locus in the medfly C. capitata. Diagram of the locus showing sequenced regions in solid bars and non-sequenced regions in empty bars. Chorion genes are indicated by black arrows, Ccmar1 by a gray arrow and number (1), and p19 and Tart segments by gray lines and numbers (2) and (3), respectively. Solid vertical bars show the position of the chorion-specific regulatory elements TCACGT.
to fragments of p19 mariner and Tart transposable elements were identified (gray segments in Fig. 1). All the sequenced regions are available in the EMBL databank with the following accession numbers: Ccs18 (3837 bp), AJ251917; Ccs15 (3535 bp), AJ251918; Ccs19-Ccs16 (7503 bp), AJ251919.
Fig. 2. Sequence comparison of coding regions of the genes s18, s15, s19 and s16 in the species D. melanogaster (Dm), D. subobscura (Ds), D. virilis (Dv), D. grimshawi (Dg) and C. capitata (Cc) has revealed only a limited number of moderately conserved sequences, corresponding mainly to conserved regions of the deduced polypeptides. Asterisks show invariant bases in all five species and invariant bases in at least four species are shaded.
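The annotation convention of the alignment figures (asterisks for bases invariant in all five species, shading for bases shared by at least four) can be illustrated with a minimal Python sketch. The function name `conservation_marks`, the `+` character standing in for shading, and the toy sequences are all assumptions made for illustration; they are not the chorion data or the software used in this study.

```python
from collections import Counter

def conservation_marks(aligned, min_shaded=4):
    """Return one mark per alignment column.

    '*' : the same base in every row (invariant in all species)
    '+' : the most common base is shared by at least `min_shaded` rows
    ' ' : anything less conserved
    Gap characters ('-') never count as matches.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        if top == len(aligned):
            marks.append("*")
        elif top >= min_shaded:
            marks.append("+")
        else:
            marks.append(" ")
    return "".join(marks)

# Toy five-row alignment (invented sequences).
toy = [
    "TCACGTA-T",
    "TCACGTAGT",
    "TCACGTACT",
    "TCACGTTCT",
    "TCACGAACT",
]
print(conservation_marks(toy))
```

In this toy case the first five columns and the last one are invariant, two columns are shared by four of the five rows, and the gapped column is left unmarked.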
The medfly chorion genes have maintained the same order, transcription orientation and intron/exon structure in the estimated 100 million years since the last common ancestor of Drosophila and Ceratitis. The genes Ccs18, Ccs15 and Ccs19 are interrupted by a single short intron (137 bp, 111 bp and 88 bp, respectively) within the signal peptide encoding region, and show short untranslated regions at the two ends, comparable in length to those of Drosophila species, with the exception of Ccs19, which has a much longer putative 3′ untranslated region. In the gene Ccs16 the intron is located near the 3′ end, as in all Drosophila species, but it has doubled in length compared to D. melanogaster. Comparison of gene structural features
showed that all medfly chorion genes are longer than their Drosophila counterparts.

3.2. Detailed sequence alignments

Interspecies comparisons of the chorion genes at the nucleotide level were carried out with CLUSTAL W and the DNA analysis program developed by Pustell and Kafatos (1984). The genes were compared in pairs and in all five species. Although there is greater conservation in the closely related species (data not shown), both comparisons revealed extensive diversification for much of their length
Fig. 3. Sequence comparison upstream, in the first exon and the intron of gene s18 in four Drosophila species and C. capitata. The invariant elements established in Swimmer et al. (1990) were used as anchor points for aligning the sequences in between. Elements that have been enlarged in Ceratitis are shown in gray boxes, whereas the invariant elements in all five species are shown in black boxes. Asterisks show invariant bases in all five species and invariant bases in at least four species are shaded.
and only limited uninterrupted segments were identified in CLUSTAL W alignments. Fig. 2 presents the better conserved segments of the large exons and of the coding regions of the small exons of the chorion genes. In the genes Ccs18, Ccs15 and Ccs19, these segments of DNA correspond to the most closely conserved regions of the deduced polypeptides (Vlachou et al., 1997). Ccs16 remains the most conserved gene of the locus and Ccs15 the most divergent. Detailed sequence alignments were constructed for the proximal 5′ flanking DNA and the nearby small exon and intron of each gene in all species. The alignments were initiated by computer identification of the most conserved regions in Drosophila and medfly (Martinez-Cruzado et al., 1988; Fenerjian et al., 1989; Swimmer et al., 1990). Matches of >3 nucleotides found in the same order in all five species served as anchor points for aligning the rest of the sequence with the program CLUSTAL W. Figs. 3–6 demonstrate that although many of the perfectly conserved elements existing in Drosophila species have changed due to the existence of
intervening sequences in C. capitata, some characteristic elements have remained almost intact. In contrast, the intron sequences are totally randomized. The 5′ untranslated regions have strongly diverged, showing very small elements of similar nucleotides, especially at the cap site and the coding region. In the 5′ flanking DNA of gene Ccs18 (Fig. 3), the pattern of the conserved elements existing in Drosophila species can be followed quite easily, although only 9 boxes of ≥5 invariant nucleotides, clustered in regions of different degrees of conservation, were found in all species. The most conserved region, corresponding to the minimal chorion promoter of D. melanogaster, extends from −1 to −96 and includes TCACGT and TATA. Only the box containing the TCACGT has remained unchanged in all species. A second TCACGT was found at −430 to −425. The transcription start site (cap site) is more divergent than in all the other genes. Another highly conserved region in all species is an element of 74 nucleotides between −775 and −848 (−412 to −482 in D. melanogaster). This segment is extensively
Fig. 4. Sequence comparison upstream, in the first exon and the intron of gene s15 in four Drosophila species and C. capitata. This detailed alignment was established based on criteria that were analyzed elsewhere (Martinez-Cruzado et al., 1988). The alignments were initiated with >3 invariant nucleotides in all species used as anchor points. The invariant elements in all Drosophila species are shown in gray boxes, whereas the invariant elements in all five species are shown in black boxes. Annotations as in Fig. 3.
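The anchor-point idea (short invariant matches that occur in the same order in every species, with the sequence in between aligned afterwards) can be sketched in Python as follows. This is only an illustration under simplifying assumptions that are not taken from the paper: anchors are exact k-mers that occur exactly once in each sequence, and the longest co-linear, non-overlapping chain is chosen by dynamic programming. The function names and the toy sequences are hypothetical, and the step of aligning the intervals between anchors (done with CLUSTAL W in this study) is not shown.

```python
def shared_unique_kmers(seqs, k=4):
    """Map each k-mer found exactly once in every sequence to its start positions."""
    def unique_positions(seq):
        seen = {}
        for i in range(len(seq) - k + 1):
            seen.setdefault(seq[i:i + k], []).append(i)
        return {kmer: pos[0] for kmer, pos in seen.items() if len(pos) == 1}

    tables = [unique_positions(s) for s in seqs]
    common = set(tables[0]).intersection(*tables[1:])
    return {kmer: tuple(t[kmer] for t in tables) for kmer in common}

def ordered_anchors(seqs, k=4):
    """Longest chain of shared k-mers appearing in the same order, without overlap, in all sequences."""
    hits = sorted(shared_unique_kmers(seqs, k).items(), key=lambda kv: kv[1])
    best = []  # best[i] is the longest chain that ends with hits[i]
    for i, (kmer, pos) in enumerate(hits):
        chain = [(kmer, pos)]
        for j in range(i):
            prev_pos = hits[j][1]
            follows = all(p + k <= q for p, q in zip(prev_pos, pos))
            if follows and len(best[j]) + 1 > len(chain):
                chain = best[j] + [(kmer, pos)]
        best.append(chain)
    return max(best, key=len, default=[])

# Toy upstream sequences (invented); k=6 keeps the example easy to follow,
# whereas the threshold quoted in the caption corresponds to k >= 4.
toy = [
    "AATCACGTGGGTATAAACC",
    "CTCACGTCCTATAAAGG",
    "TCACGTATTTTATAAAT",
]
for kmer, positions in ordered_anchors(toy, k=6):
    print(kmer, positions)
```

For the toy input the two shared hexamers are reported in upstream-to-downstream order together with their 0-based start positions in each sequence.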
invariable in length, with 65% overall identity in all five species (>90% in Drosophila species). In gene s15 (Fig. 4), the intron has highly diverged, except at the 5′ end where two conserved elements were found: the characteristic sequence GTAAG and a short imperfectly conserved internal element ATCC(T)T possibly related to lariat formation. The 5′ untranslated region, although well conserved in Drosophila species, has extensively diverged in the medfly, presenting a few substitutions, one insertion of approximately 17 nucleotides and one deletion of 14 nucleotides. Significant homology was found near the cap site, with only one small box perfectly conserved. The proximal 5′ region, between −1 and −91, encompasses two almost perfectly conserved elements, including the TCACGT and the TATA (one nucleotide insertion) motifs. This region has become elongated in Ceratitis. Further upstream a deletion, corresponding to the −63/−82 region in D. melanogaster, is evident in Ceratitis and further conservation is limited to four moderately diverged blocks (three of them have 5 contiguous matched nucleotides). Comparison of the 5′ flanking and 5′ untranslated sequences of gene s19 as well as of the small exon and
intron is shown in Fig. 5. Imperfectly conserved, but properly ordered boxes are again clustered in the proximal 5′ flanking DNA. The region from −69 to +14 shows strong conservation in all species. Two almost perfectly conserved blocks, including the TATA (one nucleotide substitution) and TCACGT motifs, were found in this region. Further upstream, conservation is limited to one invariant pentamer. Two more perfectly conserved elements were found at the beginning and the middle of the intron in all species. Comparison of the 5′ flanking DNA and 5′ untranslated regions of all s16 genes is shown in Fig. 6. In the 5′ flanking sequence the mosaic pattern of small conserved elements, separated by diverged sequences, is maintained in C. capitata, although some of these elements are divided by insertions of a varying number of nucleotides into shorter, perfectly conserved elements. The best conserved elements are concentrated in the 5′ proximal region of the gene from −80 to +6. In this region two very well conserved elements are encountered, the TCACGT included in an almost perfectly conserved block of 11 nucleotides (one substitution) and the TATA box in a perfectly conserved element of 13 nucleotides. Only in genes s16 and s19 is the region
Fig. 5. Sequence conservation and divergence upstream, in the first exon and the intron of gene s19 in four Drosophila species and C. capitata. The alignments were initiated with invariant elements established in Fenerjian et al. (1989) as anchor points. Annotations as in Fig. 4.
around the TATA box well conserved. The TATA box is followed by a putative transcription initiation sequence (A)GTTAGT (cap site). The intron, found towards the 3′ end as in all s16 homologues, is placed within codon 129 (111–114 in Drosophila species), resulting in a second exon of only 22 codons (27–29 in Drosophila species). No sequence conservation other than that described above was found. One important exception is the region between Ccs16 and paramyosin (Vlachou et al., 1997), previously documented by Fenerjian et al. (1989). This region is available in only four species. Fig. 7 shows sequence alignments in the region downstream of s16 of D. melanogaster, D. virilis, D. grimshawi and C. capitata. Although most of the perfectly conserved boxes have moderately diverged in Ceratitis (only three have remained invariant) and a very long insertion of 545 bp was observed after the perfectly conserved element CCCAATTAGTA, notable conservation still exists, stronger than what has been found in any other region of comparable length in the chorion cluster. This region does not contain any open reading frame, and extends from +1049 to +2425 relative to the Ccs16 transcription start site.
Table 1
Base utilization at position III of codons for chorion genes of C. capitata

Species     Gene       A     T     G     C   Total   %(G + C)
Drosophila  Dms36     25    52    87   123     287      73.17
            Dms38     34    75    64   134     307      64.49
            Dms18     17    34    38    84     173      70.52
            Dms15     11    30    18    57     116      64.65
            Dms19     13    32    35    94     174      72.41
            Dms16     16    27    32    64     138      68.80
Ceratitis   Ccs36     74   116    42    87     321      40.2
            Ccs38     60   106    38    78     282      41.1
            Ccs18     84   162    22    65     333      26.12
            Ccs15     45    67    21    34     167      32.93
            Ccs19     73   143    30    81     327      33.9
            Ccs16     35    45    31    41     152      47.36
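The %(G + C) column of Table 1 is the share of G- and C-ending codons among all codons counted. A minimal Python sketch of that arithmetic is given below, using the A, T, G and C counts transcribed from Table 1; the dictionary and function names are illustrative only. For a few genes the value recomputed from the four counts differs slightly from the printed Total or %(G + C); the sketch simply recomputes and does not attempt to reconcile those differences.

```python
# Third-position base counts (A, T, G, C) transcribed from Table 1.
THIRD_POSITION_COUNTS = {
    "Dms36": (25, 52, 87, 123),
    "Dms38": (34, 75, 64, 134),
    "Dms18": (17, 34, 38, 84),
    "Dms15": (11, 30, 18, 57),
    "Dms19": (13, 32, 35, 94),
    "Dms16": (16, 27, 32, 64),
    "Ccs36": (74, 116, 42, 87),
    "Ccs38": (60, 106, 38, 78),
    "Ccs18": (84, 162, 22, 65),
    "Ccs15": (45, 67, 21, 34),
    "Ccs19": (73, 143, 30, 81),
    "Ccs16": (35, 45, 31, 41),
}

def gc3_percent(a, t, g, c):
    """Percentage of codons whose third base is G or C."""
    return 100.0 * (g + c) / (a + t + g + c)

for gene, counts in THIRD_POSITION_COUNTS.items():
    print(f"{gene}: {gc3_percent(*counts):.2f}% G + C at position III")
```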
3.3. Comparison of codon usage patterns in Ceratitis capitata and Drosophila melanogaster

We compared codon usage in D. melanogaster and C. capitata for all chorion genes. A summary of the results is
Fig. 6. Sequence conservation and divergence upstream and in the 5′ untranslated region of gene s16 in four Drosophila species and C. capitata. The alignments were initiated with invariant elements established in Fenerjian et al. (1989) as anchor points. Annotations as in Fig. 5.
shown in Table 1. The Drosophila melanogaster genes exhibit a deficiency of A and T in the third position (64–73% G + C). In contrast, the medfly chorion genes show a slightly reversed bias (26–47% G + C). C. capitata uses codons TTA and TTG more frequently than codons CTT, CTC, CTA, and CTG. In medfly genes there is also a small
bias against GTA, ATA, and NCG codons, which is not observed in D. melanogaster (Table 2).

4. Discussion

The study of the medfly 6th chromosome chorion genes
Fig. 7. Identification of well conserved elements downstream of gene s16. A detailed sequence alignment from the region between s16 and paramyosin of D. melanogaster (Dm), D. virilis (Dv), D. grimshawi (Dg), and C. capitata (Cc) is shown. The sequences are numbered from the transcriptional start site of s16. The invariant elements established in Fenerjian et al. (1989) were found to be either similar or variably diverged in C. capitata.
Table 2
% Codon usage

              s18       s15       s19       s16       s36       s38
AA   Codon    Cc   Dm   Cc   Dm   Cc   Dm   Cc   Dm   Cc   Dm   Cc   Dm
Ala  GCA      29    4   19    0   30   13   17    3   20    6   19   12
     GCC      16   71   15   61   10   57   28   59   13   67   21   43
     GCG       5    4    7    6    5    0   17    3    7    8    7    4
     GCT      50   21   59   33   55   30   38   34   59   19   52   41
Arg  AGA      33    0    0    0   50    0    0    0    0    0    0   12
     AGG       0   25    0    0    0    0    0    0    0   14    0    0
     CGA       0    0   25   14    0   17    0    0    0    0    0    0
     CGC      67   25   25   29    0   33  100   80   75   43   17   25
     CGG       0    0    0    0   50    0    0    0    0    0    0    0
     CGT       0   50   50   57    0   50    0   20   25   43   83   62
Asn  AAC       0   75    0   60   14  100    0   50   67   65   40   92
     AAT     100   25  100   40   86    0  100   50   33   35   60    8
Asp  GAC      67    0    0    0    0    0   33   67   20   25    0   33
     GAT      33  100    0  100  100  100   67   33   80   75  100   67
Cys  TGC       0  100  100    0    0  100   67  100    0    0  100   50
     TGT     100    0    0  100  100    0   33    0  100    0    0   50
Gln  CAA      90    0   57   33   29    0   67   33   67   13   72    0
     CAG      10  100   43   67   71  100   33   67   33   87   28  100
Glu  GAA     100    0  100    0   50    0  100    0   88    8   60   11
     GAG       0  100    0  100   50  100    0  100   12   92   40   89
Gly  GGA       9   39   20   29    3   17   25   43    7   29    9   31
     GGC      15   25   30   38   20   48   38   29   41   39   32   31
     GGG       0    0    0    0    0    0    0    0    0    0    2    0
     GGT      76   36   50   33   77   35   38   29   52   32   57   37
His  CAC       0   50    0  100    0    0    0  100    0   33   22   60
     CAT     100   50  100    0  100    0    0    0  100   67   78   40
Ile  ATA       0    0    0    0   20    0   12    0    6    0    0    0
     ATC      30   75   33   50   40  100   62  100   62   73   69   70
     ATT      70   25   67   50   40    0   25    0   31   27   31   30
Lys  AAA       0    0    0    0   75    0   56   14   40    0   50    0
     AAG       0  100  100  100   25  100   44   86   60  100   50  100
Leu  TTA      22    0   44    0    0    0    0    0   11    0   14    0
     TTG      44    0   22   20   31    0   80    0   42   15   43    7
     CTA      11    0   11    0    8    0    0   13    5    0   14    0
     CTC       0   50   11    0   23   17   20   13   32   25    7   36
     CTG       0   50    0   60    0   67    0   60    5   55    7   50
     CTT      22    0   11   20   38   17    0   13    5    5   14    7
Met  ATG     100  100  100  100  100  100  100  100  100  100  100  100
Phe  TTC      50  100   50  100   67  100   75  100   75   67   33   50
     TTT      50    0   50    0   33    0   25    0   25   33   67   50
Pro  CCA      47   20   55   29   50   15   80   50   47   29   52   26
     CCC       9   60   10   57   20   77   10   25   17   48   26   56
     CCG       2    0    0    0    4    8    0   12    3   19    4   15
     CCT      42   20   35   14   26    0   10   12   33    3   17    4
Ser  TCA      22    6   32    0   17    6    7    0   18    5    9    8
     TCC       6   33   16   38   12   12   21   50   14   29    4   33
     TCG      17   11    4   12    5    6   21    0   11   14   26   17
     TCT      15   17   16    0   28   12   14   25   36   14    4    0
     AGC      11   33   24   38   31   62    7   25   14   33   43   38
     AGT      30    0    8   12    8    0   29    0    7    5   13    4
End  TAA     100  100  100  100    0  100  100  100  100  100  100    0
     TGA       0    0    0    0    0    0    0    0    0    0    0    0
     TAG       0    0    0    0  100    0    0    0    0    0    0  100
Thr  ACA       0    0   33    0   14    0   60    0   43    0   67    0
     ACC      80  100   17  100   36   33   20  100   14   40   17   86
     ACG       0    0   33    0    0    0    0    0   14   20    0   14
     ACT      20    0   17    0   50   67   20    0   29   40   17    0
Tyr  TAC      59   82   33   88   45   79   33   90   52   80   47   62
     TAT      41   18   67   12   55   21   67   10   48   20   53   38
Val  GTA      40    0   17    0   25    0   11    0   22    0   15    0
     GTC      10   31    8   43   25   50   22   33   17   26   15   37
     GTG      10   50   50   29   25   40   22   44   30   65   25   47
     GTT      40   19   25   29   25   10   44   22   30    9   45   16
Trp  TGG       0  100  100    0    0  100  100  100    0  100    0  100
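The percentages in Table 2 express how often each codon is used within its own synonymous family, so the values for one amino acid in one gene sum to about 100. A minimal Python sketch of how such values could be derived from a coding sequence is shown below. The function name `codon_usage_percent` and the toy sequence are assumptions for illustration; the standard genetic code is used, and the sketch is not the program used to build Table 2.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_usage_percent(cds):
    """Percent use of each observed codon within its synonymous family."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    family_totals = defaultdict(int)
    for codon, n in counts.items():
        family_totals[CODON_TABLE[codon]] += n
    return {
        codon: round(100 * n / family_totals[CODON_TABLE[codon]])
        for codon, n in counts.items()
    }

# Toy coding sequence (invented): Met, Ala x4, Leu x2, stop.
toy_cds = "ATGGCTGCTGCCGCATTATTGTAA"
usage = codon_usage_percent(toy_cds)
for codon in sorted(usage):
    print(codon, CODON_TABLE[codon], usage[codon])
```

Codons that never occur in the sequence are simply absent from the result, which corresponds to the 0 entries of Table 2; a family with no codons at all (for example Lys in Ccs18) yields no entries.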
complements earlier studies of the evolution of the locus in distantly related Drosophila species (Martinez-Cruzado et al., 1988; Fenerjian et al., 1989; Martinez-Cruzado, 1990; Swimmer et al., 1990). Comparison of the organization, structure, and function of the locus in a member of a different Dipteran family can test previously reached conclusions regarding the inhomogeneous evolution of the intragenic and extragenic sequences and also detect novel events in the evolution of the locus. In this work we present a total of approximately 15 × 10^3 base pairs of DNA, encompassing all the 6th chromosome chorion genes and their flanking sequences in the medfly, and we compare the corresponding sequences in all species studied. These comparisons demonstrate how chorion genes and their flanking elements have diverged during the course of evolution. We have compared the 5′ flanking DNA, the 5′ untranslated sequence and the intron of each chorion gene in D. melanogaster, D. subobscura, D. virilis, D. grimshawi and C. capitata. Despite its organizational conservation, the cluster shows more extensive diversification in C. capitata than it does in Drosophila species, as might be expected from the long evolutionary distance among these species. One measure of diversification is the complete randomization of the intronic sequences. Even the coding DNA sequences are more divergent in C. capitata, maintaining limited regions of homology. The islands of strong sequence conservation found in the 5′ flanking ends of chorion genes in Drosophila species have changed in Ceratitis, but they can still be aligned easily and in the same order along the sequences. In all species, the chorion genes are preceded by the chorion-specific hexamer, TCACGT, like all chorion genes examined to date. This hexamer is essential for chorion gene expression as shown by mutagenesis and transformation analysis of genes s15 (Mariani et al., 1996) and s18 (Swimmer et al., 1992).
Interestingly, the elements that encompass this hexamer are very similar in all species and specific for each gene. Detailed mutational analysis of the Drosophila s15 gene promoter by transformation experiments suggested that the DNA between −189 and −39 contains many positive and negative cis-regulatory elements, which are involved in specifying the highly precise expression pattern of s15 during development (Mariani et al., 1996). Specifically, the regions −189 to −149 and −110 to −69 appear to be important for both early repression and late activation of the gene, and their functions overlap so that they both have to be missing for the temporal pattern to change. The first region is highly conserved in all Drosophila species but the second is more divergent. The corresponding regions in C. capitata are from −380 to −197 and −126 to −93, respectively. Both regions have undergone major changes. It can be suggested that these changes might be responsible for the temporal alteration of s15 expression (Vlachou et al., 1997). We are currently investigating this possibility by transformation analysis of the Ccs15 promoter in C. capitata and D. melanogaster.
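Promoter coordinates such as −189 to −149 are counted from the transcription start site. A small Python sketch of how occurrences of the TCACGT hexamer could be located and reported in that coordinate system is given below. The convention assumed here (the start-site base is +1, the base immediately upstream is −1, and there is no position 0), the function name `motif_positions` and the toy sequence are illustrative assumptions, not taken from the paper.

```python
import re

def motif_positions(sequence, tss_index, motif="TCACGT"):
    """Locate a motif and report (start, end) relative to the transcription start site.

    `tss_index` is the 0-based index of the +1 base in `sequence`.
    Coordinates skip 0: ..., -2, -1, +1, +2, ...
    """
    def to_coordinate(index):
        offset = index - tss_index
        return offset + 1 if offset >= 0 else offset

    hits = []
    # A lookahead pattern also reports overlapping occurrences.
    for match in re.finditer(f"(?={re.escape(motif)})", sequence.upper()):
        first = match.start()
        last = first + len(motif) - 1
        hits.append((to_coordinate(first), to_coordinate(last)))
    return hits

# Toy sequence (invented); the start-site base is the A at index 19.
toy_promoter = "GGTCACGTAAAATATACCCATCAGT"
print(motif_positions(toy_promoter, tss_index=19))
```

With this convention a hexamer spans six consecutive coordinates, in the same way that the second TCACGT of Ccs18 is quoted as occupying −430 to −425.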
For s18, the 5′ flanking sequence conservation is both stronger and extends further upstream than in the case of the other three genes. Specifically, an A/T-rich 80-bp segment found in the distal region of the s18 promoter is also considerably conserved in C. capitata, with 65% identity. The homologous segment in Drosophila species is localized within the ACE3 element that was found to be involved in the control of amplification (Delidakis and Kafatos, 1987; Orr-Weaver et al., 1989). In D. melanogaster, ORC (origin recognition complex), a protein complex that plays an essential role in DNA replication, binds specifically to the ACE3 element. It has been proposed that ACE3, possibly through its A/T-rich element, nucleates DmORC-DNA binding, and consequently leads to DmORC interacting with adjacent binding sites in the chorion locus (Austin et al., 1999). Remarkable sequence conservation was also found in the region between the genes s16 and paramyosin, a strongly conserved gene located downstream of the chorion locus in all five species (Vlachou et al., 1997). Transformation experiments in D. melanogaster have suggested that the homologous region may contain amplification enhancing elements (AEEs) that are accessory to ACE3 (Delidakis and Kafatos, 1989). The most impressive conserved feature of the chorion locus in both Drosophila and Ceratitis is the presence of the same genes, four chorion genes and paramyosin, in the same order and orientation. This organization within Diptera may be related to the ancient invention of chorion gene amplification, a developmental mechanism that does not occur in other insects (Kafatos et al., 1987). The presence of the essential amplification control element (ACE) upstream of s18 in all species and of a possible accessory amplification enhancing element (AEE) flanking the s16 gene supports this hypothesis.
The medfly 6th chromosome chorion locus has been elongated, not only because of the longer chorion genes but also because of the longer intergenic DNA. Thus, in C. capitata the chorion genes are contained within approximately 14.5 kb from the beginning of Ccs18 to the end of Ccs16, compared to 5.5–6.3 kb in Drosophila species (Martinez-Cruzado et al., 1988). Approximately 3.5 kb of the 'extra' sequences are due to the elongation of the chorion genes and the transposable elements, and 4.5 kb are due to numerous short and long insertions in the intergenic DNA. In addition, paramyosin is longer in the medfly, showing three introns in a region where the Drosophila homologue has only one. If the compact nature of the Drosophila genome is a secondary feature, this multigene locus indicates a process of compaction in Drosophila, eliminating dispensable sequences in the intergenic regions as well as in introns. The increased length of the medfly genes is correlated with the existence of a novel segment consisting of imperfect tandem repeats of the heptapeptide SYSAPAP associated with extensions consisting of alanine and proline (Vlachou et al., 1997). There are 13 such repeats in
Ccs19, 14 in Ccs18 and only one in Ccs15. This peptide does not exist in Ccs16. A related octapeptide repeat, Y (G or S) AAPAAS, is found in the C-terminal of the Ccs36 sequence (Aggeli et al., 1991). Because of the absence of these peptides from Drosophila, we suggest that the SYSAPAP repeats and the AP segments evolved de novo during the period separating the families of Drosophila and Ceratitis from their common ancestor, although we cannot exclude the possibility that they were lost selectively from Drosophila. Codon usage bias, the preferential use of particular codons within each codon family, is characteristic of synonymous base composition in numerous species including Drosophila, yeast and many bacteria. Preferential usage of particular codons in these species is maintained by natural selection, largely at the level of translation (Sharp et al., 1988; Akashi, 1994). Many genes analyzed in various species showed a positive correlation between codon usage bias and gene expression levels and thus provide direct evidence for selection on silent sites in these species. Moreover, for the same expression pattern, the selective pressure on codon usage appears to be lower in genes encoding long rather than short proteins (Moriyama and Powell, 1998; Duret and Mouchiroud, 1999). Highly expressed genes in Drosophila do tend to be highly biased in codon usage. Larval serum proteins, larval and adult cuticle proteins, yolk proteins, actins, alcohol dehydrogenase, superoxide dismutase, lysozymes, amylases, α- and β-tubulins and chorion are included among approximately 10% of the most highly biased genes (Moriyama and Powell, 1998). Comparison of codon usage in D. melanogaster and C. capitata showed that the medfly chorion genes present a slightly reversed bias and that the preferred codons are A- or T-ending. Similar results have been found for the medfly vitellogenin genes (Rina and Savakis, 1991).
There are two possible explanations for the observed difference between the two species. First, the selection of synonymous codons may be weaker for the medfly chorion genes than for Drosophila, possibly due to lower rates of expression, resulting from the longer Ceratitis life cycle (double in length) with a comparable egg output. Additionally, the medfly chorion proteins are longer than their Drosophila homologues and, as has been reported, the frequency of optimal codons decreases with the length of the encoded protein (Moriyama and Powell, 1998; Duret and Mouchiroud, 1999). Alternatively, selection of synonymous codons may not be operating in the medfly if its effective population size is small: a mutation that is advantageous in a species with large effective population size may be neutral in a small population, where random drift overcomes selection (Shields et al., 1988). Comparison of the codon usage tables for Drosophila and Ceratitis available on the web site http://www.kazusa.or.jp/codon/ (Nakamura et al., 2000) showed that, indeed, medfly genes present a rather low codon bias, favouring the latter explanation.
Acknowledgements
We thank Fotis Kafatos for the critical reading of the manuscript. This work was supported by the University of Athens and by the General Secretariat of Research and Technology of the Greek Ministry of Development.
References
Aggeli, A., Hamodrakas, S.J., Komitopoulou, K., Konsolaki, M., 1991. Tandemly repeating peptide motifs and their secondary structure in Ceratitis capitata eggshell proteins Ccs36 and Ccs38. Int. J. Biol. Macromol. 5, 307–315.
Akashi, H., 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136, 927–935.
Austin, R.J., Orr-Weaver, T.L., Bell, S.P., 1999. Drosophila ORC specifically binds to ACE3, an origin of DNA replication control element. Genes Dev. 13, 2639–2649.
Delidakis, C., Kafatos, F.C., 1987. Amplification of a chorion gene cluster in Drosophila is subject to multiple cis-regulatory elements and to long-range position effects. J. Mol. Biol. 197, 11–26.
Delidakis, C., Kafatos, F.C., 1989. Amplification enhancers and replication origins in the autosomal chorion gene cluster of Drosophila. EMBO J. 8, 891–901.
Duret, L., Mouchiroud, D., 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96, 4482–4487.
Fenerjian, M.G., Martinez-Cruzado, J.C., Swimmer, C., King, D., Kafatos, F.C., 1989. Evolution of the autosomal chorion cluster in Drosophila II. Chorion gene expression and sequence comparisons of the s16 and s19 genes in evolutionarily distant species. J. Mol. Evol. 29, 108–125.
Griffin-Shea, R., Thireos, G., Kafatos, F.C., 1982. Organization of a cluster of four chorion genes in Drosophila and its relationship to developmental expression and amplification. Dev. Biol. 91, 325–336.
Handler, A.M., McCombs, S.D., Fraser, M.J., Saul, S.H., 1998. The lepidopteran transposon vector, piggyBac, mediates germ-line transformation in the Mediterranean fruit fly. Proc. Natl. Acad. Sci. USA 95, 7520–7525.
Kafatos, F.C., Mitsialis, S.A., Nguyen, H.T., Spoerel, N., Tsitilou, S.G., 1987. Evolution of structural genes and regulatory elements for the insect chorion. In: Raff, R., Raff, E.C., Liss, A.R. (Eds.), Development as an Evolutionary Process, Vol. 8. New York, pp. 161–178.
Konsolaki, M., Komitopoulou, K., Tolias, P.P., King, D.L., Swimmer, C., Kafatos, F.C., 1990. The chorion genes of the medfly, Ceratitis capitata, I. Structural and regulatory conservation. Nucleic Acids Res. 18, 1731–1737.
Levine, J., Spradling, A., 1985. DNA sequence of a 3.8 kilobase pair region controlling Drosophila chorion gene amplification. Chromosoma 92, 136–142.
Loukeris, T.G., Livadaras, I., Arca, B., Zabalou, S., Savakis, C., 1995. Gene transfer into the medfly, Ceratitis capitata, with a Drosophila hydei transposable element. Science 270, 2002–2005.
Mariani, B.D., Shea, M.J., Conboy, M.J., Conboy, I., King, D.L., Kafatos, F.C., 1996. Analysis of regulatory elements of the developmentally controlled chorion s15 promoter in transgenic Drosophila. Dev. Biol. 174, 115–124.
Martinez-Cruzado, J.C., Swimmer, C., Fenerjian, M.G., Kafatos, F.C., 1988. Evolution of the autosomal chorion locus in Drosophila I. General organization of the locus and sequence comparisons of genes s15 and s19 in evolutionary distant species. Genetics 119, 663–677.
Martinez-Cruzado, J.C., 1990. Evolution of the autosomal chorion cluster in Drosophila, IV. The Hawaiian Drosophila: rapid protein evolution and constancy in the rate of DNA divergence. J. Mol. Evol. 31, 402–423.
Moriyama, E.N., Powell, J.R., 1998. Gene length and codon usage bias in
Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 26, 3188–3193.
Nakamura, Y., Gojobori, T., Ikemura, T., 2000. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 28, 292.
Orr-Weaver, T.L., Johnston, C.G., Spradling, A.C., 1989. The role of ACE3 in Drosophila chorion gene amplification. EMBO J. 8, 4153–4162.
Pustell, J., Kafatos, F.C., 1984. A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis and homology determination. Nucleic Acids Res. 12, 643–655.
Rina, M., Savakis, C., 1991. A cluster of vitellogenin genes in the Mediterranean fruit fly Ceratitis capitata: sequence and structural conservation in dipteran yolk proteins and their genes. Genetics 127, 769–780.
Sharp, P.M., Cowe, E., Higgins, D.G., Shields, D.C., Wolfe, K.H., Wright, F., 1988. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16, 8207–8211.
Shields, D.C., Sharp, P.M., Higgins, D.G., Wright, F., 1988. 'Silent' sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5, 704–716.
Spradling, A.C., Digan, M.E., Mahowald, A.P., Scott, M., Craig, E.A., 1980. Two clusters of genes for major chorion proteins of Drosophila melanogaster. Cell 19, 905–914.
Swimmer, C., Fenerjian, M.G., Martinez-Cruzado, J.C., Kafatos, F.C., 1990. Evolution of the autosomal chorion cluster in Drosophila III. Comparison of the s18 gene in evolutionarily distant species and hetero-specific control of chorion gene amplification. J. Mol. Biol. 215, 225–235.
Swimmer, C., Kashevsky, H., Mao, G., Kafatos, F.C., 1992. Positive and negative DNA elements of the Drosophila grimshawi s18 chorion gene assayed in Drosophila melanogaster. Dev. Biol. 152, 103–112.
Tolias, P.P., Konsolaki, M., Komitopoulou, K., Kafatos, F.C., 1990. The chorion genes of the medfly Ceratitis capitata. II. Characterization of three novel cDNA clones obtained by differential screening of an ovarian library. Dev. Biol. 140, 105–112.
Vlachou, D., Konsolaki, M., Tolias, P.P., Kafatos, F.C., Komitopoulou, K., 1997. The autosomal chorion locus of the medfly Ceratitis capitata I. Conserved synteny, amplification and tissue specificity but sequence divergence and altered temporal regulation. Genetics 147, 1829–1842.
Wong, Y.-C., Pustell, J., Spoerel, N., Kafatos, F.C., 1985. Coding and potential regulatory sequences of a cluster of chorion genes in Drosophila melanogaster. Chromosoma 92, 124–135.
Zacharopoulou, A., Frisardi, M., Savakis, C., Robinson, A.S., Tolias, P., Konsolaki, M., Komitopoulou, K., Kafatos, F.C., 1992. The genome of the Mediterranean fruitfly Ceratitis capitata: localization of molecular markers by in situ hybridization to salivary gland polytene chromosomes. Chromosoma 101, 448–455.