J. Mol. Biol. (1990) 215, 225-235
Evolution of the Autosomal Chorion Cluster in Drosophila III†. Comparison of the s18 Gene in Evolutionarily Distant Species and Heterospecific Control of Chorion Gene Amplification

Candace Swimmer1, Maryanne G. Fenerjian1, Juan Carlos Martínez-Cruzado1‡ and Fotis C. Kafatos1,2§

1 Department of Cellular and Developmental Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, U.S.A.
2 Institute of Molecular Biology and Biotechnology, Research Center of Crete, PO Box 1527, Heraklion 711 10, Crete, Greece

(Received 26 February 1990; accepted 18 May 1990)

We present a total of 6.1 × 10³ base-pairs of DNA sequences, encompassing the s18 gene and flanking regions within the autosomal chorion cluster of three Drosophila species. Against a background of extensive divergence in the intron and even in parts of the coding region, islands of strong sequence conservation are evident. These are particularly notable in the 5' flanking DNA where they extend to approximately -600 base-pairs from the transcription start site. The most conserved segment of the entire chorion cluster is 71 base-pairs in the s18 5' flanking DNA, which in D. melanogaster is part of a region defined functionally as containing amplification control elements (ACE3 region). Transformation analysis, using chimeric transposons of D. melanogaster and D. grimshawi DNA, revealed that amplification control elements of D. grimshawi can support amplification in D. melanogaster. The functionally defined ACE3 region of D. grimshawi includes the conserved 71 base-pair segment, but also non-conserved sequences further upstream, which apparently enhance amplification.
1. Introduction
In Drosophila melanogaster the chorion (eggshell) proteins are encoded by two major gene clusters: s36, s37 and s38 located on the X, and s15, s16, s18 and s19 located on the third chromosome (Spradling, 1981; Griffin-Shea et al., 1982; Parks et al., 1986; Fenerjian et al., 1989). Each of these genes is expressed with a distinct temporal profile. Prior to expression, both gene clusters begin to replicate differentially, leading to amplification levels of approximately 20-fold on the X and 50 to 100-fold on the third chromosome (Spradling, 1981).

Amplification of the autosomal cluster in D. melanogaster has been shown to be controlled by several cis-acting regions. Upstream from the s18 gene lies the amplification control element (ACE3‖) region that is essential for high levels of amplification (Orr-Weaver & Spradling, 1986; Swimmer et al., 1989). In addition, four amplification-enhancing regions (AER a to d) have been mapped within this cluster (Delidakis & Kafatos, 1987, 1989). Deletions removing the AERs reduce amplification, although not as drastically as does removal of the ACE. A major origin for replication used during the amplification process has been mapped to within AER-d (Delidakis & Kafatos, 1989; Heck & Spradling, 1990).

We have been interested in the evolution of the autosomal chorion cluster within the genus Drosophila. Previously we have reported the isolation of this cluster from D. subobscura, D. virilis and D. grimshawi, and the sequence of the s15, s16 and s19 genes from these species (Martínez-Cruzado et al., 1988; Fenerjian et al., 1989). D. melanogaster is thought to have had its last common ancestor with D. subobscura 20 to 50 million years ago, and with D. virilis and D. grimshawi 50 to 80 million years ago (Throckmorton, 1975; Beverley & Wilson, 1984). All four species specifically amplify this chorion cluster in ovarian tissue (Martínez-Cruzado et al., 1988), and express the genes in the same characteristic order (Fenerjian et al., 1989). The genes show poor DNA conservation in the coding regions, but relatively large blocks of sequence similarity 5' to the transcriptional start site.

We present the sequence of the s18 gene from D. subobscura, D. virilis and D. grimshawi. In general, the spatial pattern of sequence conservation is similar to that encountered in the other three genes. However, the 5' flanking sequence conservation is substantially stronger, and extends farther upstream. We also report functional studies directed towards testing whether some of the conserved sequences play a role in amplification. A series of chimeric constructs were made, in which D. melanogaster sequences were partially replaced by D. grimshawi DNA, and the ability of these constructs to amplify in D. melanogaster was tested. We observed that some D. grimshawi sequences can substitute for the D. melanogaster ACE3 region and allow high-level amplification when introduced in D. melanogaster. The ACE3 region of both species includes a 71 bp DNA segment that is the most evolutionarily conservative portion of the chorion cluster. However, DNA farther upstream from the conserved region apparently contributes to the amplification response of the chimeric constructs.

† Paper II in this series is Fenerjian et al. (1989).
‡ Present address: Dept. de Biologia, Recinto Universidad Mayaguez, Universidad de Puerto Rico, Mayaguez, PR 00708, Puerto Rico.
§ Author to whom correspondence should be addressed at Dept. of Cellular and Developmental Biology, Harvard University, Cambridge MA 02138, U.S.A.
‖ Abbreviations used: ACE, amplification control element; AER, amplification-enhancing region; bp, base-pair(s); nt, nucleotide(s).
0022-2836/90/180225-11 $03.00/0 © 1990 Academic Press Limited
2. Materials and Methods
(a) Sequence analysis and alignments
Restriction fragments from D. subobscura and D. virilis were subcloned into phage M13 vectors, while restriction fragments of D. grimshawi were subcloned into pSDL12 (Levinson et al., 1984). Deletions of D. virilis were generated using the method of Dale et al. (1985); deletions of D. grimshawi were generated with DNase I (Laughton & Scott, 1984). The s18 gene of D. subobscura was first mapped extensively with restriction enzymes and specific subclones were constructed and sequenced. Sequencing was done using the chain termination method of Sanger et al. (1977), with [35S]thio-ATP (Biggen et al., 1983) and either Sequenase (US Biochemicals; Tabor & Richardson, 1987) for D. subobscura and part of the D. grimshawi sequence, or Klenow enzyme for D. virilis and D. grimshawi. Gradient gels (Biggen et al., 1983) of approximately 38 cm were used. Sequences were analyzed using the computer programs of Pustell & Kafatos (1982, 1984) or Staden (1982, 1984). The criteria used to generate sequence alignments were the same as described by Martínez-Cruzado et al. (1988).

(b) Plasmid construction
Chimeric constructs (see Fig. 6) included the entire D. melanogaster chorion cluster, from the SalI site upstream from s18 to the EcoRI site downstream from s16. For future expression studies, both the s18 and s15 genes were marked with a fragment of DNA from a moth chorion gene (as in Mariani et al., 1988). Deletions of the D. melanogaster DNA extended either from near the 5' end of the s18 gene to the XbaI site at -931 (Δ+7/-931) or between the BalI sites at -187 and -612 (Δ-187/-612). Within these gaps, segments of D. grimshawi DNA were inserted to construct the chimeric plasmids. These constructs were named "cmg" followed by the coordinates of the D. grimshawi DNA that they included.

For the cmg-5 series, we inserted into the Δ+7/-931 deletion D. grimshawi fragments extending from an artificial XbaI site (introduced during the generation of deletions with DNase; see above) to the HpaI site at +7; these replacement constructs were designated as beginning at -5, because the -4 to +7 sequence of D. grimshawi is the same as in D. melanogaster. These constructs were built in a series of steps. The D. melanogaster sequence was cut at the ClaI site at +18 and ligated to D. grimshawi DNA at its HpaI site through an adaptor, which corresponded to the D. melanogaster sequence from +8 to +17. The XbaI end of the D. grimshawi sequence was ligated to the D. melanogaster XbaI site at -931 of s18. These constructs were confirmed by sequence analysis.

For the cmg-189/-747 and cmg-311/-747 constructs the D. grimshawi DNA was cut at the SspI site at -747 and at an artificial XbaI site (also introduced during generation of the DNase I deletions as above) at either -189 or -311. The ends were repaired and EcoRI linkers were added. These fragments were inserted into Δ-187/-612 at an EcoRI site added across the deletion junction. The constructs were confirmed by extensive restriction analysis.

(c) Fly transformation
Injection of D. melanogaster embryos and the selection and characterization of the transformed lines were done as described (Delidakis & Kafatos, 1987). Restriction analysis confirmed that each line had the expected, unrearranged construct.
(d) DNA analysis
Amplification of the transposons was assayed in a mixed population of stage 13/14 follicle DNA from the D. melanogaster transformants. Male DNA served as the single copy control. Amplification blots were performed and quantified as described by Delidakis & Kafatos (1987). 32P-labeled probes were prepared by nick translation (Rigby et al., 1977) of gel purified fragments. Probes designated as ch R and ry R were as indicated in Fig. 6. Amplification levels are expressed in terms of the amplification of the transformant rosy (ry) band in follicles, as a percentage relative to amplification of the endogenous chorion (ch) band. Due to the variation in the amplification levels of the endogenous chorion band in different samples (24 to 180-fold), both amplifying and non-amplifying values can fall in the 2% to 6% range (see Fig. 6). Statistical analysis used the Wilcoxon 2-sample test (Sokal & Rohlf, 1981). Staging of the D. grimshawi follicles was done as described by Martínez-Cruzado (1988). DNA was
[Figure 1 maps: D. subobscura, D. grimshawi, D. virilis, D. melanogaster]

Figure 1. Organization of the autosomal chorion cluster in 4 Drosophila species. The DNA sequences that are reported in Fig. 2 and those that are used for Fig. 4 are shown by hatched bars; other previously reported sequences are indicated by filled bars. Locations of the 4 chorion genes are indicated by arrows; each gene is interrupted by a single intron. A downstream gene (NC-ORF) is also indicated. Chorion-specific hexamer elements (TCACGT) are shown: (•) in the sense strand and (□) in the anti-sense strand. Maps are aligned at the 5' end of the s18 gene, distances are indicated by dots at 10³ bp intervals, and restriction enzyme sites are abbreviated as follows: B, BamHI; BI, BglI; BII, BglII; E, EcoRI; H, HindIII; P, PstI; S, SalI; Xb, XbaI; Xh, XhoI.
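The hexamer annotation described in this caption can be illustrated computationally. The following is a minimal sketch and not the procedure used in this work: it scans a sequence for TCACGT on the strand given and for its reverse complement (ACGTGA), which marks an occurrence on the opposite strand. The function names and the example sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif_both_strands(seq, motif="TCACGT"):
    """List (0-based position, strand) for each motif occurrence.

    A hit labelled "antisense" means the motif reads 5' to 3' on the
    strand complementary to the one supplied.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for pos in range(len(seq) - len(motif) + 1):
        window = seq[pos:pos + len(motif)]
        if window == motif:
            hits.append((pos, "sense"))
        elif window == rc_motif:
            hits.append((pos, "antisense"))
    return hits


# Invented test sequence with one hexamer on each strand.
toy_sequence = "GG" + "TCACGT" + "AA" + "ACGTGA" + "CC"
print(find_motif_both_strands(toy_sequence))
# prints [(2, 'sense'), (10, 'antisense')]
```

Positions are 0-based offsets into the supplied string; converting them to coordinates relative to the transcription start site would require the position of +1 in that string.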
prepared and analyzed as above. Labeled fragments were prepared by nick translation of purified restriction fragments.

3. Results

(a) Organization of the chorion cluster
The organization of the autosomal chorion gene cluster in D. melanogaster, D. subobscura, D. virilis and D. grimshawi is summarized in Figure 1. The genes have maintained the same order, transcriptional orientation and exon/intron structure throughout the evolution of the genus. Also indicated are the sequences that have been reported elsewhere (Wong et al., 1985; Levine & Spradling, 1985; Martínez-Cruzado et al., 1988; Fenerjian et al., 1989), as well as those presented here. The new sequences total 6.1 × 10³ bp spanning the s18 gene and flanking sequences in D. subobscura, D. virilis and D. grimshawi, and are shown in Figure 2. The start of transcription, the TATA box, the TCACGT motif characteristic of chorion genes and the polyadenylation signal are indicated, and the conceptual translation is shown below the DNA sequence. As in D. melanogaster (Wong et al., 1985; Levine & Spradling, 1985), the s18 gene is compact (684 to 709 bp from the transcriptional initiation site to the polyadenylation signal). It consists of a small first exon, a short intron and a major second exon that includes all but five codons of the translated region. The sequences shown in Figure 2 are contiguous to those presented previously (see Fig. 1; Martínez-Cruzado et al., 1988).

(b) Interspecies comparisons
The computer programs of Pustell & Kafatos (1982, 1984) were used to compare the s18 and flanking sequences in different species. Two representative matrix comparisons are shown in Figure 3, where the D. grimshawi s18 gene is compared to both a distantly and a more closely related species (D. melanogaster and D. virilis, respectively). Although there is greater conservation between the closely related species, both comparisons reveal extensively conserved regions upstream from the start site of transcription as far as approximately -600 bp. Both the degree of conservation and its extent upstream from s18 are substantially greater than for the other three genes of the cluster (Martínez-Cruzado et al., 1988; Fenerjian et al., 1989). Indeed, for s18 the upstream DNA regions are more conserved than the exons (although the proteins are conserved at the amino acid level).

A more detailed alignment of the DNA sequences of all four species is shown in Figure 4. The top panel of that Figure diagrams the most extensively conserved elements of the 5' flanking DNA: identical blocks of nine or more nucleotides for four-way matches, ten or more for three-way matches and 12 or more for two-way matches are shown, irrespective of the order in which they appear. It is notable that these blocks almost invariably appear in the same order in all species. Thus, despite some deletions/insertions, comparable regions of the 5' flanking DNA can be discerned easily. For example, the D. grimshawi DNA between -1120 and -1500 apparently corresponds to the D. melanogaster DNA between -850 and -1220.

The bottom part of Figure 4 shows a detailed alignment of sequences from all four species. Despite numerous small insertions/deletions and locally extensive diversification, the first exon and much of the 5' flanking DNA up to approximately -500 in D. melanogaster (-600 in D. grimshawi) can be aligned easily. In contrast, the intron sequences are almost totally randomized. The 5' flanking DNA has 22 species-invariant sequences of ≥5 bp, many of them clustered in three conservative regions, from -1 to -79, -126 to -162, and -186 to -250 in D. melanogaster (this species will be used as reference hereafter, unless otherwise indicated). A fourth and by far the most highly conserved region is the 71 nt segment between
[Figure 2: annotated nucleotide sequences of the s18 region with conceptual translations; panels: D. virilis, D. subobscura, D. grimshawi.]
[Figure 3: dot-matrix plots comparing D. grimshawi with D. virilis and with D. melanogaster; axes are numbered from -840 to +840 in steps of 280.]

Figure 3. Matrix analysis of interspecies s18 sequence conservation. Pictured are comparisons in 2 closely related species (D. grimshawi and D. virilis) and 2 distantly related species (D. grimshawi and D. melanogaster). Letters indicate sequence matches, with A corresponding to identity in a 19 bp window, and each subsequent letter representing a 2% lower conservation score; L represents a score of 78, the lowest shown here. Direction of transcription is from upper left to lower right. Exons are boxed, and the untranslated terminal regions are separated from the coding portions with broken lines. Note the extensive conservation in the first 600 nt of 5' flanking DNA. Sequences are numbered relative to the transcription initiation site, and the 2 matrices are separated from one another by a continuous line. Matrix settings are: range = 9, scale = 0.95, hash levels = 1, jump interval = 1, minimum value plotted = 78.
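The letter code described in this caption can be approximated with a short sketch. It is a simplification and not the Pustell & Kafatos program: identity is counted without weighting inside a window of 2 × range + 1 = 19 nt (the scale factor is ignored), and the binning of scores into letters (A = 100, each later letter covering a further 2 points, down to L = 78) is an assumption about how the stated scale is applied. All function names are hypothetical.

```python
import math


def window_identity(a, b, i, j, half_range=9):
    """Percent identity of the windows centred on a[i] and b[j].

    The window spans half_range positions on each side of the centre,
    i.e. 19 nt for half_range = 9. No distance weighting is applied.
    """
    width = 2 * half_range + 1
    matches = sum(
        1 for k in range(-half_range, half_range + 1) if a[i + k] == b[j + k]
    )
    return 100.0 * matches / width


def score_letter(score, minimum=78.0):
    """Map a score to a letter: A = 100, B >= 98, ..., L >= 78; None if below minimum."""
    if score < minimum:
        return None
    steps = math.ceil((100.0 - score) / 2.0)
    return chr(ord("A") + steps)


def dot_matrix(a, b, half_range=9, minimum=78.0):
    """Return {(i, j): letter} for every window pair scoring at least `minimum`."""
    hits = {}
    for i in range(half_range, len(a) - half_range):
        for j in range(half_range, len(b) - half_range):
            letter = score_letter(window_identity(a, b, i, j, half_range), minimum)
            if letter is not None:
                hits[(i, j)] = letter
    return hits


# Invented 19 nt sequences: identical, and differing at one position.
seq_a = "ACGTTGCAAGCTTAGGCTA"
seq_b = "CCGTTGCAAGCTTAGGCTA"
print(score_letter(window_identity(seq_a, seq_a, 9, 9)))  # prints A
print(score_letter(window_identity(seq_a, seq_b, 9, 9)))  # prints D
```

With unweighted counting over 19 positions only a few scores are possible (19/19, 18/19, and so on), so most letters never appear; a weighted score such as the one the scale setting implies would populate the intermediate letters.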
-412 and -482: it is invariable in length and includes the three longest perfectly conserved blocks of the 5' flanking DNA (16, 24 and 25 nt; at least 90% overall identity in all four species). Indeed, this segment is the most extensively conserved DNA in the entire autosomal chorion gene cluster, including the coding regions. It is notable that this segment is in the middle of the predominant amplification control element, ACE3 (Orr-Weaver & Spradling, 1986), which was most recently mapped between -310 and -630 (Orr-Weaver et al., 1989; see Fig. 4, top).
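The notions of species-invariant blocks and overall identity used above can be made concrete with a minimal sketch. It assumes the sequences have already been aligned to equal length with "-" marking gaps; it does not reproduce the alignment criteria of Martínez-Cruzado et al. (1988). The function names and the toy alignment are hypothetical.

```python
def _invariant_column(aligned, col):
    """True if every sequence has the same non-gap base at this column."""
    base = aligned[0][col]
    return base != "-" and all(seq[col] == base for seq in aligned)


def invariant_blocks(aligned, min_len=5):
    """Return (start, length) for each run of invariant columns of at least min_len."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    start = None
    for col in range(length + 1):
        invariant = col < length and _invariant_column(aligned, col)
        if invariant and start is None:
            start = col
        elif not invariant and start is not None:
            if col - start >= min_len:
                blocks.append((start, col - start))
            start = None
    return blocks


def percent_invariant(aligned):
    """Percentage of alignment columns that are identical in all sequences."""
    length = len(aligned[0])
    same = sum(1 for col in range(length) if _invariant_column(aligned, col))
    return 100.0 * same / length


# Invented four-way alignment; column 6 differs and carries one gap.
toy_alignment = [
    "ACGTACGTAA",
    "ACGTACCTAA",
    "ACGTACGTAA",
    "ACGTAC-TAA",
]
print(invariant_blocks(toy_alignment, min_len=5))  # prints [(0, 6)]
print(percent_invariant(toy_alignment))            # prints 90.0
```

Passing two or three sequences instead of four gives the two-way and three-way block counts, for which the text uses longer minimum lengths.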
(c) Chorion gene amplification is comparable in D. melanogaster and D. grimshawi
We have shown previously (Martínez-Cruzado et al., 1988) that the chorion gene cluster undergoes amplification in the ovary of all Drosophila species examined (Spradling & Mahowald, 1980). To
Figure 2. Sequences of the chromosomal regions encompassing gene s18 in 3 Drosophila species. The sense strand is shown and the encoded polypeptides are conceptually translated. Arrows mark putative transcriptional start sites, determined by homology with the known start site of the s18 gene in D. melanogaster (Levine & Spradling, 1985). The TATA boxes, polyadenylation signals (AATAAA) and chorion-specific hexamer (TCACGT) are highlighted by lines above and below.
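Conceptual translation of a gene with a single intron, as described in this caption, amounts to joining the exons and reading codons with the standard genetic code. The sketch below is illustrative only. The first five codons follow the translation printed for the s18 first exon (Met-Met-Lys-Phe-Met); the intron, the short second-exon fragment and all function names are invented for the example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def splice(genomic, exons):
    """Join exon segments given as 0-based, end-exclusive (start, end) pairs."""
    return "".join(genomic[start:end] for start, end in exons)


def translate(coding):
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(coding) - 2, 3):
        amino_acid = CODON_TABLE[coding[pos:pos + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


exon1 = "ATGATGAAGTTCATG"          # Met Met Lys Phe Met
intron = "GTAAGTTTTTCAG"           # invented intron
exon2 = "TGCATTTTCATCTGCGCCTAA"    # invented fragment ending in a stop codon
genomic = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(genomic))]
print(translate(splice(genomic, exons)))  # prints MMKFMCIFICA
```

The one-letter output can be expanded to the three-letter residue names used in the figure with a simple lookup table.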
C. Swimmer et
230 .,; . . . . . . . . . . . . ,
,
,o. . . . . , . . . . ., . . . . .,
,oo
~1.
-.,oo
-,oo
-SOO
400
-400
-=oo
-zoo
-too
H F
DmACE3 SMI .~"
O. m e / o n o ~
-931 n *1
n
n
- ¢31i
.
r*
+Z/ZT//
/// IL 2,I ,
I
D8 ACE3 -499 -494 Om -499 Os -702 Ov -583 09 -594
•
-482
~TTTGCGCTG~
.
A'~'CGA~T ~ ' ~ .
.
.
.
.
.
.
.
.
.
.
.
. ~TATA(~I : ~ , ~ A. :. . . T. , . i ~ ] .
-557
-427
-412
~
I~:~*~
#~il.
~cI~.~'~
:~ '
~.'
-502
-533
- 399 - 394
i ~ ~ . . ~ : ~ X c ...............
-487
"~ATA~
Tc,~e,~4~:0~
............... :......................................
i
• T
- 361 - 357
........
.
. . .
-504
[Figure 4: block diagram and sequence alignment of the s18 5' flanking DNA, first exon (coding region I) and intron for Dm, Ds, Dv and Dg; TATA box marked.]
compare the amplification process in more detail, we blot hybridized DNAs from staged D. grimshawi follicles and adult males, using as probes both a fragment of the D. grimshawi chorion cluster encompassing the s15 gene and a control DNA fragment known to be single copy (Martínez-Cruzado et al., 1988; Fig. 5). The ratio of intensities of the chorion-specific and single copy genomic bands in follicular DNAs, compared to the corresponding ratio in male DNA, indicated the time-course of amplification. As shown in Figure 5, chorion amplification begins by stage 10 and accelerates in later stages (11 to 14), reaching a maximum of about 140-fold. Thus, both the time-course and the final extent of amplification are comparable in D. grimshawi and D. melanogaster (Spradling, 1981; Orr et al., 1984; Spradling & Leys, 1988).

Figure 5. The time-course of chorion gene amplification in D. grimshawi. DNA samples were isolated from adult males (♂) or ovarian follicles of the indicated stages (1 to 14), digested with HindIII, subjected to electrophoresis and blot hybridized with mixed homospecific probes, including a single copy, unamplified control sequence (sc) and a genomic clone fragment of the s15 chorion gene (ch). Amplification begins by stage 10A and accelerates in choriogenic stage follicles, eventually reaching a level of 140-fold.

(d) Interspecies assays of amplification control elements
Considering the comparable features of amplification in different Drosophila species and the extraordinary sequence conservation in the middle of the D. melanogaster ACE3 region, we undertook transformation assays to localize amplification control elements of D. grimshawi that might function to support amplification in D. melanogaster. Figure 6 shows the design and summarizes the results of these experiments, as well as presenting typical data. As previously reported, the S1R3 transposon bearing the entire 10 × 10³ bp autosomal chorion cluster of D. melanogaster shows extensive follicular amplification when integrated in most chromosomal sites (Delidakis & Kafatos, 1987). The top line in Figure 6 shows a diagram of this control transposon, which also bears the P-element ends and the ry+ gene as a transformation marker. For this and all other transposons, amplification could be monitored in Southern blots using as probe both the 3' end of the ry+ gene and the downstream end of the chorion cluster (ry a and ch a probes, respectively; Fig. 6). With an appropriate choice of restriction enzyme digests, a transposon-specific ry t band (containing both ry+ and chorion locus segments) could be resolved from the endogenous rosy band, ry e, as exemplified by the autoradiograms of Figure 6. Similarly, the endogenous chorion band, ch e, was distinguishable from a transformant-specific band (dots in Fig. 6), which encompassed both chorion and insertion site DNA, and thus was distinct even for independent transformant lines bearing the same transposon. In our standard procedure (see Materials and Methods), amplification in follicular DNAs was evaluated relative to male DNA samples from the same transformant lines. Occurrence of amplification in the transposon was first evaluated for ry t, with ry e serving as an internal single copy control.
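The band-ratio quantification described in this section can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that band intensities have already been measured (the measurement method is not stated here), the function and variable names are hypothetical, and the intensity values are invented for illustration only. The percentage measure follows the definition given in the Figure 6 legend (follicle/male ratio of ry t divided by follicle/male ratio of ch e).

```python
def fold_amplification(band_f, control_f, band_m, control_m):
    """Fold amplification of a band in follicle (f) DNA relative to male (m) DNA.

    Each band is first normalized to a single-copy control band from the
    same lane (for example ry t normalized to ry e).
    """
    return (band_f / control_f) / (band_m / control_m)


def percent_of_endogenous(ryt_f, ryt_m, che_f, che_m):
    """Transposon amplification as a percentage of endogenous chorion amplification.

    Follows the Figure 6 legend: (f/male ratio of ry t) / (f/male ratio of ch e).
    """
    return 100.0 * (ryt_f / ryt_m) / (che_f / che_m)


# Hypothetical band intensities in arbitrary units (illustration only).
bands = {
    "ry_t": {"f": 600.0, "m": 20.0},   # transposon-specific rosy band
    "ry_e": {"f": 40.0, "m": 40.0},    # endogenous rosy band (single copy)
    "ch_e": {"f": 2400.0, "m": 40.0},  # endogenous chorion band
}

fold = fold_amplification(
    bands["ry_t"]["f"], bands["ry_e"]["f"], bands["ry_t"]["m"], bands["ry_e"]["m"]
)
percent = percent_of_endogenous(
    bands["ry_t"]["f"], bands["ry_t"]["m"], bands["ch_e"]["f"], bands["ch_e"]["m"]
)
print(f"fold amplification of ry t: {fold:.1f}")
print(f"percent of endogenous amplification: {percent:.1f}")
```

With these invented numbers the transposon band is amplified 30-fold, which corresponds to 50% of the 60-fold endogenous amplification.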
Amplification of ry t was also expressed as percentage of endogenous (ch e) amplification; this is the preferred method for quantitative comparisons, since endogenous amplification levels can vary widely. As reported (Delidakis & Kafatos, 1987), and as shown by the histograms in Figure 6, even
Figure 4. Sequence conservation and divergence upstream and at the beginning of the s18 gene. In the top panel, blocks that are invariant in all 4 species (≥9 nt) or 3 species (≥10 nt) are shown with filled boxes; open boxes represent blocks that are identical in 2 species (≥12 nt). Arrows with numbers indicate the DNA segments included in the chimeric constructs of Fig. 6. The functionally defined region of amplification control elements (ACE3) in 2 species is also indicated. The bottom panel shows a detailed sequence alignment, established using criteria that are detailed elsewhere (Martinez-Cruzado et al., 1988). All nucleotides that match in 3 or 4 species are shaded, and invariant elements ≥5 nt in length are boxed. The boxed elements served as anchor points for aligning the sequences in between. Sequences are numbered from the mRNA start site (+1). Arrows with numbers indicate the D. grimshawi DNA segments that were included in the chimeric constructs of Fig. 6. Note the strong conservation of "islands" in the 5' flanking DNA and the 1st exon, which are interspersed with more extensively diverged regions. Note also the extreme divergence of the intron. Dm, D. melanogaster; Ds, D. subobscura; Dv, D. virilis; Dg, D. grimshawi.
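The block criterion used in the top panel of Figure 4 (runs of positions invariant in all species, above a minimum length) can be illustrated with a short sketch. It assumes a pre-computed gapped alignment and covers only the "invariant in all sequences" case, not the 3-species or 2-species blocks. The toy sequences below are invented and are not taken from the chorion cluster; the function name is hypothetical.

```python
def invariant_blocks(aligned, min_len):
    """Find runs of alignment columns identical in every sequence.

    Columns containing a gap ('-') never count as conserved. Returns a list of
    (start, end, sequence) with 0-based, end-exclusive column coordinates,
    keeping only runs of at least min_len columns.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    run_start = None
    # Iterate one column past the end so that a run reaching the end is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_len:
                blocks.append((run_start, col, aligned[0][run_start:col]))
            run_start = None
    return blocks


# Invented toy alignment of four sequences (illustration only).
toy_alignment = [
    "ACGTTCACGTAAGG-TTA",
    "ACCTTCACGTAAGGCTTA",
    "ATGTTCACGTAAGG-TAA",
    "ACGATCACGTAAGGATTA",
]

# Threshold of 9 nt, as used in Figure 4 for blocks invariant in all 4 species.
print(invariant_blocks(toy_alignment, 9))
```

For this toy input the only block passing the 9 nt threshold spans columns 4 to 14.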
[Figure 6 graphic. Top panel: diagrams of the S1R3 transposon (P element ends, ry+ marker, s18, ACE and AER-a to AER-d regions) and of constructs Δ-187/-612, cmg-189/-747, cmg-311/-747, Δ+7/-931, cmg-5/-97, cmg-5/-196, cmg-5/-285, cmg-5/-442, cmg-5/-642 and cmg-5/-904, with histograms of amplification (% of endogenous). Bottom panel: Southern blot autoradiograms with percentage amplification given below each lane.]
Figure 6. Structure and amplification of chorion transposons bearing exclusively D. melanogaster sequences, or chimeric combinations of sequences from D. melanogaster (thin lines) and D. grimshawi (thick lines). In the top panel, constructs are diagrammed on the left and their amplification properties are indicated by the histograms on the right. In the histograms, individual transformant lines are indicated by single blocks (filled if amplifying, open if not); amplification levels are shown numerically as percentage of the endogenous autosomal chorion cluster. The top entry is the wild-type, undeleted S1R3 construct. Note the ry+ gene, which is used as transformation marker, the P element ends, the ry x and ch a hybridization probes, and the previously defined D. melanogaster ACE and AER regions; the histogram summarizes new data (5 lines) as well as data previously presented (Delidakis & Kafatos, 1987). Other constructs are diagrammed with a magnified view of the DNA between the upstream SalI site of the chorion cluster (S1) and the beginning of the s18 gene. These constructs are either simple deletions or chimerae. They are numbered with the endpoints of the D. melanogaster DNA flanking the deletion (Δ deletion constructs), or D. grimshawi DNA inserted in the gap (cmg chimeric constructs). The data for Δ constructs (previously named ctc) were presented by Swimmer et al. (1989); cmg data are new. The bottom panel exemplifies the amplification analysis of transformant lines of the indicated chimeric constructs. Lane numbers represent individual transformant lines, and ♂ and f indicate the source of DNA from these lines (adult males or stage 13 to 14 follicles, respectively). DNA samples were digested with SstI, XbaI and XhoI, except for those
Table 1
Statistical evaluation of differences in amplification levels

Construct                     (1)     (2)     (3)     (4)     (5)     (6)     (7)
(1) S1R3
(2) Δ-187/-612                0.001
(3) cmg-189/-747              0.1     0.005
(4) cmg-311/-747              0.1     0.005   NSD
(5) Δ+7/-931                  0.001   NSD     0.001   0.001
(6) cmg-5/-97 to cmg-5/-442   0.001   NSD     0.005   0.005   NSD
(7) cmg-5/-642                0.001   NSD     0.001   0.001   NSD     NSD
(8) cmg-5/-904                0.005   0.05    NSD     0.1     0.01    0.01    0.05
P< values are listed for the null hypothesis that any 2 constructs amplify to the same level. NSD = P>0.1. The test used was the Wilcoxon 2-sample test. Values indicating statistically significant differences are underlined. Because of small numbers, the results of constructs cmg-5/-97, cmg-5/-196, cmg-5/-285 and cmg-5/-442 were pooled. NSD, not significantly different.

the percentage amplification levels vary substantially in different independent lines bearing the same transposon. Because of this sensitivity to chromosomal position effects, a large number of lines must be analyzed to evaluate by statistics the amplification properties of a given transposon. In the experiments summarized in Figure 6, the intact S1R3 construct ranged from 3 to 213% amplification, with the average value for 17 lines being 58%. As previously reported, and shown in Figure 6, amplification is nearly abolished in a deletion derivative of S1R3, Δ-187/-612, which lacks only 424 bp including the ACE3 region (Orr-Weaver & Spradling, 1986; Swimmer et al., 1989; Orr-Weaver et al., 1989; Fig. 6, top). However, when the missing DNA was replaced by a corresponding portion of the D. grimshawi DNA, nearly normal amplification was restored (cmg-189/-747; significantly different from Δ-187/-612 at P<0.005, and not significantly different from S1R3 at P<0.1; all statistical estimates are shown in Table 1). Thus, the ACE3 region of D. grimshawi can support essentially normal amplification, presumably in concert with the various amplification-enhancing regions of D. melanogaster, downstream from s18 (AERs; Delidakis & Kafatos, 1989). Indistinguishable results were obtained with cmg-311/-747. Therefore, the ACE3 region of D. grimshawi is defined functionally as -311 to -747; the -189 to -310 region of D. grimshawi, which contains five conserved sequence blocks, does not appear to be important for amplification.
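The pairwise comparisons of Table 1 rely on the Wilcoxon 2-sample (rank-sum) test. A minimal sketch of such a test is given below, as an exact two-sided version computed by enumerating all rank assignments. Whether the published P values came from an exact or tabulated, one-sided or two-sided procedure is not stated here, so this is an assumption. The function names are hypothetical and the two sets of percentage values are invented for illustration; they are not data from Figure 6.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def wilcoxon_two_sample(x, y):
    """Exact two-sided Wilcoxon rank-sum test.

    Returns (rank sum of x, P value). The P value is the fraction of all
    possible assignments of pooled ranks to a group of size len(x) whose rank
    sum deviates from its expectation at least as much as the observed one.
    """
    ranks = midranks(list(x) + list(y))
    n1 = len(x)
    observed = sum(ranks[:n1])
    expected = n1 * (len(ranks) + 1) / 2
    deviation = abs(observed - expected)
    total = 0
    extreme = 0
    for combo in combinations(ranks, n1):
        total += 1
        if abs(sum(combo) - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total


# Invented percentage-amplification values for two constructs (illustration only).
lines_a = [2, 5, 8, 11]
lines_b = [40, 55, 63, 90, 120]

rank_sum, p_value = wilcoxon_two_sample(lines_a, lines_b)
print(f"rank sum = {rank_sum}, two-sided P = {p_value:.4f}")
```

The enumeration is practical for the sample sizes involved here (for example 17 lines against 8 gives about one million rank assignments), but a normal approximation or a statistics library would be preferable for larger samples.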
Additional DNA replacements were undertaken in the context of a more extensive deletion, Δ+7/-931. For these experiments, precisely aligned D. grimshawi and D. melanogaster sequences were fused at -5/-4, with the D. grimshawi DNA extending upstream to various endpoints, as diagrammed in Figures 4 and 6. Taken as a group, the four constructs bringing in D. grimshawi DNA to -97, -196, -285 and -442 showed no significant improvement in amplification (not enough lines were tested for a statistically valid evaluation of individual constructs). With replacement up to -642, a possible minor improvement occurred, but was not statistically significant. However, replacement up to -904 resulted in substantial improvement: cmg-5/-904 amplified significantly higher (P<0.01) when compared to either Δ+7/-931 or the four constructs bringing in up to -442 of D. grimshawi DNA, and was also marginally better (P<0.05) than cmg-5/-642. Nevertheless, even this most extensive replacement did not restore amplification to its normal level: cmg-5/-904 was significantly different from S1R3 (P<0.005; Table 1). The implications of these results will be considered in Discussion.
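The construct names encode sequence coordinates, so the segment lengths quoted in the text can be cross-checked with simple arithmetic. The sketch below assumes that the two numbers in a Δ construct name are the last retained D. melanogaster positions flanking the deletion, as stated in the Figure 6 legend. The helper names are hypothetical.

```python
def inclusive_length(first, last):
    """Nucleotides from position first to position last, counting both ends.

    Valid when both positions are on the same side of +1 (there is no position 0).
    """
    return abs(first - last) + 1


def deleted_length(retained_downstream, retained_upstream):
    """Nucleotides removed when the two named positions are the last ones retained."""
    return abs(retained_upstream - retained_downstream) - 1


def extension_length(endpoint_short, endpoint_long):
    """Extra DNA carried by a construct whose upstream endpoint lies further out."""
    return abs(endpoint_long - endpoint_short)


# Conserved segment quoted as -412 to -482 in D. melanogaster (71 nt in the text).
conserved_nt = inclusive_length(-412, -482)
# Deletion of the construct named -187/-612 (424 bp in the text).
deleted_bp = deleted_length(-187, -612)
# Difference between the -5/-642 and -5/-904 replacements (262 bp in the Discussion).
extra_bp = extension_length(-642, -904)

print(conserved_nt, deleted_bp, extra_bp)  # 71 424 262
```

All three values agree with the lengths given in the text, which supports reading the deletion endpoints as retained positions.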
4. Discussion
This study completes our analysis of the structural evolution of the autosomal chorion cluster in the genus Drosophila. Together with the first two reports in this series (Martinez-Cruzado et al., 1988; Fenerjian et al., 1989), and previous reports on D. melanogaster (Wong et al., 1985; Levine & Spradling, 1985), we now know the sequence of
derived from cmg-5/-442 line 2, cmg-5/-642 lines 1 and 2, and cmg-5/-904 line 1, which were digested with SstI and XhoI. The DNAs were blot hybridized using a mixture of ry R and ch a probes (see top panel). Endogenous rosy and chorion bands (ry e and ch e) and transposon-derived rosy bands (ry t) are labeled. The fourth band in each lane is the variable transformant-specific chorion band, ch t (dots). The percentage amplification levels (numbers at the bottom of each lane) were evaluated by dividing the f/♂ ratio of ry t by the f/♂ ratio of ch e.
33 × 10³ bp of DNA, encompassing four major autosomal chorion genes and their flanking DNA in four Drosophila species. Certain gross features of the cluster are constant for all species: the fixed order, tandem orientation and tight clustering of the genes and the existence of a single short intron in every case (which is near the 5' end in s18, s15 and s19, but displaced towards the 3' end in s16; Fig. 1). In contrast to this organizational constancy, the cluster shows extensive diversification, as might be expected from the long evolutionary distance between these species. A measure of the diversification is the almost complete randomization of the intronic sequences. Even the coding DNA sequences are divergent, especially in the more distantly related species (Fig. 3). In contrast, "islands" of strong sequence conservation are found in the proximal 5' flanking DNA. We have hypothesized that these islands correspond to cis-regulatory elements responsible for the remarkable tissue specificity and gene-specific temporal patterns of expression, which are highly conserved in evolution (Fenerjian et al., 1989). For example, in every case a characteristic hexamer motif, TCACGT, is found embedded in a gene-invariant sequence:
s18   GTCACGTAA
s15   GTCACGTAA
s19   GCGAGATCACGTTT
s16   TTTGGTCACGT
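As a quick check of the four sequences listed above, the following sketch locates the TCACGT hexamer in each one and prints it in upper case against lower-case flanks. The assignment of each sequence to a gene follows the order in which they are listed, which is an assumption about the original layout; the variable names are hypothetical.

```python
MOTIF = "TCACGT"

# Gene-to-sequence assignment assumed from the listing order.
elements = {
    "s18": "GTCACGTAA",
    "s15": "GTCACGTAA",
    "s19": "GCGAGATCACGTTT",
    "s16": "TTTGGTCACGT",
}

# 0-based offset of the hexamer within each element (-1 if absent).
offsets = {gene: seq.find(MOTIF) for gene, seq in elements.items()}

for gene, seq in elements.items():
    start = offsets[gene]
    if start < 0:
        print(f"{gene}: motif not found in {seq}")
        continue
    end = start + len(MOTIF)
    # Show the hexamer in upper case and the flanking nucleotides in lower case.
    print(f"{gene}: offset {start}  {seq[:start].lower()}{MOTIF}{seq[end:].lower()}")
```

Every element contains the hexamer exactly once, and the s18 and s15 elements are identical over their full length.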
The TCACGT hexamer is known to be necessary for expression of the s15 gene (Mariani et al., 1988). Interestingly, the variant sequence that encompasses this hexamer is identical in s15 and s18, both of which are expressed in late choriogenesis. At least one other conserved element has been shown experimentally to be important for gene expression: a linker-scanning mutation that abolishes a conserved TATGAAAT motif at -108 to -101 of s15 reduces in vivo mRNA levels fourfold (B. Mariani, personal communication). For s18, the 5' flanking sequence conservation both is stronger and extends further upstream than in the case of the other three genes. The >90% conservation in an upstream 71 nt segment (-412 to -482 in D. melanogaster) is particularly notable. Since amplification is quite similar in D. melanogaster and D. grimshawi (Fig. 5), and the conserved segment is in the middle of the 321 nt D. melanogaster amplification control element, ACE3 (Orr-Weaver & Spradling, 1986; Orr-Weaver et al., 1989), a reasonable hypothesis is that this segment plays an important role in the amplification process. Our functional tests for amplification-controlling elements are not inconsistent with this hypothesis, but additionally indicate that nonconserved sequences are important in amplification. The D. grimshawi ACE3 can be defined functionally as -311 to -747 (Fig. 4, top), since the presence of that region in cmg-311/-747 clearly restores amplification, to a level statistically indistinguishable from that of S1R3 (Fig. 6 and Table 1).
The 71 nt conserved segment indeed is contained within that region. The identical amplification behavior of cmg-189/-747 and cmg-311/-747 demonstrates that the D. grimshawi -189 to -310 region is unimportant for amplification. The limited results of cmg-5/-442 also suggest that the D. grimshawi ACE3 does not wholly reside downstream from -442. The cmg-189/-747 and cmg-311/-747 comparison further demonstrates that a deletion can be made near the ACE3 region without non-specifically affecting amplification. This conclusion is important because of the differing properties of cmg-5/-642 and cmg-5/-904. These replacement constructs differ by a 262 bp DNA segment, half of which is within the D. grimshawi ACE3 region and half is further upstream. This entire segment is A+T-rich, like much of the surrounding DNA, and is otherwise unremarkable; it contains no conserved sequence blocks (Fig. 4, top). Yet its deletion slightly reduces amplification: cmg-5/-642 amplifies worse than cmg-5/-904 at P<0.05 (Table 1). It should also be noted that cmg-5/-904 does not amplify as well as S1R3 (P<0.005, Table 1). One interpretation is that some functionally important, but not sequence conserved DNA is located even further upstream, in the segment defined by D. grimshawi -904 and D. melanogaster -931 (approx. 150 bp; see the alignment of Fig. 4, top). Alternatively, the D. grimshawi sequences downstream from -189 may not be as conducive to amplification as the corresponding D. melanogaster sequences. In summary, the D. grimshawi ACE3 region (-311/-747) functions well in D. melanogaster, and can substitute for the latter's ACE region. Despite the presence of the highly conserved 71 nt segment in the ACE3 of both species, the data available are not sufficient to establish unequivocally whether or not this segment functions in amplification.
In contrast, the data indicate that non-conserved sequences farther upstream are important. These results are consistent with a recent study of the D. melanogaster ACE3 region (Orr-Weaver et al., 1989). In that study, reduction in amplification resulted from a -310 to -368 deletion, and probably also from deletions of either -520 to -630, or -370 to -410; in contrast, deletion of the major conserved segment (-420 to -480) had no detectable effect. As these authors comment, their results establish that in D. melanogaster the ACE3 function depends on at least two non-essential elements, but do not exclude the possibility that the major conserved segment is also important for amplification (e.g. for attaining the highest gene copy levels; Orr-Weaver et al., 1989). Although some deletions/insertions around the ACE3 region that do not reduce amplification are possible (Orr-Weaver et al., 1989, and this work), we cannot exclude the possibility that the nonconserved DNA upstream from the 71 bp conserved segment functions merely as a spacer: the positioning of a yeast autonomous replication sequence (ARS) relative to a precisely located nucleosome
strongly affects the replication activity of the ARS element (Simpson, 1990). Further studies should examine the role of the conserved and nonconserved sequences by means of short nested deletions and limited replacements with random sequences. Considering the apparent substructure of ACE3 (Orr-Weaver et al., 1989), and the possibility of redundant elements, such studies should await the development of procedures for directing transposons to a predetermined D. melanogaster chromosomal site. Targeted integration would avoid the extensive chromosomal position effects, thus making the assay of amplification properties more sensitive and less labor intensive. In the meantime, similar studies can be undertaken to test whether the D. grimshawi 5' flanking DNA can support s18 expression in D. melanogaster, and whether any blocks of conserved sequence elements are good candidates for a transcriptional regulatory role.

We thank Dr R. C. Lewontin for his interest and advice, members of the Kafatos laboratory, C. Delidakis, T. Orr-Weaver and A. Spradling for helpful discussions, B. Klumpar for photography and E. Fenerjian for secretarial assistance. We also thank M. Conboy and K. Fahrner for assistance and Dr R. Dorit and Dr H. Chernoff for advice on statistical analysis. The work was supported by an American Cancer Society grant to F.C.K. and a National Institutes of Health grant to R. C. Lewontin. C.S. held an NIH postdoctoral fellowship and M.G.F. and J.C.M.-C. held NIH predoctoral fellowships supported by Training Grant GM-07598.
References
Beverley, S. M. & Wilson, A. C. (1984). J. Mol. Evol. 21, 1-13.
Biggen, M. D., Gibson, T. J. & Hong, G. F. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 3963-3965.
Dale, D. M. K., McClure, B. A. & Houchins, J. P. (1985). Plasmid, 13, 31-40.
Delidakis, C. & Kafatos, F. C. (1987). J. Mol. Biol. 197, 11-26.
Delidakis, C. & Kafatos, F. C. (1989). EMBO J. 8, 891-901.
Fenerjian, M. G., Martínez-Cruzado, J. C., Swimmer, C., King, D. & Kafatos, F. C. (1989). J. Mol. Evol. 29, 108-125.
Griffin-Shea, R., Thireos, G. & Kafatos, F. C. (1982). Develop. Biol. 91, 325-336.
Heck, M. & Spradling, A. C. (1990). J. Cell. Biol. In the press.
Laughton, A. & Scott, M. P. (1984). Nature (London), 310, 25-31.
Levine, J. & Spradling, A. C. (1985). Chromosoma, 92, 136-142.
Levinson, A., Silver, D. & Seed, B. (1984). J. Mol. Appl. Gen. 2, 507-517.
Mariani, B. D., Lingappa, J. R. & Kafatos, F. C. (1988). Proc. Nat. Acad. Sci., U.S.A. 85, 3029-3033.
Martínez-Cruzado, J. C. (1988). Ph.D. thesis, Harvard University.
Martínez-Cruzado, J. C., Swimmer, C., Fenerjian, M. G. & Kafatos, F. C. (1988). Genetics, 119, 663-677.
Orr, W., Komitopoulou, K. & Kafatos, F. C. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 3773-3777.
Orr-Weaver, T. L. & Spradling, A. C. (1986). Mol. Cell. Biol. 6, 4624-4633.
Orr-Weaver, T. L., Johnston, C. G. & Spradling, A. C. (1989). EMBO J. 8, 4153-4162.
Parks, S., Wakimoto, B. & Spradling, A. C. (1986). Develop. Biol. 117, 294-305.
Pustell, J. & Kafatos, F. C. (1982). Nucl. Acids Res. 10, 4765-4782.
Pustell, J. & Kafatos, F. C. (1984). Nucl. Acids Res. 12, 643-655.
Rigby, P. W., Dieckman, M., Rhodes, C. & Berg, P. (1977). J. Mol. Biol. 113, 237-248.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 12, 5463-5467.
Simpson, R. T. (1990). Nature (London), 343, 387-389.
Sokal, R. R. & Rohlf, F. J. (1981). Biometry: The Principles and Practice of Statistics in Biological Research, W. H. Freeman, San Francisco.
Spradling, A. C. (1981). Cell, 27, 193-201.
Spradling, A. C. & Leys, E. (1988). Cancer Cells, 6, 305-309.
Spradling, A. C. & Mahowald, A. P. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 1096-1100.
Staden, R. (1982). Nucl. Acids Res. 10, 4731-4751.
Staden, R. (1984). Nucl. Acids Res. 12, 499-504.
Swimmer, C., Delidakis, C. & Kafatos, F. C. (1989). Proc. Nat. Acad. Sci., U.S.A. 86, 8823-8827.
Tabor, S. & Richardson, C. (1987). Proc. Nat. Acad. Sci., U.S.A. 77, 5789-5793.
Throckmorton, L. H. (1975). In Handbook of Genetics (King, R. C., ed.), vol. 3, pp. 421-429, Plenum Press, New York.
Wong, Y.-C., Pustell, J., Spoerel, N. & Kafatos, F. C. (1985). Chromosoma, 92, 124-135.
Edited by P. Chambon

Note added in proof. The sequences reported in this and preceding papers of this series have been submitted to the EMBL database under the following accession numbers: X53421 (D. virilis), X53422 (D. grimshawi) and X53423 (D. subobscura).