Balbiani ring 1 gene in Chironomus tentans

J. Mol. Biol. (1992) 225, 349-361

Balbiani Ring 1 Gene in Chironomus tentans
Sequence Organization and Dynamics of a Coding Minisatellite

Gabrielle Paulsson, Christer Höög, Kerstin Bernholm and Lars Wieslander†

Department of Molecular Genetics, Medical Nobel Institute, Karolinska Institutet, S-104 01 Stockholm, Sweden

(Received 26 September 1991; accepted 17 January 1992)

Balbiani ring (BR) genes in diptera encode large secretory proteins and are classical model systems for studies of gene expression. In Chironomus tentans, four closely related BR genes, BR 1, BR 2.1, BR 2.2 and BR 6, form a gene family. The BR genes have been partially characterized and are known to contain long arrays of tandemly arranged repeat units with a hierarchical repeat organization. Here, we report the sequence organization of the complete transcribed part of the f

Keywords: repetitive DNA; sequence homogenization; gene family; secretory protein

1. Introduction

Repetitive sequences are ubiquitous components of eukaryotic genomes. The generation of repeated sequences appears to be an intrinsic consequence of mechanisms responsible for growth and change of genomes during evolution. Studies of non-coding repeats in intergenic rDNA (Tautz et al., 1986), satellite DNA (Willard & Waye, 1987) and minisatellites (Jeffreys et al., 1990) show that the number and sequence of the repeats change at high rates, presumably due to recombination (Smith, 1976) and slippage mechanisms (Levinson & Gutman, 1987). In coding repeats, remodelings also occur during evolutionary short time-scales (Galinski et al., 1987; Lendahl et al., 1987; Djian & Green, 1989; Philips et al., 1990), but the exact mechanisms and the rules for this dynamic behavior are not known.

We wish to study the behavior of coding repetitive sequences using the Balbiani ring (BR‡) gene family in the dipteran species Chironomus tentans. The BR gene family consists of the BR 1, BR 2.1, BR 2.2 and BR 6 genes, which encode approximately 1 × 10^6 dalton salivary gland secretory proteins. After secretion, the proteins form the water-insoluble protein fibers of a larval tube, necessary for feeding and protection during the aquatic larval period (Hotella, 1988; Grossbach, 1977; Rydlander & Edström, 1980; Kao & Case, 1985). In each of the four BR genes, 200 to 250 bp repeat units are known to be arranged in tandem to form a long repetitious core block. Each such repeat unit consists of two halves, one constant (C) region and one subrepeat (SR) region. The repeat structure of the BR genes is therefore hierarchical (see Fig. 1) and appears to be the outcome of mechanisms remodeling repeated sequences originating from a common ancestor into gene-specific repeat sequences (Pustell et al., 1984; Wieslander et al., 1984; Grond et al., 1987; Höög et al., 1988). The l

† Author to whom all correspondence should be addressed.

‡ Abbreviations used: BR, Balbiani ring; bp, base-pair(s); kb, 10^3 bases or base-pairs; u.v., ultraviolet light; C-region, constant region; SR-region, subrepeat region.
2. Materials and Methods

(a) Animals

Animals were cultured in the laboratory as described (Meyer et al., 1983). The line of C. tentans has been cultured for more than 10 years and is inbred to a considerable extent.

High molecular weight DNA was extracted from cultured C. tentans epithelial cells (Wyss, 1982) or from salivary glands of individual larvae essentially as described by Gross-Bellard et al. (1973). RNA was extracted from salivary glands as described (Edström et al., 1982). RNA to be used for S1-mapping and as template for sequencing was pelleted through a cushion of 5.7 M-CsCl in 0.1 M-EDTA (pH 7.5) in a SW 50 rotor at 35,000 revs/min for 12 h.

(c) Construction and screening of cDNA and genomic libraries

BR gene-specific mRNA was separated from all other salivary gland RNA in 0.7% LGT agarose (FMC Corp.) and extracted with phenol (Wieslander, 1979). To improve purity, the recovered RNA was run on a second LGT gel and re-extracted as above. The purified BR mRNA was reverse transcribed (Amersham) using random oligodeoxynucleotide primers. EcoRI linkers were added and the cDNA ligated to λgt10 vector arms (Promega). The cDNA library was screened with a mixture of probes representing the repeat units of the BR 1 (Wieslander et al., 1982; Case & Byers, 1983),

Subclones from the electrophoretically purified genomic inserts were generated after restriction enzyme digestion with enzymes cleaving β repeat units once or after ultrasonic random fragmentation. In both cases, the fragments were made blunt ended by treatment with the Klenow fragment. The sonicated DNA was separated in 1.2% (w/v) agarose gels and 600 to 800 bp fragments eluted (Dretzen et al., 1981). The fragments were blunt end ligated to the M13mp8 vector

The insert from the cDNA clone L146 was isolated and subcloned into the pGem-1 vector (Promega). 3H-labeled RNA was obtained by Sp6 polymerase transcription. Probes for in situ hybridization with the genomic EMBL4 clones were labeled by random priming (Boehringer) on the entire clone using biotin-labeled dATP (BRL) in the nucleotide mix. Salivary gland squash preparations were made and treated with 0.07 M-NaOH for 3 min prior to hybridization. 3H-labeled RNA was detected with autoradiography using Ilford liquid emulsion and biotin was detected with a secondary antibody coupled to fluorescein.

Total salivary gland RNA was denatured for 5 min at 60°C in 40 mM-triethanolamine HCl (pH 7.4), 10 mM-EDTA containing 50% (v/v) formamide and 2.2 M-formaldehyde. The RNA was then electrophoresed as described (Lehrach et al., 1977), except that the gel buffer and electrophoresis buffer consisted of 40 mM-triethanolamine HCl (pH 7.4), 10 mM-EDTA. The RNA was blotted on to nylon filters (Amersham). Repeat unit lengths were measured by electrophoretic separation of digested cell culture DNA in 3% NuSieve (FMC Corp.), 1% standard agarose gels. After vacuum blotting, the nylon filters were baked at 80°C for 30 min and subsequently u.v.-irradiated (Stratalinker 1800). Two-dimensional densitometry of autoradiograms was performed as described (Wrange et al., 1989). Pulsed-field gel electrophoresis was run on a CHEF-DR II (BioRad) in 1% agarose gels with a pulse time starting at 1 s and going up to 4 s during a total running time of 24 h. The gel was u.v.-irradiated before vacuum transfer as above.

(f) DNA and RNA sequencing

Restriction fragments from genomic λ-clones were subcloned into pBluescript (Stratagene). Dideoxy sequencing was performed in combination with [35S]dATP with the modified phage T7 DNA polymerase (Sequenase, USB) after producing nested deletions according to Henikoff (1984) as modified by Promega. BR mRNA was directly sequenced according to Geliebter et al. (1986) as described (Paulsson et al., 1990). The ends of the purified genomic fragments in the BR 1g1 and BR 1g2 clones were sequenced according to Maxam & Gilbert (1977) after end-labeling with phage T4 polynucleotide kinase, restriction enzyme digestion and purification of the labeled end fragments by polyacrylamide gel electrophoresis.

(g) Mapping technics

(i) cDNA primer extension
Specific oligodeoxynucleotides were end-labeled by T4 kinase and [γ-32P]dATP (Sambrook et al., 1989); 0.2 to 0.4 ng of the kinased oligodeoxynucleotide was hybridized with 15 µg of total gland RNA in 400 mM-NaCl, 40 mM-Pipes buffer (pH 6.5) for 3 h at the appropriate temperature (td − 5°C; Wallace & Miyada, 1987) after 3 min heating at 80°C. After precipitation with ethanol the RNA-primer complex was introduced into the Amersham cDNA kit 1st strand synthesis reaction. The obtained cDNA was precipitated with ethanol, dissolved in formamide and heated before electrophoresis in 8% polyacrylamide sequencing gels.

(ii) S1 nuclease mapping
Total RNA and end-labeled oligodeoxynucleotides were hybridized in 30 µl of 50% formamide, 40 mM-Pipes (pH 6.5), 400 mM-NaCl, 1 mM-EDTA at the appropriate temperature (td − 5°C) for 3 h. The hybridization mixture was diluted to 300 µl with 250 mM-NaCl, 30 mM-NaOAc (pH 4.8), 1 mM-ZnCl2 containing 30 to 70 units of S1 nuclease (Boehringer) and incubated at 37°C for 1 h. The samples were extracted with phenol/chloroform, precipitated with isopropanol, dissolved in 80% formamide and run on 6% polyacrylamide sequencing gels.

Figure 1. A representation of the BR 1 gene. In (a), the cloned genomic and cDNA fragments are shown relative to the BR 1 gene. The genomic clones originate from the two different gene alleles of the cell line and have β arrays of different lengths. The BR 1g1 and BR 1g2 clones represent longer and the BR 1g4 clone shorter β arrays than depicted and have therefore been shortened (BR 1g1 and BR 1g2) and divided (BR 1g4) to show where they begin and end. In (b), the intron-exon organization is depicted. Filled parts in exon 1 and 5 represent non-translated regions. In exon 4, the 2 repeat unit arrays, β (stippled) and γ (open), are shown. In (c), the junction between the β and γ repeat arrays is enlarged and the structure of the repeat units is shown (C stands for C-region and SR for SR-region).

3. Results

(a) The BR 1 gene is built from two different repeat arrays
We have determined the complete structure of the BR 1 gene in Chironomus tentans (see Fig. 1). From earlier work, roughly half of the BR 1 gene was known (Wieslander et al., 1982). With the aim of solving the structure of the remaining 5' half of the BR 1 gene, a genomic library was screened with a probe specific for the repeat unit in the 3' half of the gene, the γ repeat unit. Two positive λ clones, BR 1g1 and BR 1g2, having approximately 14 kb inserts were isolated. The γ repeats were located at the 3' end of these genomic fragments. The remaining 5' part in both clones was built entirely of another type of repeat unit, which we denote β. A very similar type of repeat unit has been found in a sibling species to C. tentans (Saiga et al., 1988). As in the γ repeat unit, the β repeat unit has a constant (C) region and a subrepeat (SR) region (Fig. 1(c)).

The structure of the two almost completely overlapping genomic fragments (see Fig. 1(a)) was determined as follows (data not shown). It was demonstrated by direct sequencing of the fragment ends that both fragments start within a β repeat unit and end within a γ repeat unit. Digestion with the restriction enzymes HhaI or BsmI, which both cleave the β repeat unit once but not the γ repeat unit, resulted in only two bands: a 2 kb band in the case of the BR 1g1 clone, or a 1 kb band in the case of the BR 1g2 clone, and the β repeat unit monomer band. Digestion with PvuII, which cleaves once in each γ repeat unit but not in the β repeat unit, produced the opposite result: a 12 kb band in the BR 1g1 clone, or a 13 kb band in the BR 1g2 clone, and a faint γ repeat unit monomer band. Partial HhaI digestion resulted in a ladder pattern, in which each step corresponds to the length of one β repeat unit. In addition, a large number of random fragments cloned into an M13 vector were sequenced and only β and occasional γ repeat unit monomers were found.
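The logic of the digestion experiment can be illustrated with a small in silico sketch. It is not part of the original analysis: the repeat unit lengths (230 and 220 bp), the cut offset within a unit (100 bp) and the numbers of units are hypothetical values chosen only to give an insert of about 14 kb, and the helper name `digest` is ours.

```python
from collections import Counter

def digest(units, cutters):
    """Return fragment lengths (bp) after complete digestion of a tandem array.

    units   -- list of (kind, length_bp) tuples in 5'->3' order
    cutters -- dict mapping a unit kind to the cut offset (bp from unit start);
               kinds absent from the dict are not cleaved
    """
    cut_sites = []
    pos = 0
    for kind, length in units:
        if kind in cutters:
            cut_sites.append(pos + cutters[kind])
        pos += length
    bounds = [0] + cut_sites + [pos]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

# Hypothetical insert: 52 beta units followed by 9 gamma units (all sizes assumed).
BETA_BP, GAMMA_BP = 230, 220
insert = [("beta", BETA_BP)] * 52 + [("gamma", GAMMA_BP)] * 9

# An enzyme cutting once per beta unit (like HhaI or BsmI in the text)
beta_cut = digest(insert, {"beta": 100})
# An enzyme cutting once per gamma unit (like PvuII in the text)
gamma_cut = digest(insert, {"gamma": 100})

print(Counter(beta_cut).most_common(1), max(beta_cut))
print(Counter(gamma_cut).most_common(1), max(gamma_cut))
```

Under these assumptions, the enzyme cutting in β units releases many β monomers and leaves the γ array as one fragment of roughly 2 kb, while the enzyme cutting in γ units leaves the β array as one fragment of roughly 12 kb plus a few γ monomers. A short end fragment also arises in each digest because the insert starts and ends inside a repeat unit.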
These results show that the β repeat units are organized as a homogeneous block; no other type of sequence interrupts the array of tandem repeats (Fig. 1(b)). By sequencing the β-γ junction in the two genomic clones, we could also conclude that no intron is situated between the two types of repeat arrays. In the BR 1g1 clone, the junction between the β and γ repeat unit arrays was isolated by randomly fragmenting the genomic insert and subcloning 600 to 800 bp fragments into M13. One subclone that hybridized both to β- and γ-specific oligodeoxynucleotide probes was sequenced. The switch from β to γ is within a repeat unit; a typical β C-region is immediately followed by a typical γ SR-region (shown schematically in Fig. 1(c); for sequence, see Fig. 5). In the BR 1g2 clone the junction was identical. The 1 kb BsmI-generated fragment representing one end of the genomic insert was subcloned and sequenced. This fragment starts with the part of a β repeat unit that is cleaved off by HhaI or BsmI, followed by γ repeat units in tandem. As we can locate the end of the γ array at the 3' end of the gene (Höög et al., 1986; Saiga et al., 1987) and the beginning of the β array at the 5' end of the gene (see below), our conclusion is that the long repetitive core block of the BR 1 gene is built from two types of repeat units, the β type in the 5' part and the γ type in the 3' part. Each type of repeat unit is arranged as an uninterrupted array; no mingling of the two types of repeat units occurs and the two arrays are juxtaposed (Fig. 1).

(b) The structure of the 5' end

To isolate the very 5' end of the BR 1 gene, we constructed a random primed cDNA library specific for BR gene transcripts. BR 1 gene mRNA (as well as BR 2.1, BR 2.2 and BR 6 gene mRNA) has a length several-fold larger than all other mRNAs in the salivary gland cells and can therefore be electrophoretically purified (Wieslander, 1979). The cDNA library was screened with a probe cocktail containing β and γ repeat units and 3'-specific regions covering all known BR 1 gene sequences. Repeat units and 3' sequences from the two BR 2 genes and the BR 6 gene were also included. Clones that were negative were picked and rescreened with labeled cDNA made from mRNA representing all larval tissues except the salivary glands. In situ hybridization (Fig. 2(a)) and Northern blot analysis (data not shown) confirmed that one of the remaining negative clones, L146, hybridized to a 37 kb transcript from the BR 1 gene. The sequence of the coding strand of the 149 bp long L146 cDNA contained one open reading frame with no repeat unit. According to cDNA primer extension analysis, the L146 sequence is located 489 bp from the 5' end of the mRNA (positions 489 to 637 in Fig. 3). Using several oligodeoxynucleotide primers, the sequence of the 5' end was determined with the mRNA itself as sequencing template.

With the possibility of making a probe unique for the 5' end of the BR 1 gene, we screened a genomic library with a synthetic oligodeoxynucleotide corresponding to a sequence close to the 5' end of the mRNA. Two overlapping clones, BR 1g3 and BR 1g4, were isolated (see Fig. 1(a)) and shown by in situ hybridization to originate from the BR 1 gene locus (Fig. 2(b)). Both clones contained one large EcoRI fragment, which hybridized to a β repeat unit probe. These fragments extended from the EcoRI site in the cloning box at one side of the genomic inserts to a common internal EcoRI site. The shortest of the fragments, 9.4 kb in length and present in the BR 1g3 clone, hybridized only to a β repeat unit probe, while the longer fragment present in the BR 1g4 clone hybridized to both β and γ repeat unit probes (data not shown). The 9.4 kb fragment was subcloned from the BR 1g3 clone and further characterized. Sequence determination and restriction enzyme analysis proved that a 5 kb β repeat unit array was present at one end. These repeat units were removed from the subclone by BsmI digestion, electrophoretic separation and religation. The transcription start site of the BR 1 gene was determined by primer extension analysis and S1-mapping (data not shown) and found to lie just 5' from the subclone, i.e. 8 bp 5' from the EcoRI site, within the preceding EcoRI fragment. The complete sequence of the 5' end of the BR 1 gene from the transcription start site to the first β

Figure 3. Nucleotide and corresponding amino acid sequence of the 5' part of the BR 1 gene. The sequence transcribed from the 5' end to the first β repeat unit in the β repeat array is shown. Positions of introns are marked by arrows and C-regions are underlined.

repeat unit of the β repeat array was determined and is shown in Figure 3. A comparison between the genomic sequence and the cDNA and mRNA sequences revealed that three introns are present in the 5' end of the gene. In exon 1, 2 or 3, repeat units are not present but appear gradually a few hundred base-pairs inside exon 4, which harbors all repeat units. Before the repeat units start there are many shorter repeat structures, not obviously related to the repeat units in exon 4.

Exon 1 is 409 bp, with a 95 bp untranslated region and a 60 bp typical signal peptide-coding region. Most notably, the following region contains eight positions where a codon is directly repeated once or twice. Two cysteine codons and about 10% proline codons are present, both being prominent features of BR repeat units. Intron 1 is 1.7 kb long and highly AT-rich (78%). In its 5' end an 84 bp sequence is repeated in tandem approximately ten times, forming a minisatellite. Introns 2 and 3 are also very AT-rich (71% and 76%, respectively). Exons 2 and 3 are short, 18 and 48 bp, respectively. Exon 2 consists largely of codon repeats and is very similar to the last 18 bp sequence in exon 1. Exon 3 consists almost entirely of seven tandem copies of a 6 bp sequence.

Exon 4 is exceptionally large, 36 to 40 kb. The first 660 bp preceding the first repeat unit contains many codons that are directly repeated one or several times and, in this respect, is similar to the 3' part of exon 1. The first two repeat units are not of β type. In the very first repeat unit, the C-region is not followed by a regular SR-region, but by a region in which again several codons are directly repeated once or twice. Proline codons, which are typical of SR-regions, are frequent and make up about 10%. Th
and diversity within repeat unit arrays of the BR 1 gene

The determination of the complete gene structure allows analysis of the degree of sequence identity between the 140 to 150 repeat units in the BR 1 gene, which is a reflection of the efficiency of the mechanisms creating and maintaining the repetitive sequences.

(i) The C-regions
All C-regions are precisely defined and have the same length throughout the entire exon 4 and can therefore be aligned. In Figure 4, the sequence variation between the C-regions is depicted. Within the β array, all C-regions are virtually identical with the sequence shown in Figure 5(a). The first C-region, the two last C-regions and 12 random internal C-regions were sequenced. Most are identical or differ by a single base-pair substitution. The last one is the most different, having three base-pair

Figure 4. Degree of sequence identity between C-regions at various positions within the BR 1 gene. The arrays of repeat units in the BR 1 gene are depicted. C-regions in the β repeat array are stippled and C-regions in the γ repeat array are open. Variant C-regions at the 5' and 3' ends are filled. The range of sequence identity within the β and γ arrays is given, as well as the degree of sequence identity between defined C-regions, as indicated by arrows.

substitutions compared with the internal C-regions. The same situation is found in the γ array (the first, last and 8 random internal regions were sequenced; Fig. 5(a)). Only the last C-region decreases the percentage identity from 100 to 88%. This homogeneity within the two arrays contrasts with the sharp border between the arrays: from one C-region to the next, there is a 19% drop in degree of identity. At the extreme ends of the repetitive core block, a few variant C-regions are present. The very first C-region, 5' from the β array, and the two very last C-regions (which are 90% identical with each other), 3' from the γ array, are unique in the sense that they are only around 50% identical with any other C-region in exon 4. At some positions these three C-regions are identical with each oth
randomly cloned internal SR-regions. When we compared the subrepeats within an individual SR-region with each other, they differed by one to several base-pair substitutions (Fig. 5(b)). In contrast, complete SR-regions within the β or γ array were all almost identical with the sequence shown in Figure 5(b). Therefore, there is a striking difference in degree of sequence identity at these two levels of repetition. As a consequence, within an SR-region each subrepeat can be identified as to sequence and position.

In both the β and γ repeat arrays, SR-regions having different numbers of subrepeats are present. The β type SR-regions differing by one subrepeat, five versus six, were electrophoretically detected in the BR 1g1 and BR 1g2 genomic clones, and verified by sequencing. In the γ repeat array, SR-regions having four or five subrepeats have been detected by sequencing cloned repeats. From the sequenced SR-regions we could observe that only centrally located subrepeats have been added or deleted.

To estimate the representation of repeat units of variable lengths in the complete exon 4, cell culture DNA was digested with BsmI, which cleaves the β repeat unit once, or PvuII, which cleaves the γ repeat unit once. After electrophoretic separation in high-resolution gels and probing with repeat unit probes, the hybridization signals were approximately quantified relative to each other (Fig. 6). The lengths of the repeat units observed fit well with the assumption that length variations are due to different numbers of subrepeats. For the β repeat unit monomers, about 40% of the SR-regions had five subrepeats, 40% had six subrepeats and 20% had seven subrepeats. A weak band corresponding in length to repeat unit dimers was seen. About 8% of all the repeat units were present as dimers. Within the γ array, approximately 25% of the monomer SR-regions had four subrepeats, 50% had five and 25% six subrepeats. Also in this case, a small amount of dimers, 9%, was recorded.

Figure 5. Sequence comparison between C-regions and SR-regions. In (a), the β and γ types of C-regions are compared. In (b), the 5', β and γ SR-regions are shown. The SR-regions are split into their subrepeats and positions that exhibit base-pair substitutions relative to the consensus sequence are circled. In the 5' SR-region, the 2 last subrepeats are of β type and in the preceding subrepeat we assume a deletion of 9 bp.
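As a minimal sketch of the two comparisons made here, the following Python fragment computes the percentage identity between two aligned subrepeats and the mean number of subrepeats per SR-region. The two 30 bp sequences are β subrepeats as they read in the extracted Figure 5 text and may carry extraction errors. The band fractions are the approximate values quoted in the text, with the β fractions for five and seven subrepeats taken as 40% and 20%, which is our reading of a damaged passage. The function names are ours.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def mean_subrepeats(fractions):
    """Weighted mean number of subrepeats; fractions maps subrepeat count -> band fraction."""
    total = sum(fractions.values())
    return sum(n * f for n, f in fractions.items()) / total

# Two beta subrepeats as read from the extracted Figure 5 text (assumed correct).
sub_a = "AAACCTGAAAGACCAAGCAAATCAAGACCT"
sub_b = "AAACCTGAAAGACCAAGCAAATCAGGACCT"
print(round(percent_identity(sub_a, sub_b), 1))

# Approximate monomer band fractions (assumed readings of the quantification).
beta_fractions = {5: 0.40, 6: 0.40, 7: 0.20}
gamma_fractions = {4: 0.25, 5: 0.50, 6: 0.25}
print(round(mean_subrepeats(beta_fractions), 2))
print(round(mean_subrepeats(gamma_fractions), 2))
```

With these inputs the two subrepeats differ at a single position out of 30, and the mean SR-region holds somewhat under six subrepeats in the β array and five in the γ array.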

(d) The β and γ arrays of variable sizes build a BR 1 gene of almost constant length

To estimate the sizes of the two types of repeat unit arrays, genomic DNA prepared from cell culture cells was cut with one or a combination of two different enzymes (Fig. 7(a)). PvuII cleaves at the 5' end of the β array and once in each γ repeat unit, liberating the entire β array. Two bands were seen in Southern blots probed with a β repeat unit probe, one 8.3 kb and one 22.6 kb band. BsmI, on the other hand, cleaves once in each β repeat unit but leaves the γ array intact. In combination with EcoRI digestion, the complete γ array is therefore released, including 1.9 kb consisting of intron 4, exon 5 and 3' flanking sequences (see Fig. 7(b)). Two bands were detected in the BsmI-EcoRI-digested DNA using a γ repeat unit probe, one 20 kb and one 29.5 kb band (Fig. 7(a)). The total length of the BR 1 gene in the same cell culture DNA was measured by pulsed-field gel electrophoresis. From previous data (Höög et al., 1986) and the genomic 5' gene sequences it was established that EcoRI cleaves out the entire BR 1 gene as a single fragment including only 1.2 kb of flanking sequences (see Fig. 7(b)). In Figure 7(a) it can be seen that a 42 kb and a 45 kb EcoRI gene fragment were recorded and that therefore two BR 1 gene length variants are present in the diploid cell line. In each of these length variants, a long β and a short γ array and vice versa must be present: two

Figure 6. Representation of repeat units of variable lengths in the BR 1 gene. Total genomic DNA isolated from the C. tentans cell line was digested to completion with either BsmI or PvuII. After electrophoresis in a 3% NuSieve, 1% standard agarose gel and Southern blotting, the filters were probed with β or γ repeat unit probes. Monomers of 3 different lengths are seen. The signals were quantified as described in Materials and Methods. Size markers are given in bp.

combined long repeat unit arrays or two short ones are not compatible with the length of the genes.

To test if the BR 1 length variants found in DNA from cultured cells are naturally occurring in the larval population and to explore the extent of length polymorphism, we prepared DNA from larval salivary glands and repeated the restriction cleavages and Southern blots as above. On two separate occasions, DNA was extracted from 100 larvae belonging to the same laboratory population and subsequently analyzed. The data are presented in Figure 7(a). In both DNA preparations, four discrete PvuII bands, representing the β array, were recorded, with lengths of 8.3, 13, 16 and 20 kb (Fig. 7(a)). In DNA preparation 2, four BsmI-EcoRI bands, corresponding to the γ array, were seen, having lengths of 18.5, 20, 23 and 28.5 kb (Fig. 7(a)). In preparation 1, only three BsmI-EcoRI bands were present, with lengths of 18.5, 23 and 28.5 kb. PvuII digests of DNA from individual larvae showed that the observed size variants represent alleles of the same gene and not two gene copies in the haploid genome (Fig. 8). The relative intensities of the bands in Figure 7(a) therefore partly reflect the frequency of the alleles in the subpopulations. The 13 kb β array is, for example, present in all individuals and the only one recorded to be present in homozygous individuals. In preparation 2 (Fig. 7(a)), the 13 kb β array is clearly very frequent and the corresponding 23 kb γ array is equally frequent. We have too little data to say if BR 1 gene alleles with the 13 kb β

Figure 7. (a) Size of the BR 1 gene and its repeat arrays in cell culture DNA and larval DNA. The total BR 1 gene length was measured by pulsed-field gel electrophoresis of EcoRI-digested DNA, followed by Southern blotting and probing with a β repeat unit probe. The γ repeat unit probe gave the same bands (not shown). The entire β array was cleaved out with PvuII and was detected by hybridization with the specific β repeat unit probe. The complete γ array was cleaved out by a combination of BsmI and EcoRI digestion. It was specifically detected with a γ repeat unit probe. The larval DNA was extracted on 2 separate occasions, preparation 1 and 2, each time from about 100 larvae from the same population. Size markers are indicated to the left of the lanes and consisted of BRL high molecular weight markers. The lanes corresponding to β and γ arrays in the larval DNA were run in 0.7% agarose under standard electrophoresis conditions. (b) A representation of the β and γ arrays in the various BR 1 gene alleles. The BR 1 gene is shown with positions of cleavage sites for the restriction enzymes employed. Below, the alleles found in cell culture DNA and larval DNA are shown, and the β and γ arrays are combined to add up to the observed total gene lengths and to fit the hybridization intensities recorded. P, PvuII; B, BsmI; E, EcoRI. The EcoRI-BsmI fragment at the 5' end is 3.46 kb in length. The measured length of each array is given in kb immediately above the arrays.

Figure 8. Size of the β repeat array in individual larvae. DNA from individual larvae was digested with PvuII, electrophoresed in 0.7% agarose gels, blotted onto nylon filters and hybridized with a β repeat unit probe. Lanes 1 to 10 represent individual larvae. Size markers are the same as in Fig. 7.

and 23 kb γ arrays provide a selective advantage or if the recorded frequencies reflect stochastic fluctuations in the laboratory population. In preparation 1, the 20 kb γ array is missing (Fig. 7(a)). The relative intensity of the 23 kb array is, however, approximately equivalent to the 13 kb plus 16 kb β arrays, and we conclude that in this subpopulation, no 20 kb γ array was present. DNA from preparation 1 was also digested with EcoRI to size the entire BR 1 gene. Two bands were found with lengths of 41 and 43 kb (Fig. 7(a)). In Figure 7(b), our interpretations of these data are schematically summarized. Four gene length variants were found, ranging in size, from the transcription start site to the poly(A) site, between 40 and 44 kb. The length variation is due to the presence of β and γ arrays of different lengths. We have detected β arrays with five different lengths, from 8.3 kb to 22.6 kb, and five length variants of the γ array, with lengths from 18.5 kb to 29.5 kb. To be compatible with the observed hybridization intensities and fulfil the requirement to add up to the measured total gene length, the various β and γ arrays should be combined as shown in Figure 7(b). In general, there is an inverse length relation: the shorter the β array, the longer the γ array, and vice versa.
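The effect of the inverse length relation on the combined array length can be illustrated with a small calculation. The sketch below uses only the shortest and longest β and γ array lengths quoted above; pairing the extremes with each other is an assumption made for illustration (the actual allele combinations are those drawn in Figure 7(b)), and the function names are hypothetical.

```python
# Shortest and longest array lengths (kb) quoted in the text.
BETA_KB = (8.3, 22.6)    # shortest, longest beta array
GAMMA_KB = (18.5, 29.5)  # shortest, longest gamma array


def combined_lengths(beta, gamma, inverse=True):
    """Return the two beta+gamma sums (kb) for the extreme arrays.

    inverse=True pairs short beta with long gamma and long beta with
    short gamma (assumed pairing); inverse=False pairs short with short
    and long with long.
    """
    g = gamma[::-1] if inverse else gamma
    return [round(b + x, 1) for b, x in zip(beta, g)]


def relative_spread(lengths):
    """Spread (max - min) as a fraction of the longest combined length."""
    return (max(lengths) - min(lengths)) / max(lengths)


inverse = combined_lengths(BETA_KB, GAMMA_KB, inverse=True)
parallel = combined_lengths(BETA_KB, GAMMA_KB, inverse=False)
print(inverse, f"{relative_spread(inverse):.0%}")
print(parallel, f"{relative_spread(parallel):.0%}")
```

Under this assumed pairing the combined arrays come to 37.8 and 41.1 kb, a spread of about 8%, whereas pairing short with short and long with long would give 26.8 and 52.1 kb, a spread of nearly 50%.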

4. Discussion

We have characterized the complete transcribed part of the BR 1 gene in C. tentans. The gene has five exons. The fourth exon is extremely long and built entirely from tandemly arranged repeat units. Two types of related repeat units, β and γ, are present within this exon and form two separate homogeneous repeat arrays; within both arrays all repeats are virtually identical. The lengths of the two repeat arrays change drastically between gene alleles in an inverse fashion, while the total length of the exon varies by less than 10%. The length of the BR 1 gene as a consequence varies between different alleles and was measured to be 40 to 44 kb. Subtracting the combined length of the four introns, 2648 bp, gives an mRNA length of 37 to 41 kb. This value agrees well with previous determinations of the cytoplasmic mRNA (Case & Daneholt, 1979). Almost the entire mRNA therefore consists of the homogeneous repeat structure found in exon 4; only about 480 bp at the 5' end (exons 1 to 3) and 600 bp at the 3' end (exon 5; Höög et al., 1986; Saiga et al., 1987) have a different sequence organization.

(a) Exon 4: a giant exon with variable internal repeat structure but conserved length
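As a check on the lengths quoted in this Discussion, the bookkeeping from gene length to mRNA length to exon 4 length can be written out explicitly. This is a minimal sketch: the constants are the figures given in the text (the two flanking values are approximate), and the function names are hypothetical.

```python
INTRONS_BP = 2648          # combined length of the four introns
FIVE_PRIME_EXONS_BP = 480  # approximate, exons 1 to 3
THREE_PRIME_EXON_BP = 600  # approximate, exon 5


def mrna_length_bp(gene_bp):
    """mRNA length: transcribed gene length minus the introns."""
    return gene_bp - INTRONS_BP


def exon4_length_bp(gene_bp):
    """Exon 4 length: mRNA length minus the flanking exons."""
    return mrna_length_bp(gene_bp) - FIVE_PRIME_EXONS_BP - THREE_PRIME_EXON_BP


for gene_kb in (40, 44):  # measured range of BR 1 gene lengths
    gene_bp = gene_kb * 1000
    print(gene_kb, mrna_length_bp(gene_bp), exon4_length_bp(gene_bp))
```

For the 40 and 44 kb alleles this gives mRNA lengths of 37352 and 41352 bp and exon 4 lengths of 36272 and 40272 bp, in line with the rounded values of 37 to 41 kb and 36 to 40 kb.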

Exon 4 in the BR 1 gene represents a 36 to 40 kb long exon while, in general, exons are shorter than 800 bp (Hawkins, 1988). Above 600 bp, exons are not likely to have evolved from random sequences (Senapathy, 1986) and considerably longer exons have therefore been suggested to have arisen by fusion of exons (Blackhart et al., 1986). Extremely long exons (e.g. see Kidd et al., 1986; Tsujimoto & Suzuki, 1979; Rothnagel & Steinert, 1990), including exon 4 in the BR 1 gene, as a rule consist of repeats. It is likely that such very long exons are the result of repeat array expansions within one exon rather than of merging several different exons. Defined repeat arrays, non-coding as well as coding, are common in eukaryotic genomes and characteristically change length due to loss or gain of repeats (e.g. see Jeffreys et al., 1988; Lyons et al., 1988; Hughes, 1990; Hourcade et al., 1990; Costa et al., 1991; Muskavitch & Hogness, 1982; Manning & Gage, 1980; Sorimachi et al., 1990). The generation of length polymorphism is thought to be a consequence of homologous but unequal cross-over (Smith, 1976) or slippage during DNA synthesis and/or repair (Levinson & Gutman, 1987). Evidence from minisatellites (Jeffreys et al., 1990; Wolff et al., 1989) and observations in human proline-rich protein genes (Lyons et al., 1988) suggest that these are mainly intra-allelic events. In the BR 1 gene, exon 4 does vary in length but by less than 10%. A similar magnitude in length variation has been seen between alleles of the silk fibroin gene (Manning & Gage, 1980). In contrast, minisatellite alleles may contain from less than a hundred to thousands of repeats (Jeffreys et al., 1988) and in some of the coding repeat arrays manyfold differences in the number of repeats are present, e.g. in the apopolysialoglycoprotein gene in trout (Sorimachi et al., 1990) and in the mouse filaggrin gene (Rothnagel & Steinert, 1990).
These differences in the degree of length polymorphism could either be due to gene-specific constraints in the mechanisms generating the length variants or to functional constraints at the protein level. The latter assumes that the BR 1 gene has a functionally optimal length. It is known that the BR 1 gene protein product is a main component in salivary gland protein fibers (Wellman & Case, 1989). As for collagen and silk fibroin, it appears important for fiber formation that the proteins have a considerable and relatively defined length. The apopolysialoglycoproteins and filaggrin proteins are cleaved between every repeat unit into functional protein monomers. Function may therefore not be as dependent on the number of repeats in one precursor protein as in the BR 1 gene-encoded protein, in which the monomers remain in a single protein molecule.

The length of the BR 1 gene is largely conserved in different alleles in spite of considerable reconstruction of exon 4. Short, about 3 kb, variations in length within one repeat array appear to be acceptable, while larger changes have resulted in a compensatory change in length in the other repeat array. This can perhaps best be explained as selection of unequal cross-over products having an optimal overall length. We then suggest that if unequal pairing and subsequent cross-over occurs within the β array in the gene to produce, for example, 8 and 22 kb long β arrays, the corresponding gene products would be too short and too long, respectively, and selected against. If, however, a second cross-over happens to occur at the same time in the γ array, this would result in both preservation of total optimal length and coupling of a short β array to a long γ array, and vice versa. The alternative, that the two cross-overs occur at two separate occasions, is possible but seems less likely, since this would require the temporary presence of very short and/or long genes.

(b) Sequence homogenization, homologous pairing and repeat hierarchies
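The compensatory double cross-over proposed at the end of section (a) can be made concrete with a toy calculation. The sketch below is not a model of the actual recombination mechanism: it only tracks array lengths, and the parental lengths (15 kb β, 25 kb γ), the 7 kb misalignment and the function name are hypothetical values chosen so that the β products match the 8 and 22 kb example in the text.

```python
def unequal_crossover(array_a_kb, array_b_kb, shift_kb):
    """Exchange between two misaligned copies of a tandem array.

    One product loses `shift_kb` of repeats and the other gains it, so
    the summed length of the two products equals that of the parents.
    """
    if not 0 <= shift_kb <= array_a_kb:
        raise ValueError("shift must lie between 0 and the first array length")
    return array_a_kb - shift_kb, array_b_kb + shift_kb


# Hypothetical parental alleles: identical genes, each with a 15 kb
# beta array and a 25 kb gamma array (illustrative values only).
BETA_PARENT_KB, GAMMA_PARENT_KB = 15, 25

beta_short, beta_long = unequal_crossover(BETA_PARENT_KB, BETA_PARENT_KB, 7)
gamma_short, gamma_long = unequal_crossover(GAMMA_PARENT_KB, GAMMA_PARENT_KB, 7)

# A single cross-over in the beta array alone changes the total length.
single = (beta_short + GAMMA_PARENT_KB, beta_long + GAMMA_PARENT_KB)

# A simultaneous cross-over in the gamma array can restore the total,
# coupling the short beta array to the long gamma array and vice versa.
double = (beta_short + gamma_long, beta_long + gamma_short)

print(single, double)
```

With these assumed values the single cross-over yields combined array lengths of 33 and 47 kb, while the double cross-over returns both products to the parental 40 kb.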

Within each of the two repeat arrays in exon 4 of the BR 1 gene, all repeat units are close to 100% identical. The sequence homogenization mechanisms operating must therefore be very efficient: two almost identical repeats may be more than 20 kb apart. Two results of the homogenization mechanisms are quite striking. Firstly, there is a distinct difference in the degree of homogenization at different levels in the hierarchic repeat structure. All C-regions and SR-regions, i.e. complete repeat units, are virtually identical, but at the next repeat level, the subrepeats within single SR-regions are different. Secondly, homogenization stops abruptly and does not extend from one array to the next, in spite of substantial sequence similarity between β and γ repeat units. One possible explanation to account for these observations is that one repeat unit has been selected and then rapidly amplified to expand into a new array (Sümegi et al., 1982). This appears to be the case in the involucrin gene in hominoid species, where many repeats have been added in a unidirectional fashion (Tseng & Green, 1988; Djian & Green, 1989). As several mechanisms of sequence exchange are thought to operate simultaneously in tandem repeat structures and lead to turnover of the repeat unit (Dover, 1987), an alternative explanation seems more likely. We suggest that the turnover unit in the repeat array is decided mainly by the probability of efficient pairing of sequences and that, therefore, the turnover units change or are conserved due to competing pairing efficiencies. Structural evidence for this has been seen in the proline-rich protein genes in man (Lyons et al., 1988). The ordered and hierarchic BR 1 gene structure may be an extreme example, where the alternating C-region and SR-region organization as well as the β and γ array structure restrict pairing to follow this organization. This hypothesis is supported by the observations that homologous recombination appears to be governed by perfect stretches of homology and that the recombination rate decreases drastically below 100 to 200 bp of shared sequence, and increases linearly with increasing length of homology stretches (for references, see Bollag et al., 1989). Furthermore, theoretical calculations (Smith, 1976) and observations in minisatellites (Jeffreys et al., 1990), intergenic rDNA (Cross & Dover, 1987) and in several genes (e.g. see Teumer & Green, 1989; Galinski et al., 1987; Prat, 1990; Goldsborough et al., 1988) argue that such remodelings within tandem arrays are common, often producing hierarchic repeat structures (e.g. see Costa et al., 1991; Tautz et al., 1987; Willard & Waye, 1987).

(c) Origin of the various repeat units

The conserved codons and the degree of similarity between the flanking and β and γ type of C-regions suggest that one has arisen from the other. The sequence similarities between the three types of subrepeats are also evident, but no simple way to convert one subrepeat into the other can be seen, as base-pair reduplications/deletions as well as substitutions are required. A comparison of the BR 1 gene in C. tentans and in the sibling species C. pallidivittatus is informative. In the latter, three types of repeat units, α, β and γ, are present in the same BR 1 locus. The SR-regions in the γ repeat array have diverged between the two species in a concerted fashion (Lendahl et al., 1987). In C. pallidivittatus, the subrepeats of these SR-regions are only seven codons long. Equally long subrepeats with almost identical sequences are present in the most 5' SR-region in the C. tentans BR 1 gene. This relationship, taken together with the gradual change from the 5' subrepeat to the β type of subrepeat in the same SR-region, shows that the 5', β and γ subrepeats can change and may be interconvertible. The β repeat unit type is very similar in the two species but forms only a short, 3 to 4 kb array in C. pallidivittatus. The α type is exclusively present in C. pallidivittatus in about a 15 kb array (Galler et al., 1984). It has not been shown to reside in the same gene as the β and γ repeat unit types, although this is likely. The α repeat array must then have been either eliminated from the C. tentans BR 1 gene or introduced into the C. pallidivittatus BR 1 gene since separation of the two sibling species (Lendahl & Wieslander, 1985). Extensive and fast remodeling then occurred in the BR 1 gene repeat arrays, resulting in replacement of one type of repeat with another. Our analysis of the BR 1 gene shows that drastic changes in the number of one type of repeat unit do take place, most likely as a consequence of unequal cross-overs.

We thank Professor E. R. Schmidt for advice on in situ hybridization, Dr O. Johansson for help with immunofluorescence microscopy and H. Mehlin for help with scanning densitometry. This work was supported by the Swedish Natural Science Research Council, the Swedish Medical Research Council, Magnus Bergvalls Stiftelse, Torsten and Ragnar Söderbergs Stiftelser and Karolinska Institutet.

The gene sequence of C. tentans BR 1 has been deposited in the EMBL Data Library under accession number X64322.

References

Bäumlein, H., Pustell, J., Wobus, U., Case, S. T. & Kafatos, F. C. (1986). The 3' ends of two genes in the Balbiani ring c locus of Chironomus thummi. J. Mol. Evol. 24, 72-82.

Blackhart, B. D., Ludwig, E. M., Pierotti, V. R., Caiati, L., Onasch, M. A., Wallis, S. C., Powell, L., Pease, R., Knott, T. J., Chu, M.-L., Mahley, R. W., Scott, J., McCarthy, B. J. & Levy-Wilson, B. (1986). Structure of the human apolipoprotein B gene. J. Biol. Chem. 261, 15364-15367.

Bollag, R. J., Waldman, A. S. & Liskay, R. M. (1989). Homologous recombination in mammalian cells. Annu. Rev. Genet. 23, 199-225.

Botella, L., Grond, C., Saiga, H. & Edström, J.-E. (1988). Nuclear localisation of a DNA-binding C-terminal domain from Balbiani ring coded secretory protein. EMBO J. 7, 3881-3888.

Case, S. T. & Byers, M. (1983). Repeated nucleotide sequence arrays in Balbiani ring 1 of Chironomus tentans contain internally nonrepeating and subrepeating elements. J. Biol. Chem. 258, 7793-7799.

Case, S. T. & Daneholt, B. (1979). The size of the transcription unit in Balbiani ring 2 of Chironomus tentans as derived from analysis of the primary transcript and 75 S RNA. J. Mol. Biol. 124, 223-241.

Costa, R., Peixoto, A., Thackeray, J. R., Dalgleish, R. & Kyriacou, C. P. (1991). Length polymorphisms in the threonine-glycine-encoding repeat region of the period gene in Drosophila. J. Mol. Evol. 32, 238-246.

Cross, N. C. P. & Dover, G. A. (1987). A novel arrangement of sequence elements surrounding the rDNA promoter and its spacer duplications in tsetse species. J. Mol. Biol. 195, 63-74.

Djian, P. & Green, H. (1989). Vectorial expansion of the involucrin gene and the relatedness of the hominoids. Proc. Nat. Acad. Sci., U.S.A. 86, 8447-8451.

Dover, G. A. (1987). DNA turnover and the molecular clock. J. Mol. Evol. 26, 47-58.

Dretzen, G., Bellard, M., Sassone-Corsi, P. & Chambon, P. (1981). A reliable method for recovery of DNA fragments from agarose and acrylamide gels. Anal. Biochem. 112, 295-298.

Edström, J.-E., Sierakowska, H. & Burvall, K. (1982). Dependence of Balbiani ring induction in Chironomus salivary glands on inorganic phosphate. Develop. Biol. 91, 131-137.

Frischauf, A. M., Lehrach, H., Poustka, A.-M. & Murray, N. (1983). Lambda replacement vectors carrying polylinker sequences. J. Mol. Biol. 170, 827-842.

Galinski, M. R., Arnot, D. E., Cochrane, A. H., Barnwell, J. W., Nussenzweig, R. S. & Enea, V. (1987). The circumsporozoite gene of the Plasmodium cynomolgi complex. Cell, 48, 311-319.

Galler, R., Rydlander, L., Riedel, N., Kluding, H. & Edström, J.-E. (1984). Balbiani ring induction in phosphate metabolism. Proc. Nat. Acad. Sci., U.S.A. 81, 1448-1452.

Geliebter, J., Zeff, R. A., Melvold, R. W. & Nathenson, S. G. (1986). Mitotic recombination in germ cells generated two major histocompatibility complex mutant genes shown to be identical by RNA sequence analysis: Kbm9 and Kbm6. Proc. Nat. Acad. Sci., U.S.A. 83, 3371-3375.

Goldsborough, A. P., Robert, L., Schnick, D. & Flavell, R. B. (1988). In Proceedings of the VII International Wheat Genetics Symposium, pp. 727-733, Institute for Plant Science Research, Cambridge.

Grond, C., Saiga, H. & Edström, J.-E. (1987). In Results and Problems in Cell Differentiation (Hennig, W., ed.), vol. 14, pp. 69-80, Springer-Verlag, Berlin.

Grossbach, U. (1977). In Results and Problems in Cell Differentiation (Beermann, W., ed.), vol. 8, pp. 147-196, Springer-Verlag, Berlin.

Gross-Bellard, M., Oudet, P. & Chambon, P. (1973). Isolation of high molecular weight DNA from mammalian cells. Eur. J. Biochem. 36, 32-38.

Hawkins, J. D. (1988). A survey on intron and exon length. Nucl. Acids Res. 16, 9893-9908.

Henikoff, S. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene, 28, 351-359.

Höög, C., Engberg, C. & Wieslander, L. (1986). A BR 1 gene in Chironomus tentans has a composite structure: a large repetitive core block is separated from a short unrelated 3' terminal domain by a small intron. Nucl. Acids Res. 14, 703-719.

Höög, C., Wieslander, L. & Daneholt, B. (1988). Terminal repeats in long repeat arrays are likely to reflect the early evolution of Balbiani ring genes. J. Mol. Biol. 200, 655-664.

Hourcade, D., Miesner, D. R., Bee, C., Zeldes, W. & Atkinson, J. P. (1990). Duplication and divergence of the amino terminal coding region of the complement receptor 1 (CR1) gene. J. Biol. Chem. 265, 974-980.

Hughes, A. L. (1990). Involucrin genes of primates: rapid evolution of a repeating DNA segment. Trends Ecol. Evol. 5, 2-4.

Jeffreys, A. J., Royle, N. J., Wilson, V. & Wong, Z. (1988). Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature (London), 332, 278-281.

Jeffreys, A. J., Neumann, R. & Wilson, V. (1990). Repeat unit sequence variation in minisatellites: a novel source of DNA polymorphism for studying variation and mutation by single molecule analysis. Cell, 60, 473-485.

Kao, W.-Y. & Case, S. T. (1985). A novel giant secretion polypeptide in Chironomus salivary glands: implications for another Balbiani ring gene. J. Cell Biol. 101, 1044-1051.

Kidd, S., Kelley, M. R. & Young, M. W. (1986). Sequence of the Notch locus of Drosophila melanogaster: relationship of the encoded protein to mammalian clotting and growth factors. Mol. Cell. Biol. 6, 3094-3108.

Lehrach, H., Diamond, D., Wozney, J. M. & Boedtker, H. (1977). RNA molecular weight determinations by gel electrophoresis under denaturing conditions, a critical reexamination. Biochemistry, 16, 4743-4751.

Lendahl, U. & Wieslander, L. (1983). Balbiani ring 6 gene in Chironomus tentans: a diverged member of the Balbiani ring gene family. Cell, 36, 1027-1034.

Lendahl, U. & Wieslander, L. (1985). Abrupt evolutionary change in the Balbiani ring gene family in two sibling species of Chironomus. J. Mol. Evol. 22, 63-68.

Lendahl, U., Saiga, H., Höög, C., Edström, J.-E. & Wieslander, L. (1987). Rapid and concerted evolution of repeat units in a Balbiani ring gene. Genetics, 117, 43-49.

Levinson, G. & Gutman, G. A. (1987). High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucl. Acids Res. 15, 5323-5338.

Lyons, K. M., Stein, J. H. & Smithies, O. (1988). Length polymorphisms in human proline-rich protein genes generated by intragenic unequal crossing over. Genetics, 120, 267-278.

Manning, R. F. & Gage, L. P. (1980). Internal structure of the silk fibroin gene of Bombyx mori. II. Remarkable polymorphism of the organization of crystalline and amorphous coding sequences. J. Biol. Chem. 255, 9751-9757.

Maxam, A. M. & Gilbert, W. (1977). A new method for sequencing DNA. Proc. Nat. Acad. Sci., U.S.A. 74, 560-564.

Meyer, B., Mähr, R., Eppenberger, H. M. & Lezzi, M. (1983). The activity of Balbiani rings 1 and 2 in salivary glands of Chironomus tentans larvae under different modes of development and after pilocarpine treatment. Develop. Biol. 98, 265-277.

Muskavitch, M. A. T. & Hogness, D. S. (1982). An expandable gene that encodes a Drosophila glue protein is not expressed in variants lacking remote upstream sequences. Cell, 29, 1041-1051.

Paulsson, G., Lendahl, U., Galli, J., Ericsson, C. & Wieslander, L. (1990). The Balbiani ring 3 gene in Chironomus tentans has a diverged repetitive structure split by many introns. J. Mol. Biol. 211, 331-349.

Philips, M., Djian, P. & Green, H. (1990). The involucrin gene of the galago. J. Biol. Chem. 265, 7804-7807.

Prat, A. (1990). Conserved sequences flank variable tandem repeats in two alleles of the G surface protein of Paramecium primaurelia. J. Mol. Biol. 211, 521-535.

Pustell, J., Kafatos, F. C., Wobus, U. & Bäumlein, H. (1984). Balbiani ring DNA: sequence comparisons and evolutionary history of a family of hierarchical repetitive protein encoding genes. J. Mol. Evol. 20, 281-295.

Rothnagel, J. A. & Steinert, P. M. (1990). The structure of the gene for mouse filaggrin and a comparison of the repeating unit. J. Biol. Chem. 265, 1862-1865.

Rydlander, L. & Edström, J.-E. (1980). Sequences translated by Balbiani ring 75 S RNA in vitro are present in giant secretory protein from Chironomus tentans. Chromosoma, 81, 101-113.

Saiga, H., Grond, C., Schmidt, E. R. & Edström, J.-E. (1987). Evolutionary conservation of the 3' ends of members of a family of giant secretory protein genes in Chironomus pallidivittatus. J. Mol. Evol. 25, 20-28.

Saiga, H., Botella, L. & Edström, J.-E. (1988). Subrepeats within the BR 1 β repeat unit in Chironomus pallidivittatus can be classified into different types depending on codon usage. J. Mol. Evol. 27, 298-302.

Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, 2nd edit., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Senapathy, P. (1986). Origin of eucaryotic introns: a hypothesis based on codon distribution statistics in genes and its implications. Proc. Nat. Acad. Sci., U.S.A. 83, 2133-2137.

Smith, G. P. (1976). Evolution of repeated DNA sequences by unequal crossover. Science, 191, 528-535.

Sorimachi, H., Emori, Y., Kawasaki, H., Suzuki, K. & Inoue, Y. (1990). Organization and primary sequence of multiple genes coding for the apopolysialoglycoproteins of rainbow trout. J. Mol. Biol. 211, 35-48.

Studencki, A. B. & Wallace, R. B. (1984). Allele specific hybridization using oligonucleotide probes of very high specific activity: discrimination of the human βA and βS globin genes. DNA, 3, 7-15.

Sümegi, J., Wieslander, L. & Daneholt, B. (1982). A hierarchic arrangement of the repetitive sequences in the Balbiani ring 2 gene of Chironomus tentans. Cell, 30, 579-587.

Tautz, D., Trick, M. & Dover, G. A. (1986). Cryptic simplicity in DNA is a major source of genetic variation. Nature (London), 322, 652-656.

Tautz, D., Tautz, C., Webb, D. A. & Dover, G. A. (1987). Evolutionary divergence of promoters and spacers on the rDNA family of four Drosophila species: implications for molecular coevolution in multigene families. J. Mol. Biol. 195, 525-542.

Teumer, J. & Green, H. (1989). Divergent evolution of part of the involucrin gene: unique intragenic duplications in the gorilla and human. Proc. Nat. Acad. Sci., U.S.A. 86, 1283-1286.

Tseng, H. & Green, H. (1988). Remodelling of the involucrin gene during primate evolution. Cell, 54, 491-496.

Tsujimoto, Y. & Suzuki, Y. (1979). Structural analysis of the fibroin gene at the 5' end and its surrounding region. Cell, 16, 425-436.

Wallace, R. B. & Miyada, C. G. (1987). Oligonucleotide probes for the screening of recombinant DNA libraries. In Methods in Enzymology (Berger, S. L. & Kimmel, A. R., eds), vol. 152, pp. 432-442, Academic Press Inc., San Diego.

Wellman, S. E. & Case, S. T. (1989). Disassembly and reassembly in vitro of complexes of secretory proteins from Chironomus tentans salivary glands. J. Biol. Chem. 264, 10878-10883.

Wertman, K. F., Wyman, A. R. & Botstein, D. (1986). Host/vector interactions which affect the viability of recombinant phage lambda clones. Gene, 49, 253-262.

Wieslander, L. (1979). A simple method to recover intact high molecular weight RNA and DNA after electrophoretic separation in low gelling temperature agarose gels. Anal. Biochem. 98, 305-309.

Wieslander, L. & Lendahl, U. (1983). The Balbiani ring 2 gene in Chironomus tentans is built from two types of repeat units with a common evolutionary origin. EMBO J. 2, 1169-1175.

Wieslander, L., Sümegi, J. & Daneholt, B. (1982). Evidence for a common ancestor sequence for the Balbiani ring 1 and Balbiani ring 2 genes in Chironomus tentans. Proc. Nat. Acad. Sci., U.S.A. 79, 6956-6960.

Wieslander, L., Höög, C., Höög, J.-O., Jörnvall, H., Lendahl, U. & Daneholt, B. (1984). Conserved and nonconserved structures in the secretory proteins encoded in the Balbiani ring genes of Chironomus tentans. J. Mol. Evol. 20, 304-312.

Willard, H. F. & Waye, J. S. (1987). Hierarchic order in chromosome-specific human alpha satellite DNA. Trends Genet. 3, 192-198.

Wolff, R. K., Plaetke, R., Jeffreys, A. J. & White, R. (1989). Unequal crossing over between homologous chromosomes is not the major mechanism involved in the generation of new alleles at VNTR loci. Genomics, 5, 382-384.

Wrange, Ö., Eriksson, P. & Perlmann, T. (1989). The purified and activated glucocorticoid receptor is a homodimer. J. Biol. Chem. 264, 5253-5259.

Wyss, C. (1982). Chironomus tentans epithelial cell lines sensitive to ecdysteroids, juvenile hormone, insulin and heat shock. Expt. Cell Res. 139, 297-307.

Edited by M. Yaniv