Dispersal process associated with the L1 family of interspersed repetitive DNA sequences


J. Mol. Biol. (1984) 178, 795-813

Charles F. Voliva, Sandra L. Martin, Clyde A. Hutchison III and Marshall H. Edgell

Curriculum in Genetics and Department of Microbiology and Immunology, The University of North Carolina, Chapel Hill, N.C. 27514, U.S.A.

(Received 30 March 1983)

We have determined the complete nucleotide sequence for five members of the L1Md repetitive family from the beta-globin gene region of the BALB/c mouse. The five repeats are different lengths, each terminating at the 5' end at different points with respect to one another. We have analyzed the nucleotides around the endpoints of the five repeats for clues as to the mechanisms involved with the dispersal and 5' truncation of this repeat family. Each L1 member is flanked by a pair of short direct repeats. Since these direct repeats differ in length and sequence in each of the five cases, the dispersal mechanism does not involve a sequence-targeted process. The sequence at the 3' end is conserved and its organization resembles the 3' end of a polyadenylated RNA, suggesting that transcripts of the repeat are involved in the dispersal process either directly or as intermediates in the generation of complementary DNA copies of the sequence. One of the L1 repeats is a recent insertion, since it is found in the [Hbb]^d chromosome, but not in the [Hbb]^s chromosome. This suggests a dispersal process that has been active as recently as 4 million years ago.

1. Introduction

It has become apparent that the eucaryotic genome is a very dynamic structure (for reviews, see Hood et al., 1983; Arnheim, 1983). Contributions to this genomic flux (Dover, 1982) arise from the continued dispersal of members of interspersed repetitive sequence families and the genetic exchange processes that are involved in the reduction of sequence diversity in such families. Since it is difficult to study these processes directly, one must, in general, examine the features of the sequence elements involved for clues as to the processes acting upon them. For example, this approach has been useful in that it has led to a recognition of the similarity in structure among transposable elements in different organisms. It is reasonable to assume that all of the various transposons with similar structure were dispersed by similar mechanisms.

There are other classes of repetitive elements that do not possess the sequence features characteristic of transposons but are also dispersed throughout the genome. We assume that these repeats are dispersed by a mechanism different


from that of transposons. An example of such a repetitive element is the Alu repeat family. This repetitive family was initially described in Chinese hamster ovary cells (Jelinek, 1978) and later shown to be homologous (Haynes et al., 1981) to repetitive sequences in monkey (Dhruva et al., 1980), human (Deininger et al., 1981), mouse (the B1 repeat; Krayev et al., 1980), and chicken (the CR1 repeat; Stumph et al., 1981). The Alu-type repeat displays several features that are conserved in each of the genomes in which it has been described. Each of the repeats contains within it an RNA polymerase III promoter, and one end of the repeat resembles, in its structure, a polyadenylated RNA (for a review, see Jelinek & Schmid, 1982). These structures are featured in a model that describes the dispersion of these sequences (Jagadeeswaran et al., 1981). In that model, an existing Alu element is transcribed by RNA polymerase III. The transcript primes itself for reverse transcription into a cDNA†, which is then integrated at the new location. Some of the structural features important for this model are shared with DNA sequences homologous to snRNAs, and a similar model has been proposed for the dispersion of these sequences (Van Arsdell et al., 1981). These examples, together with the discovery of numerous pseudogenes that resemble processed mRNA transcripts (Vanin et al., 1980; Gwo-Shu Lee et al., 1983), suggest that insertion of DNA copies of RNA transcripts is a common mechanism for the integration of new sequences into the genome (Sharp, 1983).

The Alu elements, snRNA pseudogenes and retroviral proviruses, as well as processed pseudogenes, have another feature in common. They are all flanked at the ends by small direct duplications of the sequences into which they were integrated. This feature is shared with sequence elements that resemble procaryotic transposons (see Calos & Miller, 1980) and models have been derived to account for the short repeats (Galas & Chandler, 1981; Harshey & Bukhari, 1981). Presumably, the integration mechanisms associated with each of these families share a common step.

There is another highly repetitive sequence family dispersed in the mouse genome that shares some of these sequence features. The L1Md (for LINES (Singer, 1982) one in Mus domesticus) repeat family was initially described as many separate repetitive elements (Meunier-Rotival et al., 1982; Gebhard et al., 1982; Brown & Dover, 1980; Fanning, 1982), which have been linked subsequently into one large element at least 7 kb long (Fanning, 1983; Voliva et al., 1983). This repeat, like those described above, resembles a polyadenylated RNA in that there is a conserved A-A-T-A-A-A sequence followed by many adenine residues at one end (Fanning, 1983; Gebhard et al., 1982; Wilson & Storb, 1983; Voliva et al., 1983), which suggests that this repeat also disperses via an RNA intermediate. Sequences homologous to L1Md are found on polyribosomes (Soriano et al., 1983) and are transcribed by RNA polymerase II (Shafit-Zagardo et al., 1983). Individual members of L1Md, like other sequences that transpose or disperse, are flanked on each end by short direct repeats (Wilson & Storb, 1983). However, the L1 repeats differ from the other described repetitive families in that, while the 3' end (which resembles the polyadenylated RNA) is conserved, the opposite end is usually truncated (Voliva et al., 1983) at points that appear random with respect to one another. Consequently, the length of each repeat member varies.

We have determined the complete nucleotide sequence of five members of the L1Md family that are found in the beta-globin gene region of the BALB/c mouse. The sequence data are used here to examine different mechanisms that could give rise to the dispersal and the truncated form of the members of this repeat family.

† Abbreviations used: cDNA, complementary DNA; snRNA, small nuclear RNA; kb, 10^3 bases or base-pairs; bp, base-pair(s); RF DNA, replicative form DNA.

2. Materials and Methods

(a) Clones and DNA preparation

Restriction fragments containing L1Md homologous sequences were subcloned from the appropriate Charon 4A clone into M13mp2 (Gronenborn & Messing, 1978), M13mp8 or M13mp9 (Messing & Vieira, 1982; Vieira & Messing, 1982). The clones constructed include EcoRI fragments X, U, W and V (all cloned into M13mp2), the 250 bp EcoRI-BamHI fragment from EcoRI fragment 4.5 (cloned into both M13mp8 and M13mp9), and the 500 and 900 bp BamHI fragments from EcoRI fragment 4.5 (both cloned into M13mp9). These fragments are defined in Fig. 1 (see also Jahn et al., 1980).

Clones were prepared by ligating restriction fragments, purified by polyacrylamide gel electrophoresis, into M13 RF DNA that had been cleaved with EcoRI. Typically, the reaction included 0.5 µg of insert DNA, 0.2 µg of cleaved M13 RF DNA, and 1.5 to 3 units of phage T4 DNA ligase (Bethesda Research Laboratories) in a total volume of 20 µl. The reaction was buffered according to the ligase manufacturer's specifications, and was incubated from 4 to 12 h at 20°C. The ligase mixture was transfected into competent Escherichia coli strain JM101 according to the protocol of Dagert & Ehrlich (1979) and plated in the presence of 300 µM-isopropyl-β-D-thiogalactopyranoside (from Sigma) and 0.03% (w/v) 5-bromo-4-chloro-3-indolyl-β-D-galactoside (from Bachem, Inc. or Sigma).

Ten colorless plaques from each transfection were cored and individually suspended in 0.3 ml of fresh overnight E. coli JM101 grown on glucose minimal medium (Miller, 1972) in 50-ml polypropylene centrifuge tubes (Corning). To this was added 40 ml of YT broth (Miller, 1972) and the cultures were incubated with agitation at 37°C for 10 to 12 h. Phage DNA was prepared from each culture. The culture was clarified by centrifugation at 4000 revs/min in a Sorvall HS-4 rotor, and 5 ml of the supernatant was saved. To the remainder of the supernatant, polyethylene glycol 6000 (Fisher) was added to 10% (w/v) and NaCl to 0.5 M. The liquid was chilled overnight at 4°C or for 1 to 2 h in ice-water and then centrifuged for 20 min at 4000 revs/min in an HS-4 rotor. The pellet was resuspended in 200 µl of Tris/EDTA (1 mM-Tris·HCl (pH 8.1), 0.1 mM-EDTA) and extracted once with chloroform, 3 times with saturated phenol, and twice with ether. Sodium acetate was added to 0.3 M followed by 750 µl of ethanol. After being chilled in a solid CO2/ethanol bath for 5 min, the phage DNA precipitate was collected by centrifugation in an Eppendorf Microfuge. The pellets were resuspended in 50 µl of Tris/EDTA.

Each phage DNA preparation was tested for size of the insert by electrophoresis beside marker DNAs (including φX174 and M13mp2 phage DNA) in 0.8% (w/v) agarose gels. The orientation of the insert in each clone was tested using an annealing assay: 3 µl of one isolate was mixed with 3 µl of each of the other isolates and 1 µl of H buffer (described below). After boiling for 1 min, the mixture was plunged into ice-water, then incubated for 1 h at 65°C. The products of this annealing reaction were analyzed by electrophoresis through 1% agarose in E buffer (40 mM-Tris, 20 mM-sodium acetate, 2 mM-EDTA, 18 mM-NaCl, pH 8.0) for 500 Vh. In this assay, 2 phage DNAs that have the insert in the same orientation migrate as one mixed with itself, whereas inserts having the opposite orientation show spurious bands that migrate more slowly in the gel (W. Barnes, personal communication). Two clones, each of the appropriate size but opposite orientation, were selected for sequence analysis.


Large preparations of phage DNAs were prepared by adding 9 ml of fresh overnight E. coli JM101 (grown on minimal glucose medium) and 1 ml of the phage sample (saved from the 40-ml culture grown previously) into 1200 ml of YT broth in low-form flasks. The culture was incubated at 37°C for 8 to 9 h with agitation. The culture was clarified by centrifugation in a Sorvall GS-3 rotor at 8000 revs/min for 15 min. The phage were precipitated with polyethylene glycol and NaCl as described above, and pelleted by centrifugation at 8000 revs/min in the GS-3 rotor. The pellet was resuspended in 10 ml of Tris/EDTA. The phage suspension was layered onto a step gradient in SW27.1 ultracentrifuge tubes. The gradient was composed of 3 ml of CsCl in Tris/EDTA at a density of 1.7 g/ml, 3 ml of CsCl in Tris/EDTA at a density of 1.5 g/ml, and 6 ml of CsCl in Tris/EDTA at a density of 1.4 g/ml. The step gradients were centrifuged at 23,000 revs/min for 3 h. Phage bands were collected, dialyzed against 2 l of Tris/EDTA (4 changes), then extracted with chloroform, phenol and ether, and precipitated with ethanol as described above. The phage DNA was resuspended in Tris/EDTA at a final concentration of 0.2 µg/ml. Typical yields were 0.5 to 1 mg of phage DNA per liter of culture.

(b) Sequencing

The DNA sequence was determined using a combination of the chain termination method of Sanger et al. (1980) and the chemical degradation method of Maxam & Gilbert (1980). The sequencing strategy of each of the fragments is shown in Fig. 1. Over 90% of the repeat sequence was determined on both strands. Sequences flanking the repeat were usually determined in only one orientation.

Some double-stranded DNA fragments to be used for priming sequence reactions were generated from phage DNAs. Equimolar amounts of 2 phage DNAs that contained the insert in opposite orientations were mixed in the presence of H buffer (6.6 mM-Tris·HCl (pH 7.5), 6.6 mM-MgCl2, 1 mM-dithiothreitol, 50 mM-NaCl). The mixture was boiled for 3 min, plunged into ice-water for 1 min, and then incubated for 1 h at 65°C. The hybrid molecules were digested with the appropriate enzyme and the fragments were fractionated on 40 cm polyacrylamide gels in E buffer. Single-stranded debris migrates slowly under these conditions, while the hybrid double-stranded fragments migrate at their characteristic rate. This method is not appropriate for use with fragments with large segments of internal palindromy (e.g. the U fragment) or with enzymes that cleave single-stranded DNA (e.g. HaeIII).

(c) Computer analysis

Homology search analyses were done using a diagonal-traverse homology search algorithm written and described by White et al. (1984).
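The general idea of a homology search along diagonals can be illustrated with a short sketch. This is not the program of White et al. (1984), whose details are not given here; it is a minimal illustration that assumes a fixed window and an identity threshold, and the function name, parameters and example sequences are all hypothetical.

```python
def diagonal_matches(seq_a, seq_b, window=8, min_identity=0.75):
    """Walk every diagonal of the seq_a x seq_b comparison matrix and
    report windows whose fraction of identical bases meets min_identity.

    Returns a list of (start_in_a, start_in_b, identity) tuples.
    Window size and threshold are illustrative assumptions.
    """
    hits = []
    len_a, len_b = len(seq_a), len(seq_b)
    # A diagonal is identified by offset = start_in_b - start_in_a.
    for offset in range(-(len_a - window), len_b - window + 1):
        i = max(0, -offset)   # starting position in seq_a on this diagonal
        j = i + offset        # matching starting position in seq_b
        while i + window <= len_a and j + window <= len_b:
            same = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            identity = same / window
            if identity >= min_identity:
                hits.append((i, j, identity))
            i += 1
            j += 1
    return hits


# Hypothetical toy sequences sharing the hexamer AATAAA.
example_hits = diagonal_matches("GGGAATAAAGGG", "TTAATAAATT", window=6, min_identity=1.0)
print(example_hits)
```

Recomputing each window from scratch keeps the sketch short; a practical implementation would update the match count incrementally as the window slides along a diagonal.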

FIG. 1. The location of and sequencing strategy for the 5 L1Md elements. (a) The location in the beta-globin gene region of each of the L1Md elements is shown. The arrow indicates the direction of the 5' truncated end (see the text). EcoRI sites are marked as + and globin homologous sequences are indicated. The sequencing strategy for (b) the W, 0.07, C and X EcoRI fragments, containing L1Md-1, L1Md-2 and L1Md-8, (c) a portion of the V fragment, containing L1Md-3, and (d) the 0.9 and part of the 4.5 fragment, containing L1Md-4, is shown. Only the enzyme sites used in the sequencing are shown. Brackets around the name of an enzyme (e.g. [HgiAI]) indicate that the site does not exist in that particular fragment. Homology to those primers was high enough to allow the use of the primer, provided the sequencing reactions were not digested with the enzyme. Dideoxynucleotide chain terminator reactions are indicated by H, while chemical degradation sequencing is denoted by r. A bar flanked by asterisks (*-*) indicates sequences taken from other publications. The sequence for the V fragment (see (c)) from the EcoRI site to the BamHI site is taken from Phillips et al. (1984), while the sequence for the 0.9 kb EcoRI fragment (see (d)) is presented by Hutchison et al. (1984).

3. Results

We have determined the complete nucleotide sequence of five members of the L1Md repetitive family found in the BALB/c mouse beta-globin locus (Fig. 2). The L1Md elements are different in length. At the 3' end, each contains the sequence 5' A-A-T-A-A-A 3' followed by an adenine-rich region, and each extends in the 5' direction for different distances. The longest element is L1Md-4, which is 2095 bp long, followed by L1Md-1 (1806 bp), L1Md-2 (1433 bp) and L1Md-3 (633 bp). The shortest element is L1Md-8, which is only 180 bp long. The L1Md-8 element is a recently discovered member of this repetitive family in the globin gene region. We have reported that there were seven elements in the gene region (Voliva et al., 1983), but the identity of L1Md-8, inserted into the L1Md-2 element, was obscured in that hybridization analysis. The nature of the insertion became evident only upon DNA sequence analysis of that region.

The repetitive elements are flanked on each end by direct repeats of varying lengths (Fig. 3) that are generated during the integration of the L1Md element. The evidence for this is derived from the organization of L1Md-8, which is inserted into L1Md-2. Since the direct repeat that flanks the L1Md-8 sequence is found only once in the canonical L1Md sequence, the duplication must have occurred during the integration process. The direct repeat in each L1 element is flush against the end of homology at the 5' end of the repeat, and follows the adenine-rich region at the 3' end (Fig. 2). The length of the direct repeat pair varies with each L1Md element. The longest is 15 bp and brackets L1Md-1, while the shortest is 7 bp and brackets L1Md-8 (Fig. 3).

The globin locus has expanded over evolutionary time by a series of duplications from a single gene to one that contains seven globin genes in the mouse (Hardies et al., 1984). The current arrangement of globin genes and repeats (Fig. 1) is consistent with the elemental unit being duplicated as a globin gene flanked by an L1Md repeat. This model is ruled out by our data, since each L1 element is flanked by a different pair of direct repeats, arguing that each insertion was a separate event.

Each L1Md element is flanked on the 3' end by an adenine-rich region. The length of the region varies between repeats (Fig. 2). The shortest (measured from the end of A-A-T-A-A-A to the beginning of the direct repeat) is only 6 bp (flanking L1Md-8), while the longest is 59 bp (flanking L1Md-2). Since the adenine-rich regions immediately follow the sequence 5' A-A-T-A-A-A 3' (the putative signal for polyadenylation), the 3' end of the elements resembles the 3' end of a polyadenylated RNA. This structure has been noted in members of the L1 repeat family found in other locations in the genome (Gebhard et al., 1982; Fanning, 1982, 1983; Lueders, 1982; Wilson & Storb, 1983), in members of the homologous simian L1 (or KpnI) repetitive family (Thayer & Singer, 1983; Singer et al., 1983), as well as in cDNAs homologous to the simian L1 repeat (DiGiovanni et al., 1983). The organization of the 3' end suggests that L1Md, like many other sequences (Sharp, 1983), disperses via an RNA intermediate.

The L1 elements may have different lengths if the RNA transcripts, through which they are dispersed, are of varying lengths. This may happen if, within

FIG. 2 (cont.). Aligned nucleotide sequences of L1Md-1, L1Md-2, L1Md-3, L1Md-4 and L1Md-8, shown with the consensus sequence (positions 650 to 2100).

AAAAATCATCTTTGG---------L1Md-1 (1806 bp)---------AAAAATCATCTTTGG
AATTGCCAAATTCA---------L1Md-2 (1453 bp)---------AATTGCCAAATTCA
CATAGGATTTG---------L1Md-3 ( 633 bp)---------CATAGGATTTG
TTACTGGAGC---------L1Md-4 (2095 bp)---------TTACTGGAGC
TAAGGAT---------L1Md-8 ( 180 bp)---------TAAGGAT

FIG. 3. The direct repeats that flank the L1Md elements. The direct repeats that flank the 5 sequenced L1Md elements from the beta-globin gene region are shown. Indicated between the short direct repeats is the name of the particular element and the size, in base-pairs, of the insertion.
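The argument that each insertion was a separate, untargeted event rests on the flanking repeats differing in both length and sequence. A minimal sketch of that comparison follows, assuming the repeat sequences as transcribed from Fig. 3; the dictionary and function names are hypothetical and are not part of the original analysis.

```python
# Flanking direct repeats as transcribed from Fig. 3 (left copy, right copy).
# The transcription itself is an assumption of this sketch.
FLANKING_REPEATS = {
    "L1Md-1": ("AAAAATCATCTTTGG", "AAAAATCATCTTTGG"),
    "L1Md-2": ("AATTGCCAAATTCA", "AATTGCCAAATTCA"),
    "L1Md-3": ("CATAGGATTTG", "CATAGGATTTG"),
    "L1Md-4": ("TTACTGGAGC", "TTACTGGAGC"),
    "L1Md-8": ("TAAGGAT", "TAAGGAT"),
}


def summarize_repeats(repeats):
    """Return (name, repeat length, copies identical?) for each element,
    ordered from the longest repeat to the shortest."""
    rows = [
        (name, len(left), left == right)
        for name, (left, right) in repeats.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


summary = summarize_repeats(FLANKING_REPEATS)
for name, length, identical in summary:
    print(f"{name}: {length} bp, copies identical: {identical}")

# No two elements share the same target-site duplication.
distinct_targets = len({left for left, _ in FLANKING_REPEATS.values()})
print("distinct target sequences:", distinct_targets)
```

The lengths recovered this way range from 15 bp (L1Md-1) down to 7 bp (L1Md-8), matching the values given in the text.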

existing copies of the L1 element, sequences capable of acting as RNA polymerase II promoters arise through mutation (the repeat is reportedly transcribed by RNA polymerase II; Shafit-Zagardo et al., 1983). This would predict that homology in the consensus sequence to the putative RNA polymerase II promoter (Breathnach & Chambon, 1981) should lie 5' to the endpoint of each of the individual L1Md members. The location of each of these potential transcription promoters is plotted in Figure 4. Also included in the Figure are the endpoints of each of the five repetitive elements as well as the endpoints of four members of the family from the immunoglobulin gene region, R3 and R5 (Gebhard et al., 1982), and Cκ and Vκ167 (Wilson & Storb, 1983).

Promoter-like sequences are not consistently found in a position in the consensus sequence that could account for the truncation of the L1 repeats analyzed. Sequences with some homology to the RNA polymerase II promoter are found 5' to the endpoint of five of the elements (L1Md-1, L1Md-2, Cκ, L1Md-3 and R3; see Fig. 4). However, the best matches to both the T-A-A-T-A-A-A-A and C-C-A-A-T sequences that we can find upstream of the L1 repeats do not represent good matches to the canonical promoter. In addition, the distances between the endpoint of the element and the T-A-A-T-A-A-A-A sequence are (except in the case of L1Md-1) shorter than would be expected. There are no sequences with homology to the RNA polymerase II promoter in the 50 bp that precede the other L1Md elements.

The sequences around the 5' end of each of the elements have been searched for direct repeats and palindromes. Preceding the 5' endpoint of L1Md-2 in the consensus sequence, a short palindrome was found with the sequence 5' Y-(G or .)-C-A-A-A-G-Y-T-T-C-T-G-C-A 3' (beginning at position 650 in Fig. 2). There were no such structures found at the 5' truncated end of any of the other L1Md elements examined. Thus, they do not appear to be determinative in the 5' truncation process.

The divergence of the adenine-rich region at the 3' end of the repeat elements from a pure adenine tract should reflect the time since the element was inserted. It is assumed that, since the elements appear to be dispersed via a polyadenylated RNA intermediate, the sequence at the adenine-rich region of a newly integrated element would be pure adenine. These sequences would then diverge with time so that the oldest element would present the most diverged adenine tract, and a recently inserted element would show the least divergence from pure adenine. By this criterion, the L1Md-4 element is the most recently inserted sequence. It has an almost pure adenine tract at the 3' end of the element (Fig. 2). The recent insertion of L1Md-4 into the beta-globin locus is consistent with the observation

(a) Potential promoters

(b)
CCAAT...50 bp...TAATALAL...30 bp...repeat start

XAA:...52 bp...SAA:AGkA...3, bp...ATGGTCACTT...L1Md-1
GCAAT...43 bp...TNATGAAA...17 bp...AGGCAAAAGT...L1Md-2
CCATT...55 bp...TCAGAAAA...15 bp...TAGTACTACC...Cκ
ACAAT...46 bp...CAAGATAT...14 bp...CTAAATACAT...L1Md-3
CCAAG...57 bp...CANTATGA...17 bp...GAGCTCTTGN...R3

FIG. 4. (a) The location of potential RNA polymerase II promoters in the consensus sequence for L1Md. Sequences that match T-A-A-T-A-L-A-L, where L equals A or T, with >75% homology are marked by 1. The instances of T-A-A-T-A-L-A-L that are preceded by homology to C-C-A-A-T (>80% homology and between 40 and 55 bp 5' to the T-A-A-T-A-L-A-L) are marked by +. Also shown are the 5' endpoints of 9 members of the L1Md repetitive family, including L1Md-1, L1Md-2, L1Md-3, L1Md-4 and L1Md-8 from the beta-globin gene region, Cκ and Vκ167 (Wilson & Storb, 1983), and R3 and R5 (Gebhard et al., 1982) from the immunoglobulin gene region. (b) Five of the L1 elements analyzed are preceded by homology to C-C-A-A-T and T-A-A-T-A-L-A-L. The alignment of the sequences is shown. The expected consensus sequence and the spacing are derived from Breathnach & Chambon (1981). Only L1Md-1 is preceded by homology to these RNA polymerase II signals at the appropriate distances. Homology to these sequences also is found 5' to the endpoint of L1Md-2, Cκ (Wilson & Storb, 1983), L1Md-3, and R3 (Gebhard et al., 1982), although the sequences are not in the expected locations. No homology to these sequences is found 5' to the endpoints of L1Md-4, L1Md-8, Vκ167 or R5.
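The promoter search described in the legend amounts to sliding a degenerate motif along a sequence and keeping windows above a homology threshold. The sketch below illustrates that idea only; it is not the procedure actually used for Fig. 4. The function names and the test sequence are hypothetical, and the symbol L (A or T) follows the notation of the legend.

```python
# Symbols allowed at each motif position; 'L' means A or T, as in the legend.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "L": "AT"}


def motif_hits(sequence, motif, min_fraction):
    """Return (position, window, fraction matched) for every window whose
    fraction of matching positions is strictly greater than min_fraction."""
    hits = []
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        matched = sum(
            1 for base, symbol in zip(window, motif) if base in SYMBOLS[symbol]
        )
        fraction = matched / width
        if fraction > min_fraction:
            hits.append((start, window, fraction))
    return hits


# Hypothetical test sequence containing one perfect match at position 2.
tata_hits = motif_hits("GGTAATATATGG", "TAATALAL", min_fraction=0.75)
print(tata_hits)
```

For an 8-base motif, a threshold of more than 75% means at least 7 of the 8 positions must match. The same function could be applied to C-C-A-A-T with a threshold of 0.80, followed by a check that the two hits lie 40 to 55 bp apart.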

that this element has no allele on the [Hbb]^s chromosome found in the C57BL/10 mouse. That is, the L1Md-4 element has been inserted into the [Hbb]^d chromosome since the last genetic exchange between these allelic regions. The two chromosomes share common PstI sites that bracket the βh3 gene (Fig. 5), but the fragment containing the gene is a different size in the two chromosomes. Southern blot hybridization of a βh3-specific probe to PstI digests of genomic DNA from the two mice shows a fragment that is 9.6 kb long in BALB/c and 7.5 kb long in C57BL/10, a difference of 2.1 kb (the length of the L1Md-4 sequence).

The time since the last genetic exchange in this region of the two chromosomes can be estimated by the comparison of available sequence for the two βh3 alleles (Hutchison et al., 1984). There are eight mismatches between the


FIG. 5. L1Md-4 is not found in the [Hbb]s chromosome. Whole liver genomic DNA from BALB/c and C57BL/10 mice, digested with EcoRI and with PstI, was probed with a βh3-specific fragment. The results of the hybridization are shown. The lanes are: 1, BALB/c genomic DNA digested with EcoRI; 2, C57BL/10 genomic DNA digested with EcoRI; 3, BALB/c digested with PstI; and 4, C57BL/10 digested with PstI. The size markers are not shown, but their positions are indicated. The fragment detected in the EcoRI digests is the same size in both mice, about 4.8 kb. However, the PstI fragment is smaller in the C57BL/10 genome (about 7.6 kb compared to 9.7 kb for the BALB/c fragment). The entire region for the BALB/c mouse has been cloned (Jahn et al., 1980; Voliva et al., 1983). The restriction map for that mouse is shown. EcoRI sites are indicated by + , and PstI sites by &. The locations of βh3 and the probe used in the analysis are shown. Also shown is the interpretation of the results for the C57BL/10 mouse. Insertion of L1Md-4 (which is about 2.1 kb long) can account for the size difference between the PstI fragments in the 2 genomes.

two sequences in the 211 bases compared. Using the method of Jukes & Cantor (1969), this is 0.038 nucleotide changes per site. By using the rate given by Li et al. (1981), the last genetic exchange occurred about 4 million years ago (0.038 nucleotide changes/site divided by (2 x 4.6 x 10^-9) nucleotide changes/site/year). Therefore, L1Md-4 must have inserted into the [Hbb]d chromosome within the last 4 million years.
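The arithmetic behind this estimate can be reproduced with a short sketch. It assumes the standard one-parameter Jukes & Cantor correction, d = -(3/4) ln(1 - 4p/3), and treats the rate of 4.6 x 10^-9 changes/site/year as applying to each of the two diverging lineages, hence the factor of 2. The function names are hypothetical.

```python
import math


def jukes_cantor_distance(mismatches, sites):
    """Corrected number of nucleotide changes per site (Jukes-Cantor model)."""
    p = mismatches / sites                      # observed fraction of mismatches
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def years_since_exchange(distance, rate_per_site_per_year):
    """Divergence time, assuming both lineages accumulate changes at the same rate."""
    return distance / (2.0 * rate_per_site_per_year)


d = jukes_cantor_distance(8, 211)               # 8 mismatches in 211 bases
t = years_since_exchange(d, 4.6e-9)             # rate quoted from Li et al. (1981)
print(f"d = {d:.4f} changes/site, t = {t / 1e6:.1f} million years")
```

The corrected distance comes out close to the 0.038 changes per site quoted in the text, and the resulting time is a little over 4 million years. Small differences in rounding do not affect the conclusion.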

4. Discussion

Sequence analysis of five members of the L1 family found in the mouse beta-globin locus is consistent with the structure proposed by Fanning (1983) and by Voliva et al. (1983). The independently characterized sequences called the


R-family are all constituent parts of the larger L1Md structure. The presence of short direct repeats bounding the truncated members of the L1 family strongly suggests that it is the truncated form of the repeat that is dispersed. Insertion of one L1 element, L1Md-8, into a defined sequence, L1Md-2, allows us to deduce that it is the target sequence that is duplicated during the integration process. Because the conserved end of the repeat resembles the 3' end of a polyadenylated RNA, we assume that the L1Md elements disperse via an RNA intermediate.

There are several steps in the dispersal process where length variation in the integrated sequence could be introduced. One possibility is that there are a few repeats that have been truncated by deletion and then transcribed using promoters that flank the 5' end of the L1Md element. Transcription would then begin in sequences adjacent, but unrelated, to L1Md, proceed through the element, and terminate at the 3' end of the repeat sequence. We would then expect to find subfamilies of the dispersed L1Md elements with particular sequences at the 5' end present only in members with that particular 5' truncation point. However, since L1Md homology to all members of the family begins immediately 3' of the direct repeat, it is unlikely that transcription through deleted L1 elements is the source of truncation. The possibility that transcripts are initiated at multiple RNA polymerase II promoter sites within the L1 element was eliminated, since the defined truncation points do not fall near sites resembling canonical promoter sequences. A third possibility is that the RNA molecules from which all the dispersed L1Md elements are generated are homogeneous in length, perhaps transcribed from one or a few transcriptionally active L1Md elements.
The length variation would be introduced during the process by which the RNA molecule is transcribed into a DNA copy of the sequence. It might be suspected, in such a case, that something particular about the sequence at the 5' endpoint would account for the truncation. An enzyme with capabilities similar to those of reverse transcriptase would be required for such a process. The enzyme, in vitro, is well known for its propensity to proceed along the template for different distances, thereby generating a population of cDNAs of heterogeneous lengths. Length variation generated by termination of "reverse transcriptase" in vivo along the RNA template might be expected to be due to stretches of secondary structure. The consensus sequence of the L1Md family from the globin gene region has been examined 5' to the truncation point of each of the repeats. Only in one example is there a stretch of obvious secondary structure. In the sequence 5' to the endpoint of L1Md-2, a palindrome may be formed by the sequence Y-(&Cl-A-A-A-(:-Y-T-‘I’-C’-T-(:-(1-A. In no other case is there significant secondary homology found 5' to the endpoint of the L1Md elements. However, it is possible that hairpins may form in the RNA template between sequences near the endpoint and a palindromic sequence located much further upstream.

Large numbers of the L1Md family are dispersed throughout the genome. There are about 100,000 copies of the 3' conserved sequences (Gebhard et al., 1982) and about 1000 copies of the 5' most sequences (M. B. Comer, unpublished results). Martin et al. (1984) have described an open reading frame in the L1Md sequence


FIG. 6. A gap repair model. Presumably, chromosomes are subject to single-strand nicking such that sometimes 2 nicks occur close together. We assume that if the chromosome comes apart at this point, repair is usually effected by what would be essentially cohesive end joining and ligation. However, we imagine that occasionally, for unspecified reasons, the broken ends of the chromosome cannot come back together, which would create a physical gap between these ends. We propose that this physical gap could be closed by a multi-step process that utilizes an mRNA molecule to "plug" the gap. In this model, one end of the gap is rendered sticky by filling in a staggered end with a polymerase and the addition of poly(T) if transcripts are to be captured or poly(A) if cDNA copies are to be captured. A transcript is shown bound to one end of the gap by its poly(A) and attached to the other end by weak hybridization. This structure should serve as a substrate for known repair mechanisms. The gap could then be filled in with a DNA polymerase primed with the poly(T) and the new strand ligated to the other end. The RNA could then be removed and the resulting single-strand gap closed by polymerase and ligase. The model presumes that L1 transcripts are abundant in germ line cells that are undergoing physical gap repair.

that evolves as if it encodes a protein. In speculating on possible models for the dispersal process, it is tempting to try to incorporate a role for this L1 protein that could account for the unusual structure of the repetitive family, give rise to the dispersal of the family, and provide a function upon which selection might act. The most obvious way to account for the structure of the L1 family would be if the L1 protein were a reverse transcriptase that produced cDNA copies of transcripts that then could be inserted into the genome. Such a process might even be specific for L1 transcripts (Martin et al., 1984) such that selection would be based on the selfish propagation of the L1 family. A different type of mechanism that could involve the L1 protein in the dispersal process would be gap repair using L1 sequences to "patch" broken chromosomes (Fig. 6). In t

repetitive sequences, it also provides a rationale for the presence of each member of the family. That is, each repeat would represent a place where the chromosome had been broken and then repaired by inserting a repetitive sequence into the gap. Such a repair process need not be exclusive for utilizing L1 transcripts and hence would provide a general explanation for the presence of processed pseudogenes in the genome.

Whatever the mechanism for dispersal, the presence of L1Md-4 in the [Hbb]d gene region but not in the [Hbb]s allele indicates that dispersal is a recent, if not current, process. L1Md-4 inserted a maximum of 4 million years ago, but perhaps much more recently, making it unlikely that all L1Md elements dispersed to their current locations in the genome at one point in the distant past. Thus, dispersal appears to be a relatively important mechanism for the maintenance of homology between L1Md family members.

We thank P. (‘ok, M. Comer and S. Hardies for their help and discussion. C.F.V. was supported by a predoctoral training grant GM07092 from the NIH to the University of North Carolina. S.L.M. is a fellow of the Jane Coffin Childs Memorial Fund for Medical Research. This research was supported by Public Health Service grants AI08998 and GM21313 from the NIH.

REFERENCES

Arnheim, N. (1983). In Evolution of Genes and Proteins (Nei, M. & Koehn, R. K., eds), pp. 38-61, Sinauer Associates, Inc., Sunderland, Mass.
Breathnach, R. & Chambon, P. (1981). Annu. Rev. Biochem. 50, 349-383.
Brown, S. D. M. & Dover, G. (1980). J. Mol. Biol. 150, 441-466.
Brown, S. D. M. & Piechaczyk, M. (1983). J. Mol. Biol. 165, 249-256.
Calos, M. P. & Miller, J. H. (1980). Cell, 20, 579-595.
Dagert, M. & Ehrlich, S. D. (1979). Gene, 6, 23-28.
Deininger, P. L., Jolly, D. J., Rubin, C. M., Friedmann, T. & Schmid, C. W. (1981). J. Mol. Biol. 151, 17-33.
Dhruva, B. R., Shenk, T. & Subramanian, K. N. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 4514-4518.
DiGiovanni, L., Haynes, S. R., Misra, R. & Jelinek, W. R. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 6533-6537.
Dover, G. (1982). Nature (London), 299, 111-117.
Fanning, T. G. (1982). Nucl. Acids Res. 10, 5003-5013.
Fanning, T. G. (1983). Nucl. Acids Res. 11, 5073-5091.
Galas, D. J. & Chandler, M. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 4858-4862.
Gebhard, W., Meitlinger, T., Hochtl, J. & Zachau, H. G. (1982). J. Mol. Biol. 157, 453-471.
Gronenborn, B. & Messing, J. (1978). Nature (London), 272, 375-377.
Gutai, M. W. & Nathans, D. (1978). J. Mol. Biol. 126, 275-288.
Gwo-Shu Lee, M., Lewis, S. A., Wilde, C. D. & Cowan, N. J. (1983). Cell, 33, 477-487.
Hardies, S. C., Edgell, M. H. & Hutchison, C. A. III (1984). J. Biol. Chem. 259, 3748-3756.
Harshey, R. M. & Bukhari, A. I. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 1090-1094.
Haynes, S. R., Toomey, P. P., Leinwand, L. & Jelinek, W. R. (1981). Mol. Cell. Biol. 1,
Hood, L., Hunkapiller, T. & Kraig, A. (1983). In Modern Cell Biology (McIntosh, J. R., ed.), vol. 2, pp. 305-328, Alan R. Liss, New York.
Hutchison, C. A. III, Hardies, S. C., Padgett, R. W., Weaver, S. & Edgell, M. H. (1984). J. Biol. Chem., in the press.
Jagadeeswaran, P., Forget, B. G. & Weissman, S. M. (1981). Cell, 26, 141-142.


Jahn, C. L., Hutchison, C. A. III, Phillips, S., Haigwood, N. L., Voliva, C. F. & Edgell, M. H. (1980). Cell, 21, 159-168.
Jelinek, W. R. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 2679-2683.
Jelinek, W. R. & Schmid, C. W. (1982). Annu. Rev. Biochem. 51, 813-844.
Jukes, T. H. & Cantor, C. R. (1969). In Mammalian Protein Metabolism (Munro, H. N., ed.), vol. 3, pp. 21-132, Academic Press, New York.
Krayev, A. S., Kramerov, D. A., Skryabin, K. G., Ryskov, A. P., Baev, A. A. & Georgiev, G. P. (1980). Nucl. Acids Res. 8, 1201-1215.
Lueders, K. K. & Paterson, B. M. (1982). Nucl. Acids Res. 10, 7715-7772.
Li, W.-H., Gojobori, T. & Nei, M. (1981). Nature (London), 292, 237-239.
Martin, S. L., Voliva, C. F., Burton, F. H., Edgell, M. H. & Hutchison, C. A. III (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 2308-2311.
Maxam, A. M. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Messing, J. & Vieira, J. (1982). Gene, 19, 269-276.
Meunier-Rotival, M., Soriano, P., Cuny, G., Strauss, F. & Bernardi, G. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 355-359.
Miller, J. H. (1972). In Experiments in Molecular Genetics, pp. 431-433, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
Phillips, S. J., Hardies, S. C., Jahn, C. L., Edgell, M. H. & Hutchison, C. A. III (1984). J. Biol. Chem. 259, 7947-7954.
Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H. & Roe, B. A. (1980). J. Mol. Biol. 143, 161-178.
Shafit-Zagardo, B., Brown, F. L., Zavodny, P. J. & Maio, J. J. (1983). Nature (London), 304, 277-280.
Sharp, P. A. (1983). Nature (London), 301, 471-473.
Singer, M. F. (1982). Cell, 28, 433-434.
Singer, M. F., Thayer, R. E., Grimaldi, G., Lerman, M. I. & Fanning, T. G. (1983). Nucl. Acids Res. 11, 5739-5745.
Soriano, P., Meunier-Rotival, M. & Bernardi, G. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 1816-1820.
Stumph, W. E., Kristo, P., Tsai, M.-J. & O'Malley, B. W. (1981). Nucl. Acids Res. 9, 5383-5397.
Szostak, J. W., Orr-Weaver, T. L., Rothstein, R. J. & Stahl, F. W. (1983). Cell, 33, 25-35.
Thayer, R. E. & Singer, M. F. (1983). Mol. Cell. Biol. 3, 967-973.
Van Arsdell, S. W., Denison, R. A., Bernstein, L. B., Weiner, A. M., Manser, T. & Gesteland, R. F. (1981). Cell, 26, 11-17.
Vanin, E. F., Goldberg, G. I., Tucker, P. W. & Smithies, O. (1980). Nature (London), 286, 222-226.
Vieira, J. & Messing, J. (1982). Gene, 19, 259-268.
Voliva, C. F., Jahn, C. L., Comer, M. B., Edgell, M. H. & Hutchison, C. A. III (1983). Nucl. Acids Res. 11, 8847-8859.
White, C. T., Hardies, S. C., Hutchison, C. A. III & Edgell, M. H. (1984). Nucl. Acids Res. 12, 751-766.
Wilson, R. & Storb, U. (1983). Nucl. Acids Res. 11, 1803-1816.

Edited by S. Brenner