J. Mol. Biol. (1977) 114, 93-117
Rapid Gel Sequencing of RNA by Primed Synthesis with Reverse Transcriptase
G. G. BROWNLEE AND E. M. CARTWRIGHT
Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England
(Received February 1977)
The known 75-residue sequence adjacent to poly(A) in chicken ovalbumin messenger RNA was used as a model sequence to develop a generally applicable rapid gel sequencing method for RNA based on the "plus and minus" method of Sanger & Coulson (1975). The mRNA was copied in the presence of a specific "primer", d(pT10-G-C), and reverse transcriptase to give 32P-labelled complementary DNA. After reannealing mRNA, minus incubations (lacking one of the four deoxynucleoside triphosphates) were performed with reverse transcriptase. For plus reactions the Klenow fragment of DNA polymerase I (Escherichia coli) was used in the presence of Mn2+ and a single deoxynucleoside triphosphate. The sequence could be read from residues 20 to 191 (counting from the poly(A)) from the pattern of bands on an autoradiograph of a 12% acrylamide gel. The new region of sequence (76 to 191) was considered to be greater than 95% accurate and probably entirely correct to residue 130, but beyond that, residues 131 to 191, there are a few uncertainties due to limitations of the method. The sequence does not overlap with the carboxyl-terminal region of ovalbumin and, furthermore, the 3' non-coding region is likely to be at least 250 residues long.
1. Introduction
Sequencing of nucleic acids has changed greatly since the first sequence determination of a naturally occurring RNA, that of the alanine transfer RNA of yeast (Holley et al., 1965). The most important advance was the idea of sequencing uniformly 32P-labelled nucleic acids (Sanger et al., 1965). This greatly simplified the problem of the fractionation and detection of the degradation products, because high-resolution micro-analytical methods could be used. Radioactive sequencing was first applied to the in vivo 32P-labelled 5 S RNA (Brownlee et al., 1968) and then many other molecules, especially the transfer RNAs, were sequenced (Barrell & Clark, 1974). The potential of this approach, however, is best illustrated by the recent completion of the sequence of 3569 residues of the bacteriophage MS2 RNA (Fiers et al., 1976). Billeter et al. (1969) introduced copying methods, that is the synthesis of 32P-labelled products enzymatically in vitro, in their sequence determination of parts of the bacteriophage Qβ RNA. They did exploit some of the advantages of this approach (see Brownlee, 1972, for discussion), but in the main the same methodology was used as had been developed for RNA labelled in vivo. The full potential of the copying approach for sequencing was only realised in the so-called "plus and minus" method developed by Sanger & Coulson (1975) for sequencing φX174 DNA. This is the most significant development since the introduction of radioactive sequencing. The method
involves the synthesis in vitro of complementary 32P-labelled products to give a specific pattern of bands on a highly resolving acrylamide gel. The 3' terminal residue (in the plus system) or the subsequent 3' nearest-neighbour residue (in the minus system) is defined from the method of synthesis as either dC, dA, dG or dT. The method is rapid and a sequence of 50 to 100 residues can be determined simply by inspection of an autoradiograph in an experiment lasting one to two days. The approach is thus completely different from the degradative methods previously used for sequencing nucleic acids (and proteins), where possibly the most time-consuming step of the sequence determination is the isolation of overlapping oligonucleotides so as to allow a unique sequence to be established. Here there is no overlap problem. The only disadvantage in the method arises from the fact that there are sometimes difficulties in the interpretation of the pattern of bands on the acrylamide gels, so that occasional errors in sequence may be made. This may be overcome if some form of cross-checking of the sequence is carried out, and Sanger and his colleagues have successfully used the method to establish, with only a few uncertainties, the complete sequence of the 5375 residues of φX174 DNA (Sanger et al., 1977). An alternative rapid procedure for sequencing DNA was recently developed by Maxam & Gilbert (1977) which has many of the advantages of the Sanger & Coulson method. Because it relies on a different principle, that of specific chemical degradation of 32P end-labelled DNA, for generating a pattern of bands on a gel, it should prove to be a valuable additional method. In our previous sequence studies of various specific messenger RNA molecules in higher organisms we have used in vitro copying methods because of difficulties in obtaining sufficient radioactive yield of 32P-labelled mRNA in vivo.
Either DNA polymerase I of Escherichia coli (Proudfoot & Brownlee, 1974) or reverse transcriptase (from avian myeloblastosis virus) may be used to copy mRNA if a suitable primer such as d(pT10) or d(pT10-G-C), which anneals with the poly(A) at the 3' end of the mRNA, is used. 32P-labelled complementary DNA was synthesised and its sequence determined by the direct DNA sequencing methods using degradation with endonuclease IV (Proudfoot, 1976; Cheng et al., 1976). The method gave unique sequences of the order of 70 residues in length, but difficulties were experienced in obtaining some overlapping oligonucleotides, so that alternative methods were desirable. The purpose of the present study was to develop a rapid gel sequencing method for RNA, analogous to that of Sanger & Coulson (1975) for DNA. We chose to study the sequence at the 3' end of ovalbumin mRNA as the first 75 residues were known from the earlier sequence methodology (Cheng et al., 1976). As may be seen (see Materials and Methods, and Results), a protocol has been established which follows the same principles as Sanger & Coulson (1975) but differs markedly in detail. Like that method it is fast: the entire experiment can be performed in one to two days, and the sequence is "read off" from the pattern of bands on an autoradiograph of an acrylamide gel. The validity of the new method was verified, as the sequence of residues 20 to 75 from the poly(A) could be read off from the gels. Moreover, the method allowed a further 116 residues to be sequenced. Because of occasional artefacts and other problems (see Results) the new sequence cannot be considered absolutely reliable, but the accuracy is likely to be greater than 95% (and probably, in fact, 100%) up to residue 130, and less accurate (about 90 to 95%) from residue 131 to 191 (see Fig. 11).
2. Materials and Methods
(a) Materials
Ovalbumin mRNA (Haines et al., 1974; Cheng et al., 1976) was a generous gift from Drs M. T. Doel and N. H. Carey of Searle Research Laboratories, High Wycombe, Bucks., England. d(pT10-G-C) was kindly provided by Drs S. Gillam and M. Smith, Department of Biochemistry, University of British Columbia, Vancouver, Canada. Reverse transcriptase (from avian myeloblastosis virus) was generously supplied by Dr J. W. Beard. The "Klenow" subfragment of DNA polymerase I of E. coli was purchased from Boehringer Chemical Corporation, Mannheim, W. Germany. 32P-labelled deoxynucleoside triphosphates (about 100 Ci/mmol) were purchased either from the Radiochemical Centre, Amersham, Bucks., England, or from New England Nuclear Corporation, U.S.A. Unlabelled deoxynucleoside triphosphates were from Boehringer Chemical Corporation. Sephadex G100 was from Pharmacia, Uppsala, Sweden.
(b) The sequencing procedure
This involves 7 distinct but straightforward steps.
(i) Premixture of mRNA and primer
In a typical case, 2 µl of highly purified ovalbumin mRNA (1 mg/ml in distilled water), 1 µl of 0.5 mg d(pT10-G-C)/ml and 7 µl distilled water were mixed and stored at -20°C. 1 µl of this premixture was enough for each preparation of complementary DNA, see (ii) below, so that this premixture was enough for 10 copying reactions.
(ii) Preparation of 32P-labelled cDNA
A total of 2 to 5 µCi of [α-32P]dATP were evaporated to dryness in a small siliconised test tube (about 5 cm × 1 cm i.d.) in a desiccator. (The silicone treatment involved rinsing with "Repelcote", B.D.H., baking in an oven overnight and rinsing with distilled water.) To this was added 10 µl of the minus A mixture (Table 1), 1 µl of the premixture ((i) above) and 0.5 µl of reverse transcriptase (concn = 8000 units/ml). The contents of the tube were mixed by rolling the liquid around the bottom of the tube and then transferred by capillary to a 10-µl Micropet Yankee disposable capillary tube and incubated for 30 min, horizontally, without sealing the ends of the capillary, in a rack in a 37°C oven. The contents of the capillary were blown into phenol/Na2 EDTA (a mixture of 20 µl of water-saturated phenol, 5 µl 0.1 M-Na2 EDTA and 10 µl distilled water in a 0.5-ml Reactivial (Pierce, Rockford, Ill., U.S.A.)) at room temperature. After vortexing for 30 s and centrifugation for 3 min at 2000 g, the upper aqueous layer was recovered. The lower phenol layer was vortexed with 20 µl water, recentrifuged and the second upper aqueous layer combined with the first aqueous layer. Phenol was removed by extraction with ether as follows. The combined aqueous layers (about 50 µl) were mixed using gentle vortexing with 2 changes of dry ether (0.5 to 1.0 ml). The final traces of ether were removed by evaporation using a gentle stream of compressed air. (It is necessary sometimes to add a further 20 µl of distilled water to the ether extraction if the aqueous sample becomes too small to see.) From 1 to 2 µl of the sample were sometimes analysed by acrylamide gel electrophoresis at this stage (see (vii) below) to assess whether the incorporation was satisfactory and whether the correct size range of products had been synthesized. The conditions of labelling of the initial transcript can be varied to produce differences in lengths of transcript.
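The quantities in steps (i) and (ii) can be checked with a little arithmetic. The sketch below is illustrative only: the function names are our own, the specific activity is taken as the approximate 100 Ci/mmol quoted under Materials, and the reaction volume is assumed to be the simple sum of the three additions (10 µl + 1 µl + 0.5 µl).

```python
# Illustrative arithmetic only; all names are hypothetical, not from the paper.

def premix_per_reaction(mrna_ul=2.0, mrna_mg_per_ml=1.0,
                        primer_ul=1.0, primer_mg_per_ml=0.5,
                        water_ul=7.0, used_ul=1.0):
    """Return (number of reactions, ug mRNA, ug primer) per portion used."""
    total_ul = mrna_ul + primer_ul + water_ul
    fraction = used_ul / total_ul
    # 1 mg/ml equals 1 ug/ul, so (ul x mg/ml) gives micrograms directly.
    mrna_ug = mrna_ul * mrna_mg_per_ml * fraction
    primer_ug = primer_ul * primer_mg_per_ml * fraction
    return total_ul / used_ul, mrna_ug, primer_ug


def datp_micromolar(microcuries, ci_per_mmol=100.0, volume_ul=11.5):
    """Concentration of [alpha-32P]dATP in the labelling reaction.

    volume_ul is an assumption: 10 ul minus A mixture + 1 ul premixture
    + 0.5 ul enzyme, with no allowance for evaporation.
    """
    # uCi divided by Ci/mmol gives nmol; multiply by 1000 for pmol.
    pmol = microcuries / ci_per_mmol * 1000.0
    # pmol per ul is numerically equal to micromolar.
    return pmol / volume_ul


reactions, mrna_ug, primer_ug = premix_per_reaction()
low_uM, high_uM = datp_micromolar(2.0), datp_micromolar(5.0)
```

Under these assumptions each 1-µl portion of premixture carries about 0.2 µg of mRNA and 0.05 µg of primer, and 2 to 5 µCi of label corresponds to roughly 1.7 to 4.3 µM dATP, which is in line with the 2 to 5 µM quoted in Results.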
Thus when dCTP was limited to a final concentration of 50 nM in the reaction, short transcripts as seen in Fig. 4 were synthesized. Sometimes shorter incubations (5 min at 37°C) were performed as in Fig. 5(a) to attempt to obtain higher yields of the smaller products. In other experiments (Fig. 5(b) and (c)) attempts were made to obtain as even a spectrum of products as possible by taking portions of the initial reaction after 5, 10, 15, 20 and 30 min. However, in general the time of incubation, i.e. whether 5 min or 30 min, had relatively little effect on the size distribution of the products, although the radioactive yield was less in the shorter times. In the experiment of Tables 4 and 7 (line (iv)) 2 independent samples of cDNA† were synthesized (one under standard
† Abbreviation used: cDNA, complementary DNA.
FIG. 1. Labels: silicone rubber tubing; column, Sephadex G100 in Falcon 1-ml pipette; reservoir buffer; test tube.
filled with 1× buffer (100× buffer is 0.2 M-Tris·HCl (pH 8.4), 0.005 M-Na2 EDTA), making sure that no air bubbles were present. It was often best to do this by holding the pipette almost horizontally. A deaerated suspension of Sephadex G100 in 1× buffer was introduced using a Pasteur pipette and packed to within 3 cm of the top, avoiding air bubbles. The column was washed, with the reservoir level set 10 to 20 cm above the top of the column, for at least 1 h. (Columns were routinely stored at room temp. for periods of up to 2 weeks, but were always rewashed with freshly made buffer for 1 h just before use.) The cDNA was loaded using a drawn-out 50-µl capillary tube, with the bottom of the column clamped off. After washing in the sample with 2 portions (50 µl each) of 1× buffer, the column was filled and connected, avoiding air bubbles, with the reservoir now placed about 15 cm above the column exit (Fig. 1). This avoids too fast flow-rates, which cause loss of resolution. The column effluent was monitored using a portable Geiger counter and 2-drop fractions (about 0.1 ml/fraction) were collected manually into siliconised test tubes. The radioactive profile read directly from the scale on the monitor of the Geiger counter is shown in Fig. 2. The breakthrough, high molecular weight fraction, was pooled as shown and freeze-dried. The radioactive yield was usually between 0.01 and 0.1 µCi.
The cDNA was dissolved in 20 µl of distilled water and 1 µl of 1 mg ovalbumin mRNA/ml added (a 5-fold excess over that used in step (ii), above). Alternatively, a greater weight of a less purified mRNA was used. Thus 1 µl of a 3 mg/ml preparation of oval
FIG. 2. Radioactive profile of the Sephadex G100 column effluent. Abscissa: fraction (approx. 0.1 ml); the [α-32P]dATP peak is labelled.
(v) The minus reaction
Four separate minus reactions were set up using 10 µl of minus C, 10 µl of minus A, 10 µl of minus G and 10 µl of minus T mixtures (see Table 1). To 10 µl of each, in siliconised test tubes, was added 1 µl (from 0.5 to 2 µl have been used in different experiments) of the annealed cDNA, and 0.5 µl of reverse transcriptase. The solution was mixed and transferred to a 10-µl capillary tube (unsealed) and incubated for 30 min at 37°C in an oven. The contents were then mixed with 10 µl of formamide/dye mixture (a mixture of 100 ml deionised formamide (see (vii) below), 20 ml 0.1 M-Na2 EDTA and 0.02% bromphenol blue, 0.02% xylene cyanol FF) and boiled for 2 to 3 min, the sample volume reducing to about 10 µl by evaporation of water. After quick cooling in ice-water at 0°C, the samples were loaded immediately using drawn-out 20-µl capillary tubes onto the 12% acrylamide gel in 8 M-urea. A control sample (1 µl of annealed cDNA) with 10 µl of formamide/dye mixture was also boiled and loaded onto a 5th slot on the acrylamide gel.
† 4× buffer is a mixture of 0.2 M-Tris·HCl (pH 8.0), 0.2 M-KCl, 0.02 M-MgCl2 and 0.04 M-dithiothreitol.
(vi) The plus reaction
Eight reactions were set up using 10 µl of each of the four 10 µM plus mixtures and 10 µl of each of the four 200 µM plus mixtures (Table 2). After the addition of 1 µl of the annealed cDNA and 1 µl of the Klenow enzyme (concn = 1.25 units/µl) the solution was incubated in 10-µl capillary tubes for 10 min at 37°C. The reaction was stopped by adding 10 µl of the formamide/dye mixture and proceeding as described in section (v), above.

TABLE 2
Composition of 10 µM plus mixtures†

                     Plus C   Plus A   Plus G   Plus T
1× buffer‡ (µl)       1000     1000     1000     1000
dCTP (1 mM) (µl)        10
dATP (1 mM) (µl)                 10
dGTP (1 mM) (µl)                          10
dTTP (1 mM) (µl)                                   10

† The 200 µM plus mixtures (see text) were prepared by modifying this recipe and using 20 µl of 10 mM-deoxynucleoside triphosphates instead of 10 µl of the 1 mM-deoxynucleoside triphosphates.
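The nominal concentrations of the two sets of plus mixtures follow from a simple dilution. The sketch below is a rough check rather than part of the protocol: the helper name is our own, and the 20 µl volume for the 200 µM mixtures is our reading of the Table 2 footnote.

```python
# Illustrative dilution check; the function name is hypothetical.

def final_micromolar(stock_mM, stock_ul, buffer_ul=1000.0):
    """Final triphosphate concentration (uM) after adding stock to buffer."""
    total_ul = buffer_ul + stock_ul
    return stock_mM * 1000.0 * stock_ul / total_ul


low_mix = final_micromolar(1.0, 10.0)    # nominally "10 uM"
high_mix = final_micromolar(10.0, 20.0)  # nominally "200 uM" (assumed 20 ul)

# In the reaction, 10 ul of mixture is further diluted by 1 ul of cDNA
# and 1 ul of enzyme.
low_in_reaction = low_mix * 10.0 / 12.0
high_in_reaction = high_mix * 10.0 / 12.0
```

On these assumptions the mixtures come out at about 9.9 µM and 196 µM, close to their nominal values, and at about 8 µM and 163 µM in the final reaction.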
(vii) Acrylamide gel electrophoresis in 8 M-urea
Acrylamide slab gels (12%; 40 cm × 40 cm × 0.15 cm) were prepared in batches of 3 at a time. Gels not immediately required but needed within 1 to 2 days were covered with Saranwrap and left at room temperature. For longer storage, gels were immersed in 8 M-urea in 1× TBE buffer (see below). Gel composition: 144 g urea, 36 g acrylamide (B.D.H., for electrophoresis) and 1.5 g methylene bisacrylamide (B.D.H.) were dissolved in 250 ml of distilled water, warming slightly. About 3 g Amberlite monobed MB-1 (B.D.H.) ion-exchange resin was added and the solution stirred for 30 min at room temperature. After filtering off the resin through a sintered disc into a 1-l Büchner flask, 30 ml of 10× TBE buffer (108 g Tris, 55 g boric acid and 9.5 g Na2 EDTA in 1 l distilled water) was added. The volume was made up to 300 ml with distilled water and the solution deaerated on a water pump, then 1 ml of freshly made 10% (w/v) ammonium persulphate and 0.1 ml of tetramethylethylenediamine were added. The solution was poured into the specially prepared glass plates for the gel electrophoresis apparatus (from Raven Scientific Ltd., Haverhill, Suffolk, England). The glass was tapped to aid the ascent of trapped air bubbles and the slot former inserted, giving 10 cm × 1.3 cm or 12 cm × 1.1 cm slots (depending on the dimensions of the slot former), taking care to avoid trapping air bubbles. Finally, the former was clamped in position with 2 centrally placed "Bulldog" clips. The reservoir solutions for electrophoresis were 1× TBE buffer. The approximate conditions for the electrophoresis are shown in Table 3. Formamide gels (90%) were made as follows: 18.0 g acrylamide and 0.75 g methylene bisacrylamide were dissolved in about 100 ml of deionised formamide (made by stirring 99% formamide for 30 min with 1.5 g Amberlite MB-1); 15 ml of 10× TBE buffer were added and the solution was made to 150 ml with more deionised formamide.
The solution was filtered through Whatman no. 540 filter paper to remove some undissolved salt (from the
TABLE 3
Approximate conditions for acrylamide gel electrophoresis
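The gel recipes given under (vii) can be checked against the nominal "12%", "8 M-urea" and "1× TBE" figures. The following sketch is only a consistency check: the function and key names are our own, percentages are taken as weight per final volume, and the molar mass of urea (60.06 g/mol) is the one value brought in from outside the text.

```python
# Illustrative recipe check; names are hypothetical, not from the paper.

UREA_G_PER_MOL = 60.06  # molar mass of urea, CO(NH2)2


def gel_summary(acrylamide_g, bis_g, final_ml, tbe10x_ml, urea_g=0.0):
    """Summarise a gel recipe as % (w/v), urea molarity and TBE strength."""
    return {
        "acrylamide_pct": 100.0 * acrylamide_g / final_ml,
        "bis_pct": 100.0 * bis_g / final_ml,
        "urea_molar": urea_g / UREA_G_PER_MOL / (final_ml / 1000.0),
        "tbe_strength": 10.0 * tbe10x_ml / final_ml,
    }


# 8 M-urea gel: 144 g urea, 36 g acrylamide, 1.5 g bis, 30 ml 10x TBE, 300 ml.
urea_gel = gel_summary(36.0, 1.5, 300.0, 30.0, urea_g=144.0)

# Formamide gel: 18.0 g acrylamide, 0.75 g bis, 15 ml 10x TBE, 150 ml.
formamide_gel = gel_summary(18.0, 0.75, 150.0, 15.0)
```

Both recipes come out at 12% acrylamide with 0.5% bisacrylamide in 1× TBE, and 144 g of urea in 300 ml is very nearly 8 M.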
3. Results
(a) Principle of the method
Figure 3 shows a schematic representation of the protocol developed. After mixing ovalbumin mRNA and d(pT10-G-C), a 32P-labelled cDNA was synthesized with reverse transcriptase. Under the conditions of labelling used (see Materials and Methods), synthesis is effectively limited by the low concentration (2 to 5 µM) of the [α-32P]dATP in the reaction. A wide spectrum of products from 20 to 200 residues or more long accumulates mostly (but not completely) before an A residue. After extraction with phenol to remove protein, and gel filtration on Sephadex G100 to remove excess deoxynucleoside triphosphates and salt, the purified cDNA was
FIG. 3. Principle of the gel sequencing method using the minus and plus reactions. For details see text.
annealed with fresh mRNA (necessary because the template mRNA strand was degraded by the reverse transcriptase, see below). Four equal portions of the cDNA:RNA hybrid were then extended in a minus reaction. These are reincubations with reverse transcriptase and three (of the four) unlabelled deoxynucleoside triphosphates. The four minus (-C, -A, -G and -T) reactions and a control were fractionated in adjacent slots of a 12% acrylamide, 8 M-urea gel (Sanger & Coulson, 1975) and an autoradiograph was prepared. Because the final reaction is carried out in the absence of one triphosphate, the varying lengths of cDNA produced in the initial transcript repair to such a length until they require the missing triphosphate. Synthesis then stops, causing the accumulation of a band on the gel. The position of this band and others in the adjacent slots reflects the sequence. The gel is "read" by recording the most prominent bands (considering each of the four minus reactions) corresponding to any one size length. In the plus reaction, further portions of the cDNA:RNA hybrid were incubated with each of the four deoxynucleoside triphosphates singly and the Klenow fragment of DNA polymerase I under conditions where the 3' exonuclease is most active, pH 9.2 (and in the presence of Mn2+). Under these conditions the 3' exonuclease activity of DNA polymerase I degraded the cDNA until the residue corresponding to the single deoxynucleoside triphosphate was 3' terminal. The exonuclease activity was then blocked or significantly inhibited, causing an accumulation of a product and a corresponding band on the gel.
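The logic of the two reactions can be made concrete with a small idealised simulation. The sketch below is an illustration under strong assumptions, not a model of the real enzymology: the function names and the short test sequence are hypothetical, every chain length is assumed to be present in the initial transcript, and artefacts such as misincorporation or compression are ignored.

```python
# Idealised plus/minus simulation; names and the toy sequence are hypothetical.

def minus_reaction(cdna, missing, initial_lengths):
    """Chain lengths that accumulate when one triphosphate is absent.

    Each chain is extended until the next residue required is the missing
    one, so a band at length n reports the identity of residue n + 1.
    """
    bands = set()
    for n in initial_lengths:
        while n < len(cdna) and cdna[n] != missing:
            n += 1                  # polymerase adds the next residue
        if n < len(cdna):           # stalled before the missing residue
            bands.add(n)
    return sorted(bands)


def plus_reaction(cdna, present, initial_lengths):
    """Chain lengths that accumulate with a single triphosphate present.

    The 3' exonuclease trims each chain until its 3'-terminal residue is
    the one supplied, so a band at length n reports residue n itself.
    """
    bands = set()
    for n in initial_lengths:
        while n > 0 and cdna[n - 1] != present:
            n -= 1                  # exonuclease removes the 3' residue
        if n > 0:
            bands.add(n)
    return sorted(bands)


def read_minus_lanes(lanes, length):
    """Rebuild the sequence from the four minus lanes ('-' if unassigned)."""
    seq = ["-"] * length
    for base, bands in lanes.items():
        for n in bands:
            seq[n] = base           # band at length n gives residue n + 1
    return "".join(seq)


toy_cdna = "GCAAAACAGT"             # hypothetical, written 5' to 3'
starts = range(1, len(toy_cdna))    # assume every chain length is present
minus_lanes = {b: minus_reaction(toy_cdna, b, starts) for b in "CAGT"}
plus_c = plus_reaction(toy_cdna, "C", starts)
read_back = read_minus_lanes(minus_lanes, len(toy_cdna))
```

In this toy case the minus lanes recover every residue except the first, and each plus band lies one residue above the corresponding minus band, which is the relationship the two systems are described as having.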
The sequencing protocol is critically dependent on the annealing of fresh mRNA to the initial 32P-labelled transcript. Figure 4 shows an experiment where a minus reaction was carried out using the standard protocol (see Materials and Methods) compared with a parallel minus reaction (on the same initial transcript) in which the annealing had been omitted. The pattern of bands is clearly different in the four minus reactions (-C, -A, -G and -T) derived from the annealed reaction as compared to the non-annealed slots (-c, -a, -g and -t), where the band pattern is similar although not identical in all four minus reactions. The sequence can be read off from the -C, -A, -G and -T slots and is recorded in Table 4, line (ii), and compared with the known cDNA sequence in line (i). The sequence can be deduced from residues 26 to 59 on this particular gel, although there are several places where ambiguities arise because of bands occurring in more than one slot. These ambiguities were investigated in some detail (see below), but despite this it is clear that the gel is interpretable. We conclude that to obtain "readable" gels it is necessary to anneal fresh mRNA to the initially synthesized cDNA. Presumably the mRNA is damaged by RNAase activity, possibly the intrinsic RNAase H activity of reverse transcriptase (Mölling et al., 1971).
(c) The minus reaction
The minus reaction was investigated in detail using the standard reaction conditions or slight variations of these (see Materials and Methods, section (b) (ii)) to obtain a wide spectrum of products of different length. The results are shown in Figure 5(a) and (b) and in Table 4, lines (iii) and (v). The Table is constructed by noting, for each product length, the most prominent band in the four minus reactions. Where ambiguities arise because bands appear in more than one minus reaction, the alternatives are recorded by a capital letter (C, A, G or T) if in about equal yields, or by a
TABLE 4 (part)
(vi) Fig. 6(b), minus (actinomycin)
(vii) Fig. 8(b), minus (90% formamide)
AAAGGAACAAAACAGCACATTCACAAGAC--AATTTCTTTGG()T t GACC-.ATTGCATATGG-TACA
FIG. 4. Acrylamide gel (12%) electrophoresis of minus reactions of annealed (-C, -A, -G and -T) and non-annealed (-c, -a, -g and -t) cDNA using short run normal conditions (Table 3). "0" is a control, where no minus reaction was carried out. The numbers refer to the positions of residues read off from the gel, see Table 4, line (ii). Only a part of the gel is shown and the "true" origin is above the top of the Figure.
lower case letter (c, a, g or t) if in lower yield. Minor bands, defined as those present in less than 25% (estimated by visual inspection of the autoradiograph) of the major one, were ignored. Interpretation of the gel patterns was initially quite difficult, especially where bands were weak in intensity. Also, as the resolution of the individual nucleotide
FIG. 5. Acrylamide gel electrophoresis of standard minus reactions: (a) is a longer transcript than in Fig. 4, allowing the sequence to be read from residues 23 to 111 (see Tables 4 and 7, line (iii)); (b) is another gel giving the sequence of residues 20 to 135 (see Tables 4 and 7, line (v)); (c) is a variation of the minus reaction using the Klenow fragment of DNA polymerase I. Numbers show the positions of various residues in the sequence. "0" is a control (no minus reaction) and "Kl" refers to the minus reaction performed with the Klenow fragment. The conditions of electrophoresis were intermediate between the "short" and "intermediate" conditions of Table 3 at normal temperatures. Only a part of the gel is shown.
lengths decreased with increasing chain length, it was progressively more difficult to decide if bands in the four minus reactions corresponded either to the same or different-sized products. Nevertheless, the resolution is good enough to distinguish the sequence corresponding to the known poly(A)-adjacent sequence (in line (i) of Table 4). By comparing this with the individual results in lines (ii) to (v) (note that the raw data for line (iv) are not shown) it can be seen that there are six problem areas where, if the sequences were unknown, ambiguities might arise. These areas are noted in Table 5 with various experiments (see below) which were successfully used to correct the ambiguities on the gel.

TABLE 5
Ambiguities in the minus method

Problem areas (see Table 4)                                        Solution
(i)   Residues 38 to 42                                            Actinomycin D, or room temperature, or Klenow minus
(ii)  Compression (68)                                             Run high amperage gels or 90% formamide gels
(iii) Residues 56 and 68 (third member of C-C-C and G-G-G run)     Plus system
(iv)  Residue 34                                                   Klenow minus
(v)   Residue 60                                                   Partly solved by Klenow minus
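The convention used to score the minus gels for Table 4 (most prominent band at each length, capital or lower case letters for alternatives, bands below 25% of the major one ignored) can be written as a short rule. The sketch below is hypothetical: the function name, the "/" separator and in particular the 0.8 cut-off for "about equal yields" are our own assumptions, since the text gives no numerical definition of "about equal", and intensities on the real gels were judged by eye.

```python
# Sketch of the band-scoring convention; thresholds partly assumed.

def score_position(intensities, ignore_below=0.25, equal_above=0.8):
    """Score one product length from the four minus lanes.

    intensities maps 'C', 'A', 'G', 'T' to a relative band intensity.
    ignore_below follows the 25% rule in the text; equal_above is an
    assumed stand-in for "about equal yields".
    """
    top = max(intensities.values())
    if top <= 0:
        return "-"                  # no band: recorded as a space (dash)
    calls = []
    for base in sorted(intensities, key=intensities.get, reverse=True):
        ratio = intensities[base] / top
        if ratio < ignore_below:
            continue                # minor band, ignored
        calls.append(base.upper() if ratio >= equal_above else base.lower())
    return "/".join(calls)


clear_call = score_position({"C": 1.0, "A": 0.1, "G": 0.0, "T": 0.0})
ambiguous = score_position({"C": 1.0, "A": 0.9, "G": 0.0, "T": 0.3})
missing = score_position({"C": 0.0, "A": 0.0, "G": 0.0, "T": 0.0})
```

With these assumed thresholds a single strong band is scored as one capital letter, two comparable bands as two capitals, and a weaker third band as a lower case alternative.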
(d) Correction of ambiguities in the minus reaction
(i) Residues 38 to 42
One of the causes of the difficulties in this region was related to the presence of multiple wrong bands in the minus G slot (see Table 4), coupled with a failure of the synthesis to stop at positions 38 and 40. Although G was observed at position 38, it was always very faint. In order to test whether a misincorporation was causing the difficulty, perhaps by a "loop-back" phenomenon, the effect of actinomycin D was tested on the minus G reaction, as it is known to inhibit the formation of double-stranded DNA (Ruprecht et al., 1973). Figure 6(a) (slot labelled "-G, Act") shows the effect of adding actinomycin D at a final concentration of 50 µg/ml on a minus G reaction compared with a standard reaction in duplicate (labelled -G). The pattern of bands is considerably altered and correct bands now appear at positions 38 and 40. A band also appears at position 39, however, showing that the difficulty was not completely solved. Interestingly, a correct pattern could also be achieved even in the absence of actinomycin if the reaction was carried out at room temperature (slot labelled "-G, RT"). Another solution is the use of the Klenow fragment of DNA polymerase instead of reverse transcriptase (see below). The difficulty may thus be caused by loop-back such as is shown in Figure 7. The sequence C-A-A-A-A (residues 33 to 37) is presumed to fold back onto the pT10-G primer sequence. This would then allow incorporation of six A residues to pair with the first six T residues of the primer. While we have no direct evidence that this actually occurs, there is suggestive evidence that the incorrect -G bands are due to synthesis of a different sequence, in that the incorrect bands (labelled g41-g43) do not quite correspond in position to correct bands (e.g. 41 or 42) in other slots (see Fig. 6). The reason why this reaction is
FIG. 6. Acrylamide gel electrophoresis of various minus reactions carried out with actinomycin present. (a) The effect of actinomycin D (labelled "-G, Act") and room temperature incubation (labelled "-G, RT") on the -G reaction. For comparison the standard minus reactions (-C, -A, -G, in duplicate, and -T) were included as controls. (b) All 4 minus reactions carried out in the presence of actinomycin D. The sequence is recorded in Table 4, line (vi). Short, normal run conditions were used (see Table 3) and the position of various residues is marked (see text and Table 4).
inhibited at room temperature may be related to the lower activity of RNAase H at lower temperatures. If the template strand remains more or less intact, then presumably a loop-back cannot occur. Figure 6(b) shows the standard method carried out in the presence of 100 µg actinomycin per ml (a higher concentration than in Fig. 6(a)) for all four minus reactions and the results are shown in Table 4, line (vi). No difficulty arose in reading the sequence 38 to 43 correctly, and the additional difficulty associated with a C band at position 40 was also solved.
FIG. 7. A possible "loop-back" accounting for the misincorporation in the residues 38 to 43. "a" marks the presumed wrong incorporation of A residues.
(ii) Compression at residue 68
On the standard gels the sequence reads T-G-G-T in this region instead of T-G-G-G-T, the correct sequence (see Table 4). There were two problems here: the first is the "compression" of sequence, which might allow incorrect reading were the sequence unknown; the second, which is distinct from the compression, concerns the absence of a band corresponding to the third G residue. However, this cannot be known until the compression is corrected, so that this second point will be discussed below. Figure 8(a) shows the effect of fractionation on a high amperage, or hot, acrylamide gel, whilst in Figure 8(b) and Table 4, line (vii), are shown results with a 90% formamide gel in an attempt to overcome compression. The distances between bands were measured for both gels in the region of residues 65 to 70 and are recorded in Table 6 with, for comparison, the distances measured for a standard gel (Fig. 5(a) was chosen). The distance between G(67) and T(69) was in both the hot gels and the formamide gels about twice that of a single residue, suggesting the sequence T-G-G-N-T (N indicating any nucleotide) was present. Compression of bands has been noted before (Fiddes, 1976) and is associated with regions of hairpin loops. Such a loop had been suggested for this region of ovalbumin mRNA (Cheng et al., 1976) and the equivalent hairpin loop of the cDNA is shown in Figure 9. It is possible that this structure may be only partly denatured in 8 M-urea, perhaps disrupting the three A·T base-pairs, but leaving the three G·C pairs intact. If this is so, it is reasonable to imagine that cDNA molecules terminating at the point where a hairpin loop either just fails or just succeeds in forming might have anomalous mobilities. For example, cDNA molecules up to 66 residues long may fail to form a hairpin loop, whereas those 67 or 68 residues long may succeed. Presumably the resultant compression is corrected by melting of the hairpin loop at

TABLE 6
Distance on gels (mm) between nucleotides 65 and 70

Residue no.   Nucleotide   Standard gel (Fig. 5(a))   High amperage gel (Fig. 8(a))   90% formamide gel (Fig. 8(b))
65 66 66
T G G
2.4 3.0 3.0
0.5 i.0 3.0
2.5 2.5 2.5
68 67 69 70
N† G T A
2.8
5.0
4.8
:
2.3
-:

† Band not seen on gel.
‡ No measurement possible as A(70) was not present on the gel.
FIG. 8. Special gel electrophoresis conditions to overcome "compression". (a) A -G reaction bracketed by -T reactions run on a hot gel (condition 2, Table 3). Note that the hot run has caused some curvature of the bands. "-G, 50" refers to a partial minus reaction which included dGTP at a concentration of 50 nM. (b) All 4 minus reactions separated on a 90% formamide gel (see Materials and Methods). The -G(1) slot was a shorter photographic exposure of the -G slot, mounted side by side with it, to show the separation of bands 66 and 67. A(70) is rather faint; "f" shows a frontal effect on the electrophoresis believed to be caused by too high a concentration of buffer in the reservoir solutions (see Materials and Methods). Only a part of the gel is shown.
higher temperatures in the hot 8 M-urea gels or by the denaturing effect of 90% formamide.
(iii) Runs: C-C-C (residues 54 to 56) and G-G-G (residues 66 to 68)
In general, the minus method gave runs of nucleotides. For example, a run of four A residues is present (residues 34 to 37). But C(56) and G(68), both third members of a run, were never observed and are recorded as a space (dash) in Table 4. In the case of G(68) an experiment was carried out in the presence of 50 nM-dGTP instead of minus G, followed by fractionation on a high amperage gel (Fig. 8(a), slot G50), in an attempt to overcome this problem. However, no band, only a space, was present for G(68). We conclude that the problem here and also with C(56) (experiment not shown) is not related to the availability of substrate. Presumably these products fail to accumulate in significant yields because of differential rates of synthesis, caused most likely by the same hairpin loop in the mRNA that causes the compression
phenomenon (see Fig. 9). Thus, if the rate of addition of G(67) were slow (because of the need to unwind a helix in the mRNA) compared to the rate of addition of G(68), then the third G band (68) might never be visible. The same argument can be used to explain the absence of C(56). Direct evidence for the presence of C(56) and G(68) was therefore lacking in the minus reactions, although it was seen later in the plus reactions.
(iv) Residue 34: use of Klenow subfragment of DNA polymerase I in the minus reaction
Residue 34 was ambiguous, showing either a C or an A residue, the latter being correct. This indicated that synthesis proceeded one residue beyond the correct stop position (at residue 33) in the minus C reaction. A possible reason for this is a misincorporation of a T residue by the formation of a G-T wobble pair as illustrated below.
mRNA:                            (5') U-U-U-U-G-U-U-C-C . . . poly(A) (3')
cDNA, correct minus C product:   (3')           A-A-G-G . . . T(n)p (5')
cDNA, misincorporation:          (3')         T-A-A-G-G . . . T(n)p (5')
This artefact was not corrected by the use of actinomycin D (Table 4, line (vi)) but was absent when the minus reaction was performed with the Klenow fragment of DNA polymerase I (see Materials and Methods). The results of this variation of the minus reaction are shown in Figure 5(c) and Table 4, line (viii). In addition, the results show that residues 38 to 41 are also read correctly. There are, however, many unassigned residues and some other incorrectly assigned residues with this variation. The principal reason for this is that runs of nucleotides were usually absent, only the first member of a run being in good yield. For example, T-T (45 and 46) always shows as a clear doublet in the standard minus reaction, but only T(45) is present with the Klenow fragment. A subsidiary difficulty arose because of the presence of faint satellite bands one residue shorter than a correct band, giving rise in some cases to wrong assignments (e.g. G(28) and G(37)). Presumably the correct result at residue 34 is due to the 3' exonuclease activity of the enzyme correcting mistakes in incorporation. However, the problems of this method, the lack of runs and the satellite bands, may also be the result of this same activity.
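The reading rule behind the minus reaction, and the way a G-T wobble misincorporation shifts a call by one residue, can be sketched in a few lines of Python. This is an idealised illustration only, not part of the original procedure: the function names are hypothetical, the mRNA fragment is the one drawn above (taken here to span residues 37 to 29, counting from the poly(A)), and band intensities are ignored.

```python
# Idealised model of the minus reaction (illustrative assumption, not the authors' code).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}

def cdna_from_mrna(mrna_5to3, last_residue):
    """Map residue number -> cDNA base for an mRNA fragment written 5'->3'.

    `last_residue` is the number (counting from the poly(A)) of the
    3'-most base of the fragment; the cDNA grows towards higher numbers.
    """
    return {last_residue + i: COMPLEMENT[b]
            for i, b in enumerate(reversed(mrna_5to3))}

def minus_products(cdna, missing):
    """Product lengths at which synthesis halts when `missing` is left out.

    Each product stops one residue before a position that needs the
    missing nucleotide, so a band of length n is read as `missing` at n + 1.
    """
    return sorted(n - 1 for n, base in cdna.items() if base == missing)

cdna = cdna_from_mrna("UUUUGUUCC", last_residue=29)
stops = minus_products(cdna, "C")
print("cDNA 29-37:", "".join(cdna[n] for n in sorted(cdna)))
print("minus C product ends at:", stops, "-> C read at", [n + 1 for n in stops])

# A T misincorporated opposite G(33) lets the product end one residue
# further on, so the band is misread as a C at the following position.
wobble_stops = [n + 1 for n in stops]
print("with G-T wobble, C misread at:", [n + 1 for n in wobble_stops])
```

Under these assumptions the misread position comes out as residue 34, which is where the text reports the C/A ambiguity.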
This residue was ambiguous in the standard minus reaction, showing a preference for G but also giving T and A, the correct residue being A (see Table 4).
of the minus reaction, to sequence longer products
Table 7, lines (iii) and (iv), shows the results from two standard acrylamide gels previously discussed (see Fig. 5(a)) recording the sequence read beyond residue 75. In addition, with longer times of fractionation in which the shorter products were run off the bottom of the acrylamide gel (see Fig. 10), the sequence could be read with only a few uncertainties to residue 191 (Table 7, lines (x) and (xi)). Although the resolution of the larger products was much improved by the long fractionation, it was difficult to establish lengths of runs of nucleotides. Thus there is uncertainty at residues 129 (A run) (not seen in Fig. 10 but present in other experiments), 135 (G run) and 188 (T run), although in these cases the most likely length of run is recorded in the finally deduced sequence (Fig. 11). There is no band at positions 160 and 175. Position 160 is likely to be a purine, and 175 almost certainly a G residue. Position 177 is ambiguous, giving mainly A in one experiment and G in another. The region of sequence 76 to 191, although previously unknown, was in some ways simpler to interpret than residues 20 to 75, possibly because there seemed to be no evidence of hairpin loop structures. Nevertheless, it was considered essential to develop an independent plus system to cross-check the sequence.
(f) The plus system
The plus system of Sanger & Coulson (1975) on φX DNA relies on the 3' exonuclease activity of bacteriophage T4 DNA polymerase to degrade molecules until fragments are produced terminating in the same nucleotide as is present as the single deoxynucleoside triphosphate in the reaction. Thus in the presence of dATP, all fragments produced by the presence of T4 polymerase end in A residues. Attempts at using T4 polymerase with hybrids of ovalbumin mRNA and cDNA were unsuccessful. Exonuclease activity was apparent but the band pattern was unrelated to the single
RO
Loo
110
c t CAAACAA~~TG~ATAGT-ATATA-CATA--TaCTACATA-CATGA~TG-G-~-AG-TA
t ---AC~A--TG-ATAGT~A~A~A-~~TATC t 53g G
c T gg GAAAC-A~~TC~.~TAGT-ATA-A-CATATCCCACATA-CATGGA~*TA-~*T~A-~~A~~~-AG-T*-~~~A~~~~-~~-A--T~~~A~A~~~T
~~--C&CATTTGABTAGTAAT~TACCATATTTGCCAA~AT~GCATGATTGAGTCA,,GTTAG--AGAATGTG
A,--C~ATTTGAATAGTAATATACCATATTTGCGAACATAGGATGATTGAGTGAAGTTAG
A,,-~C~ATTTCAATAGTAATATACCATATTTGCCAA
80 c AAAACAATTTGAATAGTAATATACCATA
76
G
120
t
150
GTTAGC-A-AATGTGAATTATAAC-TTTT-ATA-TG~ATCTTAG-A~T~AGGAGAG~’~’~G’~
140
CAAGT-AG--A-AATGTGAA~TATAAGATTTT~ATAATGCA~TCTTAG-AGTCAGGACAGCT,.,GT
130
100
170
180
190
FIG. 10. Long fractionation of a standard minus reaction. Normal long run conditions were used (see Table 3), running the shorter products off the bottom of the gel. The sequence was read from residues 131 to 191 (see Table 7, line (x)).
deoxynucleoside triphosphate present in the reaction. Similarly, attempts at using a mixture of a 3' exonuclease and a polymerase (exonuclease III of E. coli and reverse transcriptase) in the same reaction failed. Preliminary experiments showed that the Klenow fragment of DNA polymerase I of E. coli, in the presence of Mn2+, could produce specific band patterns with each
of the deoxynucleoside triphosphates. The first experiments at pH 7.8 gave rather too complex a band pattern, suggesting that the degradative activity was too weak. This was somewhat improved by incubating at higher pH, in glycine/NaOH buffer at pH 9.2, although now there were too few bands present (results not shown). Therefore, a number of experiments were carried out to optimize the reaction conditions. The reaction time, the temperature (whether 20°C or 37°C), and the concentration of the single deoxynucleoside triphosphate all had significant effects on the band pattern obtained. Mn2+ was absolutely required (no reaction occurred if it was omitted) but no differences were noted over the range of concentration from 0.2 to 1.2 mM. From these results two standard conditions were chosen (see Materials and Methods), differing only in the concentration of the single deoxynucleoside triphosphate present. Figure 12 shows this plus reaction using an intermediate length fractionation, the results being recorded in Tables 4 and 7, along with two other plus reactions (figures not shown). Usually, but not always, runs of nucleotides were absent, making the gels harder to read than the minus reactions. For example (see Table 7, lines (xii) to (xiv)) only T(85) of the T-T-T sequence (83 to 85) was present in the plus reaction. The results in the Table derive mainly from the 10 µM plus condition, but some bands which are stronger in the 200 µM condition are included. Obvious artefacts, one residue longer than the correct band, were omitted unless they were significant in both reactions. The plus reaction was particularly valuable in checking the uncertainties in the minus reaction. It is most effective in checking runs. Thus for the known part of the sequence, C(56) and G(68), the third members of the C-C-C and of the G-G-G run, were confirmed (Table 4). Turning to the new region of sequence (76 to 191), A-A-A-A (76 to 79) (rather than A-A-A) was confirmed, as individual members of the run could be seen, despite the difficulty of the artefact C and G at position 76. G(137) was confirmed also, although it must be noted that no information as to length of runs is present here. Residue 81 (ambiguous in the minus reaction) was clearly an A, as was residue 115. Some regions of the plus reaction, e.g. 85 to 94 and 117 to 126, were particularly clear, while others were more difficult and artefacts occurred (residues 100 and 101). Very long fractionations to confirm the sequence from residue 160 onwards were not attempted.
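How the plus reaction complements the minus reaction at a run can be illustrated with a small sketch. It assumes an idealised model in which degradation stops at the supplied nucleotide and any run of that nucleotide is refilled to its last member, so that only the final residue of a run gives a plus band; real gels, as noted above, do not always behave this way. The function names are hypothetical, and the mRNA fragment C-A-A-A is taken here to be residues 86 to 83.

```python
# Idealised plus/minus reading of a run (assumed model, for illustration only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}

def cdna_from_mrna(mrna_5to3, last_residue):
    """Map residue number -> cDNA base; the 3'-most mRNA base is `last_residue`."""
    return {last_residue + i: COMPLEMENT[b]
            for i, b in enumerate(reversed(mrna_5to3))}

def minus_calls(cdna, missing):
    """Residues read as `missing` in an ideal minus lane (every member of a run)."""
    return sorted(n for n, base in cdna.items() if base == missing)

def plus_bands(cdna, present):
    """Residues at which ideal plus products end when only `present` is supplied.

    A product ending inside a run of `present` is extended to the end of
    the run, so only the last member of each run is expected as a band.
    """
    return sorted(n for n, base in cdna.items()
                  if base == present and cdna.get(n + 1) != present)

cdna = cdna_from_mrna("CAAA", last_residue=83)   # cDNA T(83)-T(84)-T(85)-G(86)
print("minus T calls:", minus_calls(cdna, "T"))
print("plus T bands: ", plus_bands(cdna, "T"))
```

In this model the minus lane reports all three T residues while the plus lane marks only T(85), the pattern described for this run in the text.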
FIG. 12. Plus reactions (slots +C, +A, +G and +T). [Caption incomplete in source.]
4. Discussion
(a) The method
The minus and plus gel procedure for sequencing the 3' end of ovalbumin mRNA described here is a significant improvement on previous methods based on degradation with endonuclease IV (Cheng et al., 1976). There, the main problem was the fractionation of the complex mixture of fragments. Each purified oligonucleotide
had to be individually eluted and analysed for its sequence by further digestion with enzymes, and further fractionation. Finally, longer products had to be isolated which overlapped the products already analysed, this proving to be particularly difficult in some cases (Proudfoot, 1976). The new gel method is extremely sensitive, requiring as little as 0.01 µCi of cDNA (as compared to 1 µCi for the previous endonuclease IV method) and the whole experiment can be carried out in one to two days. The sequence of 172 nucleotides was established by this method (from residue 20 to 191 counting from the poly(A)), although it was necessary to run 13 gels (Tables 1 and 7) to obtain this sequence. The minus method using reverse transcriptase is more reliable than the plus method, although the latter is an important check on the validity of the results of the minus method.
Szostak et al. (1977) and Haseltine et al. (1977) have described another new approach to sequencing RNA involving the synthesis of 5' end-labelled ([32P]phosphate) cDNA using a primer molecule and reverse transcriptase. The labelled cDNA was then subjected to sequence determination using the new Maxam & Gilbert (1977) gel procedure. This approach is more time-consuming than the RNA sequence method described here. Moreover, it depends on the degradation of DNA labelled at a single terminal position. Consequently, this method is much less sensitive than the rapid RNA method described here. This is an important consideration when working with biological material such as mRNA, which is difficult to prepare in large amounts.
One limitation of the method is that the sequence up to residue 20 from the primer was absent. This is not too serious, as conventional methods using limited synthesis reactions are quite successful for this region (Cheng et al., 1976). Nevertheless, it is possible that a sequence closer to the primer could have been established had Sephadex G25 rather than G100 been used for the gel filtration step (see Materials and Methods, section (b)(iii)), thus in all probability avoiding the loss of fragments. Another, more significant limitation was the problem of sequence accuracy. This has been discussed by Sanger et al. (1977) in their sequence studies of φX DNA. There a sequence is not regarded as proven entirely from gel results. Other data, such as depurination data of both the sense (+) and nonsense (-) strand of the DNA, are regarded as necessary to ensure 100% accuracy. In our study of ovalbumin mRNA the sequence has been established from residue 1 to 75 by previous methods but beyond that it remains unchecked by traditional sequencing methods. Nevertheless, the good agreement between the minus and the plus results suggests the sequence is greater than 95% accurate and probably entirely correct, at least to residue 130. Beyond that, loss of resolution of individual nucleotides meant that it was more difficult to resolve runs of nucleotides. There were also two uncertainties caused by the absence or low yield of products, so that the sequence for residues 131 to 191 must be regarded as more tentative and the accuracy is judged to be in the range 90% to 95%. Presumably this sequence could be checked by other gel methods (Maxam & Gilbert, 1977) when the double-stranded DNA corresponding to this region of ovalbumin mRNA has been purified. Other procedures for checking this sequence, for example by analysing the products of endonuclease IV digestion or by depurination analysis, are relatively cumbersome, and are unlikely, without great effort, to provide a comprehensive cross-check.
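The figures quoted in this section can be checked with simple arithmetic. The sketch below is only a worked illustration: it counts residues inclusively and converts the stated 90% to 95% accuracy range for residues 131 to 191 into an expected number of wrong residues, on the simplifying assumption (ours, not the authors') that errors are spread evenly over the region. The helper names are hypothetical.

```python
def residue_count(first, last):
    """Number of residues from `first` to `last`, inclusive."""
    return last - first + 1

def expected_errors(n_residues, accuracy_low, accuracy_high):
    """Range of expected miscalled residues for a stated accuracy range."""
    return (n_residues * (1 - accuracy_high), n_residues * (1 - accuracy_low))

total = residue_count(20, 191)        # region read by the gel method
tentative = residue_count(131, 191)   # region described as more tentative
few, many = expected_errors(tentative, 0.90, 0.95)

print(f"residues 20 to 191: {total} nucleotides")
print(f"residues 131 to 191: {tentative} nucleotides, "
      f"about {few:.1f} to {many:.1f} expected errors")
```

The inclusive count reproduces the 172 nucleotides stated in the text; the error range is an estimate only and says nothing about where any errors lie.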
Undoubtedly the minus and plus method described here is capable of improvement. The plus method is rather dependent on the reaction conditions, and a less critical procedure would be an advantage. Similarly, it would be useful if other gel fractionation methods avoiding the local compression phenomenon could be developed. Unfortunately, the resolution of the 90% formamide gels used here was not quite as good (in the region of residues 75 to 191) as the standard 8 M-urea gels, so they were not used routinely.
(c) Comparison with the Sanger & Coulson (1975) method
The new method is based on similar principles to the plus and minus method of Sanger & Coulson (1975) but differs significantly in detail, because an RNA rather than a DNA template is used. Thus reannealing of the template RNA strand is necessary here and reverse transcriptase is used both for the original preparation of 32P-labelled cDNA and for the minus reaction, whereas DNA polymerase I is used for both these steps in the Sanger & Coulson method. For the plus reaction, DNA polymerase I (Klenow fragment) is used here instead of T4 polymerase. This ability of DNA polymerase I (E. coli) to degrade the DNA strand of a strict DNA:RNA duplex has not been described before. Like the polymerase activity with RNA templates (Proudfoot & Brownlee, 1974), it is strictly dependent on Mn2+. We presume that the known 3' exonuclease activity (Lehmann & Richardson, 1964) described for DNA duplexes is responsible for this activity. This is supported by the

Residues 191 to 161: A-C-A-A-G-C-U-C-U-C-C-U-G-A-Y-U-C-C-U-A-A-G-A-U-G-C-A-U-U-A-U- (AluI and HinfI sites marked)
Residues 160 to 131: Y-A-A-A-A-U-C-U-U-A-U-A-A-U-U-C-A-C-A-U-U-C-U-C-C-C-U-A-A-C- (EcoRI* site marked)
Residues 130 to 101: U-U-G-A-C-U-C-A-A-U-C-A-U-G-G-U-A-U-G-U-U-G-G-C-A-A-A-U-A-U-
Residues 100 to 71: G-G-U-A-U-A-U-U-A-C-U-A-U-U-C-A-A-A-U-U-G-U-U-U-U-C-C-U-U-G- (EcoRI* site marked)
Residues 70 to 41: U-A-C-C-C-A-U-A-U-G-U-A-A-U-G-G-G-U-C-U-U-G-U-G-A-A-U-G-U-G-
Residues 40 to 11: C-U-C-U-U-U-U-G-U-U-C-C-U-U-U-A-A-U-C-A-U-A-A-U-A-A-A-A-A-C-
Residues 10 to 1: A-U-G-U-U-U-A-A-G-C-poly(A)
FIG. 13. 3'-End sequence of ovalbumin mRNA. Residues 131 to 191, counting back from the poly(A), are tentative. Y is pyrimidine. Restriction sites for AluI, HinfI and EcoRI* are shown, as is the sequence A-A-U-A-A-A, which is common to other mRNA species (Proudfoot & Brownlee, 1976).
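The restriction sites and base composition read off Figure 13 can be recomputed from the sequence itself. The sketch below is an illustration under stated assumptions: the sequence string is transcribed from the figure as printed here (any transcription error carries through), the recognition sequences are the DNA specificities quoted in the Results (A-G-C-T, G-A-N-T-C, A-A-T-T) written in the RNA alphabet, and the two Y residues are counted in the length but not as A or U. The function names are hypothetical.

```python
import re

# Fig. 13 sequence, written 5'->3'; the last base is residue 1 (next to the poly(A)).
MRNA = (
    "ACAAGCUCUCCUGAYUCCUAAGAUGCAUUAU"   # residues 191-161
    "YAAAAUCUUAUAAUUCACAUUCUCCCUAAC"    # residues 160-131
    "UUGACUCAAUCAUGGUAUGUUGGCAAAUAU"    # residues 130-101
    "GGUAUAUUACUAUUCAAAUUGUUUUCCUUG"    # residues 100-71
    "UACCCAUAUGUAAUGGGUCUUGUGAAUGUG"    # residues 70-41
    "CUCUUUUGUUCCUUUAAUCAUAAUAAAAAC"    # residues 40-11
    "AUGUUUAAGC"                        # residues 10-1
)

# DNA recognition sequences; N stands for any nucleotide.
SITES = {"AluI": "AGCT", "HinfI": "GANTC", "EcoRI*": "AATT"}

def site_positions(mrna, site):
    """Residue number (counting from the poly(A)) of the 5'-most base of each match."""
    pattern = site.replace("T", "U").replace("N", ".")
    # A lookahead is used so that overlapping matches would also be reported.
    return [len(mrna) - m.start() for m in re.finditer(f"(?={pattern})", mrna)]

def au_fraction(mrna):
    """Fraction of residues that are A or U (ambiguous Y residues not counted)."""
    return (mrna.count("A") + mrna.count("U")) / len(mrna)

for name, site in SITES.items():
    print(name, site_positions(MRNA, site))
print(f"A+U content: {100 * au_fraction(MRNA):.0f}%")
```

Positions are reported for the mRNA strand; a site on the corresponding double-stranded DNA occupies the same residues.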
Protein: Phe-Phe-Gly-Arg-Cys-Val-Ser-Pro-COOH
mRNA: (5') U-U-Y-U-U-Y-G-G-N-C-G-N-U-G-Y-G-U-N-A-G-Y-C-C-N-U-R-R (3') (alternative codons marked in the figure: A-G-R and U-C-N)
cDNA: (5') Y-Y-A-N-G-G-N-C-A-N-A-C-R-C-A-N-C-T-N-C-C-R-A-A-R-A-A (3')
FIG. 14. Predicted DNA sequence corresponding to the carboxyl-terminal region of ovalbumin. Note the polarity of the mRNA and DNA sequence, the correspondence between them being shown by the broken line. Y is a pyrimidine, R a purine and N any nucleotide.
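The comparison between the predicted coding sequence and the sequenced region can be sketched as a degenerate pattern search. This is an illustration only: it uses the standard genetic code for the eight carboxyl-terminal residues of Figure 14 followed by a termination codon, allows both codon families for Arg and Ser, and searches the Figure 13 sequence as transcribed here. The names are hypothetical, and the ambiguous Y residues in the mRNA are allowed to match either C or U.

```python
import re

# Fig. 13 sequence, 5'->3'; the last base is residue 1 (next to the poly(A)).
MRNA = (
    "ACAAGCUCUCCUGAYUCCUAAGAUGCAUUAU"
    "YAAAAUCUUAUAAUUCACAUUCUCCCUAAC"
    "UUGACUCAAUCAUGGUAUGUUGGCAAAUAU"
    "GGUAUAUUACUAUUCAAAUUGUUUUCCUUG"
    "UACCCAUAUGUAAUGGGUCUUGUGAAUGUG"
    "CUCUUUUGUUCCUUUAAUCAUAAUAAAAAC"
    "AUGUUUAAGC"
)

# Degenerate codons (standard genetic code); Y = C/U, R = A/G, N = any.
CODONS = {
    "Phe": ["UUY"], "Gly": ["GGN"], "Arg": ["CGN", "AGR"], "Cys": ["UGY"],
    "Val": ["GUN"], "Ser": ["UCN", "AGY"], "Pro": ["CCN"],
    "Stop": ["UAA", "UAG", "UGA"],
}
# Character classes; a Y in the mRNA may stand for C or U, so it is accepted by both.
CLASSES = {"A": "A", "G": "G", "C": "[CY]", "U": "[UY]",
           "Y": "[CUY]", "R": "[AG]", "N": "[ACGUY]"}

def codon_regex(codon):
    """Regular expression matching one degenerate codon."""
    return "".join(CLASSES[b] for b in codon)

def peptide_regex(peptide):
    """Regular expression matching any mRNA that could encode `peptide`."""
    return "".join("(?:" + "|".join(codon_regex(c) for c in CODONS[aa]) + ")"
                   for aa in peptide)

C_TERMINUS = ["Phe", "Phe", "Gly", "Arg", "Cys", "Val", "Ser", "Pro", "Stop"]
match = re.search(peptide_regex(C_TERMINUS), MRNA)
print("coding sequence found in the 191 residues:", match is not None)
```

A negative result here only shows that the full pattern is absent from the transcribed sequence; it does not replace the comparison of individual obligatory residues made in the text.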
fact that it is more active in glycine/NaOH buffer at pH 9.2 than in Tris-HCl at pH 7.8. It is instructive to compare the quality of the gels with those of Sanger & Coulson (1975). One advantage here was that only occasional bands were missing in the minus reaction. In particular, individual members of runs of a single nucleotide were usually present. This was the main reason why gels could be read up to 191 residues from the primer, which is further than has been previously reported. We also found that special reaction conditions for the randomisation of the initial transcript (emphasised by Sanger & Coulson) were unnecessary, as a sufficiently random-sized copy was obtained with the standard reaction conditions. Moreover, the repair synthesis of the minus reaction, as well as the degradative plus reaction, further evened out the spread of radioactivity on the gels. The main factor responsible for the residual uneven band intensity was the hairpin loop (Fig. 9), which resulted in very strong bands immediately before this region (on the 5' side) and rather weak bands immediately beyond (on the 3' side). This loop was also responsible for several difficulties in our method, resulting in artefacts or lack of bands. These difficulties were reproducible (as was another artefact possibly related to loop-back occurring) and overall they probably occur at a higher frequency than in the Sanger & Coulson method. However, some of these artefacts can be avoided or minimized by varying the conditions of the minus reaction. In particular, the use of actinomycin D or the use of DNA polymerase I was instructive for the previously known sequence (see Results) but they were not necessary in establishing the new sequence (residues 76 to 191), as the standard minus reaction seemed to work well, possibly because of the absence of secondary structure. It should be noted that hairpin loops have caused severe problems with the Sanger & Coulson method also in one region of φX DNA (Fiddes, 1976).
(d) Significance of the 191-residue sequence at the 3' end of the mRNA
(i) No overlap with coding sequence
The sequence of the last 191 residues of ovalbumin mRNA is shown in Figure 13. As discussed earlier, the sequence is likely to be correct for residues 1 to 130, but for 131 to 191 the sequence is in the range of 90% to 95% accurate. (Two of the un-
a partial sequence predicted from a knowledge of the amino acid sequence (Thompson et al., 1971) and the genetic code for the carboxyl-terminal region of ovalbumin (Fig. 14). This may be conveniently illustrated by comparing the high density and relative positions of obligatory C residues shown by arrows in Figure 14, with the sequence shown in Figure 11, where C residues are uncommon. Furthermore, it is unlikely that a further more tentative sequence read from the gels for residues 192 to 250 (sequence not shown, but read from the results of Fig. 19) would also overlap with this carboxyl-terminal region. We conclude that the 3' non-coding region of ovalbumin mRNA is at least 250 residues long. This is much longer than the 3' non-coding regions of the β-globin mRNAs, which are 95 residues long for rabbit and 134 residues long for human mRNAs (Proudfoot, 1977); but this is perhaps not surprising as there may be as many as 800 non-coding residues in ovalbumin mRNA (Cheng et al., 1976).
(ii) Secondary structure
Previously, Cheng et al. (1976) noted that a stable hairpin loop could be drawn between residues 53 and 69 in ovalbumin mRNA. This was deduced from sequencing results but no direct evidence for the proposed secondary structure was available. Here we noted that a number of difficulties, e.g. compression, absence of residues 56 and 68 in the minus reaction, difficulty of residue 60, can all be correlated with the presence of this loop (Fig. 9). Moreover, direct evidence for its existence is available because of the accumulation of bands at positions 53 and 54 in the initial reverse transcript as well as in all four minus reactions, often as artefact bands (see Fig. 5(b)). Other loop structures can be drawn in other regions of sequence (Fig. 13) by maximising base-pairing. However, we have no direct evidence for their existence and, in any case, because of the very A-U rich nature of the sequence (68% A+U, calculated from Fig. 13) these are likely to have a low stability.
(iii) Restriction sites
Our sequence (Fig. 13) predicts that the part of the gene corresponding to this 3' non-coding region should contain sites for the restriction enzymes (Roberts, 1976) AluI, specificity (5') A-G↓C-T (3'), for HinfI, specificity (5') G↓A-N-T-C (3'), and for EcoRI*, specificity (5') ↓A-A-T-T (3'). This information should be useful in the future characterisation of clones for ovalbumin genes.
(e) Other applications of the minus and plus method: future work
The method described here has proved to be applicable to other mRNA molecules and other short deoxyoligonucleotide primers. Eight separate sequences have been read using the method, including internal regions of the mRNA in α- and β-globin mRNA (rabbit and man) and in immunoglobulin light chain mRNA (Proudfoot, 1977; Baralle, 1977a,b; Hamlyn et al., 1977). There is no reason to suppose it will not work with any RNA molecule for which a suitable specific primer with a defined 5' end can be synthesized. The need to synthesise primers is perhaps the main rate-limiting step in this work. Nevertheless, methods for the synthesis of oligonucleotides up to the decanucleotide level are reasonably standard (Agarwal et al., 1972) and improvements in methodology either by solid phase methods (Gait & Sheppard, 1976) or by improved synthetic methodology (Itakura et al., 1975; Gillam et al., 1974) will undoubtedly help.
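Choosing the sequence of a specific primer amounts to taking the reverse complement, in the DNA alphabet, of the RNA site to which it must anneal. The sketch below illustrates this for the 3' end of ovalbumin mRNA, where the sequence ends G-C-poly(A). It is a hedged example: the function name is hypothetical and the stretch of eight A residues is an arbitrary illustrative choice, not a statement of the primer length used in this work.

```python
# Complement of each RNA base in the DNA alphabet.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}

def primer_for(mrna_site_5to3):
    """Deoxyoligonucleotide (written 5'->3') complementary to an mRNA site (5'->3')."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(mrna_site_5to3))

# mRNA site: the last two residues before the poly(A), plus the first
# eight A residues of the poly(A) tract (eight is an assumption here).
site = "GC" + "A" * 8
primer = primer_for(site)
print("mRNA site (5'->3'):", site)
print("primer    (5'->3'):", primer)
```

The 3' terminal G-C of such a primer fixes its position at the poly(A) junction, which is what gives the cDNA a defined 5' end.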
Presumably the sequencing method will also work with longer primers such as one strand of a restriction fragment, or with tRNA primers (for specific viral RNAs). For these longer primers it would be desirable to cut off the primer before gel fractionation. This could be achieved for an RNA primer by alkaline hydrolysis and for a restriction fragment by recutting (after annealing to excess double-stranded DNA from which the restriction fragment was originally derived) with the restriction enzyme. The advantage of restriction fragments as primers is that a number of these could be isolated simultaneously and no prior knowledge of the RNA sequence is required as is the case with the synthesis of a complementary oligonucleotide. This would overcome the need for chemical synthesis of oligonucleotides and should make the method even more generally applicable.
REFERENCES
Agarwal, K. L., Yamazaki, A., Cashion, P. J. & Khorana, H. G. (1972). Angew. Chem. Int. Ed. Engl. 11, 451-459.
Baralle, F. E. (1977a). Cell, 10, 549-558.
Baralle, F. E. (1977b). Nature (London), 267, 279-281.
Barrell, B. G. & Clark, B. F. C. (1974). Editors of Handbook of Nucleic Acid Sequences, Joynson-Bruvvers, Oxford.
Billeter, M. A., Dahlberg, J. E., Goodman, H. M., Hindley, J. & Weissman, C. (1969). Nature (London), 224, 1083-1086.
Brownlee, G. G. (1972). In Laboratory Techniques in Biochemistry and Molecular Biology, (T. S. and E. Work, eds), Vol. 3, Part I, Determination of Sequences in RNA, pp. 1-265, North-Holland, Amsterdam.
Brownlee, G. G., Sanger, F. & Barrell, B. G. (1968). J. Mol. Biol. 34, 379-412.
Cheng, C. C., Brownlee, G. G., Carey, N. H., Doel, M. T., Gillam, S. & Smith, M. (1976). J. Mol. Biol. 107, 527-547.
Fiddes, J. C. (1976). J. Mol. Biol. 107, 1-24.
Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymaeckers, A., Van den Berghe, A., Volckaert, G. & Ysebaert, M. (1976). Nature (London), 260, 500-507.
Gait, M. J. & Sheppard, R. C. (1976). J. Amer. Chem. Soc. 98, 8514-8515.
Gillam, S., Waterman, K., Doel, M. & Smith, M. (1974). Nucl. Acids Res. 1, 1649-1664.
Haines, M. E., Carey, N. H. & Palmiter, R. D. (1974). Eur. J. Biochem. 43, 549-560.
Hamlyn, P. H., Gillam, S., Smith, M. & Milstein, C. (1977). Nucl. Acids Res. 4, 1123-1134.
Haseltine, W. A., Maxam, A. M. & Gilbert, W. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 989-993.
Holley, R. W., Apgar, J., Everett, G. A., Madison, J. T., Marquisee, M., Merrill, S. H., Penswick, J. R. & Zamir, A. (1965). Science, 147, 1462-1465.
Itakura, K., Katagiri, N., Narang, S. A., Chander, P. B., Marians, K. J. & Wu, R. (1975). J. Biol. Chem. 250, 4592-4600.
Lehmann, I. R. & Richardson, C. C. (1964). J. Biol. Chem. 239, 233-241.
Maxam, A. M. & Gilbert, W. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 560-564.
Molling, K., Bolognesi, D. P., Bauer, H., Busen, W., Plassmann, H. W. & Hausen, P. (1971). Nature New Biol. 234, 240-243.
Pinder, J. C., Staynov, D. Z. & Gratzer, W. B. (1974). Biochemistry, 13, 5373-5378.
Proudfoot, N. J. (1976). J. Mol. Biol. 107, 491-525.
Proudfoot, N. J. (1977). Cell, 10, 559-570.
Proudfoot, N. J. & Brownlee, G. G. (1974). FEBS Letters, 38, 179-183.
Proudfoot, N. J. & Brownlee, G. G. (1976). Nature (London), 263, 211-214.
Roberts, R. J. (1976). Crit. Rev. Biochem. 4, 123-164.
Ruprecht, R. M., Goodman, N. C. & Spiegelman, S. (1973). Biochim. Biophys. Acta, 294, 192-203.
Sanger, F. & Coulson, A. R. (1975). J. Mol. Biol. 94, 441-448.
Sanger, F., Brownlee, G. G. & Barrell, B. G. (1965). J. Mol. Biol. 13, 373-398.
Sanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A. R., Fiddes, J. C., Hutchison III, C. A., Slocombe, P. M. & Smith, M. (1977). Nature (London), 265, 687-695.
Szostak, J. W., Stiles, J. I., Bahl, C. P. & Wu, R. (1977). Nature (London), 265, 61-63.
Thompson, E. O. P., Sleigh, R. W. & Smith, M. B. (1971). Aust. J. Biol. Sci. 24, 525-534.