J. Mol. Biol. (1981) 145, 463-470
LETTER TO THE EDITOR
Isolation of a Xenopus laevis α-Globin Gene
We report here the construction of a Xenopus laevis gene library and the isolation and characterization by nucleotide sequence analysis of a clone containing an α-globin gene.
From this analysis it is clear that the general organization of the gene is similar to that in mammalian α-globins. The two introns are considerably larger than those of the mouse α-globin gene but interrupt the coding regions at precisely the same amino acid positions.
To construct the recombinant phage library, Xenopus DNA (from a Xenopus tissue-culture cell line) was partially digested with the restriction enzyme HindIII and ligated to the EK2 HindIII vector X-,ss as previously described (Proudfoot & Baralle, 1979; Baralle et al., 1980). The ligated DNA was then packaged in vitro (Blattner et al., 1978) and the resultant phage suspension amplified by one round of plating. The final library consisted of the pool of plaques obtained.
Initially the library was screened with 32P-labelled complementary DNA synthesized using poly(A) messenger RNA from either Xenopus oocytes or tissue culture cells as a template. About 15 to 20 plaques per 1000 screened positive by this procedure, indicating that a significant number of the plaques contained Xenopus DNA inserts. Examination of the DNA of some of these clones indicated that there was a large number of separate distinct clones, with the size of the inserts ranging from about 7 × 10^3 to 12 × 10^3 base-pairs.
Xenopus globin mRNA, for use in the identification of globin gene-containing clones, was isolated as follows. RNA was prepared from Xenopus reticulocytes 14 to 16 days after the last injection of phenylhydrazine (Thomas & MacLean, 1975; Battaglia & Melli, 1977). Cells were collected in cold amphibian Ringer solution containing 1 mg heparin/ml and washed by centrifugation and resuspension at 4°C in the same solution. The cells were lysed by addition of 2 volumes of a solution of 5 mM-MgCl2, 0.02% Nonidet P40. Nuclei were pelleted by centrifugation (10,000 g, 10 min). To the supernatant was added 2 volumes of a solution containing 50 mM-Tris·HCl (pH 7.8), 0.8% sodium dodecyl sulphate, 5 mM-EDTA, and 500 µg proteinase K/ml.
After digestion for one hour at 37°C RNA was purified by extraction with phenol/chloroform and precipitated with 2 volumes of ethanol. Globin mRNA was purified by two passages down a poly(U)-Sepharose column (Lindberg & Persson, 1972).
To isolate a globin gene from the Xenopus library 20 agar plates were prepared, each containing about 10,000 plaques. Two replica copies of each plate were made on nitrocellulose filters (Benton & Davis, 1977). These filters were hybridized with 32P-labelled globin cDNA probe, synthesized from the globin mRNA fraction described above. Three colonies hybridized specifically (each in duplicate). These
0022-2836/81/020463-08 $02.00/0
© 1981 Academic Press Inc. (London) Ltd.
G. A. PARTINGTON AND F. E. BARALLE
FIG. 1. Restriction enzyme digests and Southern (1975) analysis of the Xenopus genomic globin clones. Following restriction of 6 µg of DNA per clone with 1 unit of enzyme per µg DNA the samples were electrophoresed in a 1.5% agarose gel. The gel was stained with ethidium bromide, rinsed and
were replated by streaking onto a bacterial lawn and replica copies were again prepared on nitrocellulose filters, and rehybridized with the globin cDNA probe. All three again hybridized. After plaque purification, large liquid cultures were grown for isolation of phage DNA.
The arrangement of globin hybridizing sequence within the insert fragment was determined by restriction enzyme analysis. The digestion pattern of each of the clones with HindIII is shown (Fig. 1). All three clones isolated were cleaved in the insert region by HindIII. Furthermore, all three clones showed a similar pattern of bands on digestion with HindIII. This similarity in the digestion pattern of all three clones indicates that the insert sequences are not likely to be a fortuitous combination of non-contiguous sequences in the genome. The presence of additional bands not common to all of the clones confirms that the clones detected are each separate individual clones and did not arise from a single clone during the amplification procedure. Transfer of the digests from the agarose gel to a nitrocellulose filter, by the method of Southern (1975), and hybridization to either 32P-labelled globin cDNA (not shown) or a Xenopus globin cDNA clone B52 (Humphries et al., 1978) indicated that three bands in each clone were complementary to globin mRNA. The sizes of hybridizing fragments were 305 (± 10%) bp†, 625 (± 10%) bp and one or more fragments in the triplet set, all of which were about 1800 bp. The clone labelled λXG4 was arbitrarily selected for further analysis.
It is clear from hybridization of cDNA to restriction enzyme digests that two small HindIII fragments (305 and 625 bp) of λXG4 contain globin sequences. An analysis of the nucleotide sequence of these two fragments was undertaken to obtain more detailed information about the structure of the gene. The purified 305 and 625 bp fragments were end-labelled and re-digested with HaeIII (see legend to Fig. 2).
HaeIII cleaved the 305 bp fragment to generate end-labelled 85 and 220 bp fragments, whereas the 625 bp fragment yielded two end-labelled fragments of about 420 and 120 bp, respectively. These four HindIII/HaeIII fragments were eluted from the gel and subjected to the Maxam & Gilbert (1977) sequencing procedure (Fig. 2). The total number of nucleotides determined was 637, which includes sections of the coding sequence (exons), intervening sequences (introns) and the 3' untranslated sequence. The nucleotide sequence (Fig. 3) predicts an amino acid sequence homologous to amino acid residues 28 to 99 and 126 to 141 of the α-globins (Dayhoff, 1972). The amino acids are numbered assuming no extra amino acids in the regions not sequenced (as is the
† Abbreviation used: bp, base-pairs.
photographed under u.v. illumination. The DNA fragments were then transferred to nitrocellulose paper and hybridized to the 32P-labelled cDNA plasmid B52. The plasmid was nick-translated as described by Rigby et al. (1977) using [32P]dCTP as label (400 Ci/mmol) except that PO mM-HEPES (pH 7.4) replaced phosphate buffer. After extensive washing in 1 × SSC (SSC is 0.15 M-NaCl, 0.015 M-Na citrate), 0.1% sodium dodecyl sulphate at 65°C, hybridization to the blot was visualized by indirect autoradiography (Laskey & Mills, 1977). (a) to (c) Ethidium bromide stain of HindIII digests of genomic globin clones λXG2, λXG3 and λXG4, respectively. The bp markers were determined from HisdIIc and &$I digests of pBR322 electrophoresed on the same gel (not shown). (d) to (f) Autoradiograph of hybridization of clones λXG2, λXG3 and λXG4 to the probe. Some hybridization to partial digestion products is evident.
case in newt and echidna (Dayhoff, 1972)). The amino acid sequence derived from the nucleotide sequence closely matches the conserved amino acids of other α-globins (i.e. amino acids 31, 39, 41-43, 47, 58, 59, 69, 83, 84, 87, 88, 91, 93-95, 97, 98, 126-128, 136, 139-141). However, there are three exceptions: Asp has been replaced by Asn at positions 51 and 57, while at position 80 Leu has been replaced by Met. Further data from protein sequencing are necessary to confirm these changes. As yet no protein sequence has been published for any Xenopus globin.
The coding region of the nucleotide sequence is interrupted by introns at nucleotides 13 to 191 and 394 (the 3' end of the large intron is in an unsequenced segment). This corresponds to amino acids 31/32 and 99/100 of the amino acid sequence. The interruptions, which divide the coding region into three blocks, are at precisely the same amino acid positions as the two introns of the mouse α-globin gene. This further strengthens the view put forward by Leder et al. (1978) and further discussed by Nishioka & Leder (1979) that all adult α and β active vertebrate globin genes will be interrupted at these two analogous positions. Furthermore, the identification of nucleotides 13, 191 and 394 as splicing points is reinforced by the presence of the consensus sequence (Catterall et al., 1978; Breathnach et al., 1978) at the exon-intron (consensus II) and intron-exon (consensus I) junctions. The sequence was screened for consensus sequences using a computer program developed by Staden & Brownlee (unpublished) in which a high score correlates with a splicing sequence. The splicing points at nucleotides 13 and 191 produced the maximum scores, while the one at position 394 yielded a rather lower score due to the presence of the sequence AGT instead of GGT, an unusual but not unique feature for the consensus II sequence (Breathnach et al., 1978).
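The idea of scoring a candidate junction against a consensus can be illustrated with a simple match count. This sketch is not the Staden & Brownlee program, which is unpublished; it assumes the later, widely quoted exon-intron consensus MAG|GTRAGT (M = A or C, R = A or G) and assumes that nucleotide 13 is the first base of the small intron. The name donor_score is hypothetical. The first test string is the start of the sequence as printed in Fig. 3; the second is an invented variant in which the last exon base is A, giving AGT in place of GGT.

```python
# IUPAC codes used in the consensus: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# Assumed exon-intron (donor) consensus: 3 exon bases, then 6 intron bases.
DONOR_CONSENSUS = "MAGGTRAGT"
EXON_BASES = 3


def donor_score(seq: str, intron_start: int) -> int:
    """Count consensus matches around a candidate exon-intron junction.

    intron_start is the 1-based position of the first intron base
    (the G of the GT dinucleotide).
    """
    i = intron_start - 1                      # convert to a 0-based index
    lo = i - EXON_BASES
    hi = lo + len(DONOR_CONSENSUS)
    if lo < 0 or hi > len(seq):
        raise ValueError("window runs off the end of the sequence")
    window = seq[lo:hi].upper()
    return sum(base in IUPAC[code]
               for base, code in zip(window, DONOR_CONSENSUS))


# First 20 nucleotides as printed in Fig. 3 (junction assumed at position 13).
FIG3_START = "AAGCTTGTACAGGTAAATTA"
# Hypothetical variant: last exon base G replaced by A (AGT instead of GGT).
VARIANT = "AAGCTTGTACAAGTAAATTA"

print(donor_score(FIG3_START, 13))  # 8 of 9 positions match
print(donor_score(VARIANT, 13))     # 7 of 9 positions match
```

A plain match count treats every position equally; a frequency-weighted matrix would penalize the invariant GT more heavily than the flanking bases, but the weights for that would have to come from a compiled set of junctions.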
The size and sequence (but not the position) of introns in globin genes diverge very widely for different species and for different genes of the same species (Efstratiadis et al., 1980). The α-globin gene from Xenopus is an extreme example of this divergence. When compared to the mouse α-globin gene (Nishioka & Leder, 1979) it is apparent that there is no significant conservation of either sequence or size (Xenopus at least 177 bp, mouse 122 bp) of the small intron sequences. Likewise, the sequenced section of the large intron has no sequence homology with the large intron of the mouse α-globin gene, and is about three times larger (about 130 bp cf. 134 bp). The data presented here demonstrate that intervening sequences have been
FIG. 2. Sequence data: gel analysis of the Maxam & Gilbert (1977) degradations of the HindIII/HaeIII fragments (420 bp (a) and 120 bp (b)) labelled at the 3' end. T, C, A and G denote: C+T, C-specific, A>C and G-specific cleavages. The arrow in (a) indicates the position where the sequence read is the complement of the exon-intron junction (position 394/395, Fig. 3). The arrow in (b) indicates the position where the sequence corresponds to the TAA terminator codon (position 589, Fig. 3). The sequenced fragments of λXG4 were isolated as follows. The clone was digested with HindIII, and the fragments were labelled by "filling in" the 5' overhang with DNA polymerase (Klenow fragment) using [α-32P]dGTP (2060 Ci/mmol) as described by Proudfoot & Baralle (1979) and fractionated on a 6% polyacrylamide gel. The 305 and 625 bp fragments were eluted from the gel (Maxam & Gilbert, 1977) and digested with HaeIII. Fractionation of the HaeIII digest on a 6% polyacrylamide gel showed that the 625 bp fragment yielded 2 end fragments of about 420 and 120 nucleotides, while the 305 band produced 2 end fragments of 85 and 220 nucleotides. The 305 band is most probably cut by HaeIII once only, although the possibility that 2 HaeIII sites are present adjacent to one another cannot be ruled out. The 625 bp fragment is cut at least twice by HaeIII.
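The end-fragment bookkeeping in this legend can be sketched in code. This is a minimal illustration, not the authors' procedure: the two toy sequences are hypothetical (built only so that their HaeIII sites fall where the quoted sizes require), and the function names are assumptions. The recognition sequences HindIII (A^AGCTT) and HaeIII (GG^CC) are the standard ones. Because the label sits at both ends of a HindIII fragment, only the two terminal HaeIII subfragments carry label; any internal subfragment is unlabelled.

```python
import re

# Recognition site and cut offset within the site (top strand).
SITES = {"HindIII": ("AAGCTT", 1), "HaeIII": ("GGCC", 2)}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Return 0-based cut coordinates of `enzyme` in a linear sequence."""
    site, offset = SITES[enzyme]
    # The lookahead lets overlapping sites be found as well.
    return [m.start() + offset
            for m in re.finditer(f"(?={site})", seq.upper())]


def fragments(seq: str, enzyme: str) -> list[str]:
    """Split a linear sequence at every cut position of `enzyme`."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def end_fragment_lengths(seq: str, enzyme: str) -> tuple[int, ...]:
    """Lengths of the terminal subfragments, the only ones carrying end label."""
    frags = fragments(seq, enzyme)
    if len(frags) == 1:
        return (len(frags[0]),)
    return (len(frags[0]), len(frags[-1]))


# Hypothetical stand-ins for the 305 and 625 bp HindIII fragments.
TOY_305 = "A" * 83 + "GGCC" + "T" * 218
TOY_625 = "A" * 418 + "GGCC" + "T" * 81 + "GGCC" + "A" * 118

print(end_fragment_lengths(TOY_305, "HaeIII"))         # (85, 220)
print([len(f) for f in fragments(TOY_625, "HaeIII")])  # [420, 85, 120]
```

In the second toy fragment the internal 85 bp piece is an arbitrary choice; the source states only that the 625 bp fragment is cut at least twice, so the number and sizes of internal pieces are not known from the text.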
FIG. 3. The partial sequence of the Xenopus laevis α-globin gene. The nucleotide sequence is shown 60 nucleotides per line. The amino acid translation for each coding block is indicated above each coding sequence. See the text for a comparison with the published α-globin sequences. The numbering of the computer print-out has been kept on the other side of the 280 nucleotides gap after position 517. The underlined sequences are only 98% certain and the double underlined nucleotides are sequences deduced from the restriction sites and not read through by sequencing. Hence the fragments separated by these sites are tentatively joined but not formally overlapped. Hyphens have been omitted from the sequences for clarity.
preserved in the α-globin gene of Xenopus in a similar manner to that of higher vertebrates, thus strengthening the view that their retention may have an important role in the expression of the genes they interrupt.†
We thank J. B. Gurdon for microinjection, J. Hopkins and C. Shoulders for expert technical assistance, J. Paul for providing the cDNA plasmid, and K. Murray for &*s. This work was carried out in C2 containment facilities in accordance with the GMAG regulations then operative.
Medical Research Council Laboratory of Molecular Biology
Hills Road, Cambridge CB2 2QH, England
GEOFFREY A. PARTINGTON
FRANCISCO E. BARALLE
Received 5 September 1980
REFERENCES
Baralle, F. E., Proudfoot, N. J. & Clegg, J. B. (1980). Ann. N.Y. Acad. Sci. 344, 7tG-82.
Battaglia, P. & Melli, M. (1977). Devel. Biol. 60, 337-350.
Benton, W. D. & Davis, R. W. (1977). Science, 196, 180-182.
Blattner, F. R., Blechl, A. E., Denniston-Thompson, K., Faber, H. E., Richards, J. E., Slightom, J. L., Tucker, P. W. & Smithies, O. (1978). Science, 202, 1279-1283.
Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. & Chambon, P. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 4853-4857.
Catterall, J. F., O'Malley, B. W., Robertson, M. A., Staden, R., Tanaka, Y. & Brownlee, G. G. (1978). Nature (London), 275, 510-513.
Dayhoff, M. O. (1972). Editor of Atlas of Protein Sequence and Structure, vol. 5, pp. 198-200, National Biomedical Research Foundation, Silver Spring, Maryland.
Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., De Riel, J. K., Forget, B. G., Weissmann, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980). Cell, 21, 653-668.
Humphries, P., Old, R., Coggins, L. W., McShane, T., Watson, C. & Paul, J. (1978). Nucl. Acids Res. 5, 905-924.
Kay, R. M., Harris, R., Patient, R. K. & Williams, J. G. (1980). Nucl. Acids Res. 8, 2691-2707.
Laskey, R. A. & Mills, A. D. (1977). FEBS Letters, 82, 314-316.
Leder, A., Miller, H. I., Hamer, D. H., Seidman, J. G., Norman, B., Sullivan, M. & Leder, P. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 6187-6191.
† While this paper was in preparation, Kay et al. (1980) and Richardson et al. (1980) published sequence data obtained from cloned Xenopus globin cDNA. The limited sequence data presented by Kay et al. (1980) of the 3' region of an α-globin cDNA clone are in complete agreement with our data. Richardson et al. (1980) present the complete sequence of a cloned α-globin cDNA which is consistent with our sequence but with 2 exceptions. Beginning at nucleotide position 306 of our sequence, the sequence reads A-G-C-T-T-C-C-A-A-C whereas their sequence at the analogous position reads A-G-C-T-T-C-A-C-A-A-C, i.e. there is one extra base. The same strategy of sequencing was used in both cases, that is end-labelling at the HindIII site. To resolve this ambiguity, sequencing using another labelled restriction site would be necessary. However, the change of phase that would result in our sequence by the addition of this extra base would necessitate the addition of two more bases on the 5' side of the HindIII site to restore the correct phasing. It is very unlikely that two bases were missed in the reading of the fragment. Nucleotide position 383 of our sequence is a G residue, whereas their sequence reads A at this position, changing the amino acid coded from glycine to serine. We suggest that the difference arises because this G residue forms part of an EcoRII site, and artifactual reading at these sites is common as a result of methylation (Nishioka & Leder, 1979).
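The phase argument in this footnote can be checked with a few lines of code. This is a hedged sketch: the two 10- and 11-base strings are copied from the footnote, the standard genetic code is used, and the reading-frame offset of 1 is an assumption taken from the frame in which our sequence continues Ser-Asn in Fig. 3. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(offset, len(seq) - 2, 3))


def single_insertion(short: str, long: str):
    """Return (0-based index, base) of the one extra base in `long`, else None."""
    if len(long) != len(short) + 1:
        return None
    for i in range(len(long)):
        if long[:i] + long[i + 1:] == short:
            return i, long[i]
    return None


def bases_to_restore_frame(n_inserted: int) -> int:
    """Further bases needed so that the total insertion is a multiple of 3."""
    return (-n_inserted) % 3


OURS = "AGCTTCCAAC"     # this paper, from nucleotide position 306
THEIRS = "AGCTTCACAAC"  # Richardson et al. (1980), analogous position

print(single_insertion(OURS, THEIRS))  # (6, 'A')
print(translate(OURS, 1))              # ASN  (Ala-Ser-Asn)
print(translate(THEIRS, 1))            # ASQ  (Ala-Ser-Gln, frame shifted)
print(bases_to_restore_frame(1))       # 2
```

The last value reproduces the statement that one extra base would require two more bases elsewhere to restore the correct phasing.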
Lindberg, U. & Persson, T. (1972). Eur. J. Biochem. 31, 246-254.
Maxam, A. M. & Gilbert, W. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 560-564.
Nishioka, Y. & Leder, P. (1979). Cell, 18, 875-882.
Proudfoot, N. J. & Baralle, F. E. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 5435-5439.
Richardson, C., Capello, J., Cochran, M. D., Armentrout, R. W. & Brown, R. D. (1980). Devel. Biol. 78, 161-172.
Rigby, P. W. J., Dieckmann, M., Rhodes, C. & Berg, P. (1977). J. Mol. Biol. 113, 237-251.
Southern, E. M. (1975). J. Mol. Biol. 98, 551-564.
Thomas, N. & MacLean, N. (1975). J. Cell. Sci. 19, 509-520.