Cloning and expression of the NaeI restriction endonuclease-encoding gene and sequence analysis of the NaeI restriction-modification system


Gene. 155 (1995) 19-25 © 1995 Elsevier Science B.V. All rights reserved. 0378-1119/95/$09.50


GENE 08595

Cloning and expression of the NaeI restriction endonuclease-encoding gene and sequence analysis of the NaeI restriction-modification system

(Nocardia aerocolonigenes; Streptomyces lividans; 5-methylcytosine-methyltransferase; overexpression)

Christopher H. Taron*, Elizabeth M. Van Cott, Geoffrey G. Wilson, Laurie S. Moran, Barton E. Slatko, Linda J. Hornstra, Jack S. Benner, Rebecca B. Kucera and Ellen P. Guthrie

New England Biolabs, Beverly, MA 01915, USA

Received by F. Barany: 6 May 1994; Revised/Accepted: 3 August/23 September 1994; Received at publishers: 7 November 1994

SUMMARY

NaeI, a type-II restriction-modification (R-M) system from the bacterium Nocardia aerocolonigenes, recognizes the sequence 5'-GCCGGC. The NaeI DNA methyltransferase (MTase)-encoding gene, naeIM, had been cloned previously in Escherichia coli [Van Cott and Wilson, Gene 74 (1988) 55-59]. However, none of these clones expressed detectable levels of the restriction endonuclease (ENase). The absence of the intact ENase-encoding gene (naeIR) within the isolated MTase clones was confirmed by recloning the MTase clones into Streptomyces lividans. The complete NaeI system was finally cloned using E. coli AP1-200 [Piekarowicz et al., Nucleic Acids Res. 19 (1991) 1831-1835] and less stringent MTase-selection conditions. The naeIR gene was expressed first by cloning into S. lividans, and later by cloning under control of a regulated promoter in an E. coli strain preprotected by the heterologous MspI MTase (M.MspI). The DNA sequence of the NaeI R-M system has been determined, analyzed and compared to previously sequenced R-M systems.

INTRODUCTION

Type-II R-M systems have been found in a wide variety of bacteria. While these systems produce enzymes which are important tools in molecular biology, the study of R-M systems can also provide insight into the mechanisms of DNA-protein interactions. The majority of type-II ENases need to bind to only a single recognition sequence in the DNA to cleave within that site. However, there is a small group of ENases which cleave some DNA substrates poorly unless a second recognition site, a so-called 'activator site', is present (Krüger et al., 1988; Conrad and Topal, 1989; Oller et al., 1991). Included in this group is the NaeI endonuclease (R.NaeI), an ENase isolated from the Actinomycete Nocardia aerocolonigenes, which recognizes the sequence 5'-GCC↓GGC-3' and cleaves the DNA after the second C to give a blunt end (Roberts, 1987). Studies indicate that for some DNA substrates, R.NaeI must bind two GCCGGC sequences within the DNA before cleavage at one of these sites can occur (Conrad and Topal, 1989; Yang and Topal, 1992).

The aim of this study was to clone, express and sequence the NaeI ENase gene (naeIR) to aid in the understanding of its unusual mechanism for DNA binding and cleavage. The NaeI MTase gene, naeIM, has been previously cloned from a partial PstI library in pBR322 and localized to two adjacent PstI fragments (1.65 and 1.45 kb) (Van Cott and Wilson, 1988). We report here the cloning of naeIR, its expression in S. lividans and its overexpression in E. coli containing M.MspI. We also report the nt sequence of the NaeI R-M genes and discuss the aa sequence homologies shared with other reported proteins.

Correspondence to: Dr. E. P. Guthrie, New England Biolabs, 32 Tozer Road, Beverly, MA 01915, USA. Tel. (1-508) 927-5054; Fax (1-508) 921-1350; e-mail: [email protected]

*Present address: Department of Biochemistry, University of Illinois, Urbana, IL 61801, USA.

Abbreviations: aa, amino acid(s); Ap, ampicillin; bp, base pair(s); DTT, dithiothreitol; ENase (R.), restriction endonuclease; Hy, hygromycin B; hyr, gene encoding HyR; kb, kilobase(s) or 1000 bp; m5C, 5-methylcytosine; MTase (M.), methyltransferase; naeIM, gene encoding M.NaeI; N., Nocardia; naeIR, gene encoding R.NaeI; nt, nucleotide(s); ORF, open reading frame; PAGE, polyacrylamide-gel electrophoresis; PCR, polymerase chain reaction; PolIk, Klenow (large) fragment of E. coli DNA polymerase I; R, resistance/resistant; R-M, restriction-modification; S., Streptomyces; SDS, sodium dodecyl sulfate; TRD, target recognition domain; Ts, thiostrepton; tsr, gene encoding TsR; u, unit(s); Vsr, very short patch repair; XGal, 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside.

SSDI 0378-1119(94)00806-X
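The recognition and cleavage rule described above (5'-GCC↓GGC-3', blunt ends) can be illustrated with a short sketch. It is only an illustration: the function names and the test sequence are hypothetical, and the code assumes complete digestion at every site, ignoring the activator-site requirement that makes some NaeI sites resistant to cleavage.

```python
# Illustrative sketch only: locate NaeI sites and cut them as blunt ends.
# Assumption: every site is cleaved (the activator-site effect is ignored).

NAEI_SITE = "GCCGGC"   # recognition sequence from the text
CUT_OFFSET = 3         # cut between GCC and GGC (after the second C)


def find_sites(seq, site=NAEI_SITE):
    """Return 0-based start positions of every (possibly overlapping) site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq):
    """Split a linear sequence at every NaeI cut position (blunt cut)."""
    seq = seq.upper()
    cuts = [p + CUT_OFFSET for p in find_sites(seq)]
    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


if __name__ == "__main__":
    example = "AAGCCGGCTTTTGCCGGCAA"  # hypothetical test sequence
    print(find_sites(example))
    print(digest(example))
```

Because GCCGGC is palindromic, searching one strand is sufficient to find the sites on both strands.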

RESULTS AND DISCUSSION

(a) The naeIM clones do not express the ENase in E. coli or in S. lividans

A clone previously obtained by MTase selection in E. coli (Van Cott and Wilson, 1988) containing naeIM and 3 kb of DNA downstream from naeIM was subcloned into S. lividans, a host more similar to N. aerocolonigenes, to determine if naeIR was present on the clone but not active in E. coli. No expression of the ENase was detected, suggesting that naeIR was not present. Accordingly, more DNA upstream from the MTase gene was cloned from N. aerocolonigenes genomic DNA. The resulting clone, pEVCnaeIRM9.3 (Fig. 1), spans naeIM and contains at least 5 kb of DNA upstream from naeIM. No ENase expression from pEVCnaeIRM9.3 could be detected in E. coli, probably due to the distance of naeIR from any endogenous promoters in the vector.

(b) Comparison of the nt sequence of pEVCnaeIRM9.3 and the aa sequence of R.NaeI to determine the location of naeIR

To establish the presence of naeIR on pEVCnaeIRM9.3, DNA sequencing of subclones and N-terminal aa sequencing of R.NaeI were undertaken. DNA sequencing was done on a 3.7-kb region spanning naeIM and approx. 1.7 kb upstream from naeIM. N-terminal peptide sequencing was performed on purified R.NaeI. Three major bands of 37, 34 and 31 kDa were observed when the purified R.NaeI was run on SDS-PAGE. The 37- and 34-kDa peptide sequences overlapped (Fig. 4), while the 31-kDa band yielded a sequence which appeared to be unrelated. The data are consistent with the 37-kDa protein band being the full-length R.NaeI, the 34-kDa protein band being a proteolytic derivative of the ENase and the 31-kDa peptide being a contaminant. The translational start of naeIR was located by comparing the aa sequence deduced from the DNA sequence with this N-terminal sequence from R.NaeI. The start of naeIR is located approx. 1.6 kb upstream from naeIM.

Fig. 1. Diagram of the steps involved in cloning the SacI fragment containing the region of DNA upstream from naeIM. pEVCnaeI7-5 was constructed by removing the 1.65-kb PstI fragment from pEVCnaeIM59 (Van Cott and Wilson, 1988) using partial PstI digestion, deleting the 5' end of naeIM. By cloning the desired SacI fragment, determined by Southern blot analysis, from the N. aerocolonigenes genome into pEVCnaeI7-5 in the proper orientation, the intact naeIM gene is reconstructed. A SacI library of N. aerocolonigenes DNA was constructed by digesting purified N. aerocolonigenes DNA with SacI; DNA fragments about 6 kb in size were purified and ligated to SacI-cleaved, dephosphorylated pEVCnaeI7-5. The ligation was used to transform competent E. coli K802 and transformants were selected for tetracycline (Tc) resistance. DNA from pools of transformants was isolated. Methylase selection of the plasmid pool using 8 u of R.NaeI per µg of DNA for 3 h at 37°C yielded no clones, so a less stringent methylase selection was performed on the purified plasmid DNA using 1.25 u of R.NaeI per µg of DNA for 30 min at 37°C. The digested mixture was used to transform competent E. coli AP1-200 (Piekarowicz et al., 1991). Transformed cells were plated on L-agar containing Tc and 40 µg/ml XGal. The plates were incubated overnight at 43°C, shifted to 30°C for 3 h and then shifted back to 43°C for 2 h. The presence of a blue colony after temperature shift-down in AP1-200 cells can indicate the presence of an active MTase gene. The plasmid DNA (pEVCnaeIRM9.3) isolated from a resulting blue colony contained a SacI fragment of the expected size and displayed partial MTase activity.
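To give a rough sense of how much the methylase selection was relaxed, the two digestion conditions quoted in the Fig. 1 legend can be compared as enzyme dose multiplied by time. This is a simplification introduced here for illustration, not a measure used by the authors: it assumes that digestion scales linearly with both enzyme amount and incubation time, and the function name is hypothetical.

```python
# Rough comparison of the two methylase-selection digests (Fig. 1 legend).
# Assumption (not from the paper): exposure = units per microgram x hours.

def digestion_exposure(units_per_ug, hours):
    """Return enzyme exposure in unit-hours per microgram of DNA."""
    return units_per_ug * hours


stringent = digestion_exposure(units_per_ug=8.0, hours=3.0)    # 8 u/ug, 3 h
relaxed = digestion_exposure(units_per_ug=1.25, hours=0.5)     # 1.25 u/ug, 30 min
fold_reduction = stringent / relaxed

if __name__ == "__main__":
    print(f"stringent: {stringent} unit-hours per ug")
    print(f"relaxed:   {relaxed} unit-hours per ug")
    print(f"fold reduction: {fold_reduction:.1f}")
```

Under this simple measure the second digest exposed the plasmid pool to roughly 38-fold less enzyme activity, which is consistent with a clone displaying only partial MTase activity surviving the selection.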

(c) The expression of naeIR in S. lividans

All attempts to clone the MluI fragment from pEVCnaeIRM9.3 (Fig. 1) containing the entire NaeI R-M system directly into S. lividans were unsuccessful. However, by preprotecting the host from R.NaeI digestion by first transforming S. lividans with a low-copy Streptomyces vector, pIJ922 (Lydiate et al., 1985), containing naeIM, the cloning of the MluI fragment on a high-copy Streptomyces cloning vector was accomplished (Fig. 2). Interestingly, S. lividans CT2, the strain of S. lividans containing both of these plasmids, produces only about 3000 u of R.NaeI per gram of cells, with no significant increase in the production of R.NaeI over levels produced in N. aerocolonigenes even though naeIR is cloned on a high-copy plasmid. This phenomenon of basal expression regardless of gene dosage has also been observed for the SalI system (Rodicio and Chater, 1988) and for the SacII and NcoI systems (E.P.G., unpublished results). These observations suggest that the R-M genes in Actinomycetes are probably highly regulated, even though sequence comparisons of the proposed promoters of the naeIM and naeIR genes to the identified promoters of the salIM and salIR genes (Alvarez et al., 1993) displayed no striking similarity.

[Fig. 2 diagram labels: pIJ922, S. lividans TK24, S. lividans CT1, S. lividans CT2, pCTnaeIM13-9, pCTnaeIRM21-1.]

Fig. 2. Diagram of the steps involved in cloning the NaeI R-M system into S. lividans TK24. First, naeIM was cloned into the low-copy Streptomyces vector pIJ922. naeIM was isolated from pEVCnaeIM37 (Van Cott and Wilson, 1988) by digestion with ScaI + AseI. The ends of the purified fragment were treated with PolIk. EcoRI linkers were ligated onto the resulting blunt ends and digested with EcoRI to give compatible sticky ends. This fragment was ligated into EcoRI-digested pIJ922 and transformed into S. lividans TK24 protoplasts. Transformants were selected for TsR (tsr). Genomic DNA isolated from the resulting strain, S. lividans CT1, was resistant to R.NaeI. pEG12-6 was constructed as a vector which would be compatible with pIJ922 and have a different selectable marker. pEG12-6 was constructed by taking a BglII fragment from pIU87 (provided by G. Janssen, University of Miami), a pUC19 derivative containing the HyR gene (hyr) cloned from S. hygroscopicus, and ligating it into pIJ486 (Ward et al., 1986) which had been digested with BamHI. The XbaI-HindIII fragment containing the NaeI R-M system from pEGnaeIRM6-1, an MluI subclone in pUC19 from pEVCnaeIRM9-3 (Fig. 1), was ligated into pEG12-6 which had been similarly digested. The resulting ligation mix was used to transform S. lividans CT1. HyR and TsR colonies were selected. The resulting transformants were screened for the presence of both plasmids on an agarose gel. S. lividans CT2 contained both plasmids and expressed both naeIM and naeIR.

(d) The overexpression of naeIR in E. coli

Due to the relatively low level of expression of naeIR in S. lividans, an attempt was made to overexpress the gene in E. coli. In order to have the naeIR gene stably expressed in E. coli, it was necessary to have an active MTase to protect the host DNA from cleavage by the ENase. Because naeIM was poorly expressed in E. coli, a heterologous MTase, M.MspI, was used to protect the


Fig. 3. A representation of the steps involved in the overexpression of naeIR in E. coli using the expression vector pAGR3. pAGR3 is a pBR322-based vector, constructed by W. Jack at New England Biolabs, which contains the Ptac promoter, an ApR (bla) gene, a single copy of lacIq, a fourfold direct repeat of the rrnb terminator upstream from the Ptac promoter to prevent read-through transcription from the vector, and an NcoI site downstream from a lac ribosome-binding site. Attempts to use PCR to create a BspHI site to overlap the AUG start codon of naeIR so that naeIR could be cloned directly downstream from the Ptac promoter in pAGR3 were complicated by the presence of an additional BspHI site within naeIR. Partial digestion of the PCR product with BspHI to yield intact naeIR proved unsuccessful; therefore the following two-step cloning procedure was used. The naeIR gene was amplified by PCR from pEGnaeIRM6-1 (Fig. 2), creating an NcoI site at the start of naeIR so that naeIR could be cloned directly downstream from the Ptac promoter in pAGR3. Creation of the NcoI site using PCR primers 1 and 2 caused a change of the second aa residue of R.NaeI (Thr → Ala) in the resulting clone pCTnaeIR16-1. To correct this change, a second PCR using primers 3 and 4 was performed which amplified only the 5' end of naeIR, changing the DNA sequence to include a BspHI site at the start of naeIR. This change does not alter the aa sequence and allows for cloning directly downstream from the Ptac promoter. The amplified BspHI-BglII fragment was used to replace the NcoI-BglII fragment of pCTnaeIR16-1 to create pCTnaeI24-2.
PCR primer 1, 5'-CCGCGCTCGAGCACACGCGAGCGG;
PCR primer 2, 5'-GAAGGGGTCCCATGGCTGAGTTGCCGC;
PCR primer 3, 5'-GAAGGGGTCTCATGACTGAGTTGCCGC;
PCR primer 4, 5'-GGATTTCGCTGAGAAGATCTGATCGCGCACG.
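The effect of the two mutagenic primers described in the Fig. 3 legend can be checked directly from their sequences. The sketch below is an illustration, not part of the original methods: it looks for the NcoI (CCATGG) and BspHI (TCATGA) sites in primers 2 and 3 and translates the first codons starting at the ATG, using the standard genetic code. The helper names are hypothetical.

```python
# Illustrative check of PCR primers 2 and 3 (sequences from the Fig. 3 legend).
# Helper names are hypothetical; the codon table is the standard genetic code.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PRIMER_2 = "GAAGGGGTCCCATGGCTGAGTTGCCGC"  # introduces an NcoI site
PRIMER_3 = "GAAGGGGTCTCATGACTGAGTTGCCGC"  # introduces a BspHI site
NCOI_SITE = "CCATGG"
BSPHI_SITE = "TCATGA"


def translate_from_start(primer):
    """Translate complete codons from the first ATG to the end of the primer."""
    start = primer.find("ATG")
    if start == -1:
        return ""
    coding = primer[start:]
    usable = len(coding) - len(coding) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    for name, primer, site in (("primer 2", PRIMER_2, NCOI_SITE),
                               ("primer 3", PRIMER_3, BSPHI_SITE)):
        print(name, "site present:", site in primer,
              "N-terminus:", translate_from_start(primer))
```

Primer 2 reads ATG GCT (Met-Ala), which is the Thr → Ala change noted in the legend, whereas primer 3 reads ATG ACT (Met-Thr) and so restores the native second residue. NcoI and BspHI both leave 5'-CATG overhangs, which is why a BspHI-cut fragment can be joined to an NcoI-cut end.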

Fig. 4. Organization and sequence of the NaeI R-M system.
(A) Diagram of the ORFs identified from the nt sequence of the NaeI R-M system. Each arrow above and below the double line indicates the location and direction of each ORF. The ORFs identified as the MTase and ENase are noted as such. The ORF which has homology with Vsr is indicated as naeIV. Numbers on the double line indicate the nt at which the ORF begins and ends. Only those ORFs which are greater than 100 aa in length and which start with Met are indicated in this diagram, with the exception of the MTase, which starts with Val.
(B) The nt sequence and deduced aa sequence of the NaeI R-M system. The nt sequence of a 3.7-kb region, starting at a KpnI site and ending with a PstI site, spanning the naeIM and naeIR genes was determined using the dideoxynucleotide chain-termination procedure as previously described (Brooks et al., 1991). R.NaeI was purified for N-terminal peptide sequencing from 960 g of N. aerocolonigenes cells obtained from a 100-litre fermentation. The cells were lysed by passage through a Gaulin press. The supernatant produced by centrifugation of the lysate in a Sharples centrifuge was first passed over a DEAE-Sepharose column (Pharmacia). The flow-through and wash were then loaded onto a Bio-Rad Affi-gel blue column followed by a Pharmacia Heparin-Sepharose column, a P11 cellulose phosphate column (Whatman), a Mono Q FPLC column (Pharmacia) and finally a TSK-Heparin 5PW column (TosoHaas). An NaCl gradient was used to elute R.NaeI from each column. After these steps a 2-ml pool of the peak of activity remained which contained 0.1 mg/ml of protein, had 160 000 u of R.NaeI and was about 50% pure as determined by SDS-PAGE, giving an estimated specific activity of 8 × 10^5 u/mg R.NaeI protein. The purified R.NaeI was run on a Tris-glycine, 0.1% SDS, 10 to 20% PAGE gel (ISS-Enprotech). Protein bands of 37, 34 and 31 kDa were individually subjected to sequential degradation using an Applied Biosystems model 470A gas-phase sequencer (Waite-Rees et al., 1991). The nt sequence of the coding strand of the NaeI R-M system is shown in its entirety with the deduced aa sequence for the naeIR and naeIM genes included below the nt sequence and the deduced aa sequence for naeIV included above the nt sequence. The underlined aa sequence of R.NaeI is the region confirmed by N-terminal sequencing of the purified ENase. The underlined aa sequence of M.NaeI corresponds to the predictive motifs for an m5C-MTase (Lauster, 1989; Pósfai et al., 1989). These sequences have been submitted to GenBank under accession No. U09581.
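The specific-activity estimate in the Fig. 4 legend follows from the pool volume, protein concentration and total units. The sketch below reproduces the arithmetic; the function name and the optional purity argument are introduced here for illustration and are not from the paper. It shows the value on two bases (per mg of total protein, and per mg of protein after applying the roughly 50% purity estimate) without attempting to decide which basis the authors intended.

```python
# Arithmetic behind the specific-activity estimate (values from the Fig. 4 legend).
# The purity argument is an illustrative addition, not part of the paper.

def specific_activity(total_units, volume_ml, protein_mg_per_ml, purity=1.0):
    """Return units per mg of protein; purity scales the protein mass used."""
    protein_mg = volume_ml * protein_mg_per_ml * purity
    return total_units / protein_mg


per_mg_total_protein = specific_activity(160_000, 2.0, 0.1)
per_mg_at_half_purity = specific_activity(160_000, 2.0, 0.1, purity=0.5)

if __name__ == "__main__":
    print(f"per mg total protein:      {per_mg_total_protein:.2e} u/mg")
    print(f"per mg, 50% purity applied: {per_mg_at_half_purity:.2e} u/mg")
```

Dividing 160 000 u by the 0.2 mg of total protein in the pool gives the 8 × 10^5 u/mg quoted in the legend; applying the 50% purity estimate to the protein mass would double that figure.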


[Fig. 4A: ORF map of the NaeI R-M system showing naeIR, naeIM and naeIV on a 3665-nt scale.]

B i GGTACCGGCATCACCATGGGCCGGAGTTT~T~GCTCATGGCCTGCGCGGACA~CTTCACACCCATGCCTTGAGTG~TTCGAGCACAGAA~GGAGCGTGTCATTGGGGGCAGCCCATGCCA 120 AGATCATCAGATTTGAAGGGGT•ACATGA•TGAGTTGCCGCTGCAGT•CGCGGAACCCGATGACGATCTCGA•CGGGTTCGGGCAACGTTGTACAGCCTTGACCCAGACGGTGACCGGAC i M T E L p k Q F A E P D D D L E R V RAT L Y S L D P D G D R T 240 TGCTGGTGTGTTGAGAGACACGCTCGACCAGTTGTACGACGGTCAGCGAACCGGGAGGTGGAACTTCGATCAGCTGCACAAGACCGAGAAGACGCACATGGGAACCCTGGTGGAGATCAA 33 A G V L R D T L D Q L Y D G Q R T G R W N F D Q L H K T E K T H M G T L V E I N 360 CCTGCACCGTGAGTTCCAGTTCGGTGACGGCTTTGAGACCGATTACGAGATTGCAGGAGTGCAGGTCGACTGCAAGTTTTCGATGAGCCAGGGCGCTTGGATGCTGCCTCCGGAGTCGAT 73 L H R E F Q F G D G F E T D Y E I A G V Q V D C K F S M S @ G A W M L P P E S I 480 CGGGCACATCTGTCTGGTCATCTGGGCAAGTGATCAGCAGTGCGCATGGACCGCAGGACTGGTGAAGGTCATACCCCAGTTCCTCGGCACTGCCAACCGTGACCTCAAGCGGCGACTCAC 113 G H I C L V I W A S D Q Q C A W T A G L V K V I P Q F L G T A N R D L K R R L T 600 ACCCGAAGGCCGTGCCCAAGTTGTCAAACTGTGGCCAGATCACGGAAAGCTGCAGGAGAACCTGCTCCTGCACATCCCCGGTGACGTGCGCGATCAGATCTTCTCAGCGAAATCCAGCCG 153 P E G R A Q V V K L W P D H G K L Q E N L L L H I P G D V R 0 Q I F S A K S S R 720 CGGTAATCAGCACGGTCAGGCGCGCGTGAACGAACTGTTCCGCCGAGTGCACGGGC GTCTCATCGGGAGAGCGGTCATAGC GACTGTGGCGCAGCAGGACGACTTCATGAAGCGCGTACG 193 G N Q H G Q A R V N E L F R R V H G R L I G R A V I A T V A Q Q D D F M K R V R 840 CGGGTCAGGCGGCGCGCGTTCGATCCTTCGGCCTGAAGGAATCATCATTCTTGGGCATCAGGACAACGATCCGAAGGTGGCGAACGATCTCGGGTTGCCGGTGCCGCGCAAGGGGCAGGT 233 G S G G A R S I L R P E G I I I L G H Q D N D P K V A N D L G L P V P R K G Q V 960 CGTCGCAGCACGAGTGGTAC•GGCTGAc•AGGGAGACCAGCGG•AAACCGCTGA•ATCCAGGGGCGGCGCTGGG•CGTAGCCGTGCCTGGCGACCCCAT••TCGAGG•GCCGGTTGTGCC 273 V A A R V V P A D E G 0 (} R 0 T A E I Q G R R W A V A V P G D P I V E A P V V 166 * D E A S T D F G H E L H H L T D V V A A 1080 CCGGAAATCAGCCGA•TAGGGCGTGGCGCG•CCAGACTC•GGGAATTGT•AGTCCT•CGCACTAGTGTCGAACCCATGTT•GAGATGATGAAGGGTGTCCACGACGGCTGCTA•CGCGTC 313 R K S A E *

V

P

A

D

143 Q L E E H E W V R L V R W G A N T L S Q N V T R 0 R E V N R R L K P S W Y W E 1200 TTGCAGCTCTTCGTGCTCCCAAACCCGCAACACC•TCCACCCGGCGTTGGTGAGG•ACTGGTT•ACCGTTCGATCGC•CTCGACGTTCCGCCTGAG•TTCGGCGACCAGTACCACTCGTT

N

103 T Y P Q R G H D P C V H W F C G D I F V A V K R A T F V [ D P K V K V G D G L 1320 CGTCGTAGGCTGTCGTCCGTGGICTGGGCA•ACATGCCAGAAGCAGC•GTCGATGAAGACAGCGACCTTCCGGGCGGTGAAGACGATGTC•G•TTTGACCTTGAC•CCGTCGCCAA•GCG

R

63 L L F D K R Y R Y G L K F L A S R L A A E P K T G S R R N A 0 M N R S R G A N L 1440 CAGCAGGAAATCCTTGCGATACCTGTACCCGAGCTTGAAGAGCGCGCTTCGCAGTGCAGCCTCGGGTTTCGTACCACTGCGCCGGTTCGCCTGCAT GTTCCGCGAGCGCCCAGCGTTCAG 23 P A P Y T G S A H A R A R A A A R S S K D S M 1560 CGGCGCTGGGTAGGTACCGCTCGCGTGTGCTCGAGCGCGGGCGG•AGCTCTGCTGCTCTTATCAGACATG•GCGAAAGCCTCTTGTGCCGGTTGGCTACAGGTA•GGGGCGCCGGTGGAC 1680 GATACTGCATCGCGAGGTACACCTGATCACATTTGGACGCGAAAGGGGCGCTTGT•CAGAGTCTCGAGGTAGTGGAGATCTGCGCCGGTGCCGGTGGTCAGGCGCTGGGG•TTGAGAAAG i V O S L E V V E I C A G A G G O A L G L E K A 1800 CTGGCTTCAG•CATCGGCTTGc•GTTGAGCTGGACGTGAACGCGGCAGCGACGCTGCGCAAGAACCTCAAGTCGGACGTGGTGATCAC•GGCGACGTCGCTGATCCTTCCGTGCTGAACC 24 G F S H R L A V E L D V N A A A T L R K N L K S O V V I T G D V A D P S V L N P 1920 CGATGGAACACCTGGGGGTGTCGTTG•TGG•TGGTGGTGTGCCTTGTCCCcCATTCAGCATCGcGGGCAAGCAGCTCGGTGC•GACGA•ATGCGGGACCTGTTCGCCTGGGCGGTTGAGC 64 M E H L G V S L L A G G V P C P P F S I A G K O L G A D D H R D L F A W A V E

L

2040 TGTGCGATGTCAT•AAGCCGCG•GCCTTGATGCTCGAGAACGTCCGTGGCCTCAGTATGCCCAGGTT•GC•GGCTACCGGCAGCACGTCCTCGATCGGCTGAACGACATGGGTTACGTC• 104 C D V M K P R A L M L E N V R ~ L S M P R F A G Y R Q H V L D R L N D M G Y V A 2160 CTGAGTGGCGTCTCCTGCACGCATCGGACTTTGGGGTTCCTcAA•T•CGGCCGCGTTTCGTACTTGTCGCTCTGCAGAACAAGTTCGCCCCCTATTTCACCTGGCCTGAGCCGACCGGTG 144 E W R L L H A S O F G V P O L R P R F V L V A L O N K F A P Y F T W P E P T G A 2280 CGGCACC•ACGGTGGGGGAGACGTTGAAGGACCTCATGGC•GCGGACGGCT•GGAAGGTGCCGAAGAGTGGGCGGCTCAGGCGAACGACATCGCACCAACCATCGTGGGTGGCTCCAAGA 1 8 4 A P T V G E T L K D L M A A D G W E G A E E W A A Q A N D I A P T [ V G G S K K 2400 AACATGGCGGAGCTGACC•CGGCCCGACTCGCGCGAAGCGGGCGTGGGCAGAGCTCGGTGT•GACGCAATGGGAGTCGCTGACGCGCCGCCCCAGCCTGGCGACAAGTTCAAGGTAGGA• 2 2 4 H G G A D L G P T R A K R A W A E L G V D A M G V A O A P P Q P G D K F K V G P 2520 CGAAG•TGACCTGCGAGATGGTTGCCAGGAT•CAAGGGTGGCGCGACGGCGAGTGGATCTT•GAGGGTCGTAAGACCT•GCGATACCGCCAGATCGGTAA•GCTTTCCCGC•ACCCGTGG 264 K L T C E M V A R I Q G W R D G E W I F E G R K T S R Y R Q I G N A F P P P

V

A

2640 CTGAAGCGATCGGCAAGCGCATCCGTGCTGCCTTGAACATGGAGGGTGAGGGCAGGGATCGGGCGGTCGACAGCGACCACAACCCGTTGTACCGG•CGCTGAAGGAGTCGGGCGATTTCA 304 E A I G K R I R A A L N M E G E G R D R A V D S D H N P L Y R A L K E S G D

F

M

2760 TGACTCACCGGCAGCTGGAAAGGGCTGTCGGTCGACCCATCGAGGCATATGAGC•GGAGCGCA•GATCTCTGATCTGGGGCGTGA•TTCGAGGTCGAGACGAAGGACGGTGCTTCGG•GA 3 4 4 T H R Q L E R A V G R P I E A Y E L E R T I S D L G R D F E V E T K D G A

A

M

2880 TGGCGTACAAACTGGGG•CG•TCAAGGCCTTCACAGGCCAAGAGGGTCATTTGCGGCACGAGAT•TTCGTGCGCCACCGCACAAAGATCAGCTAGGAGGAAGGCTGGAT•TCCACATAG• 384 A Y K L G P F K A F T G Q E G H L R H E M F V R H R T K I S * 3000 CAAAGTGCCCGGCTAAGGTGGACATCCGGCTCAGCATCAGTCGTCGTCCCCGACGATGGCGAT•AGGTCTTTTTTCGA•ATCGCGACGTACTTCTTCCTCTTCGCGGTGTT•CGGATGAT 3120 CGCAGG•ATCAGCTCGTAGGTCAGATC•A•TGATCCGAGCACGTAGTAG•CGATGT•GTTTGACTCGAGGCTCACAATATCCATGCCCAGTTCGCGCAACTCTCGCAGG•GGCG•TCGGT 3240 GTGGACGGAGTCG•CGGTCACGACCCTGAGCAGGGCTGCCTCCAC•TTCTG•CCTTTGCGT•GCAAAAGCAGG•TGAACAACT•CTCGTGCACACGGCTCCCATAGG•AACGGCGAGATA 3360 GGCCGCA•GAACGGTGCCACCGTACCGTCG•TCCATTTCAGTGCCGAGTTGGGCGCG•AGCTCATGAAACTCTTCGCACCGCTCCGGAAGGCCCAGCCGTACGTAAGCCTGCATG•ACTC 3480 AGCGAGGAGCC•CTTAGCGTACAAATAGGTGTCGAGCAGGTCCTGACTGGAGGTTCCGCTCCGAAGCTGGACCAG•AGATCTTCG•TTGCGGCCGCTGTATCGGGCCTCAACCG•CT•CA 3660 GTCACCCTGCGGATACCGAGATGCGGTCAAGCTATTTCCCCTTCTGCTTCAGCTGGTTACTGCAG

S

host from R.NaeI cleavage. M.MspI recognizes the sequence 5'-CCGG-3' (the internal tetramer of the NaeI recognition sequence) and fully protects against NaeI digestion (Nelson and McClelland, 1991). E. coli CAA1 had previously been constructed (W. Jack and L. Greenough, unpublished results) by integrating mspIM as a stable λ lysogen in E. coli K802. Genomic DNA purified from this strain was resistant to R.NaeI cleavage. Overexpression of naeIR was accomplished by cloning the gene downstream from the Ptac promoter in pAGR3 (W. Jack, unpublished results; Fig. 3). When the resulting plasmid, pCTnaeIR24-2, was introduced into E. coli CAA1 and Ptac was induced with 1 mM IPTG, about 1.5 × 10^6 u of R.NaeI per gram of pelleted cells was produced. R.NaeI purified from this E. coli recombinant was subjected to N-terminal peptide sequencing. The sequence obtained was found to be in agreement with that determined for the protein purified from N. aerocolonigenes.
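The logic of using M.MspI as a protecting MTase rests on the fact that every NaeI site (GCCGGC) contains an MspI site (CCGG) as its internal tetramer, while the reverse is not true. A minimal sketch of that relationship follows; the function names and the test sequence are hypothetical, and the code checks site overlap only, not methylation chemistry.

```python
# Sketch: every NaeI site contains an MspI site, but not vice versa.
# Names and the example sequence are hypothetical; only site overlap is checked.

NAEI_SITE = "GCCGGC"
MSPI_SITE = "CCGG"


def find_sites(seq, motif):
    """Return 0-based start positions of every occurrence of motif in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def naei_sites_covered_by_mspi(seq):
    """True if each NaeI site has an MspI site at its internal tetramer."""
    mspi_starts = set(find_sites(seq, MSPI_SITE))
    return all(p + 1 in mspi_starts for p in find_sites(seq, NAEI_SITE))


def mspi_only_sites(seq):
    """MspI sites that do not lie inside any NaeI site."""
    inside_naei = {p + 1 for p in find_sites(seq, NAEI_SITE)}
    return [p for p in find_sites(seq, MSPI_SITE) if p not in inside_naei]


if __name__ == "__main__":
    example = "TTCCGGAAGCCGGCTT"  # one lone MspI site, one NaeI site
    print(find_sites(example, NAEI_SITE))
    print(find_sites(example, MSPI_SITE))
    print(naei_sites_covered_by_mspi(example))
    print(mspi_only_sites(example))
```

A host expressing M.MspI therefore modifies more sites than strictly needed, which is harmless for protection and avoids dependence on the poorly expressed naeIM.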

(e) Sequence analysis of NaeI R-M

Analysis of the DNA sequence of the system revealed two ORFs oriented head to tail (Fig. 4). The larger of these two ORFs, naeIM, contains the conserved motifs characteristic of m5C-MTases (Lauster, 1989; Pósfai et al., 1989). From deletion analysis of the M clone (Van Cott and Wilson, 1988) and the location of the predictive motifs in the sequence, the start of naeIM is proposed to be the GTG at nt 1734. This ORF terminates at nt 2975 and codes for a 45-kDa protein. The ORF which precedes naeIM codes for naeIR, as determined by comparison of the predicted aa sequence with the N-terminal sequence obtained from purified R.NaeI. This ORF starts at nt 146 and terminates at nt 1099, coding for a 35-kDa protein, close to the 37 kDa estimated for the purified ENase protein by SDS-PAGE.

Comparison of the sequences of NaeI R-M with other R-M systems and other reported proteins was performed using the BLAST network service (Altschul et al., 1990). M.NaeI has significant aa homology (50% identity and 65% similarity) to its isoschizomer M.NgoMI (Fig. 5), even though these two proteins come from taxonomically unrelated organisms (Nocardia and Neisseria). We compared the aa sequences in the postulated target recognition domains (TRDs) among M.NgoMI, M.MspI, M.BsuFI and M.HpaII (all m5C-MTases which recognize the internal tetramer CCGG) to this region in M.NaeI. Of these four MTases, only M.NgoMI has significant homology to M.NaeI in this region (Fig. 5). The sequence similarity in the TRDs between M.BsuFI and M.MspI noted by Walter et al. (1990) is not observed for M.NaeI.

Interestingly, a small ORF which lies between naeIM and naeIR on the opposing strand, starting at nt 1629 and ending at nt 1129 (marked naeIV in Fig. 4), encodes a protein with homology to Vsr (Fig. 5), the protein responsible for the very short patch repair process in E. coli (Sohail et al., 1990). The presence of Vsr homologs has been noted in other R-M systems, including HpaII (Kulakauskas et al., 1994), XorII (Choi and Leach, 1994), BsuRI and AluI (J. Barsomian and G.G.W., unpublished results). None of the other unidentified ORFs showed significant homology to any protein in the protein sequence databases available through the BLAST network service. The aa sequence comparisons of R.NaeI have also revealed no significant similarity with other ENases or with any other proteins in these databases.

Since R.NaeI binds to two GCCGGC sequences within the DNA, inducing looping of the DNA (Topal et al., 1991), and a similar type of binding and looping of DNA is performed by proteins in the recombinase family, we examined the deduced aa sequence of naeIR for the presence of motifs found in integrases and resolvases. Even though the EcoRII ENase, which like R.NaeI requires an 'activator site', contains the motifs found in integrases (Topal and Conrad, 1993), no motif characteristic of integrases or resolvases was found in the deduced aa sequence of naeIR. Work is in progress in other labs using the clones reported here to elucidate the mechanisms involved in binding and cleavage by R.NaeI.

[Fig. 5 alignment panels: (A) M.NaeI versus M.NgoMI; (B) V.NaeI versus E. coli Vsr.]

Fig. 5. Alignment of homologous aa sequences. (A) Comparison of the deduced aa sequence of M.NaeI to M.NgoMI. Regions underlined correspond to the predictive motifs for an m5C-MTase (Lauster, 1989; Pósfai et al., 1989). Short regions of striking homology are boxed and may form the TRD. (B) Alignment of the deduced aa sequence of the Vsr-like ORF in the NaeI system and the E. coli Vsr (Sohail et al., 1990). The optimal alignment of similarity for both comparisons was determined using BestFit in the Genetics Computer Group (Madison, WI, USA) package. Lines between the two sequences indicate aa which are identical. Periods or colons between the sequences indicate a degree of similarity between the aa.
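The ORF coordinates reported in section (e) can be turned into residue counts and rough molecular masses with a few lines of arithmetic. The sketch below is an approximation under stated assumptions: the quoted end coordinates are taken to include the stop codon, and an average residue mass of 110 Da is used in place of a composition-based calculation. Neither assumption comes from the paper, and the function name is hypothetical.

```python
# Rough ORF size estimates from the nt coordinates given in section (e).
# Assumptions (not from the paper): coordinates include the stop codon,
# and an average residue mass of 110 Da approximates the protein mass.

AVG_RESIDUE_MASS_DA = 110.0

ORFS = {
    "naeIR": (146, 1099),
    "naeIM": (1734, 2975),
    "naeIV": (1629, 1129),  # start > end: encoded on the opposing strand
}


def orf_summary(start, end):
    """Return (strand, residues, approximate mass in kDa) for an ORF."""
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    residues = length_nt // 3 - 1  # subtract the stop codon
    mass_kda = residues * AVG_RESIDUE_MASS_DA / 1000.0
    strand = "+" if end >= start else "-"
    return strand, residues, mass_kda


if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        strand, residues, mass_kda = orf_summary(start, end)
        print(f"{name}: strand {strand}, {residues} aa, about {mass_kda:.1f} kDa")
```

Under these assumptions naeIR corresponds to 317 aa (about 35 kDa) and naeIM to 413 aa (about 45 kDa), in line with the masses stated in the text; naeIV comes out at 166 aa on the opposing strand.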

ACKNOWLEDGEMENTS

The authors wish to thank Dr. J. Pósfai for help with identifying the MTase motifs, Dr. W. Jack, L. Greenough and Dr. G. Janssen for providing unpublished plasmids and strains, and Drs. J.E. Brooks, R. Roberts, I. Schildkraut and T. Bowen for careful reading and helpful criticism of the manuscript.

REFERENCES

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215 (1990) 403-410.
Alvarez, M.A., Chater, K.F. and Rodicio, M.R.: Complex transcription of an operon encoding the SalI restriction-modification system of Streptomyces albus G. Mol. Microbiol. 8 (1993) 243-252.
Brooks, J.E., Nathan, P.D., Landry, D., Sznyter, L.A., Waite-Rees, P., Ives, C.L., Moran, L.S., Slatko, B.E. and Benner, J.S.: Characterization of the cloned BamHI restriction modification system: its nucleotide sequence, properties of the methylase, and expression in heterologous hosts. Nucleic Acids Res. 19 (1991) 841-850.
Choi, S.H. and Leach, J.E.: Identification of the XorII methyltransferase gene and a vsr homolog from Xanthomonas oryzae pv. oryzae. Mol. Gen. Genet. 244 (1994) 383-390.
Conrad, M. and Topal, M.D.: DNA and spermidine provide a switch mechanism to regulate the activity of restriction enzyme NaeI. Proc. Natl. Acad. Sci. USA 86 (1989) 9707-9711.
Krüger, D.H., Barcak, G.J., Reuter, M. and Smith, H.O.: EcoRII can be activated to cleave refractory DNA recognition sites. Nucleic Acids Res. 16 (1988) 3997-4008.
Kulakauskas, S., Barsomian, J.M., Lubys, A., Roberts, R.J. and Wilson, G.G.: Organization and sequence of the HpaII restriction-modification system and adjacent genes. Gene 142 (1994) 9-15.
Lauster, R.: Evolution of type II DNA methyltransferases. A gene duplication model. J. Mol. Biol. 206 (1989) 313-321.
Lydiate, D.J., Malpartida, F. and Hopwood, D.A.: The Streptomyces plasmid SCP2*: its functional analysis and development into useful cloning vectors. Gene 35 (1985) 223-235.
Nelson, M. and McClelland, M.: Site-specific methylation: effect on DNA modification methyltransferases and restriction endonucleases. Nucleic Acids Res. 19 (1991) 2045-2071.
Oller, A.R., Vanden Broek, W., Conrad, M. and Topal, M.D.: Ability of DNA and spermidine to affect the activity of restriction endonucleases from several bacterial species. Biochemistry 30 (1991) 2543-2549.
Piekarowicz, A., Yuan, R. and Stein, D.C.: A new method for the rapid identification of genes encoding restriction and modification enzymes. Nucleic Acids Res. 19 (1991) 1831-1835.
Pósfai, J., Bhagwat, A.S., Pósfai, G. and Roberts, R.J.: Predictive motifs derived from cytosine methyltransferases. Nucleic Acids Res. 17 (1989) 2421-2435.
Roberts, R.J.: Restriction enzymes and their isoschizomers. Nucleic Acids Res. 15 (1987) r189-r217.
Rodicio, M.R. and Chater, K.F.: Cloning and expression of the SalI restriction-modification genes of Streptomyces albus G. Mol. Gen. Genet. 213 (1988) 346-353.
Sohail, A., Lieb, M., Dar, M. and Bhagwat, A.S.: A gene required for very short patch repair in Escherichia coli is adjacent to the DNA cytosine methylase gene. J. Bacteriol. 172 (1990) 4214-4221.
Topal, M.D. and Conrad, M.: Changing endonuclease EcoRII Tyr308 to Phe abolishes cleavage but not recognition: possible homology with the Int-family of recombinases. Nucleic Acids Res. 21 (1993) 2599-2603.
Topal, M.D., Thresher, R.J., Conrad, M. and Griffith, J.: NaeI endonuclease binding to pBR322 DNA induces looping. Biochemistry 30 (1991) 2006-2010.
Van Cott, E.M. and Wilson, G.G.: Cloning the FnuDI, NaeI, NcoI and XbaI restriction-modification systems. Gene 74 (1988) 55-59.
Waite-Rees, P.A., Keating, C.J., Moran, L.S., Slatko, B.E., Hornstra, L.J. and Benner, J.S.: Characterization and expression of the Escherichia coli Mrr restriction system. J. Bacteriol. 173 (1991) 5207-5219.
Walter, J., Noyer-Weidner, M. and Trautner, T.A.: The amino acid sequence of the CCGG recognizing DNA methyltransferase M.BsuFI: implications for the analysis of sequence recognition by cytosine DNA methyltransferases. EMBO J. 9 (1990) 1007-1013.
Ward, J.M., Janssen, G.R., Kieser, T., Bibb, M.J., Buttner, M.J. and Bibb, M.J.: Construction and characterisation of a series of multicopy promoter-probe plasmid vectors for Streptomyces using the aminoglycoside phosphotransferase gene from Tn5 as indicator. Mol. Gen. Genet. 203 (1986) 468-478.
Yang, C.C. and Topal, M.D.: Nonidentical DNA-binding sites of endonuclease NaeI recognize different families of sequences flanking the recognition site. Biochemistry 31 (1992) 9657-9664.