Vol. 182, No. 3, 1992
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
February 14, 1992
Pages 1040-1046

SEQUENCE OF A PUTATIVE GLUTATHIONE SYNTHETASE II GENE AND FLANKING REGIONS FROM ANAPLASMA CENTRALE+

Jennifer M. Peters1, Brian P. Dalrymple1* and Wayne K. Jorgensen2

1 Commonwealth Scientific and Industrial Research Organisation, Division of Tropical Animal Production, Meiers Rd, Indooroopilly, Qld, 4068, Australia
2 Queensland Department of Primary Industry, Tick Fever Research Centre, Grindle Rd, Wacol, Qld, 4076, Australia

Received January 8, 1992
SUMMARY: The complete nucleotide sequence of a putative glutathione synthetase gene (gsh II) has been determined from Anaplasma centrale. The predicted 308-amino-acid protein has a molecular weight of 34,222 and is 32% identical to the enzyme glutathione synthetase (EC 6.3.2.3) encoded by the Escherichia coli gsh II gene. The previously proposed ATP-binding site is not highly conserved. The putative glutathione synthetase gene (gsh II) is preceded by an unassigned open reading frame. Downstream of gsh II is the 5' region of an open reading frame encoding a protein with significant similarity to bacterial D-alanine:D-alanine ligases (ADP-forming) (EC 6.3.2.4). The predicted partial amino acid sequence is 33% identical to that of the protein encoded by the E. coli ddl gene. © 1992 Academic Press, Inc.
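The molecular weight quoted in the Summary is conventionally obtained by summing the residue masses of the predicted polypeptide and adding one water for the free termini. The A. centrale sequence itself is not reproduced in this copy, so the following is only a minimal Python sketch of that arithmetic; the average-mass table and the name `protein_mass` are assumptions of the sketch, and MacVector may use slightly different constants.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
# These follow commonly used average-mass tables; other programs may use
# slightly different constants, so totals can differ by a few daltons.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def protein_mass(sequence: str) -> float:
    """Return the average molecular weight (Da) of an unmodified polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognised residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Free glycine: 57.0519 + 18.01528 Da
    print(round(protein_mass("G"), 2))  # 75.07
```

For the predicted 308-residue GSH II, a total of 34,222 Da corresponds to an average of about 111 Da per residue once the water term is subtracted.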
Glutathione synthetase (GSH II; EC 6.3.2.3) catalyses the addition of γ-L-glutamyl-L-cysteine to glycine in the presence of ATP to produce glutathione. The enzyme from E. coli B has been studied extensively, and the gene sequence has been obtained (1). ATP-binding (2) and substrate-binding (3) sites have been proposed for the enzyme. However, no sequences of the equivalent enzymes from other organisms have been reported. Additional GSH II sequences would aid in identifying the regions of the protein involved in binding cofactors and substrates. Here we report the sequence of an A. centrale gsh II homologue and its flanking regions of the genome.

+ The DNA sequence has been submitted to GenBank with the accession number M80425.
* Author to whom correspondence should be addressed.
0006-291X/92 $1.50. Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.
MATERIALS AND METHODS

Bacterial strains. Escherichia coli JM109 (endA1, recA1, gyrA96, thi, hsdR17 (rK-, mK+), relA1, supE44, λ-, Δ(lac-proAB), [F', traD36, proAB, lacIq, lacZΔM15]) was used as the host for all of the plasmids constructed.
Library construction and screening. EcoRI-digested Anaplasma centrale genomic DNA was inserted into EcoRI-digested plasmid vector pGEM7Zf(+) (Promega Corporation) and transformed into JM109. A recombinant plasmid containing two EcoRI fragments of ~8.7 kb and ~7 kb was isolated using an A. marginale-specific DNA sequence as a probe (unpublished). A 2.1 kb ClaI-EcoRI subfragment of the ~7 kb EcoRI insert, not including the sequences to which the probe hybridized, was subcloned into pGEM7Zf(+) for further analysis.

DNA sequencing and analysis. Nested deletions of the 2.1 kb ClaI-EcoRI fragment were generated using the Erase-a-Base system (Promega Corporation). Deletions were generated in both directions to obtain double-stranded sequence. The resulting subclones were sequenced by the dideoxy chain-termination method. The sequence between nucleotides 1811 and 2146 has been determined from one strand only. The DNA and protein sequences were analysed using the MacVector 3.5 program (International Biotechnologies, Inc.).

RESULTS AND DISCUSSION
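The open reading frames, ribosome binding sites and promoter-like sequences described in this section were identified with MacVector 3.5 (see DNA sequencing and analysis). A minimal Python sketch of comparable scans on a single strand follows. It is hypothetical throughout: the names `find_orfs`, `rbs_upstream` and `promoter_candidates`, the use of ATG and GTG as start codons, the 30-codon minimum, the 4-15 nt gap allowed between a GGAGG motif and the start codon, and the tolerance of one mismatch per TTGACA or TATAAT box with a fixed 17 nt spacer are illustrative choices, not the parameters used in this study.

```python
from typing import List, NamedTuple, Optional

STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}  # GTG included because it is the proposed gsh II initiator


class Orf(NamedTuple):
    frame: int          # 0, 1 or 2 on the strand supplied
    start: int          # 0-based index of the first codon
    end: int            # 0-based, exclusive (includes the stop codon if present)
    open_5prime: bool   # no upstream in-frame stop: may extend past the fragment's 5' end
    open_3prime: bool   # no stop before the fragment ends (an incomplete ORF)


def find_orfs(seq: str, min_codons: int = 30) -> List[Orf]:
    """Scan the three forward frames of `seq` for open reading frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        # Until the first in-frame stop, the frame is treated as open from the 5' end.
        start: Optional[int] = frame
        open_5 = True
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif codon in STOPS:
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append(Orf(frame, start, i + 3, open_5, False))
                start, open_5 = None, False
        if start is not None and (len(seq) - start) // 3 >= min_codons:
            orfs.append(Orf(frame, start, len(seq), open_5, True))
    return orfs


def rbs_upstream(seq: str, start: int, motif: str = "GGAGG",
                 min_gap: int = 4, max_gap: int = 15) -> Optional[int]:
    """Return the index of `motif` ending min_gap..max_gap nt before `start`, else None."""
    seq = seq.upper()
    first = max(0, start - max_gap - len(motif))
    last = start - min_gap - len(motif)
    for p in range(first, last + 1):
        if seq[p:p + len(motif)] == motif:
            return p
    return None


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def promoter_candidates(seq: str, spacer: int = 17, max_mm: int = 1) -> List[int]:
    """Indices of TTGACA-(spacer)-TATAAT matches with <= max_mm mismatches per box."""
    seq = seq.upper()
    span = 6 + spacer + 6
    hits = []
    for i in range(len(seq) - span + 1):
        box35 = seq[i:i + 6]
        box10 = seq[i + 6 + spacer:i + span]
        if mismatches(box35, "TTGACA") <= max_mm and mismatches(box10, "TATAAT") <= max_mm:
            hits.append(i)
    return hits
```

For instance, with the Fig. 1 strand loaded as `seq`, `rbs_upstream(seq, 470)` would test the proposed GTG initiator at nucleotides 471-473 (0-based index 470). Fixing the spacer at 17 nt keeps the scan simple; σ70-type promoters tolerate spacers of roughly 15-19 nt, so a fuller search would loop over a range of spacings.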
One complete and two incomplete open reading frames (ORFs A-C) were identified in the DNA sequence of the 2.1 kb ClaI-EcoRI fragment (Fig. 1). The predicted amino acid sequences of ORFs A-C were used to search the translated GenBank database (release 67). No entries in the database were significantly similar to the amino acid sequence of ORF A.

Identification of a putative A. centrale gsh II gene. The amino acid sequence of ORF B contained 32% identical amino acids and 28% conservative amino acid substitutions when aligned with the E. coli GSH II protein (Fig. 2). We propose that ORF B be designated the A. centrale gsh II gene. On the basis of the amino acid sequence alignments, a valine codon, GTG (nucleotides 471-473), was predicted to be the initiation codon for GSH II. This codon was preceded by a sequence with strong homology to ribosome binding sites (GGAGG). The first methionine codon of the open reading frame was not preceded by a sequence with such strong homology to ribosome binding sites. A search for promoter sequences of the TTGACA-17n-TATAAT type in the region 5' to the initiation codon was not successful. The close proximity of the preceding open reading frame and the same direction of translation suggest that the A. centrale gsh II gene may be part of a longer operon.

Analysis of the GSH II protein sequences. The E. coli GSH II enzyme is inhibited by methotrexate and other inhibitors of dihydrofolate reductase (DHFR), and by 7,8-dihydrofolate, a substrate of DHFR (2). A region of amino acid similarity has previously been identified between GSH II and, in particular, the DHFRs of eukaryotes (2). In DHFR the region of similarity with GSH II includes the binding site for the adenosine diphosphate moiety of NADPH.
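Similarity between sequences, both here and in the alignments of Figs. 2 and 3, is expressed as counts of identical residues and of conservative substitutions. A minimal Python sketch of such a count over a gapped pairwise alignment follows, assuming the similarity groups listed in the Fig. 2 legend. The names `similar` and `score_alignment`, the skipping of gapped columns, and the expression of percentages relative to the shorter ungapped sequence are assumptions, because the denominator used for the percentages in this paper is not stated.

```python
from typing import Tuple

# Similarity groups as listed in the Fig. 2 legend (F belongs to two groups).
SIMILARITY_GROUPS = [set("ILVMCF"), set("FYW"), set("DNEQ"), set("KRH"), set("STGAP")]


def similar(a: str, b: str) -> bool:
    """True if residues a and b fall in at least one common similarity group."""
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def score_alignment(seq1: str, seq2: str, gap: str = "-") -> Tuple[int, int, float, float]:
    """Return (identities, conservative substitutions, % identity, % similar).

    Columns containing a gap in either sequence are skipped. Percentages are
    relative to the shorter ungapped sequence (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    identities = similar_subs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if gap in (a, b):
            continue
        if a == b:
            identities += 1
        elif similar(a, b):
            similar_subs += 1
    shorter = min(len(seq1.replace(gap, "")), len(seq2.replace(gap, "")))
    if shorter == 0:
        raise ValueError("alignment contains no residues")
    return identities, similar_subs, 100 * identities / shorter, 100 * similar_subs / shorter
```

Because phenylalanine appears in two groups, `similar` tests membership group by group rather than assigning each residue to a single class.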
The region of GSH II that resembles this part of DHFR has been proposed to be
[Fig. 1. Nucleotide sequence of the 2.1 kb ClaI-EcoRI fragment and the deduced amino acid sequences of the putative open reading frames. The strand shown is in the 5'-3' direction. The deduced amino acid sequence is given below the corresponding nucleotide sequence. The predicted ribosome binding sites and putative promoter elements are indicated by underlining; putative promoter elements are additionally indicated by italics. Sequence listing not reproduced; the sequence is deposited in GenBank under accession number M80425.]
involved in the binding of ATP. The initial alignment of the E. coli and the putative A. centrale GSH II sequences was optimized for few gaps. The first half of the proposed ATP-binding site included few amino acid identities or similar

[Fig. 2. Alignment of the amino acid sequences of the E. coli and putative A. centrale GSH II enzymes. Gaps introduced to align the sequences are indicated by dashes. Amino acid identities are boxed and shaded; amino acid similarities are boxed. Similar amino acids are grouped as follows: (I, L, V, M, C, F), (F, Y, W), (D, N, E, Q), (K, R, H), (S, T, G, A, P). The double-headed arrow indicates the putative ATP-binding site (2). The conserved cysteine residue is indicated by '*'. Alignment not reproduced.]
[Fig. 3. Alignment of the amino acid sequences of part of the NADPH-binding site of DHFR enzymes (Streptococcus faecium, Lactobacillus casei, E. coli and Mus musculus) with the putative ATP-binding site of the E. coli and A. centrale GSH II enzymes. All amino acid sequences were taken from GenBank, except for the S. faecium sequence, which was from the NBRF database. Gaps introduced to align the sequences are indicated by dashes. Amino acid identities between any of the other sequences and the A. centrale GSH II sequence are in bold. Amino acid identities and similar substitutions between the two GSH II sequences are indicated by '*'. Similar amino acids are grouped as described in Fig. 2. Residues in L. casei DHFR involved in contact with the adenosine diphosphate moiety of NADPH are indicated by '+' (4). Secondary structure details determined from the E. coli and L. casei DHFR proteins are indicated by β for β-sheets and α for α-helices (5). Alignment not reproduced.]
substitutions (Fig. 2). Using the amino acid sequence of the NADPH-binding site of DHFR [with particular reference to the amino acids known to interact with the phosphate and adenine ring of NADPH (4)], a second alignment of the two GSH II sequences was determined for the proposed ATP-binding region of the proteins (Fig. 3). In this alignment the E. coli and putative A. centrale GSH II enzymes share 12 identities and 13 conservative amino acid substitutions. The putative A. centrale GSH II and mouse DHFR amino acid sequences share 8 identities and 12 conservative amino acid substitutions. The E. coli GSH II and mouse DHFR share 10 identities and 14 conservative amino acid substitutions. Overall, the sequences of the two GSH II enzymes are more similar to each other in this region than either is to the sequences of the DHFRs. The amino acids in DHFR involved in binding the phosphate and the adenine ring are not highly conserved in the GSH II sequences, but similar substitutions were observed. In this alignment the α-helix (5) may contain an extra turn in GSH II relative to DHFR. However, this region of the GSH II protein does not appear to be highly conserved, and the similarity between the amino acid sequences of the two GSH II proteins is lower than that among the amino acid sequences of bacterial DHFRs. Indeed, when the amino acid sequence of the putative ATP-binding site from the A. centrale protein was used to screen the translated GenBank database, the amino acid sequences of the DHFR enzymes did not give significant similarity scores (data not shown). If this region of the GSH II enzymes is the ATP-binding site, then there may be substantial differences between the E. coli and A. centrale enzymes, in particular in the binding of substrates and of inhibitors of DHFR (2). The roles of the four cysteine residues in the E. coli GSH II have been studied by amino acid replacement and with thiol-specific reagents (3). None of the residues
[Fig. 4. Alignment of the amino acid sequences of the amino-terminal regions of D-alanine:D-alanine ligases. The E. coli sequence is from Robinson et al. 1986 (7), the Salmonella typhimurium sequence is from Daub et al. 1988 (8) and the Enterococcus faecium sequence is from Dutka-Malen et al. 1990 (9). Gaps introduced to align the sequences are indicated by dashes. Amino acid identities are boxed and shaded. Alignment not reproduced.]
appeared to be chemically involved in the active site. Interestingly, the only cysteine residue whose position is conserved in both the E. coli and the putative A. centrale GSH II enzymes is residue 289. This residue has been proposed to be close to a substrate-binding site (3). Amino acid identities and conservative substitutions cluster on either side of this cysteine. These observations support the proposal that this region of the protein may be important for binding of substrates.

Identification of a putative A. centrale D-alanine:D-alanine ligase gene. The amino acid sequence of ORF C exhibited significant similarity to a number of bacterial D-alanine:D-alanine ligases (Fig. 4). D-alanine:D-alanine ligases are involved in the synthesis of the peptidoglycan of the bacterial cell wall and are an important target for antibacterial agents (6). The A. centrale amino acid sequence exhibits 33% identity with the equivalent region of the product of the E. coli ddl gene (7). The putative A. centrale protein has fewer identities with the equivalent proteins encoded by genes isolated from Salmonella typhimurium and Enterococcus faecium. The S. typhimurium and E. faecium proteins also contain a single large
insertion of up to 42 amino acids relative to the A. centrale and E. coli proteins. We propose that the putative A. centrale D-alanine:D-alanine ligase gene be designated ddl. The first methionine codon of the open reading frame is preceded by a good ribosome binding site. No sequence similar to factor-independent RNA transcription terminators was identified in the intergenic region, but a search for promoter sequences of the type TTGACA-17n-TATAAT identified a potential promoter (Fig. 1). Thus, ddl may be transcribed independently of the gsh II gene.

Acknowledgments: We would like to thank Drs. P. W. Riddles, K. R. Gale and I. G. Wright for useful discussions.
REFERENCES
1. Gushima, H., Yasuda, S., Soeda, E., Yokota, M., Kondo, M. and Kimura, A. (1984) Nucleic Acids Res. 12, 9299-9307.
2. Kato, H., Chihara, M., Nishioka, T., Murata, K., Kimura, A. and Oda, J. (1987) J. Biochem. 101, 207-215.
3. Kato, H., Tanaka, T., Nishioka, T., Kimura, A. and Oda, J. (1988) J. Biol. Chem. 263, 11646-11651.
4. Filman, D. J., Bolin, J. T., Matthews, D. A. and Kraut, J. (1982) J. Biol. Chem. 257, 13663-13672.
5. Bolin, J. T., Filman, D. J., Matthews, D. A., Hamlin, R. C. and Kraut, J. (1982) J. Biol. Chem. 257, 13650-13662.
6. Walsh, C. T. (1989) J. Biol. Chem. 264, 2393-2396.
7. Robinson, A. C., Kenan, D. J., Sweeney, J. and Donachie, W. D. (1986) J. Bacteriol. 167, 809-817.
8. Daub, E., Zawadzke, L. E., Botstein, D. and Walsh, C. T. (1988) Biochemistry 27, 3701-3708.
9. Dutka-Malen, S., Molinas, C., Arthur, M. and Courvalin, P. (1990) Mol. Gen. Genet. 224, 364-372.