Sequence of a putative glutathione synthetase II gene and flanking regions from Anaplasma centrale


Vol. 182, No. 3, 1992 BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
February 14, 1992 Pages 1040-1046

SEQUENCE OF A PUTATIVE GLUTATHIONE SYNTHETASE II GENE AND FLANKING REGIONS FROM ANAPLASMA CENTRALE+

Jennifer M. Peters1, Brian P. Dalrymple1* and Wayne K. Jorgensen2

1 Commonwealth Scientific and Industrial Research Organisation, Division of Tropical Animal Production, Meiers Rd, Indooroopilly, Qld, 4068, Australia
2 Queensland Department of Primary Industry, Tick Fever Research Centre, Grindle Rd, Wacol, Qld, 4076, Australia

Received January 8, 1992

SUMMARY: The complete nucleotide sequence of a putative glutathione synthetase gene (gsh II) has been determined from Anaplasma centrale. The predicted 308 amino acid protein has a molecular weight of 34,222 and is 32% identical to the enzyme glutathione synthetase (EC 6.3.2.3), encoded by Escherichia coli gsh II. The previously proposed ATP-binding site is not highly conserved. The putative glutathione synthetase gene (gsh II) is preceded by an unassigned open reading frame. Downstream of gsh II is the 5' region of an open reading frame encoding a protein with significant similarity to bacterial D-alanine:D-alanine ligases (ADP forming) (EC 6.3.2.4). The predicted partial amino acid sequence is 33% identical to the amino acid sequence of the protein encoded by the E. coli ddl gene. © 1992 Academic Press, Inc.

Glutathione synthetase (GSH II; EC 6.3.2.3) catalyses the addition of γ-L-glutamyl-L-cysteine to glycine in the presence of ATP to produce glutathione. The enzyme has been studied extensively from E. coli B and the gene sequence has been obtained (1). ATP- (2) and substrate-binding (3) sites have been proposed for the enzyme. However, no sequences of the equivalent enzymes from other organisms have been reported. The sequences of additional examples of GSH II would aid in the identification of regions of the protein involved in the binding of co-factors and substrates. Here we report the sequence of an A. centrale gsh II homologue and flanking regions of the genome.

+ The DNA sequence has been submitted to GenBank with the accession number M80425.
* Author to whom correspondence should be addressed.

0006-291X/92 $1.50 Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Bacterial strains. Escherichia coli JM109 (endA1, recA1, gyrA96, thi, hsdR17(rk-, mk+), relA1, supE44, λ-, Δ(lac-proAB), [F', traD36, proAB, lacIq, lacZΔM15]) was used as the host for all of the plasmids constructed.

Library construction and screening. EcoRI-digested Anaplasma centrale genomic DNA was inserted into EcoRI-digested plasmid vector pGEM7Zf(+) (Promega Corporation) and transformed into JM109. A recombinant plasmid containing two EcoRI fragments of ~8.7 kb and ~7 kb was isolated using an A. marginale specific DNA sequence (unpublished). A 2.1 kb ClaI-EcoRI subfragment of the ~7 kb EcoRI insert, not including sequences to which the probe hybridized, was subcloned into pGEM7Zf(+) for further analysis.

DNA sequencing and analysis. Nested deletions of the 2.1 kb ClaI-EcoRI fragment were generated using the Erase-a-Base system (Promega Corporation). Deletions were generated in both directions to obtain double-stranded sequence. The resulting subclones were sequenced by the dideoxy chain-termination method. The sequence between nucleotides 1811 and 2146 has only been determined from one strand. The DNA and protein sequences were analysed using the MacVector 3.5 program (International Biotechnologies, Inc.).

RESULTS AND DISCUSSION

One complete and two incomplete open reading frames (ORFs A-C) were identified in the DNA sequence of the 2.1 kb ClaI-EcoRI fragment (Fig. 1). The predicted amino acid sequences of ORFs A-C were used to search the translated GenBank database (release 67). No entries in the database were significantly similar to the amino acid sequence of ORF A.

Identification of a putative A. centrale gsh II gene. The amino acid sequence of ORF B contained 32% identical amino acids and 28% conservative amino acid substitutions when aligned with the E. coli GSH II protein (Fig. 2). We propose that ORF B be designated the A. centrale gsh II gene. On the basis of the amino acid sequence alignments a valine codon, GTG (nucleotides 471-3), was predicted to be the initiation codon for GSH II. This codon was preceded by a sequence with strong homology to ribosome binding sites (GGAGG). The first methionine codon of the open reading frame was not preceded by a sequence with such strong homology to ribosome binding sites. A search for promoter sequences of the TTGACA-17n-TATAAT type, in the region 5' to the initiation codon, was not successful. The close proximity of the preceding open reading frame and the same direction of translation suggest that the A. centrale gsh II gene may be part of a longer operon.

Analysis of the GSH II protein sequences. The E. coli GSH II enzyme is inhibited by methotrexate and other inhibitors of DHFR and by 7,8-dihydrofolate, a substrate of DHFR (2). A region of amino acid similarity has previously been identified between GSH II and, in particular, the DHFRs of eukaryotes (2). In DHFR the region of similarity with GSH II includes the binding site for the adenosine diphosphate moiety of NADPH. This region of GSH II has been proposed to be

Fig. 1. Nucleotide sequence of the 2.1 kb ClaI-EcoRI fragment and the deduced amino acid sequences of the putative open reading frames. The strand shown is the 5'-3' direction. The deduced amino acid sequence is given below the corresponding nucleotide sequence. The predicted ribosomal binding sites and putative promoter elements are indicated by underlining. Putative promoter elements are additionally indicated by italics. [Sequence listing not reproduced.]
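The Fig. 1 legend refers to predicted ribosome binding sites and putative promoter elements, and the text describes searches for a GGAGG ribosome binding site and for promoter sequences of the TTGACA-17n-TATAAT type. The paper does not say how these searches were run, so the Python below is only a minimal sketch of one way such a scan could be done. The function names, the mismatch threshold, the fixed 17 nt spacer and the 4-12 nt window between GGAGG and the start codon are assumptions made for illustration, and the demonstration sequence is invented rather than taken from the A. centrale fragment.

```python
MINUS_35 = "TTGACA"  # -35 consensus hexamer named in the text
MINUS_10 = "TATAAT"  # -10 consensus hexamer named in the text


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_promoter_like(seq, spacer=17, max_mismatches=2):
    """Scan for a -35 box, a fixed spacer and a -10 box.

    Returns (1-based position, -35 hexamer, -10 hexamer, total mismatches).
    The spacer length and mismatch threshold are illustrative assumptions.
    """
    seq = seq.upper()
    window = 6 + spacer + 6
    hits = []
    for i in range(len(seq) - window + 1):
        box35 = seq[i:i + 6]
        box10 = seq[i + 6 + spacer:i + window]
        mm = mismatches(box35, MINUS_35) + mismatches(box10, MINUS_10)
        if mm <= max_mismatches:
            hits.append((i + 1, box35, box10, mm))
    return hits


def find_rbs_starts(seq, rbs="GGAGG", starts=("ATG", "GTG"), min_gap=4, max_gap=12):
    """Find start codons lying min_gap..max_gap nt downstream of an RBS motif.

    Returns (1-based RBS position, 1-based codon position, codon).
    The gap window is an assumed range, not a value from the paper.
    """
    seq = seq.upper()
    hits = []
    pos = seq.find(rbs)
    while pos != -1:
        rbs_end = pos + len(rbs)
        for gap in range(min_gap, max_gap + 1):
            codon = seq[rbs_end + gap:rbs_end + gap + 3]
            if codon in starts:
                hits.append((pos + 1, rbs_end + gap + 1, codon))
        pos = seq.find(rbs, pos + 1)
    return hits


if __name__ == "__main__":
    # Invented test sequence: exact promoter consensus, then GGAGG and a GTG start.
    demo = "CC" + "TTGACA" + "A" * 17 + "TATAAT" + "CCCC" + "GGAGG" + "TCAACC" + "GTG" + "CTC"
    print(find_promoter_like(demo, max_mismatches=0))
    print(find_rbs_starts(demo))
```

Allowing a few mismatches matters in practice because natural promoters rarely match both hexamers exactly; a stricter or looser threshold changes how many candidate sites are reported.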

involved in the binding of ATP. The initial alignment of the E. coli and the putative A. centrale GSH II sequences was optimized for few gaps. The first half of the proposed ATP-binding site included few amino acid identities, or similar

Fig. 2. Alignment of the amino acid sequences of the E. coli and putative A. centrale GSH II enzymes. Gaps introduced to align the sequences are indicated by dashes. Amino acid identities are boxed and shaded, amino acid similarities are boxed. Similar amino acids are grouped as follows: (I, L, V, M, C, F), (F, Y, W), (D, N, E, Q), (K, R, H), (S, T, G, A, P). The double-headed arrows indicate the putative ATP-binding site (2). The conserved cysteine residue is indicated by '*'. [Alignment not reproduced.]
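The Fig. 2 legend lists the groups used to call two aligned residues similar. As a minimal sketch of how identities and conservative substitutions could be counted from a pairwise alignment with those groups, the Python below classifies each alignment column. The function names are hypothetical, the two aligned fragments are invented, and the choice of denominator for a percentage (here the ungapped length of one sequence) is an assumption, since the paper does not state how its percentages were normalised.

```python
# Similarity groups as listed in the Fig. 2 legend; note that F appears in two groups.
SIMILARITY_GROUPS = [set("ILVMCF"), set("FYW"), set("DNEQ"), set("KRH"), set("STGAP")]


def classify_pair(a, b):
    """Return 'gap', 'identity', 'similar' or 'different' for one alignment column."""
    if a == "-" or b == "-":
        return "gap"
    if a == b:
        return "identity"
    if any(a in group and b in group for group in SIMILARITY_GROUPS):
        return "similar"
    return "different"


def score_alignment(seq1, seq2):
    """Count column classes for two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"identity": 0, "similar": 0, "different": 0, "gap": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        counts[classify_pair(a, b)] += 1
    return counts


def percent(count, seq):
    """Percentage of count relative to the ungapped length of seq (assumed denominator)."""
    residues = sum(1 for c in seq if c != "-")
    return 100.0 * count / residues


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    s1 = "MKV-LDRSTA"
    s2 = "MRIQLE-TTA"
    counts = score_alignment(s1, s2)
    print(counts)
    print(round(percent(counts["identity"], s1), 1), "% identity relative to s1")
```

Because the groups overlap at phenylalanine, a residue pair is treated as similar if any one group contains both residues, which is the simplest reading of the legend.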

Fig. 3. Alignment of the amino acid sequences of part of the NADPH-binding site of DHFR enzymes with the putative ATP-binding site of GSH II enzymes. All amino acid sequences were taken from GenBank, except for the Streptococcus faecium sequence which was from the NBRF database. Gaps introduced to align the sequences are indicated by dashes. Amino acid identities between any of the other sequences and the A. centrale GSH II sequence are in bold. Amino acid identities and similar substitutions between the two GSH II sequences are indicated. Similar amino acids are grouped as described in Fig. 2. Residues in Lactobacillus casei DHFR involved in contact with the adenosine diphosphate moiety in NADPH are indicated by '+' (4). Secondary structure details determined from the E. coli and L. casei DHFR proteins are indicated by β for β-sheets and α for α-helices (5). [Alignment of the S. faecium, L. casei, E. coli and Mus musculus DHFR sequences with the E. coli and A. centrale GSH II sequences not reproduced.]

substitutions (Fig. 2). Using the amino acid sequence of the NADPH-binding site of DHFR [with particular reference to the amino acids known to interact with the phosphate and adenine ring of NADPH (4)] a second alignment of the two GSH II sequences was determined for the proposed ATP-binding region of the proteins (Fig. 3). In this alignment the E. coli and putative A. centrale GSH II enzymes share 12 identities and 13 conservative amino acid substitutions. The putative A. centrale GSH II and mouse DHFR amino acid sequences share 8 identities and 12 conservative amino acid substitutions. The E. coli GSH II and mouse DHFR share 10 identities and 14 conservative amino acid substitutions. Overall the sequences of the two GSH II enzymes are more similar to each other in this region than either is to the sequences of the DHFRs. The amino acids in DHFR involved in binding to the phosphate and the adenine ring are not highly conserved in the GSH II sequences, but similar substitutions were observed. In this alignment the α-helix (5) may contain an extra turn in GSH II relative to DHFR. However, this region of the GSH II protein does not appear to be highly conserved and the similarities between the amino acid sequences of the two GSH II proteins are less than among the amino acid sequences of bacterial DHFRs. Indeed, when the amino acid sequence of the putative ATP-binding site from the A. centrale protein was used to screen the translated GenBank database the amino acid sequences of the DHFR enzymes did not exhibit significant similarity scores (data not shown). If this region of the GSH II enzymes is the ATP-binding site, then there may be substantial differences between the E. coli and A. centrale enzymes, in particular in the binding of substrates and inhibitors of DHFR (2).

The roles of the four cysteine residues in the E. coli GSH II have been studied by amino acid replacement and thiol-specific reagents (3). None of the residues

Fig. 4. Alignment of the amino acid sequences of the amino-terminal regions of D-alanine:D-alanine ligases. The E. coli sequence is from Robinson et al. 1986 (7), the Salmonella typhimurium sequence is from Daub et al. 1988 (8) and the Enterococcus faecium sequence is from Dutka-Malen et al. 1990 (9). Gaps introduced to align the sequences are indicated by dashes. Amino acid identities are boxed and shaded. [Alignment not reproduced.]

appeared to be chemically involved in the active site. Interestingly, the only thiol residue to be conserved in location in both the E. coli and the putative A. centrale GSH II enzymes is residue 289. This residue has been proposed to be close to a substrate-binding site (3). A clustering of amino acid identities and conservative substitutions occurs flanking this cysteine. These observations support the proposal that this region of the protein may be important for binding of substrates.

Identification of a putative A. centrale D-alanine:D-alanine ligase gene. The amino acid sequence of ORF C exhibited significant similarity to a number of bacterial D-alanine:D-alanine ligases (Fig. 4). D-alanine:D-alanine ligases are involved in the synthesis of the peptidoglycans of the bacterial cell wall and are an important target for antibacterial agents (6). The A. centrale amino acid sequence exhibits 33% sequence identity with the equivalent region of the amino acid sequence of the product of the E. coli ddl gene (7). The putative A. centrale protein has fewer identities with the amino acid sequences of the equivalent proteins encoded by genes isolated from Salmonella typhimurium and Enterococcus faecium. The S. typhimurium and E. faecium proteins also contain a single large

insertion of up to 42 amino acids relative to the A. centrale and E. coli proteins. We propose that the putative A. centrale D-alanine:D-alanine ligase gene be designated ddl. The first methionine of the open reading frame is preceded by a good ribosome binding site. No sequence similar to factor-independent RNA transcription terminators was identified in the sequence of the intergenic region, but a search for promoter sequences of the type TTGACA-17n-TATAAT identified a potential promoter (Fig. 1). Thus, ddl may be transcribed independently from the gsh II gene.

Acknowledgments: We would like to thank Drs. P. W. Riddles, K. R. Gale and I. G. Wright for useful discussions.

REFERENCES

1. Gushima, H., Yasuda, S., Soeda, E., Yokota, M., Kondo, M. and Kimura, A. (1984) Nucleic Acids Res. 12, 9299-9307.
2. Kato, H., Chihara, M., Nishioka, T., Murata, K., Kimura, A. and Oda, J. (1987) J. Biochem. 101, 207-215.
3. Kato, H., Tanaka, T., Nishioka, T., Kimura, A. and Oda, J. (1988) J. Biol. Chem. 263, 11646-11651.
4. Filman, D. J., Bolin, J. T., Matthews, D. A. and Kraut, J. (1982) J. Biol. Chem. 257, 13663-13672.
5. Bolin, J. T., Filman, D. J., Matthews, D. A., Hamlin, R. C. and Kraut, J. (1982) J. Biol. Chem. 257, 13650-13662.
6. Walsh, C. T. (1989) J. Biol. Chem. 264, 2393-2396.
7. Robinson, A. C., Kenan, D. J., Sweeney, J. and Donachie, W. D. (1986) J. Bacteriol. 167, 809-817.
8. Daub, E., Zawadzke, L. E., Botstein, D. and Walsh, C. T. (1988) Biochemistry 27, 3701-3708.
9. Dutka-Malen, S., Molinas, C., Arthur, M. and Courvalin, P. (1990) Mol. Gen. Genet. 224, 364-372.
