Nucleotide sequence of the Clostridium thermocellum laminarinase gene

Biochemical and Biophysical Research Communications, Vol. 181, No. 2, December 18, 1991, Pages 507-512

NUCLEOTIDE SEQUENCE OF THE CLOSTRIDIUM THERMOCELLUM LAMINARINASE GENE

Vladimir V. Zverlov1,*, Dmitriy A. Laptev2, Vladimir I. Tishkov3 and Galina A. Velikodvorskaja1

1 Institute of Molecular Genetics, USSR Acad. of Sci., Kurchatov sq. 46, 123182 Moscow, USSR

2 National Research Centre of Medical Genetics, USSR Acad. of Med. Sci., Moskvorechie 1, 115478 Moscow, USSR

3 A.N. Bakh Institute of Biochemistry, USSR Acad. of Sci., Leninskiy pr. 33, 117071 Moscow, USSR

Received October 16, 1991

SUMMARY: The sequence presented (1022 bp) shows the Clostridium thermocellum laminarinase gene (lam1) and its flanking regions. The gene lam1 comprises an open reading frame of 726 nt, encoding a 242-aa protein with predicted Mr 27661. The ORF starts with the translation initiation codon ATG. This ATG codon is preceded at a spacing of 7 bp by a potential ribosome binding site (GGAGGT). A putative signal peptide was identified (the potential cleavage site is between aa positions 27 and 28). The comparison of the primary protein sequence with other beta-1,3-1,4-glucanases showed extensive homology with Bacillus amyloliquefaciens and Bacillus subtilis glucanases (identity 46.7 %; similarity 57.0 %). © 1991 Academic Press, Inc.
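The relation between the ribosome binding site, the start codon and the ORF length stated in the summary can be illustrated with a minimal Python sketch. The function names and the toy sequence below are hypothetical and are not taken from the paper; only the motif GGAGGT, the 7-bp spacing and the 726-nt ORF length come from the text.

```python
def find_rbs_start_codons(seq, rbs="GGAGGT", spacer=7, start="ATG"):
    """Return 0-based indices of start codons that lie exactly
    `spacer` nt downstream of the end of an `rbs` motif."""
    seq = seq.upper()
    hits = []
    pos = seq.find(rbs)
    while pos != -1:
        atg = pos + len(rbs) + spacer
        if seq[atg:atg + 3] == start:
            hits.append(atg)
        pos = seq.find(rbs, pos + 1)
    return hits


def protein_length(orf_nt):
    """Number of amino acids encoded by an ORF of `orf_nt` nucleotides
    (stop codon not included in `orf_nt`)."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3


# Hypothetical toy sequence: 5 nt, the motif, a 7-nt spacer, then ATG.
toy = "TTACG" + "GGAGGT" + "ATTTTTT" + "ATGAAAAAC"

print(find_rbs_start_codons(toy))
print(protein_length(726))
```

With the figures given in the summary, a 726-nt ORF corresponds to 726 / 3 = 242 codons, which agrees with the 242-aa protein.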

The anaerobic thermophilic bacterium C. thermocellum produces a complex of enzymes hydrolyzing the 1,3- and 1,4-bonds of various polyglucans. This class of enzymes has been the object of many studies [1, 2, 3, 4, 5, 6, 7]. The purpose of the present study was to determine the complete nucleotide sequence of the laminarinase gene isolated from C. thermocellum F7 DNA genomic banks [7] and to compare the sequence of laminarinase with the sequences of other 1,3-1,4-glucanases. This was of interest since homologies may help to define specific, conserved regions of 1,3-glucanases and give clues about the molecular evolution of the enzymes.

* To whom correspondence should be addressed.

Abbreviations used: aa, amino acid(s); bp, base pair(s); Mr, relative molecular mass; nt, nucleotide(s); ORF, open reading frame.

0006-291X/91 $1.50 Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

All subclones were constructed using E. coli TG1, Δ(lac-pro), thi, supE, hsdD5 [F' traD36, proA+B+, lacIq, lacZΔM15]. The cloning vectors used were pTZ19R [8] and M13mp19 [9]. The nucleotide sequence of both strands of the 1.1-kb C. thermocellum F7 DNA fragment was determined by the dideoxynucleotide chain-termination method [10], either with [35S]dATP for the manual procedure or with fluorescent dyes for the Applied Biosystems 370A DNA Sequencing System. Sequence data were analysed and compared using the PC Gene computer software (IntelliGenetics Inc./Genofit SA, Switzerland).

RESULTS AND DISCUSSION

Nucleotide sequence of the lam1 gene. The C. thermocellum gene lam1 encoding laminarinase was originally isolated in a recombinant plasmid, pCU401, carrying a 1.9-kb BamHI DNA insert cloned in pBR322 [7]. The complete nucleotide sequence of the lam1 coding region and its flanking regions is shown in Fig. 1. The A+T content is 59.2 % within the coding region but 78.5 % in the noncoding flanking regions. An A-T-enriched region of 31 nt was found at position -46 (Fig. 1); it might help in destabilizing the DNA duplex in the transcription initiation region. Palindromic structures were found in the flanking regions (ΔG = -6.9 kcal/mol; ΔG = -4.10 kcal/mol), and a palindromic structure (ΔG = -18.20 kcal/mol) was found within the ORF between nucleotides 173 and 205.

Although the translational initiation site has not been determined experimentally, the ATG codon proposed in Fig. 1 appears to be most likely. This ATG codon is preceded at a spacing of 7 bp by a potential ribosome binding site (GGAGGT). The ORF extended 726 nt, encoding a 242-aa protein with the predicted Mr 27661. The size of the protein agrees with the apparent Mr of approx. 32000, determined by SDS-PAGE of partially purified enzyme produced in E. coli [7].

The deduced aa sequence of laminarinase is shown in Fig. 1. Like other secreted bacterial proteins, it starts with a signal peptide. A putative signal peptide cleavage site after Ala(27) was predicted using the algorithm of von Heijne (1986). It should be noted that the cleavage site does not fully conform with the (-3,-1) rule.

Fig. 1. Nucleotide sequence of the C. thermocellum lam1 gene and deduced amino acid sequence of laminarinase. The presumptive Shine-Dalgarno ribosome binding site is bold-faced. An A+T-rich stretch is overlined with the dashed line. Facing arrows indicate a palindromic structure. Aa belonging to the signal sequence are italicized. The numbers refer to the nt position, with numbering starting at the first nt of the coding sequence. The indicated nucleotides are aligned with the last digit of each number. (The EMBL accession number is X58392.)

A summary of the codons used in the translation of lam1 mRNA is presented in Table 1. Of the 61 codons, 53 have been used. The unused codons are CUC and CUG (Leu 0/16), CAA (Gln 0/4), UGC (Cys 0/1), CGC and CGG (Arg 0/11), AGU (Ser 0/15) and GGG (Gly 0/23). The codon usage of lam1 is similar to that previously reported for other C. thermocellum genes [11, 12, 13].

TABLE 1. Codon utilization in the lam1 gene

Codon  Amino acid  No.b    Codon  Amino acid  No.b    Codon  Amino acid  No.b    Codon  Amino acid  No.b
UUU    Phe           8     UCU    Ser           1     UAU    Tyr          12     UGU    Cys           1
UUCa   Phe           8     UCC    Ser           4     UACa   Tyr           5     UGC    Cys           0
UUA    Leu           3     UCA    Ser           5     UAA    Endc          1     UGA    Endc          0
UUG    Leu           9     UCG    Ser           4     UAG    Endc          0     UGG    Trp           7
CUU    Leu           3     CCU    Pro           5     CAU    His           1     CGUa   Arg           4
CUC    Leu           0     CCC    Pro           1     CAC    His           1     CGCa   Arg           0
CUA    Leu           1     CCAa   Pro           2     CAA    Gln           0     CGA    Arg           1
CUGa   Leu           0     CCGa   Pro           2     CAGa   Gln           4     CGG    Arg           0
AUU    Ile           6     ACUa   Thr           6     AAU    Asn           5     AGU    Ser           0
AUCa   Ile           2     ACCa   Thr           2     AACa   Asn           8     AGC    Ser           1
AUA    Ile           5     ACA    Thr           5     AAAa   Lys          14     AGA    Arg           2
AUG    Met           6     ACG    Thr           1     AAG    Lys           3     AGG    Arg           4
GUUa   Val           8     GCUa   Ala           4     GAU    Asp                 GGUa   Gly           6
GUC    Val           1     GCC    Ala           3     GAC    Asp                 GGCa   Gly           3
GUAa   Val           6     GCAa   Ala           2     GAAa   Glu                 GGA    Gly          14
GUG    Val           5     GCGa   Ala           3     GAG    Glu                 GGG    Gly           0

a Major codons for the most abundant E. coli tRNA species [16].
b Number of codons in the entire coding region (see Fig. 1).
c Stop codons.

The search of the protein data bank (SwissProt) for homologies to the laminarinase aa sequence showed significant homology only with the beta-1,3-1,4-glucanases of Bacillus subtilis and Bacillus amyloliquefaciens [14, 15] (identity 46.7 %; similarity 57.0 %) (Fig. 2). The region between aa 99 and 223 is particularly similar (identity about 77 %). There was no convincing homology with any other cellulase genes of C. thermocellum.

The homology found in the signal peptide region between laminarinase and the Bacillus proteins allows one to expect that a high level of laminarinase expression and secretion in Bacillus subtilis could be obtained.
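The codon counts summarized in Table 1 (61 sense codons, 53 of them used) are the kind of tally that can be produced with a short script. The following is a minimal sketch, assuming a coding sequence given as a DNA string that ends with its stop codon; the function names and the toy sequence are hypothetical and are not the lam1 sequence.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}


def codon_usage(cds):
    """Count codons (in RNA notation) in a coding sequence read in frame."""
    rna = cds.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(rna[i:i + 3] for i in range(0, len(rna), 3))


def unused_sense_codons(usage):
    """Return the sense codons (61 in total) that never occur in `usage`."""
    all_codons = ("".join(p) for p in product("UCAG", repeat=3))
    return sorted(c for c in all_codons
                  if c not in STOP_CODONS and usage[c] == 0)


# Hypothetical toy coding sequence: six sense codons and one stop codon.
toy_cds = "ATGAAAAACAGGGTAATTTAA"

usage = codon_usage(toy_cds)
unused = unused_sense_codons(usage)
print(sum(usage.values()), "codons,", 61 - len(unused), "distinct sense codons used")
```

Applied to the full lam1 coding sequence, the same tally would be expected to report the eight unused sense codons listed in the text; that has not been run here because the sequence is not reproduced in this document.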

Fig. 2. Amino acid sequence alignment of C. thermocellum laminarinase (C.T.), beta-1,3-1,4-glucanase of B. subtilis (B.S.) and beta-1,3-1,4-glucanase of B. amyloliquefaciens (B.A.). The character to show that two residues are identical is (:), similar - (.). Similarity criteria were: A,S,T; D,E; N,Q; I,L,M,V; R,K; F,Y,W. Gaps that have been introduced to optimize the alignment are indicated by dashes. The indicated residues are aligned with the last digit of each number.
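The identity and similarity percentages quoted for the alignment can be computed from two aligned sequences with the similarity groups given in the caption of Fig. 2. The sketch below is illustrative only. It assumes that columns containing a gap are skipped and that identical residues also count as similar; the paper does not state either convention. The function name and the two short fragments are hypothetical examples, not a transcription of the figure.

```python
# Similarity groups as listed in the caption of Fig. 2.
SIMILARITY_GROUPS = ["AST", "DE", "NQ", "ILMV", "RK", "FYW"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def score_alignment(a, b, gap="-"):
    """Return (compared, identical, similar) for two aligned sequences.

    Assumptions: gap-containing columns are ignored, and `similar`
    includes identical pairs as well as pairs from the same group.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    return compared, identical, similar


# Hypothetical aligned fragments used only to exercise the function.
frag_a = "SSFFTYTGPTDGTPWDEIDIEF"
frag_b = "SSFFTYTGPSDNNPWDEIDIEF"

n, ident, sim = score_alignment(frag_a, frag_b)
print(f"identity {100 * ident / n:.1f} %, similarity {100 * sim / n:.1f} %")
```

A different choice of denominator (for example, the length of the shorter sequence or the full alignment length including gaps) would change the percentages, so published values can only be reproduced when the convention is known.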

REFERENCES

1. Schwarz, W., Bronnenmeier, K. and Staudenbauer, W.L. (1985) Biotechnol. Lett. 7, 859-864.
2. Schwarz, W.H., Schimming, S. and Staudenbauer, W.L. (1988) Biotechnol. Lett. 10, 225-230.
3. Schwarz, W.H., Schimming, S. and Staudenbauer, W.L. (1988) Appl. Microbiol. Biotechnol. 29, 25-31.
4. Hazlewood, G.P., Romaniec, M.P.M., Davidson, K., Grepinet, O., Beguin, P., Millet, J., Raynaud, O. and Aubert, J.-P. (1988) FEMS Microbiol. Lett. 51, 231-236.
5. Bumazkin, B.K., Velikodvorskaja, G.A., Tuka, K., Mogutov, M.A. and Strongin, A.Ya. (1990) Biochem. Biophys. Res. Commun. 167, 1057-1064.
6. Tuka, K., Zverlov, V.V., Bumazkin, B.K., Velikodvorskaya, G.A. and Strongin, A.Ya. (1990) Biochem. Biophys. Res. Commun. 169, 1055-1060.
7. Zverlov, V.V. and Velikodvorskaya, G.A. (1990) Biotechnol. Lett. 12, 811-816.
8. Mead, D.A., Szczesna-Skorupa, E. and Kemper, B. (1986) Prot. Eng. 1, 67-74.
9. Norrander, J., Kempe, T. and Messing, J. (1983) Gene 26, 101-106.
10. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
11. Schwarz, W.H., Schimming, S., Rucknagel, K.P., Burgschwaiger, S., Kreil, G. and Staudenbauer, W.L. (1988) Gene 63, 23-30.
12. Grepinet, O., Chebrou, M.-C. and Beguin, P. (1988) J. Bacteriol. 170, 4582-4588.
13. Yague, E., Beguin, P. and Aubert, J.-P. (1990) Gene 89, 61-67.
14. Murphy, N., McConnell, D.J. and Cantwell, B.A. (1984) Nucl. Acids Res. 12, 5355-5367.
15. Hofemeister, J., Kurtz, A. and Knowles, J. (1986) Gene 49, 177-187.
16. Ikemura, T. (1981) J. Mol. Biol. 151, 389-409.