Vol. 181, No. 2, 1991
December 18, 1991
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
Pages 507-512

NUCLEOTIDE SEQUENCE OF THE CLOSTRIDIUM THERMOCELLUM LAMINARINASE GENE

Vladimir V. Zverlov1,*, Dmitriy A. Laptev2, Vladimir I. Tishkov3 and Galina A. Velikodvorskaja1

1 Institute of Molecular Genetics, USSR Acad. of Sci., Kurchatov sq. 46, 123182 Moscow, USSR
2 National Research Centre of Medical Genetics, USSR Acad. of Med. Sci., Moskvarechie 1, 115478 Moscow, USSR
3 A.N. Bakh Institute of Biochemistry, USSR Acad. of Sci., Leninskiy pr. 33, 117071 Moscow, USSR

Received October 16, 1991
SUMMARY: The sequence presented (1022 bp) shows the Clostridium thermocellum laminarinase gene (lam1) and its flanking regions. The gene lam1 comprises an open reading frame of 726 nt, encoding a 242-aa protein with a predicted Mr of 27661. The ORF starts with the translation initiation codon ATG. This ATG codon is preceded at a spacing of 7 bp by a potential ribosome binding site (GGAGGT). A putative signal peptide was identified (the potential cleavage site is between aa positions 27 and 28). Comparison of the primary protein sequence with other beta-1,3-1,4-glucanases showed extensive homology with the Bacillus amyloliquefaciens and Bacillus subtilis glucanases (identity 46.7%; similarity 57.0%). © 1991 Academic Press, Inc.
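The arithmetic behind these figures can be checked with a short sketch. The Python below is illustrative only: the function names and the toy sequence are hypothetical and are not taken from Fig. 1. Only the GGAGGT motif, the 7-bp spacing, the 726-nt ORF length and the cleavage position after residue 27 come from the summary.

```python
def find_start_after_sd(seq, sd="GGAGGT", spacing=7):
    """Return the 0-based index of an ATG that lies `spacing` nt after the
    end of a Shine-Dalgarno-like motif, or None if no such ATG exists."""
    seq = seq.upper()
    pos = seq.find(sd)
    while pos != -1:
        start = pos + len(sd) + spacing
        if seq[start:start + 3] == "ATG":
            return start
        pos = seq.find(sd, pos + 1)  # try the next motif occurrence
    return None


def protein_length(orf_nt):
    """Number of amino acids encoded by an ORF of `orf_nt` nucleotides
    (stop codon not counted in `orf_nt`)."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3


# Hypothetical toy sequence: 5 nt of leader, the motif, a 7-nt spacer, then ATG.
toy = "TTACG" + "GGAGGT" + "ATTTTTT" + "ATGAAAAACAGG"

start_index = find_start_after_sd(toy)
total_aa = protein_length(726)      # ORF length reported in the summary
mature_aa = total_aa - 27           # assuming cleavage after residue 27

print(start_index, total_aa, mature_aa)
```

The length of the mature protein printed here follows only from the assumption that the signal peptide comprises the first 27 residues; the paper itself does not state this number.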
The thermophilic anaerobic bacterium C. thermocellum produces a complex of various enzymes hydrolyzing the 1,3- and 1,4-bonds of polyglucans. Many of the enzymes of this complex have been the object of studies [1, 2, 3, 4, 5, 6, 7]. The laminarinase gene was isolated from C. thermocellum F7 [7]. The purpose of the present study was to determine the complete nucleotide sequence of the laminarinase gene and to compare it with the sequences in the genomic banks. This was of interest since homologies with other enzymes may help define the specific, conserved regions of 1,3-glucanases and 1,3-1,4-glucanases and give clues about the molecular evolution of this class of enzymes.

* To whom correspondence should be addressed.
Abbreviations used: aa, amino acid(s); bp, base pair(s); Mr, relative molecular mass; nt, nucleotide(s); ORF, open reading frame.
0006-291X/91 $1.50 Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
MATERIALS AND METHODS
All subclones were constructed using E. coli TG1, Δ(lac-pro), thi, supE, hsdΔ5 [F' traD36, proA+B+, lacIq, lacZΔM15]. The cloning vectors used were pTZ19R [8] and M13mp19 [9]. The nucleotide sequence of both strands of the 1.1-kb C. thermocellum F7 DNA fragment was determined by the dideoxynucleotide chain-termination method [10], either with [α-35S]dATP for the manual procedure or with fluorescent dyes for the Applied Biosystems 370A DNA Sequencing System. Sequence data were analysed and compared using the PC/Gene computer software (IntelliGenetics Inc./Genofit SA, Switzerland).

RESULTS AND DISCUSSION
Nucleotide The
seqence gene. pCU401,
lam1
isolated
in
thermocellum The
complete regions
nt
59,2%
found
at
is Ill* A-T
(hG=-6,9
shown
Although
the
coding
-46
(Fig. it
the
translational
experimentally, be most likely.
spacing
of
7 bp
The
acid
The
ORF extended
predicted
M,
apparent purified The
M, of enzyme deduced
27661. approx.
starts with It bacterial proteins. Ala(27) using the
This
the
Like
of
in
ATG
algorithm
78,5X sequence,
codon
of
its of
C.
lam1
is
vithin but
31 nt was palindrome
initiation site
has
is
by
not
proposed
codon
in
been Fig.
1
at
a
preceded
binding
ORF between
site
(GGAGGT).
(AG=
structure (&=-la,20
nucleotides
173
and
205.
usage
determined
peptide cleavage von
similar site
Heijne
protein agrees with
for
E. coli by SDS-PAGE of laminarinase is
of
of
region contains
726 nt, encoding a 242-aa The size of the protein
signal A putative
genes
An A-T region
ribosome
32000,
other was
[71*
and
region
transcription
ATG
codon
gene
content translated
C.
pBR322
lam1
showed a palindromic palindromic structure
and
produced in aa sequence a
the
This
third within
sequence
the
initiation
region
found
A+T
originally a 1,9-kb
carrying site
noncoding
help
a potential
flanking
-4,lO kcal/mol). kcal/mol) was Amino
by
5'-
region. might
DNA duplex.
to 3'
1.
the
was
BemHI of
Fig.
1).
the
determined appears The
in
The preceding
kcal/mol);
destabilizing
the
sequence
immediately within
laminarinase, plasmid at
12, 131, residues.
position
gene
cloned
nucleotide
thermocellum enriched with 140
lam1
encoding a recombinant
DNA insert
flanking
the only
of
the
partially
[7]. shown in Fig. 1. to other secreted
was (1986).
with the
predicted after It should be
-181 GACMTC'EAAGTITAT
-141 AGGATATMTTMATiXMTTMCZaYTGI
-131 TACilWCITA AAATACAGGG -61 TGTIMl-IlTTACOCCCCCT
-71
'ITIGAEITI ATATATATIT CII-EMTIG ---------------__-__-------------AAAGTATAMM~Tl-ITA~ATl-ITACGGGAGGTATlTTIT ___-30 AITTcA~A~AATFccTnr:Tn;Crrnr;Crrnr;nx;CrAAIT
AIGAAAMCAGGGTA
MET I+m Am Arg. Val Ile
'ITACAGGAGC
lTECEAAG -1
-S.D.
60
Ser Leu Leu Met Ala Sex- Leu Leu Leu Val Leu Ser Val Ile 90 120
GITGCTCCTmTACAAAGCGGMGCCGCAA~Gn;GTAMTACCCCTTZTGTTGCA~ Val Ala Pro Phe Tyr Lys Ala Glu Ala Ala Ihr Val Val Asn Thr 150 TITCGTTOGAACTITGACTCC GTACAGn;GAllAMGcGATGGGcGMG?n:cn;TcAA~ Phe Arg Ser Asn Phe Asp Ser Val Gln Trp Lys Lys Arg Trp Ala 210 GIG TIG AA GQZ~.TlC-ACA-@ZIGAC Al-I TO3 MC CZT MA An; ATT Val -----3 Leu lu Ala Phe Thr Gly Asp Ile Ser Asn Gly Lys Met Ile 270 GM TAT GGC GGT 'EA TAT COG TAT AAA AGC @IT GM TAT CGT ACA Glu Tyr Gly Gly Ser Tyr Pro Tyr Lys Ser Gly Glu Tyr Arg Thr 330 TACGGTTATTATGMGTAAGAATG AMGCTGCCAMAACGTAGGAATTG'ITICATCI'TIG Tyr Gly Tyr Tyr Glu Val Arg Met Lys Ala Ala Iys Asn Val Gly
Pro
Phe Val
Ala
Lys
Phe Val
Ser
TIG Leu
ACC CIT Thr Leu
AAA TCA TIT Lys Ser Phe
Ile
Val
Thr 240 GAC AGG Asp Arg 300 TIC GGA Phe Gly 360
Ser Ser Phe
390 TIG Phe
ACT TAT ACA GGA CCT 'KG Thr Tyr Thr Gly Pro Ser
Val 180
GAC MC Asp Asn
MT CCA TGG GAC GM AX! GAT A'IC GAG TIT Asn Pro Trp Asp Glu Ile Asp Ile Glu Phe 450 GGA MG GAC ACA ACT AM G-I-I CAG TIC MC IGG TAC AM MT GGA GIG m OGA MC Gly Lys Asp 'Ihr Thr Lys Val Gln Phe Asn Trp Tyr Lys Asn Gly Val Gly Gly Asn 510 TAT IlG CAC MT CIT GGA TlC GAT GCI 'ICC CAG GAT 'I-IT CAT ACA TAT GCA TIT GM Tyr Leu His Asn Leu Gly Phe Asp Ala Ser Gln Asp Phe His 'Ihr Tyr Gly Phe Glu 570 AQG CQ; CAT TAT ATA GAC I'lC TAT GlT GAC GGC AAA AAA GIT TAT CGT GGA ACC AGG Arg Pro Asp Tyr Ile Asp Phe Tyr Val Asp Gly Lys Lys Val Tyr Arg Gly Thr Arg 630 ATACCTCITACTcoCGGCMAA'lTA-IG An; AATTIGTGGCCAGGAATAGGACIGGATGM Ile Pro Val Thr Pro Gly Lys Ile Met Met Asn Leu Trp Pro Gly Ile Gly Val Asp 690 'EG 'PIG GGA CGT TAC GAC GGA AGA ACT CCI TIG GAG GCG GAG TAC GGA ATA n;r AAA Trp Leu Gly Arg Tyr Asp Gly Arg Thr Pro Leu Gln Ala Glu Tyr Gly Ile Cys Lys 740 CTATCCTM C Gc;rc;TTccGc MGATAATCC TACICCTACTtXYIACGATIG CICCRCTACTCCGACTAAC Leu Ser --800 835 EI'MTlTACCIC&IMGGG&AC.IT~ITCGG03ACGCl-CATGTT --..-
420 l-IA Leu 480 GAG Glu 540 IGG Trp 600 MC Asn 660 Glu 720 ATA Ile 790
Fig. 1. Nucleotide sequence of the C. thermocellum lam1 gene and deduced amino acid sequence of laminarinase. The presumptive Shine-Dalgarno ribosome binding site is bold-faced. An A+T-rich stretch is overlined with the dashed line. Facing arrows indicate a palindromic structure. Aa belonging to the signal sequence are italicized. The numbers refer to the nt position, with numbering starting at the first nt of the coding sequence. The indicated nucleotides are aligned with the last digit of each number. (The EMBL accession number is X58392.)
noted that the cleavage site does not fully conform with the (-3,-1) rule. A summary of the codons used in the translation of lam1 mRNA is presented in Table 1. Of the 61 codons, 53 have been used. The unused codons are CUC and CUG (Leu 0/16), CAA (Gln 0/4), UGC
TABLE
Codon
UUU uucUUA UUG
Codon
utilization
Amino acid
No.'
Codon
8 a 3 9
UCU ucc UCA UCG
Ser
3 0 1 0
ecu ccc CCA' CCC *
Pro
6 2 5 6
ACU= ACC= ACA ACG
Thr
8 1 6 5
GCU' GCC GCA' GCG'
Ala
Phe Leu
cuu cut CUA CUG= AUU AUCAUA AUG
Ile
Met
GUU' GUC GUA* GUG
Val
-Major "Number 'Stop
(CYS
The
aa and
reported
and
Gly 2
12,
131.
The
search
(SwissProt)
for
Codon
12 5 1 0
UGU UGC UGA UGG
Cys
1 1 0 4
CGU' CCC. CGA CGG
Arg
4 0 1 0
5 8 14 3
AGU
Ser
0 1 2 4
CAU CAC CAA CAG=
His
6 2 5 1
AAU AAC" AAA' AAG
Asn
4 3 2 3
GAU GAC
Asp
GAA’ GAG
Glu
End' End"
Gln
Lys
lam1
coding
Amino acid
No.
1 0 0 7
End= Trp
AGC
AGA AGG
Arg
GGU= GGC* GGA GGG
Gly
region
most
(see
6 3 14 0
Fig.1).
with
and
laminarinase
and
%)
aa
of
glucanases
of
with
B.
to
expect
that
a
laminarinase
high
the
level
expression in
obtained.
protein
be
Bacillus
found
beta-1,3(
,
and
identity the
region
about
77
%
2). of
and in
that
proteins.
reveals (Fig.
1
genes
subtilis
sequences
amyloliquefaciens
to
could
laminarinase
aa
only
in
homology
Bacillus
particularly
similar cellulase
done
Bacillus
laminarinase
of
is
thermocellum
significant
homology with
1,4-glucanases,
C.
(Gly
are
urn
homology
other
GGG
there
was
convincing
57,O
223
usage
thermocell
homologies
showed
O/l), and
codon C.
any
151
(Ser
abundant
The
No
and
significant
thermocellum
AGU
O/11),
amyloliquefaciens
99
The
thermocellur No.
5 1 2 2
other
bank.
similarity
homology
C.
Amino acid Tyr
sequence
[14,
%;
Codon
UAU UAC. UAA UAG
171). for
Bacillus
between
of
1 4 5 4
are
His(164,
1,4-glucanases 46,7
gene
No.
(Arg Val
laminarinase
Only
C.
and
data
between
allows
CGG
previosly
[ll,
Amino acid
lam1
codons.
CGC
Cis(238)
the
E. coli tRNA species [16]. of codons in the entire
O/l),
O/23).
in
1
B.
C.
thermocellum
subtilis
signal
beta-1,3-
peptide and subtilis
region,
secretion
of will
be
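A codon-utilization tally of the kind summarized in Table 1 can be sketched as follows. This is a minimal illustration, not the procedure used by the PC/Gene software: the function names and the toy coding sequence are hypothetical, and the code assumes an in-frame coding sequence written 5' to 3' that may end with one stop codon.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# The 61 sense codons of the standard genetic code (64 triplets minus 3 stops).
SENSE_CODONS = sorted(
    "".join(triplet)
    for triplet in product("ACGT", repeat=3)
    if "".join(triplet) not in STOP_CODONS
)


def codon_usage(cds):
    """Count codons in an in-frame coding sequence (DNA or RNA letters).
    A single terminal stop codon, if present, is not counted."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return Counter(codons)


def unused_sense_codons(counts):
    """Return the sense codons that never occur in `counts`."""
    return [codon for codon in SENSE_CODONS if counts[codon] == 0]


# Hypothetical toy coding sequence (not the lam1 gene): ATG AAA AAC AGG TAA
toy_cds = "ATGAAAAACAGGTAA"
usage = codon_usage(toy_cds)
unused = unused_sense_codons(usage)
print(len(SENSE_CODONS), sum(usage.values()), len(unused))
```

Applied to the full lam1 coding region, a tally of this kind would be expected to reproduce the statement that 53 of the 61 sense codons are used, provided the input sequence matches the published one.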
B.S. B.A. C.T.
B.S. B.A. C.T.
B.S. B.A. C.T.
B.S. B.A. C.T.
B.S. B.A. C.T.
30 40 50 1 10 20 MPYLK-RVLLLLVTGLFMSLFAVTATA-SAQTGGSFFDPFNGYNSGFWQKADGYSNGNMF . : : : ::.::::::::: .:.. :::....:: MK-RVLLILVTGLFMSLCGITSSV-SAQTGGSFFE~~~S~~~~L~~~~~~~~~~D~~ : : ::. .:.. : . : : , YKNRVISLLMASLLLVLSVIVAPFYKIEAATVVNT~~V~VFRSNFDSVQWKKRWAK~ 30 40 50 1 10 20 60 70 80 90 100 110 NCTWRANNVSMTSLGEMRLALTSP---AYNKFDCGENRSVQTYGYGLYEVRUKPAKNTGIV .::::::::::::::::::::::::::::::::: : : : : :::::::::::::::::::: NCTWRANNVSMTSLGEMRLALTSP---SYNKFDCGENRSVQTYGYGLYEVRMKPAKNTGIV .. . VSlVLEAFTGDI~N,K~IITIDREYGGSJP-GKS~~Y~~KSF~~~~Y~~~~~~A~~~V~~~ 90 100 110 60 70 80 120 130 140 150 160 170 SSFFTYTGPTDGTPWDEIDIEFLGKD'TTKVBFNYYTNGA-GNHEKIVDLGFDAANAYHTYA ::::::::::.:::::::::::::::::::::::::::: : : :::::::::::: ::: :: SSFFTYTGPTEGTPWDEIDIEFLGKDTTKVQFNYYTNGA-GNHEKFADLGFDAANAYHTYA .. ::::::::: .. ::::::::::::::::::::.: :: :: : SSFFTYTGPSDNNPWDEIDIEFLGKDTTKVgFNWYKNGVGGN-EYLBN~~~~~~~D~~~~G 170 120 130 140 150 160 230 180 190 200 210 220 FDWQPNSIKWYVDGQLKHTATNQIPTTLGKI~MNLWNGTGVDEWLGSYNGVNPLYAHYDW ,... : : :::'.'.::"'.::..:: FDWQPNS;KWYVDG9LKHTATTQ~~AAP~k~t~~t~~~TG;DD~~~~~k~I~~;tled~6 . .. ....... .. ... ~,,R~DY,D,;;,,KKVYRG,R,~,V,,,;,,,~,P~IBS~,,L,R~D~RT,,Q~E~G* 180 190 200 210 220
230
240 VRYTKK . MRYRkk CKILS 240
Fig. 2. Amino acid sequence alignment of C. thermocellum laminarinase (C.T.), beta-1,3-1,4-glucanase of B. subtilis (B.S.) and beta-1,3-1,4-glucanase of B. amyloliquefaciens (B.A.). The character used to show that two residues are identical is (:); similar residues are marked (.). Similarity criteria were: A,S,T; D,E; N,Q; I,L,M,V; R,K; F,Y,W. Gaps that have been introduced to optimize the alignment are indicated by dashes. The indicated residues are aligned with the last digit of each number.
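The identity and similarity percentages quoted for such an alignment can be computed along the following lines. This is a sketch under stated assumptions: the similarity groups are those listed in the legend, but the choice of denominator (all alignment columns, gaps included) and the convention that "similarity" also counts identical residues are assumptions, since the paper does not specify them. The function names and the toy alignment are hypothetical.

```python
# Similarity groups as listed in the figure legend.
SIMILARITY_GROUPS = ["AST", "DE", "NQ", "ILMV", "RK", "FYW"]


def classify_pair(a, b):
    """Classify one alignment column as identical, similar, different or gap."""
    if a == "-" or b == "-":
        return "gap"
    if a == b:
        return "identical"
    for group in SIMILARITY_GROUPS:
        if a in group and b in group:
            return "similar"
    return "different"


def alignment_stats(seq1, seq2):
    """Return column counts plus percent identity and similarity.
    Assumption: percentages are taken over all columns, gaps included,
    and similarity includes identical positions."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"identical": 0, "similar": 0, "different": 0, "gap": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        counts[classify_pair(a, b)] += 1
    columns = len(seq1)
    counts["identity_pct"] = 100.0 * counts["identical"] / columns
    counts["similarity_pct"] = (
        100.0 * (counts["identical"] + counts["similar"]) / columns
    )
    return counts


# Hypothetical toy alignment (not taken from Fig. 2).
stats = alignment_stats("MKRV-LL", "MRRVILI")
print(stats)
```

A different denominator (for example, the length of the shorter sequence) would shift both percentages, so values from this sketch are not directly comparable with the published 46.7% and 57.0%.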
REFERENCES
1. Schwarz, W., Bronnenmeier, K. and Staudenbauer, W.L. (1985) Biotechnol. Lett. 7, 859-864.
2. Schwarz, W.H., Schimming, S. and Staudenbauer, W.L. (1988) Biotechnol. Lett. 10, 225-230.
3. Schwarz, W.H., Schimming, S. and Staudenbauer, W.L. (1988) Appl. Microbiol. Biotechnol. 29, 25-31.
4. Hazlewood, G.P., Romaniec, M.P.M., Davidson, K., Grepinet, O., Beguin, P., Millet, J., Raynaud, O. and Aubert, J.-P. (1988) FEMS Microbiol. Lett. 51, 231-236.
5. Bumazkin, B.K., Velikodvorskaja, G.A., Tuka, K., Mogutov, M.A. and Strongin, A.Ya. (1990) Biochem. Biophys. Res. Commun. 167, 1057-1064.
6. Tuka, K., Zverlov, V.V., Bumazkin, B.K., Velikodvorskaya, G.A. and Strongin, A.Ya. (1990) Biochem. Biophys. Res. Commun. 169, 1055-1060.
7. Zverlov, V.V. and Velikodvorskaya, G.A. (1990) Biotechnol. Lett. 12, 811-816.
8. Mead, D.A., Szczesna-Skorupa, E. and Kemper, B. (1986) Prot. Eng. 1, 67-74.
9. Norrander, J., Kempe, T. and Messing, J. (1983) Gene 26, 101-106.
10. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
11. Schwarz, W.H., Schimming, S., Rucknagel, K.P., Burgschwaiger, S., Kreil, G. and Staudenbauer, W.L. (1988) Gene 63, 23-30.
12. Grepinet, O., Chebrou, M.-C. and Beguin, P. (1988) J. Bacteriol. 170, 4582-4588.
13. Yague, E., Beguin, P. and Aubert, J.-P. (1990) Gene 89, 61-67.
14. Murphy, N., McConnell, D.J. and Cantwell, B.A. (1984) Nucl. Acids Res. 12, 5355-5367.
15. Hofemeister, J., Kurtz, A. and Knowles, J. (1986) Gene 49, 177-187.
16. Ikemura, T. (1981) J. Mol. Biol. 151, 389-409.