Nucleotide sequence of the celC gene encoding endoglucanase C of Clostridium thermocellum


Gene, 63 (1988) 23-30 Elsevier GEN 02302

(Recombinant DNA; cellulase; endo-1,4-β-glucanase; leader sequence; signal peptide; reiterated domain; phage λ vectors)

Wolfgang H. Schwarz a, Silke Schimming a, Karl P. Rücknagel b, Sylvia Burgschwaiger c, Günther Kreil c and Walter L. Staudenbauer a

a Institute for Microbiology, Technical University of Munich, D-8000 München 2 (F.R.G.), b Abteilung Braunitzer, Max Planck Institute for Biochemistry, D-8033 Martinsried (F.R.G.) Tel. (089)8578-2489, and c Institute for Molecular Biology, Austrian Academy of Science, A-5020 Salzburg (Austria) Tel. (06222)249-6121

Received 25 August 1987
Accepted 16 November 1987
Received by publisher 1 December 1987

SUMMARY

The nucleotide sequence of the cellulase gene celC, encoding endoglucanase C of Clostridium thermocellum, has been determined. The coding region of 1032 bp was identified by comparison with the N-terminal amino acid (aa) sequence of endoglucanase C purified from Escherichia coli. The ATG start codon is preceded by an AGGAGG sequence typical of ribosome-binding sites in Gram-positive bacteria. The derived amino acid sequence corresponds to a protein of Mr 40439. Amino acid analysis and apparent Mr of endoglucanase C are consistent with the amino acid sequence as derived from the DNA sequencing data. A proposed N-terminal 21-aa residue leader (signal) sequence differs from other prokaryotic signal peptides and is non-functional in E. coli. Most of the protein bears no resemblance to the endoglucanases A, B, and D of the same organism. However, a short region of homology between endoglucanases A and C was identified, which is similar to the established active sites of lysozymes and to related sequences of fungal cellulases.

INTRODUCTION

The thermophilic anaerobic bacterium C. thermocellum encodes a variety of extracellular endoglucanases involved in the degradation of cellulose (Millet et al., 1985; Schwarz et al., 1985; Romaniec et al., 1987). Besides the major endoglucanase A (Cornet et al., 1983; Schwarz et al., 1986), three additional enzymes, designated endoglucanase B (Beguin et al., 1983), C (Petre et al., 1986), and D (Joliff et al., 1986a), have been characterized by cloning and expressing the corresponding genes (celA, celB, celC, celD) in E. coli. The nucleotide sequences of the genes celA (Beguin et al., 1985), celB (Grepinet and Beguin, 1986), and celD (Joliff et al., 1986b) have been determined. Comparison of the deduced amino acid sequences revealed the presence of N-terminal leader (signal) sequences required for protein export. Furthermore, it was found that the C-terminal regions of all three enzymes shared a highly conserved reiterated domain (Joliff et al., 1986b).

Recently, we reported on the high-level expression of the C. thermocellum celA and celC genes in E. coli (Schwarz et al., 1987b). Endoglucanase A was partially exported into the periplasmic space, whereas endoglucanase C remained in the cytoplasm. Overexpression of celA resulted in decreased cell viability concomitant with the accumulation of endoglucanase A in the membrane fraction. In contrast, overproduced endoglucanase C accumulated as a soluble enzyme in the cytoplasm without adverse effects on the host cell.

To elucidate the molecular basis for this difference in enzyme localization, we have determined the nucleotide sequence of the celC gene. It was found that the amino acid sequence of the presumptive signal peptide region at the N terminus differed significantly from the corresponding sequences of other pre-proteins from Gram-positive bacteria. Moreover, the C-terminal amino acid sequence of endoglucanase C shared no homology with the other C. thermocellum endoglucanases. Only a limited region of sequence homology between endoglucanases A and C was identified, which resembles the proposed active sites of other cellulases.

Correspondence to: Dr. W.L. Staudenbauer, Institute for Microbiology, Technical University of Munich, D-8000 München 2 (F.R.G.) Tel. (089)2105-2372.

Abbreviations: aa, amino acid(s); bp, base pair(s); cel, gene coding for cellulase component; Δ, deletion; dd, dideoxy; kb, 1000 bp; nt, nucleotide(s); ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; SDS, sodium dodecyl sulfate; Sm, streptomycin; [ ], designates plasmid-carrier state; ( ), designates prophage state.

0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)

MATERIALS AND METHODS

(a) Bacterial strains and plasmids

E. coli M72 SmR lacZ Δbio-uvrB ΔtrpEA2 (λNam7Nam53cI857ΔH1) was obtained from E. Remaut. Plasmid pSU1, a derivative of pPLc236 (Remaut et al., 1981) carrying the multiple cloning site of pUC8, was provided by W. Lubitz (Schüller et al., 1985). Phage λLIC7 was isolated from a gene bank constructed by cloning a partial Sau3A digest of C. thermocellum DNA in the phage λ vector 1059 (Schwarz et al., 1985).

(b) Nucleotide sequence analysis

The nucleotide sequence was determined according to the procedures of Maxam and Gilbert (1980). Restriction fragments were end-labelled with [α-32P]ddATP (EcoRI, EcoRV, KpnI) or a suitable [α-32P]dNTP (HindIII, XbaI) employing either reverse transcriptase (EcoRI, HindIII, XbaI) or terminal transferase (EcoRV, KpnI). The products of the chemical cleavage reactions were electrophoresed on 0.2 mm thick 6% or 18% polyacrylamide gels. Sequence data were analysed with the Programs for Rapid Biosequence Similarity Analysis of D.J. Lipman and W.J. Wilbur (NIH, Bethesda).

(c) Enzyme purification

Endoglucanase C was overproduced upon thermal induction of E. coli M72[pWS1257] (Schwarz et al., 1987b). Cell extracts were prepared by freeze-thaw lysis of lysozyme-treated cells and E. coli proteins were precipitated by heating for 10 min at 60°C. The enzyme was then purified to homogeneity by anion exchange chromatography on a Mono Q HR 5/5 column and subsequent gel filtration on a Superose 12 HR 10/30 column employing a Pharmacia FPLC system.

(d) Enzyme characterization

Enzyme activity was determined by assaying the release of reducing sugars from barley β-glucan (Schwarz et al., 1986). SDS-PAGE was carried out in 10% polyacrylamide slab gels in the presence of 0.1% SDS. Staining for β-glucanase activity was performed in polyacrylamide gels containing 0.1% barley β-glucan (Schwarz et al., 1987a). For determination of aa composition, purified endoglucanase C was hydrolyzed in 5.7 N HCl at 110°C for 20 h and analyzed on a Biotronic LC 5000 analyzer. Tryptophan was determined after addition of thioglycolic acid. Cysteine and methionine were estimated after oxidation with performic acid.


RESULTS AND DISCUSSION

(a) Nucleotide sequence of the celC gene

The celC gene was localized within a 1.6-kb PvuII fragment of genomic C. thermocellum DNA (Fig. 1). Constitutive synthesis of endoglucanase C was independent of the orientation of the cloned DNA fragment suggesting the presence of a clostridial nucleotide sequence, which can function as a promoter in E. coli. Enhanced gene expression following induction of the λ pL promoter present on the vector plasmid indicated that transcription occurs from right to left on the physical map shown in Fig. 1.

Fig. 2 shows the complete nucleotide sequence of the celC structural gene along with its flanking regions. There is only one long ORF in this strand and its complement which begins at nt position -67. The first start codons in this ORF are the AUG codon at nt position +1 and the GUG at nt position +3. These codons are preceded by a strong potential ribosome-binding site allowing perfect base pairing between the mRNA and the 3'-end of most bacterial 16S rRNA (Van Knippenberg et al., 1984). For this interaction a free energy of -18.8 kcal/mol (-78.7 kJ/mol) can be calculated (Tinoco et al., 1973). This value is in accord with those found for other Gram-positive genes including the celA and celB genes of C. thermocellum. The ORF ends with the stop codon UGA at nt position +1030. Another in-frame stop codon (UAA) follows at nt position +1063.

The 5'-noncoding region of the celC gene is enriched in AT residues. The A + T content is 74% within the 200 nt immediately preceding the translated sequence, but 62% within the coding region. An exceptionally high A + T content of upstream regulatory regions appears to be characteristic for genes of bacilli and related bacteria (Moran et al., 1982). No typical promoter sequences could be identified in this region by sequence inspection. Biochemical analysis of celC mRNA would therefore be required to localize the transcription start point(s).

Fig. 1. Subcloning and sequencing strategy for the celC gene. A 4.0-kb EcoRI fragment of phage λLIC7 DNA (Schwarz et al., 1985) was subcloned into plasmid pBR322 yielding pWS1251. A 1.6-kb PvuII fragment expressing endoglucanase C activity was blunt-end ligated into the SmaI site of pSU1 to yield pWS1257. Induction of pL-promoted transcription indicated that mRNA synthesis from celC proceeds from right to left. The arrows represent the direction and extent of sequence determinations, with the sites of 3'-end labelling indicated by vertical lines. More than 300 nt were read from one experiment by using a set of runs in 6% and 18% polyacrylamide gels. B, BamHI; E, EcoRI; E5, EcoRV; H, HindIII; K, KpnI; P, PstI; Pv1, PvuI; Pv2, PvuII; S, Sau3A; Sm, SmaI; X, XbaI. Symbols B/S and Pv2/Sm indicate hybrid sites generated by inserting a Sau3A fragment into a BamHI cloning site (B/S) and a PvuII fragment into a SmaI cloning site (Pv2/Sm), respectively.
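The base-pairing argument and the unit conversion in section (a) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name is hypothetical, and the 16S rRNA 3' tail used here is the well-known E. coli sequence, taken as an assumption standing in for "most bacterial 16S rRNA"; the free energy itself is the value quoted in the text, not recomputed from nearest-neighbour rules.

```python
# Sketch: does the AGGAGG site pair perfectly with the 16S rRNA 3' end,
# and does -18.8 kcal/mol correspond to -78.7 kJ/mol?

KCAL_TO_KJ = 4.184  # thermochemical calorie


def reverse_complement_rna(seq):
    """Return the Watson-Crick reverse complement of an RNA string."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


SD_SITE = "AGGAGG"  # ribosome-binding sequence preceding the start codon
# Assumption: 3'-terminal nucleotides of E. coli 16S rRNA, written 5'->3'.
RRNA_16S_3PRIME_TAIL = "GAUCACCUCCUUA"

anti_sd = reverse_complement_rna(SD_SITE)
pairs_perfectly = anti_sd in RRNA_16S_3PRIME_TAIL

delta_g_kcal = -18.8  # value quoted in the text
delta_g_kj = round(delta_g_kcal * KCAL_TO_KJ, 1)

print(anti_sd, pairs_perfectly, delta_g_kj)
```

A perfect match here only means that all six nucleotides of the site find a complementary partner in the assumed rRNA tail; it says nothing about spacing to the start codon.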


-191 -181 -161 -151 -171 CAATAAAAAC TGAACACAGA AGAAGAAAAC GTGATATAAT TAAATTAGAA CGAACGCGCG -141TACATTi$AATAACCCAG -121TGTTAAATGG -111TTTCAG~~~~ -91 -81 -71 -61 -51 -41 -31 -21 -11 -1 CGATTCCAAA TGTTTATATC CAATTTACAT TTAAAAACAT ACAAAACATC AAAAGTATTT AATACCAATA TTTAAAACAC AATATTTCAG GAGGAAAAAA 15 3C 45 6C 75 90 ATGGTGAGTTTTAAAGCAGGTATAAATTTACCCCGATGGATATCACAATATCAAGTTTTCAGCAAAGAGCATTTCGATACATTCATTACG METValSeKPheLvsAlaGlvIleAsnLeuGlrr GlvTm IleSerGinTyrGinValPheSerLysGluHisPhe Asp Thr Phe IIeThr 1c5 12c 135 150 165 180 GAGAAGGACATTCIAACTATTGCAGAAGCAGGGTTTGACCATGTCAGACTGCCTTTTCATTATCCAATTATCGAGTCTCATGAC AATGTG GluLysAspIleGluThrIleAlaGIuAlaGlyPheAspHisVaIArgLeuProPheAspTyrProIleIleGluSerAspAspAsnVal

195 21c 225 240 270 255 CGACIATATAAAGAACATGGGCTTTCTTATATTGACCCCTGCCTTGAGTGGTGTAAAAA11 TACAATTTGGGGCTTGTGTTGCATATGCAT GlyGluTyrLysGIuAspGlyLeuSerTyrIleAspArgCysLeuGluTrpCysLysLysTyrAm LeuGlyLeuValLeuAspNetEis 300 315 360 285 33C 345 GCTCCCGGGTACCCCTTTCAAGATTTTAAGACAAGCACCTTGTTTGAAGATCCCAACCAGCAAAAGAGATTTGTTGAC ATATGGAGA HisAlaProGlyTyrArgPheGinAspPheLysThrSerThrLeuPheGluAspProAm GinGinLysArgPheValAspIleTrpArg

CAC

450 39c 420 435 375 405 TTTTTACCCAAGCGTTACATAAATGAACGGGAACATATTGCCTTTGAACTGTTAAATGAAGTTGTTGAGCCTGACACTACCCCCTGGAAC PheLeuAlaLysArgTyrIleAsnGluArgGluHisIleAlaPheGluLeuLeuAm GluValValGluProAspSerThrArgTrpAsn 540 480 51c 525 465 495 TTGATGCTTGAGTATATAAAAGCAATCAGGGAAATTCATTCCACCATGTGGCTTTACATTCCCCCCAATAACTATAACAGTCCTCAT LysLeuMetLeuGluTyrIleLysAlaIleArgGluIleAspSerThrKetTrpLeuTyrIleGlyGlyAsnAm TyrAsnSerProAsp

AAG

630 57c 600 615 555 585 GAGCTTAAAAACCTTGCACATATTCATCATCATTACATAGTTTACAATTTCCATTTTTACAATCCTTTTTTCTTTACGCATCAGAAAGCC GluteuLysAm LeuAlaAspIleAspAspAspTyrIleValTyrAsnPheEisPheTyrAsnProPhePhePheThrEisGinLysAla 720 690 705 660 645 675 TGGTCGGAAACTCCCATGCCCTACAACAGGACTGTAAAATATCCGCGACAATATGAGCGAATTGAA GAG TTTGTGAl AATAATCCT HisTrpSerGluSerAlaMetAIaTyrAm ArgThrValLysTyrProGlyGinTyrGluGlyIleGluGluPheValLysAsnAm Pro CAC

810 780 795 75c 735 765 AAGTACACTTTTATGATGGAATTGAATAACCTGAAGCTGAATAAAGAGCTTTTGCGCAI GATTTAAAA CCAGCAATTGAGTTCAGGGAA LysTyrSerPheMetNetGluLeuAsnAsnLeuLysLeuAsnLysGluLeuLeuArgLysAspLeuLysProAlaIleGluPheArgGlu 900 885 87C 840 825 855 AAGAAAAAATGCAAACTATATTGCCCCGAGTTTGGCGTAATTCCCATTGCTGACTTGGAGTCTAGGATAAAATGGCATGAACATTATATA LysLysLysCysLysLeuTyrCysGlyGluPheGlyValIleAlaIleAlaAspLeuGluSerArgIleLysTrpHisGluAspTyrIle 990 960 975 930 915 945 AGTCTTCT11 GAGGAGTATCATATCCCCGGCCCCGTGTGGAAC TACAAAAAAATGCATTTTGAAATTTATAATGAGCATAGAAA11 CCTGTC SerLeuLeuGluGluTyrAspIleGlyGlyAIaValTrpAsnTyrLysLysNetAspPheGIuIleTyrAm GluAspArgLysProVal 1CSC 1060 1070 1050 1005 1040 1020 TCGCIAGAATTGGTAAATATACTGCCCAGAAGAAAA ACTTGATTATTAAA ACTACATTTT TGCAAAAGTT TGTAATTTAA AAAATACAAC SerGinGluLeuValAsnIleLeuAlaArgArgLysThr*** Fig. 2. Nucleotide Shine-Dalgarno determined

sequence

by automated

of the coding

of the C. thermocellum

(SD) ribosome-binding sequence.

gas-phase The indicated

ceZC gene and deduced

amino acid sequence

site before the start codon is underlined sequencing. nucleotides

The numbers are aligned

of endoglucanase

C. The presumptive

with a double line. The underlined

refer to the nucleotide

position,

with numbering

with the last digit of each number.

amino acids were

starting

at the first nt
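A + T percentages such as those quoted for the upstream and coding regions can be obtained by simple counting. The sketch below uses a hypothetical helper, `at_fraction`, and applies it only to the 20 nt immediately preceding the start codon as printed in Fig. 2 (positions -20 to -1); it is an illustration of the calculation, not a re-analysis of the full 200-nt region.

```python
# Sketch: fraction of A and T (or U) residues in a nucleotide string.

def at_fraction(seq):
    """Return the fraction of A/T/U residues in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    at_count = sum(1 for base in seq if base in "ATU")
    return at_count / len(seq)


# nt -20 to -1 of Fig. 2, including the AGGAGG ribosome-binding site
UPSTREAM_20 = "AATATTTCAGGAGGAAAAAA"

print(f"A+T content: {at_fraction(UPSTREAM_20):.0%}")
```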


(b) Protein structure and codon usage

The start codon AUG and the reading frame were verified by determining the N-terminal amino acid sequence, which is in full agreement with the amino acid sequence deduced from the nucleotide sequence. While the N-formyl methionine residue of endoglucanase C is deformylated in E. coli, removal of N-terminal amino acids did not take place. The large ORF encodes a protein of 343 aa residues with a predicted Mr of 40439. The size of the protein agrees well with the apparent Mr of approx. 38000 determined for the purified enzyme produced in E. coli by SDS-PAGE (Petre et al., 1986; Schwarz et al., 1987b). Furthermore, the amino acid composition determined experimentally is in good agreement with the composition deduced from the nucleotide sequence (Table I). These data taken together indicate that the complete gene has been cloned and that the enzyme is expressed in E. coli without further proteolytic processing.
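The figures given for the coding region are mutually consistent, which a short calculation makes explicit. The sketch below uses only numbers stated in the text (1032 bp, 343 residues, Mr 40439, stop codon at +1030); the variable names are illustrative.

```python
# Sketch: consistency check of coding-region length, residue count,
# stop-codon position and mean residue mass.

CODING_REGION_BP = 1032  # from the SUMMARY
PREDICTED_MR = 40439     # predicted Mr of the 343-aa protein

codons = CODING_REGION_BP // 3        # includes the stop codon
residues = codons - 1                 # amino acids encoded
stop_codon_start = 1 + residues * 3   # first nt of the stop codon (+1 = A of AUG)
mean_residue_mass = PREDICTED_MR / residues

print(codons, residues, stop_codon_start, round(mean_residue_mass, 1))
```

The resulting mean residue mass of roughly 118 is in the range usually seen for proteins, so the predicted Mr is plausible for a 343-residue polypeptide.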

TABLE I
Amino acid composition of endoglucanase C

Amino acid   Determined by amino acid      Deduced from the
             analysis of the protein a     nucleotide sequence b

Ala          15.33                         16
Arg          16.44                         16
Asx c        47.46                         47
Cys           4.84                          4
Glx c        41.49                         41
Gly          16.38                         16
His           7.84                          9
Ile          22.56                         26
Leu          28.09                         28
Lys          29.17                         29
Met           7.13                          8
Phe          21.47                         22
Pro          11.06                         11
Ser          15.51                         15
Thr           9.94                         10
Trp           5.74                          8
Tyr          20.80                         22
Val          14.08                         15

Total       335.93                        343

a Analysis was carried out as described in MATERIALS AND METHODS, section d.
b Sequence deduced from the nucleotide sequence data shown in Fig. 2.
c Asx = Asn + Asp; Glx = Gln + Glu (see Table II).
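The agreement between the two columns of Table I can be examined residue by residue. The following sketch transcribes the table values into a dictionary (names are illustrative) and reports the deduced total and the residue with the largest absolute deviation.

```python
# Sketch: compare determined and deduced amino acid composition (Table I).
# Each entry maps residue -> (determined by analysis, deduced from sequence).

COMPOSITION = {
    "Ala": (15.33, 16), "Arg": (16.44, 16), "Asx": (47.46, 47),
    "Cys": (4.84, 4),   "Glx": (41.49, 41), "Gly": (16.38, 16),
    "His": (7.84, 9),   "Ile": (22.56, 26), "Leu": (28.09, 28),
    "Lys": (29.17, 29), "Met": (7.13, 8),   "Phe": (21.47, 22),
    "Pro": (11.06, 11), "Ser": (15.51, 15), "Thr": (9.94, 10),
    "Trp": (5.74, 8),   "Tyr": (20.80, 22), "Val": (14.08, 15),
}

deduced_total = sum(deduced for _, deduced in COMPOSITION.values())

# Absolute difference between analysis and sequence for every residue
deviations = {aa: abs(found - deduced)
              for aa, (found, deduced) in COMPOSITION.items()}
worst = max(deviations, key=deviations.get)

print(deduced_total, worst, round(deviations[worst], 2))
```

The largest deviations fall on isoleucine and tryptophan, residues that are commonly under-recovered in acid hydrolysates, so they do not contradict the overall agreement.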

A summary of the codons used in the translation of celC mRNA is presented in Table II. The codon usage is similar to that reported previously for other C. thermocellum cellulase genes (Grepinet and Beguin, 1986; Joliff et al., 1986b). There appears to be a bias for selection of codons ending in A or U. This preference reflects the comparatively low G + C content (38%) of C. thermocellum DNA. Clostridial codon usage is therefore not optimal for the E. coli translation machinery and might limit the efficiency of gene expression in the heterologous host (Garnier and Cole, 1986).

(c) Signal sequence and protein localization

Evidence has been provided that endoglucanase C is secreted into the culture medium of C. thermocellum grown on cellulose (Petre et al., 1986). A common feature of bacterial protein export is the requirement for an N-terminal leader (signal) sequence, which is removed upon translocation of the pre-protein across the plasma membrane. Leader peptides are 15-40 aa residues in length with a positively charged N-terminal region, an apolar hydrophobic core and characteristic sequences adjacent to the cleavage site (Kreil, 1981; Oliver, 1985). Some features of the N-terminal amino acid sequence of endoglucanase C, such as the presence of a positively charged lysine near the N terminus followed by a somewhat hydrophobic domain, would be in accord with the presence of a leader peptide. The N end of mature endoglucanase C is not known; application of the algorithm developed by von Heijne (1983; 1986) reveals the presence of a potential cleavage site for leader peptidase after serine-21.

However, this putative signal peptide of endoglucanase C is quite atypical in several respects. The leader peptides of celA, celB, and celD (Fig. 3) as well as those of several other pre-proteins of Gram-positive bacteria (Pugsley and Schwartz, 1985; MacKay et al., 1986; O'Neill et al., 1986) are longer and more basic at the N terminus than the one for celC. Moreover, the core region of the celC leader contains several neutral-polar amino acids (asparagine, glutamine) and lacks a cluster of strongly hydrophobic amino acid residues. Lastly, almost all bacterial signal peptides terminate with alanine and prokaryotic leader peptidase only rarely cleaves after

TABLE II
Codon utilization in the celC gene of Clostridium thermocellum

Codon   Amino  No. b    Codon   Amino  No. b    Codon   Amino  No. b    Codon   Amino  No. b
        acid                    acid                    acid                    acid

UUU     Phe    16       UCU     Ser     3       UAU     Tyr    12       UGU     Cys     1
UUC a   Phe     6       UCC     Ser     1       UAC a   Tyr    10       UGC     Cys     3
UUA     Leu     4       UCA     Ser     1       UAA     End c   0       UGA     End c   1
UUG     Leu     8       UCG     Ser     2       UAG     End c   0       UGG     Trp     8

CUU     Leu     9       CCU     Pro     6       CAU     His     7       CGU a   Arg     1
CUC     Leu     0       CCC     Pro     1       CAC     His     2       CGC a   Arg     3
CUA     Leu     2       CCA a   Pro     2       CAA     Gln     6       CGA     Arg     0
CUG a   Leu     5       CCG a   Pro     2       CAG a   Gln     2       CGG     Arg     2

AUU     Ile    14       ACU a   Thr     3       AAU     Asn    14       AGU     Ser     6
AUC a   Ile     3       ACC a   Thr     3       AAC a   Asn     8       AGC     Ser     2
AUA     Ile     9       ACA     Thr     2       AAA a   Lys    21       AGA     Arg     6
AUG     Met     8       ACG     Thr     2       AAG     Lys     8       AGG     Arg     4

GUU a   Val     5       GCU a   Ala     2       GAU     Asp    18       GGU a   Gly     1
GUC     Val     2       GCC     Ala     5       GAC     Asp     7       GGC a   Gly     5
GUA a   Val     3       GCA a   Ala     6       GAA a   Glu    17       GGA     Gly     4
GUG     Val     5       GCG a   Ala     3       GAG     Glu    16       GGG     Gly     6

a Major E. coli tRNA species (Ikemura, 1981).
b Number of codons in the entire celC coding region (see Fig. 2).
c Stop codons.
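The bias towards codons ending in A or U, and the A + T content of the coding region, both follow directly from the codon counts. The sketch below takes the counts as read from Table II (the dictionary and variable names are illustrative) and derives the third-position A/U fraction and the overall A + U fraction of the 1032-nt coding region.

```python
# Sketch: third-position bias and overall A+U content from codon counts.
# Counts are transcribed from Table II; the single UGA is the stop codon.

CODON_COUNTS = {
    "UUU": 16, "UUC": 6,  "UUA": 4,  "UUG": 8,
    "UCU": 3,  "UCC": 1,  "UCA": 1,  "UCG": 2,
    "UAU": 12, "UAC": 10, "UAA": 0,  "UAG": 0,
    "UGU": 1,  "UGC": 3,  "UGA": 1,  "UGG": 8,
    "CUU": 9,  "CUC": 0,  "CUA": 2,  "CUG": 5,
    "CCU": 6,  "CCC": 1,  "CCA": 2,  "CCG": 2,
    "CAU": 7,  "CAC": 2,  "CAA": 6,  "CAG": 2,
    "CGU": 1,  "CGC": 3,  "CGA": 0,  "CGG": 2,
    "AUU": 14, "AUC": 3,  "AUA": 9,  "AUG": 8,
    "ACU": 3,  "ACC": 3,  "ACA": 2,  "ACG": 2,
    "AAU": 14, "AAC": 8,  "AAA": 21, "AAG": 8,
    "AGU": 6,  "AGC": 2,  "AGA": 6,  "AGG": 4,
    "GUU": 5,  "GUC": 2,  "GUA": 3,  "GUG": 5,
    "GCU": 2,  "GCC": 5,  "GCA": 6,  "GCG": 3,
    "GAU": 18, "GAC": 7,  "GAA": 17, "GAG": 16,
    "GGU": 1,  "GGC": 5,  "GGA": 4,  "GGG": 6,
}

total_codons = sum(CODON_COUNTS.values())

# Codons whose third (wobble) position is A or U
au_third = sum(n for codon, n in CODON_COUNTS.items() if codon[2] in "AU")

# A+U nucleotides summed over all three codon positions
au_all = sum(n * sum(base in "AU" for base in codon)
             for codon, n in CODON_COUNTS.items())

third_pos_fraction = au_third / total_codons
coding_au_fraction = au_all / (3 * total_codons)

print(total_codons, au_third, round(third_pos_fraction, 3),
      round(coding_au_fraction, 3))
```

With these counts the overall A + U fraction comes out at about 62%, matching the A + T content stated for the coding region in section (a).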

serine residues. These unusual features may be the reason why the celC leader sequence does not function in E. coli and the protein thus remains in the cytoplasm (Schwarz et al., 1987b).
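The atypical features of the putative celC leader listed in section (c) can be tallied directly from its sequence. The sketch below uses the first 21 residues deduced in Fig. 2; the set of "strongly hydrophobic" residues is an assumption made for illustration, not a definition taken from the text or from von Heijne's method.

```python
# Sketch: simple feature counts for the putative 21-aa celC leader.

CELC_LEADER = "MVSFKAGINLGGWISQYQVFS"  # residues 1-21 (Fig. 2)

BASIC = set("KR")
NEUTRAL_POLAR = set("NQ")             # asparagine, glutamine
STRONGLY_HYDROPHOBIC = set("AVLIFM")  # assumption, for illustration only


def longest_run(seq, residue_set):
    """Length of the longest stretch of consecutive residues in residue_set."""
    best = current = 0
    for aa in seq:
        current = current + 1 if aa in residue_set else 0
        best = max(best, current)
    return best


basic_count = sum(aa in BASIC for aa in CELC_LEADER)
polar_count = sum(aa in NEUTRAL_POLAR for aa in CELC_LEADER)
hydrophobic_run = longest_run(CELC_LEADER, STRONGLY_HYDROPHOBIC)
last_residue = CELC_LEADER[-1]

print(len(CELC_LEADER), basic_count, polar_count, hydrophobic_run, last_residue)
```

Under this residue classification the leader carries a single basic residue, three Asn/Gln residues, no hydrophobic stretch longer than two residues, and ends in serine rather than alanine, in line with the description in the text.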

Fig. 3. Comparison of signal peptides of C. thermocellum endoglucanases. The signal peptidase cleavage site of endoglucanase A (CelA) has been determined by N-terminal sequencing of the extracellular protein. Cleavage sites (shown as gaps) for endoglucanases B, C, and D (CelB, C, D) are those predicted by cleavage-site recognition rules. The symbols (+) and (-) denote basic and acidic amino acid residues, respectively.

(d) Sequence comparison to functionally related enzymes

Alignment of the deduced endoglucanase C sequence with the amino acid sequences of endoglucanases A, B, and D yielded no regions of extensive homology. Furthermore, comparison with the sequences of various cellulolytic enzymes (Knowles et al., 1987) revealed no significant similarities. In particular, endoglucanase C is lacking the C-terminal direct repeat of 24 aa shared by the other three C. thermocellum endoglucanases. This conserved sequence has been implicated in the binding of these enzymes to two adjacent glucose residues of the cellulose substrates (Beguin et al., 1985). It should be pointed out that endoglucanase C has an unusual substrate range and displays features common to cellobiohydrolases by being able to cleave the aglyconic bond of aryl-β-glucosides.

Comparison of endoglucanase C with endoglucanase A revealed a short region of similarity (Fig. 4). Interestingly, this homologous sequence includes the motif Glu-Xaa,-Asn-Xaa5/7-Asp (where Xaa is any amino acid and subscript 5/7 indicates a number of 5 or 7 aa residues, respectively). A similar arrangement of amino acids is present in the catalytic site of lysozymes (Canfield, 1963). Limited sequence homology to the active site of lysozymes has been found in several fungal endoglucanases and cellobiohydrolases (Yaguchi et al., 1983; Paice et al., 1984; Teeri et al., 1987). These findings support the notion that cellulases might act like lysozymes by an acid catalysis mechanism (Clarke and Yaguchi, 1985).

Fig. 4. Alignment of homologous regions of endoglucanase C (CelC) and endoglucanase A (CelA). Amino acids common to both sequences are boxed. The hen egg-white lysozyme (HEWL) sequence shows the region of the established active site residues Glu-35 and Asp-52.

ACKNOWLEDGEMENTS

We are grateful to Dr. P. Beguin for communicating his unpublished sequence data. We thank Drs. W. Lubitz and E. Remaut for providing us with bacterial strains and plasmids. This work was supported by grants from the Deutsche Forschungsgemeinschaft (SFB 145), from the Fonds der Chemischen Industrie, and from the Dr. Otto Röhm Gedächtnisstiftung GmbH.

REFERENCES

Beguin, P., Cornet, P. and Aubert, J.P.: Sequence of a cellulase gene of the thermophilic bacterium Clostridium thermocellum. J. Bacteriol. 162 (1985) 102-105.

Beguin, P., Cornet, P. and Millet, J.: Identification of the endoglucanase encoded by the celB gene of Clostridium thermocellum. Biochimie 65 (1983) 495-500.

Canfield, R.E.: The amino-acid sequence of egg-white lysozyme. J. Biol. Chem. 238 (1963) 2698-2707.

Clarke, A.J. and Yaguchi, M.: The role of carboxyl groups in the function of endo-β-1,4-glucanase from Schizophyllum commune. Eur. J. Biochem. 149 (1985) 233-238.

Garnier, T. and Cole, S.T.: Characterization of a bacteriocinogenic plasmid from Clostridium perfringens and molecular genetic analysis of the bacteriocin-encoding gene. J. Bacteriol. 168 (1986) 1189-1196.

Grepinet, O. and Beguin, P.: Sequence of the cellulase gene of Clostridium thermocellum coding for endoglucanase B. Nucl. Acids Res. 14 (1986) 1791-1799.

Ikemura, T.: Correlation between the abundance of E. coli transfer RNA and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 151 (1981) 389-409.

Joliff, G., Beguin, P., Juy, M., Millet, J., Ryter, A., Poljak, R. and Aubert, J.P.: Isolation, crystallization and properties of a new cellulase of Clostridium thermocellum overproduced in Escherichia coli. Bio/Technology 4 (1986a) 896-900.

Joliff, G., Beguin, P. and Aubert, J.P.: Nucleotide sequence of the cellulase gene celD encoding endoglucanase D of Clostridium thermocellum. Nucl. Acids Res. 14 (1986b) 8605-8613.

Knowles, J., Lehtovaara, P. and Teeri, T.: Cellulase families and their genes. Trends Biotechnol. 5 (1987) 255-261.

Kreil, G.: Transfer of proteins across membranes. Annu. Rev. Biochem. 50 (1981) 317-348.

MacKay, R.M., Lo, A., Willick, G., Zuker, M., Baird, S., Dove, M., Moranelli, F. and Seligy, V.: Structure of a Bacillus subtilis endo-β-1,4-glucanase gene. Nucl. Acids Res. 14 (1986) 9159-9170.

Maxam, A.M. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65 (1980) 499-560.

Moran, C.P., Lang, N., LeGrice, S.F.J., Lee, G., Stephens, M., Sonenshein, A.L., Pero, J. and Losick, R.: Nucleotide sequences that signal the initiation of transcription and translation in Bacillus subtilis. Mol. Gen. Genet. 186 (1982) 339-346.

O'Neill, G.P., Warren, R.A.J., Kilburn, D.G. and Miller, R.C.: Secretion of a Cellulomonas fimi exoglucanase by Escherichia coli. Gene 44 (1986) 331-336.

Paice, M.G., Desrochers, M., Rho, D., Jurasek, L., Rollin, C.F., De Miguel, E. and Yaguchi, M.: Two forms of endoglucanase from the basidiomycete Schizophyllum commune and their relationship to other β-1,4-glycoside hydrolases. Bio/Technology 2 (1984) 535-539.

Petre, D., Millet, J., Longin, R., Beguin, P., Girard, H. and Aubert, J.P.: Purification and properties of the endoglucanase C of Clostridium thermocellum produced in Escherichia coli. Biochimie 68 (1986) 687-695.

Pugsley, A.P. and Schwartz, M.: Export and secretion of proteins by bacteria. FEMS Microbiol. Rev. 32 (1985) 3-38.

Remaut, E., Stanssens, P. and Fiers, W.: Plasmid vectors for high-efficiency expression controlled by the pL promoter of coliphage lambda. Gene 15 (1981) 81-93.

Romaniec, M.P.M., Clarke, N.G. and Hazlewood, G.P.: Molecular cloning of Clostridium thermocellum DNA and the expression of further novel endo-β-1,4-glucanase genes in Escherichia coli. J. Gen. Microbiol. 133 (1987) 1297-1307.

Schüller, A., Harkness, R.E., Rüther, U. and Lubitz, W.: Deletion of C-terminal amino acid codons of phiX174 gene E: effect on its lysis inducing properties. Nucl. Acids Res. 13 (1985) 4143-4152.

Schwarz, W., Bronnenmeier, K. and Staudenbauer, W.L.: Molecular cloning of Clostridium thermocellum genes involved in β-glucan degradation in bacteriophage lambda. Biotechnol. Lett. 7 (1985) 859-864.

Schwarz, W.H., Gräbnitz, F. and Staudenbauer, W.L.: Properties of Clostridium thermocellum endoglucanase produced in Escherichia coli. Appl. Environ. Microbiol. 51 (1986) 1293-1299.

Schwarz, W.H., Bronnenmeier, K., Gräbnitz, F. and Staudenbauer, W.L.: Activity staining of cellulases in polyacrylamide gels containing mixed-linkage β-glucans. Anal. Biochem. 164 (1987a) 72-77.

Schwarz, W.H., Schimming, S. and Staudenbauer, W.L.: High-level expression of Clostridium thermocellum cellulase genes in Escherichia coli. Appl. Microbiol. Biotechnol. 27 (1987b) 50-56.

Teeri, T.T., Lehtovaara, P., Kauppinen, S., Salovuori, I. and Knowles, J.: Homologous domains in Trichoderma reesei cellulolytic enzymes: gene sequence and expression of cellobiohydrolase II. Gene 51 (1987) 43-52.

Tinoco, I., Borer, P.N., Dengler, B., Levine, M.D., Uhlenbeck, O.C., Crothers, D.M. and Gralla, J.: Improved estimation of secondary structure in ribonucleic acids. Nature New Biol. 246 (1973) 40-41.

Van Knippenberg, P.H., Van Kimmenade, J.M.A. and Heus, H.A.: Phylogeny of the conserved 3' terminal structure of the RNA of small ribosomal subunits. Nucl. Acids Res. 12 (1984) 2595-2603.

von Heijne, G.: Patterns of amino acids near signal sequence cleavage sites. Eur. J. Biochem. 133 (1983) 17-21.

von Heijne, G.: A new method for predicting signal sequence cleavage sites. Nucl. Acids Res. 14 (1986) 4683-4690.

Yaguchi, M., Roy, C., Rollin, C.F., Paice, M.G. and Jurasek, L.: A fungal cellulase shows sequence homology with the active site of hen egg-white lysozyme. Biochem. Biophys. Res. Commun. 116 (1983) 408-411.
Communicated by J. Knowles.