EGIII, a new endoglucanase from Trichoderma reesei: the characterization of both gene and enzyme


Gene, 63 (1988) 11-21 Elsevier GEN 02293


(Recombinant DNA; cellulase; Trichoderma; gene sequence; module shuffling; active site; phage λ vector)

M. Saloheimo a, P. Lehtovaara a, M. Penttilä a, T.T. Teeri a, J. Ståhlberg b, G. Johansson b, G. Pettersson b, M. Claeyssens c, P. Tomme c and J.K.C. Knowles a

a Biotechnical Laboratory, VTT, SF-02150 Espoo (Finland) Tel. 358-0-4561; b Institute of Biochemistry, University of Uppsala, Biomedical Center, Uppsala (Sweden) Tel. 46-18-174000, and c Laboratory for Biochemistry, State University Ghent, Ghent (Belgium) Tel. 32-91-221821

Received 6 October 1987
Accepted 30 October 1987
Received by publisher 20 November 1987

SUMMARY

A novel endoglucanase from Trichoderma reesei, EGIII, has been purified and its catalytic properties have been studied. The gene for that enzyme (egl3) and cDNA have been cloned and sequenced. The deduced EGIII protein shows clear sequence homology to a Schizophyllum commune enzyme (M. Yaguchi, personal communication), but is very different from the three other T. reesei cellulases with known structure. Nevertheless, all four T. reesei cellulases share two common, adjacent sequence domains, which apparently can be removed by proteolysis. These homologous sequences reside at the N termini of EGIII and the cellobiohydrolase CBHII, but at the C termini of EGI and CBHI. Comparison of the fungal cellulase structures has led to re-evaluation of hypotheses concerning the localization of the active sites.

INTRODUCTION

The brown-rot fungus T. reesei is an efficient and well studied cellulose-degrading microorganism (Enari, 1983; Montenecourt, 1983). For the degradation of crystalline cellulose to glucose in vivo, three types of cellulolytic enzyme are needed: endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) and β-glucosidases (EC 3.2.1.21) (Enari, 1983). The degradation of cellulose by T. reesei cellulases is most efficient when at least two cellulase types act synergistically on cellulose (Henrissat et al., 1985; Fagerstam and Pettersson, 1980). The exact number of cellulolytic enzymes produced by T. reesei and their role in cellulose degradation are still unclear. The genes coding for two distinct cellobiohydrolases (CBHI and CBHII) have been cloned and sequenced (Shoemaker et al., 1983; Teeri et al., 1987a; Chen et al., 1987). The number of different endoglucanases is less clear. Depending on the growth conditions and isolation methods used, different groups have been able to identify a range of different endoglucanase proteins from T. reesei culture filtrates (Shoemaker and Brown, 1978a; Bhikhabhai et al., 1984; Odegaard et al., 1984; Sheir-Neiss and Montenecourt, 1984). Only the gene for EGI has been sequenced (Penttila et al., 1986; Van Arsdell et al., 1987) and sections of the protein have also been sequenced (Bhikhabhai and Pettersson, 1984).

In this paper we report on the isolation and primary structure of the gene egl3 coding for a novel T. reesei endoglucanase, EGIII. The EGIII enzyme has also been purified, and its amino acid composition and N-terminal sequence support the data obtained from the gene sequence. The enzymic properties of EGIII are unique, in part resembling those of the other endoglucanase EGI, and also showing some similarity to CBHII acting on small substrates. The evolutionary relationships and the domain structures of these enzymes are discussed.

Correspondence to: Dr. M. Saloheimo, VTT, Biotechnical Laboratory, Tietotie 2, SF 02150 Espoo (Finland) Tel. 358-0-4561.

Abbreviations: aa, amino acid(s); Ac, acetate; bp, base pair(s); BSA, bovine serum albumin; CBH, cellobiohydrolase; CMC, carboxymethylcellulose; CMCase, carboxymethylcellulase; EG, endoglucanase; HPLC, high-performance liquid chromatography; IEF-PAG, isoelectric focusing on polyacrylamide gel; kb, kilobase or 1000 bp; ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; pI, pH of isoelectric point; SDS, sodium dodecyl sulfate; TFA, trifluoroacetic acid; TON, turnover number.

0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)

MATERIALS AND METHODS

(a) Chemicals

SP-Sephadex, DEAE-Sepharose and Q-Sepharose Fast Flow were purchased from Pharmacia (Sweden). Carboxymethyl cellulose (0.7 DS) was from Hercules (France). The 4-methylumbelliferyl glycosides were synthesized as described (Van Tilbeurgh et al., 1982). Calf liver pyroglutamate aminopeptidase (EC 3.4.11.8) was from Sigma (U.S.A.). All other chemicals were of analytical grade.

(b) Activity measurements

CMCase activity assay: 10 µl enzyme solution was pipetted to 2 ml 0.5% solution of CMC in 50 mM NaAc buffer, pH 5.0 (4°C). The reaction mixture was incubated at 40°C for 8 min. Reducing sugars liberated were determined according to Somogyi (1952) and Nelson (1944). Activity against 4-methylumbelliferyl-β-D-cellotrioside was measured fluorometrically, and for the higher homologues an HPLC procedure was used as described (Van Tilbeurgh et al., 1982).

(c) Enzyme purification

Fraction A1 from DEAE-Sepharose (Bhikhabhai et al., 1984) was concentrated by adsorption to a small SP-Sephadex column in 50 mM NH4Ac, pH 3.7, followed by elution with 20 mM Tris-HCl buffer, pH 6.5. This step reduced the volume from 1.5 liter to 15-20 ml. After a change of buffer to 8 mM Tris-HCl, pH 6.5, on a Sephadex G-25 column, the material was fractionated by anion-exchange chromatography on a Q-Sepharose Fast Flow column, 2.0 × 20 cm. Elution was with a linear ionic-strength gradient, 2 × 200 ml of NaCl, 0-50 mM in 8 mM Tris-HCl, pH 6.5. Flow rate was 40 ml/h.

(d) Analytical methods

IEF-PAG was performed conventionally (LKB, Sweden). Enzyme activity detection was with 2 mM 4-methylumbelliferyl-β-D-cellotrioside (Van Tilbeurgh and Claeyssens, 1985). The molar absorption coefficient at 280 nm (77 000/M/cm) was calculated from the amino acid composition of EGIII. SDS-PAGE was performed as described by Maizel (1969) using 10% w/v polyacrylamide in the separation gel. Amino acid analyses were run on a Durrum D500 analyzer after hydrolysis of the sample for 24 h at 110°C in 6 M HCl containing 2 mg phenol/ml. Serine and threonine values were calculated using the standard recovery factors of 0.90 and 0.96, respectively. Tryptophan was estimated spectrophotometrically.

Prior to N-terminal sequence determination, 40 nmol of lyophilized protein (reduced and carboxymethylated) were dispersed in 1 ml of deblocking buffer (Podell and Abraham, 1978) that had been flushed with nitrogen before use. The protein derivative was suspended in a whirlmixer and ultrasonic bath, and pyroglutamate aminopeptidase (10 units) was added. The mixture was incubated for 24 h at room temperature. After centrifugation at 3000 rev./min, the pellet was washed in standard electrophoresis 'fixing' solution (7% HAc in methanol:H2O (50:50, v:v)), dried in a stream of nitrogen and dissolved in 1 ml of 1% aqueous SDS.

An aliquot (25 µl) of the protein-SDS solution was adsorbed to a TFA-treated and polybrene-coated glass-fiber disc. 37 cycles of sequencing were carried out according to the procedure described by Hewick et al. (1981) using an Applied Biosystems 470A gas-liquid phase sequencer with 'on-line' PTH-amino acid identification in an Applied Biosystems 120A PTH analyzer. Initial coupling yield was estimated to be 10%, while the average repetitive yield was 92%. Unambiguous identification was possible for 33 aa residues out of the first 36 aa.

Carbohydrate analysis was performed using the anthrone-sulphuric acid reagent with mannose as standard. The absorbance was measured at 585 nm, an isosbestic point where the absorbance given by hexose is independent of the tryptophan content of the sample (Hörmann and Gollwitzer, 1962).

(e) Strains and vectors

Escherichia coli strain JM109 (Yanisch-Perron et al., 1985) was used as a host for the cloning vectors pUC8 (Messing, 1983) and pUC18 (Norrander et al., 1983). T. reesei strain VTT-D-80133 (Bailey and Nevalainen, 1981) was used for RNA and DNA isolation.

(f) Isolation of the chromosomal egl3 gene

A chromosomal T. reesei gene bank was made in the phage vector λ1059 and differential hybridization of the bank was done as described (Teeri et al., 1983). The gene product was examined by hybrid mRNA selection, followed by in vitro translation using a rabbit reticulocyte lysate (Amersham) and SDS-PAGE analysis (Laemmli, 1970).

(g) Isolation of mRNA

For the isolation of the cellulase mRNAs, T. reesei was grown in conditions described (Bailey and Nevalainen, 1981) except that 2% of lactose and 2% of soluble extract of distiller's spent grain were added. Frozen mycelium was ground into fine powder under liquid nitrogen and suspended into 5 vols. of guanidinium isothiocyanate. Total RNA was isolated according to Chirgwin et al. (1979) and the poly(A)+ RNA fraction was purified by oligo(dT)-cellulose chromatography (Aviv and Leder, 1972).

(h) Isolation of the cDNA copy of egl3

A T. reesei VTT-D-80133 cDNA bank was made in E. coli using pUC8 as vector (Teeri et al., 1987b). The bank was screened by colony hybridization (Hanahan and Meselson, 1983). A nick-translated restriction fragment was used as hybridization probe.

(i) Sequencing

The two clones containing the chromosomal and cDNA gene copy in pUC plasmids were terminally deleted using exonuclease III and S1 nuclease according to Henikoff (1984). The gene sequence was determined from deleted plasmid subclones essentially according to Zagursky et al. (1986), as well as by sequencing of inserts cloned in λ vector using synthetic 17-mer oligodeoxynucleotide primers. The sequences were analysed with a computer according to Queen and Korn (1984). Methods not described above were carried out using standard techniques [e.g., Maniatis et al. (1982)].

RESULTS AND DISCUSSION

(a) Isolation and sequencing of the egl3 gene

A chromosomal gene bank of T. reesei was made in E. coli using λ1059 as vector. Clones containing genes strongly expressed under cellulase induction were isolated by differential hybridization as described by Teeri et al. (1983). The clone characterized in the present study was first analyzed by hybrid selection and in vitro translation. A protein of 42 kDa was detected in SDS-PAGE. At this stage the cloned gene could only be identified as a gene efficiently expressed in T. reesei during the production of cellulases. To be able to subclone and characterize this gene within the about 18-kb long insert, the recombinant λ clone was mapped with Southern hybridization using cDNA made of induced mRNA as probe. The Southern mapping suggested that a 2.3-kb HpaI fragment would contain the gene or at least most of it. The 2.3-kb HpaI fragment was thus isolated and cloned in pUC18 to obtain the genomic subclone. A full-length cDNA copy of 1550 bp was isolated from a T. reesei cDNA bank made in pUC8 (Teeri et al., 1987b), using a 420-bp HincII fragment from the 5' end of the genomic subclone as probe.

The cDNA clone and the genomic clone, both in pUC plasmids, were deleted with exonuclease III and S1 nuclease to generate a series of deleted subclones for sequencing (Henikoff, 1984). The cDNA clone was deleted from the 5' end and the genomic clone from the 3' end, and the noncoding and coding strands, respectively, were sequenced. The 2.3-kb chromosomal clone was found to lack about 300 bp of the N-terminal protein-coding region. This part of the gene was therefore sequenced from the initial recombinant λ clone using specific oligodeoxynucleotide sequencing primers. The sequence of the whole coding region of the gene was thus determined from both strands of DNA.

The derived gene sequence and the protein sequence deduced from DNA are shown in Fig. 1. An ORF of 418 aa can be found in the cDNA sequence. As shown in section c, below, the amino acid composition of this deduced protein matches that of an endoglucanase component purified from the supernatant of T. reesei. The identity was confirmed by N-terminal sequencing of the protein. The codon usage of the egl3 gene shows that CAT(His), CGT(Arg) and CTA(Leu) codons are not used. As in the other cellulase genes (Penttila et al., 1986; Teeri et al., 1987a; Shoemaker et al., 1983) there is a bias against NTA codons, where N is any nucleotide. Comparison of the cDNA and gene sequences reveals two introns, both exceptionally long for T. reesei cellulase genes and among the longest found in filamentous fungal genes (Penttila et al., 1986; Teeri et al., 1987a). The first, 123-bp long intron is in the 5'-flanking region 33 bp upstream from the start codon ATG (Fig. 1). The second intron of 174 bp is at the position coding for aa 89 in the mature protein.

Fig. 1. Nucleotide sequence of the egl3 structural gene with restriction sites and the deduced amino acid sequence of the EGIII protein. The proposed signal sequence of 21 aa is underlined. Near the N terminus, two blocks of aa (A and B) homologous to all other T. reesei cellulase genes sequenced are shown in Fig. 6. The putative N-glycosylation site is marked with an asterisk. Introns are written with lower-case letters. Their border sequences are underlined. Amino acids that can be aligned with the S. commune EGI sequence are underlined in the sequence of the mature protein.

(b) Isolation of the EGIII protein

Two different T. reesei endoglucanases, EGI (then called Endo II) and EGII (then called Endo III), have previously been purified using DEAE-Sepharose chromatography at pH 5.0 as the first fractionation step (Bhikhabhai et al., 1984). A CMCase-rich fraction (designated A1) was not adsorbed to the anion exchanger under these conditions. However, it was adsorbed to the anion exchanger Q-Sepharose Fast Flow at pH 6.5. Elution of the column using a linear gradient of NaCl yields three distinct peaks of CMCase activity (Fig. 2). These fractions, called A1:1, A1:2 and A1:3, were pooled and rechromatographed under the same conditions to remove overlapping material from surrounding peaks. All three fractions gave symmetrical peaks, indicating fraction homogeneity.

Fig. 2. Anion-exchange chromatography of the A1 material on Q-Sepharose Fast Flow using an ionic-strength gradient. Column size: 2.0 × 20 cm. Linear gradient: 0 to 50 mM NaCl in 8 mM Tris-HCl, pH 6.5. Gradient volume: 2 × 200 ml. Fraction volume: 4 ml. Flow rate: 40 ml/h. Fractions A1:1, A1:2 and A1:3 are pooled.

(c) Molecular and enzymatic properties

The Mr values for A1:1, A1:2 and A1:3 were 48, 48 and 37 kDa, respectively, as estimated by SDS-PAGE (Fig. 3). A1:2 and A1:3 showed minor impurities of lower Mr, whereas A1:1 produced only one band on the gel. Using chromogenic glycosides of the cellodextrins as substrates and analytical IEF, a tentative differentiation of the cellobiohydrolases and endocellulases present in T. reesei culture filtrates was obtained (Van Tilbeurgh and Claeyssens, 1985). One component (EGIII) with apparent pI of 5.5-5.6 liberates the fluorescent phenol from 4-methylumbelliferyl-β-D-cellotrioside and is the only enzyme to do so (Fig. 5). All three fractions (A1:1, A1:2 and A1:3) proved to be active on this substrate (Fig. 4).

Amino acid compositions were determined for the purified proteins and compared to the amino acid composition deduced from the nucleotide sequence. One of the components, A1:1, was found to correspond well with the deduced gene product (Table I). The other two components in A1 revealed similarities in amino acid composition and catalytic properties, suggesting that they may be related (to be published). A protein purified by Shoemaker and Brown (1978a), then called Endo IV, had an amino acid composition closely resembling that of the deduced protein (Table I). The Mr and carbohydrate determinations also gave results similar to those presented here. It is thus likely that they had purified a similar if not the same protein. EGIII is clearly a protein different from Endo III purified by Bhikhabhai et al. (1984) in terms of both amino acid composition and N-terminal amino acid sequence.
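The agreement between the gene-deduced composition and the purified A1:1 protein can be checked residue by residue. The following is a minimal sketch, not the authors' procedure: the counts are transcribed from Table I, Cys and Trp are left out because they were not determined for the purified protein, and the function names are illustrative only.

```python
# Residue counts transcribed from Table I. Cys and Trp are omitted because
# they were not determined (ND) for the purified protein, which is why the
# purified values are normalized to 374 rather than 397 residues.
GENE_DEDUCED = {
    "Asx": 49, "Thr": 44, "Ser": 42, "Glx": 32, "Pro": 19, "Gly": 42,
    "Ala": 28, "Met": 4, "Val": 22, "Ile": 21, "Leu": 23, "Tyr": 14,
    "Phe": 13, "His": 5, "Lys": 6, "Arg": 10,
}
PURIFIED_EGIII = {
    "Asx": 48.8, "Thr": 42.7, "Ser": 43.9, "Glx": 28.1, "Pro": 19.5,
    "Gly": 44.8, "Ala": 29.0, "Met": 3.5, "Val": 22.1, "Ile": 18.8,
    "Leu": 23.0, "Tyr": 13.6, "Phe": 12.5, "His": 5.2, "Lys": 7.8,
    "Arg": 10.5,
}


def composition_differences(expected, measured):
    """Return measured minus expected residue counts for each residue type."""
    return {residue: measured[residue] - expected[residue] for residue in expected}


def largest_deviation(expected, measured):
    """Return the residue type with the largest absolute deviation and its value."""
    diffs = composition_differences(expected, measured)
    residue = max(diffs, key=lambda r: abs(diffs[r]))
    return residue, diffs[residue]


if __name__ == "__main__":
    for residue, diff in composition_differences(GENE_DEDUCED, PURIFIED_EGIII).items():
        print(f"{residue}: {diff:+.1f}")
    print("largest deviation:", largest_deviation(GENE_DEDUCED, PURIFIED_EGIII))
```

With these values the largest deviation is for Gln + Glu (about four residues in 374), which is consistent with the statement that A1:1 corresponds well with the deduced gene product.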

Fig. 3. SDS-PAGE of the pooled A1 fractions. 200 µg of lyophilized protein was boiled for 3 min in 200 µl 83 mM Tris-HCl, pH 8.8, containing 3% SDS and 10 mM dithioerythritol. The samples were then cooled and 10 µl of 0.5 M iodoacetamide was added to alkylate all SH-groups. Reaction time was 30 min. 20 µl of final reaction mixture (approx. 20 µg protein) was added to the gel. The gel (10% acrylamide; 0.1% SDS) was run for 3.5 hours at room temperature (200 V) with a buffer described (Laemmli, 1970). Lane 1: A1:3. Lane 2: Molecular weight markers of 94, 67, 43, 30, 20.1 and 14.4 kDa. Lane 3: A1:1. Lane 4: A1:2. (Fractions of A1 are shown in Fig. 2.)

The N terminus of EGIII was found to be blocked. After removal of the N-terminal pyroglutamyl residue, EGIII was subjected to 36 steps of automated Edman degradation. 33 aa residues could be unambiguously identified and the sequence obtained is identical to that deduced from the gene sequence. The three unidentified residues correspond to serine and threonine residues 3, 26 and 27, which are probably glycosylated, thus giving unidentifiable PTH-derivatives. The translated region starts with a signal sequence of 21 aa. The mature protein then has two Gln residues at the N terminus, but the first is converted in vivo to the blocking pyroglutamate residue. The signal sequence cleavage site would thus be almost identical to that of T. reesei EGI (Penttila et al., 1986).

Almost all Trichoderma cellulases characterized so far are glycoproteins (Bhikhabhai et al., 1984; Beldman et al., 1985). The total sugar content of EGIII was estimated to be an average of 47 mannose equivalents per protein molecule. This corresponds to 15% w/w of the total mass 49.8 kDa, calculated from the formula weight 42.2 kDa of the 397-aa residue peptide chain, plus 7.6 kDa given by the 47 hexose units.

TABLE I
Amino acid composition

Amino acid    egl3 gene a    EGIII b    Endo IV c
Cys           12             ND         12
Asn + Asp     49             48.8       49
Thr           44             42.7       40
Ser           42             43.9       38
Gln + Glu     32             28.1       34
Pro           19             19.5       23
Gly           42             44.8       41
Ala           28             29.0       32
Met           4              3.5        3
Val           22             22.1       21
Ile           21             18.8       19
Leu           23             23.0       21
Tyr           14             13.6       13
Phe           13             12.5       12
His           5              5.2        6
Lys           6              7.8        [illegible]
Trp           11             ND         10
Arg           10             10.5       11
Total         397            373.8      398

a Deduced from nucleotide sequence.
b Normalized to 374 aa residues. Same as A1:1 in Figs. 2, 3 and 4.
c According to Shoemaker and Brown (1978a).
ND, not determined.

Fig. 4. IEF-PAG of the pooled A1 fractions. Analytical isoelectric focusing was performed on a LKB Multiphor apparatus and Ampholine PAG plates (pH 3.5-9.5, gel conc. 5%, ampholines 2.4%; degree of cross-linkage 3%). Samples were first dialysed against distilled water and applied (10-30 µg) onto the gel (5 × 11 × 0.1 cm), supporting plate cooled with running tapwater. Voltage was gradually increased (100-1000 volt) over two hours. Isoelectric point standard proteins (LKB) were run in parallel (not shown). After the focusing, activity was detected fluorimetrically with 4-methylumbelliferyl-β-D-cellotrioside (Van Tilbeurgh and Claeyssens, 1985) by flooding the gel with a solution of the glycoside (0.5 mM in sodium acetate buffer, pH 5.0). Enzyme active fractions became visible as blue fluorescent bands upon transillumination with a 6 × 15 W UV lamp (302 nm) and were photographed conventionally (Polaroid 53). The gel was then stained with Coomassie blue. pI as indicated: lane 1, A1:1; lane 2, A1:2; lane 3, A1:3.

(d) Specificity against small substrates as compared with EGI, CBHI and CBHII

The main products cleaved by EGIII from cellodextrins (containing two to five glucose units) and their chromophoric glycosides are shown in Fig. 5. In contrast with the chromophoric substrates, cleavage sites for the underived cellodextrins (cellotriose, cellopentaose) could not unequivocally be assessed.

Fig. 5. Substrate specificity of EGIII against 4-methylumbelliferyl-derived and underived cellodextrins. The cutting points of cellotriose and cellopentaose could not be unequivocally determined. Symbols: ○, β-glucopyranosyl (1→4); □, β-glucopyranose (reducing end); ●, 4-methylumbelliferyl; upward arrow, main hydrolysis site; dashed upward arrow, secondary hydrolysis site. HPLC analysis (25 × 0.5 cm RSil Polyol 10 µm, Alltech) was performed by isocratic elution (25:75, water-acetonitrile, 1.5 ml/min) and detection at 313 nm (Van Tilbeurgh et al., 1982). 100 µM substrate solution (50 mM sodium acetate buffer pH 5.0, 25°C) was incubated with 2-20 nM enzyme and after 10-20 min 50-µl aliquots were diluted with 100 µl acetonitrile. Twenty µl of this mixture was injected. Peak identification of the chromophoric reaction products was done by comparison of retention times with those of adequate standards.

Km values for the 4-methylumbelliferyl derivatives were in the 10-60 µM range, whereas those for EGI acting on the same substrates, but of course giving different reaction product patterns, are in the mM range (H. Van Tilbeurgh, unpublished). Turnover numbers (TON) for EGIII (10-200/min) are 50-200 times lower than for EGI. Consequently the catalytic efficiency (TON/Km) for both enzymes acting on the same low-Mr substrates is of the same magnitude (1-2 × 10^6/M/min). In contrast to both EGI and CBHI, cellobiosides (and lactosides) are not hydrolysed by EGIII. The specificity of this enzyme therefore shows some similarity to that of CBHII (Van Tilbeurgh et al., 1985). This is interesting in view of the fact that both EGIII and CBHII have the homologous domains at the N termini, in contrast to CBHI and EGI, which have similar sequences at the C termini (see section e, below). No transferase activity was observed for EGIII, in contrast to findings with EGI (H. Van Tilbeurgh, unpublished). The pH optimum as determined with the cellotrioside was at pH 4.0-5.0, and the stability of the enzyme at this pH in diluted (< 1 µM) solution (50°C) was enhanced by the addition of BSA (1 mg/ml).
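The catalytic efficiency argument in section (d) is a ratio of turnover number to Km, and a short calculation shows why a low TON combined with a low Km can give the same order of magnitude as a high TON combined with a high Km. This is a sketch under stated assumptions: the text gives only ranges, so pairing the lowest TON with the lowest Km (and the highest with the highest) is an assumption, and the EGI values are hypothetical numbers chosen only to satisfy "Km in the mM range" and "TON about 100 times higher".

```python
import math


def catalytic_efficiency(ton_per_min, km_molar):
    """Return TON/Km in units of 1/(M*min)."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return ton_per_min / km_molar


# EGIII ranges quoted in the text: TON 10-200/min, Km 10-60 µM.
# Pairing of the range ends is an assumption made for illustration.
egiii_low = catalytic_efficiency(10, 10e-6)
egiii_high = catalytic_efficiency(200, 60e-6)

# Hypothetical EGI example (not measured values): TON 100 times the lower
# EGIII value, Km of 1 mM.
egi_example = catalytic_efficiency(100 * 10, 1e-3)

if __name__ == "__main__":
    print(f"EGIII: {egiii_low:.2e} to {egiii_high:.2e} per M per min")
    print(f"EGI (hypothetical): {egi_example:.2e} per M per min")
```

Under these assumptions both enzymes come out in the 10^6/M/min range, in line with the magnitude stated in the text.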

(e) Terminal modules are conserved in Trichoderma reesei cellulases

Remarkable sequence homology can be detected between the N terminus of EGIII and N- or C-terminal regions of three other T. reesei cellulases (Shoemaker et al., 1983; Penttila et al., 1986; Teeri et al., 1987a). Two homologous regions of about 35 aa, denoted as A and B, are found at the N terminus of EGIII and CBHII and in the C terminus of CBHI and EGI (Fig. 6). It is important to note that outside the relatively short terminal A and B blocks, EGIII shows very little if any homology to the other T. reesei cellulases. The A block of EGIII is about 70% homologous to the A regions in other cellulases, and the homology in the B blocks is 50-60%. Being so much alike in these four proteins, these regions almost certainly have a functional significance. These A-B blocks fold to a domain, which can be removed from both CBHI and CBHII by limited proteolysis. The core enzymes have a significantly decreased affinity to microcrystalline cellulose while activity on soluble substrates is retained (Van Tilbeurgh et al., 1986; Tomme et al., 1987). This type of domain structure is found in all four of the Trichoderma cellulases, and an analogous overall architecture is found also in the cellulases of Cellulomonas fimi and Clostridium thermocellum (Warren et al., 1986; Joliff et al., 1986). We have proposed that this type of structure serves not only to decrease the effective Km of the enzyme with natural substrates but also to release cellulose chains from the cellulose crystal prior to hydrolysis by the catalytic domain (Knowles et al., 1987).

Fig. 6. Basic structures of T. reesei cellulases. SS, signal sequence. (A) and (B), amino acid blocks common to all T. reesei cellulases. Introns are shown by a thick solid line. In CBHI and EGI the homologous boxes are in the C terminus and in CBHII and EGIII they are in the N terminus (Fig. 1). Apart from the boxes, EGIII shows very little homology with the other cellulases (from Teeri et al., 1987a).

It is possible that the proteolytic removal of the terminal homologous regions could serve as an in vivo mechanism to alter the properties of cellulases during the hydrolysis, when complex substrates are gradually changed to shorter and more accessible substrates. Like the other T. reesei cellulases, EGIII might also be processed this way. This idea is supported by the amino acid composition data of Shoemaker and Brown (1978b), who isolated not only an EGIII-like enzyme, but also what they presumed to be the proteolytic cleavage product. This contained less carbohydrate, and its amino acid composition now seems to be well explained by removal of the terminal homologous sequences A and B.

The B region is rich in serines and threonines. In EGIII, 21 aa residues out of 34 are serine or threonine. The terminal homologous regions in CBHI and CBHII are the sites for most of their O-glycosylation (Fagerstam et al., 1984; Tomme et al., 1987). On the basis of homology, we suggest that the B region of EGIII could also be heavily O-glycosylated. Within the B block there are five proline residues. These prolines might help the close packing of the carbohydrate chains, as has been demonstrated for some heavily O-glycosylated proteins (Allen, 1983). The EGIII protein sequence contains only one putative N-glycosylation site, Asn-Phe-Thr, at the aa positions 103-105. It thus seems most likely that EGIII protein is very heavily O-glycosylated.

As in CBHI and CBHII (Teeri et al., 1987a) the cysteine residues of EGIII are concentrated near the ends of the protein molecule. Six cysteines out of twelve are near the N terminus and four at the C-terminal end of EGIII. It is possible that the terminal regions of the molecule fold into domains stabilized by disulfide bridges.
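The two glycosylation arguments above (a single N-glycosylation sequon, and a carbohydrate load of 47 hexose units) can be reproduced with a short calculation. The following is an illustrative sketch, not part of the original analysis: the fragment KNFTG is read from Fig. 1 with Lys assumed to be residue 102 so that Asn falls at position 103, the anhydrohexose residue mass of 162.14 Da is a standard value assumed here, and the function names are hypothetical.

```python
import re

# Mass of one hexose residue in a glycan (C6H10O5); an assumed standard
# value, not a figure taken from the paper.
HEXOSE_RESIDUE_DA = 162.14


def find_n_glycosylation_sequons(peptide, first_position=1):
    """Return positions of Asn residues in Asn-X-Ser/Thr motifs (X is not Pro).

    Positions are numbered so that the first residue of `peptide` has the
    number `first_position`. A lookahead is used so overlapping motifs count.
    """
    return [m.start() + first_position
            for m in re.finditer(r"(?=N[^P][ST])", peptide)]


def glycoprotein_mass_kda(peptide_kda, hexose_units):
    """Return (carbohydrate mass, total mass, carbohydrate weight fraction)."""
    carbohydrate_kda = hexose_units * HEXOSE_RESIDUE_DA / 1000.0
    total_kda = peptide_kda + carbohydrate_kda
    return carbohydrate_kda, total_kda, carbohydrate_kda / total_kda


if __name__ == "__main__":
    # Fragment around the putative site; numbering is an assumption (see above).
    print(find_n_glycosylation_sequons("KNFTG", first_position=102))
    carb, total, fraction = glycoprotein_mass_kda(42.2, 47)
    print(f"{carb:.1f} kDa carbohydrate, {total:.1f} kDa total, {fraction:.0%} w/w")
```

With a 42.2-kDa peptide chain and 47 hexose units this gives about 7.6 kDa of carbohydrate, 49.8 kDa in total and roughly 15% w/w, the figures quoted for EGIII. One N-linked site alone would not normally account for that much sugar, which is the basis for expecting extensive O-glycosylation.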

(f) EGIII shows extensive homology with a Schizophyllum commune endoglucanase

The basidiomycete S. commune has been shown to produce two forms of endoglucanases, EGI and EGII (Paice et al., 1984). EGII is proposed to be processed from EGI by a proteolytic cleavage at the N terminus. The T. reesei EGIII sequence is clearly homologous to the S. commune EGI protein (M. Yaguchi, personal communication). Leaving out the regions of EGIII homologous to other T. reesei cellulases (A and B box), the degree of identity of EGIII and S. commune EGI is 30.4% (Fig. 1). If conservative substitutions are allowed, this homology increases to more than 40%. It is thus most likely that the genes coding for T. reesei EGIII and S. commune EGI have evolved by divergent evolution from a common ancestor. However, the S. commune endoglucanases isolated from the growth medium do not have the amino acid blocks A and B that are common to all T. reesei cellulases sequenced so far (Teeri et al., 1987a). It will be interesting to see if the A and B boxes are coded for by the S. commune egl1 gene and then proteolytically processed later.

In the homologous region between T. reesei EGIII and S. commune EGI, two cysteines are conserved (aa residues 302 and 338 in EGIII). Cys92 also has its counterpart nearby in the S. commune protein. The conservation of the cysteines indicates that they might form disulfide bridges with each other or with other cysteines.

The reaction mechanism of cellulases might resemble that of lysozymes, with carboxyl groups taking part in the catalysis. On the basis of some limited amino acid homology found between the N termini of lysozyme and S. commune EGI, it has been suggested that the residues Glu47, Asn56 and Asp64 of S. commune EGI would participate in the catalysis (Paice et al., 1984; Yaguchi et al., 1983). However, in the T. reesei EGIII sequence these residues are not conserved, and the whole region is clearly nonhomologous. Comparison of the two endoglucanase sequences suggests that the active site residues may be located elsewhere, probably in the best conserved regions of the two proteins. Interestingly, Glu218 and Asp292 of T. reesei EGIII can be aligned with Glu and Asp residues in the S. commune enzyme and they are also located in the best conserved regions of the proteins (Fig. 1). Predictions of the active-site residues are of course speculative and must be tested with other methods such as site-directed mutagenesis and active-center modification studies.

ACKNOWLEDGEMENTS

The authors wish to thank Dr. M. Yaguchi for providing the protein sequence of the S. commune endoglucanase.

REFERENCES

Allen, A.: Mucus - a protective secretion of complexity. Trends Biochem. Sci. 8 (1983) 169-173.

Aviv, H. and Leder, P.: Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid cellulose. Proc. Natl. Acad. Sci. USA 69 (1972) 1408-1412.

Bailey, M.J. and Nevalainen, K.M.H.: Induction, isolation and testing of stable Trichoderma reesei mutants with improved production of solubilizing cellulase. Enzyme Microb. Technol. 3 (1981) 153-157.

Beldman, G., Searle-Van Leeuwen, M.F., Rombouts, F.M. and Voragen, F.G.: The cellulase of Trichoderma viride. Eur. J. Biochem. 146 (1985) 301-308.

Bhikhabhai, R. and Pettersson, L.G.: The cellulolytic enzymes of Trichoderma reesei as a system of homologous proteins. FEBS Lett. 167 (1984) 301-308.

Bhikhabhai, R., Johansson, G. and Pettersson, L.G.: Isolation of cellulolytic enzymes from Trichoderma reesei QM 9414. J. Appl. Biochem. 6 (1984) 336-345.

Chen, C.M., Gritzali, M. and Stafford, D.W.: Nucleotide sequence and deduced primary structure of cellobiohydrolase II of Trichoderma reesei. Bio/Technology 5 (1987) 274-278.

Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and Rutter, W.J.: Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18 (1979) 5294-5299.

Enari, T.-M.: Microbial cellulases. In W.M. Fogarty (Ed.), Microbial Enzymes and Biotechnology. Applied Science Publishers, London, 1983, pp. 183-223.

Fagerstam, L.G. and Pettersson, L.G.: The 1,4-β-glucan cellobiohydrolases of Trichoderma reesei QM9414. A new type of cellulolytic synergism. FEBS Lett. 119 (1980) 97-100.

Fagerstam, L.G., Pettersson, L.G. and Engstrom, J.A.: The primary structure of a 1,4-β-glucan cellobiohydrolase from the fungus Trichoderma reesei QM9414. FEBS Lett. 167 (1984) 309-314.

Hanahan, D. and Meselson, M.: Plasmid screening at high colony density. Methods Enzymol. 100 (1983) 333-342.

Henikoff, S.: Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28 (1984) 351-359.

Henrissat, B., Driguez, H., Viet, C. and Schülein, M.: Synergism of cellulases from Trichoderma reesei in the degradation of cellulose. Bio/Technology 3 (1985) 722-726.

Hewick, R.M., Hunkapiller, M.W., Hood, L.E. and Dreyer, W.J.: A gas-liquid solid phase peptide and protein sequencer. J. Biol. Chem. 256 (1981) 7990-7997.

Hormann, H. and Gollwitzer, R.: Bestimmung von Hexosen in Tryptophan-haltigen Eiweisskorpern. Ann. Chem. 655 (1962) 178-188.

Joliff, G., Beguin, P. and Aubert, J.-P.: Nucleotide sequence of the cellulase gene encoding endoglucanase D of Clostridium thermocellum. Nucl. Acids Res. 14 (1986) 8605-8613.

Knowles, J., Lehtovaara, P. and Teeri, T.T.: Cellulase families and their genes. Trends Biotechnol. 5 (1987) 255-261.

Laemmli, U.: Cleavage of structural proteins during the assembly of bacteriophage T4. Nature 227 (1970) 680-685.

Maizel Jr., J.V.: Acrylamide gel electrophoresis of proteins and nucleic acids. In Habel, K. and Salzman, N.P. (Eds.), Fundamental Techniques in Virology, Academic Press, New York, 1969, pp. 334-362.

Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982.

Messing, J.: New M13 vectors for cloning. Methods Enzymol. 101 (1983) 20-78.

Montenecourt, B.S.: Trichoderma reesei cellulases. Trends Biotechnol. 1 (1983) 156-161.

Nelson, N.J.: A photometric adaptation of the Somogyi method for the determination of glucose. J. Biol. Chem. 153 (1944) 375-380.

Norrander, J., Kempe, T. and Messing, J.: Construction of improved M13 vectors using oligodeoxynucleotide-directed mutagenesis. Gene 26 (1983) 101-106.

Odegaard, B.H., Anderson, P.C. and Lovrien, R.E.: Resolution of the multienzyme cellulase complex of Trichoderma reesei QM9414. J. Appl. Biochem. 6 (1984) 156-183.

Paice, M.G., Desrochers, M., Rho, D., Jurasek, L., Roy, C., Rollin, C.F., De Miguel, E. and Yaguchi, M.: Two forms of endoglucanase from the basidiomycete Schizophyllum commune and their relationship to other β-1,4-glycoside hydrolases. Bio/Technology 2 (1984) 535-539.

Penttila, M., Lehtovaara, P., Nevalainen, H., Bhikhabhai, R. and Knowles, J.: Homology between cellulase genes of Trichoderma reesei: complete nucleotide sequence of the endoglucanase I gene. Gene 45 (1986) 253-263.

Podell, D.N. and Abraham, G.N.: A technique for the removal of pyroglutamic acid from the amino terminus of proteins using calf liver pyroglutamate amino peptidase. Biochem. Biophys. Res. Commun. 81 (1978) 176-185.

Queen, C. and Korn, L.J.: A comprehensive sequence analysis program for the IBM personal computer. Nucl. Acids Res. 12 (1984) 581-599.

Sheir-Neiss, G. and Montenecourt, B.S.: Characterization of the secreted cellulases of Trichoderma reesei wild type and mutants during controlled fermentations. Appl. Microbiol. Biotechnol. 20 (1984) 46-53.

Shoemaker, S.P. and Brown, R.D.: Characterization of endo-1,4-β-D-glucanases purified from Trichoderma viride. Biochim. Biophys. Acta 523 (1978a) 147-161.

Shoemaker, S.P. and Brown, R.D.: Enzymic activities of endo-1,4-β-D-glucanases purified from Trichoderma viride. Biochim. Biophys. Acta 523 (1978b) 133-146.

Shoemaker, S., Schweickart, V., Ladner, M., Gelfand, D., Kwok, S., Myambo, K. and Innis, M.: Molecular cloning of exo-cellobiohydrolase I derived from Trichoderma reesei strain L27. Bio/Technology 1 (1983) 691-696.

Somogyi, M.: Notes on sugar determination. J. Biol. Chem. 195 (1952) 19-23.

Teeri, T., Salovuori, I. and Knowles, J.: The molecular cloning of the major cellulase gene from Trichoderma reesei. Bio/Technology 1 (1983) 696-699.

Teeri, T.T., Lehtovaara, P., Kauppinen, S., Salovuori, I. and Knowles, J.: Homologous domains in Trichoderma reesei cellulolytic enzymes: gene sequence and expression of cellobiohydrolase II. Gene 51 (1987a) 43-52.

Teeri, T., Kumar, V., Lehtovaara, P. and Knowles, J.: Anal. Biochem. 164 (1987b) 60-67.

Tomme, P., Van Tilbeurgh, H., Pettersson, G., Vandekerckhove, J., Teeri, T.T., Knowles, J. and Claeyssens, M.: The mechanism of cellulose hydrolysis: analysis of domain function in Trichoderma cellulases by partial proteolysis. Eur. J. Biochem. (1987) in press.

Van Arsdell, J.N., Kwok, S., Schweickart, V.L., Ladner, M.B., Gelfand, D.H. and Innis, M.A.: Cloning, characterization, and expression in Saccharomyces cerevisiae of endoglucanase I from Trichoderma reesei. Bio/Technology 5 (1987) 60-64.

Van Tilbeurgh, H. and Claeyssens, M.: Detection and differentiation of cellulase components using low molecular mass fluorogenic substrates. FEBS Lett. 187 (1985) 283-288.

Van Tilbeurgh, H., Claeyssens, M. and De Bruyne, C.K.: The use of 4-methylumbelliferyl and other chromophoric glycosides in the study of cellulolytic enzymes. FEBS Lett. 149 (1982) 152-156.

Van Tilbeurgh, H., Pettersson, L.G., Bhikhabhai, R. and Claeyssens, M.: Studies of the cellulolytic system of Trichoderma reesei QM 9414. Reaction specificity and thermodynamics of interactions of small substrates and ligands with the 1,4-β-glucan cellobiohydrolase II. Eur. J. Biochem. 148 (1985) 329-334.

Van Tilbeurgh, H., Tomme, P., Claeyssens, M., Bhikhabhai, R. and Pettersson, G.: Limited proteolysis of the cellobiohydrolase I from Trichoderma reesei. FEBS Lett. 204 (1986) 223-227.

Warren, R.A.J., Beck, C.F., Gilkes, N.R., Kilburn, D.G., Langsford, M.L., Miller Jr., R.C., O'Neill, G.P., Scheufens, M. and Wong, W.K.R.: Sequence conservation and region shuffling in an endoglucanase and an exoglucanase from Cellulomonas fimi. Prot. Struct. Funct. Genet. 1 (1986) 335-341.

Yaguchi, M., Roy, C., Rollin, C.F., Paice, M.G. and Jurasek, L.: A fungal cellulase shows sequence homology with the active site of hen egg-white lysozyme. Biochem. Biophys. Res. Commun. 116 (1983) 408-411.

Yanisch-Perron, C., Vieira, J. and Messing, J.: Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33 (1985) 103-119.

Zagursky, R.J., Berman, M.L., Baumeister, K. and Lomax, N.: Rapid and easy sequencing of large linear double stranded DNA and supercoiled plasmid DNA. Gene Anal. Techn. 2 (1986) 89-94.

Communicated by A.J. Podhajska.