Gene, 63 (1988) 11-21. Elsevier. GEN 02293
EGIII, a new endoglucanase from Trichoderma reesei: the characterization of both gene and enzyme
(Recombinant DNA; cellulase; Trichoderma; gene sequence; module shuffling; active site; phage λ vector)
M. Saloheimoᵃ, P. Lehtovaaraᵃ, M. Penttiläᵃ, T.T. Teeriᵃ, J. Ståhlbergᵇ, G. Johanssonᵇ, G. Petterssonᵇ, M. Claeyssensᶜ, P. Tommeᶜ and J.K.C. Knowlesᵃ

ᵃ Biotechnical Laboratory, VTT, SF-02150 Espoo (Finland) Tel. 358-0-4561; ᵇ Institute of Biochemistry, University of Uppsala, Biomedical Center, Uppsala (Sweden) Tel. 46-18-174000; and ᶜ Laboratory for Biochemistry, State University Ghent, Ghent (Belgium) Tel. 32-91-221821

Received 6 October 1987
Accepted 30 October 1987
Received by publisher 20 November 1987
SUMMARY
A novel endoglucanase from Trichoderma reesei, EGIII, has been purified and its catalytic properties have been studied. The gene for this enzyme (egl3) and its cDNA have been cloned and sequenced. The deduced EGIII protein shows clear sequence homology to a Schizophyllum commune enzyme (M. Yaguchi, personal communication), but is very different from the three other T. reesei cellulases of known structure. Nevertheless, all four T. reesei cellulases share two common, adjacent sequence domains, which apparently can be removed by proteolysis. These homologous sequences reside at the N termini of EGIII and the cellobiohydrolase CBHII, but at the C termini of EGI and CBHI. Comparison of the fungal cellulase structures has led to a re-evaluation of hypotheses concerning the localization of the active sites.
INTRODUCTION
The brown-rot fungus T. reesei is an efficient and well-studied cellulose-degrading microorganism (Enari, 1983; Montenecourt, 1983). For the degradation of crystalline cellulose to glucose in vivo, three types of cellulolytic enzyme are needed: endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) and β-glucosidases (EC 3.2.1.21) (Enari, 1983). The degradation of cellulose by T. reesei cellulases is most efficient when at least two cellulase types act synergistically on cellulose (Henrissat et al., 1985; Fägerstam and Pettersson, 1980). The exact number of cellulolytic enzymes produced by T. reesei and their roles in cellulose degradation are still unclear. The genes coding for two distinct cellobiohydrolases (CBHI and CBHII) have been cloned and sequenced (Shoemaker et al., 1983; Teeri et al., 1987a; Chen et al., 1987). The number of different endoglucanases is less clear. Depending on the growth conditions and isolation methods used, different groups have identified a range of endoglucanase proteins in T. reesei culture filtrates (Shoemaker and Brown, 1978a; Bhikhabhai et al., 1984; Odegaard et al., 1984; Sheir-Neiss and Montenecourt, 1984). Only the gene for EGI has been sequenced (Penttilä et al., 1986; Van Arsdell et al., 1987), and sections of the protein have also been sequenced (Bhikhabhai and Pettersson, 1984).

In this paper we report the isolation and primary structure of the gene egl3, coding for a novel T. reesei endoglucanase, EGIII. The EGIII enzyme has also been purified, and its amino acid composition and N-terminal sequence support the data obtained from the gene sequence. The enzymic properties of EGIII are unique: in part they resemble those of the other endoglucanase, EGI, and they also show some similarity to those of CBHII acting on small substrates. The evolutionary relationships and the domain structures of these enzymes are discussed.

Correspondence to: Dr. M. Saloheimo, Biotechnical Laboratory, VTT, Tietotie 2, SF-02150 Espoo (Finland) Tel. 358-0-4561.

Abbreviations: aa, amino acid(s); Ac, acetate; bp, base pair(s); BSA, bovine serum albumin; CBH, cellobiohydrolase; CMC, carboxymethylcellulose; CMCase, carboxymethylcellulase; EG, endoglucanase; HPLC, high-performance liquid chromatography; IEF-PAG, isoelectric focusing on polyacrylamide gel; kb, kilobase or 1000 bp; ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; pI, pH of isoelectric point; SDS, sodium dodecyl sulfate; TFA, trifluoroacetic acid; TON, turnover number.

0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)
MATERIALS AND METHODS
(a) Chemicals
SP-Sephadex, DEAE-Sepharose and Q-Sepharose Fast Flow were purchased from Pharmacia (Sweden). Carboxymethyl cellulose (0.7 DS) was from Hercules (France). The 4-methylumbelliferyl glycosides were synthesized as described (Van Tilbeurgh et al., 1982). Calf liver pyroglutamate aminopeptidase (EC 3.4.11.8) was from Sigma (U.S.A.). All other chemicals were of analytical grade.

(b) Activity measurements
CMCase activity assay: 10 μl enzyme solution was pipetted into 2 ml of a 0.5% solution of CMC in 50 mM NaAc buffer, pH 5.0 (4°C). The reaction mixture was incubated at 40°C for 8 min. Reducing sugars liberated were determined according to Somogyi (1952) and Nelson (1944). Activity against 4-methylumbelliferyl-β-D-cellotrioside was measured fluorometrically, and for the higher homologues an HPLC procedure was used as described (Van Tilbeurgh et al., 1982).

(c) Enzyme purification
Fraction A1 from DEAE-Sepharose (Bhikhabhai et al., 1984) was concentrated by adsorption to a small SP-Sephadex column in 50 mM NH4Ac, pH 3.7, followed by elution with 20 mM Tris·HCl buffer, pH 6.5. This step reduced the volume from 1.5 liter to 15-20 ml. After a change of buffer to 8 mM Tris·HCl, pH 6.5, on a Sephadex G-25 column, the material was fractionated by anion-exchange chromatography on a Q-Sepharose Fast Flow column (2.0 × 20 cm). Elution was with a linear ionic-strength gradient (2 × 200 ml) of 0-50 mM NaCl in 8 mM Tris·HCl, pH 6.5. The flow rate was 40 ml/h.

(d) Analytical methods
IEF-PAG was performed conventionally (LKB, Sweden). Enzyme activity was detected with 2 mM 4-methylumbelliferyl-β-D-cellotrioside (Van Tilbeurgh and Claeyssens, 1985). The molar absorption coefficient at 280 nm (77 000 M⁻¹ cm⁻¹) was calculated from the amino acid composition of EGIII. SDS-PAGE was performed as described by Maizel (1969) using 10% (w/v) polyacrylamide in the separation gel. Amino acid analyses were run on a Durrum D500 analyzer after hydrolysis of the sample for 24 h at 110°C in 6 M HCl containing 2 mg phenol/ml. Serine and threonine values were calculated using the standard recovery factors of 0.90 and 0.96, respectively. Tryptophan was estimated spectrophotometrically. Prior to N-terminal sequence determination, 40 nmol of lyophilized protein (reduced and carboxymethylated) was dispersed in 1 ml of deblocking buffer (Podell and Abraham, 1978) that had been flushed with nitrogen before use. The protein derivative was suspended using a whirlmixer and an ultrasonic bath, and pyroglutamate aminopeptidase (10 units) was added. The mixture was incubated for 24 h at room temperature. After centrifugation at 3000 rev./min, the pellet was washed in standard electrophoresis 'fixing' solution (7% HAc in methanol:H2O, 50:50, v/v), dried in a stream of nitrogen and dissolved in 1 ml of 1% aqueous SDS.
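The 77 000 M⁻¹ cm⁻¹ coefficient above was calculated from the amino acid composition, but the per-residue coefficients used are not stated. As a hedged illustration only, the Python sketch below sums Trp, Tyr and cystine contributions using the later, widely used values of Pace et al. (1995), applied to the counts deduced from egl3 in Table I (11 Trp, 14 Tyr, 12 Cys). The coefficient set, the assumption that all twelve cysteines are paired in disulfides, and the helper names are ours, so the result (about 82 000) should not be read as the authors' calculation.

```python
# Per-residue molar absorption coefficients at 280 nm (M^-1 cm^-1),
# taken from Pace et al. (1995); an assumption, not the paper's values.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide bond (two Cys), not per free cysteine


def epsilon_280(n_trp: int, n_tyr: int, n_cys: int, all_cys_paired: bool = True) -> int:
    """Estimate a protein's molar absorption coefficient at 280 nm from its composition."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EPS_TRP + n_tyr * EPS_TYR + n_cystine * EPS_CYSTINE


def concentration_molar(a280: float, eps: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (eps * l)."""
    return a280 / (eps * path_cm)


# Counts deduced from the egl3 sequence (Table I).
eps = epsilon_280(n_trp=11, n_tyr=14, n_cys=12)
print(eps)  # 82110 with these coefficients
```

Dividing a measured A280 by such a coefficient (1-cm path) gives the molar protein concentration; the choice of coefficient set shifts that estimate by several percent here.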
An aliquot (25 μl) of the protein-SDS solution was adsorbed to a TFA-treated and polybrene-coated glass-fiber disc. 37 cycles of sequencing were carried out according to the procedure described by Hewick et al. (1981) using an Applied Biosystems 470A gas-liquid phase sequencer with 'on-line' PTH-amino acid identification in an Applied Biosystems 120A PTH analyzer. Initial coupling yield was estimated to be 10%, while the average repetitive yield was 92%. Unambiguous identification was possible for 33 aa residues out of the first 36 aa. Carbohydrate analysis was performed using the anthrone-sulphuric acid reagent with mannose as standard. The absorbance was measured at 585 nm, an isosbestic point where the absorbance given by hexose is independent of the tryptophan content of the sample (Hörmann and Gollwitzer, 1962).

(e) Strains and vectors
Escherichia coli strain JM109 (Yanisch-Perron et al., 1985) was used as a host for the cloning vectors pUC8 (Messing, 1983) and pUC18 (Norrander et al., 1983). T. reesei strain VTT-D-80133 (Bailey and Nevalainen, 1981) was used for RNA and DNA isolation.

(f) Isolation of the chromosomal egl3 gene
A chromosomal T. reesei gene bank was made in the phage vector λ1059, and differential hybridization of the bank was done as described (Teeri et al., 1983). The gene product was examined by hybrid mRNA selection, followed by in vitro translation using a rabbit reticulocyte lysate (Amersham) and SDS-PAGE analysis (Laemmli, 1970).

(g) Isolation of mRNA
For the isolation of the cellulase mRNAs, T. reesei was grown under the conditions described (Bailey and Nevalainen, 1981) except that 2% lactose and 2% soluble extract of distiller's spent grain were added. Frozen mycelium was ground into a fine powder under liquid nitrogen and suspended in 5 vols. of guanidinium isothiocyanate. Total RNA was isolated according to Chirgwin et al. (1979), and the poly(A)+ RNA fraction was purified by oligo(dT)-cellulose chromatography (Aviv and Leder, 1972).

(h) Isolation of the cDNA copy of egl3
A T. reesei VTT-D-80133 cDNA bank was made in E. coli using pUC8 as vector (Teeri et al., 1987b). The bank was screened by colony hybridization (Hanahan and Meselson, 1983). A nick-translated restriction fragment was used as hybridization probe.

(i) Sequencing
The two clones containing the chromosomal and cDNA gene copies in pUC plasmids were terminally deleted using exonuclease III and S1 nuclease according to Henikoff (1984). The gene sequence was determined from deleted plasmid subclones essentially according to Zagursky et al. (1986), as well as by sequencing inserts cloned in the λ vector using synthetic 17-mer oligodeoxynucleotide primers. The sequences were analysed with a computer according to Queen and Korn (1984). Methods not described above were carried out using standard techniques (e.g., Maniatis et al., 1982).

RESULTS AND DISCUSSION

(a) Isolation and sequencing of the egl3 gene
A chromosomal gene bank of T. reesei was made in E. coli using λ1059 as vector. Clones containing genes strongly expressed under cellulase induction were isolated by differential hybridization as described by Teeri et al. (1983). The clone characterized in the present study was first analyzed by hybrid selection and in vitro translation, and a protein of 42 kDa was detected in SDS-PAGE. At this stage the cloned gene could only be identified as a gene efficiently expressed in T. reesei during the production of cellulases. To subclone and characterize this gene within the approximately 18-kb insert, the recombinant λ clone was mapped by Southern hybridization using cDNA made from induced mRNA as probe. The Southern mapping suggested that a 2.3-kb HpaI fragment would contain the gene, or at least most of it. The 2.3-kb HpaI fragment was thus isolated and cloned in pUC18 to obtain the genomic subclone. A full-length cDNA copy of 1550 bp was isolated from a T. reesei cDNA bank made in pUC8 (Teeri et al., 1987b), using a 420-bp HincII fragment from the 5' end of the genomic subclone as probe.
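The 2.3-kb HpaI subclone and the 420-bp HincII probe are both defined by where these enzymes cut. A minimal Python sketch of a complete digest of a linear DNA sequence is shown below. It assumes the standard blunt-end recognition sites HpaI GTT^AAC and HincII GTY^RAC (Y = C/T, R = A/G); the function names and the toy sequence are hypothetical and are not taken from egl3.

```python
import re

# Recognition patterns (5'->3') and cut offset within the site.
# Both enzymes cut blunt in the middle of a palindromic site.
ENZYMES = {
    "HpaI": ("GTTAAC", 3),         # GTT^AAC
    "HincII": ("GT[CT][AG]AC", 3),  # GTY^RAC
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Return 0-based top-strand positions where the enzyme cuts."""
    pattern, offset = ENZYMES[enzyme]
    # A zero-width lookahead also reports overlapping sites.
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", seq.upper())]


def fragment_lengths(seq: str, enzyme: str) -> list[int]:
    """Lengths of the fragments from a complete digest of a linear molecule."""
    bounds = [0, *cut_positions(seq, enzyme), len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAAAGTTAACAAAAAGTCGACAAAA"  # hypothetical 25-bp sequence
print(fragment_lengths(toy, "HpaI"), fragment_lengths(toy, "HincII"))
```

In the toy example HincII cuts at the HpaI site as well as at a second site, because GTTAAC is one instance of the degenerate GTYRAC site.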
Fig. 1. Nucleotide sequence of the egl3 structural gene with restriction sites, and the deduced amino acid sequence of the EGIII protein. Introns are written in lower-case letters. The proposed signal sequence of 21 aa is underlined. Near the N terminus, two blocks of aa (A and B) homologous to all other T. reesei cellulase genes sequenced are underlined; their border sequences are shown in Fig. 6. The putative N-glycosylation site is marked with an asterisk. Amino acids that can be aligned with the S. commune EGI sequence are underlined in the sequence of the mature protein.

The cDNA clone and the genomic clone, both in pUC plasmids, were deleted with exonuclease III and S1 nuclease to generate a series of deleted subclones for sequencing
(Henikoff, 1984). The cDNA clone was deleted from the 5' end and the genomic clone from the 3' end, and the noncoding and coding strands, respectively, were sequenced. The 2.3-kb chromosomal clone was found to lack about 300 bp of the N-terminal protein-coding region. This part of the gene was therefore sequenced from the initial recombinant λ clone using specific oligodeoxynucleotide primers. The sequence of the whole coding region of the gene was thus determined from both strands of DNA.

The derived gene sequence and the protein sequence deduced from DNA are shown in Fig. 1. An ORF of 418 aa can be found in the cDNA sequence. As shown in section c, below, this deduced protein matched the amino acid composition of an endoglucanase component purified from the T. reesei culture supernatant, and the identity was confirmed by N-terminal sequencing of the protein. The codon usage of the egl3 gene shows that CAT (His), CGT (Arg) and CTA (Leu) codons are not used. As in the other cellulase genes (Penttilä et al., 1986; Teeri et al., 1987a; Shoemaker et al., 1983), there is a bias against NTA codons, where N is any nucleotide. Comparison of the cDNA and gene sequences reveals two introns, both exceptionally long for T. reesei cellulase genes and among the longest found in filamentous fungal genes (Penttilä et al., 1986; Teeri et al., 1987a). The first, 123-bp long intron is in the 5'-flanking region, 33 bp upstream from the start codon ATG (Fig. 1). The second intron, of 174 bp, is at the position coding for aa 89 in the mature protein.

(b) Isolation of the EGIII protein
Two different T. reesei endoglucanases, EGI (then called Endo II) and EGII (then called Endo III), have previously been purified using DEAE-Sepharose chromatography at pH 5.0 as the first fractionation step (Bhikhabhai et al., 1984). A CMCase-rich
fraction (designated A1) was not adsorbed to the anion exchanger under these conditions. However, it was adsorbed to the anion exchanger Q-Sepharose Fast Flow at pH 6.5. Elution of the column with a linear gradient of NaCl yields three distinct peaks of CMCase activity (Fig. 2). These fractions, called A1:1, A1:2 and A1:3, were pooled and rechromatographed under the same conditions to remove material from surrounding overlapping peaks. All three fractions gave symmetrical peaks, indicating homogeneity.

Fig. 2. Anion-exchange chromatography of the A1 material on Q-Sepharose Fast Flow using an ionic-strength gradient. Column size: 2.0 × 20 cm. Linear gradient: 0 to 50 mM NaCl in 8 mM Tris·HCl, pH 6.5. Gradient volume: 2 × 200 ml. Fraction volume: 4 ml. Flow rate: 40 ml/h. Fractions A1:1, A1:2 and A1:3 are pooled.

(c) Molecular and enzymatic properties
The Mr values for A1:1, A1:2 and A1:3 were 48, 48 and 37 kDa, respectively, as estimated by SDS-PAGE (Fig. 3). A1:2 and A1:3 showed minor impurities of lower Mr, whereas A1:1 produced only one band on the gel. Using chromogenic glycosides of the cellodextrins as substrates and analytical IEF, a tentative differentiation of the cellobiohydrolases and endocellulases present in T. reesei culture filtrates was obtained (Van Tilbeurgh and Claeyssens, 1985). One component (EGIII), with an apparent pI of 5.5-5.6, liberates the fluorescent phenol from 4-methylumbelliferyl-β-D-cellotrioside and is the only enzyme to do so (Fig. 5). All three fractions (A1:1, A1:2 and A1:3) proved to be active on this substrate (Fig. 4). Amino acid compositions were determined for the purified proteins and compared with the amino acid composition deduced from the nucleotide sequence. One of the components, A1:1, was found to correspond well with the deduced gene product (Table I). The other two components in A1 showed similarities in amino acid composition and catalytic properties, suggesting that they may be related (to be published). A protein purified by Shoemaker and Brown (1978), then called Endo IV, had an amino acid composition closely resembling that of the deduced protein (Table I). Their Mr and carbohydrate determinations also gave results similar to those presented here. It is thus likely that they purified a similar, if not the same, protein. EGIII is clearly a different protein from the Endo III purified by Bhikhabhai et al. (1984), in terms of both amino acid composition and N-terminal amino acid sequence.
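The SDS-PAGE estimate of about 48 kDa for A1:1 can be compared with the mass expected from the polypeptide plus its glycan. The Python sketch below reproduces the arithmetic given later in this section (a 42.2-kDa polypeptide carrying 47 hexose units), assuming each hexose adds one anhydrohexose residue of about 162 Da; that residue mass is our assumption, since the text states only the 7.6-kDa total.

```python
HEXOSE_RESIDUE_DA = 162.14  # anhydrohexose (C6H10O5) residue mass; assumed value


def glycoprotein_mass(peptide_kda: float, n_hexose: float) -> tuple[float, float]:
    """Return (total mass in kDa, carbohydrate fraction w/w) for a glycoprotein."""
    carb_kda = n_hexose * HEXOSE_RESIDUE_DA / 1000.0
    total_kda = peptide_kda + carb_kda
    return total_kda, carb_kda / total_kda


total, frac = glycoprotein_mass(peptide_kda=42.2, n_hexose=47)
print(f"{total:.1f} kDa, {frac:.0%} carbohydrate")  # 49.8 kDa, 15% carbohydrate
```

SDS-PAGE mobility is only an approximate guide to the mass of a glycoprotein, so close agreement between the two figures should not be over-interpreted.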
Fig. 3. SDS-PAGE of the pooled A1 fractions. 200 μg of lyophilized protein was boiled for 3 min in 200 μl 83 mM Tris·HCl, pH 8.8, containing 3% SDS and 10 mM dithioerythritol. The samples were then cooled, and 10 μl of 0.5 M iodoacetamide was added to alkylate all SH-groups. Reaction time was 30 min. 20 μl of final reaction mixture (≈20 μg protein) was added to the gel. The gel (10% acrylamide; 0.1% SDS) was run for 3.5 hours at room temperature (200 V) with a buffer described by Laemmli (1970). Lane 1: A1:3. Lane 2: Molecular weight markers of 94, 67, 43, 30, 20.1 and 14.4 kDa. Lane 3: A1:1. Lane 4: A1:2. (Fractions of A1 are shown in Fig. 2.)
The N terminus of EGIII was found to be blocked. After removal of the N-terminal pyroglutamyl residue, EGIII was subjected to 36 steps of automated Edman degradation. 33 aa residues could be unambiguously identified, and the sequence obtained is identical to that deduced from the gene sequence. The three unidentified residues correspond to the serine and threonine residues 3, 26 and 27, which are probably glycosylated, thus giving unidentifiable PTH derivatives. The translated region starts with a signal sequence of 21 aa. The mature protein then has two Gln residues at the N terminus, but the first is converted in vivo to the blocking pyroglutamate residue. The signal sequence cleavage site would thus be almost identical to that of T. reesei EGI (Penttilä et al., 1986). Almost all Trichoderma cellulases characterized so far are glycoproteins (Bhikhabhai et al., 1984; Beldman et al., 1985). The total sugar content of EGIII was estimated to be an average of 47 mannose equivalents per protein molecule. This corresponds to 15% w/w of the total mass of 49.8 kDa, calculated from the formula weight (42.2 kDa) of the 397-aa peptide chain plus 7.6 kDa contributed by the 47 hexose units.

TABLE I
Amino acid composition

Residue       egl3 geneᵃ   EGIIIᵇ   Endo IVᶜ
Cys               12          ND        12
Asn + Asp         49          48.8      49
Thr               44          42.7      40
Ser               42          43.9      38
Gln + Glu         32          28.1      34
Pro               19          19.5      23
Gly               42          44.8      41
Ala               28          29.0      32
Met                4           3.5       3
Val               22          22.1      21
Ile               21          18.8      19
Leu               23          23.0      21
Tyr               14          13.6      13
Phe               13          12.5      12
His                5           5.2       6
Lys                6           7.8      13
Trp               11          ND        10
Arg               10          10.5      11
Total            397         373.8     398

ᵃ Deduced from nucleotide sequence.
ᵇ Normalized to 374 aa residues. Same as A1:1 in Figs. 2, 3 and 4.
ᶜ According to Shoemaker and Brown (1978a).
ND, not determined.

Fig. 4. IEF-PAG of the pooled A1 fractions. Analytical isoelectric focusing was performed on an LKB Multiphore apparatus using Ampholine PAG plates (pH 3.5-9.5; gel conc. 5%, degree of cross-linkage 3%; supporting ampholines 2.4%). Samples were first dialysed against distilled water and applied (10-30 μg) onto the gel (5 × 11 × 0.1 cm); the plate was cooled with running tapwater. Voltage was gradually increased (100-1000 V) over two hours. Isoelectric point standard proteins (LKB) were run in parallel (not shown). After the focusing, enzyme activity was detected fluorimetrically with 4-methylumbelliferyl-β-D-cellotrioside (Van Tilbeurgh and Claeyssens, 1985) by flooding the gel with a solution of the glycoside (0.5 mM in sodium acetate buffer, pH 5.0). Active fractions became visible as blue fluorescent bands upon transillumination with a 6 × 15 W UV lamp (302 nm) and were photographed (Polaroid 53). The gel was then stained conventionally with Coomassie blue. pI as indicated: lane 1, A1:1; lane 2, A1:2; lane 3, A1:3.

(d) Specificity against small substrates as compared with EGI, CBHI and CBHII
Fig. 5. Substrate specificity of EGIII against cellodextrins and their 4-methylumbelliferyl-derived chromophoric glycosides. Symbols distinguish β-glucopyranosyl (1,4) units, the 4-methylumbelliferyl group and the β-glucopyranose reducing end; an upward arrow marks the main hydrolysis site and a dashed upward arrow a secondary site. The cutting points of cellotriose and cellopentaose could not be unequivocally determined. Hydrolysis was performed in 50 mM sodium acetate buffer, pH 5.0, at 25°C: 100 μM substrate solution was incubated with 2-20 nM enzyme, and after 10-20 min 50-μl aliquots were diluted with 100 μl acetonitrile. Twenty μl of this mixture was injected. HPLC analysis (25 × 0.5 cm RSil Polyol, 10 μm, Alltech) was by isocratic elution (25:75, water-acetonitrile, 1.5 ml/min) with detection at 313 nm (Van Tilbeurgh et al., 1982). Peak identification was done by comparison of retention times with those of adequate standards.

The main products cleaved by EGIII from cellodextrins (containing two to five glucose units) and their chromophoric glycosides are shown in Fig. 5. In contrast with the chromophoric substrates, the cleavage sites for the underivatized cellodextrins (cellotriose, cellopentaose) could not be assessed unequivocally. Km values for the 4-methylumbelliferyl derivatives were in the 10-60 μM range, whereas those for EGI acting on the same substrates (though, of course, giving different reaction-product patterns) are in the mM range (H. van Tilbeurgh, unpublished). Turnover numbers (TON) for EGIII (10-200/min) are 50-200 times lower than for EGI. Consequently, the catalytic efficiency (TON/Km) of both enzymes acting on the same low-Mr substrates is of the same magnitude (1-2 × 10⁶ M⁻¹ min⁻¹). In contrast to both EGI and CBHI, cellobiosides (and lactosides) are not hydrolysed by EGIII. The specificity of this enzyme therefore shows some similarity to that of CBHII (Van Tilbeurgh et al., 1985). This is interesting in view of the fact that both EGIII and CBHII have the homologous domains at their N termini, in contrast to CBHI and EGI, which have similar sequences at their C termini (see section e, below). No transferase activity was observed for EGIII, in contrast to findings with EGI (H. van Tilbeurgh, unpublished). The pH optimum as determined with the cellotrioside was pH 4.0-5.0, and the stability of the enzyme at this pH in dilute (< 1 μM) solution (50°C) was enhanced by the addition of BSA (1 mg/ml).
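The conclusion that EGIII and EGI have catalytic efficiencies of the same magnitude, despite very different Km and TON values, follows from the ratio TON/Km. The Python sketch below uses hypothetical values chosen inside the reported ranges (Km 50 μM and TON 100/min for EGIII; a 100-fold higher TON and Km 5 mM for EGI); they are illustrative, not measured pairs.

```python
def catalytic_efficiency(ton_per_min: float, km_molar: float) -> float:
    """Specificity constant TON/Km, in M^-1 min^-1."""
    return ton_per_min / km_molar


# Hypothetical values within the ranges quoted in the text.
egiii = catalytic_efficiency(ton_per_min=100.0, km_molar=50e-6)     # Km = 50 uM
egi = catalytic_efficiency(ton_per_min=100.0 * 100, km_molar=5e-3)  # 100x TON, Km = 5 mM

print(f"EGIII: {egiii:.1e} /M/min, EGI: {egi:.1e} /M/min")
```

A 100-fold higher turnover is offset by a 100-fold higher Km, so both ratios fall near 2 × 10⁶ M⁻¹ min⁻¹.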
(e) Terminal modules are conserved in Trichoderma reesei cellulases

Fig. 6. Basic structures of T. reesei cellulases (from Teeri et al., 1987a). SS, signal sequence. (A) and (B), amino acid blocks common to all T. reesei cellulases. Introns are shown by a thick solid line. In CBHI and EGI the homologous boxes are in the C terminus, and in CBHII and EGIII they are in the N terminus (Fig. 1). Apart from the boxes, EGIII shows very little homology with the other cellulases.

Remarkable sequence homology can be detected between the N terminus of EGIII and N- or C-terminal regions of three other T. reesei cellulases (Shoemaker et al., 1983; Penttilä et al., 1986; Teeri et al., 1987a). Two homologous regions of about 35 aa, denoted A and B, are found at the N termini of EGIII and CBHII and at the C termini of CBHI and EGI (Fig. 6). It is important to note that outside the relatively short terminal A and B blocks, EGIII shows very little, if any, homology to the other T. reesei cellulases. The A block of EGIII is about 70% homologous to the A regions in the other cellulases, and the homology in the B blocks is 50-60%. Being so alike in these four proteins, these regions almost certainly have a functional significance. The A-B blocks fold into a domain which can be removed from both CBHI and CBHII by limited proteolysis. The core enzymes have a significantly decreased affinity for microcrystalline cellulose, while activity on soluble substrates is retained (Van Tilbeurgh et al., 1986; Tomme et al., 1987). This type of domain structure is found in all four of the Trichoderma cellulases, and an analogous overall architecture is also found in the cellulases of Cellulomonas fimi and Clostridium thermocellum (Warren et al., 1986; Joliff et al., 1986). We have proposed that this type of structure serves not only to decrease the effective Km of the enzyme with natural substrates but also to release cellulose chains from the cellulose crystal prior to hydrolysis by the catalytic domain (Knowles et al., 1987).

It is possible that the proteolytic removal of the terminal homologous regions could serve as an in vivo mechanism to alter the properties of cellulases during hydrolysis, when complex substrates are gradually changed to shorter and more accessible substrates. Like the other T. reesei cellulases, EGIII might also be processed this way. This idea is supported by the amino acid composition data of Shoemaker and Brown (1978b), who isolated not only an EGIII-like enzyme but also what they presumed to be its proteolytic cleavage product. This product contained less carbohydrate, and its amino acid composition now seems to be well explained by removal of the terminal homologous sequences A and B.

The B region is rich in serines and threonines: in EGIII, 21 aa residues out of 34 are serine or threonine. The terminal homologous regions in CBHI and CBHII are the sites of most of their O-glycosylation (Fägerstam et al., 1984; Tomme et al., 1987). On the basis of homology, we suggest that the B region of EGIII could also be heavily O-glycosylated. Within the B block there are five proline residues. These prolines might help the close packing of the carbohydrate chains, as has been demonstrated for some heavily O-glycosylated proteins (Allen, 1983). The EGIII protein sequence contains only one putative N-glycosylation site, Asn-Phe-Thr, at aa positions 103-105. It thus seems most likely that the EGIII protein is very heavily O-glycosylated. As in CBHI and CBHII (Teeri et al., 1987a), the cysteine residues of EGIII are concentrated near the ends of the protein molecule: six cysteines out of twelve are near the N terminus and four at the C-terminal end. It is possible that the terminal regions of the molecule fold into domains stabilized by disulfide bridges.
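The search for N-glycosylation sites referred to above relies on the usual Asn-X-Ser/Thr sequon (X not Pro), and the O-glycosylation argument rests on the Ser/Thr content of block B. A minimal Python sketch of both checks follows. The toy sequences are hypothetical rather than EGIII, and the one-letter sequon rule is the common textbook form, not a method stated in the paper.

```python
import re


def n_glyco_sequons(protein: str) -> list[tuple[int, str]]:
    """1-based start positions of Asn-X-Ser/Thr motifs where X is not Pro."""
    # Zero-width lookahead so that overlapping motifs are not skipped.
    return [(m.start() + 1, protein[m.start():m.start() + 3])
            for m in re.finditer(r"(?=N[^P][ST])", protein)]


def ser_thr_fraction(segment: str) -> float:
    """Fraction of residues in a segment that are Ser or Thr."""
    return sum(aa in "ST" for aa in segment) / len(segment)


print(n_glyco_sequons("MKNFTAPNPSQ"))   # hypothetical toy sequence
print(round(ser_thr_fraction("S" * 21 + "G" * 13), 3))  # 21 of 34 residues
```

The Asn-Pro-Ser motif in the toy sequence is skipped, because a proline in the X position blocks N-glycosylation.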
(f) EGIII shows extensive homology with a Schizophyllum commune endoglucanase
The basidiomycete S. commune has been shown to produce two forms of endoglucanases, EGI and EGII (Paice et al., 1984). EGII is proposed to be processed from EGI by a proteolytic cleavage at the N terminus. The T. reesei EGIII sequence is clearly homologous to the S. commune EGI protein (M. Yaguchi, personal communication). Leaving out the regions of EGIII homologous to other T. reesei cellulases (A and B box), the degree of identity between EGIII and S. commune EGI is 30.4% (Fig. 1). If conservative substitutions are allowed, this homology increases to more than 40%. It is thus most likely that the genes coding for T. reesei EGIII and S. commune EGI have evolved by divergent evolution from a common ancestor. However, the S. commune endoglucanases isolated from the growth medium do not have the amino acid blocks A and B that are common to all T. reesei cellulases sequenced so far (Teeri et al., 1987a). It will be interesting to see whether the A and B boxes are coded for by the S. commune egl1 gene and then proteolytically processed later.

In the homologous region between T. reesei EGIII and S. commune EGI, two cysteines are conserved (aa residues 302 and 338 in EGIII). Cys92 also has its counterpart nearby in the S. commune protein. The conservation of the cysteines indicates that they might form disulfide bridges with each other or with other cysteines.

The reaction mechanism of cellulases might resemble that of lysozymes, with carboxyl groups taking part in the catalysis. On the basis of some limited amino acid homology found between the N termini of lysozyme and S. commune EGI, it has been suggested that residues Glu47, Asn56 and Asp64 of S. commune EGI participate in the catalysis (Paice et al., 1984; Yaguchi et al., 1983). However, in the T. reesei EGIII sequence these residues are not conserved, and the whole region is clearly nonhomologous. Comparison of the two endoglucanase sequences suggests that the active-site residues may be located elsewhere, probably in the best-conserved regions of the two proteins. Interestingly, Glu218 and Asp292 of T. reesei EGIII can be aligned with Glu and Asp residues in the S. commune enzyme, and they are also located in the best-conserved regions of the proteins (Fig. 1). Predictions of the active-site residues are of course speculative and must be tested with other methods, such as site-directed mutagenesis and active-center modification studies.

ACKNOWLEDGEMENTS
The authors wish to thank Dr. M. Yaguchi for providing the protein sequence of the S. commune endoglucanase.
REFERENCES

Allen, A.: Mucus - a protective secretion of complexity. Trends Biochem. Sci. 8 (1983) 169-173.
Aviv, H. and Leder, P.: Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid cellulose. Proc. Natl. Acad. Sci. USA 69 (1972) 1408-1412.
Bailey, M.J. and Nevalainen, K.M.H.: Induction, isolation and testing of stable Trichoderma reesei mutants with improved production of solubilizing cellulase. Enzyme Microb. Technol. 3 (1981) 153-157.
Beldman, G., Searle-Van Leeuwen, M.F., Rombouts, F.M. and Voragen, F.G.: The cellulase of Trichoderma viride. Eur. J. Biochem. 146 (1985) 301-308.
Bhikhabhai, R. and Pettersson, L.G.: The cellulolytic enzymes of Trichoderma reesei as a system of homologous proteins. FEBS Lett. 167 (1984) 301-308.
Bhikhabhai, R., Johansson, G. and Pettersson, L.G.: Isolation of cellulolytic enzymes from Trichoderma reesei QM 9414. J. Appl. Biochem. 6 (1984) 336-345.
Chen, C.M., Gritzali, M. and Stafford, D.W.: Nucleotide sequence and deduced primary structure of cellobiohydrolase II of Trichoderma reesei. Bio/Technology 5 (1987) 274-278.
Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and Rutter, W.J.: Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18 (1979) 5294-5299.
Enari, T.-M.: Microbial cellulases. In Fogarty, W.M. (Ed.), Microbial Enzymes and Biotechnology. Applied Science Publishers, London, 1983, pp. 183-223.
Fägerstam, L.G. and Pettersson, L.G.: The 1,4-β-glucan cellobiohydrolases of Trichoderma reesei QM 9414. A new type of cellulolytic synergism. FEBS Lett. 119 (1980) 97-100.
Fägerstam, L.G., Pettersson, L.G. and Engström, J.A.: The primary structure of a 1,4-β-glucan cellobiohydrolase from the fungus Trichoderma reesei QM 9414. FEBS Lett. 167 (1984) 309-315.
Hanahan, D. and Meselson, M.: Plasmid screening at high colony density. Methods Enzymol. 100 (1983) 333-342.
Henikoff, S.: Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28 (1984) 351-359.
Henrissat, B., Driguez, H., Viet, C. and Schülein, M.: Synergism of cellulases from Trichoderma reesei in the degradation of cellulose. Bio/Technology 3 (1985) 722-726.
Hewick, R.M., Hunkapiller, M.W., Hood, L.E. and Dreyer, W.J.: A gas-liquid solid phase peptide and protein sequencer. J. Biol. Chem. 256 (1981) 7990-7997.
Hörmann, H. and Gollwitzer, R.: Bestimmung von Hexosen in Tryptophan-haltigen Eiweisskörpern. Ann. Chem. 655 (1962) 178-188.
Joliff, G., Béguin, P. and Aubert, J.-P.: Nucleotide sequence of the cellulase gene encoding endoglucanase D of Clostridium thermocellum. Nucl. Acids Res. 14 (1986) 8605-8613.
Knowles, J., Lehtovaara, P. and Teeri, T.T.: Cellulase families and their genes. Trends Biotechnol. 5 (1987) 255-261.
Laemmli, U.: Cleavage of structural proteins during the assembly of bacteriophage T4. Nature 227 (1970) 680-685.
Maizel Jr., J.V.: Acrylamide gel electrophoresis of proteins and nucleic acids. In Habel, K. and Salzman, N.P. (Eds.), Fundamental Techniques in Virology. Academic Press, New York, 1969, pp. 334-362.
Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982.
Messing, J.: New M13 vectors for cloning. Methods Enzymol. 101 (1983) 20-75.
Montenecourt, B.S.: Trichoderma reesei cellulases. Trends Biotechnol. 1 (1983) 156-161.
Nelson, N.J.: A photometric adaptation of the Somogyi method for the determination of glucose. J. Biol. Chem. 153 (1944) 375-380.
Norrander, J., Kempe, T. and Messing, J.: Construction of improved M13 vectors using oligodeoxynucleotide-directed mutagenesis. Gene 26 (1983) 101-106.
Odegaard, B.H., Anderson, P.C. and Lowrie, R.E.: Resolution of the multienzyme cellulase complex of Trichoderma reesei QM9414. J. Appl. Biochem. 6 (1984) 156-183.
Paice, M.G., Desrochers, M., Rho, D., Jurasek, L., Roy, C., Rollin, C.F., De Miguel, E. and Yaguchi, M.: Two forms of endoglucanase from the basidiomycete Schizophyllum commune and their relationship to other β-1,4-glycoside hydrolases. Bio/Technology 2 (1984) 535-539.
Penttilä, M., Lehtovaara, P., Nevalainen, H., Bhikhabhai, R. and Knowles, J.: Homology between cellulase genes of Trichoderma reesei: complete nucleotide sequence of the endoglucanase I gene. Gene 45 (1986) 253-263.
Podell, D.N. and Abraham, G.N.: A technique for the removal of pyroglutamic acid from the amino terminus of proteins using calf liver pyroglutamate amino peptidase. Biochem. Biophys. Res. Commun. 81 (1978) 176-185.
Queen, C. and Korn, L.J.: A comprehensive sequence analysis program for the IBM personal computer. Nucl. Acids Res. 12 (1984) 581-599.
Sheir-Neiss, G. and Montenecourt, B.S.: Characterization of the secreted cellulases of Trichoderma reesei wild type and mutants during controlled fermentations. Appl. Microbiol. Biotechnol. 20 (1984) 46-53.
Shoemaker, S.P. and Brown, R.D.: Characterization of endo-1,4-β-D-glucanases purified from Trichoderma viride. Biochim. Biophys. Acta 523 (1978a) 147-161.
Shoemaker, S.P. and Brown, R.D.: Enzymic activities of endo-1,4-β-D-glucanases purified from Trichoderma viride. Biochim. Biophys. Acta 523 (1978b) 133-146.
Shoemaker, S., Schweickart, V., Ladner, M., Gelfand, D., Kwok, S., Myambo, K. and Innis, M.: Molecular cloning of exo-cellobiohydrolase I derived from Trichoderma reesei strain L27. Bio/Technology 1 (1983) 691-696.
Somogyi, M.: Notes on sugar determination. J. Biol. Chem. 195 (1952) 19-23.
Teeri, T., Salovuori, I. and Knowles, J.: The molecular cloning of the major cellulase gene from Trichoderma reesei. Bio/Technology 1 (1983) 696-699.
Teeri, T.T., Lehtovaara, P., Kauppinen, S., Salovuori, I. and Knowles, J.: Homologous domains in Trichoderma reesei cellulolytic enzymes: gene sequence and expression of cellobiohydrolase II. Gene 51 (1987a) 43-52.
Teeri, T., Kumar, V., Lehtovaara, P. and Knowles, J.: Anal. Biochem. 164 (1987b) 60-67.
Tomme, P., Van Tilbeurgh, H., Pettersson, G., Vandekerckhove, J., Knowles, J., Teeri, T.T. and Claeyssens, M.: The mechanism of cellulose hydrolysis: analysis of domain function in Trichoderma cellulases by partial proteolysis. Eur. J. Biochem. (1987) in press.
Van Arsdell, J.N., Kwok, S., Schweickart, V.L., Ladner, M.B., Gelfand, D.H. and Innis, M.A.: Cloning, characterization, and expression in Saccharomyces cerevisiae of endoglucanase I from Trichoderma reesei. Bio/Technology 5 (1987) 60-64.
Van Tilbeurgh, H. and Claeyssens, M.: Detection and differentiation of cellulase components using low molecular mass fluorogenic substrates. FEBS Lett. 187 (1985) 283-288.
Van Tilbeurgh, H., Claeyssens, M. and De Bruyne, C.K.: The use of 4-methylumbelliferyl and other chromophoric glycosides in the study of cellulolytic enzymes. FEBS Lett. 149 (1982) 152-156.
Van Tilbeurgh, H., Pettersson, L.G., Bhikhabhai, R. and Claeyssens, M.: Studies of the cellulolytic system of Trichoderma reesei QM 9414. Reaction specificity and thermodynamics of interactions of small substrates and ligands with the 1,4-β-glucan cellobiohydrolase II. Eur. J. Biochem. 148 (1985) 329-334.
Van Tilbeurgh, H., Tomme, P., Claeyssens, M., Bhikhabhai, R. and Pettersson, G.: Limited proteolysis of the cellobiohydrolase I from Trichoderma reesei. FEBS Lett. 204 (1986) 223-227.
Warren, R.A.J., Beck, C.F., Gilkes, N.R., Kilburn, D.G., Langsford, M.L., Miller Jr., R.C., O'Neill, G.P., Scheufens, M. and Wong, W.K.R.: Sequence conservation and region shuffling in an endoglucanase and an exoglucanase from Cellulomonas fimi. Prot. Struct. Funct. Genet. 1 (1986) 335-341.
Yaguchi, M., Roy, C., Rollin, C.F., Paice, M.G. and Jurasek, L.: A fungal cellulase shows sequence homology with the active site of hen egg-white lysozyme. Biochem. Biophys. Res. Commun. 116 (1983) 408-411.
Yanisch-Perron, C., Vieira, J. and Messing, J.: Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33 (1985) 103-119.
Zagursky, R.J., Berman, M.L., Baumeister, K. and Lomax, N.: Rapid and easy sequencing of large linear double stranded DNA and supercoiled plasmid DNA. Gene Anal. Techn. 2 (1986) 89-94.

Communicated by A.J. Podhajska.