A developmentally regulated cysteine protease gene family in Haemonchus contortus


Molecular and Biochemical Parasitology, 43 (1990) 181-192


Elsevier MOLBIO 01415

Dickson Pratt 1, George N. Cox 1, Michael J. Milhausen 1 and Rudolph J. Boisvenue 2

1 Synergen, Inc., Boulder, CO, U.S.A. and 2 Animal Health Discovery, Lilly Research Laboratories, Eli Lilly and Company, Greenfield, IN, U.S.A.

(Received 11 April 1990; accepted 19 June 1990)

The nucleotide sequence of a gene encoding a 35-kDa thiol protease of the parasitic nematode Haemonchus contortus has been determined. The gene, designated AC-2, shares 97% nucleotide sequence identity and 98% amino acid identity with previously characterized AC-1 cDNAs encoding the thiol protease. The AC-2 gene spans 8 kb and appears to contain 11 introns, ranging in size from 57 bp to over 5.2 kb. One of the introns interrupts the proposed active site region that is conserved between the H. contortus protease and the related thiol proteases cathepsin B and papain. Southern blot hybridization experiments indicate that the protease is encoded by a small gene family in H. contortus. Rabbit antisera prepared against the recombinant protein react on Western blots with 35 and 37-kDa proteins of adult worms. These proteins were not detectable by Western blot analysis in three larval parasitic developmental stages of H. contortus. Northern blot hybridizations indicate that mRNA transcripts for the gene family are present at low levels in a mixed population of third- and fourth-stage larvae but highly abundant in adult worms. Expression of the protease correlates with blood-feeding and suggests a role for the protease in blood digestion.

Key words: Haemonchus contortus; Cysteine protease; Cathepsin B; Gene structure; Gene family; Expression

Introduction

The nematode Haemonchus contortus is one of the most pathogenic endoparasites of sheep. The one-inch-long adult worms live in the abomasum and gain nourishment by feeding on blood sucked from capillaries. Blood loss resulting from H. contortus infections can cause severe anemia, weight loss and sometimes death of infected animals. Since blood-feeding would appear to be critical for survival of H. contortus in sheep, we have initiated studies to identify parasite proteases that may play a role in this process. In a previous report we described the cloning of cDNAs encoding an abundant 35-kDa thiol protease present in H. contortus adult worms [1]. The function of this protease is not known. The primary sequence of the protease predicted from the cDNAs revealed that it was a member of the cathepsin superfamily of thiol proteases, which comprises a growing array of cellular proteases, including the lysosomal enzymes cathepsin B, H and L; the plant enzymes papain and actinidin; and certain cytosolic calcium-dependent proteases [2-6]. The H. contortus enzyme is more closely related in sequence to cathepsin B (an overall 42% amino acid sequence identity) than to other members of the cathepsin thiol protease family [1]. In this report we describe the structure of a gene encoding the H. contortus protease. In addition, we demonstrate that the protease is encoded by a small gene family in H. contortus and that expression of the protease occurs primarily in the blood-feeding adult worm stage.

Correspondence to: George N. Cox, Synergen, Inc., 1885 33rd Street, Boulder, CO 80301, U.S.A.

Abbreviations: L4, fourth-stage larva(e); nt, nucleotide; SDS, sodium dodecyl sulfate; SL3, ensheathed third-stage larva(e); XL3, exsheathed third-stage larva(e).

Note: Nucleotide sequence data reported in this paper have been submitted to the GenBank™ data base with the accession numbers M34859 and M34860.

0166-6851/90/$03.50 © Elsevier Science Publishers B.V. (Biomedical Division)


Materials and Methods

Isolation of the AC-2 gene. Overlapping cDNAs 2B (approx. 180 bp), 3-1 (approx. 870 bp) and F-1 (approx. 1100 bp), encoding the H. contortus AC-1 protease, have been described previously and were isolated from an adult worm cDNA library [1]. The AC-2 gene was isolated from an H. contortus genomic DNA library constructed in λEMBL-3 [7]. The library was screened by plaque hybridization using labeled DNA fragments that had been eluted from agarose gels and nick-translated with [α-32P]TTP [7,8]. The location of the 5' end of the gene in λMB3 was determined by hybridization of Southern blots of the phage DNA with a 40-nt-long oligomer that corresponds to the anti-sense sequence of the 5' end of cDNA 3-1 [1]. The oligomer, which has the sequence 5'-CACTTCAGGGTCGGGATCTTCTTTGACCATAAGATTTAGC-3', was end-labeled with [γ-32P]ATP using polynucleotide kinase and hybridized to nitrocellulose filters at 42°C using 2× SSC/0.5% SDS (20× SSC = 3 M NaCl/0.3 M sodium citrate, pH 7.0). Filters were washed using the same solutions at 45-55°C. The oligomer gave specific hybridization signals at the higher wash temperature.

General nucleic acid procedures. Genomic DNA Southern blot experiments were performed essentially as described [7] except that nitrocellulose filters were hybridized at 30°C and washed at 39°C in hybridization buffer. The isolation of poly(A)+ mRNA from adult worms and from a mixture of XL3 and L4 larvae has been described [7]. Northern blot hybridization experiments were performed as described [7] except that the hybridization and wash temperatures were 37°C. RNA size standards (0.16-1.77 kb) were purchased from Bethesda Research Laboratories. Nucleotide sequences were determined by the dideoxy chain termination method [9,10] after subcloning appropriate DNA restriction fragments into M13 phage vectors.

Fig. 1. Plasmids used in these studies. [Plasmid maps not reproduced: (A) pSEV6 (7.6 kb); (B) pBR322::3-1 (5.2 kb); (C) pSEV6::AC-1 (8.4 kb).]

Construction of the expression vector pSEV6. The β-galactosidase expression plasmid pSEV6 (Fig. 1A) was constructed in several steps from plasmid pSEV4, which is identical to plasmid pLG2 [11], except that one of the two EcoRI sites of pLG2 has been destroyed, leaving a unique EcoRI site in the β-galactosidase gene. pSEV4 DNA was digested to completion with SphI and then partially digested with AatII. T4 DNA polymerase was used to make the DNA ends blunt. After agarose gel electrophoresis, the 7.6 kb partial digestion product was electroeluted, ethanol precipitated, dried, resuspended in buffer and ligated overnight with T4 DNA ligase to seal the blunt ends. These steps destroy the SphI and AatII restriction sites. The ligation mixture was used to transform Escherichia coli AMA1004 [12] and the cells were plated in the presence of ampicillin, isopropyl β-D-thiogalactopyranoside and 5-bromo-4-chloro-3-indolyl β-D-galactopyranoside. Plasmid DNA was isolated from blue colonies and one plasmid with the proper configuration was designated pSEV5. To create pSEV6, pSEV5 DNA was digested with NcoI and ligated overnight with complementary DNA adapters of sequence 5'-CATGAGATCTGGTAC-3' and 5'-CATGGTACCAGATCT-3'. After transformation of E. coli AMA1004, plasmid DNA was isolated from several blue colonies and analyzed for the presence of unique NcoI, KpnI and BglII sites in the proper orientation. One such plasmid was designated pSEV6.
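The adapter step can be checked on paper or in a few lines of code. The sketch below is illustrative only: the two adapter sequences and the recognition sites (NcoI CCATGG, KpnI GGTACC, BglII AGATCT) are standard, but the TTTT padding is a hypothetical stand-in for the unknown pSEV5 sequence flanking the NcoI site, and the function names are ours. It confirms that the two oligomers pair everywhere except their 4-nt 5' CATG overhangs, and that ligation into an NcoI-cut site in either orientation should leave exactly one NcoI, one KpnI and one BglII site in the joined region.

```python
# Sketch: check the NcoI adapter pair described for pSEV6 construction.
# Adapter sequences are from the text; flanking vector bases are placeholders.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


ADAPTER_A = "CATGAGATCTGGTAC"
ADAPTER_B = "CATGGTACCAGATCT"

SITES = {"NcoI": "CCATGG", "KpnI": "GGTACC", "BglII": "AGATCT"}


def cores_anneal(a: str, b: str, overhang: int = 4) -> bool:
    """True if the two oligomers pair everywhere except their 5' overhangs."""
    return a[overhang:] == revcomp(b[overhang:])


def ligated_region(adapter: str, left: str = "TTTTC", right: str = "CATGGTTTT") -> str:
    """Top strand after the duplex is ligated into an NcoI-cut site (C^CATGG).

    'left' ends with the C that precedes the cut; 'right' starts with the
    vector's CATGG. The TTTT padding is hypothetical vector sequence.
    """
    return left + adapter + right


def site_counts(seq: str) -> dict:
    """Count occurrences of each recognition site on the given strand."""
    return {name: seq.count(site) for name, site in SITES.items()}


if __name__ == "__main__":
    print("cores anneal:", cores_anneal(ADAPTER_A, ADAPTER_B))
    # Either oligomer can end up on the top strand, depending on orientation.
    for top in (ADAPTER_A, ADAPTER_B):
        print(top, site_counts(ligated_region(top)))
```

Because all three sites are palindromic, counting on one strand is sufficient. In one orientation the NcoI site is regenerated at the left junction and in the other at the right junction, which is why restriction mapping can distinguish the two orientations.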

Construction of the AC-1:β-galactosidase gene fusion and preparation of specific antisera. A pBR322 plasmid containing AC-1 cDNA 3-1 (Fig. 1B) was digested with EcoRV and ligated to EcoRI linkers with the sequence 5'-CCGGAATTCCGG-3'. After digestion with EcoRI and EcoRV, the approx. 840 bp fragment containing most of the AC-1 coding sequence was eluted from an agarose gel. The restriction fragment was subcloned into the unique EcoRI site of the β-galactosidase (lacZ) gene in plasmid pSEV6. cDNAs in the proper orientation with respect to the β-galactosidase gene were selected by antibody screening and by digestion with XhoI, which cleaves asymmetrically within the 3-1 cDNA. One correct construct was named pSEV6::AC-1 (Fig. 1C). Bacteria containing pSEV6::AC-1 were grown in LB broth containing 50 μg ml⁻¹ ampicillin until the absorbance at 600 nm was 0.3. Isopropyl-β-D-thiogalactopyranoside was then added to 1 mM and the culture shaken for an additional 2 h at 37°C. Bacteria were harvested by centrifugation and lysed by boiling briefly in SDS sample buffer [13] and vortexing. New Zealand white rabbits were immunized with β-galactosidase or the AC-1:β-galactosidase fusion protein (100-200 μg per injection) that had been electrophoretically eluted from SDS gels as described [1]. Rabbits received an initial injection containing Freund's complete adjuvant and a booster injection containing Freund's incomplete adjuvant one month later.

Preparation of worm extracts and Western blot procedures. Extracts of SL3s, XL3s, L4s and adults were prepared by breaking the worms open by sonication or by grinding frozen worms with a mortar and pestle over liquid nitrogen and boiling them for 2 min in a solution containing 1% SDS/0.125 M Tris-HCl pH 6.8/5% β-mercaptoethanol [14]. Fibrinogen-degrading extracts were prepared as described [1] by homogenizing adult worms in buffer and fractionating soluble proteins on a column of Sepharose CL-4B, followed by FPLC Mono Q column chromatography. SDS-polyacrylamide gel electrophoresis and Western blot procedures were essentially as described [13,14], except that 2% bovine serum albumin was used as the blocking agent. Rabbit antisera were used at 1:200 dilutions.

Results

Isolation and nucleotide sequence analysis of the H. contortus AC-2 cysteine protease gene. The AC-2 gene was isolated by screening a Haemonchus:λEMBL-3 phage library with 32P-labeled, nick-translated cDNA 2B, which encodes the final 12 amino acids of the AC-1 cysteine protease, plus 150 bp of 3' untranslated sequence [1]. The first screen yielded two overlapping phages, called λMB1 and λMB2 (Fig. 2). The coding region of the gene was localized by hybridization of Southern blots of restriction enzyme digests of the phage DNAs with the 32P-labeled 2B cDNA. The hybridizing region and restriction fragments upstream of this region were sequenced using the strategy shown in Fig. 2. Comparison of the nucleotide sequence of the gene to that of the near full-length AC-1 cDNA F-1 [1] revealed that the gene contained multiple introns. This comparison also revealed that there were several nucleotide differences between the gene and the cDNA. These facts will be discussed further below.

A large intron was encountered when trying to identify the region of the gene corresponding to the 5' end of the cDNA. Sequencing of over 800 bp from this region of λMB1 phage DNA failed to locate sequences encoding the 5' region of the cDNA. Southern blots of λMB1 and λMB2 DNAs probed with a 32P-labeled 40 nt oligomer corresponding in sequence to part of the missing region of the cDNAs indicated that the missing exon(s) was not present in λMB1 or λMB2. Therefore, the λEMBL-3 library was rescreened in duplicate with 32P-labeled restriction fragments from the left end (the 3.9 kb EcoRI fragment) and near the right end (the 3.5 kb EcoRI fragment) of λMB2 (see Fig. 2). Three phages that hybridized only with the 3.9 kb EcoRI fragment were identified and plaque-purified. Hybridization of the phage DNAs with the 40 nt oligomer revealed that each of them contained the missing exon(s). One of the phages, λMB3, was mapped with restriction enzymes and shown to overlap λMB1 and λMB2 (Fig. 2). The region of λMB3 that hybridized to the 40 nt oligomer, and the region immediately upstream of it, were sequenced as outlined in Fig. 2 and found to contain sequences for the missing regions of the cDNAs.

Fig. 2. Restriction enzyme map and exon/intron organization of the H. contortus AC-2 gene. A composite restriction map of the AC-2 gene and flanking regions is shown in (A). The limits of recombinant λEMBL-3 phages λMB1, λMB2 and λMB3 are shown above the map. Restriction enzyme sites shown are: E, EcoRI; S, SalI and H, HindIII. The SalI sites in parentheses occur in the λEMBL-3 polylinker sequences and are not present in H. contortus DNA. They are shown because they were used to generate restriction fragments for DNA sequencing. The 3.9 kb and 3.5 kb EcoRI fragments of λMB2 that were used to double-screen the λEMBL-3 library to isolate λMB3 are indicated by brackets. The 1.0 kb EcoRI fragment that hybridizes to cDNA 2B is marked. Regions of the λEMBL-3 phages that were sequenced are indicated by arrows in (B), which is an expanded version of the pertinent region of (A). Asterisks indicate sequences that were generated using synthetic oligonucleotide primers. Additional restriction enzyme sites shown are: B, BamHI; Bg, BglII; Hp, HpaI; K, KpnI; T, SacI; X, XbaI. In some cases the λEMBL-3 phages contain additional sites for these restriction enzymes that are not shown. The exon/intron organization of the AC-2 gene is shown in (B). Black boxes indicate exons. The horizontal length of the box approximates the length of the exon, except for exon 1, which consists only of the initiator ATG codon. [Restriction map not reproduced.]

cDNA F-1 does not contain an initiator methionine codon [1]. The gene encodes a methionine three amino acids upstream of where cDNA F-1 terminates. Just upstream of this methionine codon is an in-frame TGA stop codon. Although it is possible that this methionine codon is the initiator methionine codon for the gene, we do not believe so. Between this methionine codon and where cDNA F-1 terminates is the sequence TTTCAG/A, which fits the consensus 3' intron acceptor splice sequence (Table I); the slash indicates where splicing would occur. If this sequence functions as a splice acceptor sequence, then the above methionine codon is present in an intron and would not be present in the mature mRNA. We searched the upstream region for potential intron 5' splice donor sequences and for other potential initiator methionine codons. Approximately 80 bp upstream of the putative 3' acceptor sequence is the sequence ATG/GTAA, which fits the consensus intron 5' splice donor sequence (Table I). Splicing would join this ATG methionine codon in-frame with exon 2 and the remainder of the gene. There is an in-frame TGA termination codon 18 bp upstream of this methionine. Therefore, we believe that this ATG is the actual initiator methionine for the gene and have assumed that this is the case in the discussions presented below. Primer extension experiments using the 40 nt oligomer and poly(A)+ mRNA isolated from adult worms indicated that cDNA F-1 was approximately 10 bp shorter than full-length [1]. If this analysis is correct, then cDNA F-1 is missing just the AT of the initiator ATG codon and approximately 8 nucleotides of 5' untranslated sequence.

The nucleotide sequence of the gene, including the small intron sequences, is presented in Fig. 3. The gene has 97% nucleotide identity with the AC-1 cDNA F-1, to which it is compared in Fig. 3.
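An identity figure of this kind is a pairwise comparison over aligned positions. The following is a minimal sketch of that calculation, assuming the two sequences have already been aligned to equal length (introns removed, gaps written as "-") and that gapped columns are excluded; the text does not state how the authors treated gaps. The function names and the two 15-nt strings are invented for illustration and are not AC-1 or AC-2 data.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned, ungapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if "-" not in (x, y)]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def mismatch_positions(a: str, b: str) -> list:
    """0-based alignment columns where both sequences have a residue and differ."""
    return [
        i
        for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
        if "-" not in (x, y) and x != y
    ]


if __name__ == "__main__":
    # Invented toy strings (hypothetical), differing at a single column.
    gene = "ATGAAATACTTGGTG"
    cdna = "ATGAAGTACTTGGTG"
    print(round(percent_identity(gene, cdna), 2))
    print(mismatch_positions(gene, cdna))
```

Taking a mismatch position modulo 3 within the reading frame gives the codon position, which is how third-base differences can be separated from changes that may alter an amino acid.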
Most of the nucleotide differences occur in the presumed 3' untranslated region of the gene (cDNA) and in third-base codon wobble positions that do not change amino acids. Seven nucleotide changes result in different amino acids. Overall, the gene and the F-1 cDNA have 98% protein sequence identity. At this time we do not know if the gene is distinct from the gene that encodes the AC-1 cDNAs or whether the nucleotide (protein) differences are due to polymorphisms in a single gene in the H. contortus worm populations used to construct the cDNA and genomic DNA libraries. We have isolated from the adult worm cDNA library a partial cDNA (350 bp in length) whose nucleotide sequence is identical to that of the gene, so the gene appears to be expressed (data not shown). Complicating this issue is the fact that the protease appears to be encoded by a multigene family (see below). Because of these uncertainties, we have named the gene AC-2 to distinguish it from the AC-1 gene identified by cDNAs 2B, 3-1 and F-1 [1].

As shown in Figs. 2 and 3, the AC-2 gene contains 11 putative introns that range in size from 57 bp to over 5.2 kb. Ten of the introns can be assigned unambiguously by comparison to the AC-1 cDNA sequence; the 11th intron assumes our identification of the initiator methionine is correct. Table I lists the intron/exon splice donor and acceptor sequences for these 11 introns and for two introns present in the H. contortus 3A3 collagen gene [7]. Approximately 40 bp upstream of the proposed initiator methionine is a sequence that is similar to the eukaryotic TATA promoter element. We have no evidence as yet that this sequence functions as a promoter for the AC-2 gene. Downstream of the TGA stop codon is a canonical AATAAA poly(A) addition sequence. These sequences are underlined in Fig. 3. The active site of the AC-1 protease has been tentatively identified by homology with the active site sequences of cathepsin B and papain [5]. AC-2 has an identical sequence in this region (Fig. 3). The four potential N-linked glycosylation sites in AC-1 [1] also are conserved in AC-2 and are marked in Fig. 3.

AC proteases comprise a multi-gene family in H. contortus. Because of the differences in the nucleotide sequences of AC-1 and AC-2, it was of interest to determine the number of copies of the gene that were present in the H. contortus genome. Southern blot hybridizations of H. contortus genomic DNA under low stringency conditions revealed multiple hybridizing bands with several restriction enzymes (Fig. 4). The labeled probe used for these hybridizations was the AC-1 cDNA 2B, which hybridizes to a single 1.0 kb EcoRI fragment in λMB1 (this fragment is marked in Fig. 2). There is a 1.0 kb genomic DNA fragment that hybridizes to cDNA 2B and presumably corresponds to the 1.0 kb band in λMB1 (Fig. 4). The four other hybridizing bands detected in EcoRI digests of H. contortus DNA must derive from additional gene copies of the protease. The multiple hybridizing bands detected with the 2B cDNA probe are in contrast to the single hybridizing band detected with a tropomyosin gene probe (data not shown). These data indicate that there are multiple copies of the gene for the AC protease in the H. contortus genome.

Developmental expression of AC protease mRNAs. Northern blot hybridizations of adult worm poly(A)+ mRNA with a 32P-labeled plasmid containing the 1.0 kb EcoRI fragment of AC-1 cDNA F-1 [1] showed a single hybridizing mRNA band of about 1250 nt in length under low-stringency conditions, similar to what was observed previously using higher stringency conditions [1]. A weakly hybridizing band (approximately 30-fold less intensity) of the same size was detected in poly(A)+ mRNA isolated from a mixture of XL3 and young L4 larvae (data not shown). The data suggest that the AC protease genes expressed by these developmental stages produce mRNAs of similar sizes.

Fig. 3. Nucleotide and deduced amino acid sequence of the H. contortus AC-2 gene. Lower case letters indicate introns. Nucleotides are numbered consecutively until intron 4, which is approximately 5.2 kb in length and was not sequenced in its entirety. Nucleotide numbers after intron 4 are approximations. Nucleotides and amino acids that are different in the AC-1 cDNAs [1] are shown above and below the AC-2 sequences. Nucleotides corresponding to the beginning and end of the cDNAs are marked with solid triangles. Potential N-linked glycosylation sites (Asn-X-Ser/Thr, where X can be any amino acid) are marked with double underlines. The six amino acids that are present in the active site and conserved in AC-1, AC-2, cathepsin B and papain are underlined with dashes. The EcoRV cleavage site present in AC-1 and AC-2 that was used to create the AC-1:β-galactosidase gene fusion is shown. The termination codon is marked with an asterisk. Sequences similar to the eukaryotic TATA promoter element and AATAAA polyadenylation signal are underlined. [Sequence listing not reproduced.]

TABLE I
Splice junction sequences of Haemonchus contortus introns

Gene             Intron No. a   Intron length   Splice donor sequence b     Splice acceptor sequence b
AC-2             1              80 bp           AATATG|GTAAGT ..........    TTTCAG|AAATAC
AC-2             2              69 bp           AGAATG|GTTGGT ..........    TTACAG|CTGCCC
AC-2             3              58 bp           TTCGAG|GTGATT ..........    TTGCAG|GTCAAT
AC-2             4              >5.2 kb         ACCCAG|GTGAGT ..........    TTACAG|CTACGA
AC-2             5              68 bp           ACTGCG|GTGAGC ..........    TTTCAG|GCTCAT
AC-2             6              57 bp           AAACAG|GTGCAA ..........    ATTTAG|GTGAAT
AC-2             7              79 bp           TGACGG|GTAAAA ..........    TTTCAG|GTGTGA
AC-2             8              57 bp           ACTAAA|GTGAGA ..........    CATTAG|GATGTA
AC-2             9              544 bp          GATACG|GTAAGC ..........    TTTTAG|GAAAAG
AC-2             10             186 bp          TATAAG|GCGAGT ..........    TTATAG|CACACA
AC-2             11             203 bp          AAAAAG|GTAAGT ..........    TTTCAG|GATATT
Collagen 3A3 c   1              57 bp           TGCAAG|GTAAGA ..........    TTTCAG|ATCTCG
Collagen 3A3 c   2              59 bp           TGCCAG|GTATTT ..........    TTTCAG|GCCAAG

CONSENSUS d                                     A N N A A G|G T A/G A G T   T T T C A G|G T N A A N
Occurrences                                     7 - - 7 8 12|13 12 6 10 9 7   11 12 9 9 13 13|8 6 - 6 6 -

a Intron numbers for AC-2 are taken from Fig. 3. For the 3A3 collagen gene, the 5'-most intron is referred to as intron No. 1.
b Vertical lines indicate intron boundaries.
c Collagen 3A3 data taken from Fig. 1 of reference 7.
d A consensus nucleotide is shown for positions in which the same nucleotide is present in at least 6 of the 13 introns analyzed. The number of times the consensus nucleotide is present in the 13 introns is shown beneath the nucleotide. An N indicates that there is no consensus nucleotide for this position.
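The consensus row of Table I follows a simple counting rule (footnote d): a nucleotide is called at a position when it occurs in at least 6 of the 13 introns. As a minimal sketch, the Python below applies that rule to the first six intron nucleotides at each splice donor, as transcribed from Table I (AC-2 introns 1-11 followed by collagen 3A3 introns 1-2). The function name is ours, and reporting tied nucleotides as "A/G" is an assumption about how equal counts were presented.

```python
from collections import Counter

# Intron-side hexamers at the splice donor, transcribed from Table I:
# AC-2 introns 1-11, then collagen 3A3 introns 1 and 2.
DONORS = [
    "GTAAGT", "GTTGGT", "GTGATT", "GTGAGT", "GTGAGC", "GTGCAA",
    "GTAAAA", "GTGAGA", "GTAAGC", "GCGAGT", "GTAAGT",
    "GTAAGA", "GTATTT",
]


def consensus(seqs, min_count=6):
    """Column-wise consensus under an 'at least min_count occurrences' rule.

    Returns one (bases, count) pair per column. Nucleotides tied for the top
    count are joined with '/'; ('N', 0) means no nucleotide reached min_count.
    """
    result = []
    for column in zip(*seqs):
        counts = Counter(column)
        top = max(counts.values())
        if top < min_count:
            result.append(("N", 0))
        else:
            bases = "/".join(sorted(b for b, n in counts.items() if n == top))
            result.append((bases, top))
    return result


if __name__ == "__main__":
    for position, (bases, count) in enumerate(consensus(DONORS), start=1):
        print(f"intron position +{position}: {bases} ({count} of {len(DONORS)})")
```

The same function can be applied to the exon-side donor hexamers or to the acceptor sequences by swapping in the corresponding column of the table.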

Expression of the AC-1/AC-2 protease during parasite development. Rabbit antisera were prepared against an AC-1:β-galactosidase fusion protein in order to study expression of the protein in various developmental stages of H. contortus. Because of the high degree of amino acid sequence identity between the predicted AC-1 and AC-2 proteins, we expect that antisera raised against the AC-1 protein will cross-react with the AC-2 protein; however, it has not proven possible yet to test this assumption. The gene fusion (Fig. 1C) was constructed as outlined in Materials and Methods. The fusion protein has a molecular weight of 140 kDa and contains the final 241 amino acids of AC-1. AC-1 and AC-2 differ by only 3 conservative amino acid changes (Lys to Arg, Val to Ile, and Ser to Thr) within this region. Antiserum from a rabbit (Rb-10285) immunized with the 35-kDa protease isolated from adult worms [1] reacted on Western blots with the recombinant AC-1:β-galactosidase fusion protein, confirming that the modified cDNA was joined in the proper reading frame (data not shown). The AC-1:β-galactosidase fusion protein was electroeluted from preparative SDS gels and used to immunize rabbits Rb-8552 and Rb-9190. Consistent with previous studies [1], both immune rabbit antisera reacted strongly with a 35-kDa protein in adult worm extracts (Fig. 5). Both immune sera also reacted with a 37-kDa protein in adult worm extracts (Fig. 5). The Rb-8552 serum reacted more intensely with the 37-kDa protein than did the Rb-9190 or Rb-10285 antisera. Endoglycosidase F digestion experiments [1] suggest that the 37-kDa protein may be a more heavily glycosylated form of the 35-kDa protein. A control antiserum prepared against β-galactosidase did not react with the 35 or 37-kDa proteins (data not shown). Additional Western blot studies (not shown) demonstrated that the 35 and 37-kDa proteins are present in fibrinogen-degrading extracts that can be purified from adult worms, confirming previous experiments using Rb-10285 antiserum [1].

When the Rb-8552 and Rb-9190 antisera were used to probe Western blots of protein extracts prepared from the four parasitic developmental stages of H. contortus (SL3s, XL3s, L4s and mature adults), the 35 and 37-kDa proteins were detected in extracts only of adult worms (Fig. 5). Identical results were obtained with Rb-10285 serum, which was prepared against the natural 35-kDa protein isolated from adult worms (Fig. 5). These results suggest that all members of the 35-kDa protease gene family recognized by these antisera have similar developmental expression patterns.

Fig. 4. Southern blot analysis of H. contortus AC protease genes. H. contortus genomic DNA (2 μg) was digested with EcoRI (E), HindIII (H) or NheI (N), size-fractionated on a 0.8% agarose gel, blotted to a nitrocellulose filter and hybridized with the 32P-labeled AC-1 cDNA 2B under low stringency conditions (see Materials and Methods). Sizes of marker DNA fragments (MW lane) are indicated on the left in kb. [Blot image not reproduced; marker sizes shown: 23.1, 9.4, 6.6, 4.4, 2.3, 1.3, 1.1, 0.9 and 0.6 kb.]

Fig. 5. Developmental expression of the H. contortus AC-1 (AC-2) protease. Aliquots of SL3, XL3, L4 and adult worms containing equivalent amounts of protein were separated on 12% SDS gels under reducing conditions, blotted onto nitrocellulose filters and proteins reacted with various rabbit antisera raised against the H. contortus protease. Panel (A) shows the reaction of Rb-10285 serum, which was raised against the 35-kDa protease purified from H. contortus adult worms [1]. This antiserum reacts with other proteins besides the protease. The 35-kDa and 37-kDa forms of the protease are marked with arrows. The band marked by an asterisk is not the protease and probably is tropomyosin (unpublished results). Panels (B) and (C) show the reactions of rabbit antisera Rb-9190 and Rb-8552, respectively, which were raised against the recombinant AC-1:β-galactosidase fusion protein. Panel (D) shows the reaction of pre-bleed serum from Rb-9190. Only the relevant regions of the Western blots are shown. [Blot images not reproduced.]

Discussion

In this report we have described the structure of a gene encoding a 35-kDa cathepsin B-like cysteine protease of H. contortus. Unexpectedly, we have found that the protease is encoded by a small gene family. The nucleotide and predicted amino acid sequences of the AC-2 gene are closely related to those of AC-1 cDNAs described previously [1]. We are presently isolating genomic clones and cDNAs for other gene family members, which will allow a detailed comparison of their nucleotide and amino acid sequences and should allow us to determine whether AC-1 and AC-2 are alleles or distinct genes. These analyses also should reveal the genomic organization of the gene family members. Since cDNA 2B hybridizes to fewer, but larger, DNA fragments in HindIII and NheI digests than in EcoRI digests of H. contortus DNA (see Fig. 4), it is possible that some of the gene family members are linked. Indeed, we recently discovered a second cysteine protease gene located downstream of the AC-2 gene (unpublished results).

In contrast to the H. contortus AC-1 (AC-2) cysteine protease, cathepsin B is encoded by a single gene in mammalian genomes [5]. Although mammals contain genes for related cysteine proteases, e.g., cathepsins H and L, the proteins share only 26-33% amino acid sequence identity with cathepsin B [6] and it is unlikely that the genes would cross-hybridize under the conditions used in our experiments. H. contortus may possess several gene copies for the protease because large amounts of protease must be synthesized in a short period of time, because of unknown differences in the enzymatic activities of the different gene family members, or possibly to provide immunological diversity in the protease, since we speculate that the protease may be in contact with host antibodies during blood-feeding. We have not examined expression of the gene family in eggs or in nonparasitic first- and second-stage larvae, and it is possible that certain of the gene family members are expressed in these developmental stages.

The AC-2 gene has several interesting structural features and has expanded considerably our knowledge of H. contortus introns. The large number of introns present in AC-2 is unusual for a nematode gene [15]. The consensus splice donor and acceptor sequences determined for H. contortus introns (Table I) are similar to those of the free-living nematode Caenorhabditis elegans [15], and of other eukaryotes in general [16]. The splice donor sequence of intron 10 in AC-2 is exceptional in that the first two nucleotides of the intron are GC rather than GT. A cytosine at the second position is rare, but not unprecedented, in eukaryotic introns [17]. All of the introns in AC-2 (and in the 3A3 collagen gene; ref. 7) terminate with an AG nucleotide pair, which is similar to what is found in other organisms [16]. Over half of the introns are 80 bp or less in length, which is in agreement with the size of most C. elegans introns, but considerably smaller than introns of higher eukaryotes. Intron 4 in AC-2 is approximately 5.2 kb in length, which is unusually long for a nematode intron [15].
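The donor and acceptor dinucleotide pattern described above (introns beginning with GT, or rarely GC, and ending with AG) can be illustrated with a minimal Python sketch. The function names and the two toy intron sequences are hypothetical and were invented for illustration; they are not AC-2 sequences and this is not the method used in the study.

```python
from typing import Tuple


def splice_dinucleotides(intron: str) -> Tuple[str, str]:
    """Return the first two and last two nucleotides of an intron sequence."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have separate donor and acceptor ends")
    return seq[:2], seq[-2:]


def classify_intron(intron: str) -> str:
    """Classify an intron by its terminal dinucleotides (hypothetical helper)."""
    donor, acceptor = splice_dinucleotides(intron)
    if acceptor != "AG":
        return "non-canonical acceptor"
    if donor == "GT":
        return "canonical GT-AG"
    if donor == "GC":
        # The rare donor variant noted in the text for intron 10 of AC-2.
        return "rare GC-AG"
    return "non-canonical donor"


# Toy sequences made up for this example; they are NOT taken from AC-2.
toy_introns = {
    "toy_GT": "GTAAGTTTTCTAATTTTTCAG",
    "toy_GC": "GCAAGTTTTCTAATTTTTCAG",
}

for name, seq in toy_introns.items():
    print(name, len(seq), "bp:", classify_intron(seq))
```

Applied to real intron sequences, a check of this kind would flag a GC donor such as the one reported for intron 10 while still confirming the conserved AG at the 3' end.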
Intron 7 interrupts the coding region (between Gly and Ser) of the active site sequence Cys-Gly-Ser-Cys-Trp-Ala (the second Cys is the active site cysteine; see Fig. 3) that is conserved in AC-1, AC-2 and in a number of other thiol proteases, including human, rat and mouse cathepsin B, chicken liver cathepsin L and the plant enzyme papain [3,5,6]. Thus, although this sequence is highly conserved in cysteine proteases in a number of phyla, it does not comprise a single exon unit.

Our data indicate that the AC protease is expressed primarily by blood-feeding H. contortus adult worms and that expression of the protease appears to be regulated primarily at the level of transcription. Although we did not detect the protease by Western blot analysis in extracts of the SL3, XL3 or L4 larval parasitic stages, we did detect low levels of protease mRNA in a mixed population of XL3s and young L4s by Northern blot analysis, which is a more sensitive technique. With the exception of the absence of detectable protease in extracts of L4s, this pattern of expression correlates with blood-feeding (the L4 is the first developmental stage that actively feeds on host blood components) and is consistent with the postulated function of the enzyme as an anticoagulant protease or digestive enzyme [1]. The failure to detect the protein in Western blots of L4s could be due to the low level of protease present or possibly to the expression of an antigenically distinct gene family member by L4s. Another explanation may be that the L4s (and XL3s) analyzed were obtained by in vitro cultivation in a defined medium and were never exposed to blood. Expression of the protease may be induced by exposure to host blood components, as is the case for certain digestive enzymes of hematophagous flies [18]. Therefore, it will be of interest to repeat expression studies on L4s that are actively feeding on blood to determine to what extent the protease is expressed under these conditions. The availability of cloned gene probes and specific antisera for the protease will facilitate these analyses by in situ methods, since blood-feeding L4s are difficult to isolate and can be obtained in only limited quantities. These gene probes and antisera also should allow determination of the sites of mRNA transcription and localization of the protease in adult worms, which will aid in understanding the protease's function.
Developmental expression of the H. contortus AC-1 (AC-2) protease resembles that of Schistosoma mansoni cysteine proteases such as the hemoglobinase, which are expressed by adult flukes and late stage schistosomula, but not by eggs, cercariae or newly transformed schistosomula larvae [19]. S. mansoni adults also express a cathepsin B-like cysteine protease [20]. Thus, expression of cysteine proteases in actively feeding stages appears to be a property shared by these two parasitic helminths.

Acknowledgements

We wish to thank Mr. Ervin Colestock and Mr. Andy Jackson for maintenance and collection of H. contortus worms, Dr. Robert Hageman for gifts of fibrinogen-degrading extracts, Mr. Thomas Gleason for synthesis and purification of the oligomers, Ms. Sheila Baron for help with the antisera preparations, Ms. Carla Worland for preparation of the manuscript and Ms. Dhyan Atkinson for help with figures. This research was supported by funds provided by Eli Lilly and Company.

References

1 Cox, G.N., Pratt, D., Hageman, R. and Boisvenue, R.J. (1990) Molecular cloning and primary sequence of a cysteine protease expressed by Haemonchus contortus adult worms. Mol. Biochem. Parasitol. 41, 25-34.
2 Carne, A. and Moore, C.H. (1978) The amino acid sequence of the tryptic peptides from actinidin, a proteolytic enzyme from the fruit of Actinidia chinensis. Biochem. J. 173, 73-83.
3 Takio, K., Towatari, T., Katunuma, N. and Titani, K. (1980) Primary structure of rat liver cathepsin B - a striking resemblance to papain. Biochem. Biophys. Res. Commun. 97, 340-346.
4 Ohno, S., Emori, Y., Imajoh, S., Kawasaki, H., Kisaragi, M. and Suzuki, K. (1984) Evolutionary origin of a calcium-dependent protease by fusion of genes for a thiol protease and a calcium-binding protein? Nature 312, 566-570.
5 Chan, S.J., Segundo, B.S., McCormick, M.B. and Steiner, D.F. (1986) Nucleotide and predicted amino acid sequences of cloned human and mouse preprocathepsin B cDNAs. Proc. Natl. Acad. Sci. USA 83, 7721-7725.
6 Wada, K., Takai, T. and Tanabe, T. (1987) Amino acid sequence of chicken liver cathepsin L. Eur. J. Biochem. 167, 13-18.
7 Shamansky, L.M., Pratt, D., Boisvenue, R.J. and Cox, G.N. (1989) Cuticle collagen genes of Haemonchus contortus and Caenorhabditis elegans are highly conserved. Mol. Biochem. Parasitol. 37, 73-86.
8 Benton, W.D. and Davis, R.W. (1977) Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196, 180-182.
9 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
10 Biggin, M.D., Gibson, T.J. and Hong, G.F. (1983) Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proc. Natl. Acad. Sci. USA 80, 3963-3965.
11 Guarente, L., Lauer, G., Roberts, T.M. and Ptashne, M. (1980) Improved methods for maximizing expression of a cloned gene: a bacterium that synthesizes rabbit β-globin. Cell 20, 543-553.
12 Casadaban, M.J., Martinez-Arias, A., Shapira, S.K. and Chou, J. (1983) β-galactosidase gene fusions for analyzing gene expression in Escherichia coli and yeast. Methods Enzymol. 100, 283-308.
13 Laemmli, U.K. and Favre, M. (1973) Maturation of the head of bacteriophage T4. I. DNA packaging events. J. Mol. Biol. 80, 575-599.
14 Cox, G.N., Shamansky, L.M. and Boisvenue, R.J. (1990) Haemonchus contortus: Evidence that the 3A3 collagen gene is a member of an evolutionarily conserved family of nematode cuticle collagens. Exp. Parasitol. 70, 175-185.
15 Emmons, S.W. (1988) The genome. In: The Nematode Caenorhabditis elegans (Wood, W.B., ed.) pp. 47-79, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
16 Mount, S. (1982) A catalogue of splice junction sequences. Nucleic Acids Res. 10, 459-472.
17 Aebi, M. and Weissman, C. (1987) Precision and orderliness in splicing. Trends Genet. 3, 102-107.
18 Schneider, F., Houseman, J.G. and Morrison, P.E. (1987) Activity cycles and the regulation of digestive proteases in the posterior midgut of Stomoxys calcitrans (L.) (Diptera: Muscidae). Insect Biochem. 17, 859-862.
19 Zerda, K.S., Dresden, M.H. and Chappell, C.L. (1988) Schistosoma mansoni: Expression and role of cysteine proteinases in developing schistosomula. Exp. Parasitol. 67, 238-246.
20 Klinkert, M.-Q., Felleisen, R., Link, G., Ruppel, A. and Beck, E. (1989) Primary structures of Sm31/32 diagnostic proteins of Schistosoma mansoni and their identification as proteases. Mol. Biochem. Parasitol. 33, 113-122.