Molecular cloning and bacterial expression of cDNA for rat calpain II 80 kDa subunit


Biochimica et Biophysica Acta, 1216 (1993) 81-93

© 1993 Elsevier Science Publishers B.V. All rights reserved 0167-4781/93/$06.00

BBAEXP 92533

Carl I. DeLuca, Peter L. Davies, John A. Samis and John S. Elce

Department of Biochemistry, Queen's University, Kingston (Canada)

(Received 4 February 1993)

Key words: Calpain; Cloning; Gene expression; Proteinase; (Rat)

The complete 3.2 kb cDNA for the rat calpain II large subunit has been constructed from library- and polymerase chain reaction-derived fragments, and sequenced. The cDNA encodes a protein of 700 amino acids having 93% sequence identity with human calpain II and 61% identity with human calpain I. The gene possesses 21 exons, of which exons 3-21 have been mapped over 33 kb of the rat genome. A new phagemid expression vector was created from pT7-7 by insertion of the f1 origin and mutation of an NdeI site to an NcoI site. Rat calpain II cDNA ligated into this vector expressed in Escherichia coli an 80 kDa protein identical in size to highly purified rat calpain II; this protein was specifically recognized on immunoblots by an affinity-purified anti-rat calpain II antibody. This is the second mammalian calpain II large subunit to be fully sequenced, and the first to be artificially expressed.

Introduction

The calpains (E.C. 3.4.22.17) are cytoplasmic Ca2+-dependent thiol proteinases, present in essentially all tissues of higher animals, but their physiological role remains difficult to establish [1-3]. Calpains I and II in mammals both consist of a large, 80 kDa, catalytic subunit, with about 65% amino acid sequence identity between the I and II isoforms, and a small, 30 kDa, regulatory subunit, which is identical in the two forms. In chicken, there appears to be only one form of calpain large subunit, whose sequence has features intermediate between the mammalian I and II isoforms [4]. cDNA sequences of the large calpain subunits of chicken, man (I and II), and rabbit (I and II; incomplete) have been reported [5-8]. The only calpain large subunit whose intron/exon structure has been established is that of the chicken [4]. Although the interspecies identities among the known calpains are fairly high, the widespread use of the rat as an experimental animal model made it desirable to isolate specific rat calpain sequences. Partial genomic clones and a few exons of rat calpain I and II large subunits were reported recently [9], and the next step was clearly the isolation of full-length cDNA clones. For rat calpain II, this proved difficult: no cDNA library clone could be found that extended further 5' than exon 16, and repeated screening of the available genomic libraries similarly failed to identify clones containing the 5'-terminal regions of the gene. It was therefore necessary to construct the complete cDNA by combining the products of several different polymerase chain reaction (PCR) amplifications and one cDNA library clone. We report here the assembly, genomic mapping, sequence, and expression in Escherichia coli of a full-length cDNA encoding the large subunit of rat calpain II.

Correspondence to: J.S. Elce, Department of Biochemistry, Queen's University, Kingston, Ontario, Canada, K7L 3N6. The nucleotide sequence data reported in this paper have been submitted to the EMBL/GenBank Data Libraries under the accession number L09120.

Materials and Methods

Rat genomic libraries were kindly donated by Dr. D.W. Back, Queen's University, Kingston, and by Dr. W.P. Fung-Leung, University of Toronto. Both had been constructed by partial MboI digestion of rat genomic DNA, size selection, and ligation into BamHI-digested λEMBL3. A rat lung cDNA library in λgt11 and a 5'-stretch cDNA library in λgt10 were purchased from Clontech (Palo Alto, CA, USA); a rat liver cDNA library in λgt11 was kindly donated by Dr. C. Mueller, Queen's University. The cDNA libraries had been made from oligo(dT)17-primed first-strand cDNA, with the use of EcoRI linkers. All the libraries were derived from Sprague-Dawley rats. Oligonucleotides for sequencing and PCR (Table I) were synthesized by the Oligonucleotide Synthesis Service, Queen's University. The techniques of molecular biology were as described [10]. Sub-cloning was carried out in pUC19, and the complete cDNA clone was assembled in pBluescript SK+ (Stratagene, La Jolla, CA, USA). Double-strand sequencing by the dideoxy termination method was performed with Sequenase (United States Biochemical Corp., Cleveland, OH, USA) in accordance with protocols supplied by the manufacturer. All the reported sequences were confirmed by sequencing in both directions. PCR amplifications were performed in most cases for 30 cycles of 94°C for 2 min, 42°C for 2 min, and 72°C for 2 min, with 2 U of Taq polymerase in a volume of 0.1 ml [11]. Total RNA was prepared from rat tissues by the acid guanidinium thiocyanate method [12]. Primer extension at the 5' terminus [13] was carried out from a 32P-labelled exon 1 antisense oligonucleotide (Table I) on 25 µg of total RNA from rat lung, rat kidney, and winter flounder liver, after heat denaturation, with or without further denaturation with methylmercuric hydroxide. The extension products were run on a sequencing gel with a sequencing ladder as size standard. Southern blotting of genomic DNA onto Zeta-Probe GT (Bio-Rad Laboratories, Hercules, CA, USA) was performed in 0.4 M NaOH, 1.5 M NaCl; hybridization (in 50% formamide, 7% SDS) and washing were carried out as described in the manufacturer's protocols.

TABLE I

List of oligonucleotide primers

The table shows the rat calpain II primers in sense (s) or antisense (a/s) orientation which are mentioned in the text.

Exon | Orientation | Site (a) | Primer sequence (5'-3')
1 | s (b) | 98 | TAY CTS AAY CAR GAY TAY GA
1 | a/s | 171 | CCA ACT CCT TGA AGC CCA GG
1 | s | 229 | TCG GGG CAT CGA ATG GAA GC
4 | a/s | 433 | TCC ACC CAT TCG CCA TAT TG
8 | s | 917 | TGC CCC AGC TGG AAC AC
9 | a/s | 971 | ACT GAA GGA CAT CCA GA
20 | a/s | 2021 | GTC CAG CTG CTT GAA TAT CT

(a) Position of the 3' terminal base of the primer in the coding region of rat calpain II cDNA.
(b) 64-fold degenerate primer in exon 1; the possible base choices at the positions of degeneracy were: Y = C or T, S = G or C, R = A or G.
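The degenerate exon 1 primer and the antisense primers in Table I can be checked mechanically. The sketch below expands the IUPAC codes of footnote (b) to confirm the 64-fold degeneracy and that the rat sequence is among the variants, and shows that an antisense primer is the reverse complement of the sense strand it anneals to. The sense-strand stretches used here were read from the Fig. 3 sequence; the function names are illustrative only.

```python
from itertools import product

# IUPAC symbols used in Table I (footnote b); plain bases stand for themselves.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "S": "GC", "R": "AG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(degenerate: str) -> list[str]:
    """Every concrete sequence encoded by a degenerate primer (spaces ignored)."""
    seq = degenerate.replace(" ", "")
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


def revcomp(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence (spaces ignored)."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


# Exon 1 sense primer designed from the peptide YLNQDY (Table I).
variants = expand("TAY CTS AAY CAR GAY TAY GA")
print(len(variants))  # 64: six two-fold positions, 2**6

# Coding positions 79-98 of the rat cDNA, as read from Fig. 3.
rat_79_98 = "TACCTCAACCAGGACTACGA"
print(rat_79_98 in variants)  # True

# Exon 4 a/s primer (3' end at 433) versus sense positions 433-452 (Fig. 3).
sense_433_452 = "CAATATGGCGAATGGGTGGA"
print(revcomp(sense_433_452) == "TCC ACC CAT TCG CCA TAT TG".replace(" ", ""))  # True
```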

TABLE II

Intron-exon junctions

The table shows the exon and intron lengths, and the sequences surrounding the exon-intron junctions, in the rat calpain II 80 kDa subunit gene. Exon sequence is shown in upper case and intron sequence in lower case.

Exon n | Posn (a) | Lgth (bp) | 5' splice donor site | Intron n, kbp (b) | 3' splice acceptor site (exon n+1) | Amino acid (c)
1 | 1 | 237 | CCCACG gtagga | N.D. | N.D. (d) GAGATC | Thr/Glu
2 | 238 | 70 | CCCTTG N.D. | N.D. | ctttgcattctcag GGGACT | G/ly
3 | 308 | 119 | TTCCAG gtaact | 0.53 | ccccgcgcctgcag TTCTGG | Gln/Phe
4 | 427 | 134 | TGCCAA gtaagt | 1.5 | N.D. GATCAA | Ly/s
5 | 561 | 169 | ATCGAT gtaagt | 0.5 | ccttgcgtttacag ATCACC | Asp/Ile
6 | 730 | 84 | GAGGAG gtacgg | 1.0 | ggatccttcag GTTGAA | Glu/Val
7 | 814 | 86 | TGACAA gtgaga | 1.1 | gctttgcttttcag CTGCCC | As/n
8 | 900 | 75 | ATTCTG gtaagt | 0.9 | ctctgctttgctag GATGTC | Tr/p
9 | 975 | 161 | ACCCAA gtaagt | 1.7 | accttgttccacag ATACCT | A/sn
10 | 1136 | 170 | TATGAG gtatgg | 1.6 | cctttatttttcag GTCCCA | Glu/Val
11 | 1306 | 12 | GAGGAG gtacgt | 0.86 | tgtttaaccatcag CTAACA | Glu/Leu
12 | 1318 | 212 | CTACCA gtaggt | 1.6 | ccttctctttccag AACTGT | Gl/n
13 | 1530 | 37 | GAAGAG gtattt | 1.2 | gtctcaccttctag ATTGAA | Glu/Ile
14 | 1567 | 66 | GGAGAG gtaggt | 1.3 | N.D. GATGCG | Glu/Asp
15 | 1633 | 58 | CCAAGC N.D. | 3.2 | attttatcttttag GCGAAG | A/rg
16 | 1691 | 65 | CTGGAT gtatcc | 0.9 | cttaaccgaggcag GAAGAT | Asp/Glu
17 | 1756 | 69 | TACCAA gtaggg | 0.37 | ttttcatgttccag AAAATT | Gln/Lys
18 | 1825 | 79 | AAAGCG N.D. | 1.5 | ccactctcctccag GTTTCA | G/ly
19 | 1904 | 117 | TATTCA gtacgt | 0.46 | gacctcctccacag AGATAT | L/ys
20 | 2021 | 59 | ATCTCG gtaagt | 1.0 | ttttcttcatgcag TGGCTG | Ser/Trp
21 | 2080 | 21 (e) | CTGTGA | - | - | Stop

(a) Position of the first coding bp of exon n in the complete coding sequence.
(b) Approximate length of intron n, in kbp.
(c) Amino acid(s) at the splice site, the break-point shown by /.
(d) N.D., not determined.
(e) There are 21 coding bp, but exon 21 is approx. 1 kbp in length to the polyadenylation site.
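Table II can be checked for internal consistency: each exon's first coding position plus its length should give the next exon's first position, and the codon phase at each junction should agree with where the '/' falls in the amino acid column. A minimal sketch with the values transcribed from Table II (exon 21 is taken to start at 2080 with 21 coding bp); the function name is illustrative.

```python
# (exon, first coding bp, length in bp, residue(s) at the 3' junction), from Table II.
EXONS = [
    (1, 1, 237, "Thr/Glu"), (2, 238, 70, "G/ly"), (3, 308, 119, "Gln/Phe"),
    (4, 427, 134, "Ly/s"), (5, 561, 169, "Asp/Ile"), (6, 730, 84, "Glu/Val"),
    (7, 814, 86, "As/n"), (8, 900, 75, "Tr/p"), (9, 975, 161, "A/sn"),
    (10, 1136, 170, "Glu/Val"), (11, 1306, 12, "Glu/Leu"), (12, 1318, 212, "Gl/n"),
    (13, 1530, 37, "Glu/Ile"), (14, 1567, 66, "Glu/Asp"), (15, 1633, 58, "A/rg"),
    (16, 1691, 65, "Asp/Glu"), (17, 1756, 69, "Gln/Lys"), (18, 1825, 79, "G/ly"),
    (19, 1904, 117, "L/ys"), (20, 2021, 59, "Ser/Trp"),
]
EXON21_START, EXON21_CODING_BP = 2080, 21


def check_junctions(exons, next_after_last=EXON21_START):
    """List inconsistencies between positions, lengths and split codons (empty = consistent)."""
    problems = []
    next_starts = [start for _, start, _, _ in exons[1:]] + [next_after_last]
    for (n, start, length, aa), next_start in zip(exons, next_starts):
        end = start + length - 1
        if end + 1 != next_start:
            problems.append(f"exon {n}: ends at {end}, next exon starts at {next_start}")
        # Junction phase: 0 = between codons, 1 or 2 = after that many bases of a codon.
        phase = end % 3
        # In the table the '/' follows 3, 1 or 2 letters of a three-letter residue name.
        if phase != aa.index("/") % 3:
            problems.append(f"exon {n}: phase {phase} disagrees with {aa!r}")
    return problems


print(check_junctions(EXONS))  # []
last_coding_bp = EXON21_START + EXON21_CODING_BP - 1
print(last_coding_bp, last_coding_bp // 3)  # 2100 700
```

The last line recovers the 700-codon open reading frame stated in the abstract.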

SDS gels and immunoblots were run by standard protocols as previously described [9]. A rabbit antibody to rat calpain II was affinity-purified on a column of highly purified rat calpain II coupled to thiopropyl-Sepharose. Calpain (dialyzed free of 2-mercaptoethanol under N2) was applied to a 10 ml column of thiopropyl-Sepharose, and the column was washed with 8 M urea and then with 0.6 M NaCl. The calpain-Sepharose matrix was incubated with crude antiserum, returned to a column, and washed again with 0.6 M NaCl, 5 mM EDTA, 10 mM Tris-HCl (pH 7.5). Specifically bound antibody was eluted with 0.2 M glycine, 0.6 M NaCl, pH 2.6, into tubes containing 1 M Tris base [14].

Results

Fig. 1. Genomic map of rat calpain II large subunit. The diagram shows the section of rat genomic DNA which includes exons 3 to 21 of rat calpain II large subunit. The exons are indicated as black rectangles, not strictly to scale, and every third exon is numbered. Vertical lines above and below the genome line represent BamHI and EcoRI sites, respectively; approximate fragment sizes (kbp) are indicated. A putative EcoRI site at the 3' end is indicated as a dotted line. The relative positions of the principal genomic clones (2-1, 10-1, 31-1) are shown.

Genomic clones

Isolation of the λEMBL3 genomic clone 10-1, using as probe the rabbit calpain II plasmid kindly supplied by Dr. K. Suzuki [8], has been described previously [9]. Sub-cloning and sequencing of fragments of this genomic clone had identified sequences representing exons 5, 8, 10, 11, and part of 18 of the rat calpain II large subunit. Subsequently, a genomic clone 2-1, which had been identified in the initial screening, was analysed further. It was shown to overlap 10-1 but to extend a further 5 kbp downstream; the additional sequence was found to contain exons 19-21, the last of which includes the whole 3' untranslated region of 1 kbp up to a typical polyadenylation site. Several steps of genomic walking in the 5' direction, using two λEMBL3 rat genomic libraries, finally yielded clone 31-1. This clone contained exons 3 and 4 and, although it extended a further 5 kbp upstream of exon 3, did not contain exon 2. No genomic clone containing exons 1 or 2 could be identified in these libraries, even when they were screened with a cloned genomic DNA fragment containing portions of exon 1 and intron 1 which later became available. Exons were located in sub-clones of rat genomic DNA by restriction mapping, by probing Southern blots with various PCR-derived cDNA fragments, and by extensive sequencing. All exons except 1 and 2 have been mapped on the gene, and all exons except 15 and part of 18 have been sequenced in genomic DNA together with their flanking intron sequences. Most of the internal non-coding sequence is not reported here, but is available from the authors on request. The lengths of the exons and introns, and the sequences at the splice sites, are shown in Table II; the splice sequences conformed to the consensus sequences for these sites [15]. The genomic map of rat calpain II, so far as it is known, is shown in Fig. 1, which also illustrates the principal genomic clones.
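The simplest part of the consensus check can be automated: introns should begin with GT and end with AG, with a pyrimidine-rich stretch before the AG. The sketch below applies only these crude tests to the intron sequences copied from Table II (N.D. entries omitted); it is an illustration, not the comparison actually made against reference [15], and the helper names are hypothetical.

```python
# Intron ends next to each exon, copied from Table II (N.D. entries omitted).
DONORS = ["gtagga", "gtaact", "gtaagt", "gtaagt", "gtacgg", "gtgaga", "gtaagt",
          "gtaagt", "gtatgg", "gtacgt", "gtaggt", "gtattt", "gtaggt", "gtatcc",
          "gtaggg", "gtacgt", "gtaagt"]
ACCEPTORS = ["ctttgcattctcag", "ccccgcgcctgcag", "ccttgcgtttacag", "ggatccttcag",
             "gctttgcttttcag", "ctctgctttgctag", "accttgttccacag", "cctttatttttcag",
             "tgtttaaccatcag", "ccttctctttccag", "gtctcaccttctag", "attttatcttttag",
             "cttaaccgaggcag", "ttttcatgttccag", "ccactctcctccag", "gacctcctccacag",
             "ttttcttcatgcag"]


def follows_gt_ag(donors, acceptors):
    """True if every intron 5' end starts with GT and every 3' end finishes with AG."""
    return (all(d.lower().startswith("gt") for d in donors)
            and all(a.lower().endswith("ag") for a in acceptors))


def pyrimidine_fraction(seq):
    """Fraction of C/T: a crude measure of the polypyrimidine tract before the AG."""
    seq = seq.lower()
    return sum(base in "ct" for base in seq) / len(seq)


print(follows_gt_ag(DONORS, ACCEPTORS))  # True
for acceptor in ACCEPTORS:
    # Score only the bases upstream of the terminal 'ag'.
    print(acceptor, round(pyrimidine_fraction(acceptor[:-2]), 2))
```

A fuller check would score each site against a position weight matrix rather than testing two dinucleotides.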

Southern blot

The Southern blot (Fig. 2) showed multiple BamHI and EcoRI fragments of genomic DNA which hybridized to full-length rat calpain cDNA, and most of them could be identified in the genomic clones. There are three BamHI fragments (1.95, 3.0 and 3.9 kbp) in the Southern blot which cannot be identified in the available genomic clones. These can be explained if at least one BamHI site is assumed to be present within intron 1, in addition to the known BamHI site within exon 1 (see Fig. 3). Similarly, there are two EcoRI fragments (3.9 kbp, in doublet, and 2.6 kbp) which cannot be identified. It is tentatively assumed that the 3.9 kbp fragment contains exons 1 and 2, and that the strongly hybridizing 2.6 kbp EcoRI fragment contains the whole of exon 21, which is 1 kbp in length. If the latter assumption is correct, an EcoRI site must be located immediately downstream of the 3' end of clone 2-1, and this is indicated in Fig. 1.

Fig. 2. Southern blot of rat genomic DNA. Rat liver DNA was digested with BamHI or EcoRI, and samples of 15 µg were run on a 0.7% agarose gel, blotted, and probed with nick-translated pBSratcalpainII. Hybridization was performed in 50% formamide, 7% SDS, 0.25 M NaCl, 0.125 M sodium phosphate (pH 7.2), with heat-denatured calf thymus DNA, at 42°C; the final wash was performed in 0.1× SSC, 0.1% SDS, at 68°C. The arrows indicate the estimated fragment sizes in kbp.
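Interpreting the blot rests on locating recognition sites and computing the fragments between them. A minimal sketch for a linear sequence; the toy sequence is hypothetical, and the two EcoRI positions in exons 8 and 18 were read from the Fig. 3 sequence. Only top-strand cut positions are modelled, and the helper names are illustrative.

```python
import re

# Recognition sequence and number of bases before the top-strand cut.
# All five sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "NcoI": ("CCATGG", 1),   # C^CATGG
    "NheI": ("GCTAGC", 1),   # G^CTAGC
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
}


def site_positions(seq, enzyme):
    """1-based start of every recognition site (the lookahead also finds overlaps)."""
    motif, _ = ENZYMES[enzyme]
    return [m.start() + 1 for m in re.finditer(f"(?={motif})", seq.upper())]


def cut_positions(seq, *enzymes):
    """Sorted 1-based positions after which the top strand is cut."""
    cuts = []
    for name in enzymes:
        _, offset = ENZYMES[name]
        cuts += [start + offset - 1 for start in site_positions(seq, name)]
    return sorted(cuts)


def fragment_lengths(seq_length, cuts):
    """Top-strand fragment lengths of a linear molecule cut after the given positions."""
    bounds = [0] + list(cuts) + [seq_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AAGAATTCAAAAGGATCCAA"  # hypothetical 20 bp test sequence
cuts = cut_positions(toy, "EcoRI", "BamHI")
print(cuts, fragment_lengths(len(toy), cuts))  # [3, 13] [3, 10, 7]

# EcoRI sites starting at coding positions 967 (exon 8) and 1866 (exon 18), per Fig. 3;
# the middle fragment is the ~0.9 kbp piece cloned in step (1) of the cDNA assembly.
print(fragment_lengths(2100, [967, 1866])[1])  # 899
```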

cDNA fragments The complete c D N A sequence derived from the final assembled clone is shown in Fig. 3, together with the derived amino acid sequence. The Figure also shows the positions of the exons, some restriction sites, and some other sequence features which are discussed later. The nucleotide sequence data reported in this paper have been submitted to GenBank and assigned the accession number L09120. A comparison of the deduced amino acid sequence of rat calpain II with other published calpain sequences is shown in Fig. 4. The steps involved in assembling various fragments of the c D N A into a full-length c D N A were as follows (Fig. 5). (1) Exons 8-18. An exon 8 sense primer and an exon 20 antisense primer (Table I) derived from genomic sequence were used for PCR amplification of first strand c D N A primed with oligo(dT)17 from rat kidney RNA. The expected fragment of 1.2 kbp was obtained, as predicted from the sequence of human calpain II [7], and was conveniently cloned as a 0.9 kbp fragment, using naturally-occurring EcoRI sites at the 3' end of exon 8 and in the middle of exon 18. This was shown to be about 90% identical to human calpain II, and contained the sequences of exons 10, 11, and 18 which had previously been identified in genomic DNA, showing that the PCR product was the desired rat calpain II c D N A fragment. (2) Exons 18-21. The exons 8-18 fragment was used to screen a rat kidney Agtl0 5' stretch c D N A library, in the hope of identifying full-length c D N A clones. Of

several clones which were purified, that which extended most 5' was found to contain about 1.5 kbp of rat calpain II c D N A extending downstream from exon 16 to the 3' end of exon 21. This was subcloned as two EcoRI fragments (exons 16-18 and exons 18-21) and sequenced, confirming the sequences of these exons derived from genomic clones. For the final assembly, it was necessary to remove the 3' terminal EcoRI site which had been introduced as a linker in library construction. To achieve this, the exons 18-21 fragment was excised with EcoRI/NheI and tigated into pUC19 digested with EcoRI/XbaI (the NheI and XbaI sites are compatible and are lost on ligation). The fragment was transferred by means of EcoRI/HindIII digestion into pBluescript SK + . This resulted in the loss of approx. 80 bp, including the poly-adenylation signal, at the 3' terminus of the cDNA. (3) Exons 3-8. Rat calpain II was digested separately with endoproteinases Glu-C and Lys-C, and with CNBr, and HPLC-purified fragments were subjected to amino acid sequencing. The N-terminal sequence was blocked, but a sequence Y L N Q D Y was identified which is identical to residues 27-32 of human calpain II [7] (Fig. 4), and was assumed to be encoded within exon 1. A 64-fold degenerate 20 nt sense primer (Table I) was synthesized on the basis of this rat amino acid sequence, and used with an exon 9 antisense primer to amplify an exons 1-9 fragment from oligo(dT)lv-primed first strand rat kidney cDNA. The principal PCR product was of the expected length but proved difficult to clone; however, it was eluted from a gel, 32p-labelled by means of further PCR, and used on Southern blots to identify fragments of the genomic clone 31-1 which contained exons 3 and 4. When these had been sequenced, revealing an XbaI site, an XbaI/EcoRI c D N A fragment covering exons 3 - 8 was cloned from the exons 1-9 PCR product and sequenced. (4) Exons 1-4. 
A new preparation of first strand c D N A was primed from rat kidney R N A with an antisense exon 4 primer, and from this an exons 1-4 fragment was amplified by PCR with the degenerate exon 1 sense primer and the exon 4 antisense primer, yielding a strong, single band of DNA. This fragment was poly(G)-tailed and ligated into poly(C)-tailed,

Fig. 3. cDNA sequence of rat calpain II large subunit. The cDNA sequence and deduced amino acid sequence are shown. The 5' untranslated region shown was derived from genomic DNA, and probable transcription start sites at -68 and -78, as determined by primer extension analysis, are indicated by arrows. The sequences GCGCCCCG in the 5' non-codingregion, or close matches, are underlined. The first nueleotide of each exon is indicated by # above it, followed by the exon number (see also Table II). Some restriction enzymesites are also shown above the cDNA sequence. The presumed active site Cys-105 and His-262 residues are indicated by *, and the four presumed Ca2+-binding coil regions (each of 12 amino acid residues, in exons 17-20) are shaded. The extreme 3' end of the 3' untranslated region is also shown as it was sequenced on genomic DNA, and the location of the 3' terminal EcoRI site in the Agtl0 cDNA clone is indicated as (EcoRI). A 27 nt tandem repeat at 2608-2671 is underlined (<--->). Polyadenylation signal sequences (AATAAA) are shown double underlined, and instability sequences (ATTTA) are shown single underlined. These sequences are available from the authors and under GenBank accession number L09120.

85 AluI -100 AGCTCCCGCAGTCCGCTGCAGCGCCCCGGGCT

I

I

TGGc•GcG•ccCAAccGAGTGcTG•GcccCG•cTcTccGCGACcccTCTcTCTGTGcTGTGCGccATccCGATCGCTACC NcoI

15 30 45 60 GCG GGC ATC G C G A T G A A A CTG GCC A A A GAC CGC GAG GCG GCC GAG GGG CTG GGG TCT Met Ala Gly Ile A l a M e t Lys Leu Ala Lys Asp Arg Glu Ala Ala Glu Gly Leu Gly Ser

20

75 CAC GAG A G A GCC A T C A A G His Glu Arg A l a Ile Lys

90 105 120 TAC CTC AAC CAG GAC TAC GAG A C A CTG CGG A A C GAG TGC C T G Tyr Leu ASh Gln A s p Tyr Glu Thr Leu Arg Asn Glu Cys Leu

40

135 8amHI 150 165 180 GAG GCC G G G GCG C T C T T C C A G G A T C C T TCC TTC CCC GCC CTG CCG TCG TCC CTG GGC T T C Glu Ala Gly Ala Leu Phe Gin A s p Pro Ser Phe Pro Ala Leu Pro Ser Ser Leu Gly Phe

60

ATG

195 210 225 #Exon AAG GAG TTG GGG CCC T A C T C C AGC A A A A C T CGG GGC ATC GAA TGG AAG CGG CCC ACG G A G Lys Glu Leu Gly Pro T y r Ser Ser Lys Thr Arg Gly Ile Glu Trp Lys Arg Pro Thr Glu

2 80

BglII 255 270 285 300 TGC G C A GAC CCC C A G TTC A T T ATT G G A GGT GCT ACC CGC ACA GAC ATC TGC CAA G G A Ile Cys Ala A s p Pro G l n Phe Ile Ile Gly Gly Ala Thr Arg Thr A s p Ile Cys Gln Gly

100

#Exon 3 330 345 360 GCC CTT GGG G A C TGC T G G CTT CTG GCT G C C ATC GCC TCC CTC ACC TTG AAT G A A GAG ATC Ala Leu Gly Asp Cys T r p Leu Leu Ala A l a Ile Ala Ser Leu Thr Leu Asn Glu Glu Ile

120

XhoI 375 XbaI 390 405 420 CTG G C T C G A G T T GTG C C T C T A G A C CAG A G C TTC CAG G A A AAC TAT G C A GGC A T C TTC CAT Leu Ala Arg Val Val Pro Leu A s p Gln Set Phe Gln Glu Ash Tyr A1a Gly Ile Phe His

140

#Exon 4 450 465 480 TTC CAG TTC TGG C A A T A T GGC G A A TGG GTG GAG GTG GTG GTG GAC GAC AGA CTG CCC A C T Phe Gln Phe Trp G i n T y r Gly Glu Trp Val Glu Val Val Val Asp A s p Arg Leu Pro Thr

160

495 PvuII 525 540 AAG GAC GGG GAG CTG C T C TTT G T T CAT T C A G C T GAG GGG AGC GAG TTC TGG AGT GCC CTT Lys A s p Gly Glu Leu Leu Phe Val His Ser Ala Glu Gly Ser G1u Phe Trp Ser Ala Leu

180

555 #Exon 5 570 585 600 CTG GAG AAG GCC T A T G C C A A G A T C A A T GGG TGC TAT GAA G C A CTC TCA GGG GGT GCC A C C Leu Glu Lys Ala T y r A l a Lys Ile ASh Gly Cys Tyr Glu Ala Leu Ser G1y Gly Ala Thr

200

615 630 645 660 ACT GAG GGC TTT G A A G A C TTC A C A G G A G G C ATT GCC GAG TGG TAT GAG CTG AGG AAG CCT Thr Glu Gly Phe Glu A s p Phe Thr Gly G1y Ile Ala Glu Trp Tyr Glu Leu Arg Lys Pro

220

675 690 705 720 CCT CCC AAT TTG T T C A A G A T C A T C CAG AAG GCT TTG GAG A A A GGT TCT CTG CTT GGC TGC Pro Pro ASh Leu Phe Lys Ile Ile Gln Lys Ala Leu Glu Lys Gly Ser Leu Leu Gly Cys

240

ClaI #Exon 6 750 765 780 TCT A T C G A T ATC A C C A G C G C T GCG GAT TCT GAG GCC GTT ACG TAC CAG AAG TTG GTG A A A Ser Ile Asp Ile Thr Ser Ala Ala A s p Ser Glu Ala Val Thr Tyr Gln Lys Leu Val Lys

260

795 810 #Exon 7 825 PstI 840 G G A CAT G C G TAC TCC G T C A C T G G A GCC GAG GAG GTT G A A AGC T C A G G A A G C C T G C A G A A A Gly His A l a Tyr Ser Val Thr Gly Ala Glu Glu Val Glu Ser Ser G l y Ser Leu G i n Lys

280

ATC

855 TTG A T A CGT A T C A G G A A C

870 885 CCC TGG G G A CAA GTG GAG TGG ACC GGG AAG

#Exn TGG A A T G A C A A C

8

86 Leu

Ile A r g

Ile A r g A s n

Pro Trp Gly Gln Val Glu Trp Thr G l y Lys

T r p A s n A s p Asn

300

Pvull 915 930 945 960 TGC CCC A G C TGG AAC A C G G T T GAC C C A G A A G T A AGG GCA AAT CTA A C A G A A CGG CAG G A G Cys Pro Ser Trp Asn Thr Val A s p Pro Glu Val Arg Ala Asn Leu Thr G l u A r g G l n Glu

320

EcoRI #Exon 9 990 1005 XbaI 1020 GAC G G A G A A T T C TGG A T G TCC T T C ~ A G T GAC TTC CTG AGG CAC TAC TCT C G T C T A G A A A T C Asp G l y Glu Phe Trp Met Ser Phe Ser A s p Phe Leu Arg His Tyr Ser A r g Leu Glu Ile

340

1035 1050 1065 1080 TGC AAC CTG ACC CCG G A C A C C CTC A C C TGT GAC TCC TAC AAG A A G TGG A A A CTC A C C AAG Cys A s n Leu Thr Pro A s p Thr Leu Thr Cys A s p Ser Tyr Lys Lys T r p Lys Leu Thr Lys

360

1095 iii0 P s t I PstI # E x o n i0 ATG GAT GGG AAC TGG A G G C G A G G C TCC A C T GCA GGG GGC T G C A G G AAT TAC C C A A A T A C C Met A s p Gly Asn Trp A r g A r g Gly Set Thr Ala Gly Gly Cys A r g A s n Tyr Pro Asn Thr 380 1155 1170 1185 1200 TTC TGG A T G AAC CCT CAG TAC CTA A T C AAG CTG GAG GAA G A A G A T G A A GAT G A T GAG GAT Phe Trp Met Asn Pro G l n Tyr Leu Ile Lys Leu Glu Glu Glu A s p Glu A s p A s p Glu A s p

400

1215 1230 1245 1260 GGG GAG AGG GGC TGT A C C TTC CTG GTG GGC CTC ATC CAG A A G CAT CGG CGG CGG CAG AGG Gly Glu Arg Gly Cys Thr Phe Leu Val Gly Leu Ile Gln Lys His A r g A r g A r g Gln Arg

420

1275 AAG ATG GGT GAG GAC A T G Lys Met Gly Glu A s p Met

1290 1305 # E x o n ii #Exon 12 CAC A C C A T T GGC TTC GGC ATC TAT GAG G T C CCA G A G GAG CTA His Thr Ile Gly Phe Gly Ile Tyr Glu Val Pro G l u Glu Leu 440

1335 1350 1365 1380 A C A GGG CAG ACC A A C A T C CAC CTC AGC AAA AAC TTT TTC CTG A C A A C C CGA G C C A G G GAG Thr Gly Gln Thr Asn Ile His Leu Set Lys Asn Phe Phe Leu Thr Thr A r g A l a Arg Glu

460

1395 1410 1425 XmaI CGG TCA GAT ACC TTC A T C A A C CTC CGG GAG GTC C T C AAC CGC TTC A A G CTG C C C C C G G G A Arg Ser A s p Thr Phe Ile A s n Leu A r g Glu Val Leu Asn Arg Phe Lys Leu Pro Pro Gly

480

1455 GAA TAT GTC CTT GTT Glu Tyr Val Leu Val

1470 1485 1500 CCT T C C ACC TTC GAG CCC CAC AAG AAT G G C G A T TTC TGC A T C CGA Pro Ser Thr Phe Glu Pro His Lys Asn Gly A s p Phe Cys Ile Arg

500

1515 #Exon 13 1545 1560 TTC TCA GAG A A G AAG GCT GAC TAC CAA ACT GTC GAT GAT G A A A T C G A G G C C AAC ATT Phe Ser Glu Lys Lys A l a A s p Tyr Gln Thr Val Asp Asp Glu Ile Glu A l a Asn Ile

520

GTC Val

# E x o n 14 1590 1605 G A A GAG A T T G A A G C C A A T G A G GAG GAC ATT GGA GAT GGA TTC A G A AGG Glu Glu Ile Glu Ala A s n Glu Glu A s p Ile Gly Asp Gly Phe Arg Arg

Pvull CTG TTT GCT C A G Leu Phe A l a Gln 540

# E x o n 15 BglII 1650 PstI 1665 1680 GCT G G A GAG GAT G C G G A G A T C TCT GCC TTT GAG C T G C A G A C C ATC TTG A G A A G A GTT Leu Ala Gly Glu A s p Ala Glu Ile Ser Ala Phe Glu Leu Gln Thr Ile Leu A r g Arg Val

560

#Exon 16 CTA G C C AAG CGC G A A G A C ATC AAG Leu A l a Lys Arg Glu A s p Ile Lys

1710 1725 1740 TCA GAT GGC TTC AGC A T C GAG A C C TGT AAG ATC ATG Ser A s p Gly Phe Ser Ile Glu Thr Cys Lys Ile Met

580

1755 # E x o n 17 1770 1785 1800 GTG GAC A T G CTG GAT G A A GAT GGG A G C G G C AAG CTG GGG CTG AAG .GAG T T C T A C A T C C T C Val Asp Met Leu ~ ~ ~ ~ ~ i ~ i ~ ~ ~iii~iii~!ili!i~i~ilili~ Phe Tyr Ile Leu

600

CTG

1815 # E x o n 18 TGG ACG AAG ATT CAG A A A TAC C A A A A A A T T TAC

1845 1860 CGA GAG ATC GAC GTG GAC A G G TCG G G A

87 T r p Thr Lys Ile Gln Lys Tyr Gln Lys

Ile Tyr Arg Glu Ile ~ i . ; . i ~ . i . i . ~ . . i . ~ . . . . . . ~ . . . , ~

ECORI 1875 1890 #Exon 19 1920 ACC ATG AAT TCC TAT GAG ATG CGG AAG GCT TTG GAA GAA GCA GGT TTC AAG CTG CCC TGT ~ !i;~;i~ i~ ~ili::~i~ Met Arg Lys Ala Leu Glu Glu Ala Gly Phe Lys Leu Pro Cys +:.:.:+:.:.:.:.:.:.:.:...:.:.:.:.:.:.:.:,:.:.:....:.:.:.:.:.:.:.:.:.:.:.:.:.:-:.:,:.:.:.:+;+:......:.:.:.:.:.;.: :.:,:: :.: : : : :

1935 c~

cTr car

c~

GTC ATC GTT GCC

1950 CGG

1965

rTr GcA GAT GAC G~

620

640

1980

CTC ATe ATe GAC TTT aAe

Gln Leu His Gln Val Ile Val Ala Arg P~ii::iii~i!~Z:i::~i~iii::~::::::~ii::~~ : : i ~

~

~

~

1995 2010 #Exon 20 PvuII 2040 AAC TTT GTG CGG TGT TTG GTT CGG CTG GAA ATA CTA TTC AAG ATA TTC AAG C A G C T G GAC ~ Phe Val Arg Cys Leu Val Arg Leu Glu Ile Leu Phe Lys Ile Phe Lys Gln Leu ~

660

680

2055 2070 #Exon 21 2100 CCT GAG AAC ACT GGA ACG ATA CAG CTC GAC CTT ATC TCG TGG CTG AGT TTT TCA GTA CTC

TGAAGTTACATG•CTGAAGATTTCTGGTGGAAGAAAAcAG•cAATG•cTAcA•TTAAATACTTTGTAT•TGGA••TTGAA 2200 •GTA•GGGAA•ATTTA•TT•AT•GGGTGATTGTAAcTGAAGcTcTAAccGTcAGAGGTGGATGAAcTTT•ACATATcAGA

2300 GTAATGGATcTGcATA••ATTAACcAAATGcAAATGAATTGcATAA•cCTcGGTAAGGTcAGAGAGAGcTTTATGCCT•T 2400 A•ATAGTATATccTTT•TA•TTcTAcAcATcGTCTCTT•ATAGcAATATTAAA•cAGGAAGGACTGTTGAGAGGTTTAA•

2500 TAATTGGG•AAA•AGCAATTGAT•T•AGAGG••AcAATG•CTTGATGGGAATATTTAAAGAAATTTAAAAATTTAAGATG •AGTGGTGG•AT•TT••ACAG•AGGccAGAGAGAAA•AGAGAGAGAGAGAGAGAGAcTTGGAAATACAATAATAAAGTAA 2600 TCTGTGTGGAG•GT•TAAAGAATCCAGTCTGATGGAGG•ATCTAAAGAAT••AGTCTGATGGAGG•ATCTAAAGAATCCA <................... >< 2700

GGAGcCAGGAGTATcTGA•ACAAGGc•AAccTGGGCTAc•TAGCAAGA••TGAT•T•AAAAAATAAATGTCAATTAAAAA > 2800 AAAAGAAGGCATGTTTc•AT•ATTTGGTTcTAACAAAAGATGGTTTTTATAcACCTTTGTGAACAG•AAGcCCTCTTAGA 2900

cT•AAGTccTTGACcATATTTGGccTTAGAcATTGAAGTAGTGAGTAGTGAGATcATTTGA•TGAccAACTGAGccAGGT cTG••AT•TGAAAGAAAAT•GTGATAcAAc•c•AGGGATGGTGccTAcGGcTcGGGAGAACAGGG•TCAGAAGAT••TTC 3000 NheI TTAAAACAAccAAGcTGGcTTcGTGGTccGAGGAAAATGCTAG~AACcATAcAGAcATAcTGc~TTAT~cACTTcCAAAc

(EcoRI) 3100 GTACTGTTAATAAAGAGTTGCAATGCAGTCCGCCGTGCCTTGTTTTTCCTCGTAGAGGTCTG

PstI-digested, pUC19 [16]. After transformation of E. coli, the desired exons 1-4 clone was isolated and sequenced. (5) 5' untranslated region and the 5' end of exon 1. The 5' RACE procedure [17] was unsuccessful, as also

were all attempts to find a genomic clone containing this region. The method of inverted PCR [18] was therefore used to isolate a genomic fragment containing the desired 5' portions of the cDNA. Genomic DNA was digested with AluI (and also separately with

88 Amino

a) b) c) d) e) f) g)

Rat Human Rabbit Chicken Human Rabbit Rat

calpain calpain calpain calpain calpain calpain calpain

Acid

II II II I I I

Sequence

Comparison

of

Calpain

Large

Subunits i0 MAG I AMKLAK ..... a ....

(incomplete,

residues

46-end) MMPFG---AR-QR MSEEIITPVYCT-VSAQVQ-

(incomplete, (incomplete,

residues residues

400-end) 440-563)

30 • • 60 DREAAEGLGSHERAIKYLNQDYETLRNECLEAGALFQDPSFPALPSSLGFKELGPYSSKTRGIEWKRPTE A ......... T ......... I--A Q.

a) b) c) d) e)

--LR---V-E-NN-V Q-AREL---R--N

a) b) c) d) e)

90 • 120 ICAD,QFIIGGATRTDICQGALGDCWLLAAIASLTLNEEILARVVPLDQSFQENYAGIFHFQFWQYGEWV N LSPE .... V .............................. V ................... R LVD ..... V ....................... G ...... L-H .... HG ..... D ........ LLSN .... VD DTL-H .... HG .... NG ........

a)

EVVVDDRLPTKDGELLFVHSAEGSEFWSALLEI~YAKINGCYEALSGGATTEGFEDFTGGIAEWYELRKP

........ ,-,,--,-,-,--,--,---,-,, ..... G .... ,--,,--,,-,--,-,,--,,-,---,-,---,



180

'MR' .... V ..... S.... ,--K .....



150

I--F .... L--F ....

210

b)

c) d) e)

D ..... D .....

a) b) c) d) e)

PPNLFKI

L L--I---K-V

CT ............. ....... N .............

L ..... V--S

K-A S .... S .-V--M-D-KRA S-S ......... VT ....... A

• 240 270 IQKALEKGSLLGCS I D I TSAADSEAVTYQKLVKGHAYSVTGAEEVES SGSLQKL IRI RNPWGQ Q I-F ................... N .............. E --D--R ...... Q I-L A .............. E -R-MGH--R .... R F-M .... FK ........... AFKD-NYR-QQEQ -SD-YQ--L .... R .......... S-VL-M-I-FK ............. KQ-NYR-QVVS---M ..... E

a) b) c) d) e)

300 VEWTGKWNDNCPSWNTVDPEVRANLTERQEDGEFWMS ..... R .......... I---E-ER--R-H ..... R ...... N ER-A--H A-S-GSSE-DNI--SD-EE-QLKM A-S-SS SE--N---YE-DQ-RVKM

a) b) c) d) e) f)

• . 390 MDGNWRRGSTAGGCRNYPNTFWMNPQYLIKLEEED**EDDEDGERGCTFLVGLIQKHRRRQRKMGEDMHT .............. **--E .... S ............... V"*--Q .... S FE-T ............ ,-,---,---,,---,---'',-,,-,-,,-,---,-, YE-T .............. A---V---FK-R-D-T-DPD-YG-R-S--S-VLA-M --ES--S-VLA-M

PstI and RsaI) and self-ligated into circles; sense and antisense primers directed away from each other within the known 3' terminal portion of exon 1 were used to amplify these circular D N A preparations. Several bands of DNA were obtained, but only one of approx. 500 bp

330 360 FSDFLRHYSRLE I CNLTPDTLTCDSYKKWKLTK ...................... S-T ........ S-T ........ R--M-EF ........... A--K-ELSR-HTQV R--M-EFT .......... A-KSRTIR--NT-L 420

...... ...... ......

,-RV-G .... E-RF-R--EE-RF-R--E-

derived from the AluI digestion hybridized with the cloned exons 1-4 plasmid. This fragment, eluted from the gel and re-amplified, was cloned by GC-tailing. It was found to contain the 3' end of exon 1, followed by 190 bp of intron 1, an AluI site, 110 bp of 5' untranslated region leading to an NcoI site in a Kozak sequence [19] at the assumed initiation codon, and the first 200 bp of coding sequence.

[Fig. 4 alignment panel (rows a-g, rat calpain II numbering with ticks from 480 to 690 in this portion); the aligned rows are not recoverable from the extracted text.]

Fig. 4. Comparison of the rat calpain II deduced amino acid sequence with other published calpain sequences. The numbering given is for rat calpain II; the asterisks indicate a gap of two amino acids after residue 395, and of one amino acid after residue 520, introduced to maximize the identities between the various calpains. One additional gap has been introduced at position 445 in rat calpain I. Dashes indicate amino acid residues which are identical to those of rat calpain II. Peptides at 20-34 and 62-69 which were purified from rat calpain II and sequenced are underlined. The species and calpains represented are: (a) rat calpain II; (b) human calpain II; (c) rabbit calpain II; (d) chicken calpain; (e) human calpain I; (f) rabbit calpain I; (g) rat calpain I.
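Because Fig. 4 marks identical residues with dashes, identity percentages such as those quoted for rat versus human calpain II are, in principle, counts over rows of this kind. The following is a minimal Python sketch of that count, not the authors' procedure. It assumes that a dash means "identical to rat calpain II", a letter means a substitution, an asterisk means an alignment gap, and that gap columns are left out of the denominator; the function name and the toy rows are hypothetical.

```python
def percent_identity_dash(reference: str, row: str) -> float:
    """Percent identity of one alignment row against the reference row.

    Assumed notation (modelled on Fig. 4): '-' in `row` means the residue is
    identical to the reference, a letter means a substitution, and '*' marks a
    gap. Columns with a gap in either row are excluded from the denominator.
    """
    if len(reference) != len(row):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for ref_ch, ch in zip(reference, row):
        if ref_ch == "*" or ch == "*":
            continue  # gap column: counted neither as identical nor different
        compared += 1
        if ch == "-":
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Hypothetical toy rows, not taken from Fig. 4.
print(percent_identity_dash("MKLVE", "-A--Q"))  # 3 of 5 columns identical
print(percent_identity_dash("MK*LV", "-AQ--"))  # gap column skipped: 3 of 4
```

Excluding gap columns is only one convention; dividing by the length of the shorter sequence instead gives slightly different values whenever gaps are present.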

Primer extension analysis Primer extension from the antisense exon 1 primer on RNA not treated with methylmercuric hydroxide gave multiple termination products up to about 250 nt in length; after methylmercuric hydroxide denaturation, a single termination product of about 260 nt was seen in the lung sample, and of about 270 nt in the kidney sample (result not shown). These values correspond to transcription start sites approx. 70 and 80 nt upstream of the initiation codon. No extension product was formed from winter flounder RNA.
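The conversion from product length to start site is simple subtraction: the extension product runs from the primer's 5' end back to the 5' end of the mRNA, so removing the part that lies downstream of the initiation codon leaves the length of the 5' leader. The paper does not state where the exon 1 primer sits; the offset of about 190 nt used below is only what the paper's own numbers imply (260 - 70 and 270 - 80), and off-by-one numbering conventions are ignored. A minimal Python sketch with hypothetical function names:

```python
def implied_primer_offset(product_nt: int, start_upstream_nt: int) -> int:
    """Distance (nt) from the initiation codon to the primer's 5' end implied
    by one product length and its corresponding upstream start site."""
    return product_nt - start_upstream_nt


def start_upstream_of_atg(product_nt: int, primer_offset_nt: int) -> int:
    """Approximate distance (nt) of the transcription start upstream of the ATG.

    The product spans the 5' leader plus the stretch between the ATG and the
    primer's 5' end, so subtracting that stretch leaves the leader length.
    """
    return product_nt - primer_offset_nt


offset = implied_primer_offset(260, 70)  # lung values from the text: 190 nt
for tissue, product in [("lung", 260), ("kidney", 270)]:
    print(tissue, start_upstream_of_atg(product, offset))
```

With a single offset both tissues are consistent, which is why the two products of about 260 and 270 nt translate into start sites about 10 nt apart.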

Construction of a full-length cDNA for rat calpain II large subunit The required cDNA was assembled in a sequence of ligation reactions (Fig. 5), with selection when necessary of the desired orientation. The assembly was carried out in pBluescript SK+, principally because the cloning strategy required an EcoRI site centrally located in a multiple cloning region. The final pBluescript plasmid is designated pBSratcalpainII, and contains the entire rat calpain II cDNA, including the 5' and 3' untranslated regions; it also includes, upstream of the 5' untranslated region, a small section of rat genomic DNA derived from the inverted PCR procedure, which contains a duplicated section of the 3' end of exon 1 and 190 bp of intron 1. This clone lacks the sequence downstream of the NheI site (Fig. 3), including the poly-adenylation signal.

Expression of rat calpain II cDNA The expression vector pT7-7 [20] was modified in two ways: by insertion of the f1 origin of replication, and by conversion of the NdeI site of pT7-7 to an NcoI site. The new phagemid was designated pT7-7fN, and was checked by digestion with NcoI, failure to digest with NdeI, and partial sequencing.
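The diagnostic digests above have a direct in silico counterpart: scan a sequence for each enzyme's recognition site. The sketch below is a minimal Python illustration, not the authors' method; the recognition sequences are the standard ones for the enzymes named in this paper, while the function names and the short test sequences are hypothetical (they are not real pT7-7 sequence). Both NcoI (CCATGG) and NdeI (CATATG) contain an ATG, which is why either site can sit directly on an initiation codon.

```python
# Recognition sequences for enzymes named in this paper. Each is a
# palindrome, so scanning one strand also finds sites on the other strand.
RECOGNITION = {
    "NcoI": "CCATGG",
    "NdeI": "CATATG",
    "SalI": "GTCGAC",
    "NheI": "GCTAGC",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "AluI": "AGCT",
}


def find_sites(seq: str, enzyme: str) -> list[int]:
    """0-based start positions of every recognition site for `enzyme`."""
    site = RECOGNITION[enzyme]
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def passes_conversion_check(seq: str) -> bool:
    """Analogue of the NcoI/NdeI check: the converted vector should now
    contain an NcoI site and no longer contain an NdeI site."""
    return bool(find_sites(seq, "NcoI")) and not find_sites(seq, "NdeI")


# Hypothetical toy sequences around a start codon.
before = "GGAGGTTTCATATGGCT"
after = "GGAGGTTTCCATGGCT"
print(find_sites(before, "NdeI"), passes_conversion_check(before))
print(find_sites(after, "NcoI"), passes_conversion_check(after))
```

The function reports site starts, not cut positions; adding each enzyme's cut offset would be the next step if fragment sizes were needed.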

The entire coding region and 3' untranslated region of rat calpain II were excised from pBSratcalpainII by digestion with NcoI and SalI (Fig. 5), and ligated into pT7-7fN which had been digested identically. This placed the presumed initiation codon of rat calpain II at the desired location with respect to the ribosome binding site in pT7-7fN. The resulting rat calpain II phagemid was transformed into E. coli K38 containing the pGP1-2 plasmid, which produces T7 RNA polymerase on growth at 42°C. Total cell protein from cells heat-shocked at 42°C for 30 min and grown at 37°C for a further 90 min was applied to 7.5% polyacrylamide gels and analyzed by immunoblotting, using an affinity-purified polyclonal anti-rat calpain II antibody. Cells containing pT7-7fN without a calpain insert were grown in parallel as a negative control (Fig. 6). The induced rat calpain II large subunit was also selectively labelled with [35S]methionine in the presence of rifampicin [20]; total cell protein was run on SDS gels which were either fixed and dried, or blotted as above, and the gels or blots were exposed to XAR film for 16 h at -70°C. Fig. 6 shows that a single band was identified by the antibody in the proteins derived from cells containing the rat calpain II phagemid. This band was of the same size as highly purified rat calpain II, was not detected in extracts of control cells containing vector pT7-7fN, and was specifically labelled by [35S]methionine. In a further experiment, heat-induced positive and control cells were lysed by sonication on ice in 10 mM NaCl, 5 mM EDTA, 10 mM 2-mercaptoethanol, phenylmethylsulfonyl fluoride (50 µg/ml), 20 mM Tris-HCl (pH 8.0). The soluble and insoluble materials were separated by centrifugation and analyzed by immunoblotting. Of the rat calpain, 95-98% was found in the pellet, with only traces discernible in the soluble phase (result not shown).

[Fig. 5 diagram: cloned fragments with exon numbers and restriction sites; legend: Coding, Non Coding, Intron, GC Tail, Plasmid.]

Fig. 5. Construction of a full-length cDNA for rat calpain II large subunit. The diagram shows in outline the fragments and steps involved. Some exon numbers are indicated below each fragment. The fragments were in pUC19 except for those marked pBS, which were in pBluescript SK+. Restriction sites indicated by single letters were: BamHI, B; EcoRI, E; NheI, h; NcoI, N; SalI, S; XbaI, X. The final product in this diagram is referred to as pBSratcalpainII. For bacterial expression, the NcoI/SalI fragment from this construct was transferred into pT7-7fN.

[Fig. 6 image: lanes 1-6, with 92 kDa and 66 kDa markers.]

Fig. 6. Immunoblot and 35S-labelling of bacterially expressed protein. Protein samples for immunoblotting were obtained from 100 µl of a suspension of E. coli K38 containing the pT7-7fN expression vector with or without calpain cDNA after heat induction at 42°C for 30 min and growth at 37°C for 90 min. For 35S-labelling, samples were obtained after heat induction at 42°C for 30 min with rifampicin, followed by growth in the presence of [35S]methionine for 5 min at 37°C. The cells were collected by centrifugation, dissolved in gel sample buffer, and run on a 7.5% polyacrylamide SDS gel. All lanes were blotted, and lane 6 was then removed for staining with Coomassie Brilliant Blue. Lanes 1-5 were exposed first to anti-rat calpain II antibody and then to goat anti-rabbit IgG antibody, and colour was developed with NBT and BCIP [10]. The dried blot was then exposed to X-ray film for 16 h at -70°C. The immunoblot colour image and the autoradiograph have been superimposed for the figure. Lane 1: cells containing pT7-7fN-rat calpain II cDNA; lane 2: cells containing pT7-7fN with no calpain insert; lane 3: standard rat calpain II (0.1 µg); lane 4: 35S-labelled protein from cells containing pT7-7fN with no calpain insert; lane 5: 35S-labelled protein from cells containing pT7-7fN-rat calpain II cDNA; lane 6: molecular weight standards, with the 92 kDa and 66 kDa markers indicated by arrows.

Discussion

Genomic organisation Sequencing and mapping of rat calpain II exons revealed a high degree of conservation of intron/exon organisation with respect to chicken calpain, which is the only other calpain whose exons have been fully described [4]. With the exceptions of exon 1 in the rat, which is three codons shorter, and exon 13, which is one codon shorter, the lengths of the exons and the positions of the breakpoints were identical to those of chicken. The rat introns were, however, much longer on average than those of chicken calpain: the whole rat gene extends over at least 33 kbp (and exons 1 and 2 have yet to be located on the gene), whereas the chicken gene is only 11 kbp long. The average length of the known introns in the rat calpain II gene (Table II) is 1.18 kbp, similar to the reported mean for vertebrates of 1.13 kbp [21], but intron 2 is longer than 5 kbp (Fig. 1), and intron 1 has not yet been isolated.

Isolation of cDNA The longest cDNA clone for rat calpain II which could be found in the available cDNA libraries represented only the 3' half, approx. 1.5 kbp, of the complete message. It was therefore necessary to adopt other strategies to obtain the 5' half of the message, involving a combination of genomic walking and several PCR steps. Some early attempts to find rat calpain 5' sequence by means of PCR with degenerate primers based on human calpain sequences were not successful. The RACE procedure applied to the 5' terminus also did not generate any useful results, even when correct primer sequences were known; this may be because the 5' region on both sides of the initiation codon is GC-rich, and therefore more difficult to reverse-transcribe. In practice, the 5' untranslated region could only be obtained by PCR from genomic DNA. The suggestion that the 5' region is difficult to reverse-transcribe is supported by the fact that unique and probably full-length 5' primer extension products could be obtained only after methylmercuric hydroxide denaturation of the RNA. Although exons 3 and 4 were obtained by genomic walking, repeated screening failed to identify a λEMBL3 clone containing exons 1 or 2. It is possible that MboI sites are located too frequently or too infrequently in this region, so that it is under-represented in the final libraries. If this is correct, genomic clones for these exons might be found in libraries made by random shearing. To surmount these obstacles, an exons 1-8 cDNA fragment was amplified using a 64-fold degenerate exon 1 sense primer based on rat calpain II amino acid sequence. This fragment was used to locate exons 3 and 4 in genomic clones. With this information, a new exon 4 antisense primer was then made, and an exons 1-4 cDNA fragment was amplified from a fresh RNA sample and cloned. This in turn provided the sequence information for part of exon 1, which was used to make primers for inverted PCR, leading finally to isolation of a genomic fragment containing the initiation codon and some 5' non-coding sequence. All the various fragments could then be assembled by a series of cloning steps (Fig. 5) to form the complete cDNA (Fig. 3). The cDNA is approx. 3.2 kbp long, consistent with the Northern blot analysis, which had indicated an mRNA of about 3.5 kb, as previously reported [9]. It encodes a protein of 700 amino acids, with a molecular weight of 79 880 Da. It is assumed to represent rat calpain II for several reasons: its deduced amino acid sequence shows 93% identity with human calpain II and 61% identity with human calpain I (Fig. 4); the sequence contains two peptides (residues 20-34 and 62-69) which had been established by amino acid sequencing of rat calpain II; and a different rat genomic clone has been isolated whose exons have a much greater identity with human calpain I [9]. The 5' non-coding sequence of 112 bp shown in Fig. 3 is GC-rich (67%) and contains no TATA box; it was derived from genomic DNA by means of PCR and starts with the AluI site used in its isolation. Primer extension suggested that transcription was initiated close to -70 in rat lung and -80 in kidney, and there are purines at -68, -69, -78 and -79, which may indicate these start sites. Exon 1 must start at this point, since there is no splice acceptor sequence (AG) between the start site and the initiating ATG codon. The 5' non-coding region of the rat calpain II gene is therefore similar to that of human calpain II, which was extensively analyzed [22].
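Three numbers in the preceding paragraph come from simple sequence arithmetic: the degeneracy of a primer back-translated from a peptide, the GC content of the 5' non-coding region, and the molecular weight of the deduced protein. The following is a minimal Python sketch of those calculations, not the authors' software. The codon counts are those of the standard genetic code and the masses are commonly tabulated average residue masses; the function names and example peptide are hypothetical (the peptide is not the paper's primer), and practical degenerate primers often omit the third base of the last codon, which lowers the count.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code (61 total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

# Average masses (Da) of amino acid residues within a chain (water removed).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # average mass of H2O, added once for the free termini


def primer_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences encoding `peptide` (full codons)."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def gc_percent(seq: str) -> float:
    """Percentage of G plus C in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(primer_degeneracy("MDHKEFY"))  # hypothetical peptide: 1*2*2*2*2*2*2
print(gc_percent("GCGCCCCGAT"))      # 8 of 10 bases are G or C
print(round(average_mass("GG"), 2))  # glycylglycine
```

Applied to the full 700-residue deduced sequence (not reproduced here), a sum of this kind yields a value such as 79 880 Da; the last digits depend on the mass table used.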
Human calpain II lacks TATA and CAAT boxes, has 70% GC in the region -300 to -20, and showed multiple transcription start sites between -142 and -103. One further feature of the 5' non-coding region is the sequence GCGCCCCG, or sequences very closely related to it, which occurs at least three times in this short segment of both rat and human calpain II. Translation is assumed to start at the indicated ATG codon because it is the first ATG downstream of the start site and lies in a strong Kozak consensus sequence [19]. Further, a hairpin structure can be formed by nt 9-18, which is thought to promote use of the preceding ATG [19]. The coding region of rat calpain II cDNA shows the features expected from its extensive identity with other calpains [23]. The presumed active-site cysteine-105 and histidine-262 residues, and the presumed EF-hand Ca2+-binding motifs, one each in exons 17-20, are indicated in Fig. 3. In addition, the positions of the exons are marked, as obtained from genomic mapping and sequencing.

The fact that most of the cDNA was obtained by PCR raises the possibility of error, particularly of single base changes arising from the relative inaccuracy of Taq polymerase [24]. Expression of a protein of the correct size, and its reaction with the antibody, indicate that the correct reading frame, including at least one epitope, has been maintained, but single amino acid changes arising from base substitutions cannot be excluded. In fact, the cDNA sequence was identical to the exon sequences in genomic clones except at three locations, and in each case the residue encoded in genomic DNA was highly conserved in other calpains. Subsequently, sequencing of several fresh PCR subclones confirmed that these differences had arisen from PCR errors. The correct sequences are shown in Figs. 3 and 4. Errors in exons 1 and 2 cannot be excluded, because no genomic clone for them has yet been found. In the 3' untranslated region there are at least three potential poly-adenylation signals, but for several reasons it appears that only the AATAAA at 3067 is functional in vivo: it is followed 21 bp downstream in genomic DNA by a GT-rich region, as predicted [25]; the Northern blots [9] showed only one size of mRNA, corresponding to the use of this poly-adenylation signal; and the EcoRI linker site was found in this position in the λgt10 cDNA library clones which were analyzed. A further feature of the 3' untranslated region is a cluster of three ATTTA sequences within 23 nt at 2473, which may confer instability on the mRNA [26]. Only single ATTTA sequences occur in the 3' untranslated regions of other calpains. Finally, there is an exact tandem duplication of a 27 nt sequence at 2608 and 2635, and a triple repeat of a 14 nt sequence in the same region. This sequence has not been recorded in the databases, and its significance, if any, remains to be determined.
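Several arguments in the two preceding paragraphs rest on simple sequence scans: the Kozak context of the initiation codon, the position of the AATAAA signal, and the cluster of ATTTA motifs. The following is a minimal Python sketch of such scans, not the authors' procedure. It assumes a simplified Kozak rule (a purine at -3 and a G at +4, with the A of ATG as +1, counted as "strong"), 0-based indices in code where the paper's positions are 1-based, and a cluster defined as motifs that all fit inside a given span; the function names and test sequences are hypothetical. An NcoI site (CCATGG) at the initiation codon automatically supplies the G at +4.

```python
import re


def kozak_strength(seq: str, atg: int) -> str:
    """Classify the context of the ATG starting at 0-based index `atg`.

    Simplified rule (assumption): 'strong' if there is a purine at -3 and a
    G at +4, 'adequate' if only one holds, 'weak' if neither does.
    """
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    purine_minus3 = atg >= 3 and seq[atg - 3] in "AG"
    g_plus4 = atg + 3 < len(seq) and seq[atg + 3] == "G"
    return ("weak", "adequate", "strong")[int(purine_minus3) + int(g_plus4)]


def find_all(seq: str, motif: str) -> list[int]:
    """0-based starts of every occurrence of `motif`, overlaps included
    (the zero-width lookahead lets matches share bases)."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq.upper())]


def motif_clusters(seq: str, motif: str, min_count: int, span: int):
    """Groups of at least `min_count` occurrences that all fit within
    `span` nt, reported as (first_start, [starts])."""
    hits = find_all(seq, motif)
    clusters = []
    for i, first in enumerate(hits):
        members = [h for h in hits[i:] if h + len(motif) - first <= span]
        if len(members) >= min_count:
            clusters.append((first, members))
    return clusters


print(kozak_strength("GCCACCATGG", 6))
print(find_all("AATAAATAAA", "AATAAA"))
print(motif_clusters("ATTTAGGATTTACCATTTA", "ATTTA", 3, 23))
```

Every later start in a dense run is reported again as its own group if it still meets `min_count`, so callers wanting only maximal clusters would merge overlapping groups.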
For expression of rat calpain II, which has a naturally occurring NcoI site at the initiation codon, we chose to use the T7-polymerase-directed system of Tabor [20], both because of previous experience with this system and because it permits selective labelling of the protein product. The pT7-7 plasmid was therefore modified by insertion of the f1 origin to permit site-directed mutagenesis, and by provision of an NcoI site in the correct position with respect to the T7 ribosome binding site. With this new expression phagemid, it was possible to express rat calpain II in E. coli. Identification of the product as rat calpain II was based mainly on its recognition on an immunoblot by a specific, affinity-purified, anti-rat calpain II antibody. The product was also of the same molecular weight as highly purified rat calpain II, as shown both by the antibody and by specific incorporation of 35S (Fig. 6). It was also shown by immunoblotting that the product was insoluble after sonication of the cells, and no Ca2+-dependent proteinase activity could be detected in the supernatant. It is possible that deposition in inclusion bodies has had the fortunate effect of protecting the calpain from (auto)degradation. Attempts will be made to solubilize the expressed calpain and to renature it, both with and without addition of calpain small subunit. The presence of the small subunit could assist folding of the large subunit, and it may be essential for proteolytic activity, since the activity of the isolated large subunit was reported to be very low [27]. Bacterial expression of the C-terminal calmodulin-like domains of both large and small subunits of calpain has been described [28,29], but this is the first report of expression of the whole large subunit. The work opens up the possibility of structure-function studies of calpain itself, and the existence of the cDNA clone should facilitate calpain research in the rat, a very common experimental system. Efforts are now under way to complete the rat calpain I cDNA.

Acknowledgements This work was supported by the Medical Research Council of Canada, the Faculty of Medicine at Queen's University, and by a Medical Research Council Fellowship to C.I.D.

References
1 Goll, D.E., Thompson, V.F., Taylor, R.G. and Zalewska, T. (1992) BioEssays 14, 549-556.
2 Croall, D.E. and DeMartino, G.N. (1991) Physiol. Rev. 71, 813-839.
3 Suzuki, K., Minami, Y., Emori, Y., Imajoh, S. and Kawasaki, H. (1989) Adv. Exp. Med. Biol. 255, 173-183.
4 Emori, Y., Ohno, S., Tobita, M. and Suzuki, K. (1986) FEBS Lett. 194, 249-252.
5 Ohno, S., Emori, Y., Imajoh, S., Kawasaki, H., Kisaragi, M. and Suzuki, K. (1984) Nature 312, 566-570.
6 Aoki, K., Imajoh, S., Ohno, S., Emori, Y., Koike, M., Kosaki, G. and Suzuki, K. (1986) FEBS Lett. 205, 313-317.
7 Imajoh, S., Aoki, K., Ohno, S., Emori, Y., Kawasaki, H., Sugihara, H. and Suzuki, K. (1988) Biochemistry 27, 8122-8128.
8 Emori, Y., Kawasaki, H., Sugihara, H., Imajoh, S., Kawashima, S. and Suzuki, K. (1986) J. Biol. Chem. 261, 9465-9471.
9 Samis, J.A., Back, D.W., Graham, E.J., DeLuca, C.I. and Elce, J.S. (1991) Biochem. J. 276, 293-299.
10 Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
11 Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B. and Erlich, H.A. (1988) Science 239, 487-491.
12 Chomczynski, P. and Sacchi, N. (1987) Anal. Biochem. 162, 156-159.
13 Boorstein, W.R. and Craig, E.A. (1989) Methods Enzymol. 180, 347-369.
14 Elce, J.S., Baenziger, J.E. and Young, D.C.R. (1984) Biochem. J. 220, 507-512.
15 Senapathy, P., Shapiro, M.B. and Harris, N.L. (1990) Methods Enzymol. 183, 252-278.
16 Gubler, U. and Hoffman, B.J. (1983) Gene 25, 263-269.
17 Frohman, M.A., Dush, M.K. and Martin, G.R. (1988) Proc. Natl. Acad. Sci. USA 85, 8998-9002.
18 Triglia, T., Peterson, M.G. and Kemp, D.J. (1988) Nucleic Acids Res. 16, 8186.
19 Kozak, M. (1991) J. Biol. Chem. 266, 19867-19870.
20 Tabor, S. (1990) In Current Protocols in Molecular Biology (Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A. and Struhl, K., eds.), pp. 16.2.1-16.2.11, Greene/Wiley-Interscience, New York.
21 Hawkins, J.D. (1988) Nucleic Acids Res. 16, 9893-9908.
22 Hata, A., Ohno, S., Akita, Y. and Suzuki, K. (1989) J. Biol. Chem. 264, 6404-6411.
23 Sorimachi, H. and Suzuki, K. (1992) Biochim. Biophys. Acta 1160, 55-62.
24 Ling, L.L., Keohavong, P., Dias, C. and Thilly, W.G. (1991) PCR Methods Appl. 1, 63-69.
25 Levitt, N., Briggs, D., Gil, A. and Proudfoot, N.J. (1989) Genes Dev. 3, 1019-1025.
26 Malter, J.S. (1989) Science 246, 664-666.
27 Kikuchi, T., Yumoto, N., Sasaki, T. and Murachi, T. (1984) Arch. Biochem. Biophys. 234, 639-645.
28 Murachi, T., Takano, E., Maki, M., Adachi, Y. and Hatanaka, M. (1989) Biochem. Soc. Symp. 55, 29-44.
29 Minami, Y., Emori, Y., Imajoh, S., Kawasaki, H. and Suzuki, K. (1988) J. Biochem. (Tokyo) 104, 927-933.