Gene, 130 (1993) 175-181 © 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00
GENE 07185
Isolation and characterization of the chicken cystatin-encoding gene: mapping transcription start point and polyadenylation sites
(Proteinase inhibitor; RNA mapping; Northern analysis; housekeeping gene; SP1-binding sites; Csn superfamily)
Rita Colella(a) and John W.C. Bird(b)
(a) Department of Anatomical Sciences and Neurobiology, University of Louisville School of Medicine, Louisville, KY 40292, USA; and (b) Bureau of Biological Research, Rutgers University, Piscataway, NJ 08855, USA. Tel. (908) 932-3970
Received by D.T. Denhardt: 10 July 1992; Revised/Accepted: 3 February/23 February 1993; Received at publishers: 2 April 1993
SUMMARY
A 6.9-kb fragment containing coding sequences for chicken egg white cystatin (CsnEW) was isolated from a chicken genomic library using the CsnEW cDNA as a probe. The gene is approximately 2.4 kb in length; it contains three exons, two introns and two polyadenylation signals. The exon-intron arrangement corresponds exactly with that of other members of the Csn superfamily. The 5' flanking region contains two SP1-binding sites and has a high G+C content, suggestive of a housekeeping gene. All tissues studied express CsnEW mRNA; Northern analysis showed that CsnEW mRNA is most abundant in lung and least abundant in liver and spleen. Mapping of the 3' end of CsnEW mRNA isolated from brain tissue resolved two CsnEW mRNA species. The larger transcript, resulting from the use of the second polyadenylation signal, was more abundant than the smaller transcript. Determination of the transcription start point (tsp) of CsnEW mRNA by primer extension and RNase protection assays showed that CsnEW mRNA from a number of chicken tissues was approximately 40-50 nucleotides shorter at the 5' end than predicted from the CsnEW cDNA isolated from chicken oviduct.
The chicken cysteine proteinase inhibitor, egg white cystatin (CsnEW), was first described by Sen and Whitaker (1973) and further characterized by Anastasi et al. (1983). This protein inhibits the papain-like cysteine proteinases. A number of proteins with properties similar to CsnEW have been isolated and characterized. These inhibitors are widely distributed in tissues and biological fluids and belong to the cystatin (Csn) superfamily (Barrett et al., 1986). Members of the Csn superfamily are classified into four subfamilies based on the structure of their proteins. Family one is the stefin family, whose members lack disulfide bonds and are generally found intracellularly. The second family is composed of proteins with two disulfide bonds per molecule. Members of this family are located predominantly in tissues and biological fluids. Their cDNAs contain coding sequences for a signal peptide, consistent with the predominantly extracellular location of these proteins. Members of this family include human CsnC (Abrahamson et al., 1987), salivary CsnSA and CsnSN (Saitoh et al., 1987), and CsnEW (Colella et al., 1989). The third family is composed of the kininogens, larger plasma molecules involved in inflammation and coagulation. The kininogens are composed of three tandemly repeated Csn-like domains (Kitamura et al., 1985; Salvesen et al., 1986). A fourth family, which includes the human His-rich glycoprotein (Koide, 1988) and α2-HS glycoprotein (Elzanowski et al., 1988), has recently been added to the Csn superfamily. Its members contain two tandemly repeated Csn-like domains (Rawlings and Barrett, 1990). Analyses of the structural organization of several genes of the Csn superfamily support the evolutionary relationship of these proteins. This report describes the isolation, structure and expression of the chicken CsnEW gene.

Correspondence to: Dr. R. Colella, Department of Anatomical Sciences and Neurobiology, University of Louisville Health Science Center, Louisville, KY 40292, USA. Tel. (502) 588-6840, Fax (502) 588-6228.

Abbreviations: aa, amino acid(s); bp, base pair(s); Csn, cystatin; CsnEW, chicken egg white Csn; Csn, gene encoding Csn; kb, kilobase(s); nt, nucleotide(s); oligo, oligodeoxyribonucleotide; SDS, sodium dodecyl sulfate; tsp, transcription start point(s); u, unit(s).

RESULTS AND DISCUSSION

(a) Characterization of the CsnEW gene
A chicken genomic library cloned into the BamHI site of λEMBL3 was obtained from Dr. Jeffrey Robbins of the University of Cincinnati School of Medicine. Approximately 10’ plaques were screened using a full-length CsnEW cDNA probe isolated from a chicken oviduct cDNA library (Colella et al., 1989). The CsnEW cDNA was labeled with [α-32P]dCTP by nick translation. Five positive clones were isolated and the plaques purified by standard procedures (Sambrook et al., 1989). Inserts from the five positive phage were isolated by BamHI digestion and subjected to restriction analysis. Digestion with PstI and EcoRI gave similar restriction patterns. One of these clones, CYS-14, with an insert of approximately 6-7 kb, was chosen for further analysis. The CYS-14 insert and its restriction fragments (PstI, EcoRI, HindIII, XbaI and AvaI) were subcloned into the appropriate restriction site of the pGEM 3Zf(+) vector (Promega, Madison, WI, USA) for sequencing and for generating nested deletions with exonuclease III (Sambrook et al., 1989). The sequencing strategy and restriction enzyme map are shown in Fig. 1. The CYS-14 sequence is shown in Fig. 2. It is 6881 bp in length and contains the entire CsnEW gene as well as 3311 nt upstream from the AUG start codon. The CsnEW gene is composed of three exons separated by introns of 419 and 1109 nt. The first intron is positioned between the codons for aa 53 and 54, and the second intron between the codons for aa 91 and 92 of the mature protein (Schwabe et al., 1984). AG/GT consensus sequences are found at the exon-intron boundaries of the gene (Mount, 1982). The positions of the introns correspond exactly with those of the genes for CsnC (Abrahamson et al., 1990), CsnSN and CsnSA (Saitoh et al., 1987). However, intron 1 (419 bp) is much smaller than that reported for CsnC (2252 bp) and CsnSN (1511 bp). Comparison of the sequence of the CsnEW gene with that of its previously isolated cDNA (Colella et al., 1989) shows that both contain the same coding sequence for the signal peptide and two polyadenylation signals in the 3' untranslated region. Two single-nt changes were present in the coding sequence of the gene compared with the cDNA (aa Q22 and R68); however, because of codon degeneracy, neither change alters the protein sequence. They may represent genetic variation resulting in silent mutations. The sequence between the two polyadenylation signals also differs from the cDNA in that there are six additional G residues in the gene sequence. It is possible that these differences are artifacts incurred during the synthesis of the cDNA library.
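The silent-substitution argument and the intron positions described above can be checked with a few lines of code. The Python sketch below is illustrative only, not the authors' analysis: it encodes the standard genetic code, tests whether two codons are synonymous, and converts an intron lying "between codons n and n+1 of the mature protein" into a coding-sequence offset, assuming the 23-aa signal sequence cited later in the text. The function names are hypothetical, and the CAG-to-CAA change is an invented example; the actual nucleotide changes in the gene are not specified.

```python
# Illustrative sketch (not the authors' method): codon degeneracy and intron phase.

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

SIGNAL_PEPTIDE_LEN = 23  # aa; signal sequence length quoted in the text (assumed here)


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """Return True if both codons encode the same amino acid (a silent change)."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


def intron_cds_offset(last_mature_aa_before_intron: int,
                      signal_len: int = SIGNAL_PEPTIDE_LEN) -> int:
    """Number of coding nucleotides (counted from the A of AUG) that precede an
    intron lying exactly between two codons, i.e. a phase-0 intron."""
    return 3 * (signal_len + last_mature_aa_before_intron)


if __name__ == "__main__":
    # Hypothetical third-position change in a Gln codon: CAG -> CAA stays Gln.
    print(is_synonymous("CAG", "CAA"))  # True
    print(is_synonymous("CAG", "CAT"))  # False (CAT encodes His)
    # Introns between codons 53/54 and 91/92 of the mature protein.
    print(intron_cds_offset(53), intron_cds_offset(91))  # 228 342
```

Both offsets are multiples of three, as expected for introns that fall between codons (phase 0) rather than within them.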
Fig. 1. Map of the CYS-14 insert. (a) Structure of insert CYS-14 containing the entire CsnEW gene. Exons are depicted as solid boxes (numerals correspond to bp). (b) The insert was digested with AvaI (A), BamHI (B), EcoRI (E), HincII (Hc), HindIII (Hd), PstI (P) and XbaI (X) to obtain smaller fragments for sequencing and exonuclease III deletions. (c) A summary of the sequencing strategy is shown. Each strand was sequenced on average four times. [Map labels: exons at bp 3314-3541, 3962-4075 and 5186-5260; scale bar, 1 kb.]

Fig. 2. Sequence of CYS-14 containing the entire CsnEW gene. CYS-14 was sequenced by the dideoxy chain termination method (Sanger et al., 1977) using Sequenase (USB, Cleveland, OH). The first and last nt of the cDNA sequence are indicated by asterisks. Two SP1-binding sites located approximately 30 nt upstream from the cDNA start site are underlined. The two polyadenylation signals are indicated by stippled boxes. The first aa of the mature protein (Schwabe et al., 1984) is boxed. The aa sequence is numbered according to Schwabe et al. (1984); the signal peptide is numbered with negative numbers. The sequence is deposited with GenBank, accession No. M95725. [6881-bp nucleotide sequence with deduced aa sequence; see GenBank M95725.]

(b) Flanking regions of the CsnEW gene
Both the 5' and 3' regions of the CsnEW gene were subjected to DNA analysis using the PC Gene program. Because chicken Csn is a component of egg white, we looked for hormone-responsive elements that would subject its expression to hormonal regulation similar to that of other egg white proteins (O'Malley et al., 1969; McKnight and Palmiter, 1979). No thyroid-, glucocorticoid- or estrogen-responsive elements were found in the entire sequence. This supports our previous finding that estrogen had little or no effect on CsnEW mRNA levels in the chicken oviduct (Colella et al., 1991). We searched for other known enhancer elements (reviewed in Jones et al., 1988) and found two SP1-binding sites located approximately 100 nt upstream from the AUG start codon. The high G+C content (73%) of the 500 nt preceding the AUG start codon and the two SP1-binding sites suggest that CsnEW is a housekeeping gene. No TATA or CAAT boxes are present in the 5' flanking region upstream of the start codon. The CsnC, CsnD, CsnSA and CsnSN genes have TATA-like sequences upstream from the AUG codon, but only CsnSA and CsnSN have a CAAT box (Saitoh et al., 1987; Abrahamson et al., 1990; Freije et al., 1991). The expression of CsnSA, CsnSN and CsnD is restricted to the salivary glands (Shaw et al., 1990; Freije et al., 1991), whereas CsnC is expressed in all tissues studied (Abrahamson et al., 1990). Comparison of the G+C content of the four genes showed that CsnC had the highest G+C content, with a CpG/GpC ratio approximately equal to one, which is indicative of HTF islands. HTF islands are DNA sequences rich in nonmethylated CpG dinucleotides and are found in most housekeeping genes (Bird, 1986). Abrahamson et al. (1990) postulated that the lower G+C content and the lack of HTF islands in the 5' flanking regions of the genes for CsnSN, CsnSA and CsnD may account for their tissue specificity. The 5' flanking region of the CsnEW gene also has a lower G+C content than that of CsnC, and its CpG/GpC ratio is less than 1, indicating the lack of HTF islands.
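The G+C and CpG/GpC criteria used above are simple sequence statistics. The Python sketch below shows one way to compute them for the window preceding a start codon. It assumes the genomic sequence is available as a plain string (for example, taken from GenBank M95725) and that the 0-based index of the A of the AUG is known; the function names and the toy sequence are hypothetical and are not CsnEW data.

```python
# Illustrative sketch: G+C content and CpG/GpC ratio of a 5' flanking window.


def upstream_window(seq: str, atg_index: int, length: int = 500) -> str:
    """Return up to `length` nt immediately 5' of the start codon.

    `atg_index` is the 0-based position of the A of the AUG (ATG in DNA).
    """
    return seq[max(0, atg_index - length):atg_index]


def gc_fraction(seq: str) -> float:
    """Fraction of G + C residues in `seq`."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count (possibly overlapping) occurrences of a 2-nt motif such as 'CG'."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == dinuc)


def cpg_gpc_ratio(seq: str) -> float:
    """CpG count divided by GpC count; values near 1 suggest an HTF island."""
    gpc = count_dinucleotide(seq, "GC")
    return count_dinucleotide(seq, "CG") / gpc if gpc else float("nan")


if __name__ == "__main__":
    toy = "TTGCGCGCAA" + "ATGGCA"  # hypothetical 10-nt "flank" followed by ATG
    flank = upstream_window(toy, atg_index=10)
    print(flank)                           # TTGCGCGCAA
    print(round(gc_fraction(flank), 2))    # 0.6
    print(count_dinucleotide(flank, "CG"), count_dinucleotide(flank, "GC"))  # 2 3
    print(round(cpg_gpc_ratio(flank), 2))  # 0.67
```

The default window of 500 nt mirrors the region for which the text reports 73% G+C; the same functions could be applied to the flanking regions of the other Csn genes for the comparison described above.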
However, CsnEW is expressed in all tissues studied; its mRNA is most abundant in the lung and least abundant in the liver and spleen (Fig. 3). Other, as yet unidentified, sequences may be responsible for this differential expression of the CsnEW gene and for the tissue-specific expression of the genes for CsnSA, CsnSN and CsnD.

Fig. 3. Tissue expression of the CsnEW gene. Total RNA (10 µg) isolated from adult female chickens by the CsCl/guanidine isothiocyanate method of Chirgwin et al. (1979) was fractionated on a 1% agarose-2.2 M formaldehyde gel and transferred to Nytran filters (Schleicher and Schuell, Keene, NH, USA). The filter was hybridized to 32P-labeled CsnEW and β-actin cDNAs overnight at 42°C in 50% formamide/… x SSPE/1 x Denhardt's solution/1… µg per ml salmon sperm DNA. After hybridization the filter was washed four times for 15 min each in 2 x SSC/1% SDS at 65°C, followed by one wash in 0.1 x SSC at 42°C. (1 x SSC = 15 mM NaCl/1.5 mM Na3citrate pH 7.6; 1 x SSPE = 15 mM NaCl/1 mM NaH2PO4/0.1 mM EDTA pH 8). Lanes: (a) oviduct, (b) lung, (c) muscle, (d) brain, (e) liver, (f) heart, (g) spleen, (h) kidney. The β-actin probe is not specific for β-actin and also binds to α-actin in muscle tissue.

We previously isolated two CsnEW cDNAs from a chicken oviduct cDNA library that differed in their 3' untranslated regions because of the differential use of two polyadenylation signals (Colella et al., 1989). Two polyadenylation signals were also present in the 3' region of the CsnEW gene. All other Csn genes characterized so far have only one polyadenylation signal at the 3' end. RNase protection assays were done on RNA isolated from chicken brain to determine whether both polyadenylation sites were utilized. As shown in Figs. 4 and 5, two mRNA
species corresponding to each of the polyadenylation signals were synthesized. The band corresponding to the larger RNA transcript is more intense than that of the transcript using the first polyadenylation signal. This quantitative difference may be due either to a preference for the second polyadenylation signal or to increased stability of the longer mRNA transcript.

Fig. 4. Probes used for RNA mapping. The structure of the CsnEW gene (not drawn to scale) is shown for orientation. Solid boxes denote exons. (a) Diagram of the RNA probes used to map the 5' and 3' ends of CsnEW mRNA. RNA probes were synthesized using the In Vitro Transcription assay kit of Promega (Madison, WI, USA), [α-32P]CTP, and restriction enzyme fragments of the CsnEW gene subcloned into pGEM 3Zf(+) as template. The 5' end was mapped with antisense RNA synthesized from a 659-bp PstI fragment (designated 5'P) of the CsnEW gene. The 5'P probe contains sequences identical with the first 285 nt of the 5' end of the CsnEW cDNA as well as 315 bp upstream from the cDNA start. To map the 3' end of the CsnEW mRNA, a 32P-labeled antisense 375-nt HincII RNA probe was synthesized (designated 3'H). This RNA probe contained most of the 3' untranslated region, including the two putative polyadenylation signals, plus an additional 146 nt downstream from the end of the CsnEW cDNA. The cDNA map is shown with the open box denoting coding sequences for the mature protein and the hatched box denoting coding sequences for the signal peptide. The asterisks denote the positions of the two polyadenylation sites. (b) Diagrams of sense-strand RNAs synthesized from the CsnEW cDNA. As a control, three sense-strand RNAs (wavy lines) were synthesized by in vitro transcription of the CsnEW cDNA subcloned into pGEM 3Zf(+). These include CYS 13, which contains coding sequences for the signal peptide and the first 82 aa of the mature protein as well as 58 nt of 5' untranslated region; C2a, which contains coding sequences for aa 83 through 116 plus the 3' untranslated region ending 14 nt downstream from the first polyadenylation site; and C10b, which also contains coding sequences for aa 83-116 but whose 3' untranslated region ends 15 nt downstream from the second polyadenylation site. These are aligned with the CsnEW cDNA in (a). (c) The expected sizes of the protected fragments of the synthesized RNAs after hybridization with either 5'P or 3'H and digestion with RNases.

(c) The transcription start point (tsp) of CsnEW mRNA
The tsp was determined by primer extension analysis of RNA isolated from a number of chicken tissues. RNA was isolated from the brain, heart and liver of young chicks and from the heart, brain, lung and oviduct of mature egg-laying hens. Primer extension was carried out using an oligo (17-mer) complementary to nt 168-184 of the CsnEW cDNA (the complementary strand of the oligo codes for aa 15-19; Colella et al., 1989). The primer extension product expected from the CsnEW cDNA sequence is 185 nt in length. Indeed, when RNA synthesized from the CsnEW cDNA (CYS 13, Fig. 4) was used as a control template for the primer extension reaction, a fragment of approximately 200 nt was obtained (Fig. 6). However, the primer extension product obtained from chicken tissue RNA was 40-50 nt shorter than expected. RNase protection analysis of the 5' end of the CsnEW mRNA from chicken brain was also done. When the PstI fragment probe was hybridized to the RNA synthesized from the 5' portion of the CsnEW cDNA, it protected a band of approximately 300 bp (Fig. 6). This was expected, since the cDNA and the PstI fragment of the CsnEW gene share 285 nt of identical sequence. However, a major band approximately 40-50 bp smaller than this band was observed when the PstI fragment probe was hybridized to RNA from brain tissue. Therefore, both the primer extension product and the ribonuclease-protected fragment are 40-50 nt shorter at the 5' end than expected from the CsnEW cDNA sequence. This would place the tsp within 10 nt upstream from the AUG start codon for the signal peptide. An AUG codon is present 43 nt downstream from the CCAUGG start codon and in the same reading frame. Translation from this AUG would give a primary translation product with an eight-aa N-terminal extension. This may have implications for the cellular localization of the Csn protein, since a large portion of the signal peptide would not be synthesized. Immunolocalization with antibody to CsnEW placed Csn in the cytoplasm of cultured muscle cells, where it was associated with a filamentous network (Bird et al., 1985). However, in view of the results obtained from the primer extension experiments, it is unlikely that two proteins with different N termini are produced. If there were more than one tsp, this should have been evident in the primer extension analysis. Furthermore, previous in vitro translation experiments showed a single protein product from oviduct CsnEW mRNA whose size was consistent with the presence of a 23-aa signal sequence (Colella et al., 1989). Several conjectures can be made regarding the discrepancy between the 5' end of the CsnEW mRNA and that of its cDNA. It is possible that the structure of the 5' end of CsnEW mRNA isolated from the various tissues differs from that of RNA synthesized from the CsnEW cDNA, and that this difference hides it from the analyses done in this study. However, this does not explain how the longer CsnEW cDNA was generated during synthesis of the chicken oviduct library. The fact that the cDNA and the CsnEW gene have identical 5' sequences rules out the possibility of a cloning artefact. Housekeeping genes that lack TATA boxes and have G+C-rich 5' flanking regions often initiate transcription at multiple tsp (Jones et al., 1988). Multiple tsp were found for the chicken lysozyme gene (Grez et al., 1981) and the human kininogen gene (Kitamura et al., 1985). The yeast invertase gene encodes two RNAs that differ at their 5' ends (Carlson and Botstein, 1982). The larger invertase RNA contains coding sequences for a signal peptide, thereby producing a secreted protein. The smaller RNA begins within the coding region of the signal peptide, resulting in the production of the intracellular invertase protein. The choice of the invertase tsp is regulated by glucose. The CsnEW gene may use more than one tsp; whether this affects the synthesis of a signal peptide deserves further study.

Fig. 5. RNase protection assay of the 3' end of CsnEW RNA isolated from chicken brain and of RNA synthesized from the CsnEW cDNA (C2a and C10b in Fig. 4). Probe 3'H (Fig. 4) was hybridized overnight at 42°C to total RNA in 80% formamide/100 mM Na citrate/300 mM Na acetate, pH 6.4/1 mM EDTA. After hybridization, single-stranded RNA was digested with RNase A/RNase T1 according to the manufacturer's specifications (Ambion, Austin, TX, USA). The protected fragments were separated on an 8% urea/5% polyacrylamide gel. As a control, unlabeled sense-strand RNAs were also synthesized from the CsnEW cDNA: one for each cDNA, utilizing either of the two polyadenylation signals in the 3' end (Fig. 4). These unlabeled RNAs were also hybridized to the labeled HincII probe. Lanes: (1 and 2) yeast RNA; (3) antisense RNA synthesized from Csn cDNA clone C2a; (4) sense-strand RNA synthesized from Csn cDNA clone C2a; (5) sense-strand RNA synthesized from Csn cDNA clone C10b; (6 and 7) brain RNA. Samples in lanes 2-6 were digested with 2.5 u RNase A/200 u RNase T1. The sample in lane 7 was digested with 5 u RNase A/200 u RNase T1. The sample in lane 1 was not digested and shows the 375-nt HincII probe. The numbers on the left refer to RNA markers (Promega, Madison, WI, USA) that were cut off the gel and stained with ethidium bromide.

Fig. 6. Determination of the tsp of CsnEW mRNA. (A) and (B) Primer extension products of the 5' end of CsnEW mRNA from young immature chickens (A) and egg-laying hens (B). Total RNA was isolated by the acid guanidine thiocyanate-phenol chloroform method of Chomczynski and Sacchi (1987). The RNA (30-40 µg) was hybridized to a 17-mer CsnEW oligo probe labeled at the 5' end with [γ-32P]ATP and polynucleotide kinase using standard procedures. The sequence of the oligo probe is 5'-CTCGTCGTTCTCATCTA; it is complementary to nt 168-184 of the CsnEW cDNA. The primer was extended with 1 u AMV reverse transcriptase according to the manufacturer's specifications (Promega, Madison, WI, USA), and the extended product was analyzed on an 8% urea/5% polyacrylamide gel. (A) Primer-extended products from: lane 2, kanamycin-positive control RNA (Promega); lane 3, CsnEW RNA synthesized from cDNA clone CYS-13 (Fig. 4); lanes 4-6, RNA isolated from heart, liver and brain, respectively, of young immature chickens. (B) Primer extension products of RNA isolated from egg-laying hen oviduct (lane 2), brain (lane 3), heart (lane 4) and lung (lane 5); CsnEW RNA synthesized from cDNA clone CYS-13 (lane 6); and labeled primer incubated without RNA (lane 7). Lane 1 in both panels contains φX174 double-stranded DNA HinfI markers. (C) RNase protection analysis of CsnEW mRNA. A labeled 659-nt probe (5'P, Fig. 4) was hybridized to chicken brain RNA according to the manufacturer's specifications (Ambion, Austin, TX, USA). After hybridization, all samples except those in lanes 1 and 4 were digested with 2.5 u RNase A/200 u RNase T1. Lanes: (1) 300-nt control mouse β-actin RNA probe; (2) β-actin probe hybridized to yeast RNA; (3) β-actin probe hybridized to chicken brain RNA; (4) probe 5'P; (5) 5'P hybridized to yeast RNA; (6) 5'P hybridized to RNA synthesized from CsnEW cDNA CYS-13; (7) 5'P hybridized to chicken brain RNA.

(d) Conclusions
(1) A chicken CsnEW gene of approximately 2.4 kb was isolated. It is composed of three exons and two introns whose boundaries correspond exactly with those of the genes of other members of the Csn superfamily.
(2) The high G+C content (73%) and the presence of two SP1-binding sites in the 500 nt of the 5' flanking region suggest that CsnEW is a housekeeping gene. Northern analysis showed the CsnEW mRNA to be present in all tissues examined, being most abundant in lung tissue and least abundant in liver and spleen.
(3) RNase protection and primer extension analyses of CsnEW mRNA indicated a single tsp, which differed from the 5' end of the original CsnEW cDNA; the cDNA was 40-50 nt longer. This discrepancy may result from the use of multiple tsp.
(4) Two polyadenylation signals were identified. Mapping of the 3' end of CsnEW mRNA from brain tissue indicated that both polyadenylation signals were used in this tissue.
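Conclusion (3) rests on two pieces of simple sequence arithmetic, sketched below in Python as an illustration rather than as the authors' procedure. First, the 17-mer primer given in the Fig. 6 legend should be the reverse complement of the sense-strand stretch spanning the end of the Val14 codon and the codons for aa 15-19 (GTA GAT GAG AAC GAC GAG in the coding sequence of Fig. 2). Second, if a primer extension product or an RNase-protected fragment is shorter than expected by some number of nt, the mRNA 5' end lies that many nt downstream of the cDNA 5' end. The helper names are hypothetical, and the calculation assumes the cDNA is numbered from nt 1 at its 5' end.

```python
# Illustrative sketch: primer check and tsp arithmetic for the 5'-end mapping.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sense strand: last 2 nt of the Val14 codon + codons for Asp15-Glu19 (Fig. 2).
SENSE_TARGET = "TA" + "GAT" + "GAG" + "AAC" + "GAC" + "GAG"
PRIMER = "CTCGTCGTTCTCATCTA"  # 5'->3', as given in the Fig. 6 legend


def implied_tsp(expected_len: int, observed_len: int, cdna_start: int = 1) -> int:
    """cDNA coordinate of the mRNA 5' end implied by a product that is
    `expected_len - observed_len` nt shorter than predicted from the cDNA."""
    return cdna_start + (expected_len - observed_len)


if __name__ == "__main__":
    print(reverse_complement(SENSE_TARGET) == PRIMER, len(PRIMER))  # True 17
    # Expected lengths from the text: 185 nt (primer extension) and 285 nt
    # (RNase protection, 5'P probe); tissue RNA gave products 40-50 nt shorter.
    for assay, expected in (("primer extension", 185), ("RNase protection", 285)):
        lo = implied_tsp(expected, expected - 40)
        hi = implied_tsp(expected, expected - 50)
        print(f"{assay}: tsp near cDNA nt {lo}-{hi}")
```

Both assays point to the same region of the cDNA (about nt 41-51), which is the agreement the discussion of the tsp relies on.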
ACKNOWLEDGEMENTS
The authors thank Dr. Jeffrey Robbins for his generosity in providing the chicken genomic library, Dr. Fritz Hilton for his help with the figures, and Drs. Thomas Geoghegan, George Mower and John Wible for their careful review of the manuscript.
BIBLIOGRAPHY
Abrahamson, M., Grubb, A., Olafsson, I. and Lundwall, A.: Molecular cloning and sequence analysis of cDNA coding for the precursor of the human cysteine proteinase inhibitor cystatin C. FEBS Lett. 216 (1987) 229-233.
Abrahamson, M., Olafsson, I., Palsdottir, A., Ulvsback, M., Lundwall, A., Jensson, O. and Grubb, A.: Structure and expression of the human cystatin C gene. Biochem. J. 268 (1990) 287-294.
Anastasi, A., Brown, M.A., Kembhavi, A.A., Nicklin, M.J.H., Sayer, C.A., Sunter, D.C. and Barrett, A.J.: Cystatin, a protein inhibitor of cysteine proteinases. Biochem. J. 211 (1983) 129-138.
Barrett, A.J., Fritz, H., Grubb, A., Isemura, S., Jarvinen, M., Katunuma, N., Machleidt, W., Muller-Esterl, W., Sasaki, M. and Turk, V.: Nomenclature and classification of the proteins homologous with the cysteine proteinase inhibitor chicken cystatin. Biochem. J. 236 (1986) 312.
Bird, A.P.: CpG-rich islands and the function of DNA methylation. Nature 321 (1986) 209-213.
Bird, J.W.C., Wood, L., Sohar, I., Fekete, E., Colella, R., Yorke, G., Consentino, B. and Roisen, F.J.: Localization of cysteine proteinase and an endogenous cysteine proteinase inhibitor in cultured muscle cells. Biochem. Soc. Trans. 13 (1985) 1018-1021.
Carlson, M. and Botstein, D.: Two differentially regulated mRNAs with different 5' ends encode secreted and intracellular forms of yeast invertase. Cell 28 (1982) 145-154.
Chirgwin, J.M., Przybyla, E.A., MacDonald, R.J. and Rutter, W.J.: Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18 (1979) 5294-5299.
Chomczynski, P. and Sacchi, N.: Single step method of RNA isolation by acid guanidine thiocyanate-phenol chloroform extraction. Anal. Biochem. 162 (1987) 156-159.
Colella, R., Sakaguchi, Y., Nagase, H. and Bird, J.W.C.: Chicken egg white cystatin: molecular cloning, nucleotide sequence, and tissue distribution. J. Biol. Chem. 264 (1989) 17164-17169.
Colella, R., Johnson, A. and Bird, J.W.C.: Steady-state cystatin mRNA levels in chicken tissues in response to estrogen. Biomed. Biochim. Acta 50 (1991) 607-611.
Elzanowski, A., Barker, W.C., Hunt, L.T. and Seibel-Ross, E.: Cystatin domains in α2-HS-glycoprotein and fetuin. FEBS Lett. 227 (1988) 167-170.
Freije, J.P., Abrahamson, M., Olafsson, I., Velasco, G., Grubb, A. and Lopez-Otin, C.: Structure and expression of the gene encoding cystatin D, a novel human cysteine proteinase inhibitor. J. Biol. Chem. 266 (1991) 20538-20543.
Grez, M., Land, H., Giesecke, K. and Schutz, G.: Multiple mRNAs are generated from the chicken lysozyme gene. Cell 25 (1981) 743-752.
Jones, N.C., Rigby, P.W.J. and Ziff, E.B.: Trans-acting protein factors and the regulation of eukaryotic transcription: lessons from studies on DNA tumor viruses. Genes Develop. 2 (1988) 267-281.
Kitamura, N., Kitagawa, H., Fukushima, D., Takagaki, Y., Miyata, T. and Nakanishi, S.: Structural organization of the human kininogen gene and a model for its evolution. J. Biol. Chem. 260 (1985) 8610-8617.
Koide, T.: Human histidine-rich glycoprotein gene: evidence for evolutionary relatedness to cystatin supergene family. Thromb. Res. SVIII (1988) 91-97.
McKnight, G.S. and Palmiter, R.D.: Transcriptional regulation of the ovalbumin and conalbumin genes by steroid hormones in the chick oviduct. J. Biol. Chem. 254 (1979) 9050-9058.
Mount, S.: A catalogue of splice junction sequences. Nucleic Acids Res. 10 (1982) 459-472.
O'Malley, B.W., McGuire, W.L., Kohler, P.O. and Korenman, S.G.: Studies on the mechanism of steroid hormone regulation of synthesis of specific proteins. Recent Prog. Horm. Res. 25 (1969) 105-160.
Rawlings, N.D. and Barrett, A.J.: Evolution of proteins of the cystatin superfamily. J. Mol. Evol. 30 (1990) 60-71.
Saitoh, E., Kim, H.S., Smithies, O. and Maeda, N.: Human cysteine proteinase inhibitors: nucleotide sequence analysis of three members of the cystatin gene family. Gene 61 (1987) 329-338.
Salvesen, G., Parkes, C., Abrahamson, M., Grubb, A. and Barrett, A.J.: Human low-Mr kininogen contains three copies of a cystatin sequence that are divergent in structure and in inhibitory activity for cysteine proteinases. Biochem. J. 234 (1986) 429-434.
Sambrook, J., Fritsch, E.F. and Maniatis, T.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Schwabe, C., Anastasi, A., Crow, H., McDonald, J.K. and Barrett, A.J.: Cystatin: amino acid sequence and possible secondary structure. Biochem. J. 217 (1984) 813-817.
Sen, L.C. and Whitaker, J.R.: Some properties of a ficin-papain inhibitor from avian egg white. Arch. Biochem. Biophys. 158 (1973) 623-632.
Shaw, P.A., Bouka, T., Woodin, A., Schacter, B.S. and Cox, J.L.: Expression and induction by β-adrenergic agonists of the cystatin S gene in submandibular glands of developing rats. Biochem. J. 265 (1990) 115-120.