Differentiation
Differentiation (1986) 33:61-68
© Springer-Verlag 1986
Comparison of mouse and human keratin 18: A component of intermediate filaments expressed prior to implantation
Robert G. Oshima*, José Luis Millán, and Grace Ceceña
Cancer Research Center, La Jolla Cancer Research Foundation, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA
Abstract. Keratin 18 is a type-I keratin that is found in a variety of simple epithelial tissues. In mice, the corresponding protein, called Endo B, is expressed at the 4- to 8-cell stage of mouse development and may be one of the first intermediate-filament proteins synthesized after fertilization. A cDNA clone for keratin 18, designated pK18, was isolated from a human placental cDNA library by hybridization with the mouse Endo-B probe. It was characterized by hybridization selection of RNA, translation, immunoprecipitation, Northern blotting, and sequence analysis. Synthetic T7 polymerase transcripts of the cDNA were indistinguishable in size from keratin-18 mRNA, suggesting that pK18 represents a full-length copy of the RNA. The cDNA insert is 1,428 nucleotides long and contains a single open reading frame of 1,342 nucleotides coding for 429 amino acids. The deduced amino acid sequence is 89.7% identical with that of Endo B. The only extensive difference between the two sequences is due to 9 additional amino acids being present in the last half of the N-terminal domain of keratin 18. The 38-nucleotide-long 3' noncoding region of the cDNA is 75% identical with the corresponding portion of Endo B. The 5' noncoding regions are 59% identical. The expression of keratin-18 mRNA was found to vary more than tenfold when HeLa cells and BeWo trophoblastic cells were compared.
* To whom offprint requests should be sent
Introduction
Intermediate-filament proteins are encoded by a large family of genes that is further subdivided into at least three smaller families of related sequences [19, 47, 54]. The first two groups are the type-I and -II keratins. Other intermediate-filament proteins (vimentin, desmin, glial fibrillary acidic protein, and three neurofilament proteins) have been considered to comprise the third group [47, 54], although recent information concerning the divergent gene structure of neurofilaments may necessitate further diversification of this classification [31]. Approximately 20 different type-I and -II keratins have been identified, sometimes in complex combinations, in various human tissues [37, 49]. At least one of each of the two types of keratins appears to be necessary for filament formation [18, 48]. Keratin 18 belongs to the type-I class of intermediate filaments. It is expressed in simple epithelial tissues that are usually, but not exclusively, of endodermal origin [37]. The persistent expression of tissue-specific intermediate-filament proteins may be useful for the pathological identification of human tumors and cell types [5, 55, 58]. In cells of liver origin, keratin 18 appears to be the only type-I keratin expressed in both humans and mice [16, 17]. This has allowed the identification of a murine extraembryonic endodermal cytoskeletal protein B (Endo B [38, 39] or cytokeratin D [16, 17]) as the murine equivalent of keratin 18 [52]. Endo B and extraembryonic cytoskeletal protein A (Endo A [39]), the type-II keratin with which Endo B polymerizes, appear to be the first keratins that are synthesized during murine development. Endo B and Endo A are synthesized as early as the 4- to 8-cell stage and are later located in the trophoblast, extraembryonic endoderm, and a variety of simple epithelial cell types [7, 15, 24, 27, 40]. Human trophoblast cells express keratin 18 [12, 23]; however, human embryonal-carcinoma cells, unlike murine embryonal-carcinoma cells and the inner cell masses of mouse embryos, also express keratin-related proteins [11]. Human embryonal-carcinoma cells appear to represent an embryonic cell type which is equivalent to or even of a developmentally earlier stage than murine embryonal-carcinoma cells [2, 4]. Evidence available to date suggests that the differentiation in vitro of both murine and human embryonal-carcinoma cells faithfully mimics early developmental cellular transitions [3, 22]. As first steps in comparing the developmental regulatory mechanisms controlling the expression of keratin 18 and Endo B, we have cloned and characterized a cDNA coding for Endo B [46], and we now report the characterization of a full-length cDNA for keratin 18. Recently, the sequence of a cDNA coding for approximately one-half of the keratin-18 mRNA has been reported [44].
Our results basically confirm that sequence and, in addition, provide the complete N-terminal and 5' noncoding sequences.
Methods
The human hepatoma cell line, Hep G2/93, was provided by Barbara Knowles [1]. The characteristics of the human trophoblast cell line, BeWo, have been described elsewhere [42]. The construction of the cDNA library, prepared in λgt11 from human placental RNA, has been described previously [35]. Approximately 125,000 plaques were screened by hybridization with an RNA probe corresponding to a 688 base-pair (bp) internal fragment of mouse Endo-B
cDNA [46]. The internal fragment was excised from the Endo-B cDNA clone by digestion with EcoRI and Hind III, and was subcloned into the SP-64 plasmid (Promega Biotec, Madison, Wis, USA). The probe was prepared by transcription of the EcoRI-digested plasmid DNA with SP6 polymerase as described by Melton et al. [34]. After hybridization, the filters were washed in 15 mM NaCl, 1.5 mM sodium citrate (pH 7.0), and 0.1% sodium dodecyl sulfate (SDS) at 55° C. One isolate, which, on subsequent Southern analysis, hybridized with probes specific for both the 5' and 3' portions of mouse Endo-B cDNA, was analyzed further. The human cDNA insert was excised from the phage DNA by digestion with EcoRI and was subcloned into the pGEM-1 vector (Promega Biotec), which contains SP6 and T7 promoters flanking a polylinker derived from pUC12. An orientation which produced sense mRNA from the T7 promoter was designated pK18. The keratin-18 cDNA was subcloned into the EcoRI site of M13 mp18 in both orientations, and ordered deletions were created according to the method of Henikoff [21] after digestion with Sph I, Xba I, and exonuclease III. The DNA sequence was determined according to the method of Sanger et al. [45] using 35S-dATP [6] and 0.4- to 0.8-mm wedge gels. DNA sequence data were managed using the GEL program of the BIONET computer system (IntelliGenetics, Mountain View, Calif., USA). The final sequence was compiled from a total of 70 separate sequencing reactions, representing 25 isolates for one strand and 24 for the opposite strand. Complete overlap was achieved in both directions. Alignment of the Endo-B and keratin-18 DNA sequences was performed using the ALIGN operation of the IFIND program with a word size of 2 and a gap penalty of 6. This program is based on the method of Wilbur and Lipman [56]. The relatively high gap penalty was used so that the DNA alignments were consistent with the obvious protein homology.
Protein alignments were performed manually according to the previously published alignments of Endo B with other intermediate-filament proteins [46] and were confirmed using the ALIGN program. Synthetic keratin-18 and Endo-B mRNAs were prepared by T7 or SP6 polymerase transcription of 5 µg linear plasmid DNA in a reaction containing 0.5 mM m7GpppG (Pharmacia, Piscataway, NJ, USA) and 0.1 µCi 32P-UTP, as described in the Promega Biotec catalog. In order to obtain full-length transcripts of the Endo-B cDNA, the cDNA insert of pUC9B7 [46] was excised by digestion with Bam HI and partial digestion with Hind III. The fragment was then inserted into the pGEM-1 vector. Hybridization selection was performed as described previously [46], except that the filters were washed finally four times in 10 mM Tris-HCl and 1 mM ethylenediaminetetraacetate (EDTA), pH 7.5, at 55° C. The methods used for RNA preparation, translation, and immunoprecipitation have been described elsewhere [46, 50, 52]. RNAs were resolved by agarose gel electrophoresis after denaturation with either glyoxal [8] or formaldehyde [33], transferred to nitrocellulose [51], and probed with nick-translated pK18 plasmid DNA [33]. Immunoprecipitated keratin 18 and Endo B were reacted with N-chlorosuccinimide as previously described [52], except that the concentration of the reagent was increased to as much as 50 mM and the temperature of reaction was increased to 37° C in an attempt to achieve more complete cleavage.
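The DNA alignment step described in the Methods can be illustrated with a minimal sketch. The code below is not the Wilbur and Lipman method used by the IFIND program; it substitutes a plain Needleman-Wunsch global alignment with a linear gap penalty. The function names, the match and mismatch scores, and the toy sequences are assumptions chosen for illustration, and the gap value of 6 is not on the same scale as the IFIND parameter. The sketch shows how a relatively high gap penalty discourages gaps, and how percent identity can be computed over matched (non-gap) positions only.

```python
def global_align(a, b, match=1, mismatch=-1, gap=6):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Illustrative stand-in only; the paper used the Wilbur-Lipman-based
    ALIGN operation of IFIND, which is not reproduced here.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] - gap,
                              score[i][j - 1] - gap)
    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no matched positions to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Toy sequences (hypothetical, not keratin-18 or Endo-B data).
top, bottom = global_align("ACGTACGT", "ACGTCGT")
print(top)
print(bottom)
print(percent_identity(top, bottom))
```

Counting identity over matched positions without penalizing gaps corresponds to the way the noncoding-region identities are reported in the Results; a stricter measure would divide by the full alignment length instead.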
Fig. 1. Identification of pK18 by hybridization selection, translation, and immunoprecipitation. Nitrocellulose filters loaded with 15 µg denatured pK18 or pGEM-1 plasmid DNA were hybridized with either 55 µg/ml poly-A+ RNA isolated from the BeWo trophoblast cell line or 365 µg/ml total human placental RNA. The retained RNA was eluted, translated in the presence of 35S-methionine, and subjected to immunoprecipitation with Endo-B antiserum. 1, HeLa cytoskeleton; 2, BeWo cytoskeleton; 3, 4, translation products of BeWo RNA retained by pGEM-1 and pK18 DNA, respectively; 5, 6, immunoprecipitates of the samples analyzed in lanes 3 and 4, respectively; 7, 8, translation products of placental RNA retained by pGEM-1 and pK18 DNA, respectively; 9, 10, immunoprecipitates of the samples analyzed in lanes 7 and 8, respectively; 11, 12, translation products of 1.8 µg BeWo poly-A+ RNA and 6.6 µg total placental RNA, respectively; 13, 14, immunoprecipitates of the samples analyzed in lanes 11 and 12, respectively. All samples were run on the same gel; however, differences in exposure time were necessary for different lanes. The exposure times were as follows: lanes 1-4, 11, 12, 18 h; lanes 5-8, 13, 10 days; lanes 9, 10, 14, 30 days. The arrow on the right indicates the position of keratin 18
Results
A candidate cDNA clone coding for keratin 18 was identified by screening a human placental cDNA library prepared in λgt11 with a mouse Endo-B probe. Figure 1 shows evidence for the identity of this clone (designated pK18 after subcloning) obtained by hybridization selection, followed
Fig. 2a, b. Keratin-18 RNA analysis. a RNAs synthesized from the linear pK18 plasmid by T7 polymerase or isolated from cells were resolved in a 1% agarose gel containing formaldehyde, blotted onto nitrocellulose, and hybridized with nick-translated pK18 DNA. Lanes 1-4 contained 2 µg tRNA and the indicated amounts of synthetic pK18 RNA: 1, 90 ng; 2, 9 ng; 3, 0.9 ng; 4, 0.09 ng. Lane 5, 2 µg poly-A+ RNA from mouse F9 cells that had been induced for 4 days with retinoic acid. Lane 6, 2 µg poly-A+ RNA from BeWo cells. The positions of the migration of denatured, Hind-III-digested λ DNA are shown on the right. b Twenty-microgram aliquots of total RNA isolated from the following: 1, BeWo cells; 2, the human hepatoma cell line, Hep G2/93; 3, HeLa cells, were denatured with glyoxal, resolved in an agarose gel, blotted, and probed with nick-translated pK18
by translation and immunoprecipitation. The hybridization of RNA from either the BeWo trophoblast cell line or human placenta with immobilized pK18 DNA resulted in the retention of RNAs that were translated into single proteins. These proteins co-migrate with a major component of the cytoskeleton of BeWo cells and HeLa-cell cytoskeletons (Fig. 1, lanes 4 and 8). Both proteins were recognized by antiserum to mouse Endo B, which has previously been shown to immunoprecipitate keratin 18 (Fig. 1, lanes 6 and 10 [38, 39, 52]). The translation products of the pK18 hybrid-selected RNAs were indistinguishable from keratin 18 precipitated directly from the translation products of nonselected RNAs (Fig. 1, lanes 13 and 14). The less intense signal obtained from placental RNA was due to the smaller amount of available mRNA. The polyadenylated fraction of human placental RNA represents approximately 1% of the total RNA. The signal seen in lane 14 of Fig. 1 represents the keratin-18 product of less than 4% of the poly-A+ RNA used in lane 13 but exposed for three times longer. This should also be borne in mind when considering the results shown in lanes 7-10 of Fig. 1. Preliminary Southern-blot analysis of pK18 indicated that the cDNA insert hybridized to probes specific to both the 5' and 3' ends of mouse Endo B (data not shown). In order to determine the size of the cDNA insert and
the relative abundance of the keratin-18 mRNA in several cell types, RNA blot analysis was performed on cellular RNAs and synthetic mRNA prepared by T7 polymerase transcription of the pK18 plasmid. T7 polymerase transcription of the pK18 insert (Fig. 2a, lanes 1-4) resulted in RNAs whose size was indistinguishable from that of the keratin-18 mRNA found in BeWo cells (Fig. 2a, lane 6; Fig. 2b, lane 1). The size of the keratin-18 mRNA was estimated to be 1.5 kilobases (kb). These results indicate that the pK18 insert is a full-length copy of the keratin-18 mRNA. In addition, the varying amounts of T7 polymerase transcripts allowed the abundance of the keratin-18 mRNA in BeWo cells to be estimated as representing approximately 1% of the polyadenylated RNA. mRNA of the same size was detected in human hepatoma cells and HeLa cells (Fig. 2b, lanes 2 and 3). However, the abundance of keratin-18 mRNA is much lower in HeLa cells, with intermediate levels being found in the hepatoma cells. The expression of keratin-18 mRNA was found to vary by more than tenfold in different cell lines. The DNA sequence of the keratin-18 cDNA is shown in Fig. 3. The sequence is 1,428 nucleotides in length, including a poly-A tail of 16 residues. A single open reading frame begins with the ATG at nucleotide 53 and ends with the termination codon TAA at nucleotide 1,342. This choice of reading frame results in a 5' noncoding region of 54 nucleotides and a 3' noncoding region of 38 nucleotides. The two alternative reading frames contain 15 and 17 stop codons. The putative ATG translational start is in a favorable context (AGCATGA) for a translational start [29]. The consensus polyadenylation signal, AATAAA [28], is found at nucleotide 1,384 embedded in the conserved sequence, CCAATAAAAGTT, and 24 nucleotides before the start of the short poly-A tail. The keratin-18 sequence contains 58% G+C. The G+C-rich regions are concentrated in the first 300 nucleotides, with an average of 68% G+C.
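The reading-frame and base-composition checks reported above can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, the toy sequence is not the keratin-18 cDNA, and the sketch assumes the three stop codons of the standard genetic code (TAA, TAG, TGA). It counts stop codons in each of the three forward reading frames, which is how an open frame is distinguished from the two alternatives, and computes the percentage of G+C.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def stops_per_frame(seq):
    """Count stop codons in each of the three forward reading frames."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        # Step through complete codons only, starting at the frame offset.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts


def gc_percent(seq):
    """Percentage of G and C residues in a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Toy sequence (hypothetical): ATG GCC TAA GTG A
toy = "ATGGCCTAAGTGA"
print(stops_per_frame(toy))   # frames 0, 1, 2
print(round(gc_percent(toy), 1))
```

Applied to a full cDNA, a frame with a single terminal stop codon and alternative frames with many internal stops would be consistent with the choice of reading frame described in the text.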
The coding region of pK18 is 85.3% identical with the Endo-B coding sequence. Alignments of the 5' and 3' noncoding regions of keratin 18 and Endo B are shown in Fig. 4. The 5' noncoding leader is 59% identical to murine Endo B over matched nucleotides and without penalties for two gaps necessary to align the sequences. The possible conservation of a short region enriched in C and T near nucleotide 40 may be of significance, because a similar region is found in the 5' leader of Endo-A mRNA [53]. The 3' noncoding region is conserved to a much greater extent (75%), including sequences immediately downstream of the putative translational stop codon and the putative polyadenylation signal. The deduced amino acid sequence of the open reading frame of pK18 results in a protein of 429 residues with a calculated molecular mass of 47,873 daltons, while that of mouse Endo B is a protein of 422 residues with a molecular mass of 47,400 daltons [46]. The deduced amino acid sequences of keratin 18 and Endo B are compared in Fig. 5. The sequences are divided into the general domain structure of all intermediate-filament proteins [19, 20, 54], with the generally nonhelical head and tail domains flanking the conserved central, predominantly α-helical, rod domain. The heptad repeat, which is characteristic of the coiled-coil structure found in intermediate filaments, is conserved in keratin 18. The central helical domain is interrupted by two small spacer or linker regions. The two sequences were easily aligned, with only two gaps that were located near the
Fig. 3. DNA sequence of the keratin-18 cDNA and the predicted amino acid sequence. The DNA sequence starts with the first base after the EcoRI linker (GAATTCGGG) added to the double-stranded cDNA, and ends with the last base before the second linker. The deduced amino acid sequence is shown beneath the DNA sequence. The DNA sequence is 1,428 nucleotides coding for 429 amino acids
Fig. 4. Comparison of the 5' and 3' noncoding regions of keratin-18 and Endo-B mRNAs. The 5' noncoding sequences start with the first nucleotides after the linker or poly-C tracts used in cloning, and end with the putative translational start codon. The 3' noncoding regions start with the putative translational stop codons, and end with the first nucleotide before the poly-A tails. Matched nucleotides are indicated with asterisks. The putative polyadenylation signal of keratin 18 starts at nucleotide 1,384
N-terminus and in the first spacer domain of keratin 18. Two gaps in the Endo-B sequence were necessary to accommodate optimally the nine additional amino acids found in the latter half of the head domain of keratin 18. These nine amino acids are the most extensive difference between the two sequences. The head domain is the least conserved in the two sequences but is still 83% identical. Over the entire molecule, keratin 18 and Endo B are 89.9% identical and 92.3% homologous. The tail domain is remarkably conserved (97%; 38 out of 39 residues). Analysis of aligned
Fig. 5. Comparison of the deduced protein sequence of keratin 18 and mouse Endo B. Protein sequences are displayed in one-letter code, with the keratin-18 sequence above the corresponding portion of the Endo-B sequence. Gaps introduced to optimize the alignment are designated with dashes. Identical residues are designated with asterisks, and conserved residues are indicated by plus signs. The heptad repeat is shown by the placement of dots immediately above the keratin-18 sequence. Exclamation marks designate reversals or rotations of the suggested coiled structure. The head, coil, spacer, and tail domains of the proteins are indicated above the two sequences. Residues conserved in all intermediate-filament protein sequences analyzed so far are designated with dashes and vertical arrowheads below the aligned sequences. Vertical arrowheads designate those charged residues conserved in all intermediate-filament proteins that are found at positions expected for nonpolar amino acids
representatives of each of the intermediate-filament proteins [46] resulted in the identification of 47 conserved residues. Additional analysis of two Xenopus type-I keratins [25, 57] resulted in a revised number of 45. Of the 45 conserved residues found in all intermediate filaments, 44 are conserved in keratin 18. In addition, four charged residues are conserved in keratin 18 at positions in the heptad repeat normally occupied by nonpolar residues [46]. The nucleotide and deduced amino acid sequences are in general agreement with the recently reported 3' half of a keratin-18 cDNA derived from a human bladder carcinoma cell line [44]. However, 12 nucleotides are different in the corresponding regions of the two sequences. Of these different nucleotides, 7 result in 4 amino acid differences. Nucleotides 655 and 787 result in glutamic acid and alanine residues instead of glutamine and serine, respectively. Nucleotides 976-978 and 985 and 986 result in aspartic acid and serine residues instead of two arginine residues. All of the charged residues proposed for the positions in question are conserved in the mouse Endo-B sequence. In addition, comparison of our primary data with additional data provided by W.W. Franke and V. Romano (personal communication) has confirmed these assignments. Five other single nucleotide differences are either silent or are located in the 3' noncoding region; these are located at nucleotides 771, 1,041, 1,356, 1,357, and 1,665. Finally, the sequence we determined contains 9 additional nucleotides at the extreme 3' end, i.e., immediately preceding the putative poly-A tail. The predicted molecular mass of keratin 18 is 473 daltons (7 amino acids) greater than that of Endo B. However, on SDS-polyacrylamide gel electrophoresis, keratin 18 migrates faster than Endo B [38, 39]. In order to test the accuracy of the sequence data and to confirm the apparently anomalous relative migration behavior of the two proteins, the keratin-18 and Endo-B cDNAs were transcribed into capped mRNAs, and the proteins coded for by the synthetic mRNAs were compared before and after cleavage with N-chlorosuccinimide. N-chlorosuccinimide cleaves proteins specifically at tryptophan residues [32]. The deduced protein sequence of Endo B predicted a total of three tryptophans, one of which is found in the latter half of the head domain within the region that differs from keratin 18. Only two tryptophans were predicted for keratin 18. Figure 6a shows a comparison of the synthetic Endo-B and keratin-18 mRNAs. The T7 polymerase transcript of pK18 was expected to be 1,500 nucleotides in length. The SP6 polymerase transcript of Endo B was expected to be 1,492 nucleotides in length. Agarose gel electrophoresis revealed that the respective polymerase transcripts were mainly of full length and indistinguishable in size. However, translation of the two mRNAs into protein resulted in major products that differed with respect to their relative mobilities (Fig. 6b, lanes 2-6). Additional discrete smaller products may have been the result of partial proteolysis or premature termination. Reaction of the immunoprecipitated proteins with N-chlorosuccinimide resulted in the expected cleavage products of both proteins (Fig. 6c). Peptide 7 of Endo B (Fig. 6b, lane 1) was obtained at a very low yield, as previously reported [46]. The corresponding peptide of keratin 18 (Fig. 6b, lane 2; peptide 5) had a slightly greater but still low yield. We found no evidence for a tryptophan in the head domain of keratin 18. The tryptophan in the head region of Endo B is responsible for peptide 9 in lane 1 of Fig. 6b and an additional peptide which did not contain any methionine residues and was thus not detected. These results confirm the placement of tryptophan residues in the deduced amino acid sequence of keratin 18. The anomalous faster migration of keratin 18 as compared
Fig. 6a-c. Confirmation of the placement of tryptophan residues in the deduced sequence of keratin 18. a Endo-B and keratin-18 cDNAs in pGEM-1 plasmids were digested with Bam HI and Hind III, respectively, and were then purified and transcribed in vitro with SP6 and T7 polymerases, respectively, in the presence of 0.1 µCi 32P-CTP, all four nucleotide triphosphates, and m7GpppG. After digestion with DNase I and organic solvent extraction, aliquots of the two RNAs were subjected to agarose gel electrophoresis in the presence of formaldehyde. The sizes of denatured DNA markers (in kilobase pairs) are shown on the right. Lane 1, Endo-B SP6 polymerase RNA transcripts; lane 2, pK18 T7 polymerase transcripts. b Aliquots of the RNAs shown in a were translated in a reticulocyte lysate system in the presence of 35S-methionine, immunoprecipitated with Endo-B antibodies, reacted with N-chlorosuccinimide, and analyzed on an 18% acrylamide gel in the presence of SDS. Molecular-mass markers (in kilodaltons) are indicated on the right. TRANS, aliquots of the total translation reaction; PPT, aliquots of the immunoprecipitated proteins; NCS, products of reaction of the immunoprecipitated proteins with N-chlorosuccinimide; K, keratin-18 products; B, Endo-B products. The numbers in lanes 1 and 2 indicate the peptides represented in c. Peptide 7 of lane 1 may not be visible after reproduction. c Diagrammatic representation of the positions of tryptophan residues in the two proteins and the expected cleavage products, and a summary of the results. Arrows indicate the predicted positions of tryptophan residues. Peptides are identified by their number. The sizes of the complete cleavage products are indicated below the respective uncleaved molecule (peptide 1). The number of predicted methionine residues is indicated in parentheses
Predicted peptide sizes listed in c (in kilodaltons): keratin 18, peptides 1-6: 47.9, 34.7, 30.1, 17.9, 16.8, 13.3; Endo B, peptides 1-10: 47.4, 41.8, 34.7, 29.2, 23.7, 18.0, 16.6, 12.7, 7.2, 5.5
to that of Endo B may be due to sequence differences between the two proteins in the latter half of the head domain. Calculation of the hydrophobicity of the keratin-18 sequence [30] suggests that the area of the keratin-18 head domain not found in Endo B extends a hydrophobic subdomain (data not shown). This might result in the binding of additional SDS and, consequently, faster migration. However, the disproportionate effect of even single amino acid changes upon electrophoretic behavior is also well documented [13, 14].
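The logic of the N-chlorosuccinimide mapping described above can be sketched as follows. This is a hedged illustration, not the authors' procedure: the function names and the toy sequence are hypothetical, and the sketch assumes that cleavage occurs on the C-terminal side of each tryptophan (W). A protein with k tryptophans gives k + 1 complete cleavage products, and partial cleavage can give every contiguous run of those products, (k + 1)(k + 2)/2 species in all. That count is consistent with the six peptides listed for keratin 18 (two tryptophans) and the ten listed for Endo B (three tryptophans) in Fig. 6c. Because the products were labeled with 35S-methionine, a fragment lacking methionine (M) would not be detected.

```python
def ncs_fragments(protein):
    """Complete cleavage products, assuming cleavage after each tryptophan."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "W":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def partial_products(fragments):
    """All contiguous runs of fragments, including the uncleaved protein."""
    products = []
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments) + 1):
            products.append("".join(fragments[i:j]))
    return products


def detectable(fragment):
    """A 35S-methionine-labeled fragment is visible only if it contains Met."""
    return "M" in fragment


# Toy sequence with two tryptophans (hypothetical, not keratin 18).
toy = "MSFTWAAKMWGGG"
pieces = ncs_fragments(toy)
print(pieces)
print(len(partial_products(pieces)))
print([detectable(p) for p in pieces])
```

In the toy example the last fragment contains no methionine, which mirrors the undetected methionine-free peptide of Endo B mentioned in the text.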
Discussion
Any attempt to claim the unambiguous identification of a keratin-18 cDNA requires considerable caution, because there appear to be 15-20 genes in the human genome that are homologous to Endo B [52]. These sequences probably represent both pseudogenes and at least one active gene. The proposal that pK18 codes for keratin 18 is supported by several experimental results. Endo B has previously been identified as the mouse equivalent of keratin 18 [17, 52], and the cDNA for Endo B hybridizes to pK18 very strongly. Antibodies specific for Endo B recognize the translation products of trophoblast and placental RNA selected by hybridization with pK18 (Fig. 1) and RNA transcribed directly from pK18 in vitro (Fig. 2). Hepatoma-cell mRNA homologous to pK18 is the same size as that found in HeLa cells and BeWo trophoblastic cells; only a single-sized species has been found. In addition, the apparent molecular masses of the protein products of pK18 mRNA from placenta, BeWo, HeLa, and human hepatoma cells are the same as those of the proteins immunoprecipitated
from cells (unpublished data). The deduced amino acid sequence of pK18 is sufficiently related to Endo B to permit the conclusion that pK18 is the human equivalent of Endo B. We conclude that pK18 codes for keratin 18, and we have found no evidence of more than one gene product in different human cells that express the protein. The differences between the sequence presented here and the sequence previously reported for the 3' half of a keratin-18 cDNA derived from a different source [44] may reflect both allelic differences and technical ambiguities. Most of the differences are single nucleotide changes, and all of these differences in the coding regions have been resolved (W.W. Franke and V. Romano, personal communication). Intermediate-filament proteins have a very conserved primary structure consisting of variable head and tail domains flanking a central conserved, α-helical, rod domain [41, 47, 54]. The central rod domain of about 310 residues is characterized by the heptad repeat expected for coiled-coil structures, and it is further divided by two or three spacer or linker regions. The most substantial region of divergence between keratin 18 and Endo B appears to be the result of an insertion of 9 amino acids near the end of the head domain of keratin 18 or the deletion of the same area from Endo B. The conservation of the remaining portion of this domain suggests that the distance of the conserved residues of the head from the remaining coil regions is not critical for function. In a previous comparison of the Endo-B sequence with other intermediate-filament proteins, the lack of conservation of sequence and the variability in length of the first linker region were noted [46]. The only gap necessary to align the keratin-18 and Endo-B sequences in the rod domain is located in this linker.
Keratin 18 conserves the four charged residues found in all intermediate filaments (including Xenopus type-I keratins) at positions of the heptad repeat normally occupied by nonpolar residues.

In SDS-polyacrylamide gel electrophoresis, keratin 18 migrates faster than Endo B even though keratin 18 is larger. This anomalous electrophoretic behavior is surprising considering the similarity of the sequences. However, a similar discrepancy between the electrophoretic behavior of keratins 14 and 17 has been reported [43]. The anomalous relative electrophoretic behavior of the two proteins may be attributable to the well-documented disproportionate effect of even single amino acid substitutions upon electrophoretic migration [13, 14].

At the nucleotide level, the sequences of keratin 18 and Endo B are, as expected, extremely conserved over the coding portions of the mRNA. However, the 5’ noncoding regions are only modestly conserved (59%). The 3’ noncoding regions of keratin 18 and Endo B are highly conserved (75%), as previously reported in a comparison of the bovine and human forms of an epidermal type-I keratin [26]. The conservation of the 3’ noncoding regions of different members of the same gene family is well documented [9, 10, 36], but its significance is as yet unclear. The availability of probes for both Endo B and now keratin 18 should allow future comparisons of the regulation of these molecules during the early development of mice and humans.

Acknowledgements. This work was supported by grants (R01 CA33946 and R01 CA42302; Cancer Center Support Grant, P30 CA 30199) awarded by the National Cancer Institute, Department of Health and Human Services, and by a grant to BIONET from the National Institutes of Health (U41 RR 01685). We thank Mr. Kenneth Browne, who helped with the sequencing reactions during a summer fellowship sponsored by the California Foundation, and Diana Lowe for typing the manuscript. In addition, we thank Drs. W.W. Franke and V. Romano of the German Cancer Research Center, Heidelberg, for communicating to us their unpublished data.
References
1. Aden DP, Fogel A, Plotkin S, Damjanov I, Knowles BB (1979) Controlled synthesis of HBsAg in a differentiated human liver carcinoma-derived cell line. Nature 282:615-616
2. Andrews PW (1984) Retinoic acid induces neuronal differentiation of a cloned human embryonal carcinoma cell line in vitro. Dev Biol 103:285-293
3. Andrews PW, Goodfellow PN, Damjanov I (1983) Human teratocarcinoma cells in culture. Cancer Surv 2:42-73
4. Andrews PW, Damjanov I, Simon D, Banting GS, Carlin C, Cracopoli NC, Fogh J (1984) Pluripotent embryonal carcinoma clones derived from the human teratocarcinoma cell line Tera-2. Lab Invest 50:147-161
5. Bannasch P, Zerban H, Schmid E, Franke WW (1980) Liver tumors distinguished by immunofluorescence microscopy with antibodies to proteins of intermediate-sized filaments. Proc Natl Acad Sci USA 77:4948-4952
6. Biggin D, Gibson TJ, Hong GF (1983) Buffer gradient gels and 35S label as an aid to rapid sequence determination. Proc Natl Acad Sci USA 80:3963-3965
7. Brulet P, Babinet C, Kemler R, Jacob F (1980) Monoclonal antibodies against trophectoderm-specific markers during mouse blastocyst formation. Proc Natl Acad Sci USA 77:4113-4117
8. Carmichael GG, McMaster GK (1980) The analysis of nucleic acids in gels using glyoxal and acridine orange. Methods Enzymol 65:386-391
9. Cleveland DW, Lopata MA, MacDonald RJ, Cowan NJ, Rutter WJ, Kirschner MW (1980) Number and evolutionary conservation of α- and β-tubulin and cytoplasmic β- and γ-actin genes using specific cloned cDNA probes. Cell 20:95-105
10. Cowan NJ, Dobner PR, Fuchs EV, Cleveland DW (1983) Expression of human α-tubulin genes: Interspecies conservation of 3’ untranslated regions. Mol Cell Biol 3:1738-1745
11. Damjanov I, Clark RK, Andrews PW (1985) Expression of keratin polypeptides in human embryonal carcinoma cells. In: Wang E, Fischman D, Liem RKH, Sun TT (eds) Intermediate filaments. Academy of Sciences, New York, pp 732-733
12. Debus E, Weber K, Osborn M (1982) Monoclonal cytokeratin antibodies that distinguish simple from stratified squamous epithelia: Characterization of human tissues. EMBO J 1:1641-1647
13. de Jong WW, Zweers A, Cohen LH (1978) Influence of single amino acid substitutions on electrophoretic mobility of sodium dodecyl sulfate-protein complexes. Biochem Biophys Res Commun 82:532-539
14. Der CJ, Cooper GM (1983) Altered gene products are associated with activation of cellular rasK genes in human lung and colon carcinomas. Cell 32:201-208
15. Duprey P, Morello D, Vasseur M, Babinet C, Condamine H, Brulet P, Jacob F (1985) Expression of the cytokeratin endo A gene during early mouse embryogenesis. Proc Natl Acad Sci USA 82:8535-8539
16. Franke WW, Denk H, Kalt R, Schmid E (1981) Biochemical and immunological identification of cytokeratin proteins present in hepatocytes of mammalian liver tissue. Exp Cell Res 131:299-318
17. Franke WW, Schiller DL, Moll R, Winter S, Schmid E, Engelbrecht I (1981) Diversity of cytokeratins. J Mol Biol 153:933-959
18. Franke WW, Schiller DL, Hatzfeld M, Winter S (1983) Protein complexes of intermediate-sized filaments: Melting of cytokeratin complexes in urea reveals different polypeptide separation characteristics. Proc Natl Acad Sci USA 80:7113-7117
19. Geisler N, Weber K (1982) The amino acid sequence of chicken muscle desmin provides a common structural model for intermediate filament proteins. EMBO J 1:1649-1656
20. Hanukoglu I, Fuchs E (1983) The cDNA sequence of a type II cytoskeletal keratin reveals constant and variable structural domains among keratins. Cell 33:915-924
21. Henikoff S (1984) Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28:351-359
22. Hogan BLM, Barlow DP, Tilly R (1983) F9 teratocarcinoma cells as a model for the differentiation of parietal and visceral endoderm in the mouse embryo. Cancer Surv 2:115-140
23. Izhar M, Siebert PD, Trevor K, Oshima RG, Fukuda MN (1986) Trophoblastic differentiation of human teratocarcinoma cell line HT-H. Dev Biol 116:510-518
24. Jackson BW, Grund C, Schmid E, Burke K, Franke W, Illmensee K (1980) Formation of cytoskeletal elements during mouse embryogenesis. Intermediate filaments of the cytokeratin type and desmosomes in preimplantation embryos. Differentiation 17:161-179
25. Jonas E, Sargent TD, Dawid IB (1985) Epidermal keratin gene expressed in embryos of Xenopus laevis. Proc Natl Acad Sci USA 82:5413-5417
26. Jorcano JL, Rieger M, Franz JK, Schiller DL, Moll R, Franke WW (1984) Identification of two types of keratin polypeptides within the acidic cytokeratin subfamily I. J Mol Biol 179:257-281
27. Kemler R, Brulet P, Schnebelen MT, Gaillard J, Jacob F (1981) Reactivity of monoclonal antibodies against intermediate filament proteins during embryonic development. J Embryol Exp Morphol 64:45-50
28. Kozak M (1983) Comparison of initiation of protein synthesis in procaryotes, eucaryotes and organelles. Microbiol Rev 47:1-45
29. Kozak M (1986) Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44:283-292
30. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157:105-132
31. Lewis SA, Cowan NJ (1986) Anomalous placement of introns in a member of the intermediate filament multigene family: An evolutionary conundrum. Mol Cell Biol 6:1529-1534
32. Lischwe MA, Sung MT (1977) Use of N-chlorosuccinimide/urea for the selective cleavage of tryptophanyl peptide bonds in proteins. J Biol Chem 252:4976-4980
33. Meinkoth J, Wahl G (1984) Hybridization of nucleic acids immobilized on solid supports. Anal Biochem 138:267-284
34. Melton DA, Krieg PA, Rebagliati MR, Maniatis T, Zinn K, Green MR (1984) Efficient in vitro synthesis of biologically active RNA and RNA hybridization probes from plasmids containing a bacteriophage SP6 promoter. Nucleic Acids Res 12:7035-7055
35. Millan JL (1986) Molecular cloning and sequence analysis of human placental alkaline phosphatase. J Biol Chem 261:3112-3115
36. Minty AJ, Alonso S, Caravatti M, Buckingham ME (1982) A fetal skeletal muscle actin mRNA in the mouse and its identity with cardiac actin mRNA. Cell 30:185-192
37. Moll R, Franke WW, Schiller DL, Geiger B, Krepler R (1982) The catalog of human cytokeratins: Patterns of expression in normal epithelia, tumors, and cultured cells. Cell 31:11-24
38. Oshima RG (1981) Identification and immunoprecipitation of cytoskeletal proteins from murine extra-embryonic endodermal cells. J Biol Chem 256:8124-8133
39. Oshima RG (1982) Developmental expression of murine extraembryonic endodermal cytoskeletal proteins. J Biol Chem 257:3414-3421
40. Oshima RG, Howe WE, Klier FG, Adamson ED, Shevinsky LH (1983) Intermediate filament protein synthesis in preimplantation murine embryos. Dev Biol 99:447-455
41. Parry DAD, Fraser RDB (1985) Intermediate filament structure. I. Analysis of IF protein sequence data. Int J Biol Macromol 7:203-213
42. Patillo RA, Gey GO (1968) The establishment of a cell line of human hormone-synthesizing trophoblastic cells in vitro. Cancer Res 28:1231-1236
43. RayChaudhury A, Marchuk D, Lindhurst M, Fuchs E (1986) Three tightly linked genes encoding human type I keratins: Conservation of sequence in the 5’-untranslated leader and 5’-upstream regions of co-expressed keratin genes. Mol Cell Biol 6:539-548
44. Romano V, Hatzfeld M, Magin TM, Zimbelmann R, Franke WW, Maier G, Ponstingl H (1986) Cytokeratin expression in simple epithelia. Identification of mRNA coding for human cytokeratin no. 18 by a cDNA clone. Differentiation 30:244-253
45. Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467
46. Singer PA, Trevor K, Oshima RG (1986) Molecular cloning and characterization of the Endo B cytokeratin expressed in preimplantation mouse embryos. J Biol Chem 261:538-547
47. Steinert PM, Parry DAD (1985) Intermediate filaments: Conformity and diversity of expression and structure. In: Palade GE, Alberts BM, Spudich JA (eds) Annual review of cell biology. Annual Reviews, Palo Alto, pp 41-65
48. Steinert PM, Idler WW, Zimmerman SB (1976) Self-assembly of bovine keratin filaments in vitro. J Mol Biol 108:547-567
49. Sun T-T, Eichner R, Schermer A, Cooper D, Nelson WG, Weiss RA (1984) Classification, expression, and possible mechanisms of evolution of mammalian epithelial keratins: A unifying model. In: Levine A, Topp W, Van de Woude G, Watson JD (eds) The cancer cell. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, pp 169-176
50. Tabor JM, Oshima RG (1982) Identification of mRNA species that code for extra-embryonic endodermal cytoskeletal proteins in differentiated derivatives of murine embryonal carcinoma cells. J Biol Chem 257:8771-8774
51. Thomas PS (1983) Hybridization of denatured RNA transferred or dotted to nitrocellulose paper. Methods Enzymol 100:255-266
52. Trevor K, Oshima RG (1985) Preimplantation mouse embryos and liver express the same type I keratin gene product. J Biol Chem 260:15885-15891
53. Vasseur M, Duprey P, Brulet P, Jacob F (1985) One gene and one pseudogene for the cytokeratin Endo A. Proc Natl Acad Sci USA 82:1155-1159
54. Weber K, Geisler N (1985) Intermediate filaments: Structural conservation and divergence. In: Wang E, Fischman D, Liem RKH, Sun TT (eds) Intermediate filaments. Academy of Sciences, New York, pp 126-143
55. Weber K, Osborn M, Moll R, Wiklund B, Luning B (1984) Tissue polypeptide antigen (TPA) is related to the non-epidermal keratins 8, 18 and 19 typical of simple and non-squamous epithelia: Re-evaluation of a human tumor marker. EMBO J 3:2707-2714
56. Wilbur WJ, Lipman DJ (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA 80:726-730
57. Winkles JA, Sargent TD, Parry DAD, Jonas E, Dawid IB (1985) Developmentally regulated cytokeratin gene in Xenopus laevis. Mol Cell Biol 5:2575-2581
58. Woodcock-Mitchell J, Eichner R, Nelson WG, Sun T-T (1982) Immunolocalization of keratin polypeptides in human epidermis using monoclonal antibodies. J Cell Biol 95:580-588

Received August 1986 / Accepted in revised form September 30, 1986