Cell, Vol. 39, 267-274, December 1984 (Part 1), Copyright © 1984 by MIT 0092-8674/84/120267-08 $02.00/0
The Human Transferrin Receptor Gene: Genomic Organization, and the Complete Primary Structure of the Receptor Deduced from a cDNA Sequence
Alan McClelland,* Lukas C. Kuhn,† and Frank H. Ruddle*
*Yale University, Department of Biology, New Haven, Connecticut 06511
†Institut Suisse de Recherches sur le Cancer, ISREC, Chemin des Boveresses, CH-1066 Epalinges, Switzerland
Summary

Heteroduplex analysis shows that the transferrin receptor gene contains at least 19 distinct coding sequences distributed over 31 kb of genomic DNA. The nucleotide sequence of these coding regions has been determined from a cDNA clone. The sequence contains a single complete open reading frame of 2280 bases which specifies a 760 residue polypeptide with a molecular weight of 85K daltons. The deduced amino acid sequence of the receptor shows that it does not contain an N-terminal hydrophobic signal peptide. We have found a single region of sufficient length and hydrophobicity to span the membrane, located 61 amino acids from the N-terminus. This leads to the prediction that the receptor is oriented in the membrane with a cytoplasmic N-terminus and an extracellular C-terminus. The receptor has no significant homology with transferrin, or with any receptor for which a sequence is available.

Introduction

Transferrin receptor mediates cellular iron uptake via internalization and recycling of the iron-carrying serum protein transferrin (reviewed by Newman et al., 1982). Binding of iron-loaded transferrin to the receptor leads to a rapid loss of the complex from the cell surface (Klausner et al., 1984) and the appearance of transferrin and the receptor within an acidic intracellular compartment where iron is dissociated from transferrin (Dautry-Varsat et al., 1983). Ligand and receptor remain together within a vesicle which forms part of the exocytic pathway that recycles both molecules to the cell surface (Yamashiro et al., 1984). The receptor is an integral membrane glycoprotein composed of two identical subunits of 95,000 daltons which are linked by a disulphide bridge to form a dimer (Schneider et al., 1981; Omary and Trowbridge, 1981).
Expression of the receptor on the cell surface correlates with cellular proliferation, being highest on rapidly dividing cells and much lower on resting cells and most terminally differentiated cell types (Larrick and Cresswell, 1979). Transferrin is a required component of culture medium to support growth of cells in vitro (Barnes and Sato, 1980), suggesting that an active iron transport system is necessary for cell proliferation. In support of this idea, monoclonal antibodies to the receptor that block transferrin binding will also lead to an arrest in cell division and an accumulation of cells in S phase (Trowbridge and Lopez, 1982). Furthermore, in mitogen-stimulated T lymphocytes, induction of transferrin receptor expression is required before cells can initiate DNA synthesis (Neckers and Cossman, 1983). It therefore appears likely that in addition to its role in delivering iron, the transferrin receptor is involved in the regulation of cell growth.
Recent studies on internalization of receptor/ligand complexes suggest that while there may be a general pathway of receptor-mediated endocytosis, there is a complex intracellular sorting of various receptors and ligands to their respective fates (Ciechanover et al., 1983; Geuze et al., 1984; Yamashiro et al., 1984). Comparisons of the amino acid sequences of receptors might be expected to reveal structural features that reflect common functional domains (Brown et al., 1983). Alternatively, the signals that direct internalization and intracellular routing of receptors may be unrecognizable at the level of primary structure. In this case in vitro manipulation of cloned receptor genes may prove to be the most promising approach to obtaining a detailed understanding of receptor recycling.
We have recently described the isolation of the genomic sequences that encode the human transferrin receptor (Kuhn et al., 1984). In this report we present a physical map of the gene based on heteroduplex analysis between these genomic sequences and a cDNA clone derived from human fibroblasts. We also describe a cDNA sequence from which the complete amino acid sequence of the transferrin receptor can be deduced. The sequence indicates that transferrin receptor lacks an N-terminal signal peptide and predicts that it is oriented in the membrane with the N-terminus inside the cell. In this respect the receptor resembles the asialoglycoprotein receptor (Drickamer and Mamon, 1982; Drickamer et al., 1984) and differs from the LDL receptor (Russell et al., 1984) and EGF receptor (Ullrich et al., 1984).
No obvious similarity at the amino acid sequence level has yet emerged from the available receptor sequences.

Results

The Transferrin Receptor Gene Contains 19 Exons
Transfection experiments with cloned genomic DNA have defined a 31 kb fragment of human DNA that contains all of the coding information for the transferrin receptor (Kuhn et al., 1984). A cDNA clone designated pCDTR-1 was isolated by hybridization to a probe from the 5’ end of this genomic fragment and contains an insert of 4.9 kb. Sequences that hybridize to the genomic DNA coding for the receptor are confined to a 2.8 kb fragment from the 5’ end of the cDNA. The remaining 2.1 kb consists of 3’ untranslated sequence located immediately downstream of the end of the cloned genomic DNA, most probably within a single exon (Kuhn et al., 1984). In order to determine the exon-intron structure of the human transferrin receptor gene, we hybridized the cDNA clone pCDTR-1 to the genomic sequences coding for the receptor. The heteroduplexes formed were examined in
the electron microscope, and the lengths of hybridized DNA and unhybridized genomic loops were measured. The genomic fragments used were plasmid subclones constructed from the phage lambda clones described previously (Kuhn et al., 1984). They correspond to the 5’ end 6.7 kb Eco RI-Bam HI fragment, the central 12.8 kb Bam HI fragment, and the 13.2 kb Bam HI-Sal I fragment at the 3’ end (Figure 2). As shown in Figure 1, the three genomic fragments contain a minimum of 1, 10, and 8 exons, respectively. Exons of less than 50 bp may not have been detected in this analysis. The orientation of the central Bam HI fragment was determined by heteroduplex analysis of genomic segments which span the Bam HI sites (data not shown). The sum of exon and intron sizes, determined by comparison to the φX174 standard, slightly exceeded the expected values. This discrepancy was 1.10-fold for all exons and 1.05-fold for introns in the central Bam HI fragment, and 1.18-fold for the introns in the Bam HI-Sal I fragment. The values of exon and intron sizes given in Table 1 have accordingly been corrected by these factors. Taking into account these corrections, the positions of restriction sites in the cDNA coincide with the corresponding sites present in exons on the genomic DNA. A summary of the gene structure determined from these measurements is depicted in Figure 2.
In addition to heteroduplex formation, we observed a frequent and reproducible hybridization of introns to themselves, or to sequences in adjacent introns (Figure 1, introns a, b, c, e, f, m, n, p, q, and r). This duplex formation within genomic DNA is probably due to the presence of homologous inverted repeats. In some instances these structures prevented duplex formation with the cDNA (Figure 1C, introns e and f; Figure 1E, introns n, o, and p), thus giving rise to larger loop structures and leaving part of the cDNA single stranded. The locations of these repeats, which range in size from 70 bp to 250 bp, are indicated in Figure 2.

Determination of the mRNA Sequence Coding for Transferrin Receptor
The results presented above confirm our previous finding that the transferrin receptor coding sequences are contained within a 5’ 2.8 kb fragment of the cDNA which represents 60% of an unusually long mRNA. We have determined the nucleotide sequence of this fragment using the strategy outlined in Figure 3. The sequence contains an open reading frame of 2280 nucleotides which extends from position 96 to 2375 (Figure 4). There are two stop codons in phase upstream of this open reading frame and multiple stops in all three phases in the sequence downstream. The 760 amino acid residues specified by this sequence comprise a polypeptide with a molecular weight of 84,910 daltons, approximately 5000 daltons larger than the apparent molecular weight of the primary translation product of transferrin receptor mRNA (Schneider et al., 1983). The sequence of genomic DNA at the 5’ end of the 31 kb fragment which expresses the receptor was determined and found to overlap with the 5’ end of the cDNA (data not shown). An additional 90 nucleotides are present upstream of this overlap and define the limit of sequences at the 5’ end of the gene that are required for expression of the human receptor in mouse L cells. There are no initiation codons in any phase upstream of the two consecutive ATG triplets at nucleotides 96-101. The nucleotides flanking these ATG codons match the consensus sequences ANNAUGG and ANNAUGA, which are the two most common translation initiation sequences in eucaryotic mRNAs (Kozak, 1983). We have assigned the first ATG as coding for the N-terminal Met residue of the receptor,

Figure 1. Electron Micrographs of Heteroduplexes Formed by Hybridization of the Transferrin Receptor cDNA Clone pCDTR-1 with Subcloned Fragments of Genomic DNA
(A) 5’ end 6.7 kb Eco RI-Bam HI fragment; (B) and (C) central 12.8 kb Bam HI fragment; (D) and (E) 3’ end 13.2 kb Bam HI-Sal I fragment. The cDNA was either excised from the pCD vector as a 5.1 kb Bam HI fragment (B and C), or subcloned in pAT153 with the same orientation as the genomic segments and linearized with Sal I prior to hybridization (A, D, and E). In (A) the location of the first exon is indicated by a 1, and the part of the first intron contained in the 6.7 kb Eco RI-Bam HI fragment by a’. The location of the first exon and intron were determined with respect to the single-stranded loop of genomic DNA which represents 5’ flanking sequences extending from the upstream Eco RI site. The dashed line denotes cDNA insert DNA, the narrow continuous line represents single-stranded genomic DNA, and the heavy continuous line is duplex DNA. Open arrows indicate the approximate limits of vector sequences, and closed arrows point out double-stranded DNA formed by hybridization within genomic DNA. Letters designate introns. Magnification: 50,000X.

Table 1. Sizes of Exons and Introns in the Human Transferrin Receptor Gene

Exon   Size             Intron   Size
1      81 ± 11 (19)     a        4760 ± 235 (8)
2      107 ± 21 (10)    b        1515 ± 100 (6)
3      172 ± 15 (9)     c        1160 ± 80 (9)
4      182 ± 23 (16)    d        1830 ± 170 (14)
5      144 ± 15 (15)    e        480 ± 50 (7)
6      148 ± 15 (7)     f        1660 ± 220 (7)
7      97 ± 17 (12)     g        1550 ± 160 (13)
8      108 ± 18 (15)    h        355 ± 50 (16)
9      148 ± 21 (15)    i        1870 ± 90 (14)
10     158 ± 20 (14)    j        1045 ± 60 (15)
11     122 ± 25 (8)     k        1430 ± 120 (8)
12     77 ± 13 (20)     l        135 ± 15 (19)
13     72 ± 13 (20)     m        2395 ± 155 (20)
14     87 ± 17 (18)     n        1685 ± 120 (10)
15     67 ± 11 (16)     o        155 ± 15 (11)
16     72 ± 11 (12)     p        2925 ± 145 (12)
17     198 ± 11 (12)    q        1525 ± 135 (10)
18     179 ± 21 (12)    r        1425 ± 85 (9)
19     665 ± 27 (11)

The exon-intron structure was determined by heteroduplex analysis. Sizes are expressed in base pairs ± standard deviation. In parentheses: the number of independent determinations.
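The sizes in Table 1 can be cross-checked against the figures quoted in the text. The following is a minimal sketch, assuming the mean values as read from Table 1 (standard deviations are ignored, and exon 19 is counted only as far as it was mapped); the function name `summarize` is introduced here purely for illustration.

```python
# Mean sizes in base pairs, transcribed from Table 1 (standard deviations omitted).
# Exon 19 is included only up to the Sal I site, as mapped in the heteroduplex analysis.
EXON_SIZES = [81, 107, 172, 182, 144, 148, 97, 108, 148, 158,
              122, 77, 72, 87, 67, 72, 198, 179, 665]
INTRON_SIZES = [4760, 1515, 1160, 1830, 480, 1660, 1550, 355, 1870,
                1045, 1430, 135, 2395, 1685, 155, 2925, 1525, 1425]


def summarize(exons, introns):
    """Return total exon, intron, and combined lengths for one gene map."""
    exon_total = sum(exons)
    intron_total = sum(introns)
    return {
        "exon_total_bp": exon_total,
        "intron_total_bp": intron_total,
        "gene_span_bp": exon_total + intron_total,
        # True when every exon except the last is shorter than 200 bp.
        "internal_exons_under_200bp": all(size < 200 for size in exons[:-1]),
    }


summary = summarize(EXON_SIZES, INTRON_SIZES)
print(summary)
```

Under these assumptions the exons sum to roughly 2.9 kb, close to the 2.8 kb hybridizing portion of the cDNA, and exons plus introns sum to roughly 31 kb, in line with the size quoted for the expressing genomic fragment.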
although it is possible that initiation may occur at either or both positions.

Transferrin Receptor Lacks a Signal Peptide and Has an Unusual Transmembrane Orientation
Many secreted and transmembrane proteins are synthesized with an N-terminal region of 15-30 predominantly hydrophobic amino acids which is required for translocation of the protein across the membrane of the endoplasmic reticulum (Sabatini et al., 1982). This signal sequence is subsequently cleaved to reveal the mature N-terminus of the protein. The deduced amino acid sequence of the transferrin receptor (Figure 4) indicates that it is not synthesized with a transient signal peptide, there being only 13 hydrophobic and seven charged residues within the first 30 amino acids. This finding explains the observation that incubation of transferrin receptor synthesized in vitro with dog pancreas microsomal membranes produces no detectable shift in its molecular weight (Schneider et al., 1983). The sequence of the receptor was examined using a computer-generated plot of hydropathic index in order to locate the likely membrane spanning region (Figure 5). A large hydrophobic peak is apparent between residues 62 and 89 which is of sufficient length and hydrophobicity to span the membrane. This region contains 14 uncharged residues and 14 hydrophobic amino acids, most of which are clustered between positions 71 and 83 (11 of 12 residues hydrophobic). No other part of the sequence appears likely to cross the membrane, supporting the conclusion that the receptor crosses the lipid bilayer only once.
An additional feature that emerges from the hydropathy plot is that one of the most hydrophilic parts of the protein immediately precedes the transmembrane segment. This area contains the very basic sequence lys-pro-lys-arg and could correspond to a cytoplasmic anchor (Sabatini et al., 1982). Since this sequence precedes rather than follows the hydrophobic segment, this would predict that the N-terminus is on the cytoplasmic side of the membrane. Furthermore, since the external domain of the receptor accounts for at least 70,000 daltons of its molecular weight (Schneider et al., 1981), the location of the transmembrane domain close to the N-terminus also supports the conclusion that the molecule is oriented with the C-terminus outside the cell and the N-terminal segment forming a cytoplasmic domain of 61 amino acids (Figure 6). This agrees well with the estimated molecular weight of the cytoplasmic tail of 5000 daltons (Schneider et al., 1981).
Transferrin receptor contains an extracellular trypsin-sensitive cleavage site which releases a monomeric 70K polypeptide. A likely candidate for this site is the sequence lys-arg-lys at positions 128-130. Cleavage at the first and second positions in this sequence would produce carboxy terminal polypeptides of 632 and 631 residues with an unglycosylated molecular weight of approximately 70,000 daltons. The location of this putative trypsin cleavage site is therefore consistent with the carboxy terminus being extracellular.
The sequence contains a total of eight cysteine residues, four of which are clustered in and around the transmembrane region. One or more of these cysteine residues most likely represents the site of disulfide bonding between subunits, since the locations of the remaining more C-terminal residues do not allow for the fact that the 70K tryptic fragment is not a dimer. There are five potential sites for asparagine-linked glycosylation in the molecule, three of which are in the proposed extracellular portion. Since the mature receptor has been estimated to contain

Figure 2. Exons and Introns in the Human Transferrin Receptor Gene
Restriction sites are B, Bam HI; E, Eco RI; and S, Sal I. Small arrows indicate the location of repetitive elements in introns. The numbers designate exons. Exon 19 is indicated by an open box, since it has only been mapped as far as the Sal I site (see text).

Figure 3. Strategy for Determining the Transferrin Receptor Sequence
The restriction map of the 4.9 kb insert in pCDTR-1 is shown at the top of the figure. Sequence analysis was confined to the 2.8 kb fragment from the 5’ end of the cDNA to the single Bgl II site, which is shown in an expanded scale. The broad arrows indicate sequences that were determined from clones constructed using the restriction sites indicated on the map. Thin arrows denote sequence information derived from a pool of clones constructed from partial Alu I and partial Sau 3A digests of the 2.9 kb Bam HI-Bgl II fragment.

Figure 4. Nucleotide and Deduced Amino Acid Sequence of the Human Transferrin Receptor
The sequence of 2826 nucleotides from the 5’ end of the pCDTR-1 insert. Numbers on the right refer to nucleotide position from the beginning of the cDNA. Amino acids are numbered on the left, with position 1 being assigned to the proposed initiator Met residue. The putative transmembrane region is indicated by the solid bar below the sequence. Potential asparagine-linked glycosylation sites in the portion of the molecule believed to be extracellular are shown by interrupted bars below the sequence. The asterisks mark the residues that are likely to represent the trypsin-sensitive cleavage site. The eight cysteine residues are indicated by black circles above the sequence.
[Figure 4: nucleotide sequence of the 5’ 2826 nucleotides of the pCDTR-1 insert with the deduced 760 residue amino acid sequence; sequence listing not reproduced.]
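The coordinates quoted in the text for the open reading frame can be checked with simple arithmetic. This is a minimal sketch, assuming only the values stated in the text (ORF from nucleotide 96 to 2375, a deduced molecular weight of 84,910 daltons); the function name `orf_summary` is hypothetical and introduced only for illustration.

```python
# Values taken from the text; nothing here is derived from the sequence itself.
ORF_START = 96        # first nucleotide of the first ATG
ORF_END = 2375        # last nucleotide of the open reading frame
REPORTED_MW = 84910   # daltons, deduced molecular weight of the polypeptide


def orf_summary(start, end, molecular_weight):
    """Derive ORF length, codon count, and mean residue mass from the coordinates."""
    length = end - start + 1              # inclusive nucleotide coordinates
    codons, remainder = divmod(length, 3)
    return {
        "orf_nt": length,
        "codons": codons,
        "in_frame": remainder == 0,       # length must be a multiple of three
        "mean_residue_mass": molecular_weight / codons,
    }


result = orf_summary(ORF_START, ORF_END, REPORTED_MW)
print(result)
```

The coordinates give 2280 nucleotides and 760 codons, as stated, and the mean residue mass comes out near 112 daltons, a typical value for an average protein.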
Figure 5. Hydropathy Plot of the Transferrin Receptor Amino Acid Sequence
The graph represents a plot of the relative hydropathy values for each of the 760 residues of transferrin receptor, determined using the program HYDROPLOT, with a window size of 15. More hydrophobic values lie above the line and hydrophilic values below it. The numbers below the plot denote amino acid numbers measured from the N-terminus. The scale on the left denotes hydropathic index (Kyte and Doolittle, 1982).
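A hydropathy profile of the kind shown in Figure 5 can be approximated with a sliding-window average over the Kyte and Doolittle (1982) scale. The sketch below is not the HYDROPLOT program used here; it is a minimal reimplementation of the windowing idea, and the function name `hydropathy_profile` and the toy sequence are assumptions made for illustration (the toy sequence is not the receptor sequence).

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=15):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical toy sequence: a basic stretch, a 15-residue hydrophobic core,
# and a hydrophilic tail. It only illustrates how a membrane-spanning
# candidate shows up as a peak in the profile.
TOY_SEQUENCE = "DEKRKPKR" + "ILVALFGIVLAIVFL" + "DSKNEQRT"

profile = hydropathy_profile(TOY_SEQUENCE, window=15)
peak_start = profile.index(max(profile))
print(peak_start, round(max(profile), 2))
```

In this toy case the highest window coincides with the hydrophobic core, which is the same reasoning applied in the text to the peak between residues 62 and 89.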
Figure 6. A Model for the Transmembrane Orientation of the Receptor
The receptor is shown crossing the cell membrane, with an N-terminal cytoplasmic domain (NH2; 61 amino acids) and a C-terminal extracellular polypeptide (COOH; 671 amino acids), as predicted from the sequence data. The unshaded region represents the hydrophobic transmembrane domain. Also shown are the positions of potential glycosylation sites and the likely location of the disulfide link between receptor subunits.
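The domain sizes in the model follow directly from the residue numbers given in the text. The sketch below is a simple consistency check, assuming a 760 residue polypeptide, a transmembrane segment spanning residues 62-89, and trypsin cleavage after residue 128 or 129; the function names are hypothetical.

```python
TOTAL_RESIDUES = 760
TM_START, TM_END = 62, 89   # putative transmembrane segment (inclusive)


def domain_lengths(total, tm_start, tm_end):
    """Split a single-pass membrane protein into its three domains."""
    return {
        "cytoplasmic": tm_start - 1,            # residues before the segment
        "transmembrane": tm_end - tm_start + 1,
        "extracellular": total - tm_end,        # residues after the segment
    }


def c_terminal_fragment(total, cleave_after):
    """Length of the C-terminal fragment released by cleavage after a residue."""
    return total - cleave_after


domains = domain_lengths(TOTAL_RESIDUES, TM_START, TM_END)
fragments = [c_terminal_fragment(TOTAL_RESIDUES, pos) for pos in (128, 129)]
print(domains, fragments)
```

The three domains add up to the full 760 residues, and both candidate tryptic fragments lie entirely within the proposed extracellular portion.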
three N-linked carbohydrate chains (Schneider et al., 1981), it is likely that all three of these sites are used in vivo.

Discussion

The transferrin receptor mRNA is derived from 19 separate coding sequences distributed over 33 kb of genomic DNA. With the exception of the exon at the 3’ end, these coding sequences are each less than 200 bp long. An alignment between the cDNA sequence and exon sizes predicts that the first exon represents 5’ untranslated sequence, and that the protein coding region of the mRNA begins in exon 2 and ends in exon 19. The remainder of exon 19 represents 3’ untranslated sequences which extend for a further 2.0 kb beyond the end of the cloned genomic sequences used in this analysis (Kuhn et al., 1984). Direct sequence analysis of the genomic DNA is now required in order to relate the primary structure of the receptor to the gene organization.
The amino acid sequence of transferrin receptor deduced from the corresponding mRNA sequence predicts a molecule of 84,910 daltons composed of 760 amino acids, somewhat larger than the apparent molecular weight of the primary translation product on SDS gels (Schneider et al., 1983). A surprising feature of the sequence is the absence of an N-terminal hydrophobic leader peptide, which is believed to be required for correct membrane insertion of the majority of secreted and integral membrane proteins (Sabatini et al., 1982). The transferrin receptor is therefore one of a small number of transmembrane proteins that are known to lack an N-terminal signal peptide. These include the rat and chicken asialoglycoprotein receptors (Drickamer and Mamon, 1982; Drickamer et al., 1984) and an HLA-DR antigen (Strubin et al., 1984). Interestingly, both the asialoglycoprotein receptor and the HLA-DR antigen invariant chain sequences indicate that these molecules are oriented with the N-terminus on the cytoplasmic side of the membrane. The primary structure of transferrin receptor is most consistent with a model which predicts a cytoplasmic N-terminal domain of 61 amino acids and an extracellular polypeptide of 671 residues (Figure 6). Thus there appears to be a correlation between the absence of an N-terminal signal peptide and this unusual transmembrane orientation.
A model for the cotranslational membrane insertion of proteins that lack a cleavable N-terminal signal sequence has been proposed by Sabatini et al. (1982). The authors propose that some proteins destined for membrane insertion may contain an internal signal sequence close to the N-terminus, which directs extrusion of the distal C-terminal portion of the polypeptide through the membrane. In the absence of a subsequent “halt transfer” signal the final result would be a protein that traverses the membrane once, with a cytoplasmic N-terminal domain, and the C-terminus outside the cell.
The structural features of the transferrin receptor predicted from our sequence data are in good agreement with this model.
In the evolutionary scheme for transmembrane and secreted proteins proposed by Sabatini et al. (1982), the earliest form of integral membrane proteins were those which had acquired a hydrophobic signal sequence located sufficiently close to the N-terminus for membrane association to occur while translation was still in progress, thereby enabling the coupling of translation and membrane insertion. This model would suggest that the transferrin receptor may have arisen very early in cellular evolution, since its structure is that of the proposed primitive form of transmembrane proteins. The virtually universal expression of transferrin receptors on eucaryotic cells, the insolubility of ferric iron, and yet its absolute requirement for cell growth, argue in favor of the early evolution of this fundamental iron transport system.
In a search of available protein sequences for homologies with the transferrin receptor, we were unable to detect a sequence relationship to any members of the transferrin family. It will be of interest to compare the complete sequences of transferrin and its receptor with the sequence of gp97, an iron binding cell surface glycoprotein which has partial homology to transferrin (Brown et al., 1982). No homologies with other receptors that mediate endocytosis via coated pits were apparent from a search of the available sequences, which include the asialoglycoprotein receptor (Drickamer et al., 1984), the epidermal growth factor receptor (Ullrich et al., 1984), and the LDL receptor (Russell et al., 1984; K. Luskey, personal communication). An 11 amino acid segment of the transferrin receptor sequence shares seven residues in common with the mouse EGF precursor (Gray et al., 1983). An alignment of a longer region of the two proteins over 240 amino acids indicated an overall homology of 21% with a significance score of three standard deviation units.
It is of interest that this tentative homology overlaps with the region of the EGF precursor that is related to the LDL receptor (Russell et al., 1984) and is in a similar part of both receptors, a short distance distal to the transmembrane domain.

Experimental Procedures

DNA Sequencing
The dideoxy chain termination method of Sanger et al. (1977) was used with modifications as described by Biggin et al. (1983). The restriction fragments indicated in Figure 3 were subcloned into the M13 vectors mp8 and mp9, and 200-350 nucleotides of sequence was determined. This data was then supplemented by sequencing a random pool of 60 clones generated by partial Sau 3A and partial Alu I digestion. Using this strategy, 75% of the sequence was determined on both strands, and more than 95% of the sequence was determined from at least two independent M13 clones. Computer analysis of the sequence used the interactive program described by Sege et al. (1981), and the program HYDROPLOT provided by R. Staden.

Heteroduplex Analysis
Hybridizations were performed in 70% formamide, 0.3 M NaCl, 10 mM Tris-HCl (pH 8.5), 0.1 mM EDTA. Genomic DNA and cDNA were at concentrations of 1 μg/ml and 3 μg/ml, respectively. The DNA was denatured at 68°C for 5 min and annealed by cooling to 30°C for 30 min. Prior to spreading on a water hypophase the hybridization mixture was diluted 5-fold and adjusted to a final concentration of 64% formamide, 100 mM Tris-HCl (pH 8.5), 10 mM EDTA, 50 μg/ml cytochrome C (type VI, Sigma) and 0.2 μg/ml of both single- and double-stranded φX174 DNA. Spreads were transferred to collodion coated grids, stained with uranyl acetate, and shadowed with platinum-palladium. The grids were examined on a Philips 400 electron microscope.
Acknowledgments

We thank Russell Doolittle for his generous help with the computer homology searches, Peter Wellauer for advice on heteroduplex analysis, and Kristi Wharton for advice on dideoxy sequencing. We also thank Keith Miskimins and Mike Roberts for comments on the manuscript and Suzy Pafka for her assistance in preparing the figures. This work was supported by National Institutes of Health grant GM09966 and by funds from an anonymous donor.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received September 11, 1984
References Barnes, D., and Sate, G. (1980). Methods for growth serumfree medium. Anal. Blochem. 102, 255-270.
of cultured
cells rn
Biggin, M. D., Gibson, T. J., and Hong, G. F. (1983). Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proc. Nat. Acad. Sci. USA 80, 3963-3965.
Brown, J. P., Hewick, R. M., Hellstrom, I., Hellstrom, K. E., Doolittle, R. F., and Dreyer, W. J. (1982). Human melanoma-associated antigen p97 is structurally and functionally related to transferrin. Nature 296, 171-173.
Brown, M. S., Anderson, R. G. W., and Goldstein, J. L. (1983). Recycling receptors: the round-trip itinerary of migrant membrane proteins. Cell 32, 663-667.
Ciechanover, A., Schwartz, A. L., and Lodish, H. F. (1983). Sorting and recycling of cell surface receptors and endocytosed ligands: the asialoglycoprotein and transferrin receptors. J. Cell Biochem. 23, 107-130.
Dautry-Varsat, A., Ciechanover, A., and Lodish, H. F. (1983). pH and the recycling of transferrin during receptor-mediated endocytosis. Proc. Nat. Acad. Sci. USA 80, 2258-2262.
Drickamer, K., and Mamon, J. F. (1982). Phosphorylation of a membrane receptor for glycoproteins. J. Biol. Chem. 257, 15156-15161.
Drickamer, K., Mamon, J. F., Binns, G., and Leung, J. O. (1984). Primary structure of the rat liver asialoglycoprotein receptor. J. Biol. Chem. 259, 770-778.
Geuze, H. J., Slot, J. W., Strous, G. J. A. M., Peppard, J., von Figura, K., Hasilik, A., and Schwartz, A. L. (1984). Intracellular receptor sorting during endocytosis: comparative immunoelectron microscopy of multiple receptors in rat liver. Cell 37, 195-204.
Gray, A., Dull, T., and Ullrich, A. (1983). Nucleotide sequence of epidermal growth factor cDNA predicts a 128,000-molecular weight protein precursor. Nature 303, 722-725.
Klausner, R. D., Harford, J., and van Renswoude, J. V. (1984). Rapid internalization of the transferrin receptor in K562 cells is triggered by ligand binding or treatment with a phorbol ester. Proc. Nat. Acad. Sci. USA 81, 3005-3009.
Kozak, M. (1983). Comparison of initiation of protein synthesis in procaryotes, eucaryotes, and organelles. Microbiol. Rev. 47, 1-45.
Kühn, L. C., McClelland, A., and Ruddle, F. H. (1984). Gene transfer, expression, and molecular cloning of the human transferrin receptor gene. Cell 37, 95-103.
Kyte, J., and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
Larrick, J. W., and Cresswell, P. (1979). Modulation of cell surface iron transferrin receptors by cellular density and state of activation. J. Supramol. Struct. 11, 579-586.
Neckers, L. M., and Cossman, J. (1983). Transferrin receptor induction in mitogen-stimulated human T lymphocytes is required for DNA synthesis and cell division and is regulated by interleukin 2. Proc. Nat. Acad. Sci. USA 80, 3494-3498.
Newman, R., Schneider, C., Sutherland, R., Vodinelich, L., and Greaves, M. (1982). The transferrin receptor. Trends Biochem. Sci. 7, 397-400.
Omary, M. B., and Trowbridge, I. S. (1981). Biosynthesis of the human transferrin receptor in cultured cells. J. Biol. Chem. 256, 12888-12892.
Russell, D. W., Schneider, W. J., Yamamoto, T., Luskey, K. L., Brown, M. S., and Goldstein, J. L. (1984). Domain map of the LDL receptor: sequence homology with the epidermal growth factor precursor. Cell 37, 577-585.
Sabatini, D. D., Kreibich, G., Morimoto, T., and Adesnik, M. (1982). Mechanisms for the incorporation of proteins in membranes and organelles. J. Cell Biol. 92, 1-22.
Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Nat. Acad. Sci. USA 74, 5463-5467.
Schneider, C., Sutherland, R., Newman, R., and Greaves, M. (1981). Structural features of the cell surface receptor for transferrin that is recognized by the monoclonal antibody OKT9. J. Biol. Chem. 257, 8516-8522.
Schneider, C., Asser, A., Sutherland, D. R., and Greaves, M. F. (1983). In vitro biosynthesis of the human cell surface receptor for transferrin. FEBS Lett. 158, 259-264.
Sege, R. D., Soll, D., Ruddle, F. H., and Queen, C. (1981). A conversational system for the computer analysis of nucleic acid sequences. Nucl. Acids Res. 9, 437-444.
Strubin, M., Mach, B., and Long, E. O. (1984). The complete sequence of the mRNA for the HLA-DR associated invariant chain reveals a polypeptide with an unusual transmembrane polarity. EMBO J. 3, 869-872.
Trowbridge, I. S., and Lopez, F. (1982). Monoclonal antibody to transferrin receptor blocks transferrin binding and inhibits human tumor cell growth in vitro. Proc. Nat. Acad. Sci. USA 79, 1175-1179.
Ullrich, A., Coussens, L., Hayflick, J. S., Dull, T. J., Gray, A., Tam, A. W., Lee, J., Yarden, Y., Libermann, T. A., Schlessinger, J., Downward, J., Mayes, E. L. V., Whittle, N., Waterfield, M. D., and Seeburg, P. H. (1984). Human epidermal growth factor receptor cDNA sequence and aberrant expression of the amplified gene in A431 epidermoid carcinoma cells. Nature 309, 418-425.
Yamashiro, D. J., Tycko, B., Fluss, S. R., and Maxfield, F. R. (1984). Segregation of transferrin to a mildly acidic (pH 6.4) para-Golgi compartment in the recycling pathway. Cell 37, 789-800.
Note Added in Proof
The nucleotide sequence of a transferrin receptor mRNA has recently been reported by Schneider et al. (Nature 311, 675-678, 1984). The two sequences predict an identical amino acid sequence but differ in the 5' untranslated region. The cDNA clone analyzed by Schneider et al. contains an additional 212 bp at position 72 in our sequence. A likely explanation for this difference is an alternative splice in the mRNA.